* [questions] about using vfio to assign sr-iov vf to vm
@ 2014-08-14  8:22 ` Zhang Haoyu
  0 siblings, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-14  8:22 UTC (permalink / raw)
  To: qemu-devel, kvm

Hi all,
I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM, other VFs to another VM, and so on.
How can I unbind only some of the VFs, without unbinding the PF?
I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
If so, since the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
I think I've misunderstood something.
Any advice?

Thanks,
Zhang Haoyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-14  8:22 ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-14 12:44   ` Alex Williamson
  -1 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2014-08-14 12:44 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On Thu, 2014-08-14 at 16:22 +0800, Zhang Haoyu wrote:
> Hi all,
> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM, other VFs to another VM, and so on.
> How can I unbind only some of the VFs, without unbinding the PF?
> I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
> If so, since the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
> I think I've misunderstood something.
> Any advice?

This occurs when the PF is installed behind components in the system
that do not support PCIe Access Control Services (ACS).  The IOMMU group
contains both the PF and the VF because upstream transactions can be
re-routed downstream by these non-ACS components before being translated
by the IOMMU.  Please provide the output of 'sudo lspci -vvv' and
'lspci -n' along with your kernel version, and we might be able to give
you some advice on how to work around the problem.  Thanks,

Alex
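
The grouping described above can be inspected directly from sysfs. What follows is a minimal sketch, not something taken from the thread: it assumes a Linux host that exposes IOMMU groups under /sys/kernel/iommu_groups, and the helper names list_iommu_groups and group_of are hypothetical, chosen only for this illustration.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        # No groups exposed: the IOMMU is disabled or the kernel lacks support.
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        # Each group is a numbered directory holding one entry per device.
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_of(device, groups):
    """Return the group number that contains `device`, or None if absent."""
    for number, members in groups.items():
        if device in members:
            return number
    return None
```

Calling list_iommu_groups() with no argument reads the live system. If the PF and a VF come back with the same number from group_of(), they share a group and cannot be handed to different users through VFIO.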


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-14  8:22 ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-16  6:48   ` Zhang Haoyu
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-16  6:48 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, kvm

>> Hi all,
>> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
>> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM, other VFs to another VM, and so on.
>> How can I unbind only some of the VFs, without unbinding the PF?
>> I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
>> If so, since the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
>> I think I've misunderstood something.
>> Any advice?
>
>This occurs when the PF is installed behind components in the system
>that do not support PCIe Access Control Services (ACS).  The IOMMU group
>contains both the PF and the VF because upstream transactions can be
>re-routed downstream by these non-ACS components before being translated
>by the IOMMU.  Please provide the output of 'sudo lspci -vvv' and
>'lspci -n' along with your kernel version, and we might be able to give
>you some advice on how to work around the problem.  Thanks,
>
# lspci | grep Ether
02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)

I want to directly assign the VFs of the Intel 82599 (02:00.0 or 02:00.1) to VMs.
# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--
           +-01.1-[02-03]--+-00.0
           |                      \-00.1
           +-02.0
           +-06.0-[04]--
           +-16.0
           +-1a.0
           +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
           |                               +-05.0-[08]--+-00.0
           |                               |                  \-00.1
           |                               +-06.0-[09]--+-00.0
           |                               |                  \-00.1
           |                               +-08.0-[0a]--+-00.0
           |                               |                  \-00.1
           |                               \-09.0-[0b]--+-00.0
           |                                                  \-00.1
           +-1d.0
           +-1e.0-[0c]----00.0
           +-1f.0
           +-1f.2
           \-1f.3

# lspci -vvv -s 02:00.0
02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at e020 [size=32]
	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <1us, L1 <8us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [e0] Vital Product Data
		Unknown small resource type 06, will not decode more.
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap:	Migration-, Interrupt Message Number: 000
		IOVCtl:	Enable- Migration- Interrupt- MSE- ARIHierarchy-
		IOVSta:	Migration-
		Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 384, stride: 2, Device ID: 10ed
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000dfb00000 (64-bit, prefetchable)
		Region 3: Memory at 00000000dfc00000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Kernel driver in use: ixgbe

# lspci -vvv -s 00:01.1
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: f7e00000-f7efffff
	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee002f8  Data: 0000
	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #3, Speed 8GT/s, Width x8, ASPM L0s L1, Latency L0 <256ns, L1 <8us
			ClockPM- Surprise- LLActRep- BwNot+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #2, PowerLimit 75.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [140 v1] Root Complex Link
		Desc:	PortNumber=03 ComponentID=01 EltType=Config
		Link0:	Desc:	TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr:	00000000fed19000
	Capabilities: [d94 v1] #19
	Kernel driver in use: pcieport

The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge at 00:01.1.
Does the 00:01.1 PCI bridge support ACS?

Thanks,
Zhang Haoyu
>Alex
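
A side note on the SR-IOV capability in the dump above: the "VF offset" and "stride" fields determine where the VFs appear on the bus. The sketch below works through that arithmetic under the usual SR-IOV rule that VF n has routing ID PF_RID + offset + n * stride. It is an illustration only; the function names vf_address and format_bdf are hypothetical, and the offset and stride were read while SR-IOV was disabled (Enable-, Number of VFs: 0), so they may differ once VFs are actually enabled.

```python
def vf_address(pf_bus, pf_dev, pf_fn, first_vf_offset, vf_stride, vf_index):
    """Return (bus, device, function) of VF number `vf_index` (0-based)."""
    # A PCI routing ID packs bus (8 bits), device (5 bits), function (3 bits).
    pf_rid = (pf_bus << 8) | (pf_dev << 3) | pf_fn
    vf_rid = pf_rid + first_vf_offset + vf_index * vf_stride
    return (vf_rid >> 8) & 0xFF, (vf_rid >> 3) & 0x1F, vf_rid & 0x7


def format_bdf(bdf):
    """Format a (bus, device, function) tuple the way lspci prints it."""
    return "%02x:%02x.%x" % bdf


# PF 02:00.0 with "VF offset: 384, stride: 2" from the lspci output above.
first_vf = format_bdf(vf_address(0x02, 0, 0, 384, 2, 0))
last_vf = format_bdf(vf_address(0x02, 0, 0, 384, 2, 63))
```

With these inputs all 64 VFs of 02:00.0 would land on bus 03, which matches the [02-03] bus range of the 00:01.1 root port in the lspci -t tree, so the VFs sit behind the same root port as the PF.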

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-16  6:48   ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-16 13:29     ` Alex Williamson
  -1 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2014-08-16 13:29 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On Sat, 2014-08-16 at 14:48 +0800, Zhang Haoyu wrote:
> >> Hi all,
> >> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
> >> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM, other VFs to another VM, and so on.
> >> How can I unbind only some of the VFs, without unbinding the PF?
> >> I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
> >> If so, since the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
> >> I think I've misunderstood something.
> >> Any advice?
> >
> >This occurs when the PF is installed behind components in the system
> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
> >contains both the PF and the VF because upstream transactions can be
> >re-routed downstream by these non-ACS components before being translated
> >by the IOMMU.  Please provide the output of 'sudo lspci -vvv' and
> >'lspci -n' along with your kernel version, and we might be able to give
> >you some advice on how to work around the problem.  Thanks,
> >
> # lspci | grep Ether
> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
> 
> I want to directly assign the VFs of the Intel 82599 (02:00.0 or 02:00.1) to VMs.
> # lspci -t
> -[0000:00]-+-00.0
>            +-01.0-[01]--
>            +-01.1-[02-03]--+-00.0
>            |                      \-00.1
>            +-02.0
>            +-06.0-[04]--
>            +-16.0
>            +-1a.0
>            +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
>            |                               +-05.0-[08]--+-00.0
>            |                               |                  \-00.1
>            |                               +-06.0-[09]--+-00.0
>            |                               |                  \-00.1
>            |                               +-08.0-[0a]--+-00.0
>            |                               |                  \-00.1
>            |                               \-09.0-[0b]--+-00.0
>            |                                                  \-00.1
>            +-1d.0
>            +-1e.0-[0c]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 
> # lspci -vvv -s 02:00.0
> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 17
> 	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
> 	Region 2: I/O ports at e020 [size=32]
> 	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 	Capabilities: [e0] Vital Product Data
> 	Capabilities: [100 v1] Advanced Error Reporting
> 	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 	Kernel driver in use: ixgbe
> 
> # lspci -vvv -s 00:01.1
> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
> 	I/O behind bridge: 0000e000-0000efff
> 	Memory behind bridge: f7e00000-f7efffff
> 	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
> 	Capabilities: [80] Power Management version 3
> 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> 	Capabilities: [100 v1] Virtual Channel
> 	Capabilities: [140 v1] Root Complex Link
> 	Capabilities: [d94 v1] #19
> 	Kernel driver in use: pcieport
> 
> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
> Does the 00:01.1 PCI bridge support ACS?

It does not and that's exactly the problem.  We must assume that the
root port can redirect a transaction from a subordinate device back to
another subordinate device without IOMMU translation when ACS support is
not present.  If you had a device plugged in below 00:01.0, we'd also
need to assume that non-IOMMU translated peer-to-peer between devices
behind either function, 00:01.0 or 00:01.1, is possible.

Intel has indicated that processor root ports for all Xeon class
processors should support ACS and have verified isolation for PCH based
root ports allowing us to support quirks in place of ACS support.  I'm
not aware of any efforts at Intel to verify isolation capabilities of
root ports on client processors.  They are however aware that lack of
ACS is a limiting factor for usability of VT-d, and I hope that we'll
see future products with ACS support.

Chances are good that the PCH root port at 00:1c.0 is supported by an
ACS quirk, but it seems that your system has a PCIe switch below the
root port.  If the PCIe switch downstream ports support ACS, then you
may be able to move the 82599 to the empty slot at bus 07 to separate
the VFs into different IOMMU groups.  Thanks,

Alex
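
For readers following along, the grouping described in this reply can be
inspected from sysfs.  This is a minimal Python sketch, assuming a Linux
host with the IOMMU enabled, where the kernel exposes each group under
/sys/kernel/iommu_groups/<group>/devices/.  The function name
iommu_groups and its root parameter are illustrative and not part of any
tool mentioned in this thread.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    'root' is a parameter only so the function can be pointed at a
    test directory; on a real host the default sysfs path is used.
    """
    groups = {}
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if os.path.isdir(devices_dir):
            # Each entry is named after a PCI address, e.g. 0000:02:00.0
            groups[name] = sorted(os.listdir(devices_dir))
    return groups


def print_groups(groups):
    """Print one line per group, ordered by group number."""
    for name in sorted(groups, key=int):
        print("group %s: %s" % (name, " ".join(groups[name])))
```

Calling print_groups(iommu_groups()) on the host lists every group.  If
the PF, its VFs, and the root port 00:01.1 all appear under the same
group number, they cannot be split between VMs, which matches the
behavior discussed above.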


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-16 13:29     ` [Qemu-devel] " Alex Williamson
@ 2014-08-18  1:00       ` Zhang Haoyu
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-18  1:00 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, kvm

>> >> Hi, all
>> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
>> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
>> >> so how to only unbind (part of) the VFs but PF?
>> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
>> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
>> >> I think I misunderstand something,
>> >> any advice?
>> >
>> >This occurs when the PF is installed behind components in the system
>> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
>> >contains both the PF and the VF because upstream transactions can be
>> >re-routed downstream by these non-ACS components before being translated
>> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>> >version and we might be able to give you some advice on how to work
>> >around the problem.  Thanks,
>> >
>> # lspci | grep Ether
>> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
>> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
>> 
>> I want to direct-assign the VFs of intel 82599(02:00.0 or 02:00.1) to VM,
>> # lspci -t
>> -[0000:00]-+-00.0
>>            +-01.0-[01]--
>>            +-01.1-[02-03]--+-00.0
>>            |               \-00.1
>>            +-02.0
>>            +-06.0-[04]--
>>            +-16.0
>>            +-1a.0
>>            +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
>>            |                               +-05.0-[08]--+-00.0
>>            |                               |            \-00.1
>>            |                               +-06.0-[09]--+-00.0
>>            |                               |            \-00.1
>>            |                               +-08.0-[0a]--+-00.0
>>            |                               |            \-00.1
>>            |                               \-09.0-[0b]--+-00.0
>>            |                                            \-00.1
>>            +-1d.0
>>            +-1e.0-[0c]----00.0
>>            +-1f.0
>>            +-1f.2
>>            \-1f.3
>> 
>> # lspci -vvv -s 02:00.0
>> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0, Cache Line Size: 64 bytes
>> 	Interrupt: pin A routed to IRQ 17
>> 	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
>> 	Region 2: I/O ports at e020 [size=32]
>> 	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
>> 	Capabilities: [40] Power Management version 3
>> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>> 	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
>> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
>> 	Capabilities: [e0] Vital Product Data
>> 	Capabilities: [100 v1] Advanced Error Reporting
>> 	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
>> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
>> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>> 	Kernel driver in use: ixgbe
>> 
>> # lspci -vvv -s 00:01.1
>> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0, Cache Line Size: 64 bytes
>> 	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
>> 	I/O behind bridge: 0000e000-0000efff
>> 	Memory behind bridge: f7e00000-f7efffff
>> 	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
>> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>> 	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>> 	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
>> 	Capabilities: [80] Power Management version 3
>> 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>> 	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
>> 	Capabilities: [100 v1] Virtual Channel
>> 	Capabilities: [140 v1] Root Complex Link
>> 	Capabilities: [d94 v1] #19
>> 	Kernel driver in use: pcieport
>> 
>> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
>> Does the 00:01.1 PCI bridge support ACS?
>
>It does not and that's exactly the problem.  We must assume that the
>root port can redirect a transaction from a subordinate device back to
>another subordinate device without IOMMU translation when ACS support is
>not present.  If you had a device plugged in below 00:01.0, we'd also
>need to assume that non-IOMMU translated peer-to-peer between devices
>behind either function, 00:01.0 or 00:01.1, is possible.
>
>Intel has indicated that processor root ports for all Xeon class
>processors should support ACS and have verified isolation for PCH based
>root ports allowing us to support quirks in place of ACS support.  I'm
>not aware of any efforts at Intel to verify isolation capabilities of
>root ports on client processors.  They are however aware that lack of
>ACS is a limiting factor for usability of VT-d, and I hope that we'll
>see future products with ACS support.
>
>Chances are good that the PCH root port at 00:1c.0 is supported by an
>ACS quirk, but it seems that your system has a PCIe switch below the
>root port.  If the PCIe switch downstream ports support ACS, then you
>may be able to move the 82599 to the empty slot at bus 07 to separate
>the VFs into different IOMMU groups.  Thanks,
>
Thanks, Alex,
How can I tell whether a PCI bridge/device supports the ACS capability?

I ran "lspci -vvv -s | grep -i ACS", but nothing matched.
# lspci -vvv -s 00:1c.0
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=05, subordinate=0b, sec-latency=0
	I/O behind bridge: 00002000-00003fff
	Memory behind bridge: f7800000-f7cfffff
	Prefetchable memory behind bridge: 00000000f0000000-00000000f03fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+ ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1
	Capabilities: [a0] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pcieport

Thanks,
Zhang Haoyu
>Alex
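
One way to approach the question above is to look for an "Access Control
Services" entry among the extended capabilities that lspci -vvv prints
(run as root, since extended config space is not readable otherwise).
This is a minimal Python sketch, assuming that output format.  The
function name acs_status and the capability offset [110 v1] in the
sample are made up for illustration and do not come from this system.

```python
import re

# Hypothetical lspci -vvv fragment for a port that advertises ACS.
SAMPLE_WITH_ACS = (
    "\tCapabilities: [110 v1] Access Control Services\n"
    "\t\tACSCap:\tSrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-\n"
    "\t\tACSCtl:\tSrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-\n"
)


def acs_status(lspci_text):
    """Return None if no ACS capability is listed in the lspci text.

    Otherwise return a dict mapping each ACSCtl flag name to True
    (enabled, shown as '+') or False (disabled, shown as '-').
    """
    if "Access Control Services" not in lspci_text:
        return None
    flags = {}
    match = re.search(r"ACSCtl:\s*(.*)", lspci_text)
    if match:
        for token in match.group(1).split():
            if token[-1] in "+-":
                flags[token[:-1]] = token.endswith("+")
    return flags
```

Applied to the 00:1c.0 output above, which lists no extended
capabilities at all, acs_status would return None.  That only means the
capability is not advertised: as mentioned earlier in the thread, the
kernel may still treat some PCH root ports as isolated through an ACS
quirk.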


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18  1:00       ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-18  1:14         ` Alex Williamson
  -1 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2014-08-18  1:14 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm

On Mon, 2014-08-18 at 09:00 +0800, Zhang Haoyu wrote:
> >> >> Hi, all
> >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
> >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
> >> >> so how to only unbind (part of) the VFs but PF?
> >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
> >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
> >> >> I think I misunderstand something,
> >> >> any advice?
> >> >
> >> >This occurs when the PF is installed behind components in the system
> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
> >> >contains both the PF and the VF because upstream transactions can be
> >> >re-routed downstream by these non-ACS components before being translated
> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
> >> >version and we might be able to give you some advice on how to work
> >> >around the problem.  Thanks,
> >> >
> >> # lspci | grep Ether
> >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> >> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
> >> 
> >> I want to direct-assign the VFs of the Intel 82599 (02:00.0 or 02:00.1) to a VM:
> >> # lspci -t
> >> -[0000:00]-+-00.0
> >>            +-01.0-[01]--
> >>            +-01.1-[02-03]--+-00.0
> >>            |                      \-00.1
> >>            +-02.0
> >>            +-06.0-[04]--
> >>            +-16.0
> >>            +-1a.0
> >>            +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
> >>            |                               +-05.0-[08]--+-00.0
> >>            |                               |                  \-00.1
> >>            |                               +-06.0-[09]--+-00.0
> >>            |                               |                  \-00.1
> >>            |                               +-08.0-[0a]--+-00.0
> >>            |                               |                  \-00.1
> >>            |                               \-09.0-[0b]--+-00.0
> >>            |                                                  \-00.1
> >>            +-1d.0
> >>            +-1e.0-[0c]----00.0
> >>            +-1f.0
> >>            +-1f.2
> >>            \-1f.3
> >> 
> >> # lspci -vvv -s 02:00.0
> >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >> 	Latency: 0, Cache Line Size: 64 bytes
> >> 	Interrupt: pin A routed to IRQ 17
> >> 	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
> >> 	Region 2: I/O ports at e020 [size=32]
> >> 	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
> >> 	Capabilities: [40] Power Management version 3
> >> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >> 	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> >> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> >> 	Capabilities: [e0] Vital Product Data
> >> 	Capabilities: [100 v1] Advanced Error Reporting
> >> 	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
> >> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> >> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> >> 	Kernel driver in use: ixgbe
> >> 
> >> # lspci -vvv -s 00:01.1
> >> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
> >> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >> 	Latency: 0, Cache Line Size: 64 bytes
> >> 	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
> >> 	I/O behind bridge: 0000e000-0000efff
> >> 	Memory behind bridge: f7e00000-f7efffff
> >> 	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
> >> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> >> 	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> >> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >> 	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
> >> 	Capabilities: [80] Power Management version 3
> >> 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> 	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >> 	Capabilities: [100 v1] Virtual Channel
> >> 	Capabilities: [140 v1] Root Complex Link
> >> 	Capabilities: [d94 v1] #19
> >> 	Kernel driver in use: pcieport
> >> 
> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
> >> Does the 00:01.1 PCI bridge support ACS?
> >
> >It does not and that's exactly the problem.  We must assume that the
> >root port can redirect a transaction from a subordinate device back to
> >another subordinate device without IOMMU translation when ACS support is
> >not present.  If you had a device plugged in below 00:01.0, we'd also
> >need to assume that non-IOMMU translated peer-to-peer between devices
> >behind either function, 00:01.0 or 00:01.1, is possible.
> >
> >Intel has indicated that processor root ports for all Xeon class
> >processors should support ACS and have verified isolation for PCH based
> >root ports allowing us to support quirks in place of ACS support.  I'm
> >not aware of any efforts at Intel to verify isolation capabilities of
> >root ports on client processors.  They are however aware that lack of
> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >see future products with ACS support.
> >
> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >ACS quirk, but it seems that your system has a PCIe switch below the
> >root port.  If the PCIe switch downstream ports support ACS, then you
> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >the VFs into different IOMMU groups.  Thanks,
> >
> Thanks, Alex,
> how can I tell whether a PCI bridge/device supports the ACS capability?
> 
> I ran "lspci -vvv -s | grep -i ACS", and nothing matched.
> # lspci -vvv -s 00:1c.0
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])


Ideally there would be capabilities for it, something like:

Capabilities [xxx] Access Control Services...

But, Intel failed to provide this, so we enable "effective" ACS
capabilities via a quirk:

drivers/pci/quirks.c:
/*
 * Many Intel PCH root ports do provide ACS-like features to disable peer
 * transactions and validate bus numbers in requests, but do not provide an
 * actual PCIe ACS capability.  This is the list of device IDs known to fall
 * into that category as provided by Intel in Red Hat bugzilla 1037684.
 */
static const u16 pci_quirk_intel_pch_acs_ids[] = {
        /* Ibexpeak PCH */
        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
        /* Cougarpoint PCH */
        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
        /* Pantherpoint PCH */
        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
        /* Lynxpoint-H PCH */
        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
        /* Lynxpoint-LP PCH */
        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
        /* Wildcat PCH */
        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
        /* Patsburg (X79) PCH */
        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};
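
As a rough illustration of checking a device ID against the list above, here
is a minimal shell sketch.  The IDs are copied from the excerpt; the function
name in_pch_acs_quirk_list is made up for this example and is not part of the
kernel or pciutils.  It compares only the device ID, and the list is meant for
Intel PCH root ports, so a match on some other kind of device means nothing.

```sh
#!/bin/sh
# Device IDs copied from the pci_quirk_intel_pch_acs_ids[] excerpt above.
PCH_ACS_IDS="
3b42 3b43 3b44 3b45 3b46 3b47 3b48 3b49 3b4a 3b4b 3b4c 3b4d 3b4e 3b4f 3b50 3b51
1c10 1c11 1c12 1c13 1c14 1c15 1c16 1c17 1c18 1c19 1c1a 1c1b 1c1c 1c1d 1c1e 1c1f
1e10 1e11 1e12 1e13 1e14 1e15 1e16 1e17 1e18 1e19 1e1a 1e1b 1e1c 1e1d 1e1e 1e1f
8c10 8c11 8c12 8c13 8c14 8c15 8c16 8c17 8c18 8c19 8c1a 8c1b 8c1c 8c1d 8c1e 8c1f
9c10 9c11 9c12 9c13 9c14 9c15 9c16 9c17 9c18 9c19 9c1a 9c1b
9c90 9c91 9c92 9c93 9c94 9c95 9c96 9c97 9c98 9c99 9c9a 9c9b
1d10 1d12 1d14 1d16 1d18 1d1a 1d1c 1d1e
"

# in_pch_acs_quirk_list DEVICE_ID   (hypothetical helper)
# Returns 0 if the 4-hex-digit device ID appears in the list, 1 otherwise.
# Accepts upper or lower case hex digits, with or without a leading "0x".
in_pch_acs_quirk_list() {
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    id=${id#0x}
    for known in $PCH_ACS_IDS; do
        [ "$known" = "$id" ] && return 0
    done
    return 1
}

# Example (not run here): read the device ID from the "vendor:device" field of
#   lspci -n -s 00:1c.0
# and pass it in:
#   in_pch_acs_quirk_list 1c10 && echo "device ID is in the quirk list"
```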

Hopefully if you run 'lspci -n', you'll see your device ID listed among
these.  We don't currently have any quirks for PCIe switches, so if your
IOMMU group is still bigger than it should be, that may be the reason.
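
To see how large a group actually is, the kernel exposes IOMMU groups under
/sys/kernel/iommu_groups, with one directory per group and one entry per
device.  A small sketch for listing them follows; the function name
list_iommu_groups is hypothetical, and the optional directory argument exists
only so the function can be tried against a test tree instead of the live
sysfs.

```sh
#!/bin/sh
# list_iommu_groups [ROOT]   (hypothetical helper)
# Prints one "group N: DEVICE" line for every device in every IOMMU group.
# ROOT defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group=${dev%/devices/*}   # strip "/devices/<device>"
        group=${group##*/}        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Example (not run here):
#   list_iommu_groups | grep 0000:02:
```

If the PF and all of its VFs print with the same group number, they still
share one IOMMU group.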
Thanks,

Alex


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18  1:14         ` [Qemu-devel] " Alex Williamson
@ 2014-08-18  8:46           ` Zhang Haoyu
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-18  8:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, kvm, bhelgaas, donald.d.dugger

>> >> >> Hi, all
>> >> >> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
>> >> >> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM and some other VFs to another VM.
>> >> >> How can I unbind only (some of) the VFs, but not the PF?
>> >> >> I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
>> >> >> If so, because the PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappear.
>> >> >> I think I've misunderstood something.
>> >> >> Any advice?
>> >> >
>> >> >This occurs when the PF is installed behind components in the system
>> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
>> >> >contains both the PF and the VF because upstream transactions can be
>> >> >re-routed downstream by these non-ACS components before being translated
>> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>> >> >version and we might be able to give you some advice on how to work
>> >> >around the problem.  Thanks,
>> >> >
>> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
>> >> Does the 00:01.1 PCI bridge support ACS?
>> >
>> >It does not and that's exactly the problem.  We must assume that the
>> >root port can redirect a transaction from a subordinate device back to
>> >another subordinate device without IOMMU translation when ACS support is
>> >not present.  If you had a device plugged in below 00:01.0, we'd also
>> >need to assume that non-IOMMU translated peer-to-peer between devices
>> >behind either function, 00:01.0 or 00:01.1, is possible.
>> >
>> >Intel has indicated that processor root ports for all Xeon class
>> >processors should support ACS and have verified isolation for PCH based
>> >root ports allowing us to support quirks in place of ACS support.  I'm
>> >not aware of any efforts at Intel to verify isolation capabilities of
>> >root ports on client processors.  They are however aware that lack of
>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
>> >see future products with ACS support.
>> >
>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
>> >ACS quirk, but it seems that your system has a PCIe switch below the
>> >root port.  If the PCIe switch downstream ports support ACS, then you
>> >may be able to move the 82599 to the empty slot at bus 07 to separate
>> >the VFs into different IOMMU groups.  Thanks,
>> >
>> Thanks, Alex,
>> how can I tell whether a PCI bridge/device supports the ACS capability?
>> 
>> I ran "lspci -vvv -s | grep -i ACS", and nothing matched.
>> # lspci -vvv -s 00:1c.0
>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
>
>
>Ideally there would be capabilities for it, something like:
>
>Capabilities [xxx] Access Control Services...
>
>But, Intel failed to provide this, so we enable "effective" ACS
>capabilities via a quirk:
>
>drivers/pci/quirks.c:
>/*
> * Many Intel PCH root ports do provide ACS-like features to disable peer
> * transactions and validate bus numbers in requests, but do not provide an
> * actual PCIe ACS capability.  This is the list of device IDs known to fall
> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> */
>static const u16 pci_quirk_intel_pch_acs_ids[] = {
>        /* Ibexpeak PCH */
>        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
>        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
>        /* Cougarpoint PCH */
>        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
>        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
>        /* Pantherpoint PCH */
>        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
>        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
>        /* Lynxpoint-H PCH */
>        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
>        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
>        /* Lynxpoint-LP PCH */
>        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
>        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
>        /* Wildcat PCH */
>        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
>        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
>        /* Patsburg (X79) PCH */
>        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
>};
>
>Hopefully if you run 'lspci -n', you'll see your device ID listed among
>these.  We don't currently have any quirks for PCIe switches, so if your
>IOMMU group is still bigger than it should be, that may be the reason.
>Thanks,
>
Using device-specific mechanisms to enable and verify ACS-like capabilities is okay,
but what should we do about devices that do not support ACS-like capabilities at all?
How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities"?
And how can we reduce the risk of data corruption and information leakage between VMs?

Thanks,
Zhang Haoyu
>Alex


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18  8:46           ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-18  9:49             ` Zhang Haoyu
  -1 siblings, 0 replies; 18+ messages in thread
From: Zhang Haoyu @ 2014-08-18  9:49 UTC (permalink / raw)
  To: Zhang Haoyu, Alex Williamson
  Cc: bhelgaas, xudong.hao, donald.d.dugger, qemu-devel, kvm

>>> >> >> Hi, all
>>> >> >> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
>>> >> >> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM and some other VFs to another VM.
>>> >> >> How can I unbind only (some of) the VFs, but not the PF?
>>> >> >> I read the kernel doc vfio.txt, and I'm not sure whether I have to unbind all of the devices that belong to one iommu_group.
>>> >> >> If so, because the PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappear.
>>> >> >> I think I've misunderstood something.
>>> >> >> Any advice?
>>> >> >
>>> >> >This occurs when the PF is installed behind components in the system
>>> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
>>> >> >contains both the PF and the VF because upstream transactions can be
>>> >> >re-routed downstream by these non-ACS components before being translated
>>> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>>> >> >version and we might be able to give you some advice on how to work
>>> >> >around the problem.  Thanks,
>>> >> >
>>> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
>>> >> Does the 00:01.1 PCI bridge support ACS?
>>> >
>>> >It does not and that's exactly the problem.  We must assume that the
>>> >root port can redirect a transaction from a subordinate device back to
>>> >another subordinate device without IOMMU translation when ACS support is
>>> >not present.  If you had a device plugged in below 00:01.0, we'd also
>>> >need to assume that non-IOMMU translated peer-to-peer between devices
>>> >behind either function, 00:01.0 or 00:01.1, is possible.
>>> >
>>> >Intel has indicated that processor root ports for all Xeon class
>>> >processors should support ACS and have verified isolation for PCH based
>>> >root ports allowing us to support quirks in place of ACS support.  I'm
>>> >not aware of any efforts at Intel to verify isolation capabilities of
>>> >root ports on client processors.  They are however aware that lack of
>>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
>>> >see future products with ACS support.
>>> >
>>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
>>> >ACS quirk, but it seems that your system has a PCIe switch below the
>>> >root port.  If the PCIe switch downstream ports support ACS, then you
>>> >may be able to move the 82599 to the empty slot at bus 07 to separate
>>> >the VFs into different IOMMU groups.  Thanks,
>>> >
>>> Thanks, Alex,
>>> how can I tell whether a PCI bridge/device supports the ACS capability?
>>> 
>>> I ran "lspci -vvv -s | grep -i ACS", and nothing matched.
>>> # lspci -vvv -s 00:1c.0
>>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
>>
>>
>>Ideally there would be capabilities for it, something like:
>>
>>Capabilities [xxx] Access Control Services...
>>
>>But, Intel failed to provide this, so we enable "effective" ACS
>>capabilities via a quirk:
>>
>>drivers/pci/quirks.c:
>>/*
>> * Many Intel PCH root ports do provide ACS-like features to disable peer
>> * transactions and validate bus numbers in requests, but do not provide an
>> * actual PCIe ACS capability.  This is the list of device IDs known to fall
>> * into that category as provided by Intel in Red Hat bugzilla 1037684.
>> */
>>static const u16 pci_quirk_intel_pch_acs_ids[] = {
>>        /* Ibexpeak PCH */
>>        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
>>        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
>>        /* Cougarpoint PCH */
>>        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
>>        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
>>        /* Pantherpoint PCH */
>>        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
>>        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
>>        /* Lynxpoint-H PCH */
>>        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
>>        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
>>        /* Lynxpoint-LP PCH */
>>        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
>>        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
>>        /* Wildcat PCH */
>>        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
>>        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
>>        /* Patsburg (X79) PCH */
>>        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
>>};
>>
>>Hopefully if you run 'lspci -n', you'll see your device ID listed among
>>these.  We don't currently have any quirks for PCIe switches, so if your
>>IOMMU group is still bigger than it should be, that may be the reason.
>>Thanks,
>>
>Using device-specific mechanisms to enable and verify ACS-like capabilities is fine,
>but what should we do about devices that do not support ACS-like capabilities at all?
>How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities",
>and how can we reduce the risk of data corruption and information leakage between VMs?
>
Is there any update since http://thread.gmane.org/gmane.comp.emulators.kvm.devel/110726/focus=111515 ?

>Thanks,
>Zhang Haoyu
>>Alex
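
The 'lspci -n' check suggested in the quoted reply can be sketched roughly as follows. This is only an illustration, not the kernel's logic: the ID set is transcribed from the pci_quirk_intel_pch_acs_ids[] excerpt quoted above, the helper name has_pch_acs_quirk() is hypothetical, and the assumed line layout is "slot class: vendor:device". The kernel applies further checks beyond the device ID, so a match here only suggests that the quirk may apply.

```python
import re

# Device IDs transcribed from the pci_quirk_intel_pch_acs_ids[] excerpt
# quoted in this thread (other kernel versions may list different IDs).
PCH_ACS_QUIRK_IDS = (
    set(range(0x3b42, 0x3b52))        # Ibexpeak PCH
    | set(range(0x1c10, 0x1c20))      # Cougarpoint PCH
    | set(range(0x1e10, 0x1e20))      # Pantherpoint PCH
    | set(range(0x8c10, 0x8c20))      # Lynxpoint-H PCH
    | set(range(0x9c10, 0x9c1c))      # Lynxpoint-LP PCH
    | set(range(0x9c90, 0x9c9c))      # Wildcat PCH
    | set(range(0x1d10, 0x1d20, 2))   # Patsburg (X79) PCH, even IDs only
)

INTEL_VENDOR_ID = 0x8086

# Assumed 'lspci -n' line layout: "<slot> <class>: <vendor>:<device> ..."
LSPCI_N_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-fA-F]{4}):\s+"
    r"(?P<vendor>[0-9a-fA-F]{4}):(?P<device>[0-9a-fA-F]{4})"
)


def has_pch_acs_quirk(lspci_n_line):
    """Return True if the vendor:device pair on the line is in the ID set."""
    match = LSPCI_N_LINE.match(lspci_n_line.strip())
    if match is None:
        raise ValueError("unrecognized lspci -n line: %r" % lspci_n_line)
    vendor = int(match.group("vendor"), 16)
    device = int(match.group("device"), 16)
    return vendor == INTEL_VENDOR_ID and device in PCH_ACS_QUIRK_IDS


if __name__ == "__main__":
    # Made-up sample lines for illustration, not taken from the reporter's system.
    for line in ("00:1c.0 0604: 8086:1c10 (rev b5)",
                 "02:00.0 0200: 8086:10fb (rev 01)"):
        print(line, "->", has_pch_acs_quirk(line))
```

Feeding each line of real 'lspci -n' output through the helper gives a quick first pass; the IOMMU groups the kernel actually creates remain the authoritative answer.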

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18  9:49             ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-18 12:53               ` Alex Williamson
  -1 siblings, 0 replies; 18+ messages in thread
From: Alex Williamson @ 2014-08-18 12:53 UTC (permalink / raw)
  To: Zhang Haoyu; +Cc: qemu-devel, kvm, bhelgaas, donald.d.dugger, xudong.hao

On Mon, 2014-08-18 at 17:49 +0800, Zhang Haoyu wrote:
> >>> >> >> Hi, all
> >>> >> >> I'm using VFIO to assign Intel 82599 VFs to VMs, and I've run into a problem:
> >>> >> >> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM and other VFs to another VM,
> >>> >> >> so how can I unbind only (some of) the VFs but not the PF?
> >>> >> >> I read the kernel doc vfio.txt, and I'm not sure whether I should unbind all of the devices that belong to one iommu_group.
> >>> >> >> If so, because the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
> >>> >> >> I think I'm misunderstanding something.
> >>> >> >> Any advice?
> >>> >> >
> >>> >> >This occurs when the PF is installed behind components in the system
> >>> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
> >>> >> >contains both the PF and the VF because upstream transactions can be
> >>> >> >re-routed downstream by these non-ACS components before being translated
> >>> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
> >>> >> >version and we might be able to give you some advice on how to work
> >>> >> >around the problem.  Thanks,
> >>> >> >
> >>> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
> >>> >> Does the 00:01.1 PCI bridge support ACS?
> >>> >
> >>> >It does not and that's exactly the problem.  We must assume that the
> >>> >root port can redirect a transaction from a subordinate device back to
> >>> >another subordinate device without IOMMU translation when ACS support is
> >>> >not present.  If you had a device plugged in below 00:01.0, we'd also
> >>> >need to assume that non-IOMMU translated peer-to-peer between devices
> >>> >behind either function, 00:01.0 or 00:01.1, is possible.
> >>> >
> >>> >Intel has indicated that processor root ports for all Xeon class
> >>> >processors should support ACS and have verified isolation for PCH based
> >>> >root ports allowing us to support quirks in place of ACS support.  I'm
> >>> >not aware of any efforts at Intel to verify isolation capabilities of
> >>> >root ports on client processors.  They are however aware that lack of
> >>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >>> >see future products with ACS support.
> >>> >
> >>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >>> >ACS quirk, but it seems that your system has a PCIe switch below the
> >>> >root port.  If the PCIe switch downstream ports support ACS, then you
> >>> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >>> >the VFs into different IOMMU groups.  Thanks,
> >>> >
> >>> Thanks, Alex,
> >>> How can I tell whether a PCI bridge/device supports the ACS capability?
> >>> 
> >>> I ran "lspci -vvv -s | grep -i ACS"; nothing matched.
> >>> # lspci -vvv -s 00:1c.0
> >>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
> >>
> >>
> >>Ideally there would be capabilities for it, something like:
> >>
> >>Capabilities [xxx] Access Control Services...
> >>
> >>But, Intel failed to provide this, so we enable "effective" ACS
> >>capabilities via a quirk:
> >>
> >>drivers/pci/quirks.c:
> >>/*
> >> * Many Intel PCH root ports do provide ACS-like features to disable peer
> >> * transactions and validate bus numbers in requests, but do not provide an
> >> * actual PCIe ACS capability.  This is the list of device IDs known to fall
> >> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> >> */
> >>static const u16 pci_quirk_intel_pch_acs_ids[] = {
> >>        /* Ibexpeak PCH */
> >>        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
> >>        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
> >>        /* Cougarpoint PCH */
> >>        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
> >>        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
> >>        /* Pantherpoint PCH */
> >>        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
> >>        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
> >>        /* Lynxpoint-H PCH */
> >>        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
> >>        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
> >>        /* Lynxpoint-LP PCH */
> >>        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
> >>        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
> >>        /* Wildcat PCH */
> >>        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
> >>        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
> >>        /* Patsburg (X79) PCH */
> >>        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
> >>};
> >>
> >>Hopefully if you run 'lspci -n', you'll see your device ID listed among
> >>these.  We don't currently have any quirks for PCIe switches, so if your
> >>IOMMU group is still bigger than it should be, that may be the reason.
> >>Thanks,
> >>
> >Using device-specific mechanisms to enable and verify ACS-like capabilities is fine,
> >but what should we do about devices that do not support ACS-like capabilities at all?
> >How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities",
> >and how can we reduce the risk of data corruption and information leakage between VMs?
> >
> Is there any update since http://thread.gmane.org/gmane.comp.emulators.kvm.devel/110726/focus=111515 ?

You're welcome to use that patch, but you do so at your own risk.
Upstream is not willing to accept that patch because incorrectly
assuming isolation can lead to subtle issues which are difficult or
impossible to support.  This is why we only support ACS-reported
isolation or vendor-verified quirks.  If you're able to work with the
component vendor to verify isolation, we'd welcome more quirks in this
area.  Thanks,

Alex
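
Whichever kernel is used, the grouping it actually produced can be read back from sysfs, which shows whether the PF and its VFs still share a group. A minimal sketch, assuming the usual /sys/kernel/iommu_groups/<N>/devices/<BDF> layout; the helper names list_iommu_groups() and group_of() are hypothetical, chosen for this example only.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted device addresses it holds."""
    groups = {}
    if not os.path.isdir(root):
        # No IOMMU groups exposed (IOMMU disabled or not a Linux host).
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


def group_of(bdf, groups):
    """Return the group number containing the device address, or None."""
    for group_id, devices in groups.items():
        if bdf in devices:
            return group_id
    return None


if __name__ == "__main__":
    for group_id, devices in sorted(list_iommu_groups().items()):
        print("group %d: %s" % (group_id, " ".join(devices)))
```

If the PF and every VF report the same group number, the lack of isolation discussed above still applies to that topology.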


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-08-18 12:54 UTC | newest]

Thread overview: 18+ messages
2014-08-14  8:22 [questions] about using vfio to assign sr-iov vf to vm Zhang Haoyu
2014-08-14  8:22 ` [Qemu-devel] " Zhang Haoyu
2014-08-14 12:44 ` Alex Williamson
2014-08-14 12:44   ` [Qemu-devel] " Alex Williamson
2014-08-16  6:48 ` Zhang Haoyu
2014-08-16  6:48   ` [Qemu-devel] " Zhang Haoyu
2014-08-16 13:29   ` Alex Williamson
2014-08-16 13:29     ` [Qemu-devel] " Alex Williamson
2014-08-18  1:00     ` Zhang Haoyu
2014-08-18  1:00       ` [Qemu-devel] " Zhang Haoyu
2014-08-18  1:14       ` Alex Williamson
2014-08-18  1:14         ` [Qemu-devel] " Alex Williamson
2014-08-18  8:46         ` Zhang Haoyu
2014-08-18  8:46           ` [Qemu-devel] " Zhang Haoyu
2014-08-18  9:49           ` Zhang Haoyu
2014-08-18  9:49             ` [Qemu-devel] " Zhang Haoyu
2014-08-18 12:53             ` Alex Williamson
2014-08-18 12:53               ` [Qemu-devel] " Alex Williamson
