* [questions] about using vfio to assign sr-iov vf to vm
From: Zhang Haoyu @ 2014-08-14 8:22 UTC
To: qemu-devel, kvm

Hi all,

I'm using VFIO to assign Intel 82599 VFs to VMs, and I have run into a problem: the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM, some other VFs to another VM, and so on. How can I unbind only (some of) the VFs, but not the PF?

I read the kernel doc vfio.txt, and I'm not sure whether I should unbind all of the devices that belong to one iommu_group. If so, because the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.

I think I have misunderstood something. Any advice?

Thanks,
Zhang Haoyu
* Re: [questions] about using vfio to assign sr-iov vf to vm
From: Alex Williamson @ 2014-08-14 12:44 UTC
To: Zhang Haoyu; +Cc: qemu-devel, kvm

On Thu, 2014-08-14 at 16:22 +0800, Zhang Haoyu wrote:
> Hi, all
> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
> so how to only unbind (part of) the VFs but PF?
> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
> I think I misunderstand something,
> any advice?

This occurs when the PF is installed behind components in the system that do not support PCIe Access Control Services (ACS). The IOMMU group contains both the PF and the VFs because upstream transactions can be re-routed downstream by these non-ACS components before being translated by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and the kernel version, and we might be able to give you some advice on how to work around the problem.

Thanks,
Alex
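The grouping Alex describes can be inspected directly in sysfs: each directory under /sys/kernel/iommu_groups/ has a devices/ subdirectory with one entry per PCI address in that group, as documented in vfio.txt. The following is a minimal sketch, not taken from the thread, that collects this layout into a dictionary. The function names iommu_groups and group_of are made up for illustration.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or sysfs not available: report no groups.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        # Each entry under devices/ is named after a PCI address.
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(device, groups):
    """Return the group number that contains `device`, or None if not found."""
    for number, members in groups.items():
        if device in members:
            return number
    return None


if __name__ == "__main__":
    for number, members in sorted(iommu_groups().items()):
        print(number, " ".join(members))
```

If the PF (for example 0000:02:00.0) and a VF are reported under the same group number, they cannot be handed to different users independently, which is the situation described in this thread.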
* Re: [questions] about using vfio to assign sr-iov vf to vm
From: Zhang Haoyu @ 2014-08-16 6:48 UTC
To: Alex Williamson; +Cc: qemu-devel, kvm

>> Hi, all
>> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
>> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
>> so how to only unbind (part of) the VFs but PF?
>> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
>> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
>> I think I misunderstand something,
>> any advice?
>
>This occurs when the PF is installed behind components in the system
>that do not support PCIe Access Control Services (ACS). The IOMMU group
>contains both the PF and the VF because upstream transactions can be
>re-routed downstream by these non-ACS components before being translated
>by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>version and we might be able to give you some advice on how to work
>around the problem. Thanks,
>
# lspci | grep Ether
02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)

I want to direct-assign the VFs of the Intel 82599 (02:00.0 or 02:00.1) to a VM.

# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]--
           +-01.1-[02-03]--+-00.0
           |               \-00.1
           +-02.0
           +-06.0-[04]--
           +-16.0
           +-1a.0
           +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
           |                               +-05.0-[08]--+-00.0
           |                               |            \-00.1
           |                               +-06.0-[09]--+-00.0
           |                               |            \-00.1
           |                               +-08.0-[0a]--+-00.0
           |                               |            \-00.1
           |                               \-09.0-[0b]--+-00.0
           |                                            \-00.1
           +-1d.0
           +-1e.0-[0c]----00.0
           +-1f.0
           +-1f.2
           \-1f.3

# lspci -vvv -s 02:00.0
02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
	Region 2: I/O ports at e020 [size=32]
	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000 Data: 0000
		Masking: 00000000 Pending: 00000000
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <1us, L1 <8us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [e0] Vital Product Data
		Unknown small resource type 06, will not decode more.
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap: MFVC- ACS-, Next Function: 1
		ARICtl: MFVC- ACS-, Function Group: 0
	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
		IOVCap: Migration-, Interrupt Message Number: 000
		IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
		IOVSta: Migration-
		Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
		VF offset: 384, stride: 2, Device ID: 10ed
		Supported Page Size: 00000553, System Page Size: 00000001
		Region 0: Memory at 00000000dfb00000 (64-bit, prefetchable)
		Region 3: Memory at 00000000dfc00000 (64-bit, prefetchable)
		VF Migration: offset: 00000000, BIR: 0
	Kernel driver in use: ixgbe

# lspci -vvv -s 00:01.1
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: f7e00000-f7efffff
	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee002f8 Data: 0000
	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE+ FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #3, Speed 8GT/s, Width x8, ASPM L0s L1, Latency L0 <256ns, L1 <8us
			ClockPM- Surprise- LLActRep- BwNot+
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #2, PowerLimit 75.000W; Interlock- NoCompl+
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb: Fixed- WRR32- WRR64- WRR128-
		Ctrl: ArbSelect=Fixed
		Status: InProgress-
		VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status: NegoPending- InProgress-
	Capabilities: [140 v1] Root Complex Link
		Desc: PortNumber=03 ComponentID=01 EltType=Config
		Link0: Desc: TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
			Addr: 00000000fed19000
	Capabilities: [d94 v1] #19
	Kernel driver in use: pcieport

The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge 00:01.1. Does the 00:01.1 PCI bridge support ACS?

Thanks,
Zhang Haoyu

>Alex
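The SR-IOV capability in the 02:00.0 dump reports "VF offset: 384, stride: 2". Under the SR-IOV specification, a VF's routing ID is the PF's routing ID plus the First VF Offset plus a multiple of the VF Stride. The sketch below works through that arithmetic to show where the VFs would be expected to appear. The helper name vf_address and its 0-based index are assumptions for illustration (the specification numbers VFs from 1). The dump also shows "Enable-" and "Number of VFs: 0", so no VFs existed when it was taken, and a device is allowed to report a different offset and stride once the number of VFs is set.

```python
def vf_address(pf, first_vf_offset, vf_stride, index):
    """Return "bus:dev.fn" for VF number `index` (0-based) of the PF at `pf`.

    `pf` is a "bus:dev.fn" string as printed by lspci, for example "02:00.0".
    """
    bus_str, devfn_str = pf.split(":")
    dev_str, fn_str = devfn_str.split(".")
    # 16-bit routing ID layout: bus in bits 15:8, device in 7:3, function in 2:0.
    pf_rid = (int(bus_str, 16) << 8) | (int(dev_str, 16) << 3) | int(fn_str, 16)
    rid = pf_rid + first_vf_offset + index * vf_stride
    return "%02x:%02x.%x" % (rid >> 8, (rid >> 3) & 0x1F, rid & 0x7)


if __name__ == "__main__":
    # Values taken from the lspci output in this message.
    for i in range(4):
        print(vf_address("02:00.0", 384, 2, i))
```

With these values the first VF of 02:00.0 lands at 03:10.0, on the bus after the PF. That is consistent with the root port 00:01.1 claiming the bus range [02-03] in the lspci -t output.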
* Re: [questions] about using vfio to assign sr-iov vf to vm
From: Alex Williamson @ 2014-08-16 13:29 UTC
To: Zhang Haoyu; +Cc: qemu-devel, kvm

On Sat, 2014-08-16 at 14:48 +0800, Zhang Haoyu wrote:
> >> Hi, all
> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ...,
> >> so how to only unbind (part of) the VFs but PF?
> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
> >> I think I misunderstand something,
> >> any advice?
> >
> >This occurs when the PF is installed behind components in the system
> >that do not support PCIe Access Control Services (ACS). The IOMMU group
> >contains both the PF and the VF because upstream transactions can be
> >re-routed downstream by these non-ACS components before being translated
> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
> >version and we might be able to give you some advice on how to work
> >around the problem. Thanks,
> >
> # lspci | grep Ether
> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
>
> I want to direct-assign the VFs of intel 82599(02:00.0 or 02:00.1) to VM,
> # lspci -t
> -[0000:00]-+-00.0
>            +-01.0-[01]--
>            +-01.1-[02-03]--+-00.0
>            |               \-00.1
>            +-02.0
>            +-06.0-[04]--
>            +-16.0
>            +-1a.0
>            +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]--
>            |                               +-05.0-[08]--+-00.0
>            |                               |            \-00.1
>            |                               +-06.0-[09]--+-00.0
>            |                               |            \-00.1
>            |                               +-08.0-[0a]--+-00.0
>            |                               |            \-00.1
>            |                               \-09.0-[0b]--+-00.0
>            |                                            \-00.1
>            +-1d.0
>            +-1e.0-[0c]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
>
> # lspci -vvv -s 02:00.0
> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 17
> 	Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
> 	Region 2: I/O ports at e020 [size=32]
> 	Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> 	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> 	Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 	Capabilities: [e0] Vital Product Data
> 	Capabilities: [100 v1] Advanced Error Reporting
> 	Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2
> 	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
> 	Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
> 	Kernel driver in use: ixgbe
>
> # lspci -vvv -s 00:01.1
> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
> 	I/O behind bridge: 0000e000-0000efff
> 	Memory behind bridge: f7e00000-f7efffff
> 	Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
> 	Capabilities: [80] Power Management version 3
> 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 	Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> 	Capabilities: [100 v1] Virtual Channel
> 	Capabilities: [140 v1] Root Complex Link
> 	Capabilities: [d94 v1] #19
> 	Kernel driver in use: pcieport
>
> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
> does 00:01.1 PCI bridge support ACS ?

It does not, and that's exactly the problem. When ACS support is not present, we must assume that the root port can redirect a transaction from a subordinate device back to another subordinate device without IOMMU translation. If you had a device plugged in below 00:01.0, we'd also need to assume that non-IOMMU-translated peer-to-peer between devices behind either function, 00:01.0 or 00:01.1, is possible.

Intel has indicated that processor root ports for all Xeon class processors should support ACS, and has verified isolation for PCH-based root ports, allowing us to support quirks in place of ACS support. I'm not aware of any efforts at Intel to verify isolation capabilities of root ports on client processors. They are, however, aware that lack of ACS is a limiting factor for usability of VT-d, and I hope that we'll see future products with ACS support.

Chances are good that the PCH root port at 00:1c.0 is supported by an ACS quirk, but it seems that your system has a PCIe switch below the root port. If the PCIe switch downstream ports support ACS, then you may be able to move the 82599 to the empty slot at bus 07 to separate the VFs into different IOMMU groups.

Thanks,
Alex
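Whether a port advertises ACS can be read from the capability list that 'lspci -vvv' prints: a port with the capability has an "Access Control Services" entry among its extended capabilities. The sketch below is a rough text check along those lines. The helper name advertises_acs is hypothetical, and the sample line with ACS present is invented for illustration rather than taken from this system. It deliberately ignores the "ACS-" flags inside the ARI capability, which are not the ACS capability itself, and it knows nothing about kernel quirks such as the one mentioned for PCH root ports, so the IOMMU group layout reported by the kernel remains the authoritative answer.

```python
import re

# Matches a capability header such as "Capabilities: [148 v1] Access Control Services".
_ACS_CAPABILITY = re.compile(
    r"^\s*Capabilities:\s*\[[0-9a-f]+(?: v\d+)?\]\s*Access Control Services",
    re.MULTILINE,
)


def advertises_acs(lspci_text):
    """Return True if the given 'lspci -vvv' text lists an ACS capability."""
    return bool(_ACS_CAPABILITY.search(lspci_text))


# Capability headers of the root port 00:01.1, as posted in this thread.
ROOT_PORT_00_01_1 = """\
\tCapabilities: [80] Power Management version 3
\tCapabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
\tCapabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
\tCapabilities: [100 v1] Virtual Channel
\tCapabilities: [140 v1] Root Complex Link
\tCapabilities: [d94 v1] #19
"""

if __name__ == "__main__":
    print("00:01.1 advertises ACS:", advertises_acs(ROOT_PORT_00_01_1))
```

The same check could be run on the 'lspci -vvv' output of the switch downstream ports (06:04.0 and its siblings) before moving the 82599 to the slot at bus 07.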
* Re: [questions] about using vfio to assign sr-iov vf to vm 2014-08-16 13:29 ` [Qemu-devel] " Alex Williamson @ 2014-08-18 1:00 ` Zhang Haoyu -1 siblings, 0 replies; 18+ messages in thread From: Zhang Haoyu @ 2014-08-18 1:00 UTC (permalink / raw) To: Alex Williamson; +Cc: qemu-devel, kvm >> >> Hi, all >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem, >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ..., >> >> so how to only unbind (part of) the VFs but PF? >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group? >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also diappeared. >> >> I think I misunderstand someting, >> >> any advises? >> > >> >This occurs when the PF is installed behind components in the system >> >that do not support PCIe Access Control Services (ACS). The IOMMU group >> >contains both the PF and the VF because upstream transactions can be >> >re-routed downstream by these non-ACS components before being translated >> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel >> >version and we might be able to give you some advise on how to work >> >around the problem. 
Thanks, >> > >> # lspci | grep Ether >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) >> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL-8110SC/8169SC Gigabit Ethernet (rev 10) >> >> I want to direct-assign the VFs of intel 82599(02:00.0 or 02:00.1) to VM, >> # lspci -t >> -[0000:00]-+-00.0 >> +-01.0-[01]-- >> +-01.1-[02-03]--+-00.0 >> | \-00.1 >> +-02.0 >> +-06.0-[04]-- >> +-16.0 >> +-1a.0 >> +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]-- >> | +-05.0-[08]--+-00.0 >> | | \-00.1 >> | +-06.0-[09]--+-00.0 >> | | \-00.1 >> | +-08.0-[0a]--+-00.0 >> | | \-00.1 >> | \-09.0-[0b]--+-00.0 >> | \-00.1 >> +-1d.0 >> +-1e.0-[0c]----00.0 >> +-1f.0 >> +-1f.2 >> \-1f.3 >> >> lspci -vvv -s 02.00.0 >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Latency: 0, Cache Line Size: 64 bytes >> Interrupt: pin A routed to IRQ 17 >> Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K] >> Region 2: I/O ports at e020 [size=32] >> Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K] >> Capabilities: [40] Power Management version 3 >> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ >> Capabilities: [70] MSI-X: Enable+ Count=64 Masked- >> Capabilities: [a0] Express (v2) Endpoint, MSI 00 >> Capabilities: [e0] Vital Product Data >> Capabilities: [100 v1] Advanced Error Reporting >> Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2 >> Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI) >> Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV) >> Kernel driver in use: ixgbe >> >> # lspci -vvv -s 00:01.1 >> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode]) >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- 
DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- >> Latency: 0, Cache Line Size: 64 bytes >> Bus: primary=00, secondary=02, subordinate=03, sec-latency=0 >> I/O behind bridge: 0000e000-0000efff >> Memory behind bridge: f7e00000-f7efffff >> Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff >> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- >> BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B- >> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- >> Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port >> Capabilities: [80] Power Management version 3 >> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- >> Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00 >> Capabilities: [100 v1] Virtual Channel >> Capabilities: [140 v1] Root Complex Link >> Capabilities: [d94 v1] #19 >> Kernel driver in use: pcieport >> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1), >> does 00:01.1 PCI bridge support ACS ? > >It does not and that's exactly the problem. We must assume that the >root port can redirect a transaction from a subordinate device back to >another subordinate device without IOMMU translation when ACS support is >not present. If you had a device plugged in below 00:01.0, we'd also >need to assume that non-IOMMU translated peer-to-peer between devices >behind either function, 00:01.0 or 00:01.1, is possible. > >Intel has indicated that processor root ports for all Xeon class >processors should support ACS and have verified isolation for PCH based >root ports allowing us to support quirks in place of ACS support. I'm >not aware of any efforts at Intel to verify isolation capabilities of >root ports on client processors. They are however aware that lack of >ACS is a limiting factor for usability of VT-d, and I hope that we'll >see future products with ACS support. 
>
>Chances are good that the PCH root port at 00:1c.0 is supported by an
>ACS quirk, but it seems that your system has a PCIe switch below the
>root port.  If the PCIe switch downstream ports support ACS, then you
>may be able to move the 82599 to the empty slot at bus 07 to separate
>the VFs into different IOMMU groups.  Thanks,
>
Thanks, Alex,
How can I tell whether a PCI bridge/device supports the ACS capability?

I ran "lspci -vvv -s | grep -i ACS", but nothing matched.
# lspci -vvv -s 00:1c.0
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Bus: primary=00, secondary=05, subordinate=0b, sec-latency=0
    I/O behind bridge: 00002000-00003fff
    Memory behind bridge: f7800000-f7cfffff
    Prefetchable memory behind bridge: 00000000f0000000-00000000f03fffff
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+ ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Address: 00000000  Data: 0000
    Capabilities: [90] Subsystem: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1
    Capabilities: [a0] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Kernel driver in use: pcieport

Thanks,
Zhang Haoyu

>Alex
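When a device does implement ACS, lspci lists it as an extended capability named "Access Control Services". The check can be scripted over saved `lspci -vvv` output; this is a rough Python sketch, and has_acs_capability is a hypothetical helper name. It assumes the output was captured as root, since extended capabilities are generally not readable otherwise. A negative result only says the device advertises no ACS capability; it does not say how the kernel will group it.

```python
import re

# Matches capability lines such as
#   "Capabilities: [148 v1] Access Control Services"
ACS_CAPABILITY = re.compile(
    r"Capabilities: \[[0-9a-f]+(?: v\d+)?\] Access Control Services"
)


def has_acs_capability(lspci_text):
    """Return True if the given `lspci -vvv` text lists an ACS capability."""
    return ACS_CAPABILITY.search(lspci_text) is not None
```

Fed the 00:1c.0 dump above, the function returns False, which matches the empty grep result reported in the message.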
* Re: [questions] about using vfio to assign sr-iov vf to vm 2014-08-18 1:00 ` [Qemu-devel] " Zhang Haoyu @ 2014-08-18 1:14 ` Alex Williamson -1 siblings, 0 replies; 18+ messages in thread From: Alex Williamson @ 2014-08-18 1:14 UTC (permalink / raw) To: Zhang Haoyu; +Cc: qemu-devel, kvm On Mon, 2014-08-18 at 09:00 +0800, Zhang Haoyu wrote: > >> >> Hi, all > >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem, > >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM, ..., > >> >> so how to only unbind (part of) the VFs but PF? > >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group? > >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also diappeared. > >> >> I think I misunderstand someting, > >> >> any advises? > >> > > >> >This occurs when the PF is installed behind components in the system > >> >that do not support PCIe Access Control Services (ACS). The IOMMU group > >> >contains both the PF and the VF because upstream transactions can be > >> >re-routed downstream by these non-ACS components before being translated > >> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel > >> >version and we might be able to give you some advise on how to work > >> >around the problem. 
Thanks, > >> > > >> # lspci | grep Ether > >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) > >> 02:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) > >> 08:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 08:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 09:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 09:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 0b:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 0b:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) > >> 0c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL-8110SC/8169SC Gigabit Ethernet (rev 10) > >> > >> I want to direct-assign the VFs of intel 82599(02:00.0 or 02:00.1) to VM, > >> # lspci -t > >> -[0000:00]-+-00.0 > >> +-01.0-[01]-- > >> +-01.1-[02-03]--+-00.0 > >> | \-00.1 > >> +-02.0 > >> +-06.0-[04]-- > >> +-16.0 > >> +-1a.0 > >> +-1c.0-[05-0b]----00.0-[06-0b]--+-04.0-[07]-- > >> | +-05.0-[08]--+-00.0 > >> | | \-00.1 > >> | +-06.0-[09]--+-00.0 > >> | | \-00.1 > >> | +-08.0-[0a]--+-00.0 > >> | | \-00.1 > >> | \-09.0-[0b]--+-00.0 > >> | \-00.1 > >> +-1d.0 > >> +-1e.0-[0c]----00.0 > >> +-1f.0 > >> +-1f.2 > >> \-1f.3 > >> > >> lspci -vvv -s 02.00.0 > >> 02:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) > >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > >> Latency: 0, Cache Line Size: 64 bytes > >> Interrupt: pin A routed to IRQ 17 > >> Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K] > >> Region 2: I/O ports at e020 [size=32] > >> Region 4: Memory at f7e44000 (64-bit, non-prefetchable) [size=16K] > >> Capabilities: [40] Power Management version 3 > >> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ > >> Capabilities: [70] MSI-X: Enable+ Count=64 Masked- > >> Capabilities: [a0] Express (v2) Endpoint, MSI 00 > >> Capabilities: [e0] Vital Product Data > >> Capabilities: [100 v1] Advanced Error Reporting > >> Capabilities: [140 v1] Device Serial Number 00-90-0b-ff-ff-29-33-c2 > >> Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI) > >> Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV) > >> Kernel driver in use: ixgbe > >> > >> # lspci -vvv -s 00:01.1 > >> 00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode]) > >> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- 
VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >> Latency: 0, Cache Line Size: 64 bytes
> >> Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
> >> I/O behind bridge: 0000e000-0000efff
> >> Memory behind bridge: f7e00000-f7efffff
> >> Prefetchable memory behind bridge: 00000000dfb00000-00000000dfefffff
> >> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> >> BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> >> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >> Capabilities: [88] Subsystem: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port
> >> Capabilities: [80] Power Management version 3
> >> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> >> Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >> Capabilities: [100 v1] Virtual Channel
> >> Capabilities: [140 v1] Root Complex Link
> >> Capabilities: [d94 v1] #19
> >> Kernel driver in use: pcieport
> >>
> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
> >> does 00:01.1 PCI bridge support ACS ?
> >
> >It does not and that's exactly the problem. We must assume that the
> >root port can redirect a transaction from a subordinate device back to
> >another subordinate device without IOMMU translation when ACS support is
> >not present. If you had a device plugged in below 00:01.0, we'd also
> >need to assume that non-IOMMU translated peer-to-peer between devices
> >behind either function, 00:01.0 or 00:01.1, is possible.
> >
> >Intel has indicated that processor root ports for all Xeon class
> >processors should support ACS and have verified isolation for PCH based
> >root ports allowing us to support quirks in place of ACS support. I'm
> >not aware of any efforts at Intel to verify isolation capabilities of
> >root ports on client processors. They are however aware that lack of
> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >see future products with ACS support.
> >
> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >ACS quirk, but it seems that your system has a PCIe switch below the
> >root port. If the PCIe switch downstream ports support ACS, then you
> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >the VFs into different IOMMU groups. Thanks,
> >
> Thanks, Alex,
> how to tell whether a PCI bridge/device supports the ACS capability?
>
> I perform "lspci -vvv -s | grep -i ACS", nothing matched.
> # lspci -vvv -s 00:1c.0
> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])

Ideally there would be capabilities for it, something like:

Capabilities [xxx] Access Control Services...

But, Intel failed to provide this, so we enable "effective" ACS
capabilities via a quirk:

drivers/pci/quirks.c:
/*
 * Many Intel PCH root ports do provide ACS-like features to disable peer
 * transactions and validate bus numbers in requests, but do not provide an
 * actual PCIe ACS capability. This is the list of device IDs known to fall
 * into that category as provided by Intel in Red Hat bugzilla 1037684.
 */
static const u16 pci_quirk_intel_pch_acs_ids[] = {
	/* Ibexpeak PCH */
	0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
	0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
	/* Cougarpoint PCH */
	0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
	0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
	/* Pantherpoint PCH */
	0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
	0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
	/* Lynxpoint-H PCH */
	0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
	0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
	/* Lynxpoint-LP PCH */
	0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
	0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
	/* Wildcat PCH */
	0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
	0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
	/* Patsburg (X79) PCH */
	0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};

Hopefully if you run 'lspci -n', you'll see your device ID listed among
these. We don't currently have any quirks for PCIe switches, so if your
IOMMU group is still bigger than it should be, that may be the reason.
Thanks,

Alex
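
The lookup Alex describes, taking the vendor:device ID reported by 'lspci -n' and checking it against the quirk table, can be sketched in shell. This is an illustration only: the helper name pch_acs_quirk_listed and the variable PCH_ACS_QUIRK_IDS are hypothetical, and the ID list is copied from the table quoted in this message, so it may not match the table in any particular kernel version.

```shell
#!/usr/bin/env bash
# Sketch only. The IDs below are copied from the pci_quirk_intel_pch_acs_ids[]
# table quoted in the thread; they are not read from the running kernel.
PCH_ACS_QUIRK_IDS="
3b42 3b43 3b44 3b45 3b46 3b47 3b48 3b49 3b4a 3b4b 3b4c 3b4d 3b4e 3b4f 3b50 3b51
1c10 1c11 1c12 1c13 1c14 1c15 1c16 1c17 1c18 1c19 1c1a 1c1b 1c1c 1c1d 1c1e 1c1f
1e10 1e11 1e12 1e13 1e14 1e15 1e16 1e17 1e18 1e19 1e1a 1e1b 1e1c 1e1d 1e1e 1e1f
8c10 8c11 8c12 8c13 8c14 8c15 8c16 8c17 8c18 8c19 8c1a 8c1b 8c1c 8c1d 8c1e 8c1f
9c10 9c11 9c12 9c13 9c14 9c15 9c16 9c17 9c18 9c19 9c1a 9c1b
9c90 9c91 9c92 9c93 9c94 9c95 9c96 9c97 9c98 9c99 9c9a 9c9b
1d10 1d12 1d14 1d16 1d18 1d1a 1d1c 1d1e
"

# pch_acs_quirk_listed VENDOR:DEVICE   (hypothetical helper)
# Exit status 0 when the vendor is Intel (8086) and the device ID appears
# in the list above; exit status 1 otherwise.
pch_acs_quirk_listed() {
    local vendor device
    # Split "vendor:device" and normalise hex digits to lower case.
    vendor=$(printf '%s' "${1%%:*}" | tr 'A-F' 'a-f')
    device=$(printf '%s' "${1##*:}" | tr 'A-F' 'a-f')
    [ "$vendor" = "8086" ] || return 1
    # The unquoted echo collapses the list to single-space separated words,
    # so a space-delimited substring match is an exact ID match.
    case " $(echo $PCH_ACS_QUIRK_IDS) " in
        *" $device "*) return 0 ;;
    esac
    return 1
}
```

Assuming the usual 'lspci -n' layout of slot, class, and vendor:device fields, something like `pch_acs_quirk_listed "$(lspci -n -s 00:1c.0 | awk '{print $3}')"` would feed it the ID of the root port discussed above. A match only says the ID is in this list; the IOMMU group can still be larger than expected because of other components, such as the PCIe switch mentioned in the reply.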
* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18 1:14 ` [Qemu-devel] " Alex Williamson
@ 2014-08-18 8:46 ` Zhang Haoyu
From: Zhang Haoyu @ 2014-08-18 8:46 UTC
To: Alex Williamson; +Cc: qemu-devel, kvm, bhelgaas, donald.d.dugger

>> >> >> Hi, all
>> >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
>> >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM,
>> >> >> so how to only unbind (part of) the VFs but PF?
>> >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
>> >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
>> >> >> I think I misunderstand something,
>> >> >> any advice?
>> >> >
>> >> >This occurs when the PF is installed behind components in the system
>> >> >that do not support PCIe Access Control Services (ACS). The IOMMU group
>> >> >contains both the PF and the VF because upstream transactions can be
>> >> >re-routed downstream by these non-ACS components before being translated
>> >> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>> >> >version and we might be able to give you some advice on how to work
>> >> >around the problem. Thanks,
>> >> >
>> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
>> >> does 00:01.1 PCI bridge support ACS ?
>> >
>> >It does not and that's exactly the problem. We must assume that the
>> >root port can redirect a transaction from a subordinate device back to
>> >another subordinate device without IOMMU translation when ACS support is
>> >not present. If you had a device plugged in below 00:01.0, we'd also
>> >need to assume that non-IOMMU translated peer-to-peer between devices
>> >behind either function, 00:01.0 or 00:01.1, is possible.
>> >
>> >Intel has indicated that processor root ports for all Xeon class
>> >processors should support ACS and have verified isolation for PCH based
>> >root ports allowing us to support quirks in place of ACS support. I'm
>> >not aware of any efforts at Intel to verify isolation capabilities of
>> >root ports on client processors. They are however aware that lack of
>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
>> >see future products with ACS support.
>> >
>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
>> >ACS quirk, but it seems that your system has a PCIe switch below the
>> >root port. If the PCIe switch downstream ports support ACS, then you
>> >may be able to move the 82599 to the empty slot at bus 07 to separate
>> >the VFs into different IOMMU groups. Thanks,
>> >
>> Thanks, Alex,
>> how to tell whether a PCI bridge/device supports the ACS capability?
>>
>> I perform "lspci -vvv -s | grep -i ACS", nothing matched.
>> # lspci -vvv -s 00:1c.0
>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
>
>
>Ideally there would be capabilities for it, something like:
>
>Capabilities [xxx] Access Control Services...
>
>But, Intel failed to provide this, so we enable "effective" ACS
>capabilities via a quirk:
>
>drivers/pci/quirks.c:
>/*
> * Many Intel PCH root ports do provide ACS-like features to disable peer
> * transactions and validate bus numbers in requests, but do not provide an
> * actual PCIe ACS capability. This is the list of device IDs known to fall
> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> */
>static const u16 pci_quirk_intel_pch_acs_ids[] = {
>	/* Ibexpeak PCH */
>	0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
>	0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
>	/* Cougarpoint PCH */
>	0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
>	0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
>	/* Pantherpoint PCH */
>	0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
>	0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
>	/* Lynxpoint-H PCH */
>	0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
>	0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
>	/* Lynxpoint-LP PCH */
>	0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
>	0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
>	/* Wildcat PCH */
>	0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
>	0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
>	/* Patsburg (X79) PCH */
>	0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
>};
>
>Hopefully if you run 'lspci -n', you'll see your device ID listed among
>these. We don't currently have any quirks for PCIe switches, so if your
>IOMMU group is still bigger than it should be, that may be the reason.
>Thanks,
>
Using device-specific mechanisms to enable and verify ACS-like capabilities is okay,
but what should we do about devices that do not support ACS-like capabilities at all?
How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities"?
And how can we reduce the risk of data corruption and information leakage between VMs?

Thanks,
Zhang Haoyu
>Alex
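
The isolation question above depends on which devices end up sharing an IOMMU group, since the group is the unit that has to be handed to a guest as a whole. As a rough sketch, the grouping can be read from sysfs. The function names list_iommu_groups and group_of_device are hypothetical, and the code assumes the standard /sys/kernel/iommu_groups/<N>/devices/<address> layout; the optional ROOT argument exists only so the functions can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical helpers, not part of any tool.

# list_iommu_groups [ROOT]
# Print one line per IOMMU group in the form "group N: dev dev ...".
list_iommu_groups() {
    local root=${1:-/sys/kernel/iommu_groups} group dev line
    for group in "$root"/*; do
        # Skip anything that is not a group directory (also covers an
        # unexpanded glob when ROOT is empty or missing).
        [ -d "$group/devices" ] || continue
        line="group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            line="$line ${dev##*/}"
        done
        echo "$line"
    done
}

# group_of_device ADDRESS [ROOT]
# Print the number of the group containing ADDRESS (e.g. 0000:02:00.0).
# Exit status 1 if the device is not found in any group.
group_of_device() {
    local root=${2:-/sys/kernel/iommu_groups} path
    for path in "$root"/*/devices/"$1"; do
        [ -e "$path" ] || continue
        path=${path%/devices/*}   # strip "/devices/ADDRESS"
        echo "${path##*/}"        # keep only the group number
        return 0
    done
    return 1
}
```

On the system described in this thread, running list_iommu_groups would be expected to show the 82599 PF and its VFs on one line, which is the symptom under discussion. Any override that splits such a group changes what the kernel reports, not what the hardware can actually isolate.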
* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18 8:46 ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-18 9:49 ` Zhang Haoyu
From: Zhang Haoyu @ 2014-08-18 9:49 UTC
To: Zhang Haoyu, Alex Williamson
Cc: bhelgaas, xudong.hao, donald.d.dugger, qemu-devel, kvm

>>> >> >> Hi, all
>>> >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
>>> >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM,
>>> >> >> so how to only unbind (part of) the VFs but PF?
>>> >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
>>> >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
>>> >> >> I think I misunderstand something,
>>> >> >> any advice?
>>> >> >
>>> >> >This occurs when the PF is installed behind components in the system
>>> >> >that do not support PCIe Access Control Services (ACS). The IOMMU group
>>> >> >contains both the PF and the VF because upstream transactions can be
>>> >> >re-routed downstream by these non-ACS components before being translated
>>> >> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>>> >> >version and we might be able to give you some advice on how to work
>>> >> >around the problem. Thanks,
>>> >> >
>>> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
>>> >> does 00:01.1 PCI bridge support ACS ?
>>> >
>>> >It does not and that's exactly the problem. We must assume that the
>>> >root port can redirect a transaction from a subordinate device back to
>>> >another subordinate device without IOMMU translation when ACS support is
>>> >not present. If you had a device plugged in below 00:01.0, we'd also
>>> >need to assume that non-IOMMU translated peer-to-peer between devices
>>> >behind either function, 00:01.0 or 00:01.1, is possible.
>>> >
>>> >Intel has indicated that processor root ports for all Xeon class
>>> >processors should support ACS and have verified isolation for PCH based
>>> >root ports allowing us to support quirks in place of ACS support. I'm
>>> >not aware of any efforts at Intel to verify isolation capabilities of
>>> >root ports on client processors. They are however aware that lack of
>>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
>>> >see future products with ACS support.
>>> >
>>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
>>> >ACS quirk, but it seems that your system has a PCIe switch below the
>>> >root port. If the PCIe switch downstream ports support ACS, then you
>>> >may be able to move the 82599 to the empty slot at bus 07 to separate
>>> >the VFs into different IOMMU groups. Thanks,
>>> >
>>> Thanks, Alex,
>>> how to tell whether a PCI bridge/device supports the ACS capability?
>>>
>>> I perform "lspci -vvv -s | grep -i ACS", nothing matched.
>>> # lspci -vvv -s 00:1c.0
>>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
>>
>>
>>Ideally there would be capabilities for it, something like:
>>
>>Capabilities [xxx] Access Control Services...
>>
>>But, Intel failed to provide this, so we enable "effective" ACS
>>capabilities via a quirk:
>>
>>drivers/pci/quirks.c:
>>/*
>> * Many Intel PCH root ports do provide ACS-like features to disable peer
>> * transactions and validate bus numbers in requests, but do not provide an
>> * actual PCIe ACS capability. This is the list of device IDs known to fall
>> * into that category as provided by Intel in Red Hat bugzilla 1037684.
>> */
>>static const u16 pci_quirk_intel_pch_acs_ids[] = {
>>	/* Ibexpeak PCH */
>>	0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
>>	0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
>>	/* Cougarpoint PCH */
>>	0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
>>	0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
>>	/* Pantherpoint PCH */
>>	0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
>>	0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
>>	/* Lynxpoint-H PCH */
>>	0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
>>	0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
>>	/* Lynxpoint-LP PCH */
>>	0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
>>	0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
>>	/* Wildcat PCH */
>>	0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
>>	0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
>>	/* Patsburg (X79) PCH */
>>	0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
>>};
>>
>>Hopefully if you run 'lspci -n', you'll see your device ID listed among
>>these. We don't currently have any quirks for PCIe switches, so if your
>>IOMMU group is still bigger than it should be, that may be the reason.
>>Thanks,
>>
>Using device-specific mechanisms to enable and verify ACS-like capabilities is okay,
>but what should we do about devices that do not support ACS-like capabilities at all?
>How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities"?
>And how can we reduce the risk of data corruption and information leakage between VMs?
>
Has there been any update since http://thread.gmane.org/gmane.comp.emulators.kvm.devel/110726/focus=111515 ?

>Thanks,
>Zhang Haoyu
>>Alex
* Re: [questions] about using vfio to assign sr-iov vf to vm
  2014-08-18  9:49 ` [Qemu-devel] " Zhang Haoyu
@ 2014-08-18 12:53   ` Alex Williamson
From: Alex Williamson @ 2014-08-18 12:53 UTC
To: Zhang Haoyu; +Cc: qemu-devel, kvm, bhelgaas, donald.d.dugger, xudong.hao

On Mon, 2014-08-18 at 17:49 +0800, Zhang Haoyu wrote:
> >>> >> >> Hi, all
> >>> >> >> I'm using VFIO to assign intel 82599 VF to VM, now I encounter a problem,
> >>> >> >> 82599 PF and its VFs belong to the same iommu_group, but I only want to assign some VFs to one VM, and some other VFs to another VM,
> >>> >> >> so how to only unbind (part of) the VFs but PF?
> >>> >> >> I read the kernel doc vfio.txt, I'm not sure should I unbind all of the devices which belong to one iommu_group?
> >>> >> >> If so, because PF and its VFs belong to the same iommu_group, if I unbind the PF, its VFs also disappeared.
> >>> >> >> I think I misunderstand something,
> >>> >> >> any advice?
> >>> >> >
> >>> >> >This occurs when the PF is installed behind components in the system
> >>> >> >that do not support PCIe Access Control Services (ACS). The IOMMU group
> >>> >> >contains both the PF and the VF because upstream transactions can be
> >>> >> >re-routed downstream by these non-ACS components before being translated
> >>> >> >by the IOMMU. Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
> >>> >> >version and we might be able to give you some advice on how to work
> >>> >> >around the problem. Thanks,
> >>> >> >
> >>> >> The intel 82599(02:00.0 or 02:00.1) is behind the pci bridge (00:01.1),
> >>> >> does 00:01.1 PCI bridge support ACS ?
> >>> >
> >>> >It does not and that's exactly the problem. We must assume that the
> >>> >root port can redirect a transaction from a subordinate device back to
> >>> >another subordinate device without IOMMU translation when ACS support is
> >>> >not present. If you had a device plugged in below 00:01.0, we'd also
> >>> >need to assume that non-IOMMU translated peer-to-peer between devices
> >>> >behind either function, 00:01.0 or 00:01.1, is possible.
> >>> >
> >>> >Intel has indicated that processor root ports for all Xeon class
> >>> >processors should support ACS and have verified isolation for PCH based
> >>> >root ports allowing us to support quirks in place of ACS support. I'm
> >>> >not aware of any efforts at Intel to verify isolation capabilities of
> >>> >root ports on client processors. They are however aware that lack of
> >>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
> >>> >see future products with ACS support.
> >>> >
> >>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
> >>> >ACS quirk, but it seems that your system has a PCIe switch below the
> >>> >root port. If the PCIe switch downstream ports support ACS, then you
> >>> >may be able to move the 82599 to the empty slot at bus 07 to separate
> >>> >the VFs into different IOMMU groups. Thanks,
> >>> >
> >>> Thanks, Alex,
> >>> how to tell whether a PCI bridge/device supports ACS capability?
> >>>
> >>> I perform "lspci -vvv -s | grep -i ACS", nothing matched.
> >>> # lspci -vvv -s 00:1c.0
> >>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
> >>
> >>
> >>Ideally there would be capabilities for it, something like:
> >>
> >>Capabilities [xxx] Access Control Services...
> >>
> >>But, Intel failed to provide this, so we enable "effective" ACS
> >>capabilities via a quirk:
> >>
> >>drivers/pci/quirks.c:
> >>/*
> >> * Many Intel PCH root ports do provide ACS-like features to disable peer
> >> * transactions and validate bus numbers in requests, but do not provide an
> >> * actual PCIe ACS capability. This is the list of device IDs known to fall
> >> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> >> */
> >>static const u16 pci_quirk_intel_pch_acs_ids[] = {
> >>    /* Ibexpeak PCH */
> >>    0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
> >>    0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
> >>    /* Cougarpoint PCH */
> >>    0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
> >>    0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
> >>    /* Pantherpoint PCH */
> >>    0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
> >>    0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
> >>    /* Lynxpoint-H PCH */
> >>    0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
> >>    0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
> >>    /* Lynxpoint-LP PCH */
> >>    0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
> >>    0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
> >>    /* Wildcat PCH */
> >>    0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
> >>    0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
> >>    /* Patsburg (X79) PCH */
> >>    0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
> >>};
> >>
> >>Hopefully if you run 'lspci -n', you'll see your device ID listed among
> >>these. We don't currently have any quirks for PCIe switches, so if your
> >>IOMMU group is still bigger than it should be, that may be the reason.
> >>Thanks,
> >>
> >Using device specific mechanisms to enable and verify ACS-like capability is okay,
> >but with regard to those devices which completely don't support ACS-like capabilities,
> >what shall we do, how about applying the [PATCH] pci: Enable overrides for missing ACS capabilities,
> >and how to reduce the risk of data corruption and info leakage between VMs?
> >
> Any update compared with http://thread.gmane.org/gmane.comp.emulators.kvm.devel/110726/focus=111515 ?

You're welcome to use that patch, but you do so at your own risk.
Upstream is not willing to accept that patch because incorrectly
assuming isolation can lead to subtle issues which are difficult or
impossible to support. This is why we only support ACS-reported
isolation or vendor-verified quirks. If you're able to work with the
component vendor to verify isolation, we'd welcome more quirks in this
area. Thanks,

Alex
end of thread (newest message: ~2014-08-18 12:54 UTC)

Thread overview: 18+ messages
2014-08-14  8:22 [questions] about using vfio to assign sr-iov vf to vm Zhang Haoyu
2014-08-14  8:22 ` [Qemu-devel] " Zhang Haoyu
2014-08-14 12:44 ` Alex Williamson
2014-08-14 12:44 ` [Qemu-devel] " Alex Williamson
2014-08-16  6:48 ` Zhang Haoyu
2014-08-16  6:48 ` [Qemu-devel] " Zhang Haoyu
2014-08-16 13:29 ` Alex Williamson
2014-08-16 13:29 ` [Qemu-devel] " Alex Williamson
2014-08-18  1:00 ` Zhang Haoyu
2014-08-18  1:00 ` [Qemu-devel] " Zhang Haoyu
2014-08-18  1:14 ` Alex Williamson
2014-08-18  1:14 ` [Qemu-devel] " Alex Williamson
2014-08-18  8:46 ` Zhang Haoyu
2014-08-18  8:46 ` [Qemu-devel] " Zhang Haoyu
2014-08-18  9:49 ` Zhang Haoyu
2014-08-18  9:49 ` [Qemu-devel] " Zhang Haoyu
2014-08-18 12:53 ` Alex Williamson
2014-08-18 12:53 ` [Qemu-devel] " Alex Williamson