From: "Zhang Haoyu" <zhanghy@sangfor.com>
To: "Alex Williamson" <alex.williamson@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>,
	bhelgaas <bhelgaas@google.com>,
	"donald.d.dugger" <donald.d.dugger@intel.com>
Subject: Re: [questions] about using vfio to assign sr-iov vf to vm
Date: Mon, 18 Aug 2014 16:46:48 +0800
Message-ID: <201408181646457989269@sangfor.com>
In-Reply-To: <1408324462.9800.342.camel@ul30vt.home>

>> >> >> Hi, all
>> >> >> I'm using VFIO to assign an Intel 82599 VF to a VM, and I have run into a problem:
>> >> >> the 82599 PF and its VFs belong to the same iommu_group, but I want to assign some VFs to one VM and other VFs to another VM.
>> >> >> How can I unbind only (some of) the VFs but not the PF?
>> >> >> I read the kernel doc vfio.txt, and I'm not sure whether I should unbind all of the devices that belong to one iommu_group.
>> >> >> If so, because the PF and its VFs belong to the same iommu_group, unbinding the PF makes its VFs disappear as well.
>> >> >> I think I am misunderstanding something.
>> >> >> Any advice?
>> >> >
>> >> >This occurs when the PF is installed behind components in the system
>> >> >that do not support PCIe Access Control Services (ACS).  The IOMMU group
>> >> >contains both the PF and the VF because upstream transactions can be
>> >> >re-routed downstream by these non-ACS components before being translated
>> >> >by the IOMMU.  Please provide 'sudo lspci -vvv', 'lspci -n', and kernel
>> >> >version and we might be able to give you some advice on how to work
>> >> >around the problem.  Thanks,
>> >> >
>> >> The Intel 82599 (02:00.0 or 02:00.1) is behind the PCI bridge (00:01.1).
>> >> Does the 00:01.1 PCI bridge support ACS?
>> >
>> >It does not and that's exactly the problem.  We must assume that the
>> >root port can redirect a transaction from a subordinate device back to
>> >another subordinate device without IOMMU translation when ACS support is
>> >not present.  If you had a device plugged in below 00:01.0, we'd also
>> >need to assume that non-IOMMU translated peer-to-peer between devices
>> >behind either function, 00:01.0 or 00:01.1, is possible.
>> >
>> >Intel has indicated that processor root ports for all Xeon class
>> >processors should support ACS and have verified isolation for PCH based
>> >root ports allowing us to support quirks in place of ACS support.  I'm
>> >not aware of any efforts at Intel to verify isolation capabilities of
>> >root ports on client processors.  They are however aware that lack of
>> >ACS is a limiting factor for usability of VT-d, and I hope that we'll
>> >see future products with ACS support.
>> >
>> >Chances are good that the PCH root port at 00:1c.0 is supported by an
>> >ACS quirk, but it seems that your system has a PCIe switch below the
>> >root port.  If the PCIe switch downstream ports support ACS, then you
>> >may be able to move the 82599 to the empty slot at bus 07 to separate
>> >the VFs into different IOMMU groups.  Thanks,
>> >
>> Thanks, Alex.
>> How can I tell whether a PCI bridge/device supports the ACS capability?
>> 
>> I ran "lspci -vvv -s | grep -i ACS", but nothing matched.
>> # lspci -vvv -s 00:1c.0
>> 00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
>
>
>Ideally there would be capabilities for it, something like:
>
>Capabilities [xxx] Access Control Services...
>
>But, Intel failed to provide this, so we enable "effective" ACS
>capabilities via a quirk:
>
>drivers/pci/quirks.c:
>/*
> * Many Intel PCH root ports do provide ACS-like features to disable peer
> * transactions and validate bus numbers in requests, but do not provide an
> * actual PCIe ACS capability.  This is the list of device IDs known to fall
> * into that category as provided by Intel in Red Hat bugzilla 1037684.
> */
>static const u16 pci_quirk_intel_pch_acs_ids[] = {
>        /* Ibexpeak PCH */
>        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
>        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
>        /* Cougarpoint PCH */
>        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
>        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
>        /* Pantherpoint PCH */
>        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
>        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
>        /* Lynxpoint-H PCH */
>        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
>        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
>        /* Lynxpoint-LP PCH */
>        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
>        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
>        /* Wildcat PCH */
>        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
>        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
>        /* Patsburg (X79) PCH */
>        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
>};
>
>Hopefully if you run 'lspci -n', you'll see your device ID listed among
>these.  We don't currently have any quirks for PCIe switches, so if your
>IOMMU group is still bigger than it should be, that may be the reason.
>Thanks,
>
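The lookup described above (find the device ID with 'lspci -n', then see whether it is in the list) can be sketched in userspace C. This is only an illustration, not the kernel's code: the array is copied from the quoted pci_quirk_intel_pch_acs_ids[] list, and the helper name intel_pch_acs_id_listed() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device IDs copied from the pci_quirk_intel_pch_acs_ids[] list quoted above. */
static const uint16_t intel_pch_acs_ids[] = {
        /* Ibexpeak PCH */
        0x3b42, 0x3b43, 0x3b44, 0x3b45, 0x3b46, 0x3b47, 0x3b48, 0x3b49,
        0x3b4a, 0x3b4b, 0x3b4c, 0x3b4d, 0x3b4e, 0x3b4f, 0x3b50, 0x3b51,
        /* Cougarpoint PCH */
        0x1c10, 0x1c11, 0x1c12, 0x1c13, 0x1c14, 0x1c15, 0x1c16, 0x1c17,
        0x1c18, 0x1c19, 0x1c1a, 0x1c1b, 0x1c1c, 0x1c1d, 0x1c1e, 0x1c1f,
        /* Pantherpoint PCH */
        0x1e10, 0x1e11, 0x1e12, 0x1e13, 0x1e14, 0x1e15, 0x1e16, 0x1e17,
        0x1e18, 0x1e19, 0x1e1a, 0x1e1b, 0x1e1c, 0x1e1d, 0x1e1e, 0x1e1f,
        /* Lynxpoint-H PCH */
        0x8c10, 0x8c11, 0x8c12, 0x8c13, 0x8c14, 0x8c15, 0x8c16, 0x8c17,
        0x8c18, 0x8c19, 0x8c1a, 0x8c1b, 0x8c1c, 0x8c1d, 0x8c1e, 0x8c1f,
        /* Lynxpoint-LP PCH */
        0x9c10, 0x9c11, 0x9c12, 0x9c13, 0x9c14, 0x9c15, 0x9c16, 0x9c17,
        0x9c18, 0x9c19, 0x9c1a, 0x9c1b,
        /* Wildcat PCH */
        0x9c90, 0x9c91, 0x9c92, 0x9c93, 0x9c94, 0x9c95, 0x9c96, 0x9c97,
        0x9c98, 0x9c99, 0x9c9a, 0x9c9b,
        /* Patsburg (X79) PCH */
        0x1d10, 0x1d12, 0x1d14, 0x1d16, 0x1d18, 0x1d1a, 0x1d1c, 0x1d1e,
};

/*
 * Hypothetical helper: returns true if the 16-bit PCI device ID appears
 * in the list above. It checks the device ID only; the caller is assumed
 * to have already confirmed that the vendor is Intel (0x8086).
 */
bool intel_pch_acs_id_listed(uint16_t device)
{
        size_t n = sizeof(intel_pch_acs_ids) / sizeof(intel_pch_acs_ids[0]);

        for (size_t i = 0; i < n; i++)
                if (intel_pch_acs_ids[i] == device)
                        return true;
        return false;
}
```

Calling intel_pch_acs_id_listed() with the second half of the vendor:device pair printed by 'lspci -n' tells you whether that root port is in the quoted list. A match only means the ID is listed; the IOMMU group can still be larger than expected if a PCIe switch without ACS sits below the port.
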
Using device-specific mechanisms to enable and verify ACS-like capabilities is fine,
but what should we do about devices that have no ACS-like capabilities at all?
How about applying "[PATCH] pci: Enable overrides for missing ACS capabilities"?
And if we do, how can we reduce the risk of data corruption and information leakage between VMs?

Thanks,
Zhang Haoyu
>Alex
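
On the earlier question in this thread about how to tell whether a bridge exposes a real ACS capability: besides grepping the 'lspci -vvv' output, the PCIe extended capability list can be walked directly. The sketch below assumes the caller has already read the device's configuration space into a buffer (for example from the sysfs "config" file, which typically requires root for offsets beyond the first 64 bytes). The function name find_ext_cap() and the macro names are hypothetical. The layout used is the standard one: extended capabilities start at offset 0x100, each header holds the capability ID in bits 15:0 and the next offset in bits 31:20, and ACS has ID 0x000d.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_START   0x100   /* first extended capability header */
#define EXT_CAP_ID_ACS  0x000d  /* Access Control Services */

/* Read a little-endian 32-bit value from the config-space buffer. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
        return (uint32_t)cfg[off] |
               (uint32_t)cfg[off + 1] << 8 |
               (uint32_t)cfg[off + 2] << 16 |
               (uint32_t)cfg[off + 3] << 24;
}

/*
 * Hypothetical helper: walk the extended capability list in a config-space
 * dump of `len` bytes and return the offset of capability `cap_id`,
 * or 0 if it is not present (or the dump is too short to contain it).
 */
size_t find_ext_cap(const uint8_t *cfg, size_t len, uint16_t cap_id)
{
        size_t off = EXT_CAP_START;
        int budget = (4096 - 256) / 8;  /* bound the walk on malformed lists */

        while (off >= EXT_CAP_START && off + 4 <= len && budget-- > 0) {
                uint32_t hdr = cfg_read32(cfg, off);

                /* All zeros or all ones means no (further) capabilities. */
                if (hdr == 0 || hdr == 0xffffffffu)
                        break;
                if ((hdr & 0xffff) == cap_id)
                        return off;
                off = (hdr >> 20) & 0xffc;  /* next pointer, dword aligned */
        }
        return 0;
}
```

A result of 0 from find_ext_cap(cfg, len, EXT_CAP_ID_ACS) means no ACS capability was found in the dump, which is the situation described above for the PCH root ports that need the quirk.
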

