From: "Tian, Kevin" <kevin.tian@intel.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
"Lan, Tianyu" <tianyu.lan@intel.com>
Cc: "yang.zhang.wz@gmail.com" <yang.zhang.wz@gmail.com>,
"xuquan8@huawei.com" <xuquan8@huawei.com>,
Stefano Stabellini <sstabellini@kernel.org>,
Jan Beulich <JBeulich@suse.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
"ian.jackson@eu.citrix.com" <ian.jackson@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"Nakajima, Jun" <jun.nakajima@intel.com>,
"anthony.perard@citrix.com" <anthony.perard@citrix.com>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: Xen virtual IOMMU high level design doc V2
Date: Thu, 20 Oct 2016 10:11:01 +0000
Message-ID: <AADFC41AFE54684AB9EE6CBC0274A5D18DFD4F3F@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <20161018202631.GO30736@x230.dumpdata.com>
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, October 19, 2016 4:27 AM
> >
> > 2) For physical PCI device
> > DMA operations go though physical IOMMU directly and IO page table for
> > IOVA->HPA should be loaded into physical IOMMU. When guest updates
> > l2 Page-table pointer field, it provides IO page table for
> > IOVA->GPA. vIOMMU needs to shadow l2 translation table, translate
> > GPA->HPA and update shadow page table(IOVA->HPA) pointer to l2
> > Page-table pointer to context entry of physical IOMMU.
> >
> > Now all PCI devices in same hvm domain share one IO page table
> > (GPA->HPA) in physical IOMMU driver of Xen. To support l2
> > translation of vIOMMU, IOMMU driver need to support multiple address
> > spaces per device entry. Using existing IO page table(GPA->HPA)
> > defaultly and switch to shadow IO page table(IOVA->HPA) when l2
>
> defaultly?
>
> > translation function is enabled. These change will not affect current
> > P2M logic.
>
> What happens if the guests IO page tables have incorrect values?
>
> For example the guest sets up the pagetables to cover some section
> of HPA ranges (which are all good and permitted). But then during execution
> the guest kernel decides to muck around with the pagetables and adds an HPA
> range that is outside what the guest has been allocated.
>
> What then?
The shadow PTE is controlled by the hypervisor. Any IOVA->GPA mapping in a
guest PTE must be validated (IOVA->GPA->HPA) before it is written into the
shadow PTE. So regardless of when the guest mucks with its PTEs, the
operation is always trapped and validated. Why do you think there is a
problem?
Also, the guest only sees GPAs. All it can operate on is GPA ranges.
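
To illustrate the trap-and-validate flow described above, here is a minimal
sketch. It is not Xen code: the toy_* names, the flat array standing in for
the P2M, and the single "present" bit are all assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PTE_PRESENT 1ULL
#define TOY_INVALID_HPA UINT64_MAX

/* Hypothetical stand-in for the P2M: index = guest frame, value = host frame. */
struct toy_p2m {
    const uint64_t *gfn_to_hfn;
    size_t nr_gfns;              /* number of frames the guest has been given */
};

/* GPA -> HPA (page-aligned); fails for any GPA outside the guest's allocation. */
static uint64_t toy_gpa_to_hpa(const struct toy_p2m *p2m, uint64_t gpa)
{
    uint64_t gfn = gpa >> TOY_PAGE_SHIFT;

    if (gfn >= p2m->nr_gfns)
        return TOY_INVALID_HPA;
    return p2m->gfn_to_hfn[gfn] << TOY_PAGE_SHIFT;
}

/*
 * Called when a guest write to its IO page table is trapped. The guest
 * supplies only a GPA; the shadow PTE is written by the hypervisor and
 * receives an HPA only if the GPA translates successfully.
 */
static bool toy_shadow_update(const struct toy_p2m *p2m, uint64_t guest_gpa,
                              uint64_t *shadow_pte)
{
    uint64_t hpa = toy_gpa_to_hpa(p2m, guest_gpa);

    if (hpa == TOY_INVALID_HPA) {
        *shadow_pte = 0;         /* leave the shadow entry not-present */
        return false;
    }
    *shadow_pte = hpa | TOY_PTE_PRESENT;
    return true;
}
```

In this sketch a guest PTE that names a GPA outside the guest's allocation
never reaches the shadow table, so the physical IOMMU never sees it.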
> >
> > 3.3 Interrupt remapping
> > Interrupts from virtual devices and physical devices will be delivered
> > to vlapic from vIOAPIC and vMSI. It needs to add interrupt remapping
> > hooks in the vmsi_deliver() and ioapic_deliver() to find target vlapic
> > according interrupt remapping table.
> >
> >
> > 3.4 l1 translation
> > When nested translation is enabled, any address generated by l1
> > translation is used as the input address for nesting with l2
> > translation. Physical IOMMU needs to enable both l1 and l2 translation
> > in nested translation mode(GVA->GPA->HPA) for passthrough
> > device.
> >
> > VT-d context entry points to guest l1 translation table which
> > will be nest-translated by l2 translation table and so it
> > can be directly linked to context entry of physical IOMMU.
>
> I think this means that the shared_ept will be disabled?
> >
> What about different versions of contexts? Say the V1 is exposed
> to guest but the hardware supports V2? Are there any flags that have
> swapped positions? Or is it pretty backwards compatible?
Yes, it is backward compatible.
> >
> >
> > 3.5 Implementation consideration
> > VT-d spec doesn't define a capability bit for the l2 translation.
> > Architecturally there is no way to tell guest that l2 translation
> > capability is not available. Linux Intel IOMMU driver thinks l2
> > translation is always available when VTD exits and fail to be loaded
> > without l2 translation support even if interrupt remapping and l1
> > translation are available. So it needs to enable l2 translation first
>
> I am lost on that sentence. Are you saying that it tries to load
> the IOVA and if they fail.. then it keeps on going? What is the result
> of this? That you can't do IOVA (so can't use vfio ?)
It's about VT-d capability reporting. VT-d supports both 1st-level and
2nd-level translation, however only 1st-level translation can be optionally
reported through a capability bit. There is no capability bit to say that
a version doesn't support 2nd-level translation. The implication is
that, as long as a vIOMMU is exposed, the guest IOMMU driver always
assumes that IOVA capability is available through 2nd-level translation.
So we can first emulate a vIOMMU with only 2nd-level capability, and
then extend it to support 1st-level translation and interrupt remapping,
but we cannot go in the reverse direction. I think Tianyu's point is more
to describe the enabling sequence based on this fact. :-)
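
The ordering constraint above can be sketched as a check on the feature set
of an exposed vIOMMU. The flag names and the function are hypothetical and do
not correspond to VT-d register bits or to any existing Xen interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature flags for an emulated vIOMMU. */
enum toy_viommu_cap {
    TOY_CAP_L2 = 1u << 0,   /* 2nd-level translation (IOVA) */
    TOY_CAP_L1 = 1u << 1,   /* 1st-level translation */
    TOY_CAP_IR = 1u << 2,   /* interrupt remapping */
};

/*
 * Since the guest driver assumes 2nd-level translation whenever a vIOMMU
 * is exposed, a feature set is only acceptable if it includes TOY_CAP_L2.
 * 1st-level translation and interrupt remapping may be added on top.
 */
static bool toy_viommu_caps_valid(uint32_t caps)
{
    return (caps & TOY_CAP_L2) != 0;
}
```

Under this assumption, TOY_CAP_L2 alone is a valid first step, while
TOY_CAP_L1 | TOY_CAP_IR without TOY_CAP_L2 is rejected.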
> > 4.1 Qemu vIOMMU framework
> > Qemu has a framework to create virtual IOMMU(e.g. virtual intel VTD and
> > AMD IOMMU) and report in guest ACPI table. So for Xen side, a dummy
> > xen-vIOMMU wrapper is required to connect with actual vIOMMU in Xen.
> > Especially for l2 translation of virtual PCI device because
> > emulations of virtual PCI devices are in the Qemu. Qemu's vIOMMU
> > framework provides callback to deal with l2 translation when
> > DMA operations of virtual PCI devices happen.
>
> You say AMD and Intel. This sounds quite OS agnostic. Does it mean you
> could expose an vIOMMU to a guest and actually use the AMD IOMMU
> in the hypervisor?
Did you mean "expose an Intel vIOMMU to the guest and then use the physical
AMD IOMMU in the hypervisor"? I haven't thought about this, but what's the
value of doing so? :-)
Thanks
Kevin