Xen-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] x86/hvm: set 'ipat' in EPT for special pages
@ 2020-07-31 10:46 Paul Durrant
  2020-07-31 11:14 ` Jan Beulich
  2020-07-31 11:21 ` Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Paul Durrant @ 2020-07-31 10:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Wei Liu, Jan Beulich, Roger Pau Monné

From: Paul Durrant <pdurrant@amazon.com>

All non-MMIO ranges (i.e those not mapping real device MMIO regions) that
map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
when PV drivers running in a guest populate the BAR space of the Xen Platform
PCI Device with pages such as the Shared Info page or Grant Table pages,
accesses to these pages will be cachable.

However, should IOMMU mappings be enabled be enabled for the guest then these
accesses become uncachable. This has a substantial negative effect on I/O
throughput of PV devices. Arguably PV drivers should bot be using BAR space to
host the Shared Info and Grant Table pages but it is currently commonplace for
them to do this and so this problem needs mitigation. Hence this patch makes
sure the 'ipat' bit is set for any special page regardless of where in GFN
space it is mapped.

NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
      that there is any similar mitigation possible for AMD NPT. Downstreams
      such as Citrix XenServer have been carrying a patch similar to this for
      several releases though.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/arch/x86/hvm/mtrr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 511c3be1c8..3ad813ed15 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_UNCACHABLE;
     }
 
-    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
+    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
+         is_special_page(mfn_to_page(mfn)) )
     {
         *ipat = 1;
         return MTRR_TYPE_WRBACK;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
  2020-07-31 10:46 [PATCH] x86/hvm: set 'ipat' in EPT for special pages Paul Durrant
@ 2020-07-31 11:14 ` Jan Beulich
  2020-07-31 11:17   ` Paul Durrant
  2020-07-31 11:21 ` Andrew Cooper
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2020-07-31 11:14 UTC (permalink / raw)
  To: Paul Durrant
  Cc: xen-devel, Paul Durrant, Roger Pau Monné, Wei Liu, Andrew Cooper

On 31.07.2020 12:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> All non-MMIO ranges (i.e those not mapping real device MMIO regions) that
> map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> when PV drivers running in a guest populate the BAR space of the Xen Platform
> PCI Device with pages such as the Shared Info page or Grant Table pages,
> accesses to these pages will be cachable.
> 
> However, should IOMMU mappings be enabled be enabled for the guest then these
> accesses become uncachable. This has a substantial negative effect on I/O
> throughput of PV devices. Arguably PV drivers should bot be using BAR space to
> host the Shared Info and Grant Table pages but it is currently commonplace for
> them to do this and so this problem needs mitigation. Hence this patch makes
> sure the 'ipat' bit is set for any special page regardless of where in GFN
> space it is mapped.
> 
> NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
>       that there is any similar mitigation possible for AMD NPT. Downstreams
>       such as Citrix XenServer have been carrying a patch similar to this for
>       several releases though.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

However, ...

> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
>          return MTRR_TYPE_UNCACHABLE;
>      }
>  
> -    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
> +    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
> +         is_special_page(mfn_to_page(mfn)) )
>      {
>          *ipat = 1;
>          return MTRR_TYPE_WRBACK;

... shouldn't we leverages this (right away?) to do away with the
APIC access page special case a few lines up from here? It is my
understanding that vmx_alloc_vlapic_mapping() uses
set_mmio_p2m_entry() just in order to get ipat set on the resulting
EPT entry. Yet with the allocation using MEMF_no_refcount, this will
now be the result even if no artificial MMIO mapping was created.

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
  2020-07-31 11:14 ` Jan Beulich
@ 2020-07-31 11:17   ` Paul Durrant
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Durrant @ 2020-07-31 11:17 UTC (permalink / raw)
  To: 'Jan Beulich'
  Cc: xen-devel, 'Paul Durrant', 'Roger Pau Monné',
	'Wei Liu', 'Andrew Cooper'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 31 July 2020 12:15
> To: Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
> 
> On 31.07.2020 12:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > All non-MMIO ranges (i.e those not mapping real device MMIO regions) that
> > map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> > when PV drivers running in a guest populate the BAR space of the Xen Platform
> > PCI Device with pages such as the Shared Info page or Grant Table pages,
> > accesses to these pages will be cachable.
> >
> > However, should IOMMU mappings be enabled be enabled for the guest then these
> > accesses become uncachable. This has a substantial negative effect on I/O
> > throughput of PV devices. Arguably PV drivers should bot be using BAR space to
> > host the Shared Info and Grant Table pages but it is currently commonplace for
> > them to do this and so this problem needs mitigation. Hence this patch makes
> > sure the 'ipat' bit is set for any special page regardless of where in GFN
> > space it is mapped.
> >
> > NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
> >       that there is any similar mitigation possible for AMD NPT. Downstreams
> >       such as Citrix XenServer have been carrying a patch similar to this for
> >       several releases though.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> However, ...
> 
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -830,7 +830,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
> >          return MTRR_TYPE_UNCACHABLE;
> >      }
> >
> > -    if ( !is_iommu_enabled(d) && !cache_flush_permitted(d) )
> > +    if ( (!is_iommu_enabled(d) && !cache_flush_permitted(d)) ||
> > +         is_special_page(mfn_to_page(mfn)) )
> >      {
> >          *ipat = 1;
> >          return MTRR_TYPE_WRBACK;
> 
> ... shouldn't we leverages this (right away?) to do away with the
> APIC access page special case a few lines up from here? It is my
> understanding that vmx_alloc_vlapic_mapping() uses
> set_mmio_p2m_entry() just in order to get ipat set on the resulting
> EPT entry. Yet with the allocation using MEMF_no_refcount, this will
> now be the result even if no artificial MMIO mapping was created.

That's a good point. Best handled by a separate patch I think so I'll re-send this with your R-b plus a follow up patch as a v2.

  Paul

> 
> Jan



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
  2020-07-31 11:21 ` Andrew Cooper
@ 2020-07-31 11:19   ` Paul Durrant
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Durrant @ 2020-07-31 11:19 UTC (permalink / raw)
  To: 'Andrew Cooper', xen-devel
  Cc: 'Paul Durrant', 'Wei Liu', 'Jan Beulich',
	'Roger Pau Monné'

> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 31 July 2020 12:21
> To: Paul Durrant <paul@xen.org>; xen-devel@lists.xenproject.org
> Cc: Paul Durrant <pdurrant@amazon.com>; Jan Beulich <jbeulich@suse.com>; Wei Liu <wl@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>
> Subject: Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
> 
> On 31/07/2020 11:46, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > All non-MMIO ranges (i.e those not mapping real device MMIO regions) that
> > map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> > when PV drivers running in a guest populate the BAR space of the Xen Platform
> > PCI Device with pages such as the Shared Info page or Grant Table pages,
> > accesses to these pages will be cachable.
> >
> > However, should IOMMU mappings be enabled be enabled for the guest then these
> > accesses become uncachable. This has a substantial negative effect on I/O
> > throughput of PV devices. Arguably PV drivers should bot be using BAR space to
> > host the Shared Info and Grant Table pages but it is currently commonplace for
> > them to do this and so this problem needs mitigation. Hence this patch makes
> > sure the 'ipat' bit is set for any special page regardless of where in GFN
> > space it is mapped.
> >
> > NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
> >       that there is any similar mitigation possible for AMD NPT. Downstreams
> >       such as Citrix XenServer have been carrying a patch similar to this for
> >       several releases though.
> 
> https://github.com/xenserver/xen.pg/blob/XS-8.2.x/master/xen-override-caching-cp-26562.patch
> 
> (Yay for internal ticket references escaping into the wild.)
> 

:-)

> 
> However, it is very important to be aware that this is just papering
> over the problem, and it will cease to function as soon as we get MKTME
> support.  When we hit that point, iPAT cannot be used, as it will cause
> data corruption in guests.
> 
> The only correct way to fix this is to not (mis)use BAR space for RAM
> mappings.
> 

Oh yes, t
his is only a mitigation. I believe Roger is working on a mechanism for guests to query for non-populated RAM space, which would be suitable for use by PV drivers.

  Paul

> ~Andrew



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/hvm: set 'ipat' in EPT for special pages
  2020-07-31 10:46 [PATCH] x86/hvm: set 'ipat' in EPT for special pages Paul Durrant
  2020-07-31 11:14 ` Jan Beulich
@ 2020-07-31 11:21 ` Andrew Cooper
  2020-07-31 11:19   ` Paul Durrant
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2020-07-31 11:21 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Paul Durrant, Wei Liu, Jan Beulich, Roger Pau Monné

On 31/07/2020 11:46, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
>
> All non-MMIO ranges (i.e those not mapping real device MMIO regions) that
> map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
> when PV drivers running in a guest populate the BAR space of the Xen Platform
> PCI Device with pages such as the Shared Info page or Grant Table pages,
> accesses to these pages will be cachable.
>
> However, should IOMMU mappings be enabled be enabled for the guest then these
> accesses become uncachable. This has a substantial negative effect on I/O
> throughput of PV devices. Arguably PV drivers should bot be using BAR space to
> host the Shared Info and Grant Table pages but it is currently commonplace for
> them to do this and so this problem needs mitigation. Hence this patch makes
> sure the 'ipat' bit is set for any special page regardless of where in GFN
> space it is mapped.
>
> NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
>       that there is any similar mitigation possible for AMD NPT. Downstreams
>       such as Citrix XenServer have been carrying a patch similar to this for
>       several releases though.

https://github.com/xenserver/xen.pg/blob/XS-8.2.x/master/xen-override-caching-cp-26562.patch

(Yay for internal ticket references escaping into the wild.)


However, it is very important to be aware that this is just papering
over the problem, and it will cease to function as soon as we get MKTME
support.  When we hit that point, iPAT cannot be used, as it will cause
data corruption in guests.

The only correct way to fix this is to not (mis)use BAR space for RAM
mappings.

~Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-31 10:46 [PATCH] x86/hvm: set 'ipat' in EPT for special pages Paul Durrant
2020-07-31 11:14 ` Jan Beulich
2020-07-31 11:17   ` Paul Durrant
2020-07-31 11:21 ` Andrew Cooper
2020-07-31 11:19   ` Paul Durrant

Xen-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/xen-devel/0 xen-devel/git/0.git
	git clone --mirror https://lore.kernel.org/xen-devel/1 xen-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 xen-devel xen-devel/ https://lore.kernel.org/xen-devel \
		xen-devel@lists.xenproject.org xen-devel@lists.xen.org
	public-inbox-index xen-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.xenproject.lists.xen-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git