Subject: Xen Security Advisory 346 v3 (CVE-2020-27671) - undue deferral of IOMMU TLB flushes
From: Xen.org security team @ 2021-01-19 16:34 UTC
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 6545 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2020-27671 / XSA-346
                              version 3

                  undue deferral of IOMMU TLB flushes

UPDATES IN VERSION 3
====================

CVE assigned.

ISSUE DESCRIPTION
=================

To efficiently change the physical to machine address mappings of a
larger range of addresses for fully virtualized guests, Xen contains
an optimization to coalesce per-page IOMMU TLB flushes into a single,
wider flush after all adjustments have been made.  While this is fine
to do for newly introduced page mappings, the possible removal of
pages from such guests during this operation should not be "optimized"
in the same way.  This is because the (typically) final reference of
such pages is dropped before the coalesced flush, and hence the pages
may have been put to a different use even though DMA initiated by
their original owner might still be in progress.
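
For illustration, below is a condensed sketch of the pre-patch flow of
xenmem_add_to_physmap() (reconstructed from the diffs in the RESOLUTION
section; identifiers match the real code, but error handling, hypercall
continuation, and the IOMMU-enabled checks are elided):

    union xen_add_to_physmap_batch_extra extra;
    unsigned int done = 0;
    long rc = 0;
    int ret;

    this_cpu(iommu_dont_flush_iotlb) = 1;  /* defer per-page IOMMU flushes */

    while ( xatp->size > done )
    {
        /*
         * Besides mapping, this call may also UNMAP a page: the one being
         * moved, or one displaced from the destination GFN.  Pre-patch,
         * dropping the last reference here could free and re-use the page
         * while stale IOMMU TLB entries still let the original owner's
         * in-flight DMA reach it.
         */
        rc = xenmem_add_to_physmap_one(d, xatp->space, extra,
                                       xatp->idx++, _gfn(xatp->gpfn++));
        ++done;
    }

    this_cpu(iommu_dont_flush_iotlb) = 0;
    /* The coalesced flush happens only now -- too late for a freed page.
       (A similar coalesced flush of the source GFN range is elided.) */
    ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
                            IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);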

IMPACT
======

A malicious guest might be able to cause data corruption and data
leaks.  Host or guest Denial of Service (DoS), and privilege
escalation, cannot be ruled out.

VULNERABLE SYSTEMS
==================

All Xen versions from 4.2 onwards are vulnerable.  Xen versions 4.1 and
earlier are not vulnerable.

Only x86 HVM and PVH guests can leverage the vulnerability.  Arm guests
as well as x86 PV ones cannot leverage the vulnerability.

Only x86 HVM and PVH guests which have physical devices passed through
to them can leverage the vulnerability.

Only x86 HVM and PVH guests configured to not share IOMMU and CPU
page tables can leverage the vulnerability.  Sharing these page tables
is the default on capable Intel (VT-d) hardware.  On AMD hardware
sharing is not possible.  On Intel (VT-d) hardware sharing may also not
be possible, depending on hardware properties.  Whether it is possible
can be seen from the presence (or absence) of "iommu_hap_pt_share" on
the "virt_caps" line of "xl info" output.  Guests run in shadow mode
can leverage the vulnerability.
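
For example, whether sharing is possible could be checked like this (the
output line is illustrative; the exact set of flags varies by host and Xen
version):

    $ xl info | grep virt_caps
    virt_caps              : hvm hvm_directio iommu_hap_pt_share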

MITIGATION
==========

Not passing through physical devices to untrusted guests will avoid
the vulnerability.

On systems permitting page table sharing, leaving that functionality
enabled (i.e. not suppressing its use) avoids the vulnerability.  This
means guests should not be run in either of the following modes (see the
configuration sketch after this list):
* shadow mode, i.e. hardware needs to be HAP (Hardware Assisted Paging)
  capable, there should not be "hap=0" in the guest's xl configuration
  file, and there should not be "hap=0" or equivalent on Xen's command
  line,
* non-shared page table mode, i.e. hardware needs to be capable of
  sharing, there should not be "passthrough=sync_pt" in the guest's xl
  configuration file, and there should not be "iommu=no-sharept" or
  equivalent on Xen's command line.
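
As a sketch, a guest configuration consistent with this mitigation could
look as follows (the guest name and device address are illustrative
placeholders, assuming a recent xl; the commented-out options are the ones
which must NOT be present):

    # Illustrative xl configuration fragment -- not a complete file.
    name = "guest1"
    type = "hvm"
    pci  = [ "01:00.0" ]        # the passed-through physical device
    # hap = 0                   # must NOT be set: forces shadow mode
    # passthrough = "sync_pt"   # must NOT be set: forces non-shared
    #                           # IOMMU/CPU page tables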

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

Applying the appropriate pair of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa346/xsa346-?.patch           Xen 4.14 - xen-unstable
xsa346/xsa346-4.13-?.patch      Xen 4.13
xsa346/xsa346-4.12-?.patch      Xen 4.12
xsa346/xsa346-4.11-?.patch      Xen 4.11
xsa346/xsa346-4.10-?.patch      Xen 4.10
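
For example, a Xen 4.13 tree could be patched like this (paths are
illustrative; note that the accompanying xsa346.meta lists the XSA-286 and
XSA-345 patches as prerequisites):

    $ git clone -b stable-4.13 https://xenbits.xen.org/git-http/xen.git
    $ cd xen
    $ patch -p1 < ../xsa346/xsa346-4.13-1.patch
    $ patch -p1 < ../xsa346/xsa346-4.13-2.patch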

$ sha256sum xsa346* xsa346*/*
ba560d34cb46f45d6da0ba5d672cb896c173e90de5c022d22415ace20c5e47b8  xsa346.meta
5f8b3e5565bc7d87283af173f5f2b35975e4ab6bff502780799d14fb263f730d  xsa346/xsa346-1.patch
9de89ca360f303e7aa3b42529cdf4191b0700ee7cb6928a22068195e047a4db7  xsa346/xsa346-2.patch
f3612bfad219160917a3bc46ea5b31673137593d62ae4f819a8e80ade0339c5b  xsa346/xsa346-4.10-1.patch
734ed82d583bbce342ffabeb9dd84e300f2717ec71e3de866670b0ddf18d57aa  xsa346/xsa346-4.10-2.patch
7a41bf06e19590cfc69d4f2ac132a23843dcec2ea5f98d86c4be971f9eec86af  xsa346/xsa346-4.11-1.patch
1359801b8f64ac62dc8de4e3acc15ec42c040f692f3a1ee9986acb478ee330cd  xsa346/xsa346-4.11-2.patch
190f594bb77dd044af8f0a051ab1d4143c348da192206da9b390af91c0a2cdec  xsa346/xsa346-4.12-1.patch
5bcb65dc45f6d74c644ee6b6add518044c9875e6759254773d3816e718c2be28  xsa346/xsa346-4.12-2.patch
69e0158276a922829eb60dc5bb13e60a71a232ace808843f45dac407716b107b  xsa346/xsa346-4.13-1.patch
eb8132a02c252dc65be1f334939f252db0c30ae2db8aa23f0d9e67f8148e2d2d  xsa346/xsa346-4.13-2.patch
$

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

HOWEVER, deployment of the mitigations is NOT permitted (except where
all the affected systems and VMs are administered and used only by
organisations which are members of the Xen Project Security Issues
Predisclosure List).  Specifically, deployment on public cloud systems
is NOT permitted.

This is because removal of pass-through devices or their replacement by
emulated devices is a guest-visible configuration change, which may lead
to re-discovery of the issue.  Similarly, it cannot be ruled out that the
other possible guest configuration changes are noticeable to guests.

Deployment of this mitigation is permitted only AFTER the embargo ends.

AND: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.

(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decision-making.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmAHB6UMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZaK8IALUyLvMQUQROvO6h/e6Nr+hfA8ilByV9iGEzfXjg
LENdwiFMqdeB3MwbTuMHTE+6i8+S16+fcakamyZZZTFmNjNaOGiGrS/vQ9omsRzr
BaKg/X6AE81lNas5OW2sltjbcLitvSx+AZclhYMi/Te3rKqIue9U/m59mUw3TPfs
HQ7ANTxLfUF4Pi7R6tS3uu2bSa02AXg+WZoB8YcSk/hcsB6x1leTe9DQhIGwHDLP
yP8UeIl6yyMDEsfs11IxhmIMDCshLu/8NjMHcrBxTxQBvSeqmcCFf99sPTqvyNhj
1t95twToNRgO0UJPyD6230F7/VUqw2Y7b0bnMC/iDvFi0+A=
=WOL9
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa346.meta --]
[-- Type: application/octet-stream, Size: 1880 bytes --]

{
  "XSA": 346,
  "SupportedVersions": [
    "master",
    "4.14",
    "4.13",
    "4.12",
    "4.11",
    "4.10"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.10": {
      "Recipes": {
        "xen": {
          "StableRef": "1719f79a0efd36d15837c51982173dd1c287dced",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-4.10-?.patch"
          ]
        }
      }
    },
    "4.11": {
      "Recipes": {
        "xen": {
          "StableRef": "3630a367854c98bbf8e747d09eeab7e68f370003",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-4.11-?.patch"
          ]
        }
      }
    },
    "4.12": {
      "Recipes": {
        "xen": {
          "StableRef": "6888017392ac25b5e588554030642affac25a95d",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-4.12-?.patch"
          ]
        }
      }
    },
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "8e7e5857a203c9d9df7733fd68768555c7e76839",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-4.13-?.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "c93b520a41f2787dd76bfb2e454836d1d5787505",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-?.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "93508595d588afe9dca087f95200effb7cedc81f",
          "Prereqs": [
            286,
            345
          ],
          "Patches": [
            "xsa346/xsa346-?.patch"
          ]
        }
      }
    }
  }
}

[-- Attachment #3: xsa346/xsa346-1.patch --]
[-- Type: application/octet-stream, Size: 1886 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page

Deferring flushes to a single, wide range one - as is done when
handling XENMAPSPACE_gmfn_range - is okay only as long as
pages don't get freed ahead of the eventual flush. While the only
function setting the flag (xenmem_add_to_physmap()) suggests by its name
that it's only mapping new entries, in reality the way
xenmem_add_to_physmap_one() works means an unmap would happen not only
for the page being moved (but not freed) but, if the destination GFN is
populated, also for the page being displaced from that GFN. Collapsing
the two flushes for this GFN into just one (and even more so deferring
it to a batched invocation) is not correct.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -293,6 +293,7 @@ int guest_remove_page(struct domain *d,
     p2m_type_t p2mt;
 #endif
     mfn_t mfn;
+    bool *dont_flush_p, dont_flush;
     int rc;
 
 #ifdef CONFIG_X86
@@ -379,8 +380,18 @@ int guest_remove_page(struct domain *d,
         return -ENXIO;
     }
 
+    /*
+     * Since we're likely to free the page below, we need to suspend
+     * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
+     */
+    dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
+    dont_flush = *dont_flush_p;
+    *dont_flush_p = false;
+
     rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
 
+    *dont_flush_p = dont_flush;
+
     /*
      * With the lack of an IOMMU on some platforms, domains with DMA-capable
      * device must retrieve the same pfn when the hypercall populate_physmap

[-- Attachment #4: xsa346/xsa346-2.patch --]
[-- Type: application/octet-stream, Size: 6896 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: hold page ref until after deferred TLB flush

When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
flush for the "from" GFN range requires that the page remains allocated
to the guest until the TLB flush has actually occurred. Otherwise a
parallel hypercall to remove the page would only flush the TLB for the
GFN it has been moved to, but not the one it was mapped at originally.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1407,7 +1407,7 @@ void share_xen_page_with_guest(struct pa
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gfn)
 {
@@ -1480,10 +1480,6 @@ int xenmem_add_to_physmap_one(
         break;
     }
     case XENMAPSPACE_dev_mmio:
-        /* extra should be 0. Reserved for future use. */
-        if ( extra.res0 )
-            return -EOPNOTSUPP;
-
         rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
         return rc;
 
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4497,7 +4497,7 @@ static int handle_iomem_range(unsigned l
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gpfn)
 {
@@ -4581,9 +4581,20 @@ int xenmem_add_to_physmap_one(
         rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
 
  put_both:
-    /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */
+    /*
+     * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
+     * We also may need to transfer ownership of the page reference to our
+     * caller.
+     */
     if ( space == XENMAPSPACE_gmfn )
+    {
         put_gfn(d, gfn);
+        if ( !rc && extra.ppage )
+        {
+            *extra.ppage = page;
+            page = NULL;
+        }
+    }
 
     if ( page )
         put_page(page);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -815,13 +815,12 @@ int xenmem_add_to_physmap(struct domain
 {
     unsigned int done = 0;
     long rc = 0;
-    union xen_add_to_physmap_batch_extra extra;
+    union add_to_physmap_extra extra = {};
+    struct page_info *pages[16];
 
     ASSERT(paging_mode_translate(d));
 
-    if ( xatp->space != XENMAPSPACE_gmfn_foreign )
-        extra.res0 = 0;
-    else
+    if ( xatp->space == XENMAPSPACE_gmfn_foreign )
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
@@ -836,7 +835,10 @@ int xenmem_add_to_physmap(struct domain
     xatp->size -= start;
 
     if ( is_iommu_enabled(d) )
+    {
        this_cpu(iommu_dont_flush_iotlb) = 1;
+       extra.ppage = &pages[0];
+    }
 
     while ( xatp->size > done )
     {
@@ -848,8 +850,12 @@ int xenmem_add_to_physmap(struct domain
         xatp->idx++;
         xatp->gpfn++;
 
+        if ( extra.ppage )
+            ++extra.ppage;
+
         /* Check for continuation if it's not the last iteration. */
-        if ( xatp->size > ++done && hypercall_preempt_check() )
+        if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
+             (xatp->size > done && hypercall_preempt_check()) )
         {
             rc = start + done;
             break;
@@ -859,6 +865,7 @@ int xenmem_add_to_physmap(struct domain
     if ( is_iommu_enabled(d) )
     {
         int ret;
+        unsigned int i;
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
@@ -867,6 +874,15 @@ int xenmem_add_to_physmap(struct domain
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
+        /*
+         * Now that the IOMMU TLB flush was done for the original GFN, drop
+         * the page references. The 2nd flush below is fine to make later, as
+         * whoever removes the page again from its new GFN will have to do
+         * another flush anyway.
+         */
+        for ( i = 0; i < done; ++i )
+            put_page(pages[i]);
+
         ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
@@ -880,6 +896,8 @@ static int xenmem_add_to_physmap_batch(s
                                        struct xen_add_to_physmap_batch *xatpb,
                                        unsigned int extent)
 {
+    union add_to_physmap_extra extra = {};
+
     if ( unlikely(xatpb->size < extent) )
         return -EILSEQ;
 
@@ -891,6 +909,19 @@ static int xenmem_add_to_physmap_batch(s
          !guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) )
         return -EFAULT;
 
+    switch ( xatpb->space )
+    {
+    case XENMAPSPACE_dev_mmio:
+        /* res0 is reserved for future use. */
+        if ( xatpb->u.res0 )
+            return -EOPNOTSUPP;
+        break;
+
+    case XENMAPSPACE_gmfn_foreign:
+        extra.foreign_domid = xatpb->u.foreign_domid;
+        break;
+    }
+
     while ( xatpb->size > extent )
     {
         xen_ulong_t idx;
@@ -903,8 +934,7 @@ static int xenmem_add_to_physmap_batch(s
                                                extent, 1)) )
             return -EFAULT;
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
-                                       xatpb->u,
+        rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
                                        idx, _gfn(gpfn));
 
         if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) )
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -592,8 +592,22 @@ void scrub_one_page(struct page_info *);
     page_list_del(pg, page_to_list(d, pg))
 #endif
 
+union add_to_physmap_extra {
+    /*
+     * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
+     * to be kept until after the flush, so the page can't get removed from
+     * the domain (and re-used for another purpose) beforehand. By passing
+     * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
+     * to have ownership of such a reference transferred in the success case.
+     */
+    struct page_info **ppage;
+
+    /* XENMAPSPACE_gmfn_foreign */
+    domid_t foreign_domid;
+};
+
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
-                              union xen_add_to_physmap_batch_extra extra,
+                              union add_to_physmap_extra extra,
                               unsigned long idx, gfn_t gfn);
 
 int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
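
Taken together, the two patches ensure no page freed during the operation
can still have stale IOMMU TLB entries: patch 1 re-enables flushing inside
guest_remove_page(), while patch 2 keeps a reference to each moved page
until after the deferred flush.  Below is a condensed sketch of the
post-patch xenmem_add_to_physmap() flow shown above (error handling,
hypercall continuation, and the non-gmfn cases are elided; the exact
arguments of the first flush are an assumption based on the comments in
the diff):

    struct page_info *pages[16];      /* page refs held across the flush */
    union add_to_physmap_extra extra = {};
    unsigned int i, done = 0;
    long rc = 0;
    int ret;

    this_cpu(iommu_dont_flush_iotlb) = 1;
    extra.ppage = &pages[0];          /* patch 2: have refs handed back */

    while ( xatp->size > done && done < ARRAY_SIZE(pages) )
    {
        /* On success, ownership of the reference to the moved page is
         * transferred to *extra.ppage instead of being dropped inside. */
        rc = xenmem_add_to_physmap_one(d, xatp->space, extra,
                                       xatp->idx++, _gfn(xatp->gpfn++));
        ++extra.ppage;
        ++done;
    }

    this_cpu(iommu_dont_flush_iotlb) = 0;
    ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
                            IOMMU_FLUSHF_modified);  /* original GFNs */

    /* Only after that flush is it safe to drop the references, and hence
     * for the pages to possibly be freed and re-used. */
    for ( i = 0; i < done; ++i )
        put_page(pages[i]);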

[-- Attachment #5: xsa346/xsa346-4.10-1.patch --]
[-- Type: application/octet-stream, Size: 2046 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page

Deferring flushes to a single, wide range one - as is done when
handling XENMAPSPACE_gmfn_range - is okay only as long as
pages don't get freed ahead of the eventual flush. While the only
function setting the flag (xenmem_add_to_physmap()) suggests by its name
that it's only mapping new entries, in reality the way
xenmem_add_to_physmap_one() works means an unmap would happen not only
for the page being moved (but not freed) but, if the destination GFN is
populated, also for the page being displaced from that GFN. Collapsing
the two flushes for this GFN into just one (and even more so deferring
it to a batched invocation) is not correct.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -284,7 +284,10 @@ int guest_remove_page(struct domain *d,
     p2m_type_t p2mt;
 #endif
     mfn_t mfn;
+#ifdef CONFIG_HAS_PASSTHROUGH
+    bool *dont_flush_p, dont_flush;
     int rc;
+#endif
 
 #ifdef CONFIG_X86
     mfn = get_gfn_query(d, gmfn, &p2mt);
@@ -359,8 +362,22 @@ int guest_remove_page(struct domain *d,
         return -ENXIO;
     }
 
+#ifdef CONFIG_HAS_PASSTHROUGH
+    /*
+     * Since we're likely to free the page below, we need to suspend
+     * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
+     */
+    dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
+    dont_flush = *dont_flush_p;
+    *dont_flush_p = false;
+#endif
+
     rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
 
+#ifdef CONFIG_HAS_PASSTHROUGH
+    *dont_flush_p = dont_flush;
+#endif
+
     /*
      * With the lack of an IOMMU on some platforms, domains with DMA-capable
      * device must retrieve the same pfn when the hypercall populate_physmap

[-- Attachment #6: xsa346/xsa346-4.10-2.patch --]
[-- Type: application/octet-stream, Size: 6611 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: hold page ref until after deferred TLB flush

When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
flush for the "from" GFN range requires that the page remains allocated
to the guest until the TLB flush has actually occurred. Otherwise a
parallel hypercall to remove the page would only flush the TLB for the
GFN it has been moved to, but not the one it was mapped at originally.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1225,7 +1225,7 @@ void share_xen_page_with_privileged_gues
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gfn)
 {
@@ -1297,10 +1297,6 @@ int xenmem_add_to_physmap_one(
         break;
     }
     case XENMAPSPACE_dev_mmio:
-        /* extra should be 0. Reserved for future use. */
-        if ( extra.res0 )
-            return -EOPNOTSUPP;
-
         rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
         return rc;
 
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4592,7 +4592,7 @@ static int handle_iomem_range(unsigned l
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gpfn)
 {
@@ -4682,9 +4682,20 @@ int xenmem_add_to_physmap_one(
         rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
 
  put_both:
-    /* In the XENMAPSPACE_gmfn, we took a ref of the gfn at the top */
+    /*
+     * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
+     * We also may need to transfer ownership of the page reference to our
+     * caller.
+     */
     if ( space == XENMAPSPACE_gmfn || space == XENMAPSPACE_gmfn_range )
+    {
         put_gfn(d, gfn);
+        if ( !rc && extra.ppage )
+        {
+            *extra.ppage = page;
+            page = NULL;
+        }
+    }
 
     if ( page )
         put_page(page);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -768,11 +768,10 @@ static int xenmem_add_to_physmap(struct
 {
     unsigned int done = 0;
     long rc = 0;
-    union xen_add_to_physmap_batch_extra extra;
+    union add_to_physmap_extra extra = {};
+    struct page_info *pages[16];
 
-    if ( xatp->space != XENMAPSPACE_gmfn_foreign )
-        extra.res0 = 0;
-    else
+    if ( xatp->space == XENMAPSPACE_gmfn_foreign )
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
@@ -788,7 +787,10 @@ static int xenmem_add_to_physmap(struct
 
 #ifdef CONFIG_HAS_PASSTHROUGH
     if ( need_iommu(d) )
+    {
         this_cpu(iommu_dont_flush_iotlb) = 1;
+        extra.ppage = &pages[0];
+    }
 #endif
 
     while ( xatp->size > done )
@@ -801,8 +803,12 @@ static int xenmem_add_to_physmap(struct
         xatp->idx++;
         xatp->gpfn++;
 
+        if ( extra.ppage )
+            ++extra.ppage;
+
         /* Check for continuation if it's not the last iteration. */
-        if ( xatp->size > ++done && hypercall_preempt_check() )
+        if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
+             (xatp->size > done && hypercall_preempt_check()) )
         {
             rc = start + done;
             break;
@@ -813,6 +819,7 @@ static int xenmem_add_to_physmap(struct
     if ( need_iommu(d) )
     {
         int ret;
+        unsigned int i;
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
@@ -820,6 +827,15 @@ static int xenmem_add_to_physmap(struct
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
+        /*
+         * Now that the IOMMU TLB flush was done for the original GFN, drop
+         * the page references. The 2nd flush below is fine to make later, as
+         * whoever removes the page again from its new GFN will have to do
+         * another flush anyway.
+         */
+        for ( i = 0; i < done; ++i )
+            put_page(pages[i]);
+
         ret = iommu_iotlb_flush(d, xatp->gpfn - done, done);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
@@ -835,6 +851,7 @@ static int xenmem_add_to_physmap_batch(s
 {
     unsigned int done = 0;
     int rc;
+    union add_to_physmap_extra extra = {};
 
     if ( xatpb->size < start )
         return -EILSEQ;
@@ -849,6 +866,19 @@ static int xenmem_add_to_physmap_batch(s
          !guest_handle_okay(xatpb->errs, xatpb->size) )
         return -EFAULT;
 
+    switch ( xatpb->space )
+    {
+    case XENMAPSPACE_dev_mmio:
+        /* res0 is reserved for future use. */
+        if ( xatpb->u.res0 )
+            return -EOPNOTSUPP;
+        break;
+
+    case XENMAPSPACE_gmfn_foreign:
+        extra.foreign_domid = xatpb->u.foreign_domid;
+        break;
+    }
+
     while ( xatpb->size > done )
     {
         xen_ulong_t idx;
@@ -866,8 +896,7 @@ static int xenmem_add_to_physmap_batch(s
             goto out;
         }
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
-                                       xatpb->u,
+        rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
                                        idx, _gfn(gpfn));
 
         if ( unlikely(__copy_to_guest_offset(xatpb->errs, 0, &rc, 1)) )
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -583,8 +583,22 @@ void scrub_one_page(struct page_info *);
                       &(d)->xenpage_list : &(d)->page_list)
 #endif
 
+union add_to_physmap_extra {
+    /*
+     * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
+     * to be kept until after the flush, so the page can't get removed from
+     * the domain (and re-used for another purpose) beforehand. By passing
+     * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
+     * to have ownership of such a reference transferred in the success case.
+     */
+    struct page_info **ppage;
+
+    /* XENMAPSPACE_gmfn_foreign */
+    domid_t foreign_domid;
+};
+
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
-                              union xen_add_to_physmap_batch_extra extra,
+                              union add_to_physmap_extra extra,
                               unsigned long idx, gfn_t gfn);
 
 /* Return 0 on success, or negative on error. */

[-- Attachment #7: xsa346/xsa346-4.11-1.patch --]
[-- Type: application/octet-stream, Size: 2046 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page

Deferring flushes to a single, wide range one - as is done when
handling XENMAPSPACE_gmfn_range - is okay only as long as
pages don't get freed ahead of the eventual flush. While the only
function setting the flag (xenmem_add_to_physmap()) suggests by its name
that it's only mapping new entries, in reality the way
xenmem_add_to_physmap_one() works means an unmap would happen not only
for the page being moved (but not freed) but, if the destination GFN is
populated, also for the page being displaced from that GFN. Collapsing
the two flushes for this GFN into just one (and even more so deferring
it to a batched invocation) is not correct.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -298,7 +298,10 @@ int guest_remove_page(struct domain *d,
     p2m_type_t p2mt;
 #endif
     mfn_t mfn;
+#ifdef CONFIG_HAS_PASSTHROUGH
+    bool *dont_flush_p, dont_flush;
     int rc;
+#endif
 
 #ifdef CONFIG_X86
     mfn = get_gfn_query(d, gmfn, &p2mt);
@@ -376,8 +379,22 @@ int guest_remove_page(struct domain *d,
         return -ENXIO;
     }
 
+#ifdef CONFIG_HAS_PASSTHROUGH
+    /*
+     * Since we're likely to free the page below, we need to suspend
+     * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
+     */
+    dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
+    dont_flush = *dont_flush_p;
+    *dont_flush_p = false;
+#endif
+
     rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
 
+#ifdef CONFIG_HAS_PASSTHROUGH
+    *dont_flush_p = dont_flush;
+#endif
+
     /*
      * With the lack of an IOMMU on some platforms, domains with DMA-capable
      * device must retrieve the same pfn when the hypercall populate_physmap

[-- Attachment #8: xsa346/xsa346-4.11-2.patch --]
[-- Type: application/octet-stream, Size: 6798 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: hold page ref until after deferred TLB flush

When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
flush for the "from" GFN range requires that the page remains allocated
to the guest until the TLB flush has actually occurred. Otherwise a
parallel hypercall to remove the page would only flush the TLB for the
GFN it has been moved to, but not the one it was mapped at originally.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1222,7 +1222,7 @@ void share_xen_page_with_guest(struct pa
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gfn)
 {
@@ -1294,10 +1294,6 @@ int xenmem_add_to_physmap_one(
         break;
     }
     case XENMAPSPACE_dev_mmio:
-        /* extra should be 0. Reserved for future use. */
-        if ( extra.res0 )
-            return -EOPNOTSUPP;
-
         rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
         return rc;
 
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4634,7 +4634,7 @@ static int handle_iomem_range(unsigned l
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gpfn)
 {
@@ -4721,9 +4721,20 @@ int xenmem_add_to_physmap_one(
         rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
 
  put_both:
-    /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */
+    /*
+     * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
+     * We also may need to transfer ownership of the page reference to our
+     * caller.
+     */
     if ( space == XENMAPSPACE_gmfn )
+    {
         put_gfn(d, gfn);
+        if ( !rc && extra.ppage )
+        {
+            *extra.ppage = page;
+            page = NULL;
+        }
+    }
 
     if ( page )
         put_page(page);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -811,11 +811,10 @@ int xenmem_add_to_physmap(struct domain
 {
     unsigned int done = 0;
     long rc = 0;
-    union xen_add_to_physmap_batch_extra extra;
+    union add_to_physmap_extra extra = {};
+    struct page_info *pages[16];
 
-    if ( xatp->space != XENMAPSPACE_gmfn_foreign )
-        extra.res0 = 0;
-    else
+    if ( xatp->space == XENMAPSPACE_gmfn_foreign )
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
@@ -831,7 +830,10 @@ int xenmem_add_to_physmap(struct domain
 
 #ifdef CONFIG_HAS_PASSTHROUGH
     if ( need_iommu(d) )
+    {
         this_cpu(iommu_dont_flush_iotlb) = 1;
+        extra.ppage = &pages[0];
+    }
 #endif
 
     while ( xatp->size > done )
@@ -844,8 +846,12 @@ int xenmem_add_to_physmap(struct domain
         xatp->idx++;
         xatp->gpfn++;
 
+        if ( extra.ppage )
+            ++extra.ppage;
+
         /* Check for continuation if it's not the last iteration. */
-        if ( xatp->size > ++done && hypercall_preempt_check() )
+        if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
+             (xatp->size > done && hypercall_preempt_check()) )
         {
             rc = start + done;
             break;
@@ -856,6 +862,7 @@ int xenmem_add_to_physmap(struct domain
     if ( need_iommu(d) )
     {
         int ret;
+        unsigned int i;
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
@@ -863,6 +870,15 @@ int xenmem_add_to_physmap(struct domain
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
+        /*
+         * Now that the IOMMU TLB flush was done for the original GFN, drop
+         * the page references. The 2nd flush below is fine to make later, as
+         * whoever removes the page again from its new GFN will have to do
+         * another flush anyway.
+         */
+        for ( i = 0; i < done; ++i )
+            put_page(pages[i]);
+
         ret = iommu_iotlb_flush(d, xatp->gpfn - done, done);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
@@ -876,6 +892,8 @@ static int xenmem_add_to_physmap_batch(s
                                        struct xen_add_to_physmap_batch *xatpb,
                                        unsigned int extent)
 {
+    union add_to_physmap_extra extra = {};
+
     if ( xatpb->size < extent )
         return -EILSEQ;
 
@@ -884,6 +902,19 @@ static int xenmem_add_to_physmap_batch(s
          !guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) )
         return -EFAULT;
 
+    switch ( xatpb->space )
+    {
+    case XENMAPSPACE_dev_mmio:
+        /* res0 is reserved for future use. */
+        if ( xatpb->u.res0 )
+            return -EOPNOTSUPP;
+        break;
+
+    case XENMAPSPACE_gmfn_foreign:
+        extra.foreign_domid = xatpb->u.foreign_domid;
+        break;
+    }
+
     while ( xatpb->size > extent )
     {
         xen_ulong_t idx;
@@ -896,8 +927,7 @@ static int xenmem_add_to_physmap_batch(s
                                                extent, 1)) )
             return -EFAULT;
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
-                                       xatpb->u,
+        rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
                                        idx, _gfn(gpfn));
 
         if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) )
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -577,8 +577,22 @@ void scrub_one_page(struct page_info *);
                       &(d)->xenpage_list : &(d)->page_list)
 #endif
 
+union add_to_physmap_extra {
+    /*
+     * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
+     * to be kept until after the flush, so the page can't get removed from
+     * the domain (and re-used for another purpose) beforehand. By passing
+     * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
+     * to have ownership of such a reference transferred in the success case.
+     */
+    struct page_info **ppage;
+
+    /* XENMAPSPACE_gmfn_foreign */
+    domid_t foreign_domid;
+};
+
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
-                              union xen_add_to_physmap_batch_extra extra,
+                              union add_to_physmap_extra extra,
                               unsigned long idx, gfn_t gfn);
 
 int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,

[-- Attachment #9: xsa346/xsa346-4.12-1.patch --]
[-- Type: application/octet-stream, Size: 1886 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page

Deferring flushes to a single, wide range one - as is done when
handling XENMAPSPACE_gmfn_range - is okay only as long as
pages don't get freed ahead of the eventual flush. While the only
function setting the flag (xenmem_add_to_physmap()) suggests by its name
that it's only mapping new entries, in reality the way
xenmem_add_to_physmap_one() works means an unmap would happen not only
for the page being moved (but not freed) but, if the destination GFN is
populated, also for the page being displaced from that GFN. Collapsing
the two flushes for this GFN into just one (and even more so deferring
it to a batched invocation) is not correct.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -300,6 +300,7 @@ int guest_remove_page(struct domain *d,
     p2m_type_t p2mt;
 #endif
     mfn_t mfn;
+    bool *dont_flush_p, dont_flush;
     int rc;
 
 #ifdef CONFIG_X86
@@ -386,8 +387,18 @@ int guest_remove_page(struct domain *d,
         return -ENXIO;
     }
 
+    /*
+     * Since we're likely to free the page below, we need to suspend
+     * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
+     */
+    dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
+    dont_flush = *dont_flush_p;
+    *dont_flush_p = false;
+
     rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
 
+    *dont_flush_p = dont_flush;
+
     /*
      * With the lack of an IOMMU on some platforms, domains with DMA-capable
      * device must retrieve the same pfn when the hypercall populate_physmap

[-- Attachment #10: xsa346/xsa346-4.12-2.patch --]
[-- Type: application/octet-stream, Size: 6854 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: hold page ref until after deferred TLB flush

When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
flush for the "from" GFN range requires that the page remains allocated
to the guest until the TLB flush has actually occurred. Otherwise a
parallel hypercall to remove the page would only flush the TLB for the
GFN it has been moved to, but not the one it was mapped at originally.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1211,7 +1211,7 @@ void share_xen_page_with_guest(struct pa
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gfn)
 {
@@ -1284,10 +1284,6 @@ int xenmem_add_to_physmap_one(
         break;
     }
     case XENMAPSPACE_dev_mmio:
-        /* extra should be 0. Reserved for future use. */
-        if ( extra.res0 )
-            return -EOPNOTSUPP;
-
         rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
         return rc;
 
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4653,7 +4653,7 @@ static int handle_iomem_range(unsigned l
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gpfn)
 {
@@ -4740,9 +4740,20 @@ int xenmem_add_to_physmap_one(
         rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
 
  put_both:
-    /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */
+    /*
+     * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
+     * We also may need to transfer ownership of the page reference to our
+     * caller.
+     */
     if ( space == XENMAPSPACE_gmfn )
+    {
         put_gfn(d, gfn);
+        if ( !rc && extra.ppage )
+        {
+            *extra.ppage = page;
+            page = NULL;
+        }
+    }
 
     if ( page )
         put_page(page);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -824,11 +824,10 @@ int xenmem_add_to_physmap(struct domain
 {
     unsigned int done = 0;
     long rc = 0;
-    union xen_add_to_physmap_batch_extra extra;
+    union add_to_physmap_extra extra = {};
+    struct page_info *pages[16];
 
-    if ( xatp->space != XENMAPSPACE_gmfn_foreign )
-        extra.res0 = 0;
-    else
+    if ( xatp->space == XENMAPSPACE_gmfn_foreign )
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
@@ -843,7 +842,10 @@ int xenmem_add_to_physmap(struct domain
     xatp->size -= start;
 
     if ( has_iommu_pt(d) )
+    {
        this_cpu(iommu_dont_flush_iotlb) = 1;
+       extra.ppage = &pages[0];
+    }
 
     while ( xatp->size > done )
     {
@@ -855,8 +857,12 @@ int xenmem_add_to_physmap(struct domain
         xatp->idx++;
         xatp->gpfn++;
 
+        if ( extra.ppage )
+            ++extra.ppage;
+
         /* Check for continuation if it's not the last iteration. */
-        if ( xatp->size > ++done && hypercall_preempt_check() )
+        if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
+             (xatp->size > done && hypercall_preempt_check()) )
         {
             rc = start + done;
             break;
@@ -866,6 +872,7 @@ int xenmem_add_to_physmap(struct domain
     if ( has_iommu_pt(d) )
     {
         int ret;
+        unsigned int i;
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
@@ -874,6 +881,15 @@ int xenmem_add_to_physmap(struct domain
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
+        /*
+         * Now that the IOMMU TLB flush was done for the original GFN, drop
+         * the page references. The 2nd flush below is fine to make later, as
+         * whoever removes the page again from its new GFN will have to do
+         * another flush anyway.
+         */
+        for ( i = 0; i < done; ++i )
+            put_page(pages[i]);
+
         ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
@@ -887,6 +903,8 @@ static int xenmem_add_to_physmap_batch(s
                                        struct xen_add_to_physmap_batch *xatpb,
                                        unsigned int extent)
 {
+    union add_to_physmap_extra extra = {};
+
     if ( xatpb->size < extent )
         return -EILSEQ;
 
@@ -895,6 +913,19 @@ static int xenmem_add_to_physmap_batch(s
          !guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) )
         return -EFAULT;
 
+    switch ( xatpb->space )
+    {
+    case XENMAPSPACE_dev_mmio:
+        /* res0 is reserved for future use. */
+        if ( xatpb->u.res0 )
+            return -EOPNOTSUPP;
+        break;
+
+    case XENMAPSPACE_gmfn_foreign:
+        extra.foreign_domid = xatpb->u.foreign_domid;
+        break;
+    }
+
     while ( xatpb->size > extent )
     {
         xen_ulong_t idx;
@@ -907,8 +938,7 @@ static int xenmem_add_to_physmap_batch(s
                                                extent, 1)) )
             return -EFAULT;
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
-                                       xatpb->u,
+        rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
                                        idx, _gfn(gpfn));
 
         if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) )
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -583,8 +583,22 @@ void scrub_one_page(struct page_info *);
                       &(d)->xenpage_list : &(d)->page_list)
 #endif
 
+union add_to_physmap_extra {
+    /*
+     * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
+     * to be kept until after the flush, so the page can't get removed from
+     * the domain (and re-used for another purpose) beforehand. By passing
+     * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
+     * to have ownership of such a reference transferred in the success case.
+     */
+    struct page_info **ppage;
+
+    /* XENMAPSPACE_gmfn_foreign */
+    domid_t foreign_domid;
+};
+
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
-                              union xen_add_to_physmap_batch_extra extra,
+                              union add_to_physmap_extra extra,
                               unsigned long idx, gfn_t gfn);
 
 int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,

[-- Attachment #11: xsa346/xsa346-4.13-1.patch --]
[-- Type: application/octet-stream, Size: 1886 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: suppress "iommu_dont_flush_iotlb" when about to free a page

Deferring flushes to a single, wide range one - as is done when
handling XENMAPSPACE_gmfn_range - is okay only as long as
pages don't get freed ahead of the eventual flush. While the only
function setting the flag (xenmem_add_to_physmap()) suggests by its name
that it's only mapping new entries, in reality the way
xenmem_add_to_physmap_one() works means an unmap would happen not only
for the page being moved (but not freed) but, if the destination GFN is
populated, also for the page being displaced from that GFN. Collapsing
the two flushes for this GFN into just one (and even more so deferring
it to a batched invocation) is not correct.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -292,6 +292,7 @@ int guest_remove_page(struct domain *d,
     p2m_type_t p2mt;
 #endif
     mfn_t mfn;
+    bool *dont_flush_p, dont_flush;
     int rc;
 
 #ifdef CONFIG_X86
@@ -378,8 +379,18 @@ int guest_remove_page(struct domain *d,
         return -ENXIO;
     }
 
+    /*
+     * Since we're likely to free the page below, we need to suspend
+     * xenmem_add_to_physmap()'s suppressing of IOMMU TLB flushes.
+     */
+    dont_flush_p = &this_cpu(iommu_dont_flush_iotlb);
+    dont_flush = *dont_flush_p;
+    *dont_flush_p = false;
+
     rc = guest_physmap_remove_page(d, _gfn(gmfn), mfn, 0);
 
+    *dont_flush_p = dont_flush;
+
     /*
      * With the lack of an IOMMU on some platforms, domains with DMA-capable
      * device must retrieve the same pfn when the hypercall populate_physmap

[-- Attachment #12: xsa346/xsa346-4.13-2.patch --]
[-- Type: application/octet-stream, Size: 6913 bytes --]

From: Jan Beulich <jbeulich@suse.com>
Subject: IOMMU: hold page ref until after deferred TLB flush

When moving around a page via XENMAPSPACE_gmfn_range, deferring the TLB
flush for the "from" GFN range requires that the page remains allocated
to the guest until the TLB flush has actually occurred. Otherwise a
parallel hypercall to remove the page would only flush the TLB for the
GFN it has been moved to, but not the one it was mapped at originally.

This is part of XSA-346.

Fixes: cf95b2a9fd5a ("iommu: Introduce per cpu flag (iommu_dont_flush_iotlb) to avoid unnecessary iotlb... ")
Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1407,7 +1407,7 @@ void share_xen_page_with_guest(struct pa
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gfn)
 {
@@ -1480,10 +1480,6 @@ int xenmem_add_to_physmap_one(
         break;
     }
     case XENMAPSPACE_dev_mmio:
-        /* extra should be 0. Reserved for future use. */
-        if ( extra.res0 )
-            return -EOPNOTSUPP;
-
         rc = map_dev_mmio_region(d, gfn, 1, _mfn(idx));
         return rc;
 
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4617,7 +4617,7 @@ static int handle_iomem_range(unsigned l
 int xenmem_add_to_physmap_one(
     struct domain *d,
     unsigned int space,
-    union xen_add_to_physmap_batch_extra extra,
+    union add_to_physmap_extra extra,
     unsigned long idx,
     gfn_t gpfn)
 {
@@ -4701,9 +4701,20 @@ int xenmem_add_to_physmap_one(
         rc = guest_physmap_add_page(d, gpfn, mfn, PAGE_ORDER_4K);
 
  put_both:
-    /* In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top. */
+    /*
+     * In the XENMAPSPACE_gmfn case, we took a ref of the gfn at the top.
+     * We also may need to transfer ownership of the page reference to our
+     * caller.
+     */
     if ( space == XENMAPSPACE_gmfn )
+    {
         put_gfn(d, gfn);
+        if ( !rc && extra.ppage )
+        {
+            *extra.ppage = page;
+            page = NULL;
+        }
+    }
 
     if ( page )
         put_page(page);
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -814,13 +814,12 @@ int xenmem_add_to_physmap(struct domain
 {
     unsigned int done = 0;
     long rc = 0;
-    union xen_add_to_physmap_batch_extra extra;
+    union add_to_physmap_extra extra = {};
+    struct page_info *pages[16];
 
     ASSERT(paging_mode_translate(d));
 
-    if ( xatp->space != XENMAPSPACE_gmfn_foreign )
-        extra.res0 = 0;
-    else
+    if ( xatp->space == XENMAPSPACE_gmfn_foreign )
         extra.foreign_domid = DOMID_INVALID;
 
     if ( xatp->space != XENMAPSPACE_gmfn_range )
@@ -835,7 +834,10 @@ int xenmem_add_to_physmap(struct domain
     xatp->size -= start;
 
     if ( is_iommu_enabled(d) )
+    {
        this_cpu(iommu_dont_flush_iotlb) = 1;
+       extra.ppage = &pages[0];
+    }
 
     while ( xatp->size > done )
     {
@@ -847,8 +849,12 @@ int xenmem_add_to_physmap(struct domain
         xatp->idx++;
         xatp->gpfn++;
 
+        if ( extra.ppage )
+            ++extra.ppage;
+
         /* Check for continuation if it's not the last iteration. */
-        if ( xatp->size > ++done && hypercall_preempt_check() )
+        if ( (++done > ARRAY_SIZE(pages) && extra.ppage) ||
+             (xatp->size > done && hypercall_preempt_check()) )
         {
             rc = start + done;
             break;
@@ -858,6 +864,7 @@ int xenmem_add_to_physmap(struct domain
     if ( is_iommu_enabled(d) )
     {
         int ret;
+        unsigned int i;
 
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
@@ -866,6 +873,15 @@ int xenmem_add_to_physmap(struct domain
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
+        /*
+         * Now that the IOMMU TLB flush was done for the original GFN, drop
+         * the page references. The 2nd flush below is fine to make later, as
+         * whoever removes the page again from its new GFN will have to do
+         * another flush anyway.
+         */
+        for ( i = 0; i < done; ++i )
+            put_page(pages[i]);
+
         ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
                                 IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
         if ( unlikely(ret) && rc >= 0 )
@@ -879,6 +895,8 @@ static int xenmem_add_to_physmap_batch(s
                                        struct xen_add_to_physmap_batch *xatpb,
                                        unsigned int extent)
 {
+    union add_to_physmap_extra extra = {};
+
     if ( unlikely(xatpb->size < extent) )
         return -EILSEQ;
 
@@ -890,6 +908,19 @@ static int xenmem_add_to_physmap_batch(s
          !guest_handle_subrange_okay(xatpb->errs, extent, xatpb->size - 1) )
         return -EFAULT;
 
+    switch ( xatpb->space )
+    {
+    case XENMAPSPACE_dev_mmio:
+        /* res0 is reserved for future use. */
+        if ( xatpb->u.res0 )
+            return -EOPNOTSUPP;
+        break;
+
+    case XENMAPSPACE_gmfn_foreign:
+        extra.foreign_domid = xatpb->u.foreign_domid;
+        break;
+    }
+
     while ( xatpb->size > extent )
     {
         xen_ulong_t idx;
@@ -902,8 +933,7 @@ static int xenmem_add_to_physmap_batch(s
                                                extent, 1)) )
             return -EFAULT;
 
-        rc = xenmem_add_to_physmap_one(d, xatpb->space,
-                                       xatpb->u,
+        rc = xenmem_add_to_physmap_one(d, xatpb->space, extra,
                                        idx, _gfn(gpfn));
 
         if ( unlikely(__copy_to_guest_offset(xatpb->errs, extent, &rc, 1)) )
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -588,8 +588,22 @@ void scrub_one_page(struct page_info *);
                       &(d)->xenpage_list : &(d)->page_list)
 #endif
 
+union add_to_physmap_extra {
+    /*
+     * XENMAPSPACE_gmfn: When deferring TLB flushes, a page reference needs
+     * to be kept until after the flush, so the page can't get removed from
+     * the domain (and re-used for another purpose) beforehand. By passing
+     * non-NULL, the caller of xenmem_add_to_physmap_one() indicates it wants
+     * to have ownership of such a reference transferred in the success case.
+     */
+    struct page_info **ppage;
+
+    /* XENMAPSPACE_gmfn_foreign */
+    domid_t foreign_domid;
+};
+
 int xenmem_add_to_physmap_one(struct domain *d, unsigned int space,
-                              union xen_add_to_physmap_batch_extra extra,
+                              union add_to_physmap_extra extra,
                               unsigned long idx, gfn_t gfn);
 
 int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
