* [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown
@ 2021-02-09 15:28 Julien Grall
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
                   ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Paul Durrant, Kevin Tian

From: Julien Grall <jgrall@amazon.com>

Hi all,

This series is a collection of bug fixes for the IOMMU teardown code.
All of them are candidates for 4.15 as they can either leak memory or
lead to a host crash or host corruption.

This is sent directly to xen-devel because all the issues were either
introduced in 4.15 or happen in the domain creation code.

Cheers,

Julien Grall (5):
  xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  xen/iommu: Check if the IOMMU was initialized before tearing down
  xen/iommu: iommu_map: Don't crash the domain if it is dying
  xen/iommu: x86: Don't leak the IOMMU page-tables
  xen/iommu: x86: Clear the root page-table before freeing the
    page-tables

 xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++-
 xen/drivers/passthrough/iommu.c             |  9 ++++-
 xen/drivers/passthrough/vtd/iommu.c         | 12 +++++-
 xen/drivers/passthrough/x86/iommu.c         | 42 ++++++++++++++++++++-
 xen/include/asm-x86/p2m.h                   |  4 ++
 xen/include/xen/iommu.h                     |  1 +
 6 files changed, 76 insertions(+), 4 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
@ 2021-02-09 15:28 ` Julien Grall
  2021-02-09 20:19   ` Paul Durrant
                     ` (2 more replies)
  2021-02-09 15:28 ` [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

From: Julien Grall <jgrall@amazon.com>

Currently, the IOMMU page-tables will be populated early in the domain
creation if the hardware is able to virtualize the local APIC. However,
the IOMMU page-tables will not be freed on early failure and will
result in a leak.

An assigned device should not need to DMA into the vLAPIC page, so we
can avoid mapping the page in the IOMMU page-tables.

This statement is also true for any special page (the vLAPIC page is
one of them). So take the opportunity to prevent the mapping for all
of them.

Note that:
    - This matches the existing behavior with PV guests
    - This doesn't change the behavior when the P2M is shared with the
    IOMMU. IOW, the special pages will still be accessible by the
    device.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v2:
        - New patch
---
 xen/include/asm-x86/p2m.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 7d63f5787e62..1802545969b3 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
 {
     unsigned int flags;
 
+    /* Don't map special pages in the IOMMU page-tables. */
+    if ( mfn_valid(mfn) && is_special_page(mfn_to_page(mfn)) )
+        return 0;
+
     switch( p2mt )
     {
     case p2m_ram_rw:
-- 
2.17.1




* [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
@ 2021-02-09 15:28 ` Julien Grall
  2021-02-09 20:22   ` Paul Durrant
  2021-02-09 15:28 ` [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying Julien Grall
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel; +Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

is_iommu_enabled() will return true even if the IOMMU has not been
initialized (e.g. the ops are not set).

In the case of an early failure in arch_domain_init(), the function
iommu_destroy_domain() will be called even if the IOMMU is not
initialized.

This will result in dereferencing the ops pointer, which will be NULL,
and a host crash.

Fix the issue by checking that the ops have been set before accessing them.

Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v2:
        - Move the check into iommu_teardown() so we don't rely on
        arch_iommu_domain_init() to clean up its allocation on failure.
        - Fix typo in the commit message
---
 xen/drivers/passthrough/iommu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 2358b6eb09f4..879d238bcd31 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -221,6 +221,13 @@ static void iommu_teardown(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
 
+    /*
+     * During early domain creation failure, we may reach here with the
+     * ops not yet initialized.
+     */
+    if ( !hd->platform_ops )
+        return;
+
     iommu_vcall(hd->platform_ops, teardown, d);
 }
 
-- 
2.17.1




* [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
  2021-02-09 15:28 ` [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall
@ 2021-02-09 15:28 ` Julien Grall
  2021-02-09 20:28   ` Paul Durrant
  2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel; +Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

It is a bit pointless to crash a domain that is already dying. This will
become more of an annoyance with a follow-up change, where page-table
allocation will be forbidden when the domain is dying.

Security-wise, there is no change, as the devices would still have access
to the IOMMU page-tables even if the domain has crashed, until Xen
starts to relinquish the resources.

For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
d->is_dying is correctly observed (a follow-up patch will hold it in the
relinquish path).

For Arm, a small race is still possible. But so far there is no
failure specific to a dying domain.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

This was spotted when trying to destroy IOREQ servers while the domain
is dying. The code will try to add the entry back in the P2M and
therefore update the P2M (see arch_ioreq_server_disable() ->
hvm_add_ioreq_gfn()).

It should be possible to skip the mapping in hvm_add_ioreq_gfn(); however,
I haven't tried a patch yet because checking d->is_dying can be racy (I
can't find a proper lock).

Changes in v2:
    - Patch added
---
 xen/drivers/passthrough/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 879d238bcd31..75419f20f76d 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                             flush_flags) )
                 continue;
 
-        if ( !is_hardware_domain(d) )
+        if ( !is_hardware_domain(d) && !d->is_dying )
             domain_crash(d);
 
         break;
-- 
2.17.1




* [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
                   ` (2 preceding siblings ...)
  2021-02-09 15:28 ` [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying Julien Grall
@ 2021-02-09 15:28 ` Julien Grall
  2021-02-09 20:33   ` Paul Durrant
                     ` (2 more replies)
  2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
  2021-02-09 16:47 ` [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Ian Jackson
  5 siblings, 3 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel; +Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new IOMMU page-table allocator will release the pages when the
domain resources are relinquished. However, this is not sufficient when
the domain is dying, because nothing prevents page-tables from being
allocated afterwards.

iommu_alloc_pgtable() now checks whether the domain is dying before
adding the page to the list. We rely on &hd->arch.pgtables.lock
to synchronize d->is_dying.

Take the opportunity to check in arch_iommu_domain_destroy() that all
the page tables have been freed.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

There is one more bug that will be solved in the next patch, as I felt
each needed its own long explanation.

Changes in v2:
    - Rework the approach
    - Move the patch earlier in the series
---
 xen/drivers/passthrough/x86/iommu.c | 36 ++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..82d770107a47 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
+    /*
+     * There should be no page-tables left allocated by the time the
+     * domain is destroyed. Note that arch_iommu_domain_destroy() is
+     * called unconditionally, so pgtables may be uninitialized.
+     */
+    ASSERT(dom_iommu(d)->platform_ops == NULL ||
+           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
 }
 
 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -267,6 +274,12 @@ int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    /* After this barrier no new page allocations can occur. */
+    spin_barrier(&hd->arch.pgtables.lock);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
@@ -284,6 +297,7 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unsigned int memflags = 0;
     struct page_info *pg;
     void *p;
+    bool alive = false;
 
 #ifdef CONFIG_NUMA
     if ( hd->node != NUMA_NO_NODE )
@@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unmap_domain_page(p);
 
     spin_lock(&hd->arch.pgtables.lock);
-    page_list_add(pg, &hd->arch.pgtables.list);
+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevents allocations from happening afterwards. There is
+     * no valid reason to continue updating the IOMMU page-tables while
+     * the domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying.
+     *
+     * We rely on &hd->arch.pgtables.lock to synchronize d->is_dying.
+     */
+    if ( likely(!d->is_dying) )
+    {
+        alive = true;
+        page_list_add(pg, &hd->arch.pgtables.list);
+    }
     spin_unlock(&hd->arch.pgtables.lock);
 
+    if ( unlikely(!alive) )
+    {
+        free_domheap_page(pg);
+        pg = NULL;
+    }
+
     return pg;
 }
 
-- 
2.17.1




* [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
                   ` (3 preceding siblings ...)
  2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
@ 2021-02-09 15:28 ` Julien Grall
  2021-02-09 20:36   ` Paul Durrant
                     ` (2 more replies)
  2021-02-09 16:47 ` [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Ian Jackson
  5 siblings, 3 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-09 15:28 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Kevin Tian, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new per-domain IOMMU page-table allocator will now free the
page-tables when the domain's resources are relinquished. However, the
root page-table (i.e. hd->arch.pg_maddr) will not be cleared.

Xen may access the IOMMU page-tables afterwards, at least in the case
of PV domains:

(XEN) Xen call trace:
(XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
(XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
(XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
(XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
(XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
(XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
(XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
(XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
(XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
(XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
(XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
(XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
(XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
(XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
(XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
(XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
(XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120

This will result in a use-after-free and possibly a host crash or
memory corruption.

Freeing the page-tables further down in domain_relinquish_resources()
would not work, because pages may not be released until later if another
domain holds a reference on them.

Once all the PCI devices have been de-assigned, it is actually pointless
to modify the IOMMU page-tables. So we can simply clear the root
page-table address.

Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v2:
        - Introduce clear_root_pgtable()
        - Move the patch later in the series
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
 xen/drivers/passthrough/vtd/iommu.c         | 12 +++++++++++-
 xen/drivers/passthrough/x86/iommu.c         |  6 ++++++
 xen/include/xen/iommu.h                     |  1 +
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 42b5a5a9bec4..81add0ba26b4 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
     return reassign_device(pdev->domain, d, devfn, pdev);
 }
 
+static void iommu_clear_root_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->arch.mapping_lock);
+    hd->arch.amd.root_table = NULL;
+    spin_unlock(&hd->arch.mapping_lock);
+}
+
 static void amd_iommu_domain_destroy(struct domain *d)
 {
-    dom_iommu(d)->arch.amd.root_table = NULL;
+    ASSERT(!dom_iommu(d)->arch.amd.root_table);
 }
 
 static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
@@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .remove_device = amd_iommu_remove_device,
     .assign_device  = amd_iommu_assign_device,
     .teardown = amd_iommu_domain_destroy,
+    .clear_root_pgtable = iommu_clear_root_pgtable,
     .map_page = amd_iommu_map_page,
     .unmap_page = amd_iommu_unmap_page,
     .iotlb_flush = amd_iommu_flush_iotlb_pages,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d136fe36883b..e1871f6c2bc1 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1726,6 +1726,15 @@ out:
     return ret;
 }
 
+static void iommu_clear_root_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->arch.mapping_lock);
+    hd->arch.vtd.pgd_maddr = 0;
+    spin_unlock(&hd->arch.mapping_lock);
+}
+
 static void iommu_domain_teardown(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
@@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct domain *d)
         xfree(mrmrr);
     }
 
-    hd->arch.vtd.pgd_maddr = 0;
+    ASSERT(!hd->arch.vtd.pgd_maddr);
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -2719,6 +2728,7 @@ static struct iommu_ops __initdata vtd_ops = {
     .remove_device = intel_iommu_remove_device,
     .assign_device  = intel_iommu_assign_device,
     .teardown = iommu_domain_teardown,
+    .clear_root_pgtable = iommu_clear_root_pgtable,
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
     .lookup_page = intel_iommu_lookup_page,
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 82d770107a47..d3cdec6ee83f 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -280,6 +280,12 @@ int iommu_free_pgtables(struct domain *d)
     /* After this barrier no new page allocations can occur. */
     spin_barrier(&hd->arch.pgtables.lock);
 
+    /*
+     * Pages will be moved to the free list in a bit. So we want to
+     * clear the root page-table to avoid any potential use-after-free.
+     */
+    hd->platform_ops->clear_root_pgtable(d);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 863a68fe1622..d59ed7cbad43 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -272,6 +272,7 @@ struct iommu_ops {
 
     int (*adjust_irq_affinities)(void);
     void (*sync_cache)(const void *addr, unsigned int size);
+    void (*clear_root_pgtable)(struct domain *d);
 #endif /* CONFIG_X86 */
 
     int __must_check (*suspend)(void);
-- 
2.17.1




* Re: [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown
  2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
                   ` (4 preceding siblings ...)
  2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-02-09 16:47 ` Ian Jackson
  2021-02-17 11:33   ` Julien Grall
  5 siblings, 1 reply; 47+ messages in thread
From: Ian Jackson @ 2021-02-09 16:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Paul Durrant, Kevin Tian

Julien Grall writes ("[for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teardown"):
> From: Julien Grall <jgrall@amazon.com>
...
> This series is a collection of bug fixes for the IOMMU teardown code.
> All of them are candidate for 4.15 as they can either leak memory or
> lead to host crash/host corruption.
> 
> This is sent directly on xen-devel because all the issues were either
> introduced in 4.15 or happen in the domain creation code.

I think by current freeze rules this does not need a release-ack but
for the avoidance of doubt

Release-Acked-by: Ian Jackson <iwj@xenproject.org>

assuming it's committed by the end of the week.

Ian.



* RE: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
@ 2021-02-09 20:19   ` Paul Durrant
  2021-02-10  8:29   ` Roger Pau Monné
  2021-02-10 11:26   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Paul Durrant @ 2021-02-09 20:19 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich',
	'Andrew Cooper', 'Roger Pau Monné',
	'Wei Liu'

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Julien Grall
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau Monné
> <roger.pau@citrix.com>; Wei Liu <wl@xen.org>
> Subject: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the IOMMU page-tables will be populated early in the domain
> creation if the hardware is able to virtualize the local APIC. However,
> the IOMMU page tables will not be freed during early failure and will
> result to a leak.
> 
> An assigned device should not need to DMA into the vLAPIC page, so we
> can avoid to map the page in the IOMMU page-tables.
> 
> This statement is also true for any special pages (the vLAPIC page is
> one of them). So to take the opportunity to prevent the mapping for all
> of them.
> 
> Note that:
>     - This is matching the existing behavior with PV guest
>     - This doesn't change the behavior when the P2M is shared with the
>     IOMMU. IOW, the special pages will still be accessibled by the
>     device.
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 

Reviewed-by: Paul Durrant <paul@xen.org>




* RE: [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down
  2021-02-09 15:28 ` [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall
@ 2021-02-09 20:22   ` Paul Durrant
  2021-02-17 11:25     ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Durrant @ 2021-02-09 20:22 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> is_iommu_enabled() will return true even if the IOMMU has not been
> initialized (e.g. the ops are not set).
> 
> In the case of an early failure in arch_domain_init(), the function
> iommu_destroy_domain() will be called even if the IOMMU is not
> initialized.
> 
> This will result to dereference the ops which will be NULL and an host
> crash.
> 
> Fix the issue by checking that ops has been set before accessing it.
> 
> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Paul Durrant <paul@xen.org>




* RE: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-09 15:28 ` [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying Julien Grall
@ 2021-02-09 20:28   ` Paul Durrant
  2021-02-09 21:14     ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Paul Durrant @ 2021-02-09 20:28 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> It is a bit pointless to crash a domain that is already dying. This will
> become more an annoyance with a follow-up change where page-table
> allocation will be forbidden when the domain is dying.
> 
> Security wise, there is no change as the devices would still have access
> to the IOMMU page-tables even if the domain has crashed until Xen
> start to relinquish the resources.
> 
> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> d->is_dying is correctly observed (a follow-up patch will held it in the

s/held/hold

> relinquish path).
> 
> For Arm, there is still a small race possible. But there is so far no
> failure specific to a domain dying.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
> 
> This was spotted when trying to destroy IOREQ servers while the domain
> is dying. The code will try to add the entry back in the P2M and
> therefore update the P2M (see arch_ioreq_server_disable() ->
> hvm_add_ioreq_gfn()).
> 
> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however

s/mappin/mapping

> I didn't try a patch yet because checking d->is_dying can be racy (I
> can't find a proper lock).
> 
> Changes in v2:
>     - Patch added
> ---
>  xen/drivers/passthrough/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 879d238bcd31..75419f20f76d 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>                              flush_flags) )
>                  continue;
> 
> -        if ( !is_hardware_domain(d) )
> +        if ( !is_hardware_domain(d) && !d->is_dying )
>              domain_crash(d);

Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

  Paul

> 
>          break;
> --
> 2.17.1





* RE: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
@ 2021-02-09 20:33   ` Paul Durrant
  2021-02-10 14:32   ` Jan Beulich
  2021-02-10 14:44   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Paul Durrant @ 2021-02-09 20:33 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient when
> the domain is dying because nothing prevents page-table to be allocated.
> 
> iommu_alloc_pgtable() is now checking if the domain is dying before
> adding the page in the list. We are relying on &hd->arch.pgtables.lock
> to synchronize d->is_dying.
> 
> Take the opportunity to check in arch_iommu_domain_destroy() that all
> that page tables have been freed.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 

Reviewed-by: Paul Durrant <paul@xen.org>




* RE: [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-02-09 20:36   ` Paul Durrant
  2021-02-10  2:21   ` Tian, Kevin
  2021-02-10 14:39   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Paul Durrant @ 2021-02-09 20:36 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich',
	'Andrew Cooper', 'Kevin Tian'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 09 February 2021 15:28
> To: xen-devel@lists.xenproject.org
> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> <jbeulich@suse.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Kevin Tian <kevin.tian@intel.com>;
> Paul Durrant <paul@xen.org>
> Subject: [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-
> tables
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> The new per-domain IOMMU page-table allocator will now free the
> page-tables when domain's resources are relinquished. However, the root
> page-table (i.e. hd->arch.pg_maddr) will not be cleared.
> 
> Xen may access the IOMMU page-tables afterwards at least in the case of
> PV domain:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
> (XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
> (XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
> (XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
> (XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
> (XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
> (XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
> (XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
> (XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
> (XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
> (XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
> (XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
> (XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
> (XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
> (XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120
> 
> This will result in a use-after-free and possibly a host crash or
> memory corruption.
> 
> Freeing the page-tables further down in domain_relinquish_resources()
> would not work because pages may not be released until later if another
> domain holds a reference on them.
> 
> Once all the PCI devices have been de-assigned, it is actually pointless
> to access or modify the IOMMU page-tables. So we can simply clear the root
> page-table address.
> 
> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
>     Changes in v2:
>         - Introduce clear_root_pgtable()
>         - Move the patch later in the series
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
>  xen/drivers/passthrough/vtd/iommu.c         | 12 +++++++++++-
>  xen/drivers/passthrough/x86/iommu.c         |  6 ++++++
>  xen/include/xen/iommu.h                     |  1 +
>  4 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 42b5a5a9bec4..81add0ba26b4 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
> 
> +static void iommu_clear_root_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    hd->arch.amd.root_table = NULL;
> +    spin_unlock(&hd->arch.mapping_lock);
> +}
> +
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -    dom_iommu(d)->arch.amd.root_table = NULL;
> +    ASSERT(!dom_iommu(d)->arch.amd.root_table);
>  }
> 
>  static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> @@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
>      .remove_device = amd_iommu_remove_device,
>      .assign_device  = amd_iommu_assign_device,
>      .teardown = amd_iommu_domain_destroy,
> +    .clear_root_pgtable = iommu_clear_root_pgtable,
>      .map_page = amd_iommu_map_page,
>      .unmap_page = amd_iommu_unmap_page,
>      .iotlb_flush = amd_iommu_flush_iotlb_pages,
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index d136fe36883b..e1871f6c2bc1 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1726,6 +1726,15 @@ out:
>      return ret;
>  }
> 
> +static void iommu_clear_root_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    hd->arch.vtd.pgd_maddr = 0;
> +    spin_unlock(&hd->arch.mapping_lock);
> +}
> +
>  static void iommu_domain_teardown(struct domain *d)
>  {
>      struct domain_iommu *hd = dom_iommu(d);
> @@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct domain *d)
>          xfree(mrmrr);
>      }
> 
> -    hd->arch.vtd.pgd_maddr = 0;
> +    ASSERT(!hd->arch.vtd.pgd_maddr);
>  }
> 
>  static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> @@ -2719,6 +2728,7 @@ static struct iommu_ops __initdata vtd_ops = {
>      .remove_device = intel_iommu_remove_device,
>      .assign_device  = intel_iommu_assign_device,
>      .teardown = iommu_domain_teardown,
> +    .clear_root_pgtable = iommu_clear_root_pgtable,
>      .map_page = intel_iommu_map_page,
>      .unmap_page = intel_iommu_unmap_page,
>      .lookup_page = intel_iommu_lookup_page,
> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index 82d770107a47..d3cdec6ee83f 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -280,6 +280,12 @@ int iommu_free_pgtables(struct domain *d)
>      /* After this barrier no new page allocations can occur. */
>      spin_barrier(&hd->arch.pgtables.lock);
> 
> +    /*
> +     * Pages will be moved to the free list in a bit. So we want to

'in a bit' sounds a little odd. I think you could just say 'below'. The code looks fine though so...

Reviewed-by: Paul Durrant <paul@xen.org>

> +     * clear the root page-table to avoid any potential use after-free.
> +     */
> +    hd->platform_ops->clear_root_pgtable(d);
> +
>      while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>      {
>          free_domheap_page(pg);
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 863a68fe1622..d59ed7cbad43 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -272,6 +272,7 @@ struct iommu_ops {
> 
>      int (*adjust_irq_affinities)(void);
>      void (*sync_cache)(const void *addr, unsigned int size);
> +    void (*clear_root_pgtable)(struct domain *d);
>  #endif /* CONFIG_X86 */
> 
>      int __must_check (*suspend)(void);
> --
> 2.17.1





* Re: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-09 20:28   ` Paul Durrant
@ 2021-02-09 21:14     ` Julien Grall
  2021-02-10 14:14       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-09 21:14 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, hongyxia, Ian Jackson, Julien Grall, Jan Beulich

On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 09 February 2021 15:28
> > To: xen-devel@lists.xenproject.org
> > Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
> > <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> > Subject: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
> >
> > From: Julien Grall <jgrall@amazon.com>
> >
> > It is a bit pointless to crash a domain that is already dying. This will
> > become more of an annoyance with a follow-up change where page-table
> > allocation will be forbidden when the domain is dying.
> >
> > Security-wise, there is no change, as the devices would still have access
> > to the IOMMU page-tables even if the domain has crashed, until Xen
> > starts to relinquish the resources.
> >
> > For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
> > d->is_dying is correctly observed (a follow-up patch will held it in the
>
> s/held/hold
>
> > relinquish path).
> >
> > For Arm, there is still a small race possible. But there is so far no
> > failure specific to a domain dying.
> >
> > Signed-off-by: Julien Grall <jgrall@amazon.com>
> >
> > ---
> >
> > This was spotted when trying to destroy IOREQ servers while the domain
> > is dying. The code will try to add the entry back in the P2M and
> > therefore update the P2M (see arch_ioreq_server_disable() ->
> > hvm_add_ioreq_gfn()).
> >
> > It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>
> s/mappin/mapping
>
> > I didn't try a patch yet because checking d->is_dying can be racy (I
> > can't find a proper lock).
> >
> > Changes in v2:
> >     - Patch added
> > ---
> >  xen/drivers/passthrough/iommu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> > index 879d238bcd31..75419f20f76d 100644
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
> >                              flush_flags) )
> >                  continue;
> >
> > -        if ( !is_hardware_domain(d) )
> > +        if ( !is_hardware_domain(d) && !d->is_dying )
> >              domain_crash(d);
>
> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?

Jan also suggested moving the check into domain_crash(). However, I felt
it is potentially too risky a change for 4.15 as there are quite a few
callers.

Cheers,



* RE: [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
  2021-02-09 20:36   ` Paul Durrant
@ 2021-02-10  2:21   ` Tian, Kevin
  2021-02-17 13:54     ` Julien Grall
  2021-02-10 14:39   ` Jan Beulich
  2 siblings, 1 reply; 47+ messages in thread
From: Tian, Kevin @ 2021-02-10  2:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Cooper, Andrew, Paul Durrant

> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, February 9, 2021 11:28 PM
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> The new per-domain IOMMU page-table allocator will now free the
> page-tables when domain's resources are relinquished. However, the root
> page-table (i.e. hd->arch.pg_maddr) will not be cleared.

It's the pointer not the table itself.

> 
> Xen may access the IOMMU page-tables afterwards at least in the case of
> PV domain:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04025b4b2>] R
> iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
> (XEN)    [<ffff82d04025b695>] F
> iommu.c#intel_iommu_unmap_page+0x5d/0xf8
> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
> (XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
> (XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
> (XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
> (XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
> (XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
> (XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
> (XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
> (XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
> (XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
> (XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
> (XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
> (XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
> (XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120
> 
> This will result in a use-after-free and possibly a host crash or
> memory corruption.
> 
> Freeing the page-tables further down in domain_relinquish_resources()
> would not work because pages may not be released until later if another
> domain holds a reference on them.
> 
> Once all the PCI devices have been de-assigned, it is actually pointless
> to access or modify the IOMMU page-tables. So we can simply clear the root
> page-table address.

The above two paragraphs are confusing to me. I didn't know what exact change
was proposed until looking over the whole patch. Isn't it clearer to just say
"We should clear the root page table pointer in iommu_free_pgtables to
avoid use-after-free after all pgtables are moved to the free list"?

otherwise:

	Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> 
> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table
> allocator")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
>     Changes in v2:
>         - Introduce clear_root_pgtable()
>         - Move the patch later in the series
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
>  xen/drivers/passthrough/vtd/iommu.c         | 12 +++++++++++-
>  xen/drivers/passthrough/x86/iommu.c         |  6 ++++++
>  xen/include/xen/iommu.h                     |  1 +
>  4 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 42b5a5a9bec4..81add0ba26b4 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain
> *d, u8 devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
> 
> +static void iommu_clear_root_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    hd->arch.amd.root_table = NULL;
> +    spin_unlock(&hd->arch.mapping_lock);
> +}
> +
>  static void amd_iommu_domain_destroy(struct domain *d)
>  {
> -    dom_iommu(d)->arch.amd.root_table = NULL;
> +    ASSERT(!dom_iommu(d)->arch.amd.root_table);
>  }
> 
>  static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> @@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel
> _iommu_ops = {
>      .remove_device = amd_iommu_remove_device,
>      .assign_device  = amd_iommu_assign_device,
>      .teardown = amd_iommu_domain_destroy,
> +    .clear_root_pgtable = iommu_clear_root_pgtable,
>      .map_page = amd_iommu_map_page,
>      .unmap_page = amd_iommu_unmap_page,
>      .iotlb_flush = amd_iommu_flush_iotlb_pages,
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index d136fe36883b..e1871f6c2bc1 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1726,6 +1726,15 @@ out:
>      return ret;
>  }
> 
> +static void iommu_clear_root_pgtable(struct domain *d)
> +{
> +    struct domain_iommu *hd = dom_iommu(d);
> +
> +    spin_lock(&hd->arch.mapping_lock);
> +    hd->arch.vtd.pgd_maddr = 0;
> +    spin_unlock(&hd->arch.mapping_lock);
> +}
> +
>  static void iommu_domain_teardown(struct domain *d)
>  {
>      struct domain_iommu *hd = dom_iommu(d);
> @@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct
> domain *d)
>          xfree(mrmrr);
>      }
> 
> -    hd->arch.vtd.pgd_maddr = 0;
> +    ASSERT(!hd->arch.vtd.pgd_maddr);
>  }
> 
>  static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> @@ -2719,6 +2728,7 @@ static struct iommu_ops __initdata vtd_ops = {
>      .remove_device = intel_iommu_remove_device,
>      .assign_device  = intel_iommu_assign_device,
>      .teardown = iommu_domain_teardown,
> +    .clear_root_pgtable = iommu_clear_root_pgtable,
>      .map_page = intel_iommu_map_page,
>      .unmap_page = intel_iommu_unmap_page,
>      .lookup_page = intel_iommu_lookup_page,
> diff --git a/xen/drivers/passthrough/x86/iommu.c
> b/xen/drivers/passthrough/x86/iommu.c
> index 82d770107a47..d3cdec6ee83f 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -280,6 +280,12 @@ int iommu_free_pgtables(struct domain *d)
>      /* After this barrier no new page allocations can occur. */
>      spin_barrier(&hd->arch.pgtables.lock);
> 
> +    /*
> +     * Pages will be moved to the free list in a bit. So we want to
> +     * clear the root page-table to avoid any potential use after-free.
> +     */
> +    hd->platform_ops->clear_root_pgtable(d);
> +
>      while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
>      {
>          free_domheap_page(pg);
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 863a68fe1622..d59ed7cbad43 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -272,6 +272,7 @@ struct iommu_ops {
> 
>      int (*adjust_irq_affinities)(void);
>      void (*sync_cache)(const void *addr, unsigned int size);
> +    void (*clear_root_pgtable)(struct domain *d);
>  #endif /* CONFIG_X86 */
> 
>      int __must_check (*suspend)(void);
> --
> 2.17.1




* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
  2021-02-09 20:19   ` Paul Durrant
@ 2021-02-10  8:29   ` Roger Pau Monné
  2021-02-10  8:50     ` Julien Grall
  2021-02-10 11:10     ` Jan Beulich
  2021-02-10 11:26   ` Jan Beulich
  2 siblings, 2 replies; 47+ messages in thread
From: Roger Pau Monné @ 2021-02-10  8:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Jan Beulich,
	Andrew Cooper, Wei Liu

On Tue, Feb 09, 2021 at 03:28:12PM +0000, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the IOMMU page-tables will be populated early in the domain
> creation if the hardware is able to virtualize the local APIC. However,
> the IOMMU page tables will not be freed during early failure and will
> result in a leak.
> 
> An assigned device should not need to DMA into the vLAPIC page, so we
> can avoid mapping the page in the IOMMU page-tables.
> 
> This statement is also true for any special pages (the vLAPIC page is
> one of them). So take the opportunity to prevent the mapping for all
> of them.

Hm, OK, while I assume it's likely for special pages not to be the target
of DMA operations, it's not easy to spot which pages are special.

> Note that:
>     - This is matching the existing behavior with PV guest

You might make HVM guests not sharing page-tables 'match' PV
behavior, but you are making behavior between HVM guests themselves
diverge.


>     - This doesn't change the behavior when the P2M is shared with the
>     IOMMU. IOW, the special pages will still be accessible by the
>     device.

I have to admit I don't like this part at all. Having diverging device
mappings depending on whether the page tables are shared or not is
bad IMO, as there might be subtle bugs affecting one of the two
modes.

I get the feeling this is just papering over an existing issue instead
of actually fixing it: IOMMU page tables need to be properly freed
during early failure.

> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> ---
>     Changes in v2:
>         - New patch
> ---
>  xen/include/asm-x86/p2m.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 7d63f5787e62..1802545969b3 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>  {
>      unsigned int flags;
>  
> +    /* Don't map special pages in the IOMMU page-tables. */

I think this should explain why special pages don't require IOMMU
mappings, or even just note that special pages cannot be added to the
IOMMU page tables due to failure to free them afterwards and that this
is a bodge for it.

Thanks, Roger.



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10  8:29   ` Roger Pau Monné
@ 2021-02-10  8:50     ` Julien Grall
  2021-02-10  9:34       ` Julien Grall
  2021-02-10 11:10     ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-10  8:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Jan Beulich,
	Andrew Cooper, Wei Liu

Hi Roger,

On 10/02/2021 08:29, Roger Pau Monné wrote:
> On Tue, Feb 09, 2021 at 03:28:12PM +0000, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, the IOMMU page-tables will be populated early in the domain
>> creation if the hardware is able to virtualize the local APIC. However,
>> the IOMMU page tables will not be freed during early failure and will
>> result in a leak.
>>
>> An assigned device should not need to DMA into the vLAPIC page, so we
>> can avoid mapping the page in the IOMMU page-tables.
>>
>> This statement is also true for any special pages (the vLAPIC page is
>> one of them). So take the opportunity to prevent the mapping for all
>> of them.
> 
> Hm, OK, while I assume it's likely for special pages to not be target
> of DMA operations, it's not easy to spot what are special pages.

Special pages are allocated by Xen for grant-table, vCPU info...

> 
>> Note that:
>>      - This is matching the existing behavior with PV guest
> 
> You might make HVM guests not sharing page-tables 'match' PV
> behavior, but you are making behavior between HVM guests themselves
> diverge.
> 
> 
>>      - This doesn't change the behavior when the P2M is shared with the
>>      IOMMU. IOW, the special pages will still be accessible by the
>>      device.
> 
> I have to admit I don't like this part at all. Having diverging device
> mappings depending on whether the page tables are shared or not is
> bad IMO, as there might be subtle bugs affecting one of the two
> modes.
> 
> I get the feeling this is just papering over an existing issue instead
> of actually fixing it: IOMMU page tables need to be properly freed
> during early failure.

My initial approach was to free the IOMMU page tables during early 
failure (see [1] and [2]). But Jan said the special pages should really 
not be mapped in the IOMMU ([3]) and he wasn't very happy with freeing 
the IOMMU page tables on early failure.

I don't particularly care about the approach as long as we don't leak 
IOMMU page-tables at the end.

So please try to find common ground with Jan here.

Cheers,

[1] <20201222154338.9459-1-julien@xen.org>
[2] <20201222154338.9459-5-julien@xen.org>

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10  8:50     ` Julien Grall
@ 2021-02-10  9:34       ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-10  9:34 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Jan Beulich,
	Andrew Cooper, Wei Liu

On 10/02/2021 08:50, Julien Grall wrote:
> Hi Roger,
> 
> On 10/02/2021 08:29, Roger Pau Monné wrote:
>> On Tue, Feb 09, 2021 at 03:28:12PM +0000, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> Currently, the IOMMU page-tables will be populated early in the domain
>>> creation if the hardware is able to virtualize the local APIC. However,
>>> the IOMMU page tables will not be freed during early failure and will
>>> result in a leak.
>>>
>>> An assigned device should not need to DMA into the vLAPIC page, so we
>>> can avoid mapping the page in the IOMMU page-tables.
>>>
>>> This statement is also true for any special pages (the vLAPIC page is
>>> one of them). So take the opportunity to prevent the mapping for all
>>> of them.
>>
>> Hm, OK, while I assume it's likely for special pages to not be target
>> of DMA operations, it's not easy to spot what are special pages.
> 
> Special pages are allocated by Xen for grant-table, vCPU info...
> 
>>
>>> Note that:
>>>      - This is matching the existing behavior with PV guest
>>
>> You might make HVM guests not sharing page-tables 'match' PV
>> behavior, but you are making behavior between HVM guests themselves
>> diverge.
>>
>>
>>>      - This doesn't change the behavior when the P2M is shared with the
>>>      IOMMU. IOW, the special pages will still be accessible by the
>>>      device.
>>
>> I have to admit I don't like this part at all. Having diverging device
>> mappings depending on whether the page tables are shared or not is
>> bad IMO, as there might be subtle bugs affecting one of the two
>> modes.
>>
>> I get the feeling this is just papering over an existing issue instead
>> of actually fixing it: IOMMU page tables need to be properly freed
>> during early failure.
> 
> My initial approach was to free the IOMMU page tables during early 
> failure (see [1] and [2]). But Jan said the special pages should really 
> not be mapped in the IOMMU ([3]) and he wasn't very happy with freeing 
> the IOMMU page tables on early failure.
> 
> I don't particularly care about the approach as long as we don't leak 
> IOMMU page-tables at the end.
> 
> So please try to find a common ground with Jan here.
> 
> Cheers,
> 
> [1] <20201222154338.9459-1-julien@xen.org>
> [2] <20201222154338.9459-5-julien@xen.org>

Roger pointed out on IRC that I forgot to add a link for [3]. So here we go:

[3] <a22f7364-518f-ea5f-3b87-5c0462cfc193@suse.com>

Cheers,

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10  8:29   ` Roger Pau Monné
  2021-02-10  8:50     ` Julien Grall
@ 2021-02-10 11:10     ` Jan Beulich
  2021-02-10 11:34       ` Roger Pau Monné
  1 sibling, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 11:10 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On 10.02.2021 09:29, Roger Pau Monné wrote:
> On Tue, Feb 09, 2021 at 03:28:12PM +0000, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, the IOMMU page-tables will be populated early in the domain
>> creation if the hardware is able to virtualize the local APIC. However,
>> the IOMMU page tables will not be freed during early failure and will
>> result in a leak.
>>
>> An assigned device should not need to DMA into the vLAPIC page, so we
>> can avoid mapping the page in the IOMMU page-tables.
>>
>> This statement is also true for any special pages (the vLAPIC page is
>> one of them). So take the opportunity to prevent the mapping for all
>> of them.
> 
> Hm, OK, while I assume it's likely for special pages to not be target
> of DMA operations, it's not easy to spot what are special pages.
> 
>> Note that:
>>     - This is matching the existing behavior with PV guest
> 
> You might make HVM guests not sharing page-tables 'match' PV
> behavior, but you are making behavior between HVM guests themselves
> diverge.
> 
> 
>>     - This doesn't change the behavior when the P2M is shared with the
>>     IOMMU. IOW, the special pages will still be accessible by the
>>     device.
> 
> I have to admit I don't like this part at all. Having diverging device
> mappings depending on whether the page tables are shared or not is
> bad IMO, as there might be subtle bugs affecting one of the two
> modes.

This is one way to look at things, yes. But if you take the
other perspective that special pages shouldn't be
IOMMU-mapped, then the divergence is the price to pay for
being able to share pages (and it's not Julien introducing
bad behavior here).

Additionally it may be possible to utilize the divergence to
our advantage: If one way of setting up things works and the
other doesn't, we have a reasonable clue where to look. In
fact the aspect above may, together with possible future
findings, end up being a reason to not default to or even
disallow (like for AMD) page table sharing.

> I get the feeling this is just papering over an existing issue instead
> of actually fixing it: IOMMU page tables need to be properly freed
> during early failure.

I take a different perspective: IOMMU page tables shouldn't
get created (yet) at all in the course of
XEN_DOMCTL_createdomain - this op is supposed to produce an
empty container for a VM.

Jan



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
  2021-02-09 20:19   ` Paul Durrant
  2021-02-10  8:29   ` Roger Pau Monné
@ 2021-02-10 11:26   ` Jan Beulich
  2021-02-15 11:38     ` Julien Grall
  2 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 11:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel

On 09.02.2021 16:28, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, the IOMMU page-tables will be populated early in the domain
> creation if the hardware is able to virtualize the local APIC. However,
> the IOMMU page tables will not be freed during early failure and will
> result in a leak.
> 
> An assigned device should not need to DMA into the vLAPIC page, so we
> can avoid mapping the page in the IOMMU page-tables.

Here and below, may I ask that you use the correct term "APIC
access page", as there are other pages involved in vLAPIC
handling (in particular the virtual APIC page, which is where
accesses actually go to that translate to the APIC access page
in EPT).

> This statement is also true for any special pages (the vLAPIC page is
> one of them). So take the opportunity to prevent the mapping for all
> of them.

I probably should have realized this earlier, but there is a
downside to this: A guest wanting to core dump itself may want
to dump e.g. shared info and vcpu info pages. Hence ...

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>  {
>      unsigned int flags;
>  
> +    /* Don't map special pages in the IOMMU page-tables. */
> +    if ( mfn_valid(mfn) && is_special_page(mfn_to_page(mfn)) )
> +        return 0;

... instead of is_special_page() I think you want to limit the
check here to seeing whether PGC_extra is set.

But as said on irc, since this crude way of setting up the APIC
access page is now firmly a problem, I intend to try to redo it.
I can't tell yet whether this will still take a PGC_extra page
of some form (nor if my plans will work out in the first place).
Irrespective of this I think we indeed want to exclude PGC_extra
pages from getting mapped. However, the APIC access page is, I
think, an outlier here - we wouldn't insert other PGC_extra
pages into the p2m, so for all other cases the above addition
would be effectively dead code.

Jan



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:10     ` Jan Beulich
@ 2021-02-10 11:34       ` Roger Pau Monné
  2021-02-10 11:38         ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2021-02-10 11:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
> On 10.02.2021 09:29, Roger Pau Monné wrote:
> > On Tue, Feb 09, 2021 at 03:28:12PM +0000, Julien Grall wrote:
> >> From: Julien Grall <jgrall@amazon.com>
> >>
> >> Currently, the IOMMU page-tables will be populated early in the domain
> >> creation if the hardware is able to virtualize the local APIC. However,
> >> the IOMMU page tables will not be freed during early failure and will
> >> result to a leak.
> >>
> >> An assigned device should not need to DMA into the vLAPIC page, so we
> >> can avoid to map the page in the IOMMU page-tables.
> >>
> >> This statement is also true for any special pages (the vLAPIC page is
> >> one of them). So to take the opportunity to prevent the mapping for all
> >> of them.
> > 
> > Hm, OK, while I assume it's likely for special pages to not be target
> > of DMA operations, it's not easy to spot what are special pages.
> > 
> >> Note that:
> >>     - This is matching the existing behavior with PV guest
> > 
> > You might make HVM guests not sharing page-tables 'match' PV
> > behavior, but you are making behavior between HVM guests themselves
> > diverge.
> > 
> > 
> >>     - This doesn't change the behavior when the P2M is shared with the
> >>     IOMMU. IOW, the special pages will still be accessibled by the
> >>     device.
> > 
> > I have to admit I don't like this part at all. Having diverging device
> > mappings depending on whether the page tables are shared or not is
> > bad IMO, as there might be subtle bugs affecting one of the two
> > modes.
> 
> This is one way to look at things, yes. But if you take the
> other perspective that special pages shouldn't be
> IOMMU-mapped, then the divergence is the price to pay for
> being able to share pages (and it's not Julien introducing
> bad behavior here).

Since when sharing we have no option but to make whatever is
accessible to the CPU also available to devices, it would seem
reasonable to also do it when not sharing.

> Additionally it may be possible to utilize the divergence to
> our advantage: If one way of setting up things works and the
> other doesn't, we have a reasonable clue where to look. In
> fact the aspect above may, together with possible future
> findings, end up being a reason to not default to or even
> disallow (like for AMD) page table sharing.

I think such approach is risky: we don't likely test VT-d without
sharing that much (or at all?), now that IOMMUs support the same page
sizes as EPT, hence it's likely for bugs to go unnoticed.

> > I get the feeling this is just papering over an existing issue instead
> > of actually fixing it: IOMMU page tables need to be properly freed
> > during early failure.
> 
> I take a different perspective: IOMMU page tables shouldn't
> get created (yet) at all in the course of
> XEN_DOMCTL_createdomain - this op is supposed to produce an
> empty container for a VM.

The same would apply for CPU page-tables then, yet they seem to be
created and populating them (ie: adding the lapic access page) doesn't
leak such entries, which points at an asymmetry. Either we setup both
tables and handle freeing them properly, or we set none of them.

Roger.



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:34       ` Roger Pau Monné
@ 2021-02-10 11:38         ` Jan Beulich
  2021-02-10 11:40           ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 11:38 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On 10.02.2021 12:34, Roger Pau Monné wrote:
> On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
>> On 10.02.2021 09:29, Roger Pau Monné wrote:
>>> I get the feeling this is just papering over an existing issue instead
>>> of actually fixing it: IOMMU page tables need to be properly freed
>>> during early failure.
>>
>> I take a different perspective: IOMMU page tables shouldn't
>> get created (yet) at all in the course of
>> XEN_DOMCTL_createdomain - this op is supposed to produce an
>> empty container for a VM.
> 
> The same would apply for CPU page-tables then, yet they seem to be
> created and populating them (ie: adding the lapic access page) doesn't
> leak such entries, which points at an asymmetry. Either we setup both
> tables and handle freeing them properly, or we set none of them.

Where would CPU page tables get created from at this early stage?
There's no memory assigned to the guest yet, so there's nothing to
map afaics.

Jan



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:38         ` Jan Beulich
@ 2021-02-10 11:40           ` Julien Grall
  2021-02-10 11:45             ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-10 11:40 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu



On 10/02/2021 11:38, Jan Beulich wrote:
> On 10.02.2021 12:34, Roger Pau Monné wrote:
>> On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
>>> On 10.02.2021 09:29, Roger Pau Monné wrote:
>>>> I get the feeling this is just papering over an existing issue instead
>>>> of actually fixing it: IOMMU page tables need to be properly freed
>>>> during early failure.
>>>
>>> I take a different perspective: IOMMU page tables shouldn't
>>> get created (yet) at all in the course of
>>> XEN_DOMCTL_createdomain - this op is supposed to produce an
>>> empty container for a VM.
>>
>> The same would apply for CPU page-tables then, yet they seem to be
>> created and populating them (ie: adding the lapic access page) doesn't
>> leak such entries, which points at an asymmetry. Either we setup both
>> tables and handle freeing them properly, or we set none of them.
> 
> Where would CPU page tables get created from at this early stage?

When mapping the APIC page in the P2M. I don't think you can get away 
with removing it completely.

Cheers,

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:40           ` Julien Grall
@ 2021-02-10 11:45             ` Jan Beulich
  2021-02-10 11:48               ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 11:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Roger Pau Monné

On 10.02.2021 12:40, Julien Grall wrote:
> On 10/02/2021 11:38, Jan Beulich wrote:
>> On 10.02.2021 12:34, Roger Pau Monné wrote:
>>> On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
>>>> On 10.02.2021 09:29, Roger Pau Monné wrote:
>>>>> I get the feeling this is just papering over an existing issue instead
>>>>> of actually fixing it: IOMMU page tables need to be properly freed
>>>>> during early failure.
>>>>
>>>> I take a different perspective: IOMMU page tables shouldn't
>>>> get created (yet) at all in the course of
>>>> XEN_DOMCTL_createdomain - this op is supposed to produce an
>>>> empty container for a VM.
>>>
>>> The same would apply for CPU page-tables then, yet they seem to be
>>> created and populating them (ie: adding the lapic access page) doesn't
>>> leak such entries, which points at an asymmetry. Either we setup both
>>> tables and handle freeing them properly, or we set none of them.
>>
>> Where would CPU page tables get created from at this early stage?
> 
> When mapping the APIC page in the P2M. I don't think you can get away 
> with removing it completely.

It doesn't need putting in the p2m this early. It would be quite
fine to defer this until e.g. the first vCPU gets created.

Jan



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:45             ` Jan Beulich
@ 2021-02-10 11:48               ` Julien Grall
  2021-02-10 11:54                 ` Roger Pau Monné
  2021-02-10 13:08                 ` Jan Beulich
  0 siblings, 2 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-10 11:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Roger Pau Monné



On 10/02/2021 11:45, Jan Beulich wrote:
> On 10.02.2021 12:40, Julien Grall wrote:
>> On 10/02/2021 11:38, Jan Beulich wrote:
>>> On 10.02.2021 12:34, Roger Pau Monné wrote:
>>>> On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
>>>>> On 10.02.2021 09:29, Roger Pau Monné wrote:
>>>>>> I get the feeling this is just papering over an existing issue instead
>>>>>> of actually fixing it: IOMMU page tables need to be properly freed
>>>>>> during early failure.
>>>>>
>>>>> I take a different perspective: IOMMU page tables shouldn't
>>>>> get created (yet) at all in the course of
>>>>> XEN_DOMCTL_createdomain - this op is supposed to produce an
>>>>> empty container for a VM.
>>>>
>>>> The same would apply for CPU page-tables then, yet they seem to be
>>>> created and populating them (ie: adding the lapic access page) doesn't
>>>> leak such entries, which points at an asymmetry. Either we setup both
>>>> tables and handle freeing them properly, or we set none of them.
>>>
>>> Where would CPU page tables get created from at this early stage?
>>
>> When mapping the APIC page in the P2M. I don't think you can get away
>> with removing it completely.
> 
> It doesn't need putting in the p2m this early. It would be quite
> fine to defer this until e.g. the first vCPU gets created.

It feels wrong to me to setup a per-domain mapping when initializing the 
first vCPU.

But, I was under the impression that there is plan to remove 
XEN_DOMCTL_max_vcpus. So it would only buy just a bit of time...

Cheers,

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:48               ` Julien Grall
@ 2021-02-10 11:54                 ` Roger Pau Monné
  2021-02-10 13:12                   ` Jan Beulich
  2021-02-10 13:08                 ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2021-02-10 11:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, hongyxia, iwj, Julien Grall,
	Andrew Cooper, Wei Liu

On Wed, Feb 10, 2021 at 11:48:40AM +0000, Julien Grall wrote:
> 
> 
> On 10/02/2021 11:45, Jan Beulich wrote:
> > On 10.02.2021 12:40, Julien Grall wrote:
> > > On 10/02/2021 11:38, Jan Beulich wrote:
> > > > On 10.02.2021 12:34, Roger Pau Monné wrote:
> > > > > On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
> > > > > > On 10.02.2021 09:29, Roger Pau Monné wrote:
> > > > > > > I get the feeling this is just papering over an existing issue instead
> > > > > > > of actually fixing it: IOMMU page tables need to be properly freed
> > > > > > > during early failure.
> > > > > > 
> > > > > > I take a different perspective: IOMMU page tables shouldn't
> > > > > > get created (yet) at all in the course of
> > > > > > XEN_DOMCTL_createdomain - this op is supposed to produce an
> > > > > > empty container for a VM.
> > > > > 
> > > > > The same would apply for CPU page-tables then, yet they seem to be
> > > > > created and populating them (ie: adding the lapic access page) doesn't
> > > > > leak such entries, which points at an asymmetry. Either we setup both
> > > > > tables and handle freeing them properly, or we set none of them.
> > > > 
> > > > Where would CPU page tables get created from at this early stage?
> > > 
> > > When mapping the APIC page in the P2M. I don't think you can get away
> > > with removing it completely.
> > 
> > It doesn't need putting in the p2m this early. It would be quite
> > fine to defer this until e.g. the first vCPU gets created.
> 
> It feels wrong to me to setup a per-domain mapping when initializing the
> first vCPU.
> 
> But, I was under the impression that there is plan to remove
> XEN_DOMCTL_max_vcpus. So it would only buy just a bit of time...

I was also under that impression. We could setup the lapic access page
at vlapic_init for the BSP (which is part of XEN_DOMCTL_max_vcpus
ATM).

But then I think there should be some kind of check to prevent
populating either the CPU or the IOMMU page tables at domain creation
hypercall, and so the logic to free CPU table tables on failure could
be removed.

Roger.



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:48               ` Julien Grall
  2021-02-10 11:54                 ` Roger Pau Monné
@ 2021-02-10 13:08                 ` Jan Beulich
  1 sibling, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 13:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Roger Pau Monné

On 10.02.2021 12:48, Julien Grall wrote:
> 
> 
> On 10/02/2021 11:45, Jan Beulich wrote:
>> On 10.02.2021 12:40, Julien Grall wrote:
>>> On 10/02/2021 11:38, Jan Beulich wrote:
>>>> On 10.02.2021 12:34, Roger Pau Monné wrote:
>>>>> On Wed, Feb 10, 2021 at 12:10:09PM +0100, Jan Beulich wrote:
>>>>>> On 10.02.2021 09:29, Roger Pau Monné wrote:
>>>>>>> I get the feeling this is just papering over an existing issue instead
>>>>>>> of actually fixing it: IOMMU page tables need to be properly freed
>>>>>>> during early failure.
>>>>>>
>>>>>> I take a different perspective: IOMMU page tables shouldn't
>>>>>> get created (yet) at all in the course of
>>>>>> XEN_DOMCTL_createdomain - this op is supposed to produce an
>>>>>> empty container for a VM.
>>>>>
>>>>> The same would apply for CPU page-tables then, yet they seem to be
>>>>> created and populating them (ie: adding the lapic access page) doesn't
>>>>> leak such entries, which points at an asymmetry. Either we setup both
>>>>> tables and handle freeing them properly, or we set none of them.
>>>>
>>>> Where would CPU page tables get created from at this early stage?
>>>
>>> When mapping the APIC page in the P2M. I don't think you can get away
>>> with removing it completely.
>>
>> It doesn't need putting in the p2m this early. It would be quite
>> fine to defer this until e.g. the first vCPU gets created.
> 
> It feels wrong to me to setup a per-domain mapping when initializing the 
> first vCPU.

Then we could do it even later. Any time up to when the domain
would first get unpaused would do.

Jan



* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:54                 ` Roger Pau Monné
@ 2021-02-10 13:12                   ` Jan Beulich
  2021-02-10 15:24                     ` Roger Pau Monné
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 13:12 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On 10.02.2021 12:54, Roger Pau Monné wrote:
> On Wed, Feb 10, 2021 at 11:48:40AM +0000, Julien Grall wrote:
>> It feels wrong to me to setup a per-domain mapping when initializing the
>> first vCPU.
>>
>> But, I was under the impression that there is plan to remove
>> XEN_DOMCTL_max_vcpus. So it would only buy just a bit of time...
> 
> I was also under that impression. We could setup the lapic access page
> at vlapic_init for the BSP (which is part of XEN_DOMCTL_max_vcpus
> ATM).
> 
> But then I think there should be some kind of check to prevent
> populating either the CPU or the IOMMU page tables at domain creation
> hypercall, and so the logic to free CPU table tables on failure could
> be removed.

I can spot paging_final_teardown() on an error path there, but I
don't suppose that's what you mean? I guess I'm not looking in
the right place (there are quite a few after all) ...

Jan



* Re: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-09 21:14     ` Julien Grall
@ 2021-02-10 14:14       ` Jan Beulich
  2021-02-10 14:58         ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 14:14 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, hongyxia, Ian Jackson, Julien Grall, Paul Durrant

On 09.02.2021 22:14, Julien Grall wrote:
> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 09 February 2021 15:28
>>>
>>> It is a bit pointless to crash a domain that is already dying. This will
>>> become more an annoyance with a follow-up change where page-table
>>> allocation will be forbidden when the domain is dying.
>>>
>>> Security wise, there is no change as the devices would still have access
>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>> start to relinquish the resources.
>>>
>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>> relinquish path).

Am I to understand this to mean that at this point of the series
things aren't really correct yet in this regard? If so, wouldn't
it be better to re-order?

>>> For Arm, there is still a small race possible. But there is so far no
>>> failure specific to a domain dying.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>
>>> ---
>>>
>>> This was spotted when trying to destroy IOREQ servers while the domain
>>> is dying. The code will try to add the entry back in the P2M and
>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>> hvm_add_ioreq_gfn()).
>>>
>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>> can't find a proper lock).

I understand the concern. I find it odd though that we permit
iommu_map() to do anything at all when the domain is already
dying. So irrespective of the remark below, how about bailing
from iommu_map() earlier when the domain is dying?
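This bail-early idea can be sketched as a toy model; the struct and function below are illustrative stand-ins, not the real Xen prototypes. Once the domain is dying, the map request is silently ignored and reported as success, so neither a page-table allocation nor domain_crash() is reached on that path:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of bailing from iommu_map() when the domain is dying. */
struct toy_domain {
    bool is_dying;
    unsigned int nr_mappings;   /* stand-in for the IOMMU page tables */
};

static int toy_iommu_map(struct toy_domain *d)
{
    if ( d->is_dying )
        return 0;               /* ignore the request, report success */

    d->nr_mappings++;           /* real code would update the tables */
    return 0;
}
```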

>>> --- a/xen/drivers/passthrough/iommu.c
>>> +++ b/xen/drivers/passthrough/iommu.c
>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>                              flush_flags) )
>>>                  continue;
>>>
>>> -        if ( !is_hardware_domain(d) )
>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>              domain_crash(d);
>>
>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
> 
> Jan also suggested moving the check in domain_crash(). However, I felt
> it is potentially a too risky change for 4.15 as there are quite a few
> callers.

This is a fair point. However, in such a case I'd prefer symmetry
at least throughout this one source file (there are three more
places), unless there are strong reasons against doing so.

Jan



* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
  2021-02-09 20:33   ` Paul Durrant
@ 2021-02-10 14:32   ` Jan Beulich
  2021-02-10 15:04     ` Julien Grall
  2021-02-10 14:44   ` Jan Beulich
  2 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 14:32 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 09.02.2021 16:28, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient when
> the domain is dying because nothing prevents page-table to be allocated.
> 
> iommu_alloc_pgtable() is now checking if the domain is dying before
> adding the page in the list. We are relying on &hd->arch.pgtables.lock
> to synchronize d->is_dying.

As said in reply to an earlier patch, I think suppressing
(really: ignoring) new mappings would be better. You could
utilize the same lock, but you'd need to duplicate the
checking in {amd,intel}_iommu_map_page().

I'm not entirely certain in the case about unmap requests:
It may be possible to also suppress/ignore them, but this
may require some further thought.

Apart from this, just in case we settle on your current
approach, a few spelling nits:

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
>  
>  void arch_iommu_domain_destroy(struct domain *d)
>  {
> +    /*
> +     * There should be not page-tables left allocated by the time the

... should be no ...

> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
> +     * called unconditionally, so pgtables may be unitialized.

uninitialized

> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>  
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid

prevents

> +     * reasons to continue to update the IOMMU page-tables while the

reason

> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.

rely

Jan



* Re: [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
  2021-02-09 20:36   ` Paul Durrant
  2021-02-10  2:21   ` Tian, Kevin
@ 2021-02-10 14:39   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 14:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

On 09.02.2021 16:28, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new per-domain IOMMU page-table allocator will now free the
> page-tables when domain's resources are relinquished. However, the root
> page-table (i.e. hd->arch.pg_maddr) will not be cleared.
> 
> Xen may access the IOMMU page-tables afterwards at least in the case of
> PV domain:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
> (XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63

So if unmap was (also) short-circuited for dying domains, this
path wouldn't be taken and all would be well even without this
change?

Jan



* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
  2021-02-09 20:33   ` Paul Durrant
  2021-02-10 14:32   ` Jan Beulich
@ 2021-02-10 14:44   ` Jan Beulich
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 14:44 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 09.02.2021 16:28, Julien Grall wrote:
> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>  
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid
> +     * reasons to continue to update the IOMMU page-tables while the
> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
> +     */
> +    if ( likely(!d->is_dying) )
> +    {
> +        alive = true;
> +        page_list_add(pg, &hd->arch.pgtables.list);
> +    }
>      spin_unlock(&hd->arch.pgtables.lock);
>  
> +    if ( unlikely(!alive) )
> +    {
> +        free_domheap_page(pg);
> +        pg = NULL;
> +    }
> +
>      return pg;
>  }

There's a pretty clear downside to this approach compared to that
of ignoring maps (and perhaps also unmaps) for dying domains: The
caller here will (hopefully) recognize and propagate an error.

Additionally (considering the situation patch 5 fixes) ignoring
unmaps may provide quite a bit of a performance gain for domain
destruction - we don't need every individual page unmapped from
the page tables.
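The behaviour being contrasted can be reduced to a toy version of the quoted hunk; every name here is illustrative, and the locking is shown only as comments (the real code serializes on hd->arch.pgtables.lock):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patch's allocator guard: the page is only tracked
 * while !d->is_dying, with the check made under the same lock the
 * relinquish path takes. */
struct toy_domain {
    bool is_dying;
    unsigned int nr_pgtables;   /* stand-in for hd->arch.pgtables.list */
};

/* Returns true on success, false when the domain is dying -- mirroring
 * the real function returning NULL so the caller sees a failure. */
static bool toy_alloc_pgtable(struct toy_domain *d)
{
    bool alive = false;

    /* spin_lock(&hd->arch.pgtables.lock); */
    if ( !d->is_dying )
    {
        alive = true;
        d->nr_pgtables++;       /* page_list_add(pg, ...) in the patch */
    }
    /* spin_unlock(&hd->arch.pgtables.lock); */

    /* Real code: if ( !alive ) { free_domheap_page(pg); pg = NULL; } */
    return alive;
}
```

In the real code the caller sees NULL and propagates an error; that visible failure is exactly what distinguishes this approach from silently ignoring maps for dying domains.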

Jan



* Re: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-10 14:14       ` Jan Beulich
@ 2021-02-10 14:58         ` Julien Grall
  2021-02-10 15:56           ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-10 14:58 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: xen-devel, hongyxia, Ian Jackson, Julien Grall, Paul Durrant

Hi Jan,

On 10/02/2021 14:14, Jan Beulich wrote:
> On 09.02.2021 22:14, Julien Grall wrote:
>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 09 February 2021 15:28
>>>>
>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>> become more an annoyance with a follow-up change where page-table
>>>> allocation will be forbidden when the domain is dying.
>>>>
>>>> Security wise, there is no change as the devices would still have access
>>>> to the IOMMU page-tables even if the domain has crashed until Xen
>>>> start to relinquish the resources.
>>>>
>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>> d->is_dying is correctly observed (a follow-up patch will held it in the
>>>> relinquish path).
> 
> Am I to understand this to mean that at this point of the series
> things aren't really correct yet in this regard? If so, wouldn't
> it be better to re-order?

You asked for this specific order... So are you saying you want me to 
use the original ordering?

> 
>>>> For Arm, there is still a small race possible. But there is so far no
>>>> failure specific to a domain dying.
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>
>>>> ---
>>>>
>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>> is dying. The code will try to add the entry back in the P2M and
>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>> hvm_add_ioreq_gfn()).
>>>>
>>>> It should be possible to skip the mappin in hvm_add_ioreq_gfn(), however
>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>> can't find a proper lock).
> 
> I understand the concern. I find it odd though that we permit
> iommu_map() to do anything at all when the domain is already
> dying. So irrespective of the remark below, how about bailing
> from iommu_map() earlier when the domain is dying?

I felt this was potentially too racy to use. But it should be fine if 
we keep the !d->is_dying check below.

> 
>>>> --- a/xen/drivers/passthrough/iommu.c
>>>> +++ b/xen/drivers/passthrough/iommu.c
>>>> @@ -272,7 +272,7 @@ int iommu_map(struct domain *d, dfn_t dfn, mfn_t mfn,
>>>>                               flush_flags) )
>>>>                   continue;
>>>>
>>>> -        if ( !is_hardware_domain(d) )
>>>> +        if ( !is_hardware_domain(d) && !d->is_dying )
>>>>               domain_crash(d);
>>>
>>> Would it make more sense to check is_dying inside domain_crash() (and turn it into a no-op in that case)?
>>
>> Jan also suggested moving the check in domain_crash(). However, I felt
>> it is potentially a too risky change for 4.15 as there are quite a few
>> callers.
> 
> This is a fair point. However, in such a case I'd prefer symmetry
> at least throughout this one source file (there are three more
> places), unless there are strong reasons against doing so.

I can have a look and see if the decision is easy to make.

Cheers,

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-10 14:32   ` Jan Beulich
@ 2021-02-10 15:04     ` Julien Grall
  2021-02-10 16:12       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-10 15:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel



On 10/02/2021 14:32, Jan Beulich wrote:
> On 09.02.2021 16:28, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new IOMMU page-tables allocator will release the pages when
>> relinquish the domain resources. However, this is not sufficient when
>> the domain is dying because nothing prevents page-table to be allocated.
>>
>> iommu_alloc_pgtable() is now checking if the domain is dying before
>> adding the page in the list. We are relying on &hd->arch.pgtables.lock
>> to synchronize d->is_dying.
> 
> As said in reply to an earlier patch, I think suppressing
> (really: ignoring) new mappings would be better.

This is exactly what I suggested in v1 but you wrote:

"Ignoring requests there seems fragile to me. Paul - what are your
thoughts about bailing early from hvm_add_ioreq_gfn() when the
domain is dying?"

Are you now saying that the following snippet would be fine:

if ( d->is_dying )
   return 0;

> You could
> utilize the same lock, but you'd need to duplicate the
> checking in {amd,intel}_iommu_map_page().
> 
> I'm not entirely certain in the case about unmap requests:
> It may be possible to also suppress/ignore them, but this
> may require some further thought.

I think the unmap part is quite risky to tie to d->is_dying because the 
PCI devices may not be quiesced and may still be assigned to the domain.

> 
> Apart from this, just in case we settle on your current
> approach, a few spelling nits:
> 
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
>>   
>>   void arch_iommu_domain_destroy(struct domain *d)
>>   {
>> +    /*
>> +     * There should be not page-tables left allocated by the time the
> 
> ... should be no ...
> 
>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>> +     * called unconditionally, so pgtables may be unitialized.
> 
> uninitialized
> 
>> @@ -303,9 +317,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>       unmap_domain_page(p);
>>   
>>       spin_lock(&hd->arch.pgtables.lock);
>> -    page_list_add(pg, &hd->arch.pgtables.list);
>> +    /*
>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>> +     * nothing prevent allocation to happen afterwards. There is no valid
> 
> prevents
> 
>> +     * reasons to continue to update the IOMMU page-tables while the
> 
> reason
> 
>> +     * domain is dying.
>> +     *
>> +     * So prevent page-table allocation when the domain is dying.
>> +     *
>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
> 
> rely
> 
> Jan
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 13:12                   ` Jan Beulich
@ 2021-02-10 15:24                     ` Roger Pau Monné
  2021-02-10 15:53                       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Roger Pau Monné @ 2021-02-10 15:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On Wed, Feb 10, 2021 at 02:12:38PM +0100, Jan Beulich wrote:
> On 10.02.2021 12:54, Roger Pau Monné wrote:
> > On Wed, Feb 10, 2021 at 11:48:40AM +0000, Julien Grall wrote:
> >> It feels wrong to me to setup a per-domain mapping when initializing the
> >> first vCPU.
> >>
> >> But, I was under the impression that there is plan to remove
> >> XEN_DOMCTL_max_vcpus. So it would only buy just a bit of time...
> > 
> > I was also under that impression. We could setup the lapic access page
> > at vlapic_init for the BSP (which is part of XEN_DOMCTL_max_vcpus
> > ATM).
> > 
> > But then I think there should be some kind of check to prevent
> > populating either the CPU or the IOMMU page tables at domain creation
hypercall, and so the logic to free CPU page tables on failure could
> > be removed.
> 
> I can spot paging_final_teardown() on an error path there, but I
> don't suppose that's what you mean? I guess I'm not looking in
> the right place (there are quite a few after all) ...

Well, I assume some freeing of the EPT page tables must happen on
error paths, or else we would be leaking them like IOMMU tables are
leaked currently?

Maybe I'm not correctly understanding the issue here.

Roger.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 15:24                     ` Roger Pau Monné
@ 2021-02-10 15:53                       ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 15:53 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, hongyxia, iwj, Julien Grall, Andrew Cooper, Wei Liu,
	Julien Grall

On 10.02.2021 16:24, Roger Pau Monné wrote:
> On Wed, Feb 10, 2021 at 02:12:38PM +0100, Jan Beulich wrote:
>> On 10.02.2021 12:54, Roger Pau Monné wrote:
>>> On Wed, Feb 10, 2021 at 11:48:40AM +0000, Julien Grall wrote:
>>>> It feels wrong to me to setup a per-domain mapping when initializing the
>>>> first vCPU.
>>>>
>>>> But, I was under the impression that there is plan to remove
>>>> XEN_DOMCTL_max_vcpus. So it would only buy just a bit of time...
>>>
>>> I was also under that impression. We could setup the lapic access page
>>> at vlapic_init for the BSP (which is part of XEN_DOMCTL_max_vcpus
>>> ATM).
>>>
>>> But then I think there should be some kind of check to prevent
>>> populating either the CPU or the IOMMU page tables at domain creation
>>> hypercall, and so the logic to free CPU page tables on failure could
>>> be removed.
>>
>> I can spot paging_final_teardown() on an error path there, but I
>> don't suppose that's what you mean? I guess I'm not looking in
>> the right place (there are quite a few after all) ...
> 
> Well, I assume some freeing of the EPT page tables must happen on
> error paths, or else we would be leaking them like IOMMU tables are
> leaked currently?

Well, you can't eliminate paging_final_teardown() from that
error path because it frees internal structures. It _also_
sets HAP's / shadow's allocation to zero, so has the side
effect of freeing what may have been CPU page tables.

Jan


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying
  2021-02-10 14:58         ` Julien Grall
@ 2021-02-10 15:56           ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 15:56 UTC (permalink / raw)
  To: Julien Grall, Julien Grall
  Cc: xen-devel, hongyxia, Ian Jackson, Julien Grall, Paul Durrant

On 10.02.2021 15:58, Julien Grall wrote:
> Hi Jan,
> 
> On 10/02/2021 14:14, Jan Beulich wrote:
>> On 09.02.2021 22:14, Julien Grall wrote:
>>> On Tue, 9 Feb 2021 at 20:28, Paul Durrant <xadimgnik@gmail.com> wrote:
>>>>> From: Julien Grall <julien@xen.org>
>>>>> Sent: 09 February 2021 15:28
>>>>>
>>>>> It is a bit pointless to crash a domain that is already dying. This will
>>>>> become more of an annoyance with a follow-up change where page-table
>>>>> allocation will be forbidden when the domain is dying.
>>>>>
>>>>> Security-wise, there is no change as the devices would still have access
>>>>> to the IOMMU page-tables even if the domain has crashed, until Xen
>>>>> starts to relinquish the resources.
>>>>>
>>>>> For x86, we rely on dom_iommu(d)->arch.mapping.lock to ensure
>>>>> d->is_dying is correctly observed (a follow-up patch will hold it in the
>>>>> relinquish path).
>>
>> Am I to understand this to mean that at this point of the series
>> things aren't really correct yet in this regard? If so, wouldn't
>> it be better to re-order?
> 
You asked for this specific order... So are you saying you want me to use 
> the original ordering?

Well, it's been a while and I don't recall the specific reason
for the request. But then at least the spin_barrier() you mean
to rely on could / should be moved here?
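To illustrate for anyone following along, the pattern in question - set
d->is_dying, then use the lock as a barrier so in-flight allocations drain
before freeing - can be sketched in plain C. This is purely a hypothetical
stand-in (toy names, a userspace mutex instead of Xen's spinlock and
spin_barrier()), not the actual Xen code:

```c
#include <stdbool.h>
#include <pthread.h>

/* Toy stand-ins for struct domain and hd->arch.pgtables; not Xen code. */
struct toy_domain {
    bool is_dying;
    pthread_mutex_t pgtables_lock;
    int nr_pgtables;            /* stands in for the tracked page list */
};

/* Allocation path: only track a new page-table while the domain is live. */
static int toy_alloc_pgtable(struct toy_domain *d)
{
    int rc = 0;

    pthread_mutex_lock(&d->pgtables_lock);
    if ( d->is_dying )
        rc = -1;                /* refuse: relinquish may already have run */
    else
        d->nr_pgtables++;       /* page goes on the tracked list */
    pthread_mutex_unlock(&d->pgtables_lock);

    return rc;
}

/* Teardown path: set is_dying, then barrier on the same lock. */
static int toy_relinquish_pgtables(struct toy_domain *d)
{
    int freed;

    d->is_dying = true;

    /*
     * Equivalent of spin_barrier(): once the lock has been taken and
     * dropped, every allocation that observed !is_dying has completed,
     * so the list is stable and can be freed safely.
     */
    pthread_mutex_lock(&d->pgtables_lock);
    pthread_mutex_unlock(&d->pgtables_lock);

    freed = d->nr_pgtables;
    d->nr_pgtables = 0;
    return freed;
}
```

The barrier is the whole point: without it, an allocation that sampled
!is_dying just before the flag was set could still be adding to the list
while the teardown path frees it.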

>>>>> For Arm, there is still a small race possible. But there is so far no
>>>>> failure specific to a domain dying.
>>>>>
>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> ---
>>>>>
>>>>> This was spotted when trying to destroy IOREQ servers while the domain
>>>>> is dying. The code will try to add the entry back in the P2M and
>>>>> therefore update the P2M (see arch_ioreq_server_disable() ->
>>>>> hvm_add_ioreq_gfn()).
>>>>>
>>>>> It should be possible to skip the mapping in hvm_add_ioreq_gfn(), however
>>>>> I didn't try a patch yet because checking d->is_dying can be racy (I
>>>>> can't find a proper lock).
>>
>> I understand the concern. I find it odd though that we permit
>> iommu_map() to do anything at all when the domain is already
>> dying. So irrespective of the remark below, how about bailing
>> from iommu_map() earlier when the domain is dying?
> 
> I felt this was potentially too racy to use. But it should be fine if we
> keep the !d->is_dying check below.

Why? As per later comments I didn't necessarily mean iommu_map()
literally - as indicated, the per-vendor functions ought to be a
suitable place for the check, right after having taken the lock.

Jan


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-10 15:04     ` Julien Grall
@ 2021-02-10 16:12       ` Jan Beulich
  2021-02-17 11:49         ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-10 16:12 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 10.02.2021 16:04, Julien Grall wrote:
> On 10/02/2021 14:32, Jan Beulich wrote:
>> On 09.02.2021 16:28, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The new IOMMU page-tables allocator will release the pages when
>>> relinquishing the domain resources. However, this is not sufficient when
>>> the domain is dying because nothing prevents page-tables from being allocated.
>>>
>>> iommu_alloc_pgtable() is now checking if the domain is dying before
>>> adding the page in the list. We are relying on &hd->arch.pgtables.lock
>>> to synchronize d->is_dying.
>>
>> As said in reply to an earlier patch, I think suppressing
>> (really: ignoring) new mappings would be better.
> 
> This is exactly what I suggested in v1 but you wrote:
> 
> "Ignoring requests there seems fragile to me. Paul - what are your
> thoughts about bailing early from hvm_add_ioreq_gfn() when the
> domain is dying?"

Was this on the thread of this patch? I didn't find such a
reply of mine. I need more context here because you name
hvm_add_ioreq_gfn() above, while I refer to iommu_map()
(and downwards the call stack).

> Are you now saying that the following snippet would be fine:
> 
> if ( d->is_dying )
>    return 0;

In {amd,intel}_iommu_map_page(), after the lock was acquired
and with it suitably released, yes. And if that's what you
suggested, then I'm sorry - I don't think I can see anything
fragile there anymore.
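Just so we are talking about the same shape of change, here is roughly what
I have in mind - purely a sketch with made-up names (and a userspace mutex
standing in for the mapping lock), not the actual {amd,intel}_iommu_map_page()
code:

```c
#include <stdbool.h>
#include <pthread.h>

/* Toy stand-in for struct domain; not Xen code. */
struct toy_domain {
    bool is_dying;
    pthread_mutex_t mapping_lock;   /* stands in for the IOMMU mapping lock */
    int nr_mappings;
};

/*
 * Stand-in for a per-vendor map hook: take the lock first, then check
 * is_dying, so the check is synchronized against a teardown path that
 * holds the same lock while relinquishing the page-tables.
 */
static int toy_iommu_map_page(struct toy_domain *d)
{
    pthread_mutex_lock(&d->mapping_lock);

    if ( d->is_dying )
    {
        /* Ignore the request rather than fail: the domain is going away. */
        pthread_mutex_unlock(&d->mapping_lock);
        return 0;
    }

    d->nr_mappings++;               /* would update the page-tables here */
    pthread_mutex_unlock(&d->mapping_lock);

    return 0;
}
```

i.e. a silently ignored request, not an error, so callers such as the IOREQ
teardown path don't need any special casing.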

>> You could
>> utilize the same lock, but you'd need to duplicate the
>> checking in {amd,intel}_iommu_map_page().
>>
>> I'm not entirely certain in the case about unmap requests:
>> It may be possible to also suppress/ignore them, but this
>> may require some further thought.
> 
> I think checking d->is_dying on the unmap part is quite risky because the
> PCI devices may not be quiesced and may still be assigned to the domain.

Hmm, yes, good point. Of course upon first unmap with is_dying
observed set we could zap the root page table, but I don't
suppose that's something we want to do for 4.15.
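(For completeness, "zapping" would amount to something like the sketch below.
Hypothetical names again, and the real change would additionally have to
clear the device-table / context entries and flush the IOTLB before the
page-tables could be considered unreachable.)

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-domain IOMMU state; not Xen code. */
struct toy_domain {
    bool is_dying;
    void *root_pgtable;   /* stands in for the vendor root table pointer */
};

/*
 * On the first unmap observed after is_dying is set, drop the root
 * page-table pointer so the device can no longer reach any mapping,
 * and treat all further unmaps as no-ops.
 */
static bool toy_iommu_unmap(struct toy_domain *d)
{
    if ( d->is_dying )
    {
        d->root_pgtable = NULL;   /* real code: clear DTE/context + flush */
        return false;             /* nothing left to unmap */
    }

    /* The normal unmap path would walk d->root_pgtable here. */
    return d->root_pgtable != NULL;
}
```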

Jan


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-10 11:26   ` Jan Beulich
@ 2021-02-15 11:38     ` Julien Grall
  2021-02-15 12:36       ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-15 11:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel

Hi Jan,

On 10/02/2021 11:26, Jan Beulich wrote:
> On 09.02.2021 16:28, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, the IOMMU page-tables will be populated early in the domain
>> creation if the hardware is able to virtualize the local APIC. However,
>> the IOMMU page tables will not be freed during early failure and will
>> result in a leak.
>>
>> An assigned device should not need to DMA into the vLAPIC page, so we
>> can avoid mapping the page in the IOMMU page-tables.
> 
> Here and below, may I ask that you use the correct term "APIC
> access page", as there are other pages involved in vLAPIC
> handling (in particular the virtual APIC page, which is where
> accesses actually go to that translate to the APIC access page
> in EPT).
> 
>> This statement is also true for any special pages (the vLAPIC page is
>> one of them). So take the opportunity to prevent the mapping for all
>> of them.
> 
> I probably should have realized this earlier, but there is a
> downside to this: A guest wanting to core dump itself may want
> to dump e.g. shared info and vcpu info pages. Hence ...
> 
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>>   {
>>       unsigned int flags;
>>   
>> +    /* Don't map special pages in the IOMMU page-tables. */
>> +    if ( mfn_valid(mfn) && is_special_page(mfn_to_page(mfn)) )
>> +        return 0;
> 
> ... instead of is_special_page() I think you want to limit the
> check here to seeing whether PGC_extra is set.
> 
> But as said on irc, since this crude way of setting up the APIC
> access page is now firmly a problem, I intend to try to redo it.

Given this series needs to go in 4.15 (we would introduce a 0-day 
otherwise), could you clarify whether your patch [1] is intended to 
replace this one in 4.15?

Cheers,

[1] <1b6a411b-84e7-bfb1-647e-511a13df838c@suse.com>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-15 11:38     ` Julien Grall
@ 2021-02-15 12:36       ` Jan Beulich
  2021-02-15 12:54         ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-15 12:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel

On 15.02.2021 12:38, Julien Grall wrote:
> On 10/02/2021 11:26, Jan Beulich wrote:
>> On 09.02.2021 16:28, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> Currently, the IOMMU page-tables will be populated early in the domain
>>> creation if the hardware is able to virtualize the local APIC. However,
>>> the IOMMU page tables will not be freed during early failure and will
>>> result in a leak.
>>>
>>> An assigned device should not need to DMA into the vLAPIC page, so we
>>> can avoid mapping the page in the IOMMU page-tables.
>>
>> Here and below, may I ask that you use the correct term "APIC
>> access page", as there are other pages involved in vLAPIC
>> handling (in particular the virtual APIC page, which is where
>> accesses actually go to that translate to the APIC access page
>> in EPT).
>>
>>> This statement is also true for any special pages (the vLAPIC page is
>>> one of them). So take the opportunity to prevent the mapping for all
>>> of them.
>>
>> I probably should have realized this earlier, but there is a
>> downside to this: A guest wanting to core dump itself may want
>> to dump e.g. shared info and vcpu info pages. Hence ...
>>
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>>>   {
>>>       unsigned int flags;
>>>   
>>> +    /* Don't map special pages in the IOMMU page-tables. */
>>> +    if ( mfn_valid(mfn) && is_special_page(mfn_to_page(mfn)) )
>>> +        return 0;
>>
>> ... instead of is_special_page() I think you want to limit the
>> check here to seeing whether PGC_extra is set.
>>
>> But as said on irc, since this crude way of setting up the APIC
>> access page is now firmly a problem, I intend to try to redo it.
> 
> Given this series needs to go in 4.15 (we would introduce a 0-day 
> otherwise), could you clarify whether your patch [1] is intended to 
> replace this one in 4.15?

Yes, that or a cut down variant (simply moving the invocation of
set_mmio_p2m_entry()). All the more so given that the controversy there
continued regarding the adjustment to p2m_get_iommu_flags(). I
did indicate there that I've dropped it for v2.

> [1] <1b6a411b-84e7-bfb1-647e-511a13df838c@suse.com>

Given the context I was able to guess what mail you refer to, but
I would very much like to ask you and anyone else to provide links
rather than mail IDs as references. Not every mail UI allows
looking up by ID.

Jan


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-15 12:36       ` Jan Beulich
@ 2021-02-15 12:54         ` Julien Grall
  2021-02-15 13:14           ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-15 12:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel

Hi Jan,

On 15/02/2021 12:36, Jan Beulich wrote:
> On 15.02.2021 12:38, Julien Grall wrote:
>> On 10/02/2021 11:26, Jan Beulich wrote:
>>> On 09.02.2021 16:28, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> Currently, the IOMMU page-tables will be populated early in the domain
>>>> creation if the hardware is able to virtualize the local APIC. However,
>>>> the IOMMU page tables will not be freed during early failure and will
>>>> result in a leak.
>>>>
>>>> An assigned device should not need to DMA into the vLAPIC page, so we
>>>> can avoid mapping the page in the IOMMU page-tables.
>>>
>>> Here and below, may I ask that you use the correct term "APIC
>>> access page", as there are other pages involved in vLAPIC
>>> handling (in particular the virtual APIC page, which is where
>>> accesses actually go to that translate to the APIC access page
>>> in EPT).
>>>
>>>> This statement is also true for any special pages (the vLAPIC page is
>>>> one of them). So take the opportunity to prevent the mapping for all
>>>> of them.
>>>
>>> I probably should have realized this earlier, but there is a
>>> downside to this: A guest wanting to core dump itself may want
>>> to dump e.g. shared info and vcpu info pages. Hence ...
>>>
>>>> --- a/xen/include/asm-x86/p2m.h
>>>> +++ b/xen/include/asm-x86/p2m.h
>>>> @@ -919,6 +919,10 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
>>>>    {
>>>>        unsigned int flags;
>>>>    
>>>> +    /* Don't map special pages in the IOMMU page-tables. */
>>>> +    if ( mfn_valid(mfn) && is_special_page(mfn_to_page(mfn)) )
>>>> +        return 0;
>>>
>>> ... instead of is_special_page() I think you want to limit the
>>> check here to seeing whether PGC_extra is set.
>>>
>>> But as said on irc, since this crude way of setting up the APIC
>>> access page is now firmly a problem, I intend to try to redo it.
>>
>> Given this series needs to go in 4.15 (we would introduce a 0-day
>> otherwise), could you clarify whether your patch [1] is intended to
>> replace this one in 4.15?
> 
> Yes, that or a cut down variant (simply moving the invocation of
> set_mmio_p2m_entry()). All the more so given that the controversy there
> continued regarding the adjustment to p2m_get_iommu_flags(). I
> did indicate there that I've dropped it for v2.

Do you have a link to v2? I would like to try with my series.

> 
>> [1] <1b6a411b-84e7-bfb1-647e-511a13df838c@suse.com>
> 
> Given the context I was able to guess what mail you refer to, but
> I would very much like to ask you and anyone else to provide links
> rather than mail IDs as references. Not every mail UI allows
> looking up by ID.
It is pretty trivial nowadays to search for a message by ID online:

https://lore.kernel.org/xen-devel/<message-id>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-15 12:54         ` Julien Grall
@ 2021-02-15 13:14           ` Jan Beulich
  2021-02-17 11:21             ` Julien Grall
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Beulich @ 2021-02-15 13:14 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel

On 15.02.2021 13:54, Julien Grall wrote:
> On 15/02/2021 12:36, Jan Beulich wrote:
>> On 15.02.2021 12:38, Julien Grall wrote:
>>> Given this series needs to go in 4.15 (we would introduce a 0-day
>>> otherwise), could you clarify whether your patch [1] is intended to
>>> replace this one in 4.15?
>>
>> Yes, that or a cut down variant (simply moving the invocation of
>> set_mmio_p2m_entry()). The more that there the controversy
>> continued regarding the adjustment to p2m_get_iommu_flags(). I
>> did indicate there that I've dropped it for v2.
> 
> Do you have a link to v2? I would like to try with my series.

I didn't post it yet, as I didn't consider the v1 discussion
settled so far. The intermediate version I have at present is
below.

Jan

VMX: use a single, global APIC access page

The address of this page is used by the CPU only to recognize when to
access the virtual APIC page instead. No accesses would ever go to this
page. It only needs to be present in the (CPU) page tables so that
address translation will produce its address as result for respective
accesses.

By making this page global, we also eliminate the need to refcount it,
or to assign it to any domain in the first place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Avoid insertion when !has_vlapic(). Split off change to
    p2m_get_iommu_flags().
---
Hooking p2m insertion onto arch_domain_creation_finished() isn't very
nice, but I couldn't find any better hook (nor a good place where to
introduce a new one). In particular there look to be no hvm_funcs hooks
being used on a domain-wide basis (except for init/destroy of course).
I did consider connecting this to the setting of HVM_PARAM_IDENT_PT, but
considered this no better, the more that the tool stack could be smarter
and avoid setting that param when not needed.

I did further consider not allocating any real page at all, but just
using the address of some unpopulated space (which would require
announcing this page as reserved to Dom0, so it wouldn't put any PCI
MMIO BARs there). But I thought this would be too controversial, because
of the possible risks associated with this.

--- unstable.orig/xen/arch/x86/domain.c
+++ unstable/xen/arch/x86/domain.c
@@ -1005,6 +1005,8 @@ int arch_domain_soft_reset(struct domain
 
 void arch_domain_creation_finished(struct domain *d)
 {
+    if ( is_hvm_domain(d) )
+        hvm_domain_creation_finished(d);
 }
 
 #define xen_vcpu_guest_context vcpu_guest_context
--- unstable.orig/xen/arch/x86/hvm/vmx/vmx.c
+++ unstable/xen/arch/x86/hvm/vmx/vmx.c
@@ -66,8 +66,7 @@ boolean_param("force-ept", opt_force_ept
 static void vmx_ctxt_switch_from(struct vcpu *v);
 static void vmx_ctxt_switch_to(struct vcpu *v);
 
-static int  vmx_alloc_vlapic_mapping(struct domain *d);
-static void vmx_free_vlapic_mapping(struct domain *d);
+static int alloc_vlapic_mapping(void);
 static void vmx_install_vlapic_mapping(struct vcpu *v);
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
                                 unsigned int flags);
@@ -78,6 +77,8 @@ static int vmx_msr_read_intercept(unsign
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg(struct vcpu *v, unsigned long linear);
 
+static mfn_t __read_mostly apic_access_mfn;
+
 /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */
 #define PI_CSW_FROM (1u << 0)
 #define PI_CSW_TO   (1u << 1)
@@ -401,7 +402,6 @@ static int vmx_domain_initialise(struct
         .to   = vmx_ctxt_switch_to,
         .tail = vmx_do_resume,
     };
-    int rc;
 
     d->arch.ctxt_switch = &csw;
 
@@ -411,21 +411,16 @@ static int vmx_domain_initialise(struct
      */
     d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp;
 
-    if ( !has_vlapic(d) )
-        return 0;
-
-    if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-        return rc;
-
     return 0;
 }
 
-static void vmx_domain_relinquish_resources(struct domain *d)
+static void domain_creation_finished(struct domain *d)
 {
-    if ( !has_vlapic(d) )
-        return;
 
-    vmx_free_vlapic_mapping(d);
+    if ( has_vlapic(d) && !mfn_eq(apic_access_mfn, _mfn(0)) &&
+         set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE),
+                            apic_access_mfn, PAGE_ORDER_4K) )
+        domain_crash(d);
 }
 
 static int vmx_vcpu_initialise(struct vcpu *v)
@@ -2264,7 +2259,7 @@ static struct hvm_function_table __initd
     .cpu_up_prepare       = vmx_cpu_up_prepare,
     .cpu_dead             = vmx_cpu_dead,
     .domain_initialise    = vmx_domain_initialise,
-    .domain_relinquish_resources = vmx_domain_relinquish_resources,
+    .domain_creation_finished = domain_creation_finished,
     .vcpu_initialise      = vmx_vcpu_initialise,
     .vcpu_destroy         = vmx_vcpu_destroy,
     .save_cpu_ctxt        = vmx_save_vmcs_ctxt,
@@ -2503,7 +2498,7 @@ const struct hvm_function_table * __init
 {
     set_in_cr4(X86_CR4_VMXE);
 
-    if ( vmx_vmcs_init() )
+    if ( vmx_vmcs_init() || alloc_vlapic_mapping() )
     {
         printk("VMX: failed to initialise.\n");
         return NULL;
@@ -3064,7 +3059,7 @@ gp_fault:
     return X86EMUL_EXCEPTION;
 }
 
-static int vmx_alloc_vlapic_mapping(struct domain *d)
+static int __init alloc_vlapic_mapping(void)
 {
     struct page_info *pg;
     mfn_t mfn;
@@ -3072,53 +3067,28 @@ static int vmx_alloc_vlapic_mapping(stru
     if ( !cpu_has_vmx_virtualize_apic_accesses )
         return 0;
 
-    pg = alloc_domheap_page(d, MEMF_no_refcount);
+    pg = alloc_domheap_page(NULL, 0);
     if ( !pg )
         return -ENOMEM;
 
-    if ( !get_page_and_type(pg, d, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(d);
-        return -ENODATA;
-    }
-
     mfn = page_to_mfn(pg);
     clear_domain_page(mfn);
-    d->arch.hvm.vmx.apic_access_mfn = mfn;
+    apic_access_mfn = mfn;
 
-    return set_mmio_p2m_entry(d, gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE), mfn,
-                              PAGE_ORDER_4K);
-}
-
-static void vmx_free_vlapic_mapping(struct domain *d)
-{
-    mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
-
-    d->arch.hvm.vmx.apic_access_mfn = _mfn(0);
-    if ( !mfn_eq(mfn, _mfn(0)) )
-    {
-        struct page_info *pg = mfn_to_page(mfn);
-
-        put_page_alloc_ref(pg);
-        put_page_and_type(pg);
-    }
+    return 0;
 }
 
 static void vmx_install_vlapic_mapping(struct vcpu *v)
 {
     paddr_t virt_page_ma, apic_page_ma;
 
-    if ( mfn_eq(v->domain->arch.hvm.vmx.apic_access_mfn, _mfn(0)) )
+    if ( !has_vlapic(v->domain) || mfn_eq(apic_access_mfn, _mfn(0)) )
         return;
 
     ASSERT(cpu_has_vmx_virtualize_apic_accesses);
 
     virt_page_ma = page_to_maddr(vcpu_vlapic(v)->regs_page);
-    apic_page_ma = mfn_to_maddr(v->domain->arch.hvm.vmx.apic_access_mfn);
+    apic_page_ma = mfn_to_maddr(apic_access_mfn);
 
     vmx_vmcs_enter(v);
     __vmwrite(VIRTUAL_APIC_PAGE_ADDR, virt_page_ma);
--- unstable.orig/xen/include/asm-x86/hvm/hvm.h
+++ unstable/xen/include/asm-x86/hvm/hvm.h
@@ -106,6 +106,7 @@ struct hvm_function_table {
      * Initialise/destroy HVM domain/vcpu resources
      */
     int  (*domain_initialise)(struct domain *d);
+    void (*domain_creation_finished)(struct domain *d);
     void (*domain_relinquish_resources)(struct domain *d);
     void (*domain_destroy)(struct domain *d);
     int  (*vcpu_initialise)(struct vcpu *v);
@@ -383,6 +384,12 @@ static inline bool hvm_has_set_descripto
     return hvm_funcs.set_descriptor_access_exiting;
 }
 
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    if ( hvm_funcs.domain_creation_finished )
+        alternative_vcall(hvm_funcs.domain_creation_finished, d);
+}
+
 static inline int
 hvm_guest_x86_mode(struct vcpu *v)
 {
@@ -715,6 +722,11 @@ static inline void hvm_invlpg(const stru
 {
     ASSERT_UNREACHABLE();
 }
+
+static inline void hvm_domain_creation_finished(struct domain *d)
+{
+    ASSERT_UNREACHABLE();
+}
 
 /*
  * Shadow code needs further cleanup to eliminate some HVM-only paths. For
--- unstable.orig/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ unstable/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -58,7 +58,6 @@ struct ept_data {
 #define _VMX_DOMAIN_PML_ENABLED    0
 #define VMX_DOMAIN_PML_ENABLED     (1ul << _VMX_DOMAIN_PML_ENABLED)
 struct vmx_domain {
-    mfn_t apic_access_mfn;
     /* VMX_DOMAIN_* */
     unsigned int status;
 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables
  2021-02-15 13:14           ` Jan Beulich
@ 2021-02-17 11:21             ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-17 11:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel



On 15/02/2021 13:14, Jan Beulich wrote:
> On 15.02.2021 13:54, Julien Grall wrote:
>> On 15/02/2021 12:36, Jan Beulich wrote:
>>> On 15.02.2021 12:38, Julien Grall wrote:
>>>> Given this series needs to go in 4.15 (we would introduce a 0-day
>>>> otherwise), could you clarify whether your patch [1] is intended to
>>>> replace this one in 4.15?
>>>
>>> Yes, that or a cut down variant (simply moving the invocation of
>>> set_mmio_p2m_entry()). All the more so given that the controversy there
>>> continued regarding the adjustment to p2m_get_iommu_flags(). I
>>> did indicate there that I've dropped it for v2.
>>
>> Do you have a link to v2? I would like to try with my series.
> 
> I didn't post it yet, as I didn't consider the v1 discussion
> settled so far. The intermediate version I have at present is
> below.

Thanks! I will drop patch #1 from my series and resend it.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down
  2021-02-09 20:22   ` Paul Durrant
@ 2021-02-17 11:25     ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-17 11:25 UTC (permalink / raw)
  To: paul, xen-devel
  Cc: hongyxia, iwj, 'Julien Grall', 'Jan Beulich'

Hi Paul,

On 09/02/2021 20:22, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 09 February 2021 15:28
>> To: xen-devel@lists.xenproject.org
>> Cc: hongyxia@amazon.co.uk; iwj@xenproject.org; Julien Grall <jgrall@amazon.com>; Jan Beulich
>> <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
>> Subject: [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> is_iommu_enabled() will return true even if the IOMMU has not been
>> initialized (e.g. the ops are not set).
>>
>> In the case of an early failure in arch_domain_init(), the function
>> iommu_destroy_domain() will be called even if the IOMMU is not
>> initialized.
>>
>> This will result in dereferencing the ops, which will be NULL, and a
>> host crash.
>>
>> Fix the issue by checking that ops has been set before accessing it.
>>
>> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Reviewed-by: Paul Durrant <paul@xen.org>

Thanks! Ian gave his Release-Acked-by so I will commit this patch now.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teadorwn
  2021-02-09 16:47 ` [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teadorwn Ian Jackson
@ 2021-02-17 11:33   ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-17 11:33 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Paul Durrant, Kevin Tian

Hi Ian,

On 09/02/2021 16:47, Ian Jackson wrote:
> Julien Grall writes ("[for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teadorwn"):
>> From: Julien Grall <jgrall@amazon.com>
> ...
>> This series is a collection of bug fixes for the IOMMU teardown code.
>> All of them are candidate for 4.15 as they can either leak memory or
>> lead to host crash/host corruption.
>>
>> This is sent directly on xen-devel because all the issues were either
>> introduced in 4.15 or happen in the domain creation code.
> 
> I think by current freeze rules this does not need a release-ack but
> for the avoidance of doubt
> 
> Release-Acked-by: Ian Jackson <iwj@xenproject.org>

Thanks!

> 
> assuming it's committed by the end of the week.

I saw you extended the freeze deadline by a week. So I will assume that I 
have until end of this week (19th February) to commit it.

Please let me know if I misunderstood the extension.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-10 16:12       ` Jan Beulich
@ 2021-02-17 11:49         ` Julien Grall
  2021-02-17 12:57           ` Jan Beulich
  0 siblings, 1 reply; 47+ messages in thread
From: Julien Grall @ 2021-02-17 11:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

Hi Jan,

On 10/02/2021 16:12, Jan Beulich wrote:
> On 10.02.2021 16:04, Julien Grall wrote:
>> On 10/02/2021 14:32, Jan Beulich wrote:
>>> On 09.02.2021 16:28, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new IOMMU page-table allocator will release the pages when the
>>>> domain's resources are relinquished. However, this is not sufficient
>>>> when the domain is dying because nothing prevents page-tables from
>>>> being allocated.
>>>>
>>>> iommu_alloc_pgtable() now checks whether the domain is dying before
>>>> adding the page to the list. We rely on &hd->arch.pgtables.lock
>>>> to synchronize accesses to d->is_dying.
>>>
>>> As said in reply to an earlier patch, I think suppressing
>>> (really: ignoring) new mappings would be better.
>>
>> This is exactly what I suggested in v1 but you wrote:
>>
>> "Ignoring requests there seems fragile to me. Paul - what are your
>> thoughts about bailing early from hvm_add_ioreq_gfn() when the
>> domain is dying?"
> 
> Was this in the thread of this patch? I didn't find such a
> reply of mine. I need more context here because you name
> hvm_add_ioreq_gfn() above, while I refer to iommu_map()
> (and further down the call stack).

See [1].

> 
>> Are you now saying that the following snippet would be fine:
>>
>> if ( d->is_dying )
>>     return 0;
> 
> In {amd,intel}_iommu_map_page(), after the lock was acquired
> and with it suitably released, yes. And if that's what you
> suggested, then I'm sorry - I don't think I can see anything
> fragile there anymore.

Duplicating the check sounds good to me.

> 
>>> You could
>>> utilize the same lock, but you'd need to duplicate the
>>> checking in {amd,intel}_iommu_map_page().
>>>
>>> I'm not entirely certain in the case about unmap requests:
>>> It may be possible to also suppress/ignore them, but this
>>> may require some further thought.
>>
>> I think the unmap part is quite risky with respect to d->is_dying because
>> the PCI devices may not be quiesced and may still be assigned to the domain.
> 
> Hmm, yes, good point. Of course upon first unmap with is_dying
> observed set we could zap the root page table, but I don't
> suppose that's something we want to do for 4.15.

We would still need to zap the root page table in the relinquish path. 
So I am not sure what benefit it would give us to zap the page tables 
on the first iommu_unmap() after the domain dies.

Cheers,

[1] 
https://lore.kernel.org/xen-devel/f21f1f61-5213-55a8-320c-43e5fe80100f@suse.com/

-- 
Julien Grall



* Re: [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables
  2021-02-17 11:49         ` Julien Grall
@ 2021-02-17 12:57           ` Jan Beulich
  0 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2021-02-17 12:57 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 17.02.2021 12:49, Julien Grall wrote:
> On 10/02/2021 16:12, Jan Beulich wrote:
>> On 10.02.2021 16:04, Julien Grall wrote:
>>> Are you now saying that the following snippet would be fine:
>>>
>>> if ( d->is_dying )
>>>     return 0;
>>
>> In {amd,intel}_iommu_map_page(), after the lock was acquired
>> and with it suitably released, yes. And if that's what you
>> suggested, then I'm sorry - I don't think I can see anything
>> fragile there anymore.
> 
> Duplicating the check sounds good to me.

The checks in said functions are mandatory, and I didn't really
have any duplication in mind. But yes, iommu_map() could have
an early (but racy) check, if so wanted.

>>>> You could
>>>> utilize the same lock, but you'd need to duplicate the
>>>> checking in {amd,intel}_iommu_map_page().
>>>>
>>>> I'm not entirely certain in the case about unmap requests:
>>>> It may be possible to also suppress/ignore them, but this
>>>> may require some further thought.
>>>
>>> I think the unmap part is quite risky with respect to d->is_dying because
>>> the PCI devices may not be quiesced and may still be assigned to the domain.
>>
>> Hmm, yes, good point. Of course upon first unmap with is_dying
>> observed set we could zap the root page table, but I don't
>> suppose that's something we want to do for 4.15.
> 
> We would still need to zap the root page table in the relinquish path.
> So I am not sure what benefit it would give us to zap the page tables
> on the first iommu_unmap() after the domain dies.

I guess we are thinking of different aspects of the zapping - what I'm
suggesting here aims at ensuring that no translations remain
active anymore. I'm not after freeing the memory at this point;
that'll have to happen on the relinquish path, as you say. The
problem with allowing individual unmaps to proceed (unlike your
plan for maps) is that these, too, can trigger allocations (when
a large page needs shattering).

Jan



* Re: [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-10  2:21   ` Tian, Kevin
@ 2021-02-17 13:54     ` Julien Grall
  0 siblings, 0 replies; 47+ messages in thread
From: Julien Grall @ 2021-02-17 13:54 UTC (permalink / raw)
  To: Tian, Kevin, xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Cooper, Andrew, Paul Durrant

Hi Kevin,

On 10/02/2021 02:21, Tian, Kevin wrote:
>> From: Julien Grall <julien@xen.org>
>> Sent: Tuesday, February 9, 2021 11:28 PM
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new per-domain IOMMU page-table allocator will now free the
>> page-tables when the domain's resources are relinquished. However, the
>> root page-table (i.e. hd->arch.pg_maddr) will not be cleared.
> 
> It's the pointer not the table itself.

I don't think the sentence is incorrect: "clearing" can be read as
referring to either the page table itself or the pointer to it.

> 
>>
>> Xen may access the IOMMU page-tables afterwards at least in the case of
>> PV domain:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d04025b4b2>] R
>> iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
>> (XEN)    [<ffff82d04025b695>] F
>> iommu.c#intel_iommu_unmap_page+0x5d/0xf8
>> (XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
>> (XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
>> (XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
>> (XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
>> (XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
>> (XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
>> (XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
>> (XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
>> (XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
>> (XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
>> (XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
>> (XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
>> (XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
>> (XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
>> (XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
>> (XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
>> (XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
>> (XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120
>>
>> This will result in a use-after-free and possibly a host crash or
>> memory corruption.
>>
>> Freeing the page-tables further down in domain_relinquish_resources()
>> would not work because pages may not be released until later if another
>> domain holds a reference on them.
>>
>> Once all the PCI devices have been de-assigned, it is actually pointless
>> to access or modify the IOMMU page-tables. So we can simply clear the
>> root page-table address.
> 
> Above two paragraphs are confusing to me. I don't know what exact change
> is proposed until looking over the whole patch. Isn't it clearer to just say
> "We should clear the root page table pointer in iommu_free_pgtables to
> avoid use-after-free after all pgtables are moved to the free list"?

Your sentence explains the approach taken but not the rationale behind 
it. Both are equally important for future reference.

I will try to reword it.

Cheers,

-- 
Julien Grall



end of thread, other threads:[~2021-02-17 13:55 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-09 15:28 [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall
2021-02-09 15:28 ` [for-4.15][PATCH v2 1/5] xen/x86: p2m: Don't map the special pages in the IOMMU page-tables Julien Grall
2021-02-09 20:19   ` Paul Durrant
2021-02-10  8:29   ` Roger Pau Monné
2021-02-10  8:50     ` Julien Grall
2021-02-10  9:34       ` Julien Grall
2021-02-10 11:10     ` Jan Beulich
2021-02-10 11:34       ` Roger Pau Monné
2021-02-10 11:38         ` Jan Beulich
2021-02-10 11:40           ` Julien Grall
2021-02-10 11:45             ` Jan Beulich
2021-02-10 11:48               ` Julien Grall
2021-02-10 11:54                 ` Roger Pau Monné
2021-02-10 13:12                   ` Jan Beulich
2021-02-10 15:24                     ` Roger Pau Monné
2021-02-10 15:53                       ` Jan Beulich
2021-02-10 13:08                 ` Jan Beulich
2021-02-10 11:26   ` Jan Beulich
2021-02-15 11:38     ` Julien Grall
2021-02-15 12:36       ` Jan Beulich
2021-02-15 12:54         ` Julien Grall
2021-02-15 13:14           ` Jan Beulich
2021-02-17 11:21             ` Julien Grall
2021-02-09 15:28 ` [for-4.15][PATCH v2 2/5] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall
2021-02-09 20:22   ` Paul Durrant
2021-02-17 11:25     ` Julien Grall
2021-02-09 15:28 ` [for-4.15][PATCH v2 3/5] xen/iommu: iommu_map: Don't crash the domain if it is dying Julien Grall
2021-02-09 20:28   ` Paul Durrant
2021-02-09 21:14     ` Julien Grall
2021-02-10 14:14       ` Jan Beulich
2021-02-10 14:58         ` Julien Grall
2021-02-10 15:56           ` Jan Beulich
2021-02-09 15:28 ` [for-4.15][PATCH v2 4/5] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
2021-02-09 20:33   ` Paul Durrant
2021-02-10 14:32   ` Jan Beulich
2021-02-10 15:04     ` Julien Grall
2021-02-10 16:12       ` Jan Beulich
2021-02-17 11:49         ` Julien Grall
2021-02-17 12:57           ` Jan Beulich
2021-02-10 14:44   ` Jan Beulich
2021-02-09 15:28 ` [for-4.15][PATCH v2 5/5] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
2021-02-09 20:36   ` Paul Durrant
2021-02-10  2:21   ` Tian, Kevin
2021-02-17 13:54     ` Julien Grall
2021-02-10 14:39   ` Jan Beulich
2021-02-09 16:47 ` [for-4.15][PATCH v2 0/5] xen/iommu: Collection of bug fixes for IOMMU teadorwn Ian Jackson
2021-02-17 11:33   ` Julien Grall
