xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [for-4.15][PATCH v3 0/3] xen/iommu: Collection of bug fixes for IOMMU teadorwn
@ 2021-02-17 14:24 Julien Grall
  2021-02-17 14:24 ` [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Julien Grall @ 2021-02-17 14:24 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Kevin Tian, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

Hi all,

This series is a collection of bug fixes for the IOMMU teardown code.
All of them are candidate for 4.15 as they can either leak memory or
lead to host crash/host corruption.

This is sent directly on xen-devel because all the issues were either
introduced in 4.15 or happen in the domain creation code.

Major changes since v2:
    - patch #1 "xen/x86: p2m: Don't map the special pages in the IOMMU
    page-tables" has been removed. This requires Jan's patch [2] to
    fully mitigate memory leaks.

Cheers,

[1] https://lore.kernel.org/xen-devel/1b6a411b-84e7-bfb1-647e-511a13df838c@suse.com

Julien Grall (3):
  xen/iommu: x86: Clear the root page-table before freeing the
    page-tables
  xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  xen/iommu: x86: Harden the IOMMU page-table allocator

 xen/drivers/passthrough/amd/iommu_map.c     | 13 ++++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++-
 xen/drivers/passthrough/vtd/iommu.c         | 25 +++++++++++-
 xen/drivers/passthrough/x86/iommu.c         | 45 ++++++++++++++++++++-
 xen/include/xen/iommu.h                     |  1 +
 5 files changed, 93 insertions(+), 3 deletions(-)

-- 
2.17.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-17 14:24 [for-4.15][PATCH v3 0/3] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall
@ 2021-02-17 14:24 ` Julien Grall
  2021-02-17 14:54   ` Jan Beulich
  2021-02-17 14:24 ` [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
  2021-02-17 14:24 ` [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator Julien Grall
  2 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 14:24 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Kevin Tian, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new per-domain IOMMU page-table allocator will now free the
page-tables when domain's resources are relinquished. However, the
per-domain IOMMU structure will still contain a dangling pointer to
the root page-table.

Xen may access the IOMMU page-tables afterwards at least in the case of
PV domain:

(XEN) Xen call trace:
(XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
(XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
(XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
(XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
(XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
(XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
(XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
(XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
(XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
(XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
(XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
(XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
(XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
(XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
(XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
(XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
(XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120

This will result to a use after-free and possibly an host crash or
memory corruption.

It would not be possible to free the page-tables further down in
domain_relinquish_resources() because cleanup_page_mappings() will only
be called when the last reference on the page dropped. This may happen
much later if another domain still hold a reference.

After all the PCI devices have been de-assigned, nobody should use the
IOMMU page-tables and it is therefore pointless to try to modify them.

So we can simply clear any reference to the root page-table in the
per-domain IOMMU structure. This requires to introduce a new callback of
the method will depend on the IOMMU driver used.

Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>

---
    Changes in v3:
        - Move the patch earlier in the series
        - Reword the commit message

    Changes in v2:
        - Introduce clear_root_pgtable()
        - Move the patch later in the series
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
 xen/drivers/passthrough/vtd/iommu.c         | 12 +++++++++++-
 xen/drivers/passthrough/x86/iommu.c         |  9 +++++++++
 xen/include/xen/iommu.h                     |  1 +
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 42b5a5a9bec4..81add0ba26b4 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
     return reassign_device(pdev->domain, d, devfn, pdev);
 }
 
+static void iommu_clear_root_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->arch.mapping_lock);
+    hd->arch.amd.root_table = NULL;
+    spin_unlock(&hd->arch.mapping_lock);
+}
+
 static void amd_iommu_domain_destroy(struct domain *d)
 {
-    dom_iommu(d)->arch.amd.root_table = NULL;
+    ASSERT(!dom_iommu(d)->arch.amd.root_table);
 }
 
 static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
@@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
     .remove_device = amd_iommu_remove_device,
     .assign_device  = amd_iommu_assign_device,
     .teardown = amd_iommu_domain_destroy,
+    .clear_root_pgtable = iommu_clear_root_pgtable,
     .map_page = amd_iommu_map_page,
     .unmap_page = amd_iommu_unmap_page,
     .iotlb_flush = amd_iommu_flush_iotlb_pages,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d136fe36883b..e1871f6c2bc1 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1726,6 +1726,15 @@ out:
     return ret;
 }
 
+static void iommu_clear_root_pgtable(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->arch.mapping_lock);
+    hd->arch.vtd.pgd_maddr = 0;
+    spin_unlock(&hd->arch.mapping_lock);
+}
+
 static void iommu_domain_teardown(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
@@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct domain *d)
         xfree(mrmrr);
     }
 
-    hd->arch.vtd.pgd_maddr = 0;
+    ASSERT(!hd->arch.vtd.pgd_maddr);
 }
 
 static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -2719,6 +2728,7 @@ static struct iommu_ops __initdata vtd_ops = {
     .remove_device = intel_iommu_remove_device,
     .assign_device  = intel_iommu_assign_device,
     .teardown = iommu_domain_teardown,
+    .clear_root_pgtable = iommu_clear_root_pgtable,
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
     .lookup_page = intel_iommu_lookup_page,
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..f54fc8093f18 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,6 +267,15 @@ int iommu_free_pgtables(struct domain *d)
     struct page_info *pg;
     unsigned int done = 0;
 
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    /*
+     * Pages will be moved to the free list below. So we want to
+     * clear the root page-table to avoid any potential use after-free.
+     */
+    hd->platform_ops->clear_root_pgtable(d);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 863a68fe1622..d59ed7cbad43 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -272,6 +272,7 @@ struct iommu_ops {
 
     int (*adjust_irq_affinities)(void);
     void (*sync_cache)(const void *addr, unsigned int size);
+    void (*clear_root_pgtable)(struct domain *d);
 #endif /* CONFIG_X86 */
 
     int __must_check (*suspend)(void);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-17 14:24 [for-4.15][PATCH v3 0/3] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall
  2021-02-17 14:24 ` [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-02-17 14:24 ` Julien Grall
  2021-02-17 15:01   ` Jan Beulich
  2021-02-17 14:24 ` [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator Julien Grall
  2 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 14:24 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
	Kevin Tian, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new x86 IOMMU page-tables allocator will release the pages when
relinquishing the domain resources. However, this is not sufficient
when the domain is dying because nothing prevents page-table to be
allocated.

Currently page-table allocations can only happen from iommu_map(). As
the domain is dying, there is no good reason to continue to modify the
IOMMU page-tables.

In order to observe d->is_dying correctly, we need to rely on per-arch
locking, so the check to ignore IOMMU mapping is added on the per-driver
map_page() callback.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changes in v3:
    - Patch added. This is a replacement of "xen/iommu: iommu_map: Don't
    crash the domain if it is dying"
---
 xen/drivers/passthrough/amd/iommu_map.c | 13 +++++++++++++
 xen/drivers/passthrough/vtd/iommu.c     | 13 +++++++++++++
 xen/drivers/passthrough/x86/iommu.c     |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index d3a8b1aec766..ed78a083ba12 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -285,6 +285,19 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables() and
+     * iommu_clear_root_pgtable()).
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     rc = amd_iommu_alloc_root(d);
     if ( rc )
     {
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index e1871f6c2bc1..239a63f74f64 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1771,6 +1771,19 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
 
     spin_lock(&hd->arch.mapping_lock);
 
+    /*
+     * IOMMU mapping request can be safely ignored when the domain is dying.
+     *
+     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * before any page tables are freed (see iommu_free_pgtables() and
+     * iommu_clear_root_pgtable()).
+     */
+    if ( d->is_dying )
+    {
+        spin_unlock(&hd->arch.mapping_lock);
+        return 0;
+    }
+
     pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 1);
     if ( !pg_maddr )
     {
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index f54fc8093f18..faa0078db595 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
     /*
      * Pages will be moved to the free list below. So we want to
      * clear the root page-table to avoid any potential use after-free.
+     *
+     * After this call, no more IOMMU mapping can happen.
+     *
      */
     hd->platform_ops->clear_root_pgtable(d);
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-17 14:24 [for-4.15][PATCH v3 0/3] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall
  2021-02-17 14:24 ` [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
  2021-02-17 14:24 ` [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
@ 2021-02-17 14:24 ` Julien Grall
  2021-02-17 15:13   ` Jan Beulich
  2 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 14:24 UTC (permalink / raw)
  To: xen-devel; +Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

At the moment, we are assuming that only iommu_map() can allocate
IOMMU page-table.

Given the complexity of the IOMMU framework, it would be sensible to
have a check closer to the IOMMU allocator. This would avoid to leak
IOMMU page-tables again in the future.

iommu_alloc_pgtable() is now checking if the domain is dying before
adding the page in the list. We are relying on &hd->arch.pgtables.lock
to synchronize d->is_dying.

Take the opportunity to add an ASSERT() in arch_iommu_domain_destroy()
to check if we freed all the IOMMU page tables.

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Changes in v3:
    - Rename the patch. This was originally "xen/iommu: x86: Don't leak
    the IOMMU page-tables"
    - Rework the commit message
    - Move the patch towards the end of the series

Changes in v2:
    - Rework the approach
    - Move the patch earlier in the series
---
 xen/drivers/passthrough/x86/iommu.c | 33 ++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index faa0078db595..a67075f0045d 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
+    /*
+     * There should be not page-tables left allocated by the time the
+     * domain is destroyed. Note that arch_iommu_domain_destroy() is
+     * called unconditionally, so pgtables may be unitialized.
+     */
+    ASSERT(dom_iommu(d)->platform_ops == NULL ||
+           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
 }
 
 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
      */
     hd->platform_ops->clear_root_pgtable(d);
 
+    /* After this barrier no new page allocations can occur. */
+    spin_barrier(&hd->arch.pgtables.lock);
+
     while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
     {
         free_domheap_page(pg);
@@ -296,6 +306,7 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unsigned int memflags = 0;
     struct page_info *pg;
     void *p;
+    bool alive = false;
 
 #ifdef CONFIG_NUMA
     if ( hd->node != NUMA_NO_NODE )
@@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     unmap_domain_page(p);
 
     spin_lock(&hd->arch.pgtables.lock);
-    page_list_add(pg, &hd->arch.pgtables.list);
+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevent allocation to happen afterwards. There is no valid
+     * reasons to continue to update the IOMMU page-tables while the
+     * domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying.
+     *
+     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
+     */
+    if ( likely(!d->is_dying) )
+    {
+        alive = true;
+        page_list_add(pg, &hd->arch.pgtables.list);
+    }
     spin_unlock(&hd->arch.pgtables.lock);
 
+    if ( unlikely(!alive) )
+    {
+        free_domheap_page(pg);
+        pg = NULL;
+    }
+
     return pg;
 }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-17 14:24 ` [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-02-17 14:54   ` Jan Beulich
  2021-02-17 15:00     ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-17 14:54 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

On 17.02.2021 15:24, Julien Grall wrote:
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
>      return reassign_device(pdev->domain, d, devfn, pdev);
>  }
>  
> +static void iommu_clear_root_pgtable(struct domain *d)

Nit: amd_iommu_ as a prefix would be okay here considering other
(static) functions also use it. Since it is a static function,
no prefix at all would also do (my personal preference). But
iommu_ as a prefix isn't helpful and results in needless re-use
of VT-d's name.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -267,6 +267,15 @@ int iommu_free_pgtables(struct domain *d)
>      struct page_info *pg;
>      unsigned int done = 0;
>  
> +    if ( !is_iommu_enabled(d) )
> +        return 0;
> +
> +    /*
> +     * Pages will be moved to the free list below. So we want to
> +     * clear the root page-table to avoid any potential use after-free.
> +     */
> +    hd->platform_ops->clear_root_pgtable(d);

Taking amd_iommu_alloc_root() as example, is this really correct
prior to what is now patch 2? What guarantees a new root table
won't get allocated subsequently?

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-17 14:54   ` Jan Beulich
@ 2021-02-17 15:00     ` Julien Grall
  2021-02-17 15:17       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 15:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

Hi Jan,

On 17/02/2021 14:54, Jan Beulich wrote:
> On 17.02.2021 15:24, Julien Grall wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -267,6 +267,15 @@ int iommu_free_pgtables(struct domain *d)
>>       struct page_info *pg;
>>       unsigned int done = 0;
>>   
>> +    if ( !is_iommu_enabled(d) )
>> +        return 0;
>> +
>> +    /*
>> +     * Pages will be moved to the free list below. So we want to
>> +     * clear the root page-table to avoid any potential use after-free.
>> +     */
>> +    hd->platform_ops->clear_root_pgtable(d);
> 
> Taking amd_iommu_alloc_root() as example, is this really correct
> prior to what is now patch 2? 

Yes, there are no more use-after-free...
	
> What guarantees a new root table
> won't get allocated subsequently?

It doesn't prevent root table allocation. I view the two as distincts 
issues, hence the two patches.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-17 14:24 ` [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
@ 2021-02-17 15:01   ` Jan Beulich
  2021-02-17 16:07     ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-17 15:01 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

On 17.02.2021 15:24, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The new x86 IOMMU page-tables allocator will release the pages when
> relinquishing the domain resources. However, this is not sufficient
> when the domain is dying because nothing prevents page-table to be
> allocated.
> 
> Currently page-table allocations can only happen from iommu_map(). As
> the domain is dying, there is no good reason to continue to modify the
> IOMMU page-tables.

While I agree this to be the case right now, I'm not sure it is a
good idea to build on it (in that you leave the unmap paths
untouched). Imo there's a fair chance this would be overlooked at
the point super page mappings get introduced (which has been long
overdue), and I thought prior discussion had lead to a possible
approach without risking use-after-free due to squashed unmap
requests.

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>      /*
>       * Pages will be moved to the free list below. So we want to
>       * clear the root page-table to avoid any potential use after-free.
> +     *
> +     * After this call, no more IOMMU mapping can happen.
> +     *
>       */
>      hd->platform_ops->clear_root_pgtable(d);

I.e. you utilize the call in place of spin_barrier(). Maybe worth
saying in the comment?

Also, nit: Stray blank comment line.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-17 14:24 ` [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator Julien Grall
@ 2021-02-17 15:13   ` Jan Beulich
  2021-02-17 16:29     ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-17 15:13 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
Nit: s/not/no/ ?

> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
> +     * called unconditionally, so pgtables may be unitialized.
> +     */
> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>  }
>  
>  static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>       */
>      hd->platform_ops->clear_root_pgtable(d);
>  
> +    /* After this barrier no new page allocations can occur. */
> +    spin_barrier(&hd->arch.pgtables.lock);

Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
the barrier? Why introduce another one (with a similar comment)
explicitly now?

> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>      unmap_domain_page(p);
>  
>      spin_lock(&hd->arch.pgtables.lock);
> -    page_list_add(pg, &hd->arch.pgtables.list);
> +    /*
> +     * The IOMMU page-tables are freed when relinquishing the domain, but
> +     * nothing prevent allocation to happen afterwards. There is no valid
> +     * reasons to continue to update the IOMMU page-tables while the
> +     * domain is dying.
> +     *
> +     * So prevent page-table allocation when the domain is dying.
> +     *
> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
> +     */
> +    if ( likely(!d->is_dying) )
> +    {
> +        alive = true;
> +        page_list_add(pg, &hd->arch.pgtables.list);
> +    }
>      spin_unlock(&hd->arch.pgtables.lock);
>  
> +    if ( unlikely(!alive) )
> +    {
> +        free_domheap_page(pg);
> +        pg = NULL;
> +    }
> +
>      return pg;
>  }

As before I'm concerned of this forcing error paths to be taken
elsewhere, in case an allocation still happens (e.g. from unmap
once super page mappings are supported). Considering some of the
error handling in the IOMMU code is to invoke domain_crash(), it
would be quite unfortunate if we ended up crashing a domain
while it is being cleaned up after.

Additionally, the (at present still hypothetical) unmap case, if
failing because of the change here, would then again chance to
leave mappings in place while the underlying pages get freed. As
this would likely require an XSA, the change doesn't feel like
"hardening" to me.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-17 15:00     ` Julien Grall
@ 2021-02-17 15:17       ` Jan Beulich
  2021-02-17 16:48         ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-17 15:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

On 17.02.2021 16:00, Julien Grall wrote:
> Hi Jan,
> 
> On 17/02/2021 14:54, Jan Beulich wrote:
>> On 17.02.2021 15:24, Julien Grall wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -267,6 +267,15 @@ int iommu_free_pgtables(struct domain *d)
>>>       struct page_info *pg;
>>>       unsigned int done = 0;
>>>   
>>> +    if ( !is_iommu_enabled(d) )
>>> +        return 0;
>>> +
>>> +    /*
>>> +     * Pages will be moved to the free list below. So we want to
>>> +     * clear the root page-table to avoid any potential use after-free.
>>> +     */
>>> +    hd->platform_ops->clear_root_pgtable(d);
>>
>> Taking amd_iommu_alloc_root() as example, is this really correct
>> prior to what is now patch 2? 
> 
> Yes, there are no more use-after-free...

And this is because of ...? The necessary lock isn't being held
here, so on another CPU allocation of a new root and then of new
page tables could happen before you make enough progress here,
and hence it looks to me as if there might then still be pages
which get freed while present in the page tables (and hence
accessible by devices).

Jan

>> What guarantees a new root table
>> won't get allocated subsequently?
> 
> It doesn't prevent root table allocation. I view the two as distincts 
> issues, hence the two patches.
> 
> Cheers,
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-17 15:01   ` Jan Beulich
@ 2021-02-17 16:07     ` Julien Grall
  2021-02-18 13:05       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 16:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel

Hi Jan,

On 17/02/2021 15:01, Jan Beulich wrote:
> On 17.02.2021 15:24, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new x86 IOMMU page-tables allocator will release the pages when
>> relinquishing the domain resources. However, this is not sufficient
>> when the domain is dying because nothing prevents page-table to be
>> allocated.
>>
>> Currently page-table allocations can only happen from iommu_map(). As
>> the domain is dying, there is no good reason to continue to modify the
>> IOMMU page-tables.
> 
> While I agree this to be the case right now, I'm not sure it is a
> good idea to build on it (in that you leave the unmap paths
> untouched).

I don't build on that assumption. See next patch.

> Imo there's a fair chance this would be overlooked at
> the point super page mappings get introduced (which has been long
> overdue), and I thought prior discussion had lead to a possible
> approach without risking use-after-free due to squashed unmap
> requests.

I know you suggested to zap the root page-tables... However, I don't 
think this is 4.15 material and you agree with this (you were the one 
pointed out that out).

>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>>       /*
>>        * Pages will be moved to the free list below. So we want to
>>        * clear the root page-table to avoid any potential use after-free.
>> +     *
>> +     * After this call, no more IOMMU mapping can happen.
>> +     *
>>        */
>>       hd->platform_ops->clear_root_pgtable(d);
> 
> I.e. you utilize the call in place of spin_barrier(). Maybe worth
> saying in the comment?

Sure.

> 
> Also, nit: Stray blank comment line.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-17 15:13   ` Jan Beulich
@ 2021-02-17 16:29     ` Julien Grall
  2021-02-18 13:10       ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-17 16:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

Hi Jan,

On 17/02/2021 15:13, Jan Beulich wrote:
> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
> Nit: s/not/no/ ?
> 
>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>> +     * called unconditionally, so pgtables may be unitialized.
>> +     */
>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>   }
>>   
>>   static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>        */
>>       hd->platform_ops->clear_root_pgtable(d);
>>   
>> +    /* After this barrier no new page allocations can occur. */
>> +    spin_barrier(&hd->arch.pgtables.lock);
> 
> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
> the barrier? Why introduce another one (with a similar comment)
> explicitly now?
The barriers act differently, one will get against any IOMMU page-tables 
modification. The other one will gate against allocation.

There is no guarantee that the former will prevent the latter.

> 
>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>       unmap_domain_page(p);
>>   
>>       spin_lock(&hd->arch.pgtables.lock);
>> -    page_list_add(pg, &hd->arch.pgtables.list);
>> +    /*
>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>> +     * nothing prevent allocation to happen afterwards. There is no valid
>> +     * reasons to continue to update the IOMMU page-tables while the
>> +     * domain is dying.
>> +     *
>> +     * So prevent page-table allocation when the domain is dying.
>> +     *
>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>> +     */
>> +    if ( likely(!d->is_dying) )
>> +    {
>> +        alive = true;
>> +        page_list_add(pg, &hd->arch.pgtables.list);
>> +    }
>>       spin_unlock(&hd->arch.pgtables.lock);
>>   
>> +    if ( unlikely(!alive) )
>> +    {
>> +        free_domheap_page(pg);
>> +        pg = NULL;
>> +    }
>> +
>>       return pg;
>>   }
> 
> As before I'm concerned of this forcing error paths to be taken
> elsewhere, in case an allocation still happens (e.g. from unmap
> once super page mappings are supported). Considering some of the
> error handling in the IOMMU code is to invoke domain_crash(), it
> would be quite unfortunate if we ended up crashing a domain
> while it is being cleaned up after.

It is unfortunate, but I think this is better than having to leak page 
tables.

> 
> Additionally, the (at present still hypothetical) unmap case, if
> failing because of the change here, would then again chance to
> leave mappings in place while the underlying pages get freed. As
> this would likely require an XSA, the change doesn't feel like
> "hardening" to me.

I would agree with this if memory allocations could never fail. That's 
not that case and will become worse as we use IOMMU pool.

Do you have callers in mind that doesn't check the returns of iommu_unmap()?

Cheers,


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
  2021-02-17 15:17       ` Jan Beulich
@ 2021-02-17 16:48         ` Julien Grall
  0 siblings, 0 replies; 24+ messages in thread
From: Julien Grall @ 2021-02-17 16:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	Paul Durrant, xen-devel



On 17/02/2021 15:17, Jan Beulich wrote:
> On 17.02.2021 16:00, Julien Grall wrote:
>> Hi Jan,
>>
>> On 17/02/2021 14:54, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>>> @@ -267,6 +267,15 @@ int iommu_free_pgtables(struct domain *d)
>>>>        struct page_info *pg;
>>>>        unsigned int done = 0;
>>>>    
>>>> +    if ( !is_iommu_enabled(d) )
>>>> +        return 0;
>>>> +
>>>> +    /*
>>>> +     * Pages will be moved to the free list below. So we want to
>>>> +     * clear the root page-table to avoid any potential use after-free.
>>>> +     */
>>>> +    hd->platform_ops->clear_root_pgtable(d);
>>>
>>> Taking amd_iommu_alloc_root() as example, is this really correct
>>> prior to what is now patch 2?
>>
>> Yes, there are no more use-after-free...
> 
> And this is because of ...? The necessary lock isn't being held
> here, so on another CPU allocation of a new root and then of new
> page tables could happen before you make enough progress here,
> and hence it looks to me as if there might then still be pages
> which get freed while present in the page tables (and hence
> accessible by devices).

Ah yes. I forgot that now patch #3 is not first anymore. I can move 
again patch #3 first, although I know you dislike the approach taken 
there...

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-17 16:07     ` Julien Grall
@ 2021-02-18 13:05       ` Jan Beulich
  2021-02-18 13:25         ` Julien Grall
  2021-02-18 14:00         ` Paul Durrant
  0 siblings, 2 replies; 24+ messages in thread
From: Jan Beulich @ 2021-02-18 13:05 UTC (permalink / raw)
  To: Julien Grall, Paul Durrant
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian, xen-devel

On 17.02.2021 17:07, Julien Grall wrote:
> On 17/02/2021 15:01, Jan Beulich wrote:
>> On 17.02.2021 15:24, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The new x86 IOMMU page-tables allocator will release the pages when
>>> relinquishing the domain resources. However, this is not sufficient
>>> when the domain is dying because nothing prevents page-table to be
>>> allocated.
>>>
>>> Currently page-table allocations can only happen from iommu_map(). As
>>> the domain is dying, there is no good reason to continue to modify the
>>> IOMMU page-tables.
>>
>> While I agree this to be the case right now, I'm not sure it is a
>> good idea to build on it (in that you leave the unmap paths
>> untouched).
> 
> I don't build on that assumption. See next patch.

Yet as said there that patch makes unmapping perhaps more fragile,
by introducing a latent error source into the path.

>> Imo there's a fair chance this would be overlooked at
>> the point super page mappings get introduced (which has been long
>> overdue), and I thought prior discussion had lead to a possible
>> approach without risking use-after-free due to squashed unmap
>> requests.
> 
> I know you suggested to zap the root page-tables... However, I don't 
> think this is 4.15 material and you agree with this (you were the one 
> pointed out that out).

Paul - do you have any thoughts here? Outright zapping isn't
going to work, we'd need to switch to quarantine page tables at
the very least to prevent the issue with babbling devices. But
that still seems better to me than the introduction of a latent
issue on the unmap paths.

>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -273,6 +273,9 @@ int iommu_free_pgtables(struct domain *d)
>>>       /*
>>>        * Pages will be moved to the free list below. So we want to
>>>        * clear the root page-table to avoid any potential use after-free.
>>> +     *
>>> +     * After this call, no more IOMMU mapping can happen.
>>> +     *
>>>        */
>>>       hd->platform_ops->clear_root_pgtable(d);
>>
>> I.e. you utilize the call in place of spin_barrier(). Maybe worth
>> saying in the comment?
> 
> Sure.

Btw - "no more IOMMU mapping" is also possibly ambiguous here:
One might take it to mean both maps and unmaps. How about "no
new IOMMU mappings can be inserted", as long as the unmap paths
don't follow suit?

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-17 16:29     ` Julien Grall
@ 2021-02-18 13:10       ` Jan Beulich
  2021-02-18 13:19         ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-18 13:10 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 17.02.2021 17:29, Julien Grall wrote:
> On 17/02/2021 15:13, Jan Beulich wrote:
>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>> Nit: s/not/no/ ?
>>
>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>> +     * called unconditionally, so pgtables may be unitialized.
>>> +     */
>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>   }
>>>   
>>>   static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>        */
>>>       hd->platform_ops->clear_root_pgtable(d);
>>>   
>>> +    /* After this barrier no new page allocations can occur. */
>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>
>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>> the barrier? Why introduce another one (with a similar comment)
>> explicitly now?
> The barriers act differently, one will get against any IOMMU page-tables 
> modification. The other one will gate against allocation.
> 
> There is no guarantee that the former will prevent the latter.

Oh, right - different locks. I got confused here because in both
cases the goal is to prevent allocations.

>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>       unmap_domain_page(p);
>>>   
>>>       spin_lock(&hd->arch.pgtables.lock);
>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>> +    /*
>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>> +     * reasons to continue to update the IOMMU page-tables while the
>>> +     * domain is dying.
>>> +     *
>>> +     * So prevent page-table allocation when the domain is dying.
>>> +     *
>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>> +     */
>>> +    if ( likely(!d->is_dying) )
>>> +    {
>>> +        alive = true;
>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>> +    }
>>>       spin_unlock(&hd->arch.pgtables.lock);
>>>   
>>> +    if ( unlikely(!alive) )
>>> +    {
>>> +        free_domheap_page(pg);
>>> +        pg = NULL;
>>> +    }
>>> +
>>>       return pg;
>>>   }
>>
>> As before I'm concerned of this forcing error paths to be taken
>> elsewhere, in case an allocation still happens (e.g. from unmap
>> once super page mappings are supported). Considering some of the
>> error handling in the IOMMU code is to invoke domain_crash(), it
>> would be quite unfortunate if we ended up crashing a domain
>> while it is being cleaned up after.
> 
> It is unfortunate, but I think this is better than having to leak page 
> tables.
> 
>>
>> Additionally, the (at present still hypothetical) unmap case, if
>> failing because of the change here, would then again chance to
>> leave mappings in place while the underlying pages get freed. As
>> this would likely require an XSA, the change doesn't feel like
>> "hardening" to me.
> 
> I would agree with this if memory allocations could never fail. That's 
> not that case and will become worse as we use IOMMU pool.
> 
> Do you have callers in mind that doesn't check the returns of iommu_unmap()?

The function is marked __must_check, so there won't be any direct
callers ignoring errors (albeit I may be wrong here - we used to
have cases where we simply suppressed the resulting compiler
diagnostic, without really handling errors; not sure if all of
these are gone by now). Risks might be elsewhere.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-18 13:10       ` Jan Beulich
@ 2021-02-18 13:19         ` Julien Grall
  2021-02-18 17:04           ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-18 13:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel



On 18/02/2021 13:10, Jan Beulich wrote:
> On 17.02.2021 17:29, Julien Grall wrote:
>> On 17/02/2021 15:13, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>> Nit: s/not/no/ ?
>>>
>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>> +     */
>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>    }
>>>>    
>>>>    static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>         */
>>>>        hd->platform_ops->clear_root_pgtable(d);
>>>>    
>>>> +    /* After this barrier no new page allocations can occur. */
>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>
>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>> the barrier? Why introduce another one (with a similar comment)
>>> explicitly now?
>> The barriers act differently, one will get against any IOMMU page-tables
>> modification. The other one will gate against allocation.
>>
>> There is no guarantee that the former will prevent the latter.
> 
> Oh, right - different locks. I got confused here because in both
> cases the goal is to prevent allocations.
> 
>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>        unmap_domain_page(p);
>>>>    
>>>>        spin_lock(&hd->arch.pgtables.lock);
>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>> +    /*
>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>> +     * domain is dying.
>>>> +     *
>>>> +     * So prevent page-table allocation when the domain is dying.
>>>> +     *
>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>> +     */
>>>> +    if ( likely(!d->is_dying) )
>>>> +    {
>>>> +        alive = true;
>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>> +    }
>>>>        spin_unlock(&hd->arch.pgtables.lock);
>>>>    
>>>> +    if ( unlikely(!alive) )
>>>> +    {
>>>> +        free_domheap_page(pg);
>>>> +        pg = NULL;
>>>> +    }
>>>> +
>>>>        return pg;
>>>>    }
>>>
>>> As before I'm concerned of this forcing error paths to be taken
>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>> once super page mappings are supported). Considering some of the
>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>> would be quite unfortunate if we ended up crashing a domain
>>> while it is being cleaned up after.
>>
>> It is unfortunate, but I think this is better than having to leak page
>> tables.
>>
>>>
>>> Additionally, the (at present still hypothetical) unmap case, if
>>> failing because of the change here, would then again chance to
>>> leave mappings in place while the underlying pages get freed. As
>>> this would likely require an XSA, the change doesn't feel like
>>> "hardening" to me.
>>
>> I would agree with this if memory allocations could never fail. That's
>> not that case and will become worse as we use IOMMU pool.
>>
>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
> 
> The function is marked __must_check, so there won't be any direct
> callers ignoring errors (albeit I may be wrong here - we used to
> have cases where we simply suppressed the resulting compiler
> diagnostic, without really handling errors; not sure if all of
> these are gone by now). Risks might be elsewhere.

But this is not a new risk. So I don't understand why you think my patch 
is the one that may lead to an XSA in the future.

What my patch could do is expose such issues more easily rather than 
waiting until an OOM condition.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-18 13:05       ` Jan Beulich
@ 2021-02-18 13:25         ` Julien Grall
  2021-02-19  8:49           ` Jan Beulich
  2021-02-18 14:00         ` Paul Durrant
  1 sibling, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-18 13:25 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian, xen-devel

Hi,

On 18/02/2021 13:05, Jan Beulich wrote:
> On 17.02.2021 17:07, Julien Grall wrote:
>> On 17/02/2021 15:01, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>> relinquishing the domain resources. However, this is not sufficient
>>>> when the domain is dying because nothing prevents page-table to be
>>>> allocated.
>>>>
>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>> the domain is dying, there is no good reason to continue to modify the
>>>> IOMMU page-tables.
>>>
>>> While I agree this to be the case right now, I'm not sure it is a
>>> good idea to build on it (in that you leave the unmap paths
>>> untouched).
>>
>> I don't build on that assumption. See next patch.
> 
> Yet as said there that patch makes unmapping perhaps more fragile,
> by introducing a latent error source into the path.

I still don't see what latent error my patch will introduce. Allocation 
of page-tables are not guarantee to succeed.

So are you implying that a code may rely on the page allocation to succeed?

> 
>>> Imo there's a fair chance this would be overlooked at
>>> the point super page mappings get introduced (which has been long
>>> overdue), and I thought prior discussion had lead to a possible
>>> approach without risking use-after-free due to squashed unmap
>>> requests.
>>
>> I know you suggested to zap the root page-tables... However, I don't
>> think this is 4.15 material and you agree with this (you were the one
>> pointed out that out).
> 
> Paul - do you have any thoughts here? Outright zapping isn't
> going to work, we'd need to switch to quarantine page tables at
> the very least to prevent the issue with babbling devices. But
> that still seems better to me than the introduction of a latent
> issue on the unmap paths.

I am afraid I am not going to be able to come up with such patch for 
4.15. If you want to go that route for 4.15, then feel free to suggest a 
patch.

[...]

> Btw - "no more IOMMU mapping" is also possibly ambiguous here:
> One might take it to mean both maps and unmaps. How about "no
> new IOMMU mappings can be inserted", as long as the unmap paths
> don't follow suit?

Sure.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-18 13:05       ` Jan Beulich
  2021-02-18 13:25         ` Julien Grall
@ 2021-02-18 14:00         ` Paul Durrant
  2021-02-19  8:56           ` Jan Beulich
  1 sibling, 1 reply; 24+ messages in thread
From: Paul Durrant @ 2021-02-18 14:00 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian, xen-devel

On 18/02/2021 13:05, Jan Beulich wrote:
> On 17.02.2021 17:07, Julien Grall wrote:
>> On 17/02/2021 15:01, Jan Beulich wrote:
>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>> relinquishing the domain resources. However, this is not sufficient
>>>> when the domain is dying because nothing prevents page-table to be
>>>> allocated.
>>>>
>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>> the domain is dying, there is no good reason to continue to modify the
>>>> IOMMU page-tables.
>>>
>>> While I agree this to be the case right now, I'm not sure it is a
>>> good idea to build on it (in that you leave the unmap paths
>>> untouched).
>>
>> I don't build on that assumption. See next patch.
> 
> Yet as said there that patch makes unmapping perhaps more fragile,
> by introducing a latent error source into the path.
> 
>>> Imo there's a fair chance this would be overlooked at
>>> the point super page mappings get introduced (which has been long
>>> overdue), and I thought prior discussion had lead to a possible
>>> approach without risking use-after-free due to squashed unmap
>>> requests.
>>
>> I know you suggested to zap the root page-tables... However, I don't
>> think this is 4.15 material and you agree with this (you were the one
>> pointed out that out).
> 
> Paul - do you have any thoughts here? Outright zapping isn't
> going to work, we'd need to switch to quarantine page tables at
> the very least to prevent the issue with babbling devices. But
> that still seems better to me than the introduction of a latent
> issue on the unmap paths.
> 

I've not really been following this. AFAICT clear_root_pgtable() does 
not actually mess with any context entries so the device view of the 
tables won't change, will it?

   Paul


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-18 13:19         ` Julien Grall
@ 2021-02-18 17:04           ` Jan Beulich
  2021-02-18 17:41             ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-18 17:04 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 18.02.2021 14:19, Julien Grall wrote:
> 
> 
> On 18/02/2021 13:10, Jan Beulich wrote:
>> On 17.02.2021 17:29, Julien Grall wrote:
>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>> Nit: s/not/no/ ?
>>>>
>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>> +     */
>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>    }
>>>>>    
>>>>>    static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>         */
>>>>>        hd->platform_ops->clear_root_pgtable(d);
>>>>>    
>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>
>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>> the barrier? Why introduce another one (with a similar comment)
>>>> explicitly now?
>>> The barriers act differently, one will get against any IOMMU page-tables
>>> modification. The other one will gate against allocation.
>>>
>>> There is no guarantee that the former will prevent the latter.
>>
>> Oh, right - different locks. I got confused here because in both
>> cases the goal is to prevent allocations.
>>
>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>        unmap_domain_page(p);
>>>>>    
>>>>>        spin_lock(&hd->arch.pgtables.lock);
>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>> +    /*
>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>> +     * domain is dying.
>>>>> +     *
>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>> +     *
>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>> +     */
>>>>> +    if ( likely(!d->is_dying) )
>>>>> +    {
>>>>> +        alive = true;
>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>> +    }
>>>>>        spin_unlock(&hd->arch.pgtables.lock);
>>>>>    
>>>>> +    if ( unlikely(!alive) )
>>>>> +    {
>>>>> +        free_domheap_page(pg);
>>>>> +        pg = NULL;
>>>>> +    }
>>>>> +
>>>>>        return pg;
>>>>>    }
>>>>
>>>> As before I'm concerned of this forcing error paths to be taken
>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>> once super page mappings are supported). Considering some of the
>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>> would be quite unfortunate if we ended up crashing a domain
>>>> while it is being cleaned up after.
>>>
>>> It is unfortunate, but I think this is better than having to leak page
>>> tables.
>>>
>>>>
>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>> failing because of the change here, would then again chance to
>>>> leave mappings in place while the underlying pages get freed. As
>>>> this would likely require an XSA, the change doesn't feel like
>>>> "hardening" to me.
>>>
>>> I would agree with this if memory allocations could never fail. That's
>>> not that case and will become worse as we use IOMMU pool.
>>>
>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>
>> The function is marked __must_check, so there won't be any direct
>> callers ignoring errors (albeit I may be wrong here - we used to
>> have cases where we simply suppressed the resulting compiler
>> diagnostic, without really handling errors; not sure if all of
>> these are gone by now). Risks might be elsewhere.
> 
> But this is not a new risk. So I don't understand why you think my patch 
> is the one that may lead to an XSA in the future.

I didn't mean to imply it would _lead_ to an XSA (you're
right that the problem was there already before), but the term
"harden" suggests to me that the patch aims at eliminating
possible conditions. IOW the result here looks to me as if it
would yield a false sense of safety.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-18 17:04           ` Jan Beulich
@ 2021-02-18 17:41             ` Julien Grall
  2021-02-19  8:46               ` Jan Beulich
  0 siblings, 1 reply; 24+ messages in thread
From: Julien Grall @ 2021-02-18 17:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel



On 18/02/2021 17:04, Jan Beulich wrote:
> On 18.02.2021 14:19, Julien Grall wrote:
>>
>>
>> On 18/02/2021 13:10, Jan Beulich wrote:
>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>> Nit: s/not/no/ ?
>>>>>
>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>> +     */
>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>     }
>>>>>>     
>>>>>>     static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>          */
>>>>>>         hd->platform_ops->clear_root_pgtable(d);
>>>>>>     
>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>
>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>> explicitly now?
>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>> modification. The other one will gate against allocation.
>>>>
>>>> There is no guarantee that the former will prevent the latter.
>>>
>>> Oh, right - different locks. I got confused here because in both
>>> cases the goal is to prevent allocations.
>>>
>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>         unmap_domain_page(p);
>>>>>>     
>>>>>>         spin_lock(&hd->arch.pgtables.lock);
>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>> +    /*
>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>> +     * domain is dying.
>>>>>> +     *
>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>> +     *
>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>> +     */
>>>>>> +    if ( likely(!d->is_dying) )
>>>>>> +    {
>>>>>> +        alive = true;
>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>> +    }
>>>>>>         spin_unlock(&hd->arch.pgtables.lock);
>>>>>>     
>>>>>> +    if ( unlikely(!alive) )
>>>>>> +    {
>>>>>> +        free_domheap_page(pg);
>>>>>> +        pg = NULL;
>>>>>> +    }
>>>>>> +
>>>>>>         return pg;
>>>>>>     }
>>>>>
>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>> once super page mappings are supported). Considering some of the
>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>> while it is being cleaned up after.
>>>>
>>>> It is unfortunate, but I think this is better than having to leak page
>>>> tables.
>>>>
>>>>>
>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>> failing because of the change here, would then again chance to
>>>>> leave mappings in place while the underlying pages get freed. As
>>>>> this would likely require an XSA, the change doesn't feel like
>>>>> "hardening" to me.
>>>>
>>>> I would agree with this if memory allocations could never fail. That's
>>>> not that case and will become worse as we use IOMMU pool.
>>>>
>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>
>>> The function is marked __must_check, so there won't be any direct
>>> callers ignoring errors (albeit I may be wrong here - we used to
>>> have cases where we simply suppressed the resulting compiler
>>> diagnostic, without really handling errors; not sure if all of
>>> these are gone by now). Risks might be elsewhere.
>>
>> But this is not a new risk. So I don't understand why you think my patch
>> is the one that may lead to an XSA in the future.
> 
> I didn't mean to imply it would _lead_ to an XSA (you're
> right that the problem was there already before), but the term
> "harden" suggests to me that the patch aims at eliminating
> possible conditions.

It elimitates the risk that someone inadvertently call 
iommu_alloc_pgtable() when the domain is dying. If this is happening 
after the page tables have been freed, then we would end up to leak memory.

> IOW the result here looks to me as if it
> would yield a false sense of safety.

So you are concerned about the wording rather than the code itself. Is 
that correct?

If so, how about "xen/iommu: Make the IOMMU page-table allocator 
slightly firmer"?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-18 17:41             ` Julien Grall
@ 2021-02-19  8:46               ` Jan Beulich
  2021-02-19  8:57                 ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-19  8:46 UTC (permalink / raw)
  To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel

On 18.02.2021 18:41, Julien Grall wrote:
> 
> 
> On 18/02/2021 17:04, Jan Beulich wrote:
>> On 18.02.2021 14:19, Julien Grall wrote:
>>>
>>>
>>> On 18/02/2021 13:10, Jan Beulich wrote:
>>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>>> Nit: s/not/no/ ?
>>>>>>
>>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>>> +     */
>>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>>     }
>>>>>>>     
>>>>>>>     static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>>          */
>>>>>>>         hd->platform_ops->clear_root_pgtable(d);
>>>>>>>     
>>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>>
>>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>>> explicitly now?
>>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>>> modification. The other one will gate against allocation.
>>>>>
>>>>> There is no guarantee that the former will prevent the latter.
>>>>
>>>> Oh, right - different locks. I got confused here because in both
>>>> cases the goal is to prevent allocations.
>>>>
>>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>>         unmap_domain_page(p);
>>>>>>>     
>>>>>>>         spin_lock(&hd->arch.pgtables.lock);
>>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>> +    /*
>>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>>> +     * domain is dying.
>>>>>>> +     *
>>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>>> +     *
>>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>>> +     */
>>>>>>> +    if ( likely(!d->is_dying) )
>>>>>>> +    {
>>>>>>> +        alive = true;
>>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>> +    }
>>>>>>>         spin_unlock(&hd->arch.pgtables.lock);
>>>>>>>     
>>>>>>> +    if ( unlikely(!alive) )
>>>>>>> +    {
>>>>>>> +        free_domheap_page(pg);
>>>>>>> +        pg = NULL;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         return pg;
>>>>>>>     }
>>>>>>
>>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>>> once super page mappings are supported). Considering some of the
>>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>>> while it is being cleaned up after.
>>>>>
>>>>> It is unfortunate, but I think this is better than having to leak page
>>>>> tables.
>>>>>
>>>>>>
>>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>>> failing because of the change here, would then again chance to
>>>>>> leave mappings in place while the underlying pages get freed. As
>>>>>> this would likely require an XSA, the change doesn't feel like
>>>>>> "hardening" to me.
>>>>>
>>>>> I would agree with this if memory allocations could never fail. That's
>>>>> not that case and will become worse as we use IOMMU pool.
>>>>>
>>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>>
>>>> The function is marked __must_check, so there won't be any direct
>>>> callers ignoring errors (albeit I may be wrong here - we used to
>>>> have cases where we simply suppressed the resulting compiler
>>>> diagnostic, without really handling errors; not sure if all of
>>>> these are gone by now). Risks might be elsewhere.
>>>
>>> But this is not a new risk. So I don't understand why you think my patch
>>> is the one that may lead to an XSA in the future.
>>
>> I didn't mean to imply it would _lead_ to an XSA (you're
>> right that the problem was there already before), but the term
>> "harden" suggests to me that the patch aims at eliminating
>> possible conditions.
> 
> It elimitates the risk that someone inadvertently call 
> iommu_alloc_pgtable() when the domain is dying. If this is happening 
> after the page tables have been freed, then we would end up to leak memory.
> 
>> IOW the result here looks to me as if it
>> would yield a false sense of safety.
> 
> So you are concerned about the wording rather than the code itself. Is 
> that correct?

In a way, yes. First of all I'd like us to settle on what to do
with late unmap requests, for 4.15 (and if need be longer term).

Jan

> If so, how about "xen/iommu: Make the IOMMU page-table allocator 
> slightly firmer"?
> 
> Cheers,
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-18 13:25         ` Julien Grall
@ 2021-02-19  8:49           ` Jan Beulich
  2021-02-19  9:24             ` Julien Grall
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Beulich @ 2021-02-19  8:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	xen-devel, Paul Durrant

On 18.02.2021 14:25, Julien Grall wrote:
> Hi,
> 
> On 18/02/2021 13:05, Jan Beulich wrote:
>> On 17.02.2021 17:07, Julien Grall wrote:
>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>> when the domain is dying because nothing prevents page-table to be
>>>>> allocated.
>>>>>
>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>> IOMMU page-tables.
>>>>
>>>> While I agree this to be the case right now, I'm not sure it is a
>>>> good idea to build on it (in that you leave the unmap paths
>>>> untouched).
>>>
>>> I don't build on that assumption. See next patch.
>>
>> Yet as said there that patch makes unmapping perhaps more fragile,
>> by introducing a latent error source into the path.
> 
> I still don't see what latent error my patch will introduce. Allocation 
> of page-tables are not guarantee to succeed.
> 
> So are you implying that a code may rely on the page allocation to succeed?

I'm implying that failure to split a superpage may have unknown
consequences. Since we make no use of superpages when not
sharing page tables, I call this a latent issue which may go
unnoticed for quite some time once no longer latent.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-18 14:00         ` Paul Durrant
@ 2021-02-19  8:56           ` Jan Beulich
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Beulich @ 2021-02-19  8:56 UTC (permalink / raw)
  To: paul
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	xen-devel, Julien Grall

On 18.02.2021 15:00, Paul Durrant wrote:
> On 18/02/2021 13:05, Jan Beulich wrote:
>> On 17.02.2021 17:07, Julien Grall wrote:
>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>
>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>> when the domain is dying because nothing prevents page-table to be
>>>>> allocated.
>>>>>
>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>> IOMMU page-tables.
>>>>
>>>> While I agree this to be the case right now, I'm not sure it is a
>>>> good idea to build on it (in that you leave the unmap paths
>>>> untouched).
>>>
>>> I don't build on that assumption. See next patch.
>>
>> Yet as said there that patch makes unmapping perhaps more fragile,
>> by introducing a latent error source into the path.
>>
>>>> Imo there's a fair chance this would be overlooked at
>>>> the point super page mappings get introduced (which has been long
>>>> overdue), and I thought prior discussion had lead to a possible
>>>> approach without risking use-after-free due to squashed unmap
>>>> requests.
>>>
>>> I know you suggested to zap the root page-tables... However, I don't
>>> think this is 4.15 material and you agree with this (you were the one
>>> pointed out that out).
>>
>> Paul - do you have any thoughts here? Outright zapping isn't
>> going to work, we'd need to switch to quarantine page tables at
>> the very least to prevent the issue with babbling devices. But
>> that still seems better to me than the introduction of a latent
>> issue on the unmap paths.
>>
> 
> I've not really been following this. AFAICT clear_root_pgtable() does 
> not actually mess with any context entries so the device view of the 
> tables won't change, will it?

Correct. Elsewhere I said that "zapping" here has two meanings,
one is what Julien does and the other is what I mean and what I
understand you also refer to - to actually replace the mappings
referenced by a context entry as soon as we no longer guarantee
to update them correctly.

My concern with not touching the unmap paths is that if there
was any "best effort" unmap left anywhere in the tree, we may
still end up with a use-after-free when late unmaps are now
made possibly fail by failing late allocation attempts.

Jan


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator
  2021-02-19  8:46               ` Jan Beulich
@ 2021-02-19  8:57                 ` Julien Grall
  0 siblings, 0 replies; 24+ messages in thread
From: Julien Grall @ 2021-02-19  8:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel



On 19/02/2021 08:46, Jan Beulich wrote:
> On 18.02.2021 18:41, Julien Grall wrote:
>>
>>
>> On 18/02/2021 17:04, Jan Beulich wrote:
>>> On 18.02.2021 14:19, Julien Grall wrote:
>>>>
>>>>
>>>> On 18/02/2021 13:10, Jan Beulich wrote:
>>>>> On 17.02.2021 17:29, Julien Grall wrote:
>>>>>> On 17/02/2021 15:13, Jan Beulich wrote:
>>>>>>> On 17.02.2021 15:24, Julien Grall wrote:> --- a/xen/drivers/passthrough/x86/iommu.c> +++ b/xen/drivers/passthrough/x86/iommu.c> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)>  >  void arch_iommu_domain_destroy(struct domain *d)>  {> +    /*> +     * There should be not page-tables left allocated by the time the
>>>>>>> Nit: s/not/no/ ?
>>>>>>>
>>>>>>>> +     * domain is destroyed. Note that arch_iommu_domain_destroy() is
>>>>>>>> +     * called unconditionally, so pgtables may be unitialized.
>>>>>>>> +     */
>>>>>>>> +    ASSERT(dom_iommu(d)->platform_ops == NULL ||
>>>>>>>> +           page_list_empty(&dom_iommu(d)->arch.pgtables.list));
>>>>>>>>      }
>>>>>>>>      
>>>>>>>>      static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>>>>>>>> @@ -279,6 +286,9 @@ int iommu_free_pgtables(struct domain *d)
>>>>>>>>           */
>>>>>>>>          hd->platform_ops->clear_root_pgtable(d);
>>>>>>>>      
>>>>>>>> +    /* After this barrier no new page allocations can occur. */
>>>>>>>> +    spin_barrier(&hd->arch.pgtables.lock);
>>>>>>>
>>>>>>> Didn't patch 2 utilize the call to ->clear_root_pgtable() itself as
>>>>>>> the barrier? Why introduce another one (with a similar comment)
>>>>>>> explicitly now?
>>>>>> The barriers act differently, one will get against any IOMMU page-tables
>>>>>> modification. The other one will gate against allocation.
>>>>>>
>>>>>> There is no guarantee that the former will prevent the latter.
>>>>>
>>>>> Oh, right - different locks. I got confused here because in both
>>>>> cases the goal is to prevent allocations.
>>>>>
>>>>>>>> @@ -315,9 +326,29 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
>>>>>>>>          unmap_domain_page(p);
>>>>>>>>      
>>>>>>>>          spin_lock(&hd->arch.pgtables.lock);
>>>>>>>> -    page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>>> +    /*
>>>>>>>> +     * The IOMMU page-tables are freed when relinquishing the domain, but
>>>>>>>> +     * nothing prevent allocation to happen afterwards. There is no valid
>>>>>>>> +     * reasons to continue to update the IOMMU page-tables while the
>>>>>>>> +     * domain is dying.
>>>>>>>> +     *
>>>>>>>> +     * So prevent page-table allocation when the domain is dying.
>>>>>>>> +     *
>>>>>>>> +     * We relying on &hd->arch.pgtables.lock to synchronize d->is_dying.
>>>>>>>> +     */
>>>>>>>> +    if ( likely(!d->is_dying) )
>>>>>>>> +    {
>>>>>>>> +        alive = true;
>>>>>>>> +        page_list_add(pg, &hd->arch.pgtables.list);
>>>>>>>> +    }
>>>>>>>>          spin_unlock(&hd->arch.pgtables.lock);
>>>>>>>>      
>>>>>>>> +    if ( unlikely(!alive) )
>>>>>>>> +    {
>>>>>>>> +        free_domheap_page(pg);
>>>>>>>> +        pg = NULL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          return pg;
>>>>>>>>      }
>>>>>>>
>>>>>>> As before I'm concerned of this forcing error paths to be taken
>>>>>>> elsewhere, in case an allocation still happens (e.g. from unmap
>>>>>>> once super page mappings are supported). Considering some of the
>>>>>>> error handling in the IOMMU code is to invoke domain_crash(), it
>>>>>>> would be quite unfortunate if we ended up crashing a domain
>>>>>>> while it is being cleaned up after.
>>>>>>
>>>>>> It is unfortunate, but I think this is better than having to leak page
>>>>>> tables.
>>>>>>
>>>>>>>
>>>>>>> Additionally, the (at present still hypothetical) unmap case, if
>>>>>>> failing because of the change here, would then again chance to
>>>>>>> leave mappings in place while the underlying pages get freed. As
>>>>>>> this would likely require an XSA, the change doesn't feel like
>>>>>>> "hardening" to me.
>>>>>>
>>>>>> I would agree with this if memory allocations could never fail. That's
>>>>>> not that case and will become worse as we use IOMMU pool.
>>>>>>
>>>>>> Do you have callers in mind that doesn't check the returns of iommu_unmap()?
>>>>>
>>>>> The function is marked __must_check, so there won't be any direct
>>>>> callers ignoring errors (albeit I may be wrong here - we used to
>>>>> have cases where we simply suppressed the resulting compiler
>>>>> diagnostic, without really handling errors; not sure if all of
>>>>> these are gone by now). Risks might be elsewhere.
>>>>
>>>> But this is not a new risk. So I don't understand why you think my patch
>>>> is the one that may lead to an XSA in the future.
>>>
>>> I didn't mean to imply it would _lead_ to an XSA (you're
>>> right that the problem was there already before), but the term
>>> "harden" suggests to me that the patch aims at eliminating
>>> possible conditions.
>>
>> It elimitates the risk that someone inadvertently call
>> iommu_alloc_pgtable() when the domain is dying. If this is happening
>> after the page tables have been freed, then we would end up to leak memory.
>>
>>> IOW the result here looks to me as if it
>>> would yield a false sense of safety.
>>
>> So you are concerned about the wording rather than the code itself. Is
>> that correct?
> 
> In a way, yes. First of all I'd like us to settle on what to do
> with late unmap requests, for 4.15 (and if need be longer term).

iommu_unmap() doesn't allocate memory today and very unlikely going to 
do for the lifetime of 4.15. So why should we address the late unmap 
requests in 4.15?

At best, to me this looks like introduce a risk for fixing a so far 
inexistent bug.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
  2021-02-19  8:49           ` Jan Beulich
@ 2021-02-19  9:24             ` Julien Grall
  0 siblings, 0 replies; 24+ messages in thread
From: Julien Grall @ 2021-02-19  9:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
	xen-devel, Paul Durrant

Hi Jan,

On 19/02/2021 08:49, Jan Beulich wrote:
> On 18.02.2021 14:25, Julien Grall wrote:
>> Hi,
>>
>> On 18/02/2021 13:05, Jan Beulich wrote:
>>> On 17.02.2021 17:07, Julien Grall wrote:
>>>> On 17/02/2021 15:01, Jan Beulich wrote:
>>>>> On 17.02.2021 15:24, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> The new x86 IOMMU page-tables allocator will release the pages when
>>>>>> relinquishing the domain resources. However, this is not sufficient
>>>>>> when the domain is dying because nothing prevents page-table to be
>>>>>> allocated.
>>>>>>
>>>>>> Currently page-table allocations can only happen from iommu_map(). As
>>>>>> the domain is dying, there is no good reason to continue to modify the
>>>>>> IOMMU page-tables.
>>>>>
>>>>> While I agree this to be the case right now, I'm not sure it is a
>>>>> good idea to build on it (in that you leave the unmap paths
>>>>> untouched).
>>>>
>>>> I don't build on that assumption. See next patch.
>>>
>>> Yet as said there that patch makes unmapping perhaps more fragile,
>>> by introducing a latent error source into the path.
>>
>> I still don't see what latent error my patch will introduce. Allocation
>> of page-tables are not guarantee to succeed.
>>
>> So are you implying that a code may rely on the page allocation to succeed?
> 
> I'm implying that failure to split a superpage may have unknown
> consequences.

As failure to flush the TLBs (see below).

> Since we make no use of superpages when not
> sharing page tables, I call this a latent issue which may go
> unnoticed for quite some time once no longer latent.

And it would take a lot longer to unnotice an OOM as this should happen 
less often ;).

But, I think this is wrong to blame the lower layer for a (latent) bug 
in the upper layer...

Even with your solution to zap page-tables, there is still a risk of 
failure because the TLB flush may fail (the operation can return an error).

So regardless the solution, we still need to have callers that function 
correctly.

Anyway, what matters right now is fixing a host crash when running PV 
guest with passthrough.  Neither patch #3 nor the iommu_unmap() part is 
strictly necessary for 4.15 (as you say this is latent).

So I would suggest to defer this discussion post-4.15.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-02-19  9:24 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-17 14:24 [for-4.15][PATCH v3 0/3] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall
2021-02-17 14:24 ` [for-4.15][PATCH v3 1/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
2021-02-17 14:54   ` Jan Beulich
2021-02-17 15:00     ` Julien Grall
2021-02-17 15:17       ` Jan Beulich
2021-02-17 16:48         ` Julien Grall
2021-02-17 14:24 ` [for-4.15][PATCH v3 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
2021-02-17 15:01   ` Jan Beulich
2021-02-17 16:07     ` Julien Grall
2021-02-18 13:05       ` Jan Beulich
2021-02-18 13:25         ` Julien Grall
2021-02-19  8:49           ` Jan Beulich
2021-02-19  9:24             ` Julien Grall
2021-02-18 14:00         ` Paul Durrant
2021-02-19  8:56           ` Jan Beulich
2021-02-17 14:24 ` [for-4.15][PATCH v3 3/3] xen/iommu: x86: Harden the IOMMU page-table allocator Julien Grall
2021-02-17 15:13   ` Jan Beulich
2021-02-17 16:29     ` Julien Grall
2021-02-18 13:10       ` Jan Beulich
2021-02-18 13:19         ` Julien Grall
2021-02-18 17:04           ` Jan Beulich
2021-02-18 17:41             ` Julien Grall
2021-02-19  8:46               ` Jan Beulich
2021-02-19  8:57                 ` Julien Grall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).