* [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown
@ 2021-02-26 10:56 Julien Grall
2021-02-26 10:56 ` [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled Julien Grall
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Julien Grall @ 2021-02-26 10:56 UTC (permalink / raw)
To: xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant,
Andrew Cooper, Kevin Tian
From: Julien Grall <jgrall@amazon.com>
Hi all,
This series is a collection of bug fixes for the IOMMU teardown code.
All of them are candidates for 4.15 as they can either leak memory or
lead to a host crash/host corruption.
This is sent directly on xen-devel because all the issues were either
introduced in 4.15 or happen in the domain creation code.
Major changes since v4:
- New patch added (it is a split of patch #1 in v4)
Major changes since v3:
- Remove patch #3 "xen/iommu: x86: Harden the IOMMU page-table
allocator" as it is not strictly necessary for 4.15.
- Re-order the patches to avoid relying on a follow-up patch to
completely fix the issue.
Major changes since v2:
- patch #1 "xen/x86: p2m: Don't map the special pages in the IOMMU
page-tables" has been removed. This requires Jan's patch [1] to
fully mitigate memory leaks.
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
@Ian, I assumed that the release-acked-by would stand even with the
patch split. Let me know if this is not the case.
Cheers,
[1] <90271e69-c07e-a32c-5531-a79b10ef03dd@suse.com>
Julien Grall (3):
xen/iommu: x86: Don't try to free page tables if the IOMMU is not
enabled
xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
xen/iommu: x86: Clear the root page-table before freeing the
page-tables
xen/drivers/passthrough/amd/iommu_map.c | 12 +++++++++++
xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 ++++++++++-
xen/drivers/passthrough/vtd/iommu.c | 24 ++++++++++++++++++++-
xen/drivers/passthrough/x86/iommu.c | 19 ++++++++++++++++
xen/include/xen/iommu.h | 1 +
5 files changed, 66 insertions(+), 2 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled
2021-02-26 10:56 [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
@ 2021-02-26 10:56 ` Julien Grall
2021-02-26 13:27 ` Jan Beulich
2021-02-26 10:56 ` [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-02-26 10:56 UTC (permalink / raw)
To: xen-devel; +Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant
From: Julien Grall <jgrall@amazon.com>
When using CONFIG_BIGMEM=y, the page_list cannot be accessed whilst it
is uninitialized. However, iommu_free_pgtables() will be called even if
the domain is not using an IOMMU.
Consequently, Xen will try to go through the page list and dereference a
NULL pointer.
Bail out early if the domain is not using an IOMMU.
Fixes: 15bc9a1ef51c ("x86/iommu: add common page-table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
Changes in v5:
- Patch added. This was split from "xen/x86: iommu: Ignore
IOMMU mapping requests when a domain is dying"
---
xen/drivers/passthrough/x86/iommu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index cea1032b3d02..58a330e82247 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -267,6 +267,9 @@ int iommu_free_pgtables(struct domain *d)
struct page_info *pg;
unsigned int done = 0;
+ if ( !is_iommu_enabled(d) )
+ return 0;
+
while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
{
free_domheap_page(pg);
--
2.17.1
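The bail-out guard in patch 1 can be sketched with a small, hypothetical model. The `fake_*` names and the simplified page list below are invented for illustration only and are not Xen's actual types; the point is that walking the list is skipped entirely when the domain never enabled the IOMMU:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a page on the IOMMU page-table list. */
struct page {
    struct page *next;
};

/* Simplified stand-in for the relevant per-domain state. */
struct fake_domain {
    int iommu_enabled;      /* models is_iommu_enabled(d) */
    struct page *pgtables;  /* models hd->arch.pgtables.list; may be unset */
    int freed;              /* number of pages released, for checking */
};

static int fake_free_pgtables(struct fake_domain *d)
{
    struct page *pg;

    /* The guard added by the patch: nothing to free, list may be unset. */
    if ( !d->iommu_enabled )
        return 0;

    /* Only reached when the IOMMU state was actually initialized. */
    while ( (pg = d->pgtables) != NULL )
    {
        d->pgtables = pg->next;
        d->freed++;
    }

    return 0;
}
```

Without the guard, a domain that never set up its IOMMU state would reach the list walk with an uninitialized head, which is the NULL dereference the commit message describes.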
* [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
2021-02-26 10:56 [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
2021-02-26 10:56 ` [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled Julien Grall
@ 2021-02-26 10:56 ` Julien Grall
2021-02-26 13:30 ` Jan Beulich
2021-03-01 2:52 ` Tian, Kevin
2021-02-26 10:56 ` [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
2021-03-02 9:56 ` [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
3 siblings, 2 replies; 10+ messages in thread
From: Julien Grall @ 2021-02-26 10:56 UTC (permalink / raw)
To: xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
Kevin Tian, Paul Durrant
From: Julien Grall <jgrall@amazon.com>
The new x86 IOMMU page-tables allocator will release the pages when
relinquishing the domain resources. However, this is not sufficient
when the domain is dying because nothing prevents page-tables from
being allocated.
As the domain is dying, it is not necessary to continue to modify the
IOMMU page-tables as they are going to be destroyed soon.
At the moment, page-table allocation will only happen in iommu_map().
So after this change there will be no more page-table allocation
happening because we don't use superpage mappings yet when not sharing
page tables.
In order to observe d->is_dying correctly, we need to rely on per-arch
locking, so the check to ignore IOMMU mapping is added on the per-driver
map_page() callback.
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
As discussed in v3, this is only covering 4.15. We can discuss
post-4.15 how to catch page-table allocations if another caller (e.g.
iommu_unmap() if we ever decide to support superpages) start to use the
page-table allocator.
Changes in v5:
- Clarify in the commit message why fixing iommu_map() is enough
- Split "if ( !is_iommu_enabled(d) )" in a separate patch
- Update the comment on top of the spin_barrier()
Changes in v4:
- Move the patch to the top of the queue
- Reword the commit message
Changes in v3:
- Patch added. This is a replacement of "xen/iommu: iommu_map: Don't
crash the domain if it is dying"
---
xen/drivers/passthrough/amd/iommu_map.c | 12 ++++++++++++
xen/drivers/passthrough/vtd/iommu.c | 12 ++++++++++++
xen/drivers/passthrough/x86/iommu.c | 3 +++
3 files changed, 27 insertions(+)
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index d3a8b1aec766..560af54b765b 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -285,6 +285,18 @@ int amd_iommu_map_page(struct domain *d, dfn_t dfn, mfn_t mfn,
spin_lock(&hd->arch.mapping_lock);
+ /*
+ * IOMMU mapping request can be safely ignored when the domain is dying.
+ *
+ * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+ * before any page tables are freed (see iommu_free_pgtables()).
+ */
+ if ( d->is_dying )
+ {
+ spin_unlock(&hd->arch.mapping_lock);
+ return 0;
+ }
+
rc = amd_iommu_alloc_root(d);
if ( rc )
{
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index d136fe36883b..b549a71530d5 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1762,6 +1762,18 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
spin_lock(&hd->arch.mapping_lock);
+ /*
+ * IOMMU mapping request can be safely ignored when the domain is dying.
+ *
+ * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+ * before any page tables are freed (see iommu_free_pgtables())
+ */
+ if ( d->is_dying )
+ {
+ spin_unlock(&hd->arch.mapping_lock);
+ return 0;
+ }
+
pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 1);
if ( !pg_maddr )
{
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 58a330e82247..ad19b7dd461c 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -270,6 +270,9 @@ int iommu_free_pgtables(struct domain *d)
if ( !is_iommu_enabled(d) )
return 0;
+ /* After this barrier, no new IOMMU mappings can be inserted. */
+ spin_barrier(&hd->arch.mapping_lock);
+
while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
{
free_domheap_page(pg);
--
2.17.1
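The synchronization argument in patch 2 can be sketched in a hypothetical, single-threaded model. The `fake_*` names are invented for this sketch; in Xen the lock is hd->arch.mapping_lock and spin_barrier() waits until no CPU still holds it, so any map operation that raced with setting is_dying has either completed or will observe the flag:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the relevant per-domain state. */
struct fake_domain {
    int is_dying;
    int pgtable_pages;   /* models pages allocated for IOMMU page-tables */
};

static int fake_map_page(struct fake_domain *d)
{
    /* spin_lock(&hd->arch.mapping_lock); in the real code */
    if ( d->is_dying )
    {
        /* spin_unlock(...); the request is silently ignored and,
         * crucially, no page-table page is allocated. */
        return 0;
    }
    d->pgtable_pages++;  /* stands in for a page-table allocation */
    /* spin_unlock(&hd->arch.mapping_lock); */
    return 0;
}

static int fake_free_pgtables(struct fake_domain *d)
{
    /*
     * spin_barrier(&hd->arch.mapping_lock); in the real code: every
     * fake_map_page() that started before is_dying was set has finished,
     * and every later call will see is_dying and allocate nothing.
     */
    d->pgtable_pages = 0;  /* stands in for moving pages to the free list */
    return 0;
}
```

The model shows why the check lives in the per-driver map_page() callbacks: the flag must be read under the same lock the barrier operates on.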
* [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
2021-02-26 10:56 [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
2021-02-26 10:56 ` [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled Julien Grall
2021-02-26 10:56 ` [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
@ 2021-02-26 10:56 ` Julien Grall
2021-03-01 2:54 ` Tian, Kevin
2021-03-02 9:56 ` [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
3 siblings, 1 reply; 10+ messages in thread
From: Julien Grall @ 2021-02-26 10:56 UTC (permalink / raw)
To: xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Andrew Cooper,
Kevin Tian, Paul Durrant
From: Julien Grall <jgrall@amazon.com>
The new per-domain IOMMU page-table allocator will now free the
page-tables when domain's resources are relinquished. However, the
per-domain IOMMU structure will still contain a dangling pointer to
the root page-table.
Xen may access the IOMMU page-tables afterwards, at least in the case
of a PV domain:
(XEN) Xen call trace:
(XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
(XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
(XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
(XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
(XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
(XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3
(XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
(XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
(XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
(XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
(XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
(XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
(XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
(XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
(XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
(XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
(XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
(XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
(XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
(XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
(XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120
This will result in a use-after-free and possibly a host crash or
memory corruption.
It would not be possible to free the page-tables further down in
domain_relinquish_resources() because cleanup_page_mappings() will only
be called when the last reference on the page is dropped. This may
happen much later if another domain still holds a reference.
After all the PCI devices have been de-assigned, nobody should use the
IOMMU page-tables and it is therefore pointless to try to modify them.
So we can simply clear any reference to the root page-table in the
per-domain IOMMU structure. This requires introducing a new callback;
its implementation will depend on the IOMMU driver used.
Take the opportunity to add an ASSERT() in arch_iommu_domain_destroy()
to check if we freed all the IOMMU page tables.
Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v5:
- Add Jan's reviewed-by
- Fix typo
- Use ! rather than == NULL
Changes in v4:
- Move the patch later in the series as we need to prevent
iommu_map() to allocate memory first.
- Add an ASSERT() in arch_iommu_domain_destroy().
Changes in v3:
- Move the patch earlier in the series
- Reword the commit message
Changes in v2:
- Introduce clear_root_pgtable()
- Move the patch later in the series
---
xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
xen/drivers/passthrough/vtd/iommu.c | 12 +++++++++++-
xen/drivers/passthrough/x86/iommu.c | 13 +++++++++++++
xen/include/xen/iommu.h | 1 +
4 files changed, 36 insertions(+), 2 deletions(-)
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 42b5a5a9bec4..085fe2f5771e 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain *d, u8 devfn,
return reassign_device(pdev->domain, d, devfn, pdev);
}
+static void amd_iommu_clear_root_pgtable(struct domain *d)
+{
+ struct domain_iommu *hd = dom_iommu(d);
+
+ spin_lock(&hd->arch.mapping_lock);
+ hd->arch.amd.root_table = NULL;
+ spin_unlock(&hd->arch.mapping_lock);
+}
+
static void amd_iommu_domain_destroy(struct domain *d)
{
- dom_iommu(d)->arch.amd.root_table = NULL;
+ ASSERT(!dom_iommu(d)->arch.amd.root_table);
}
static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
@@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel _iommu_ops = {
.remove_device = amd_iommu_remove_device,
.assign_device = amd_iommu_assign_device,
.teardown = amd_iommu_domain_destroy,
+ .clear_root_pgtable = amd_iommu_clear_root_pgtable,
.map_page = amd_iommu_map_page,
.unmap_page = amd_iommu_unmap_page,
.iotlb_flush = amd_iommu_flush_iotlb_pages,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index b549a71530d5..475efb3be3bd 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1726,6 +1726,15 @@ out:
return ret;
}
+static void iommu_clear_root_pgtable(struct domain *d)
+{
+ struct domain_iommu *hd = dom_iommu(d);
+
+ spin_lock(&hd->arch.mapping_lock);
+ hd->arch.vtd.pgd_maddr = 0;
+ spin_unlock(&hd->arch.mapping_lock);
+}
+
static void iommu_domain_teardown(struct domain *d)
{
struct domain_iommu *hd = dom_iommu(d);
@@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct domain *d)
xfree(mrmrr);
}
- hd->arch.vtd.pgd_maddr = 0;
+ ASSERT(!hd->arch.vtd.pgd_maddr);
}
static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -2731,6 +2740,7 @@ static struct iommu_ops __initdata vtd_ops = {
.remove_device = intel_iommu_remove_device,
.assign_device = intel_iommu_assign_device,
.teardown = iommu_domain_teardown,
+ .clear_root_pgtable = iommu_clear_root_pgtable,
.map_page = intel_iommu_map_page,
.unmap_page = intel_iommu_unmap_page,
.lookup_page = intel_iommu_lookup_page,
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index ad19b7dd461c..b90bb31bfeea 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
void arch_iommu_domain_destroy(struct domain *d)
{
+ /*
+ * There should be no page-tables left allocated by the time the
+ * domain is destroyed. Note that arch_iommu_domain_destroy() is
+ * called unconditionally, so pgtables may be uninitialized.
+ */
+ ASSERT(!dom_iommu(d)->platform_ops ||
+ page_list_empty(&dom_iommu(d)->arch.pgtables.list));
}
static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -273,6 +280,12 @@ int iommu_free_pgtables(struct domain *d)
/* After this barrier, no new IOMMU mappings can be inserted. */
spin_barrier(&hd->arch.mapping_lock);
+ /*
+ * Pages will be moved to the free list below. So we want to
+ * clear the root page-table to avoid any potential use after-free.
+ */
+ hd->platform_ops->clear_root_pgtable(d);
+
while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
{
free_domheap_page(pg);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 863a68fe1622..d59ed7cbad43 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -272,6 +272,7 @@ struct iommu_ops {
int (*adjust_irq_affinities)(void);
void (*sync_cache)(const void *addr, unsigned int size);
+ void (*clear_root_pgtable)(struct domain *d);
#endif /* CONFIG_X86 */
int __must_check (*suspend)(void);
--
2.17.1
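The callback dispatch added by patch 3 can be sketched with a hypothetical model. The `fake_*` names below are invented for illustration and are not Xen's exact definitions; the sketch only shows the shape of the new clear_root_pgtable hook and where the generic code invokes it:

```c
#include <assert.h>
#include <stddef.h>

struct fake_domain;

/* Simplified stand-in for struct iommu_ops with the new hook. */
struct fake_iommu_ops {
    void (*clear_root_pgtable)(struct fake_domain *d);
};

/* Simplified stand-in for the per-domain IOMMU state. */
struct fake_domain {
    const struct fake_iommu_ops *ops;
    void *root_table;   /* models e.g. hd->arch.amd.root_table */
};

/* Per-driver implementation: drop the root pointer. In the real code
 * this runs under hd->arch.mapping_lock. */
static void fake_clear_root(struct fake_domain *d)
{
    d->root_table = NULL;
}

static const struct fake_iommu_ops fake_ops = {
    .clear_root_pgtable = fake_clear_root,
};

/* Generic x86 code: clear the root reference before the page-table
 * pages are moved to the free list, so no dangling pointer remains. */
static void fake_free_pgtables(struct fake_domain *d)
{
    d->ops->clear_root_pgtable(d);
    /* ... then free the pages on the page-table list ... */
}
```

With the root cleared first, a later IOMMU operation that consults the root pointer finds NULL (or 0 for the VT-d maddr) instead of a freed page, which is exactly what the ASSERT()s in the teardown paths then verify.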
* Re: [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled
2021-02-26 10:56 ` [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables if the IOMMU is not enabled Julien Grall
@ 2021-02-26 13:27 ` Jan Beulich
0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2021-02-26 13:27 UTC (permalink / raw)
To: Julien Grall; +Cc: hongyxia, iwj, Julien Grall, Paul Durrant, xen-devel
On 26.02.2021 11:56, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> When using CONFIG_BIGMEM=y, the page_list cannot be accessed whilst it
> is uninitialized. However, iommu_free_pgtables() will be called even if
> the domain is not using an IOMMU.
>
> Consequently, Xen will try to go through the page list and dereference a
> NULL pointer.
>
> Bail out early if the domain is not using an IOMMU.
>
> Fixes: 15bc9a1ef51c ("x86/iommu: add common page-table allocator")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
2021-02-26 10:56 ` [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
@ 2021-02-26 13:30 ` Jan Beulich
2021-03-01 9:21 ` Julien Grall
2021-03-01 2:52 ` Tian, Kevin
1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2021-02-26 13:30 UTC (permalink / raw)
To: Julien Grall
Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
Paul Durrant, xen-devel
On 26.02.2021 11:56, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> The new x86 IOMMU page-tables allocator will release the pages when
> relinquishing the domain resources. However, this is not sufficient
> when the domain is dying because nothing prevents page-tables from
> being allocated.
>
> As the domain is dying, it is not necessary to continue to modify the
> IOMMU page-tables as they are going to be destroyed soon.
>
> At the moment, page-table allocation will only happen in iommu_map().
> So after this change there will be no more page-table allocation
> happening because we don't use superpage mappings yet when not sharing
> page tables.
>
> In order to observe d->is_dying correctly, we need to rely on per-arch
> locking, so the check to ignore IOMMU mapping is added on the per-driver
> map_page() callback.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Does this also want a Fixes: tag (the same as patch 1)?
Jan
* RE: [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
2021-02-26 10:56 ` [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
2021-02-26 13:30 ` Jan Beulich
@ 2021-03-01 2:52 ` Tian, Kevin
1 sibling, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2021-03-01 2:52 UTC (permalink / raw)
To: Julien Grall, xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Cooper, Andrew, Paul Durrant
> From: Julien Grall <julien@xen.org>
> Sent: Friday, February 26, 2021 6:57 PM
>
> From: Julien Grall <jgrall@amazon.com>
>
> The new x86 IOMMU page-tables allocator will release the pages when
> relinquishing the domain resources. However, this is not sufficient
> when the domain is dying because nothing prevents page-tables from
> being allocated.
>
> As the domain is dying, it is not necessary to continue to modify the
> IOMMU page-tables as they are going to be destroyed soon.
>
> At the moment, page-table allocation will only happen in iommu_map().
> So after this change there will be no more page-table allocation
> happening because we don't use superpage mappings yet when not sharing
> page tables.
>
> In order to observe d->is_dying correctly, we need to rely on per-arch
> locking, so the check to ignore IOMMU mapping is added on the per-driver
> map_page() callback.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>
> ---
>
> As discussed in v3, this is only covering 4.15. We can discuss
> post-4.15 how to catch page-table allocations if another caller (e.g.
> iommu_unmap() if we ever decide to support superpages) start to use the
> page-table allocator.
>
> Changes in v5:
> - Clarify in the commit message why fixing iommu_map() is enough
> - Split "if ( !is_iommu_enabled(d) )" in a separate patch
> - Update the comment on top of the spin_barrier()
>
> Changes in v4:
> - Move the patch to the top of the queue
> - Reword the commit message
>
> Changes in v3:
> - Patch added. This is a replacement of "xen/iommu: iommu_map: Don't
> crash the domain if it is dying"
> ---
> xen/drivers/passthrough/amd/iommu_map.c | 12 ++++++++++++
> xen/drivers/passthrough/vtd/iommu.c | 12 ++++++++++++
> xen/drivers/passthrough/x86/iommu.c | 3 +++
> 3 files changed, 27 insertions(+)
>
> diff --git a/xen/drivers/passthrough/amd/iommu_map.c
> b/xen/drivers/passthrough/amd/iommu_map.c
> index d3a8b1aec766..560af54b765b 100644
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -285,6 +285,18 @@ int amd_iommu_map_page(struct domain *d, dfn_t
> dfn, mfn_t mfn,
>
> spin_lock(&hd->arch.mapping_lock);
>
> + /*
> + * IOMMU mapping request can be safely ignored when the domain is
> dying.
> + *
> + * hd->arch.mapping_lock guarantees that d->is_dying will be observed
> + * before any page tables are freed (see iommu_free_pgtables()).
> + */
> + if ( d->is_dying )
> + {
> + spin_unlock(&hd->arch.mapping_lock);
> + return 0;
> + }
> +
> rc = amd_iommu_alloc_root(d);
> if ( rc )
> {
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index d136fe36883b..b549a71530d5 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1762,6 +1762,18 @@ static int __must_check
> intel_iommu_map_page(struct domain *d, dfn_t dfn,
>
> spin_lock(&hd->arch.mapping_lock);
>
> + /*
> + * IOMMU mapping request can be safely ignored when the domain is
> dying.
> + *
> + * hd->arch.mapping_lock guarantees that d->is_dying will be observed
> + * before any page tables are freed (see iommu_free_pgtables())
> + */
> + if ( d->is_dying )
> + {
> + spin_unlock(&hd->arch.mapping_lock);
> + return 0;
> + }
> +
> pg_maddr = addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 1);
> if ( !pg_maddr )
> {
> diff --git a/xen/drivers/passthrough/x86/iommu.c
> b/xen/drivers/passthrough/x86/iommu.c
> index 58a330e82247..ad19b7dd461c 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -270,6 +270,9 @@ int iommu_free_pgtables(struct domain *d)
> if ( !is_iommu_enabled(d) )
> return 0;
>
> + /* After this barrier, no new IOMMU mappings can be inserted. */
> + spin_barrier(&hd->arch.mapping_lock);
> +
> while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> {
> free_domheap_page(pg);
> --
> 2.17.1
* RE: [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables
2021-02-26 10:56 ` [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-03-01 2:54 ` Tian, Kevin
0 siblings, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2021-03-01 2:54 UTC (permalink / raw)
To: Julien Grall, xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Cooper, Andrew, Paul Durrant
> From: Julien Grall <julien@xen.org>
> Sent: Friday, February 26, 2021 6:57 PM
>
> From: Julien Grall <jgrall@amazon.com>
>
> The new per-domain IOMMU page-table allocator will now free the
> page-tables when domain's resources are relinquished. However, the
> per-domain IOMMU structure will still contain a dangling pointer to
> the root page-table.
>
> Xen may access the IOMMU page-tables afterwards, at least in the case
> of a PV domain:
>
> (XEN) Xen call trace:
> (XEN) [<ffff82d04025b4b2>] R
> iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
> (XEN) [<ffff82d04025b695>] F
> iommu.c#intel_iommu_unmap_page+0x5d/0xf8
> (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
> (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
> (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
> (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3
> (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
> (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
> (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
> (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
> (XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
> (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
> (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
> (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
> (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
> (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
> (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
> (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
> (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120
>
> This will result in a use-after-free and possibly a host crash or
> memory corruption.
>
> It would not be possible to free the page-tables further down in
> domain_relinquish_resources() because cleanup_page_mappings() will only
> be called when the last reference on the page is dropped. This may
> happen much later if another domain still holds a reference.
>
> After all the PCI devices have been de-assigned, nobody should use the
> IOMMU page-tables and it is therefore pointless to try to modify them.
>
> So we can simply clear any reference to the root page-table in the
> per-domain IOMMU structure. This requires introducing a new callback;
> its implementation will depend on the IOMMU driver used.
>
> Take the opportunity to add an ASSERT() in arch_iommu_domain_destroy()
> to check if we freed all the IOMMU page tables.
>
> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table
> allocator")
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>
> ---
> Changes in v5:
> - Add Jan's reviewed-by
> - Fix typo
> - Use ! rather than == NULL
>
> Changes in v4:
> - Move the patch later in the series as we need to prevent
> iommu_map() to allocate memory first.
> - Add an ASSERT() in arch_iommu_domain_destroy().
>
> Changes in v3:
> - Move the patch earlier in the series
> - Reword the commit message
>
> Changes in v2:
> - Introduce clear_root_pgtable()
> - Move the patch later in the series
> ---
> xen/drivers/passthrough/amd/pci_amd_iommu.c | 12 +++++++++++-
> xen/drivers/passthrough/vtd/iommu.c | 12 +++++++++++-
> xen/drivers/passthrough/x86/iommu.c | 13 +++++++++++++
> xen/include/xen/iommu.h | 1 +
> 4 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 42b5a5a9bec4..085fe2f5771e 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -381,9 +381,18 @@ static int amd_iommu_assign_device(struct domain
> *d, u8 devfn,
> return reassign_device(pdev->domain, d, devfn, pdev);
> }
>
> +static void amd_iommu_clear_root_pgtable(struct domain *d)
> +{
> + struct domain_iommu *hd = dom_iommu(d);
> +
> + spin_lock(&hd->arch.mapping_lock);
> + hd->arch.amd.root_table = NULL;
> + spin_unlock(&hd->arch.mapping_lock);
> +}
> +
> static void amd_iommu_domain_destroy(struct domain *d)
> {
> - dom_iommu(d)->arch.amd.root_table = NULL;
> + ASSERT(!dom_iommu(d)->arch.amd.root_table);
> }
>
> static int amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
> @@ -565,6 +574,7 @@ static const struct iommu_ops __initconstrel
> _iommu_ops = {
> .remove_device = amd_iommu_remove_device,
> .assign_device = amd_iommu_assign_device,
> .teardown = amd_iommu_domain_destroy,
> + .clear_root_pgtable = amd_iommu_clear_root_pgtable,
> .map_page = amd_iommu_map_page,
> .unmap_page = amd_iommu_unmap_page,
> .iotlb_flush = amd_iommu_flush_iotlb_pages,
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index b549a71530d5..475efb3be3bd 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1726,6 +1726,15 @@ out:
> return ret;
> }
>
> +static void iommu_clear_root_pgtable(struct domain *d)
> +{
> + struct domain_iommu *hd = dom_iommu(d);
> +
> + spin_lock(&hd->arch.mapping_lock);
> + hd->arch.vtd.pgd_maddr = 0;
> + spin_unlock(&hd->arch.mapping_lock);
> +}
> +
> static void iommu_domain_teardown(struct domain *d)
> {
> struct domain_iommu *hd = dom_iommu(d);
> @@ -1740,7 +1749,7 @@ static void iommu_domain_teardown(struct
> domain *d)
> xfree(mrmrr);
> }
>
> - hd->arch.vtd.pgd_maddr = 0;
> + ASSERT(!hd->arch.vtd.pgd_maddr);
> }
>
> static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> @@ -2731,6 +2740,7 @@ static struct iommu_ops __initdata vtd_ops = {
> .remove_device = intel_iommu_remove_device,
> .assign_device = intel_iommu_assign_device,
> .teardown = iommu_domain_teardown,
> + .clear_root_pgtable = iommu_clear_root_pgtable,
> .map_page = intel_iommu_map_page,
> .unmap_page = intel_iommu_unmap_page,
> .lookup_page = intel_iommu_lookup_page,
> diff --git a/xen/drivers/passthrough/x86/iommu.c
> b/xen/drivers/passthrough/x86/iommu.c
> index ad19b7dd461c..b90bb31bfeea 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -149,6 +149,13 @@ int arch_iommu_domain_init(struct domain *d)
>
> void arch_iommu_domain_destroy(struct domain *d)
> {
> + /*
> + * There should be no page-tables left allocated by the time the
> + * domain is destroyed. Note that arch_iommu_domain_destroy() is
> + * called unconditionally, so pgtables may be uninitialized.
> + */
> + ASSERT(!dom_iommu(d)->platform_ops ||
> + page_list_empty(&dom_iommu(d)->arch.pgtables.list));
> }
>
> static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> @@ -273,6 +280,12 @@ int iommu_free_pgtables(struct domain *d)
> /* After this barrier, no new IOMMU mappings can be inserted. */
> spin_barrier(&hd->arch.mapping_lock);
>
> + /*
> + * Pages will be moved to the free list below. So we want to
> + * clear the root page-table to avoid any potential use after-free.
> + */
> + hd->platform_ops->clear_root_pgtable(d);
> +
> while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) )
> {
> free_domheap_page(pg);
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 863a68fe1622..d59ed7cbad43 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -272,6 +272,7 @@ struct iommu_ops {
>
> int (*adjust_irq_affinities)(void);
> void (*sync_cache)(const void *addr, unsigned int size);
> + void (*clear_root_pgtable)(struct domain *d);
> #endif /* CONFIG_X86 */
>
> int __must_check (*suspend)(void);
> --
> 2.17.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
2021-02-26 13:30 ` Jan Beulich
@ 2021-03-01 9:21 ` Julien Grall
0 siblings, 0 replies; 10+ messages in thread
From: Julien Grall @ 2021-03-01 9:21 UTC (permalink / raw)
To: Jan Beulich
Cc: hongyxia, iwj, Julien Grall, Andrew Cooper, Kevin Tian,
Paul Durrant, xen-devel
Hi Jan,
On 26/02/2021 13:30, Jan Beulich wrote:
> On 26.02.2021 11:56, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The new x86 IOMMU page-table allocator will release the pages when
>> relinquishing the domain resources. However, this is not sufficient
>> when the domain is dying, because nothing prevents page-tables from
>> being allocated.
>>
>> As the domain is dying, it is not necessary to continue to modify the
>> IOMMU page-tables as they are going to be destroyed soon.
>>
>> At the moment, page-table allocations only happen in iommu_map().
>> So after this change there will be no more page-table allocations,
>> because we don't use superpage mappings yet when not sharing
>> page tables.
>>
>> In order to observe d->is_dying correctly, we need to rely on per-arch
>> locking, so the check to ignore IOMMU mapping is added on the per-driver
>> map_page() callback.
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks!
>
> Does this also want a Fixes: tag (the same as patch 1)?
I think so. I will add it when committing the series.
Cheers,
--
Julien Grall
* Re: [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown
2021-02-26 10:56 [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
` (2 preceding siblings ...)
2021-02-26 10:56 ` [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2021-03-02 9:56 ` Julien Grall
3 siblings, 0 replies; 10+ messages in thread
From: Julien Grall @ 2021-03-02 9:56 UTC (permalink / raw)
To: xen-devel
Cc: hongyxia, iwj, Julien Grall, Jan Beulich, Paul Durrant,
Andrew Cooper, Kevin Tian
Hi,
On 26/02/2021 10:56, Julien Grall wrote:
> Julien Grall (3):
> xen/iommu: x86: Don't try to free page tables is the IOMMU is not
> enabled
> xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
> xen/iommu: x86: Clear the root page-table before freeing the
> page-tables
I have committed the 3 patches.
Cheers,
--
Julien Grall
Thread overview: 10+ messages
2021-02-26 10:56 [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
2021-02-26 10:56 ` [PATCH for-4.15 v5 1/3] xen/iommu: x86: Don't try to free page tables is the IOMMU is not enabled Julien Grall
2021-02-26 13:27 ` Jan Beulich
2021-02-26 10:56 ` [PATCH for-4.15 v5 2/3] xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying Julien Grall
2021-02-26 13:30 ` Jan Beulich
2021-03-01 9:21 ` Julien Grall
2021-03-01 2:52 ` Tian, Kevin
2021-02-26 10:56 ` [PATCH for-4.15 v5 3/3] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
2021-03-01 2:54 ` Tian, Kevin
2021-03-02 9:56 ` [PATCH for-4.15 v5 0/3] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall