* [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown @ 2020-12-22 15:43 Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall ` (3 more replies) 0 siblings, 4 replies; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant, Andrew Cooper, Roger Pau Monné, Wei Liu From: Julien Grall <jgrall@amazon.com> Hi all, This series is a collection of bug fixes for the IOMMU teardown code. All of them are candidates for 4.15 as they can either leak memory or lead to a host crash/host corruption. This is sent directly on xen-devel because all the issues were either introduced in 4.15 or happen in the domain creation code. Cheers, Julien Grall (4): xen/iommu: Check if the IOMMU was initialized before tearing down xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables xen/iommu: x86: Don't leak the IOMMU page-tables xen/arch/x86/domain.c | 2 +- xen/drivers/passthrough/iommu.c | 10 +++++- xen/drivers/passthrough/x86/iommu.c | 47 +++++++++++++++++++++++++++-- xen/include/asm-x86/iommu.h | 2 +- 4 files changed, 56 insertions(+), 5 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 13:27 ` Jan Beulich 2021-01-04 9:28 ` Paul Durrant 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall ` (2 subsequent siblings) 3 siblings, 2 replies; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> is_iommu_enabled() will return true even if the IOMMU has not been initialized (e.g. the ops are not set). In the case of an early failure in arch_domain_init(), the function iommu_destroy_domain() will be called even if the IOMMU is not initialized. This will result in dereferencing the ops, which will be NULL, and a host crash. Fix the issue by checking that ops has been set before accessing it. Note that we are assuming that arch_iommu_domain_init() will clean up an intermediate failure if it failed. Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") Signed-off-by: Julien Grall <jgrall@amazon.com> --- xen/drivers/passthrough/iommu.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 2358b6eb09f4..f976d5a0b0a5 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) void iommu_domain_destroy(struct domain *d) { - if ( !is_iommu_enabled(d) ) + struct domain_iommu *hd = dom_iommu(d); + + /* + * In case of failure during the domain construction, it would be + * possible to reach this function with the IOMMU enabled but not + * yet initialized. 
We assume that hd->platforms will be non-NULL as + * soon as we start to initialize the IOMMU. + */ + if ( !is_iommu_enabled(d) || !hd->platform_ops ) return; iommu_teardown(d); -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall @ 2020-12-23 13:27 ` Jan Beulich 2020-12-23 13:50 ` Julien Grall 2021-01-04 9:28 ` Paul Durrant 1 sibling, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:27 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) > > void iommu_domain_destroy(struct domain *d) > { > - if ( !is_iommu_enabled(d) ) > + struct domain_iommu *hd = dom_iommu(d); > + > + /* > + * In case of failure during the domain construction, it would be > + * possible to reach this function with the IOMMU enabled but not > + * yet initialized. We assume that hd->platforms will be non-NULL as > + * soon as we start to initialize the IOMMU. > + */ > + if ( !is_iommu_enabled(d) || !hd->platform_ops ) > return; From your description I'd rather say is_iommu_enabled() is the wrong predicate to use here. IOW I'd rather see it be replaced. A couple of additional nits: "hd" isn't really necessary to have as a local variable, but if you insist on introducing it despite being used just once, it should be pointer-to-const. Plus the comment would better spell correctly the field it talks about. But I consider this comment excessive anyway, as the check of ->platform_ops is quite clear by itself. And finally "we assume" is in at least latent conflict with ->platform_ops getting set only after arch_iommu_domain_init() was called. 
Right now neither x86'es nor Arm's do anything that would need undoing, but I'd still suggest to replace "assume" here (if you want to keep the comment in the first place) and move the assignment up a few lines (right after the is_iommu_enabled() check there). Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:27 ` Jan Beulich @ 2020-12-23 13:50 ` Julien Grall 2020-12-23 13:59 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 13:50 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:27, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >> >> void iommu_domain_destroy(struct domain *d) >> { >> - if ( !is_iommu_enabled(d) ) >> + struct domain_iommu *hd = dom_iommu(d); >> + >> + /* >> + * In case of failure during the domain construction, it would be >> + * possible to reach this function with the IOMMU enabled but not >> + * yet initialized. We assume that hd->platforms will be non-NULL as >> + * soon as we start to initialize the IOMMU. >> + */ >> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >> return; > > From your description I'd rather say is_iommu_enabled() is the > wrong predicate to use here. IOW I'd rather see it be replaced. !hd->platform_ops should be sufficient. So, I think we can drop !is_iommu_enabled(d). Would that be fine with you? > > A couple of additional nits: "hd" isn't really necessary to > have as a local variable, but if you insist on introducing it > despite being used just once, it should be pointer-to-const. > Plus the comment would better spell correctly the field it > talks about. But I consider this comment excessive anyway, as > the check of ->platform_ops is quite clear by itself. Well, I added the comment because I think check hd->platform_ops may not be that obvious (see more below). > And finally "we assume" is in at least latent conflict with > ->platform_ops getting set only after arch_iommu_domain_init() > was called. 
Right now neither x86'es nor Arm's do anything > that would need undoing, but I'd still suggest to replace > "assume" here (if you want to keep the comment in the first > place) and move the assignment up a few lines (right after the > is_iommu_enabled() check there). My initial implementation of this patch moved the initialization of hd->platform_ops before arch_iommu_domain_init(). However, this is going to lead to some issues with Paul's series which adds an IOMMU page-table pool ([1]). The function arch_iommu_domain_init() may now fail. If we initialize hd->platform_ops earlier on, then the ->teardown() will be called before ->init(). This may be an issue if ->teardown() expects some list pointer to be initialized by ->init(). I am not aware of any today, but this seems quite risky to me. So I think it is better if we initialize hd->platform_ops after arch_iommu_domain_init() and expect the function to clean up anything that was allocated before the error. The alternative is we set hd->platform_ops if both arch_iommu_domain_init() and ->init() have succeeded. This means they will both have to clean up in case of an error. Any thoughts? Cheers, [1] <20201005094905.2929-6-paul@xen.org> -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:50 ` Julien Grall @ 2020-12-23 13:59 ` Jan Beulich 2020-12-23 14:51 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:59 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 14:50, Julien Grall wrote: > On 23/12/2020 13:27, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> --- a/xen/drivers/passthrough/iommu.c >>> +++ b/xen/drivers/passthrough/iommu.c >>> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >>> >>> void iommu_domain_destroy(struct domain *d) >>> { >>> - if ( !is_iommu_enabled(d) ) >>> + struct domain_iommu *hd = dom_iommu(d); >>> + >>> + /* >>> + * In case of failure during the domain construction, it would be >>> + * possible to reach this function with the IOMMU enabled but not >>> + * yet initialized. We assume that hd->platforms will be non-NULL as >>> + * soon as we start to initialize the IOMMU. >>> + */ >>> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >>> return; >> >> From your description I'd rather say is_iommu_enabled() is the >> wrong predicate to use here. IOW I'd rather see it be replaced. > > !hd->platform_ops should be sufficient. So, I think we can drop > !is_iommu_enabled(d). Would that be fine with you? Well, that's what I was trying to suggest. >> A couple of additional nits: "hd" isn't really necessary to >> have as a local variable, but if you insist on introducing it >> despite being used just once, it should be pointer-to-const. >> Plus the comment would better spell correctly the field it >> talks about. But I consider this comment excessive anyway, as >> the check of ->platform_ops is quite clear by itself. > > Well, I added the comment because I think check hd->platform_ops may not > be that obvious (see more below). 
> >> And finally "we assume" is in at least latent conflict with >> ->platform_ops getting set only after arch_iommu_domain_init() >> was called. Right now neither x86'es nor Arm's do anything >> that would need undoing, but I'd still suggest to replace >> "assume" here (if you want to keep the comment in the first >> place) and move the assignment up a few lines (right after the >> is_iommu_enabled() check there). > My initial implementation of this patch moved the initialization of > hd->platform_ops before arch_iommu_domain_init(). > > However, this is going to lead to some issues with Paul's series which > adds an IOMMU page-table pool ([1]). > > The function arch_iommu_domain_init() may now fail. If we initialize > hd->platform_ops earlier on, then the ->teardown() will be called before > ->init(). > > This may be an issue if ->teardown() expects some list pointer to be > initialized by ->init(). I am not aware of any today, but this seems > quite risky to me. In such a case the obvious thing is to make the teardown handler check whether its init counterpart has run before. This will then fit our apparently much wider goal of making cleanup functions idempotent. Jan > So I think it is better if we initialize hd->platform_ops after > arch_iommu_domain_init() and expect the function to clean up anything > that was allocated before the error. > > The alternative is we set hd->platform_ops if both > arch_iommu_domain_init() and ->init() have succeeded. This means they > will both have to clean up in case of an error. > > Any thoughts? > > Cheers, > > [1] <20201005094905.2929-6-paul@xen.org> > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:59 ` Jan Beulich @ 2020-12-23 14:51 ` Julien Grall 2020-12-23 14:58 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:51 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:59, Jan Beulich wrote: > On 23.12.2020 14:50, Julien Grall wrote: >> On 23/12/2020 13:27, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> --- a/xen/drivers/passthrough/iommu.c >>>> +++ b/xen/drivers/passthrough/iommu.c >>>> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >>>> >>>> void iommu_domain_destroy(struct domain *d) >>>> { >>>> - if ( !is_iommu_enabled(d) ) >>>> + struct domain_iommu *hd = dom_iommu(d); >>>> + >>>> + /* >>>> + * In case of failure during the domain construction, it would be >>>> + * possible to reach this function with the IOMMU enabled but not >>>> + * yet initialized. We assume that hd->platforms will be non-NULL as >>>> + * soon as we start to initialize the IOMMU. >>>> + */ >>>> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >>>> return; >>> >>> From your description I'd rather say is_iommu_enabled() is the >>> wrong predicate to use here. IOW I'd rather see it be replaced. >> >> !hd->platform_ops should be sufficient. So, I think we can drop >> !is_iommu_enabled(d). Would that be fine with you? > > Well, that's what I was trying to suggest. > >>> A couple of additional nits: "hd" isn't really necessary to >>> have as a local variable, but if you insist on introducing it >>> despite being used just once, it should be pointer-to-const. >>> Plus the comment would better spell correctly the field it >>> talks about. But I consider this comment excessive anyway, as >>> the check of ->platform_ops is quite clear by itself. 
>> >> Well, I added the comment because I think check hd->platform_ops may not >> be that obvious (see more below). >> >>> And finally "we assume" is in at least latent conflict with >>> ->platform_ops getting set only after arch_iommu_domain_init() >>> was called. Right now neither x86'es nor Arm's do anything >>> that would need undoing, but I'd still suggest to replace >>> "assume" here (if you want to keep the comment in the first >>> place) and move the assignment up a few lines (right after the >>> is_iommu_enabled() check there). >> My initial implementation of this patch moved the initialization of >> hd->platform_ops before arch_iommu_domain_init(). >> >> However, this is going to lead to some issues with Paul's series which >> add an IOMMU page-table pool ([1]). >> >> The function arch_iommu_domain_init() may now fail. If we initialize >> hd->platform_ops earlier on, then the ->teardown() will be called before >> ->init(). >> >> This may be an issue if ->teardown() expects some list pointer to be >> initialized by ->init(). I am not aware of any today, but this seems >> quite risky to me. > > In such a case the obvious thing is to make the teardown handler > check whether its init counterpart has run before. This will then > fit our apparently much wider goal of making cleanup functions > idempotent. I will have a look. This may require another boolean which I wanted to avoid. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 14:51 ` Julien Grall @ 2020-12-23 14:58 ` Jan Beulich 0 siblings, 0 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:58 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:51, Julien Grall wrote: > On 23/12/2020 13:59, Jan Beulich wrote: >> On 23.12.2020 14:50, Julien Grall wrote: >>> On 23/12/2020 13:27, Jan Beulich wrote: >>>> And finally "we assume" is in at least latent conflict with >>>> ->platform_ops getting set only after arch_iommu_domain_init() >>>> was called. Right now neither x86'es nor Arm's do anything >>>> that would need undoing, but I'd still suggest to replace >>>> "assume" here (if you want to keep the comment in the first >>>> place) and move the assignment up a few lines (right after the >>>> is_iommu_enabled() check there). >>> My initial implementation of this patch moved the initialization of >>> hd->platform_ops before arch_iommu_domain_init(). >>> >>> However, this is going to lead to some issues with Paul's series which >>> add an IOMMU page-table pool ([1]). >>> >>> The function arch_iommu_domain_init() may now fail. If we initialize >>> hd->platform_ops earlier on, then the ->teardown() will be called before >>> ->init(). >>> >>> This may be an issue if ->teardown() expects some list pointer to be >>> initialized by ->init(). I am not aware of any today, but this seems >>> quite risky to me. >> >> In such a case the obvious thing is to make the teardown handler >> check whether its init counterpart has run before. This will then >> fit our apparently much wider goal of making cleanup functions >> idempotent. > > I will have a look. This may require another boolean which I wanted to > avoid. I could imagine list_head_is_null() to become handy here. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall 2020-12-23 13:27 ` Jan Beulich @ 2021-01-04 9:28 ` Paul Durrant 2021-01-04 14:33 ` Julien Grall 1 sibling, 1 reply; 40+ messages in thread From: Paul Durrant @ 2021-01-04 9:28 UTC (permalink / raw) To: 'Julien Grall', xen-devel Cc: hongyxia, 'Julien Grall', 'Jan Beulich' > -----Original Message----- > From: Julien Grall <julien@xen.org> > Sent: 22 December 2020 15:44 > To: xen-devel@lists.xenproject.org > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; Jan Beulich <jbeulich@suse.com>; Paul > Durrant <paul@xen.org> > Subject: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down > > From: Julien Grall <jgrall@amazon.com> > > is_iommu_enabled() will return true even if the IOMMU has not been > initialized (e.g. the ops are not set). > > In the case of an early failure in arch_domain_init(), the function > iommu_destroy_domain() will be called even if the IOMMU is not initialized. > > This will result in dereferencing the ops, which will be NULL, and a host > crash. > > Fix the issue by checking that ops has been set before accessing it. > Note that we are assuming that arch_iommu_domain_init() will clean up an > intermediate failure if it failed. 
> > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Signed-off-by: Julien Grall <jgrall@amazon.com> > --- > xen/drivers/passthrough/iommu.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c > index 2358b6eb09f4..f976d5a0b0a5 100644 > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) > > void iommu_domain_destroy(struct domain *d) > { > - if ( !is_iommu_enabled(d) ) > + struct domain_iommu *hd = dom_iommu(d); > + > + /* > + * In case of failure during the domain construction, it would be > + * possible to reach this function with the IOMMU enabled but not > + * yet initialized. We assume that hd->platforms will be non-NULL as > + * soon as we start to initialize the IOMMU. > + */ > + if ( !is_iommu_enabled(d) || !hd->platform_ops ) > return; TBH, this seems like the wrong way to fix it. The ops dereference is done in iommu_teardown() so that ought to be doing the check, but given it is a single-line function it could be inlined at the same time. That said, I think arch_iommu_domain_destroy() needs to be called unconditionally, as arch_iommu_domain_init() is called before the ops pointer is set. Paul > > iommu_teardown(d); > -- > 2.17.1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2021-01-04 9:28 ` Paul Durrant @ 2021-01-04 14:33 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2021-01-04 14:33 UTC (permalink / raw) To: paul, xen-devel; +Cc: hongyxia, 'Julien Grall', 'Jan Beulich' Hi Paul, On 04/01/2021 09:28, Paul Durrant wrote: >> -----Original Message----- >> From: Julien Grall <julien@xen.org> >> Sent: 22 December 2020 15:44 >> To: xen-devel@lists.xenproject.org >> Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; Jan Beulich <jbeulich@suse.com>; Paul >> Durrant <paul@xen.org> >> Subject: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down >> >> From: Julien Grall <jgrall@amazon.com> >> >> is_iommu_enabled() will return true even if the IOMMU has not been >> initialized (e.g. the ops are not set). >> >> In the case of an early failure in arch_domain_init(), the function >> iommu_destroy_domain() will be called even if the IOMMU is not initialized. >> >> This will result in dereferencing the ops, which will be NULL, and a host >> crash. >> >> Fix the issue by checking that ops has been set before accessing it. >> Note that we are assuming that arch_iommu_domain_init() will clean up an >> intermediate failure if it failed. 
>> >> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >> Signed-off-by: Julien Grall <jgrall@amazon.com> >> --- >> xen/drivers/passthrough/iommu.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c >> index 2358b6eb09f4..f976d5a0b0a5 100644 >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >> >> void iommu_domain_destroy(struct domain *d) >> { >> - if ( !is_iommu_enabled(d) ) >> + struct domain_iommu *hd = dom_iommu(d); >> + >> + /* >> + * In case of failure during the domain construction, it would be >> + * possible to reach this function with the IOMMU enabled but not >> + * yet initialized. We assume that hd->platforms will be non-NULL as >> + * soon as we start to initialize the IOMMU. >> + */ >> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >> return; > > TBH, this seems like the wrong way to fix it. The ops dereference is done in iommu_teardown() so that ought to be doing the check, > but given it is a single-line function it could be inlined at the same time. That said, I think arch_iommu_domain_destroy() needs > to be called unconditionally, as arch_iommu_domain_init() is called before the ops pointer is set. I originally resolved it this way to avoid arch_iommu_domain_init() and iommu_teardown() being called when the failure may happen before iommu_domain_init() is called. IOW, the lists/locks would be uninitialized. After discussing with Jan, it turns out that we could check whether the list was initialized or not. So I have reworked the code to now call arch_iommu_domain_init() unconditionally and move the ops check into iommu_teardown(). Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 13:48 ` Jan Beulich 2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 3 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> The pgtables.lock is protecting access to the page list pgtables.list. However, iommu_free_pgtables() will not hold it. I guess it was assumed that page-tables cannot be allocated while the domain is dying. Unfortunately, there is no guarantee that iommu_map() will not be called while a domain is dying (it looks to be possible from XEN_DOMCTL_memory_mapping). So it would be possible to concurrently allocate memory and free the page-tables. Therefore, we need to hold the lock when freeing the page tables. There are more issues around the IOMMU page-allocator. They will be handled in follow-up patches. 
Signed-off-by: Julien Grall <jgrall@amazon.com> --- xen/drivers/passthrough/x86/iommu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index cea1032b3d02..779dbb5b98ba 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -267,13 +267,18 @@ int iommu_free_pgtables(struct domain *d) struct page_info *pg; unsigned int done = 0; + spin_lock(&hd->arch.pgtables.lock); while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) ) { free_domheap_page(pg); if ( !(++done & 0xff) && general_preempt_check() ) + { + spin_unlock(&hd->arch.pgtables.lock); return -ERESTART; + } } + spin_unlock(&hd->arch.pgtables.lock); return 0; } -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall @ 2020-12-23 13:48 ` Jan Beulich 2020-12-23 14:01 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:48 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The pgtables.lock is protecting access to the page list pgtables.list. > However, iommu_free_pgtables() will not hold it. I guess it was assumed > that page-tables cannot be allocated while the domain is dying. > > Unfortunately, there is no guarantee that iommu_map() will not be > called while a domain is dying (it looks to be possible from > XEN_DOMCTL_memory_mapping). I'd rather disallow any new allocations for a dying domain, not the least because ... > So it would be possible to concurrently > allocate memory and free the page-tables. > > Therefore, we need to hold the lock when freeing the page tables. ... we should try to avoid holding locks across allocation / freeing functions wherever possible. As to where to place a respective check - I wonder if we wouldn't be better off disallowing a majority of domctl-s (and perhaps other operations) on dying domains. Thoughts? Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 13:48 ` Jan Beulich @ 2020-12-23 14:01 ` Julien Grall 2020-12-23 14:16 ` Jan Beulich 2021-01-14 19:19 ` Julien Grall 1 sibling, 2 replies; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:01 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:48, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The pgtables.lock is protecting access to the page list pgtables.list. >> However, iommu_free_pgtables() will not hold it. I guess it was assumed >> that page-tables cannot be allocated while the domain is dying. >> >> Unfortunately, there is no guarantee that iommu_map() will not be >> called while a domain is dying (it looks to be possible from >> XEN_DOMCTL_memory_mapping). > > I'd rather disallow any new allocations for a dying domain, not > the least because ... Patch #4 will disallow such allocation. However... > >> So it would be possible to concurrently >> allocate memory and free the page-tables. >> >> Therefore, we need to hold the lock when freeing the page tables. > > ... we should try to avoid holding locks across allocation / > freeing functions wherever possible. > > As to where to place a respective check - I wonder if we wouldn't > be better off disallowing a majority of domctl-s (and perhaps > other operations) on dying domains. Thoughts? ... this is still pretty racy because you need to guarantee that d->is_dying is seen by the other processors to prevent allocation. As to whether we can forbid most of the domctl-s, I would agree this is a good move. But this doesn't remove the underlying problem here. We are hoping that a top-level function will protect us against the race. Given the IOMMU code is quite deep in the callstack, this is something pretty hard to guarantee with future changes. 
So I still think we need a way to mitigate the issue. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 14:01 ` Julien Grall @ 2020-12-23 14:16 ` Jan Beulich 0 siblings, 0 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:16 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:01, Julien Grall wrote: > Hi Jan, > > On 23/12/2020 13:48, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The pgtables.lock is protecting access to the page list pgtables.list. >>> However, iommu_free_pgtables() will not hold it. I guess it was assumed >>> that page-tables cannot be allocated while the domain is dying. >>> >>> Unfortunately, there is no guarantee that iommu_map() will not be >>> called while a domain is dying (it looks to be possible from >>> XEN_DOMCTL_memory_mapping). >> >> I'd rather disallow any new allocations for a dying domain, not >> the least because ... > > Patch #4 will disallow such allocation. However... > >> >>> So it would be possible to concurrently >>> allocate memory and free the page-tables. >>> >>> Therefore, we need to hold the lock when freeing the page tables. >> >> ... we should try to avoid holding locks across allocation / >> freeing functions wherever possible. > >> As to where to place a respective check - I wonder if we wouldn't >> be better off disallowing a majority of domctl-s (and perhaps >> other operations) on dying domains. Thoughts? > > ... this is still pretty racy because you need to guarantee that > d->is_dying is seen by the other processors to prevent allocation. The function freeing the page tables will need a spin_barrier() or the like, similar to evtchn_destroy(). Aiui this will eliminate all potential for races. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 14:01 ` Julien Grall 2020-12-23 14:16 ` Jan Beulich @ 2021-01-14 19:19 ` Julien Grall 2021-01-15 11:06 ` Jan Beulich 1 sibling, 1 reply; 40+ messages in thread From: Julien Grall @ 2021-01-14 19:19 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23/12/2020 14:01, Julien Grall wrote: > Hi Jan, > > On 23/12/2020 13:48, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The pgtables.lock is protecting access to the page list pgtables.list. >>> However, iommu_free_pgtables() will not hold it. I guess it was assumed >>> that page-tables cannot be allocated while the domain is dying. >>> >>> Unfortunately, there is no guarantee that iommu_map() will not be >>> called while a domain is dying (it looks to be possible from >>> XEN_DOMCTL_memory_mapping). >> >> I'd rather disallow any new allocations for a dying domain, not >> the least because ... > > Patch #4 will disallow such allocation. However...
It turns out that I can't disallow it because there is at least one call of iommu_map() while an HVM domain is being destroyed: (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 (XEN) [<ffff82d040302a42>] F arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe (XEN) [<ffff82d0402ba0a2>] F arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d (XEN) [<ffff82d0402ba0f9>] F arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b (XEN) [<ffff82d0402afc15>] F hvm_domain_relinquish_resources+0x3e/0x92 (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 This one resulted in a domain crash rather than a clean destruction. I would still like to disallow page-table allocation, so I am thinking of ignoring any request in iommu_map() if the domain is dying. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2021-01-14 19:19 ` Julien Grall @ 2021-01-15 11:06 ` Jan Beulich 2021-01-15 15:18 ` Paul Durrant 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2021-01-15 11:06 UTC (permalink / raw) To: Julien Grall, Paul Durrant Cc: hongyxia, Julien Grall, xen-devel, Andrew Cooper On 14.01.2021 20:19, Julien Grall wrote: > > > On 23/12/2020 14:01, Julien Grall wrote: >> Hi Jan, >> >> On 23/12/2020 13:48, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> The pgtables.lock is protecting access to the page list pgtables.list. >>>> However, iommu_free_pgtables() will not held it. I guess it was assumed >>>> that page-tables cannot be allocated while the domain is dying. >>>> >>>> Unfortunately, there is no guarantee that iommu_map() will not be >>>> called while a domain is dying (it looks like to be possible from >>>> XEN_DOMCTL_memory_mapping). >>> >>> I'd rather disallow any new allocations for a dying domain, not >>> the least because ... >> >> Patch #4 will disallow such allocation. However... 
> > It turns out that I can't disallow it because there is at least one call > of iommu_map() while an HVM domain is destroyed: > > (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 > (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 > (XEN) [<ffff82d040302a42>] F > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 > (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 > (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe > (XEN) [<ffff82d0402ba0a2>] F > arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d > (XEN) [<ffff82d0402ba0f9>] F > arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f > (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b > (XEN) [<ffff82d0402afc15>] F > hvm_domain_relinquish_resources+0x3e/0x92 > (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 > (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 > (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 > (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f > (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 > > This one resulted to a domain crash rather than a clean destruction. A domain crash despite the domain already getting cleaned up is something we may at least want to consider doing better: There already is a !d->is_shutting_down conditional printk() there. What would people think about avoiding the domain_crash() call in similar ways? (It could even be considered to make domain_crash() itself behave like this, but such a step may make it necessary to first audit all use sites.) > I would still like to disallow page-table allocation, so I am think to > ignore any request in iommu_map() if the domain is dying. Ignoring requests there seems fragile to me. Paul - what are your thoughts about bailing early from hvm_add_ioreq_gfn() when the domain is dying? Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2021-01-15 11:06 ` Jan Beulich @ 2021-01-15 15:18 ` Paul Durrant 0 siblings, 0 replies; 40+ messages in thread From: Paul Durrant @ 2021-01-15 15:18 UTC (permalink / raw) To: 'Jan Beulich', 'Julien Grall' Cc: hongyxia, 'Julien Grall', xen-devel, 'Andrew Cooper' > -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 15 January 2021 11:07 > To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org> > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org; Andrew > Cooper <andrew.cooper3@citrix.com> > Subject: Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock > held > > On 14.01.2021 20:19, Julien Grall wrote: > > > > > > On 23/12/2020 14:01, Julien Grall wrote: > >> Hi Jan, > >> > >> On 23/12/2020 13:48, Jan Beulich wrote: > >>> On 22.12.2020 16:43, Julien Grall wrote: > >>>> From: Julien Grall <jgrall@amazon.com> > >>>> > >>>> The pgtables.lock is protecting access to the page list pgtables.list. > >>>> However, iommu_free_pgtables() will not held it. I guess it was assumed > >>>> that page-tables cannot be allocated while the domain is dying. > >>>> > >>>> Unfortunately, there is no guarantee that iommu_map() will not be > >>>> called while a domain is dying (it looks like to be possible from > >>>> XEN_DOMCTL_memory_mapping). > >>> > >>> I'd rather disallow any new allocations for a dying domain, not > >>> the least because ... > >> > >> Patch #4 will disallow such allocation. However... 
> > > > It turns out that I can't disallow it because there is at least one call > > of iommu_map() while an HVM domain is destroyed: > > > > (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 > > (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 > > (XEN) [<ffff82d040302a42>] F > > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > > (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 > > (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 > > (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe > > (XEN) [<ffff82d0402ba0a2>] F > > arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d > > (XEN) [<ffff82d0402ba0f9>] F > > arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f > > (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b > > (XEN) [<ffff82d0402afc15>] F > > hvm_domain_relinquish_resources+0x3e/0x92 > > (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 > > (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 > > (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 > > (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f > > (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 > > > > This one resulted to a domain crash rather than a clean destruction. > > A domain crash despite the domain already getting cleaned up is > something we may at least want to consider doing better: There > already is a !d->is_shutting_down conditional printk() there. > What would people think about avoiding the domain_crash() call > in similar ways? (It could even be considered to make > domain_crash() itself behave like this, but such a step may make > it necessary to first audit all use sites.) > > > I would still like to disallow page-table allocation, so I am think to > > ignore any request in iommu_map() if the domain is dying. > > Ignoring requests there seems fragile to me. Paul - what are your > thoughts about bailing early from hvm_add_ioreq_gfn() when the > domain is dying? > I think that's ok. 
Really, the only reason for putting the pages back in the P2M is to allow migration to work, so if the domain is dying then we don't really care, do we? Paul > Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 14:12 ` Jan Beulich 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 3 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> The new per-domain IOMMU page-table allocator will now free the page-tables when domain's resources are relinquished. However, the root page-table (i.e. hd->arch.pg_maddr) will not be cleared. 
Xen may access the IOMMU page-tables afterwards at least in the case of a PV domain: (XEN) Xen call trace: (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 This will result in a use-after-free and possibly a host crash or memory corruption. 
Freeing the page-tables further down in domain_relinquish_resources() would not work because pages may not be released until later if another domain holds a reference on them. Once all the PCI devices have been de-assigned, it is actually pointless to access or modify the IOMMU page-tables. So we can simply clear the root page-table address. Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") Signed-off-by: Julien Grall <jgrall@amazon.com> --- This is an RFC because it would break the AMD IOMMU driver. One option would be to move the call to the teardown callback earlier on. Any opinions? --- xen/drivers/passthrough/x86/iommu.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index 779dbb5b98ba..99a23177b3d2 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -267,6 +267,16 @@ int iommu_free_pgtables(struct domain *d) struct page_info *pg; unsigned int done = 0; + /* + * Pages will be moved to the free list in a bit. So we want to + * clear the root page-table to avoid any potential use after-free. + * + * XXX: This code only works for Intel VT-d. + */ + spin_lock(&hd->arch.mapping_lock); + hd->arch.vtd.pgd_maddr = 0; + spin_unlock(&hd->arch.mapping_lock); + spin_lock(&hd->arch.pgtables.lock); while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) ) { -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall @ 2020-12-23 14:12 ` Jan Beulich 2020-12-23 14:56 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:12 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The new per-domain IOMMU page-table allocator will now free the > page-tables when domain's resources are relinquished. However, the root > page-table (i.e. hd->arch.pg_maddr) will not be cleared. > > Xen may access the IOMMU page-tables afterwards at least in the case of > PV domain: > > (XEN) Xen call trace: > (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 > (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 > (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 > (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 > (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 > (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 > (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b > (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf > (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf > (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) 
[<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d > (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 > (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 > (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a > (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 > (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 > (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f > (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 > > This will result to a use after-free and possibly an host crash or > memory corruption. > > Freeing the page-tables further down in domain_relinquish_resources() > would not work because pages may not be released until later if another > domain hold a reference on them. > > Once all the PCI devices have been de-assigned, it is actually pointless > to access modify the IOMMU page-tables. So we can simply clear the root > page-table address. > > Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") > Signed-off-by: Julien Grall <jgrall@amazon.com> > > --- > > This is an RFC because it would break AMD IOMMU driver. One option would > be to move the call to the teardown callback earlier on. Any opinions? We already have static void amd_iommu_domain_destroy(struct domain *d) { dom_iommu(d)->arch.amd.root_table = NULL; } and this function is AMD's teardown handler. Hence I suppose doing the same for VT-d would be quite reasonable. And really VT-d's iommu_domain_teardown() also already has hd->arch.vtd.pgd_maddr = 0; I guess what's missing is prevention of the root table getting re-setup. This, I guess, would be helped by the previously suggested preventing of any further modifications to the p2m and alike for dying domains. Note how e.g. the handling of XEN_DOMCTL_assign_device already includes a "dying" check. 
Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 14:12 ` Jan Beulich @ 2020-12-23 14:56 ` Julien Grall 2020-12-23 15:00 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:56 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 14:12, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The new per-domain IOMMU page-table allocator will now free the >> page-tables when domain's resources are relinquished. However, the root >> page-table (i.e. hd->arch.pg_maddr) will not be cleared. >> >> Xen may access the IOMMU page-tables afterwards at least in the case of >> PV domain: >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 >> (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 >> (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 >> (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 >> (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 >> (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 >> (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b >> (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf >> (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf >> (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d6c7>] F 
mm.c#put_page_from_l4e+0x69/0x6d >> (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 >> (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 >> (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a >> (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 >> (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 >> (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f >> (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 >> >> This will result to a use after-free and possibly an host crash or >> memory corruption. >> >> Freeing the page-tables further down in domain_relinquish_resources() >> would not work because pages may not be released until later if another >> domain hold a reference on them. >> >> Once all the PCI devices have been de-assigned, it is actually pointless >> to access modify the IOMMU page-tables. So we can simply clear the root >> page-table address. >> >> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") >> Signed-off-by: Julien Grall <jgrall@amazon.com> >> >> --- >> >> This is an RFC because it would break AMD IOMMU driver. One option would >> be to move the call to the teardown callback earlier on. Any opinions? > > We already have > > static void amd_iommu_domain_destroy(struct domain *d) > { > dom_iommu(d)->arch.amd.root_table = NULL; > } > > and this function is AMD's teardown handler. Hence I suppose > doing the same for VT-d would be quite reasonable. And really > VT-d's iommu_domain_teardown() also already has > > hd->arch.vtd.pgd_maddr = 0; Let me have a look if that works. > > I guess what's missing is prevention of the root table > getting re-setup. This is taken care in the follow-up patch by forbidding page-table allocation. I can mention it in the commit message. 
> This, I guess, would be helped by the > previously suggested preventing of any further modifications > to the p2m and alike for dying domains. Note how e.g. the > handling of XEN_DOMCTL_assign_device already includes a > "dying" check. > > Jan > -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 14:56 ` Julien Grall @ 2020-12-23 15:00 ` Jan Beulich 2020-12-23 15:16 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 15:00 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:56, Julien Grall wrote: > On 23/12/2020 14:12, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> This is an RFC because it would break AMD IOMMU driver. One option would >>> be to move the call to the teardown callback earlier on. Any opinions? >> >> We already have >> >> static void amd_iommu_domain_destroy(struct domain *d) >> { >> dom_iommu(d)->arch.amd.root_table = NULL; >> } >> >> and this function is AMD's teardown handler. Hence I suppose >> doing the same for VT-d would be quite reasonable. And really >> VT-d's iommu_domain_teardown() also already has >> >> hd->arch.vtd.pgd_maddr = 0; > > Let me have a look if that works. > >> >> I guess what's missing is prevention of the root table >> getting re-setup. > > This is taken care in the follow-up patch by forbidding page-table > allocation. I can mention it in the commit message. My expectation is that with that subsequent change the change here (or any variant of it) would become unnecessary. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 15:00 ` Jan Beulich @ 2020-12-23 15:16 ` Julien Grall 2020-12-23 16:11 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 15:16 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 15:00, Jan Beulich wrote: > On 23.12.2020 15:56, Julien Grall wrote: >> On 23/12/2020 14:12, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> We already have >>> >>> static void amd_iommu_domain_destroy(struct domain *d) >>> { >>> dom_iommu(d)->arch.amd.root_table = NULL; >>> } >>> >>> and this function is AMD's teardown handler. Hence I suppose >>> doing the same for VT-d would be quite reasonable. And really >>> VT-d's iommu_domain_teardown() also already has >>> >>> hd->arch.vtd.pgd_maddr = 0; >> >> Let me have a look if that works. >> >>> >>> I guess what's missing is prevention of the root table >>> getting re-setup. >> >> This is taken care in the follow-up patch by forbidding page-table >> allocation. I can mention it in the commit message. > > My expectation is that with that subsequent change the change here > (or any variant of it) would become unnecessary. I am not so sure. iommu_unmap() would still get called from put_page(). Are you suggesting to gate the code if d->is_dying as well? Even if this patch is deemed unnecessary to fix the issue, the issue was quite hard to chase/reproduce. I think it would still be good to harden the code by zeroing hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the pointer was still "valid". Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 15:16 ` Julien Grall @ 2020-12-23 16:11 ` Jan Beulich 2020-12-23 16:16 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:11 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 16:16, Julien Grall wrote: > On 23/12/2020 15:00, Jan Beulich wrote: >> On 23.12.2020 15:56, Julien Grall wrote: >>> On 23/12/2020 14:12, Jan Beulich wrote: >>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> We already have >>>> >>>> static void amd_iommu_domain_destroy(struct domain *d) >>>> { >>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>> } >>>> >>>> and this function is AMD's teardown handler. Hence I suppose >>>> doing the same for VT-d would be quite reasonable. And really >>>> VT-d's iommu_domain_teardown() also already has >>>> >>>> hd->arch.vtd.pgd_maddr = 0; >>> >>> Let me have a look if that works. >>> >>>> >>>> I guess what's missing is prevention of the root table >>>> getting re-setup. >>> >>> This is taken care in the follow-up patch by forbidding page-table >>> allocation. I can mention it in the commit message. >> >> My expectation is that with that subsequent change the change here >> (or any variant of it) would become unnecessary. > > I am not be sure. iommu_unmap() would still get called from put_page(). > Are you suggesting to gate the code if d->is_dying as well? Unmap shouldn't be allocating any memory right now, as in non-shared-page-table mode we don't install any superpages (unless I misremember). > Even if this patch is deemed to be unecessary to fix the issue. > This issue was quite hard to chase/reproduce. 
> > I think it would still be good to harden the code by zeroing > hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > pointer was still "valid". But my point was that this zeroing already happens. What I suspect is that it gets re-populated after it was zeroed, because of page table manipulation that shouldn't be occurring anymore for a dying domain. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:11 ` Jan Beulich @ 2020-12-23 16:16 ` Julien Grall 2020-12-23 16:24 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:16 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi, On 23/12/2020 16:11, Jan Beulich wrote: > On 23.12.2020 16:16, Julien Grall wrote: >> On 23/12/2020 15:00, Jan Beulich wrote: >>> On 23.12.2020 15:56, Julien Grall wrote: >>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> We already have >>>>> >>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>> { >>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>> } >>>>> >>>>> and this function is AMD's teardown handler. Hence I suppose >>>>> doing the same for VT-d would be quite reasonable. And really >>>>> VT-d's iommu_domain_teardown() also already has >>>>> >>>>> hd->arch.vtd.pgd_maddr = 0; >>>> >>>> Let me have a look if that works. >>>> >>>>> >>>>> I guess what's missing is prevention of the root table >>>>> getting re-setup. >>>> >>>> This is taken care in the follow-up patch by forbidding page-table >>>> allocation. I can mention it in the commit message. >>> >>> My expectation is that with that subsequent change the change here >>> (or any variant of it) would become unnecessary. >> >> I am not be sure. iommu_unmap() would still get called from put_page(). >> Are you suggesting to gate the code if d->is_dying as well? > > Unmap shouldn't be allocating any memory right now, as in > non-shared-page-table mode we don't install any superpages > (unless I misremember). It doesn't allocate memory, but it will try to access the IOMMU page-tables (see more below). 
> >> Even if this patch is deemed to be unecessary to fix the issue. >> This issue was quite hard to chase/reproduce. >> >> I think it would still be good to harden the code by zeroing >> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >> pointer was still "valid". > > But my point was that this zeroing already happens. > What I > suspect is that it gets re-populated after it was zeroed, > because of page table manipulation that shouldn't be > occurring anymore for a dying domain. AFAICT, the zeroing is happening in the ->teardown() helper. It is only called when the domain is fully destroyed (see the call in arch_domain_destroy()). This will happen much later than the resource-relinquish code runs. Could you clarify why you think it is already zeroed and by whom? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:16 ` Julien Grall @ 2020-12-23 16:24 ` Jan Beulich 2020-12-23 16:29 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:24 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 17:16, Julien Grall wrote: > On 23/12/2020 16:11, Jan Beulich wrote: >> On 23.12.2020 16:16, Julien Grall wrote: >>> On 23/12/2020 15:00, Jan Beulich wrote: >>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>> be to move the call to the teardown callback earlier on. Any opinions? Please note this (in your original submission). I simply ... >>>>>> We already have >>>>>> >>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>> { >>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>> } >>>>>> >>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>> VT-d's iommu_domain_teardown() also already has >>>>>> >>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>> >>>>> Let me have a look if that works. >>>>> >>>>>> >>>>>> I guess what's missing is prevention of the root table >>>>>> getting re-setup. >>>>> >>>>> This is taken care in the follow-up patch by forbidding page-table >>>>> allocation. I can mention it in the commit message. >>>> >>>> My expectation is that with that subsequent change the change here >>>> (or any variant of it) would become unnecessary. >>> >>> I am not be sure. iommu_unmap() would still get called from put_page(). >>> Are you suggesting to gate the code if d->is_dying as well? 
>> >> Unmap shouldn't be allocating any memory right now, as in >> non-shared-page-table mode we don't install any superpages >> (unless I misremember). > > It doesn't allocate memory, but it will try to access the IOMMU > page-tables (see more below). > >> >>> Even if this patch is deemed to be unecessary to fix the issue. >>> This issue was quite hard to chase/reproduce. >>> >>> I think it would still be good to harden the code by zeroing >>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>> pointer was still "valid". >> >> But my point was that this zeroing already happens. >> What I >> suspect is that it gets re-populated after it was zeroed, >> because of page table manipulation that shouldn't be >> occurring anymore for a dying domain. > > AFAICT, the zeroing is happening in ->teardown() helper. > > It is only called when the domain is fully destroyed (see call in > arch_domain_destroy()). This will happen much after relinquishing the code. > > Could you clarify why you think it is already zeroed and by who? ... trusted you on what you stated there. But perhaps I somehow misunderstood that sentence to mean you want to put your addition into the teardown functions, when apparently you meant to invoke them earlier in the process. Without clearly identifying why this would be a safe thing to do, I couldn't imagine that's what you suggest as alternative. This is because the interdependencies of the IOMMU code are pretty hard to follow at times, and hence any such re-ordering has a fair risk of breaking something elsewhere. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:24 ` Jan Beulich @ 2020-12-23 16:29 ` Julien Grall 2020-12-23 16:46 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:29 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23/12/2020 16:24, Jan Beulich wrote: > On 23.12.2020 17:16, Julien Grall wrote: >> On 23/12/2020 16:11, Jan Beulich wrote: >>> On 23.12.2020 16:16, Julien Grall wrote: >>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > > Please note this (in your original submission). I simply ... > >>>>>>> We already have >>>>>>> >>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>> { >>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>> } >>>>>>> >>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>> >>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>> >>>>>> Let me have a look if that works. >>>>>> >>>>>>> >>>>>>> I guess what's missing is prevention of the root table >>>>>>> getting re-setup. >>>>>> >>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>> allocation. I can mention it in the commit message. >>>>> >>>>> My expectation is that with that subsequent change the change here >>>>> (or any variant of it) would become unnecessary. >>>> >>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>> Are you suggesting to gate the code if d->is_dying as well? 
>>> >>> Unmap shouldn't be allocating any memory right now, as in >>> non-shared-page-table mode we don't install any superpages >>> (unless I misremember). >> >> It doesn't allocate memory, but it will try to access the IOMMU >> page-tables (see more below). >> >>> >>>> Even if this patch is deemed to be unecessary to fix the issue. >>>> This issue was quite hard to chase/reproduce. >>>> >>>> I think it would still be good to harden the code by zeroing >>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>> pointer was still "valid". >>> >>> But my point was that this zeroing already happens. >>> What I >>> suspect is that it gets re-populated after it was zeroed, >>> because of page table manipulation that shouldn't be >>> occurring anymore for a dying domain. >> >> AFAICT, the zeroing is happening in ->teardown() helper. >> >> It is only called when the domain is fully destroyed (see call in >> arch_domain_destroy()). This will happen much after relinquishing the code. >> >> Could you clarify why you think it is already zeroed and by who? > > ... trusted you on what you stated there. But perhaps I somehow > misunderstood that sentence to mean you want to put your addition > into the teardown functions, when apparently you meant to invoke > them earlier in the process. Without clearly identifying why this > would be a safe thing to do, I couldn't imagine that's what you > suggest as alternative. This was a wording issue. I meant moving ->teardown() before (or calling from) iommu_free_pgtables(). Shall I introduce a new callback then? > This is because the interdependencies of > the IOMMU code are pretty hard to follow at times, and hence any > such re-ordering has a fair risk of breaking something elsewhere. Right, this is another reason to try to get most of my fix self-contained rather relying on top-layer call to protect against a domain dying. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:29 ` Julien Grall @ 2020-12-23 16:46 ` Jan Beulich 2020-12-23 16:54 ` Julien Grall 2021-01-04 9:53 ` Paul Durrant 0 siblings, 2 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:46 UTC (permalink / raw) To: Julien Grall, Paul Durrant; +Cc: hongyxia, Julien Grall, xen-devel On 23.12.2020 17:29, Julien Grall wrote: > On 23/12/2020 16:24, Jan Beulich wrote: >> On 23.12.2020 17:16, Julien Grall wrote: >>> On 23/12/2020 16:11, Jan Beulich wrote: >>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >> >> Please note this (in your original submission). I simply ... >> >>>>>>>> We already have >>>>>>>> >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>> { >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>> } >>>>>>>> >>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>> >>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>> >>>>>>> Let me have a look if that works. >>>>>>> >>>>>>>> >>>>>>>> I guess what's missing is prevention of the root table >>>>>>>> getting re-setup. >>>>>>> >>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>> allocation. I can mention it in the commit message. >>>>>> >>>>>> My expectation is that with that subsequent change the change here >>>>>> (or any variant of it) would become unnecessary. >>>>> >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). 
>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>> >>>> Unmap shouldn't be allocating any memory right now, as in >>>> non-shared-page-table mode we don't install any superpages >>>> (unless I misremember). >>> >>> It doesn't allocate memory, but it will try to access the IOMMU >>> page-tables (see more below). >>> >>>> >>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>> This issue was quite hard to chase/reproduce. >>>>> >>>>> I think it would still be good to harden the code by zeroing >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>> pointer was still "valid". >>>> >>>> But my point was that this zeroing already happens. >>>> What I >>>> suspect is that it gets re-populated after it was zeroed, >>>> because of page table manipulation that shouldn't be >>>> occurring anymore for a dying domain. >>> >>> AFAICT, the zeroing is happening in ->teardown() helper. >>> >>> It is only called when the domain is fully destroyed (see call in >>> arch_domain_destroy()). This will happen much after relinquishing the code. >>> >>> Could you clarify why you think it is already zeroed and by who? >> >> ... trusted you on what you stated there. But perhaps I somehow >> misunderstood that sentence to mean you want to put your addition >> into the teardown functions, when apparently you meant to invoke >> them earlier in the process. Without clearly identifying why this >> would be a safe thing to do, I couldn't imagine that's what you >> suggest as alternative. > > This was a wording issue. I meant moving ->teardown() before (or calling > from) iommu_free_pgtables(). > > Shall I introduce a new callback then? Earlier zeroing won't help unless you prevent re-population, or unless you make the code capable of telling "still zero" from "already zero". But I have to admit I'd like to also have Paul's opinion on the matter. 
Jan >> This is because the interdependencies of >> the IOMMU code are pretty hard to follow at times, and hence any >> such re-ordering has a fair risk of breaking something elsewhere. > > Right, this is another reason to try to get most of my fix > self-contained rather relying on top-layer call to protect against a > domain dying. > > Cheers, > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:46 ` Jan Beulich @ 2020-12-23 16:54 ` Julien Grall 2020-12-23 17:02 ` Jan Beulich 2021-01-04 9:53 ` Paul Durrant 1 sibling, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:54 UTC (permalink / raw) To: Jan Beulich, Paul Durrant; +Cc: hongyxia, Julien Grall, xen-devel On 23/12/2020 16:46, Jan Beulich wrote: > On 23.12.2020 17:29, Julien Grall wrote: >> On 23/12/2020 16:24, Jan Beulich wrote: >>> On 23.12.2020 17:16, Julien Grall wrote: >>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> Please note this (in your original submission). I simply ... >>> >>>>>>>>> We already have >>>>>>>>> >>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>> { >>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>> } >>>>>>>>> >>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>> >>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>> >>>>>>>> Let me have a look if that works. >>>>>>>> >>>>>>>>> >>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>> getting re-setup. >>>>>>>> >>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>> allocation. I can mention it in the commit message. >>>>>>> >>>>>>> My expectation is that with that subsequent change the change here >>>>>>> (or any variant of it) would become unnecessary. 
>>>>>> >>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>> >>>>> Unmap shouldn't be allocating any memory right now, as in >>>>> non-shared-page-table mode we don't install any superpages >>>>> (unless I misremember). >>>> >>>> It doesn't allocate memory, but it will try to access the IOMMU >>>> page-tables (see more below). >>>> >>>>> >>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>> This issue was quite hard to chase/reproduce. >>>>>> >>>>>> I think it would still be good to harden the code by zeroing >>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>> pointer was still "valid". >>>>> >>>>> But my point was that this zeroing already happens. >>>>> What I >>>>> suspect is that it gets re-populated after it was zeroed, >>>>> because of page table manipulation that shouldn't be >>>>> occurring anymore for a dying domain. >>>> >>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>> >>>> It is only called when the domain is fully destroyed (see call in >>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>> >>>> Could you clarify why you think it is already zeroed and by who? >>> >>> ... trusted you on what you stated there. But perhaps I somehow >>> misunderstood that sentence to mean you want to put your addition >>> into the teardown functions, when apparently you meant to invoke >>> them earlier in the process. Without clearly identifying why this >>> would be a safe thing to do, I couldn't imagine that's what you >>> suggest as alternative. >> >> This was a wording issue. I meant moving ->teardown() before (or calling >> from) iommu_free_pgtables(). >> >> Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". 
> But I have to admit I'd like to also have Paul's > opinion on the matter. Patch #4 is meant to prevent that with the d->is_dying check in the IOMMU page-table allocation. Do you think this is not enough? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:54 ` Julien Grall @ 2020-12-23 17:02 ` Jan Beulich 2020-12-23 17:26 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 17:02 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, xen-devel, Paul Durrant On 23.12.2020 17:54, Julien Grall wrote: > > > On 23/12/2020 16:46, Jan Beulich wrote: >> On 23.12.2020 17:29, Julien Grall wrote: >>> On 23/12/2020 16:24, Jan Beulich wrote: >>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> Please note this (in your original submission). I simply ... >>>> >>>>>>>>>> We already have >>>>>>>>>> >>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>> { >>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>> >>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>> >>>>>>>>> Let me have a look if that works. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>> getting re-setup. >>>>>>>>> >>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>> allocation. I can mention it in the commit message. 
>>>>>>>> >>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>> (or any variant of it) would become unnecessary. >>>>>>> >>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>> >>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>> non-shared-page-table mode we don't install any superpages >>>>>> (unless I misremember). >>>>> >>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>> page-tables (see more below). >>>>> >>>>>> >>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>> This issue was quite hard to chase/reproduce. >>>>>>> >>>>>>> I think it would still be good to harden the code by zeroing >>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>> pointer was still "valid". >>>>>> >>>>>> But my point was that this zeroing already happens. >>>>>> What I >>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>> because of page table manipulation that shouldn't be >>>>>> occurring anymore for a dying domain. >>>>> >>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>> >>>>> It is only called when the domain is fully destroyed (see call in >>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>> >>>>> Could you clarify why you think it is already zeroed and by who? >>>> >>>> ... trusted you on what you stated there. But perhaps I somehow >>>> misunderstood that sentence to mean you want to put your addition >>>> into the teardown functions, when apparently you meant to invoke >>>> them earlier in the process. Without clearly identifying why this >>>> would be a safe thing to do, I couldn't imagine that's what you >>>> suggest as alternative. >>> >>> This was a wording issue. I meant moving ->teardown() before (or calling >>> from) iommu_free_pgtables(). 
>>> >>> Shall I introduce a new callback then? >> >> Earlier zeroing won't help unless you prevent re-population, or >> unless you make the code capable of telling "still zero" from >> "already zero". But I have to admit I'd like to also have Paul's >> opinion on the matter. > > Patch #4 is meant to prevent that with the d->is_dying check in the > IOMMU page-table allocation. > > Do you think this is not enough? It probably is; I think that other patch would want to come first then, or both be folded. Nevertheless I'm not fully convinced putting the check there is the best course of action. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 17:02 ` Jan Beulich @ 2020-12-23 17:26 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2020-12-23 17:26 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, xen-devel, Paul Durrant Hi Jan, On 23/12/2020 17:02, Jan Beulich wrote: > On 23.12.2020 17:54, Julien Grall wrote: >> >> >> On 23/12/2020 16:46, Jan Beulich wrote: >>> On 23.12.2020 17:29, Julien Grall wrote: >>>> On 23/12/2020 16:24, Jan Beulich wrote: >>>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> Please note this (in your original submission). I simply ... >>>>> >>>>>>>>>>> We already have >>>>>>>>>>> >>>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>>> { >>>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>>> >>>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>>> >>>>>>>>>> Let me have a look if that works. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>>> getting re-setup. >>>>>>>>>> >>>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>>> allocation. I can mention it in the commit message. 
>>>>>>>>> >>>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>>> (or any variant of it) would become unnecessary. >>>>>>>> >>>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>>> >>>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>>> non-shared-page-table mode we don't install any superpages >>>>>>> (unless I misremember). >>>>>> >>>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>>> page-tables (see more below). >>>>>> >>>>>>> >>>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>>> This issue was quite hard to chase/reproduce. >>>>>>>> >>>>>>>> I think it would still be good to harden the code by zeroing >>>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>>> pointer was still "valid". >>>>>>> >>>>>>> But my point was that this zeroing already happens. >>>>>>> What I >>>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>>> because of page table manipulation that shouldn't be >>>>>>> occurring anymore for a dying domain. >>>>>> >>>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>>> >>>>>> It is only called when the domain is fully destroyed (see call in >>>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>>> >>>>>> Could you clarify why you think it is already zeroed and by who? >>>>> >>>>> ... trusted you on what you stated there. But perhaps I somehow >>>>> misunderstood that sentence to mean you want to put your addition >>>>> into the teardown functions, when apparently you meant to invoke >>>>> them earlier in the process. Without clearly identifying why this >>>>> would be a safe thing to do, I couldn't imagine that's what you >>>>> suggest as alternative. >>>> >>>> This was a wording issue. 
>>>> I meant moving ->teardown() before (or calling >>>> from) iommu_free_pgtables(). >>>> >>>> Shall I introduce a new callback then? >>> >>> Earlier zeroing won't help unless you prevent re-population, or >>> unless you make the code capable of telling "still zero" from >>> "already zero". But I have to admit I'd like to also have Paul's >>> opinion on the matter. >> >> Patch #4 is meant to prevent that with the d->is_dying check in the >> IOMMU page-table allocation. >> >> Do you think this is not enough? > > It probably is; I think that other patch would want to come first > then, or both be folded. I would like to keep them separated. But I am happy to re-order them. > Nevertheless I'm not fully convinced > putting the check there is the best course of action. As you pointed out in a previous e-mail, the IOMMU code is pretty hard to follow at times. The check in the allocator is quite simple, so I think it would be best to keep it. It doesn't mean this should be the only change, but it will avoid a whole lot of potential issues if we missed any path that may touch the IOMMU page-tables while the domain is dying. The other checks can then just short-circuit to prevent extra work that will result in a failure. I will wait for Paul's input before reworking the series. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:46 ` Jan Beulich 2020-12-23 16:54 ` Julien Grall @ 2021-01-04 9:53 ` Paul Durrant 1 sibling, 0 replies; 40+ messages in thread From: Paul Durrant @ 2021-01-04 9:53 UTC (permalink / raw) To: 'Jan Beulich', 'Julien Grall' Cc: hongyxia, 'Julien Grall', xen-devel > -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 23 December 2020 16:46 > To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org> > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the > page-tables > > On 23.12.2020 17:29, Julien Grall wrote: > > On 23/12/2020 16:24, Jan Beulich wrote: > >> On 23.12.2020 17:16, Julien Grall wrote: > >>> On 23/12/2020 16:11, Jan Beulich wrote: > >>>> On 23.12.2020 16:16, Julien Grall wrote: > >>>>> On 23/12/2020 15:00, Jan Beulich wrote: > >>>>>> On 23.12.2020 15:56, Julien Grall wrote: > >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: > >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: > >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would > >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > >> > >> Please note this (in your original submission). I simply ... > >> > >>>>>>>> We already have > >>>>>>>> > >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) > >>>>>>>> { > >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; > >>>>>>>> } > >>>>>>>> > >>>>>>>> and this function is AMD's teardown handler. Hence I suppose > >>>>>>>> doing the same for VT-d would be quite reasonable. And really > >>>>>>>> VT-d's iommu_domain_teardown() also already has > >>>>>>>> > >>>>>>>> hd->arch.vtd.pgd_maddr = 0; > >>>>>>> > >>>>>>> Let me have a look if that works. 
> >>>>>>> > >>>>>>>> > >>>>>>>> I guess what's missing is prevention of the root table > >>>>>>>> getting re-setup. > >>>>>>> > >>>>>>> This is taken care in the follow-up patch by forbidding page-table > >>>>>>> allocation. I can mention it in the commit message. > >>>>>> > >>>>>> My expectation is that with that subsequent change the change here > >>>>>> (or any variant of it) would become unnecessary. > >>>>> > >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). > >>>>> Are you suggesting to gate the code if d->is_dying as well? > >>>> > >>>> Unmap shouldn't be allocating any memory right now, as in > >>>> non-shared-page-table mode we don't install any superpages > >>>> (unless I misremember). > >>> > >>> It doesn't allocate memory, but it will try to access the IOMMU > >>> page-tables (see more below). > >>> > >>>> > >>>>> Even if this patch is deemed to be unecessary to fix the issue. > >>>>> This issue was quite hard to chase/reproduce. > >>>>> > >>>>> I think it would still be good to harden the code by zeroing > >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > >>>>> pointer was still "valid". > >>>> > >>>> But my point was that this zeroing already happens. > >>>> What I > >>>> suspect is that it gets re-populated after it was zeroed, > >>>> because of page table manipulation that shouldn't be > >>>> occurring anymore for a dying domain. > >>> > >>> AFAICT, the zeroing is happening in ->teardown() helper. > >>> > >>> It is only called when the domain is fully destroyed (see call in > >>> arch_domain_destroy()). This will happen much after relinquishing the code. > >>> > >>> Could you clarify why you think it is already zeroed and by who? > >> > >> ... trusted you on what you stated there. But perhaps I somehow > >> misunderstood that sentence to mean you want to put your addition > >> into the teardown functions, when apparently you meant to invoke > >> them earlier in the process. 
Without clearly identifying why this > >> would be a safe thing to do, I couldn't imagine that's what you > >> suggest as alternative. > > > > This was a wording issue. I meant moving ->teardown() before (or calling > > from) iommu_free_pgtables(). > > > > Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". But I have to admit I'd like to also have Paul's > opinion on the matter. > I have a pending series to dynamically unshare IOMMU mappings (to allow logdirty to be enabled on domains with currently-shared EPT) which gets rid of the lazy allocation of the root table and uses pgd_maddr == NULL as a test of whether EPT is shared or not. Therefore it would seem best, to fix this immediate problem, to also stop lazy allocation of pgd_maddr, and re-introduce a per-implementation free_pgtables() callback which zeroes pgd_maddr and then calls the common function. Paul ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables
  2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
  ` (2 preceding siblings ...)
  2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2020-12-22 15:43 ` Julien Grall
  2020-12-22 17:12   ` Julien Grall
  2020-12-23 14:34   ` Jan Beulich
  3 siblings, 2 replies; 40+ messages in thread
From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new IOMMU page-tables allocator will release the pages when
relinquish the domain resources. However, this is not sufficient in two
cases:
    1) domain_relinquish_resources() is not called when the domain
    creation fails.
    2) There is nothing preventing page-table allocations when the
    domain is dying.

In both cases, this can be solved by freeing the page-tables again
when the domain destruction. Although, this may result to an high
number of page-tables to free.

In the second case, it is pointless to allow page-table allocation when
the domain is going to die. iommu_alloc_pgtable() will now return an
error when it is called while the domain is dying.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/domain.c               |  2 +-
 xen/drivers/passthrough/x86/iommu.c | 32 +++++++++++++++++++++++++++--
 xen/include/asm-x86/iommu.h         |  2 +-
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b9ba04633e18..1b7ee5c1a8cb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d)

     PROGRESS(iommu_pagetables):

-        ret = iommu_free_pgtables(d);
+        ret = iommu_free_pgtables(d, false);
         if ( ret )
             return ret;

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 99a23177b3d2..4a083e4b8f11 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,21 @@ int arch_iommu_domain_init(struct domain *d)

 void arch_iommu_domain_destroy(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    /*
+     * The relinquish code will not be executed if the domain creation
+     * failed. To avoid any memory leak, we want to free any IOMMU
+     * page-tables that may have been allocated.
+     */
+    rc = iommu_free_pgtables(d, false);
+
+    /* The preemption was disabled, so the call should never fail. */
+    if ( rc )
+        ASSERT_UNREACHABLE();
+
+    ASSERT(page_list_empty(&hd->arch.pgtables.list));
 }

 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -261,7 +276,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         return;
 }

-int iommu_free_pgtables(struct domain *d)
+int iommu_free_pgtables(struct domain *d, bool preempt)
 {
     struct domain_iommu *hd = dom_iommu(d);
     struct page_info *pg;
@@ -282,7 +297,7 @@ int iommu_free_pgtables(struct domain *d)
     {
         free_domheap_page(pg);

-        if ( !(++done & 0xff) && general_preempt_check() )
+        if ( !(++done & 0xff) && preempt && general_preempt_check() )
         {
             spin_unlock(&hd->arch.pgtables.lock);
             return -ERESTART;
@@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     memflags = MEMF_node(hd->node);
 #endif

+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevent allocation to happen afterwards. There is no valid
+     * reasons to continue to update the IOMMU page-tables while the
+     * domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying. Note
+     * this doesn't fully prevent the race because d->is_dying may not
+     * yet be seen.
+     */
+    if ( d->is_dying )
+        return NULL;
+
     pg = alloc_domheap_page(NULL, memflags);
     if ( !pg )
         return NULL;
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 970eb06ffac5..874bb5bbfbde 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -135,7 +135,7 @@ int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
         iommu_vcall(ops, sync_cache, addr, size);       \
     })

-int __must_check iommu_free_pgtables(struct domain *d);
+int __must_check iommu_free_pgtables(struct domain *d, bool preempt);
 struct page_info *__must_check iommu_alloc_pgtable(struct domain *d);

 #endif /* !__ARCH_X86_IOMMU_H__ */
--
2.17.1

^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables
  2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
@ 2020-12-22 17:12   ` Julien Grall
  2020-12-23 14:34   ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Julien Grall @ 2020-12-22 17:12 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, Paul Durrant

Hi,

On 22/12/2020 15:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient in two
> cases:
>     1) domain_relinquish_resources() is not called when the domain
>     creation fails.
>     2) There is nothing preventing page-table allocations when the
>     domain is dying.
>
> In both cases, this can be solved by freeing the page-tables again
> when the domain destruction. Although, this may result to an high
> number of page-tables to free.
>
> In the second case, it is pointless to allow page-table allocation when
> the domain is going to die. iommu_alloc_pgtable() will now return an
> error when it is called while the domain is dying.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

I forgot to mention this is fixing 15bc9a1ef51c "x86/iommu: add common
page-table allocator". I will add a Fixes tag for the next version.

Cheers,

--
Julien Grall

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 2020-12-22 17:12 ` Julien Grall @ 2020-12-23 14:34 ` Jan Beulich 2020-12-23 16:07 ` Julien Grall 1 sibling, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:34 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The new IOMMU page-tables allocator will release the pages when > relinquish the domain resources. However, this is not sufficient in two > cases: > 1) domain_relinquish_resources() is not called when the domain > creation fails. Could you remind me of what IOMMU page table insertions there are during domain creation? No memory got allocated to the domain at that point yet, so it would seem to me there simply is nothing to map. > 2) There is nothing preventing page-table allocations when the > domain is dying. > > In both cases, this can be solved by freeing the page-tables again > when the domain destruction. Although, this may result to an high > number of page-tables to free. Since I've seen this before in this series, and despite me also not being a native speaker, as a nit: I don't think it can typically be other than "result in". > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d) > > PROGRESS(iommu_pagetables): > > - ret = iommu_free_pgtables(d); > + ret = iommu_free_pgtables(d, false); I suppose you mean "true" here, but I also think the other approach (checking for DOMDYING_dead, which you don't seem to like very much) is better, if for no other reason than it already being used elsewhere. 
> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) > memflags = MEMF_node(hd->node); > #endif > > + /* > + * The IOMMU page-tables are freed when relinquishing the domain, but > + * nothing prevent allocation to happen afterwards. There is no valid > + * reasons to continue to update the IOMMU page-tables while the > + * domain is dying. > + * > + * So prevent page-table allocation when the domain is dying. Note > + * this doesn't fully prevent the race because d->is_dying may not > + * yet be seen. > + */ > + if ( d->is_dying ) > + return NULL; > + > pg = alloc_domheap_page(NULL, memflags); > if ( !pg ) > return NULL; As said in reply to an earlier patch - with a suitable spin_barrier() you can place your check further down, along the lines of spin_lock(&hd->arch.pgtables.lock); if ( likely(!d->is_dying) ) { page_list_add(pg, &hd->arch.pgtables.list); p = NULL; } spin_unlock(&hd->arch.pgtables.lock); if ( p ) { free_domheap_page(pg); pg = NULL; } (albeit I'm relatively sure you won't like the re-purposing of p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be nice to use here, but we seem to only have FREE_XENHEAP_PAGE() so far.) Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
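The ordering Jan suggests above (allocate first, then re-check is_dying while holding pgtables.lock before publishing the page, and free it again if the domain died in between) can be modelled outside Xen. The sketch below is hypothetical scaffolding, with a pthread mutex standing in for the spinlock, calloc/free for the domheap allocator, and invented structure names; it is not the real interface:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct page_info { struct page_info *next; };

struct domain {
    bool is_dying;
    pthread_mutex_t pgtables_lock;
    struct page_info *pgtables_list;
};

/*
 * Allocate unconditionally, but only link the page into the per-domain
 * list while holding the lock and after re-checking is_dying.  A
 * domain_kill() that sets is_dying and walks the list under the same
 * lock can then no longer race with a late insertion.
 */
static struct page_info *iommu_alloc_pgtable(struct domain *d)
{
    struct page_info *pg = calloc(1, sizeof(*pg));
    bool published = false;

    if ( !pg )
        return NULL;

    pthread_mutex_lock(&d->pgtables_lock);
    if ( !d->is_dying )
    {
        pg->next = d->pgtables_list;
        d->pgtables_list = pg;
        published = true;
    }
    pthread_mutex_unlock(&d->pgtables_lock);

    if ( !published )
    {
        free(pg);   /* the domain died first: undo the allocation */
        pg = NULL;
    }

    return pg;
}
```

The cost of this variant is one allocation/free pair on the dying path; in exchange, no page can be added to the list after the final teardown pass has started.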
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 14:34 ` Jan Beulich @ 2020-12-23 16:07 ` Julien Grall 2020-12-23 16:15 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:07 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23/12/2020 14:34, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The new IOMMU page-tables allocator will release the pages when >> relinquish the domain resources. However, this is not sufficient in two >> cases: >> 1) domain_relinquish_resources() is not called when the domain >> creation fails. > > Could you remind me of what IOMMU page table insertions there > are during domain creation? No memory got allocated to the > domain at that point yet, so it would seem to me there simply > is nothing to map. The P2M is first modified in hvm_domain_initialise(): (XEN) Xen call trace: (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 (XEN) [<ffff82d04025f9f5>] F drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 (XEN) [<ffff82d04025fcc5>] F drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 (XEN) [<ffff82d040301bdc>] F arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 (XEN) [<ffff82d0402f6b5c>] F arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e (XEN) [<ffff82d04029a080>] F arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 (XEN) [<ffff82d04031eae7>] F arch_domain_create+0x478/0x4ff (XEN) [<ffff82d04020476e>] F domain_create+0x4f2/0x778 (XEN) [<ffff82d04023b0d2>] F do_domctl+0xb1e/0x18b8 (XEN) 
[<ffff82d040311dbf>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d040390432>] F lstar_enter+0x112/0x120 > >> 2) There is nothing preventing page-table allocations when the >> domain is dying. >> >> In both cases, this can be solved by freeing the page-tables again >> when the domain destruction. Although, this may result to an high >> number of page-tables to free. > > Since I've seen this before in this series, and despite me also > not being a native speaker, as a nit: I don't think it can > typically be other than "result in". I think you are right. > >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d) >> >> PROGRESS(iommu_pagetables): >> >> - ret = iommu_free_pgtables(d); >> + ret = iommu_free_pgtables(d, false); > > I suppose you mean "true" here, but I also think the other > approach (checking for DOMDYING_dead, which you don't seem to > like very much) is better, if for no other reason than it > already being used elsewhere. I think "don't like very much" is an understatement :). There seems to be more function using an extra parameter (such as hap_set_allocation() which was introduced before your DOMDYING_dead). So I only followed what they did. > >> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >> memflags = MEMF_node(hd->node); >> #endif >> >> + /* >> + * The IOMMU page-tables are freed when relinquishing the domain, but >> + * nothing prevent allocation to happen afterwards. There is no valid >> + * reasons to continue to update the IOMMU page-tables while the >> + * domain is dying. >> + * >> + * So prevent page-table allocation when the domain is dying. Note >> + * this doesn't fully prevent the race because d->is_dying may not >> + * yet be seen. 
>> + */ >> + if ( d->is_dying ) >> + return NULL; >> + >> pg = alloc_domheap_page(NULL, memflags); >> if ( !pg ) >> return NULL; > > As said in reply to an earlier patch - with a suitable > spin_barrier() you can place your check further down, along the > lines of > > spin_lock(&hd->arch.pgtables.lock); > if ( likely(!d->is_dying) ) > { > page_list_add(pg, &hd->arch.pgtables.list); > p = NULL; > } > spin_unlock(&hd->arch.pgtables.lock); > > if ( p ) > { > free_domheap_page(pg); > pg = NULL; > } > > (albeit I'm relatively sure you won't like the re-purposing of > p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be > nice to use here, but we seem to only have FREE_XENHEAP_PAGE() > so far.) In fact I don't mind the re-purposing of p. However, I dislike the allocation and then freeing when the domain is dying. I think I prefer the small race introduced (the pages will still be freed) over this solution. Note that Paul's IOMMU series will completely rework the function. So this is only temporary. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:07 ` Julien Grall @ 2020-12-23 16:15 ` Jan Beulich 2020-12-23 16:19 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:15 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23.12.2020 17:07, Julien Grall wrote: > On 23/12/2020 14:34, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The new IOMMU page-tables allocator will release the pages when >>> relinquish the domain resources. However, this is not sufficient in two >>> cases: >>> 1) domain_relinquish_resources() is not called when the domain >>> creation fails. >> >> Could you remind me of what IOMMU page table insertions there >> are during domain creation? No memory got allocated to the >> domain at that point yet, so it would seem to me there simply >> is nothing to map. > > The P2M is first modified in hvm_domain_initialise(): > > (XEN) Xen call trace: > (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 > (XEN) [<ffff82d04025f9f5>] F > drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 > (XEN) [<ffff82d04025fcc5>] F > drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b > (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f > (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 > (XEN) [<ffff82d040301bdc>] F > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 > (XEN) [<ffff82d0402f6b5c>] F > arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 > (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e > (XEN) [<ffff82d04029a080>] F > arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 > (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 Oh, the infamous APIC access page again. 
>>> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >>> memflags = MEMF_node(hd->node); >>> #endif >>> >>> + /* >>> + * The IOMMU page-tables are freed when relinquishing the domain, but >>> + * nothing prevent allocation to happen afterwards. There is no valid >>> + * reasons to continue to update the IOMMU page-tables while the >>> + * domain is dying. >>> + * >>> + * So prevent page-table allocation when the domain is dying. Note >>> + * this doesn't fully prevent the race because d->is_dying may not >>> + * yet be seen. >>> + */ >>> + if ( d->is_dying ) >>> + return NULL; >>> + >>> pg = alloc_domheap_page(NULL, memflags); >>> if ( !pg ) >>> return NULL; >> >> As said in reply to an earlier patch - with a suitable >> spin_barrier() you can place your check further down, along the >> lines of >> >> spin_lock(&hd->arch.pgtables.lock); >> if ( likely(!d->is_dying) ) >> { >> page_list_add(pg, &hd->arch.pgtables.list); >> p = NULL; >> } >> spin_unlock(&hd->arch.pgtables.lock); >> >> if ( p ) >> { >> free_domheap_page(pg); >> pg = NULL; >> } >> >> (albeit I'm relatively sure you won't like the re-purposing of >> p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be >> nice to use here, but we seem to only have FREE_XENHEAP_PAGE() >> so far.) > > In fact I don't mind the re-purposing of p. However, I dislike the > allocation and then freeing when the domain is dying. > > I think I prefer the small race introduced (the pages will still be > freed) over this solution. The "will still be freed" is because of the 2nd round of freeing you add in an earlier patch? I'd prefer to avoid the race to in turn avoid that 2nd round of freeing. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:15 ` Jan Beulich @ 2020-12-23 16:19 ` Julien Grall 2020-12-23 16:35 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:19 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23/12/2020 16:15, Jan Beulich wrote: > On 23.12.2020 17:07, Julien Grall wrote: >> On 23/12/2020 14:34, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> The new IOMMU page-tables allocator will release the pages when >>>> relinquish the domain resources. However, this is not sufficient in two >>>> cases: >>>> 1) domain_relinquish_resources() is not called when the domain >>>> creation fails. >>> >>> Could you remind me of what IOMMU page table insertions there >>> are during domain creation? No memory got allocated to the >>> domain at that point yet, so it would seem to me there simply >>> is nothing to map. 
>> >> The P2M is first modified in hvm_domain_initialise(): >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 >> (XEN) [<ffff82d04025f9f5>] F >> drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 >> (XEN) [<ffff82d04025fcc5>] F >> drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b >> (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f >> (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 >> (XEN) [<ffff82d040301bdc>] F >> arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 >> (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 >> (XEN) [<ffff82d0402f6b5c>] F >> arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 >> (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e >> (XEN) [<ffff82d04029a080>] F >> arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 >> (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 > > Oh, the infamous APIC access page again. > >>>> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >>>> memflags = MEMF_node(hd->node); >>>> #endif >>>> >>>> + /* >>>> + * The IOMMU page-tables are freed when relinquishing the domain, but >>>> + * nothing prevent allocation to happen afterwards. There is no valid >>>> + * reasons to continue to update the IOMMU page-tables while the >>>> + * domain is dying. >>>> + * >>>> + * So prevent page-table allocation when the domain is dying. Note >>>> + * this doesn't fully prevent the race because d->is_dying may not >>>> + * yet be seen. 
>>>> + */ >>>> + if ( d->is_dying ) >>>> + return NULL; >>>> + >>>> pg = alloc_domheap_page(NULL, memflags); >>>> if ( !pg ) >>>> return NULL; >>> >>> As said in reply to an earlier patch - with a suitable >>> spin_barrier() you can place your check further down, along the >>> lines of >>> >>> spin_lock(&hd->arch.pgtables.lock); >>> if ( likely(!d->is_dying) ) >>> { >>> page_list_add(pg, &hd->arch.pgtables.list); >>> p = NULL; >>> } >>> spin_unlock(&hd->arch.pgtables.lock); >>> >>> if ( p ) >>> { >>> free_domheap_page(pg); >>> pg = NULL; >>> } >>> >>> (albeit I'm relatively sure you won't like the re-purposing of >>> p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be >>> nice to use here, but we seem to only have FREE_XENHEAP_PAGE() >>> so far.) >> >> In fact I don't mind the re-purposing of p. However, I dislike the >> allocation and then freeing when the domain is dying. >> >> I think I prefer the small race introduced (the pages will still be >> freed) over this solution. > > The "will still be freed" is because of the 2nd round of freeing > you add in an earlier patch? I'd prefer to avoid the race to in > turn avoid that 2nd round of freeing. The "2nd round" of freeing is necessary at least for the domain creation failure case (it would be the 1rst round). If we can avoid IOMMU page-table allocation in the domain creation path, then yes I agree that we want to avoid that "2nd round". Otherwise, I think it is best to take advantage of it. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:19 ` Julien Grall @ 2020-12-23 16:35 ` Jan Beulich 2021-01-14 18:53 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:35 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23.12.2020 17:19, Julien Grall wrote: > On 23/12/2020 16:15, Jan Beulich wrote: >> On 23.12.2020 17:07, Julien Grall wrote: >>> I think I prefer the small race introduced (the pages will still be >>> freed) over this solution. >> >> The "will still be freed" is because of the 2nd round of freeing >> you add in an earlier patch? I'd prefer to avoid the race to in >> turn avoid that 2nd round of freeing. > > The "2nd round" of freeing is necessary at least for the domain creation > failure case (it would be the 1rst round). > > If we can avoid IOMMU page-table allocation in the domain creation path, > then yes I agree that we want to avoid that "2nd round". Otherwise, I > think it is best to take advantage of it. Well, I'm not really certain either way here. If it's really just VMX'es APIC access page that's the problem here, custom cleanup of this "fallout" from VMX code would certainly be an option. Furthermore I consider it wrong to insert this page in the IOMMU page tables in the first place. This page overlaps with the MSI special address range, and hence accesses will be dealt with by interrupt remapping machinery (if enabled). If interrupt remapping is disabled, this page should be no different for I/O purposes than all other pages in this window - they shouldn't be mapped at all. Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() should gain an is_special_page() check to prevent propagation to the IOMMU for such pages (we can't do anything about them in shared-page-table setups)? 
See also the (PV related) comment in cleanup_page_mappings(), a few lines ahead of a respective use of is_special_page(), Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:35 ` Jan Beulich @ 2021-01-14 18:53 ` Julien Grall 2021-01-15 11:24 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2021-01-14 18:53 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 16:35, Jan Beulich wrote: > On 23.12.2020 17:19, Julien Grall wrote: >> On 23/12/2020 16:15, Jan Beulich wrote: >>> On 23.12.2020 17:07, Julien Grall wrote: >>>> I think I prefer the small race introduced (the pages will still be >>>> freed) over this solution. >>> >>> The "will still be freed" is because of the 2nd round of freeing >>> you add in an earlier patch? I'd prefer to avoid the race to in >>> turn avoid that 2nd round of freeing. >> >> The "2nd round" of freeing is necessary at least for the domain creation >> failure case (it would be the 1rst round). >> >> If we can avoid IOMMU page-table allocation in the domain creation path, >> then yes I agree that we want to avoid that "2nd round". Otherwise, I >> think it is best to take advantage of it. > > Well, I'm not really certain either way here. If it's really just > VMX'es APIC access page that's the problem here, custom cleanup > of this "fallout" from VMX code would certainly be an option. From my testing, it looks like the VMX APIC page is the only entry added while the domain is created. > Furthermore I consider it wrong to insert this page in the IOMMU > page tables in the first place. This page overlaps with the MSI > special address range, and hence accesses will be dealt with by > interrupt remapping machinery (if enabled). If interrupt > remapping is disabled, this page should be no different for I/O > purposes than all other pages in this window - they shouldn't > be mapped at all. That's a fair point. I will have a look to see if I can avoid the IOMMU mapping for the VMX APIC page. 
> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() > should gain an is_special_page() check to prevent propagation to > the IOMMU for such pages (we can't do anything about them in > shared-page-table setups)? See also the (PV related) comment in > cleanup_page_mappings(), a few lines ahead of a respective use of > is_special_page(), There seems to be a mismatch between what the comment says and the code adding the page in the IOMMU for PV domain. AFAICT, the IOMMU mapping will be added via _get_page_type(): /* Special pages should not be accessible from devices. */ struct domain *d = page_get_owner(page); if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) { mfn_t mfn = page_to_mfn(page); if ( (x & PGT_type_mask) == PGT_writable_page ) rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), 1ul << PAGE_ORDER_4K); else rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, 1ul << PAGE_ORDER_4K, IOMMUF_readable | IOMMUF_writable); } In this snippet, "special page" is interpreted as a page with no owner. However is_special_page() will return true when PGC_extra is set and the flag implies there is an owner (see assign_pages()). So it looks like to me, any pages with PGC_extra would end up to have a mapping in the IOMMU pages-tables if they are writable. This statement also seems to apply for xenheap pages shared with a domain (e.g grant-table). Did I miss anything? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2021-01-14 18:53 ` Julien Grall @ 2021-01-15 11:24 ` Jan Beulich 2021-01-15 11:30 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2021-01-15 11:24 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 14.01.2021 19:53, Julien Grall wrote: > On 23/12/2020 16:35, Jan Beulich wrote: >> On 23.12.2020 17:19, Julien Grall wrote: >>> On 23/12/2020 16:15, Jan Beulich wrote: >>>> On 23.12.2020 17:07, Julien Grall wrote: >>>>> I think I prefer the small race introduced (the pages will still be >>>>> freed) over this solution. >>>> >>>> The "will still be freed" is because of the 2nd round of freeing >>>> you add in an earlier patch? I'd prefer to avoid the race to in >>>> turn avoid that 2nd round of freeing. >>> >>> The "2nd round" of freeing is necessary at least for the domain creation >>> failure case (it would be the 1rst round). >>> >>> If we can avoid IOMMU page-table allocation in the domain creation path, >>> then yes I agree that we want to avoid that "2nd round". Otherwise, I >>> think it is best to take advantage of it. >> >> Well, I'm not really certain either way here. If it's really just >> VMX'es APIC access page that's the problem here, custom cleanup >> of this "fallout" from VMX code would certainly be an option. > > From my testing, it looks like the VMX APIC page is the only entry > added while the domain is created. > >> Furthermore I consider it wrong to insert this page in the IOMMU >> page tables in the first place. This page overlaps with the MSI >> special address range, and hence accesses will be dealt with by >> interrupt remapping machinery (if enabled). If interrupt >> remapping is disabled, this page should be no different for I/O >> purposes than all other pages in this window - they shouldn't >> be mapped at all. > > That's a fair point. 
I will have a look to see if I can avoid the IOMMU > mapping for the VMX APIC page. > >> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() >> should gain an is_special_page() check to prevent propagation to >> the IOMMU for such pages (we can't do anything about them in >> shared-page-table setups)? See also the (PV related) comment in >> cleanup_page_mappings(), a few lines ahead of a respective use of >> is_special_page(), > > There seems to be a mismatch between what the comment says and the code > adding the page in the IOMMU for PV domain. > > AFAICT, the IOMMU mapping will be added via _get_page_type(): > > /* Special pages should not be accessible from devices. */ > struct domain *d = page_get_owner(page); > > if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) > { > mfn_t mfn = page_to_mfn(page); > > if ( (x & PGT_type_mask) == PGT_writable_page ) > rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), > 1ul << PAGE_ORDER_4K); > else > rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, > 1ul << PAGE_ORDER_4K, > IOMMUF_readable | IOMMUF_writable); > } > > In this snippet, "special page" is interpreted as a page with no owner. > However is_special_page() will return true when PGC_extra is set and the > flag implies there is an owner (see assign_pages()). > > So it looks like to me, any pages with PGC_extra would end up to have a > mapping in the IOMMU pages-tables if they are writable. > > This statement also seems to apply for xenheap pages shared with a > domain (e.g grant-table). > > Did I miss anything? First let me point out that what you quote is not what I had pointed you at - you refer to _get_page_type() when I suggested to look at cleanup_page_mappings(). The important aspect for special pages (here I mean ones having been subject to share_xen_page_with_guest()) is that their type gets "pinned", i.e. they'll never be subject to _get_page_type()'s transitioning of types. 
As you likely will have noticed, cleanup_page_mappings() also only gets called when the last _general_ ref of a page got dropped, i.e. entirely unrelated to type references. If PGC_extra pages have a similar requirement, they may need similar pinning of their types. (Maybe they do; didn't check.) Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
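Jan's explanation reduces to the following property: a type transition in _get_page_type() presupposes that the page's type reference count can drop to zero, and a page handed out via share_xen_page_with_guest() permanently holds one type reference, so the transition (and with it the PV IOMMU map/unmap side effect) can never fire for such a page. A sketch of that property, with invented structures and helpers:

```c
#include <assert.h>
#include <stdbool.h>

enum pg_type { PGT_none, PGT_writable_page, PGT_seg_desc_page };

struct page {
    enum pg_type type;
    unsigned int type_count;
};

/* Hand the page out as Xen-shared, holding a permanent type reference. */
static void share_with_guest(struct page *pg, enum pg_type type)
{
    pg->type = type;
    pg->type_count = 1;   /* never dropped, so the type is "pinned" */
}

/*
 * Sketch of the relevant property of _get_page_type(): a transition to
 * a different type is only possible while no type references are held.
 * A shared ("special") page always holds one, so it never transitions,
 * and the PV IOMMU map/unmap side effect never runs for it.
 */
static bool get_page_type(struct page *pg, enum pg_type type)
{
    if ( pg->type == type )
    {
        pg->type_count++;
        return true;
    }

    if ( pg->type_count != 0 )
        return false;   /* type still in use: transition refused */

    pg->type = type;
    pg->type_count = 1;
    return true;
}
```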
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2021-01-15 11:24 ` Jan Beulich @ 2021-01-15 11:30 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2021-01-15 11:30 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel Hi Jan, On 15/01/2021 11:24, Jan Beulich wrote: > On 14.01.2021 19:53, Julien Grall wrote: >> On 23/12/2020 16:35, Jan Beulich wrote: >>> On 23.12.2020 17:19, Julien Grall wrote: >>>> On 23/12/2020 16:15, Jan Beulich wrote: >>>>> On 23.12.2020 17:07, Julien Grall wrote: >>>>>> I think I prefer the small race introduced (the pages will still be >>>>>> freed) over this solution. >>>>> >>>>> The "will still be freed" is because of the 2nd round of freeing >>>>> you add in an earlier patch? I'd prefer to avoid the race to in >>>>> turn avoid that 2nd round of freeing. >>>> >>>> The "2nd round" of freeing is necessary at least for the domain creation >>>> failure case (it would be the 1rst round). >>>> >>>> If we can avoid IOMMU page-table allocation in the domain creation path, >>>> then yes I agree that we want to avoid that "2nd round". Otherwise, I >>>> think it is best to take advantage of it. >>> >>> Well, I'm not really certain either way here. If it's really just >>> VMX'es APIC access page that's the problem here, custom cleanup >>> of this "fallout" from VMX code would certainly be an option. >> >> From my testing, it looks like the VMX APIC page is the only entry >> added while the domain is created. >> >>> Furthermore I consider it wrong to insert this page in the IOMMU >>> page tables in the first place. This page overlaps with the MSI >>> special address range, and hence accesses will be dealt with by >>> interrupt remapping machinery (if enabled). 
If interrupt >>> remapping is disabled, this page should be no different for I/O >>> purposes than all other pages in this window - they shouldn't >>> be mapped at all. >> >> That's a fair point. I will have a look to see if I can avoid the IOMMU >> mapping for the VMX APIC page. >> >>> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() >>> should gain an is_special_page() check to prevent propagation to >>> the IOMMU for such pages (we can't do anything about them in >>> shared-page-table setups)? See also the (PV related) comment in >>> cleanup_page_mappings(), a few lines ahead of a respective use of >>> is_special_page(), >> >> There seems to be a mismatch between what the comment says and the code >> adding the page in the IOMMU for PV domain. >> >> AFAICT, the IOMMU mapping will be added via _get_page_type(): >> >> /* Special pages should not be accessible from devices. */ >> struct domain *d = page_get_owner(page); >> >> if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) >> { >> mfn_t mfn = page_to_mfn(page); >> >> if ( (x & PGT_type_mask) == PGT_writable_page ) >> rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), >> 1ul << PAGE_ORDER_4K); >> else >> rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, >> 1ul << PAGE_ORDER_4K, >> IOMMUF_readable | IOMMUF_writable); >> } >> >> In this snippet, "special page" is interpreted as a page with no owner. >> However is_special_page() will return true when PGC_extra is set and the >> flag implies there is an owner (see assign_pages()). >> >> So it looks like to me, any pages with PGC_extra would end up to have a >> mapping in the IOMMU pages-tables if they are writable. >> >> This statement also seems to apply for xenheap pages shared with a >> domain (e.g grant-table). >> >> Did I miss anything? > > First let me point out that what you quote is not what I had > pointed you at - you refer to _get_page_type() when I suggested > to look at cleanup_page_mappings(). 
I know and that's why I pointed out the mismatch between the comments (in cleanup_page_mappings()) and the code adding the mapping in the IOMMU (_get_page_type()). I only looked at _get_page_type() because I wanted to understand how the IOMMU mapping works for PV. > The important aspect for > special pages (here I mean ones having been subject to > share_xen_page_with_guest()) is that their type gets "pinned", > i.e. they'll never be subject to _get_page_type()'s > transitioning of types. As you likely will have noticed, > cleanup_page_mappings() also only gets called when the last > _general_ ref of a page got dropped, i.e. entirely unrelated > to type references. Ah, that makes sense. I added some debugging in the code and couldn't really figure out why the transition wasn't happening. > > If PGC_extra pages have a similar requirement, they may need > similar pinning of their types. (Maybe they do; didn't check.) I have only looked at the VMX APIC page so far. It effectively pins the types. Thanks for the explanation! Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread