* [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown @ 2020-12-22 15:43 Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall ` (3 more replies) 0 siblings, 4 replies; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant, Andrew Cooper, Roger Pau Monné, Wei Liu From: Julien Grall <jgrall@amazon.com> Hi all, This series is a collection of bug fixes for the IOMMU teardown code. All of them are candidates for 4.15 as they can either leak memory or lead to a host crash/host corruption. This is sent directly on xen-devel because all the issues were either introduced in 4.15 or happen in the domain creation code. Cheers, Julien Grall (4): xen/iommu: Check if the IOMMU was initialized before tearing down xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables xen/iommu: x86: Don't leak the IOMMU page-tables xen/arch/x86/domain.c | 2 +- xen/drivers/passthrough/iommu.c | 10 +++++- xen/drivers/passthrough/x86/iommu.c | 47 +++++++++++++++++++++++++++-- xen/include/asm-x86/iommu.h | 2 +- 4 files changed, 56 insertions(+), 5 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 13:27 ` Jan Beulich 2021-01-04 9:28 ` Paul Durrant 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall ` (2 subsequent siblings) 3 siblings, 2 replies; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> is_iommu_enabled() will return true even if the IOMMU has not been initialized (e.g. the ops are not set). In the case of an early failure in arch_domain_init(), the function iommu_destroy_domain() will be called even if the IOMMU is not initialized. This will result in dereferencing the ops, which will be NULL, and a host crash. Fix the issue by checking that ops has been set before accessing it. Note that we are assuming that arch_iommu_domain_init() will clean up an intermediate failure if it failed. Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") Signed-off-by: Julien Grall <jgrall@amazon.com> --- xen/drivers/passthrough/iommu.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 2358b6eb09f4..f976d5a0b0a5 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) void iommu_domain_destroy(struct domain *d) { - if ( !is_iommu_enabled(d) ) + struct domain_iommu *hd = dom_iommu(d); + + /* + * In case of failure during the domain construction, it would be + * possible to reach this function with the IOMMU enabled but not + * yet initialized. 
We assume that hd->platforms will be non-NULL as + * soon as we start to initialize the IOMMU. + */ + if ( !is_iommu_enabled(d) || !hd->platform_ops ) return; iommu_teardown(d); -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall @ 2020-12-23 13:27 ` Jan Beulich 2020-12-23 13:50 ` Julien Grall 2021-01-04 9:28 ` Paul Durrant 1 sibling, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:27 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) > > void iommu_domain_destroy(struct domain *d) > { > - if ( !is_iommu_enabled(d) ) > + struct domain_iommu *hd = dom_iommu(d); > + > + /* > + * In case of failure during the domain construction, it would be > + * possible to reach this function with the IOMMU enabled but not > + * yet initialized. We assume that hd->platforms will be non-NULL as > + * soon as we start to initialize the IOMMU. > + */ > + if ( !is_iommu_enabled(d) || !hd->platform_ops ) > return; From your description I'd rather say is_iommu_enabled() is the wrong predicate to use here. IOW I'd rather see it be replaced. A couple of additional nits: "hd" isn't really necessary to have as a local variable, but if you insist on introducing it despite being used just once, it should be pointer-to-const. Plus the comment would better spell correctly the field it talks about. But I consider this comment excessive anyway, as the check of ->platform_ops is quite clear by itself. And finally "we assume" is in at least latent conflict with ->platform_ops getting set only after arch_iommu_domain_init() was called. 
Right now neither x86'es nor Arm's do anything that would need undoing, but I'd still suggest to replace "assume" here (if you want to keep the comment in the first place) and move the assignment up a few lines (right after the is_iommu_enabled() check there). Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:27 ` Jan Beulich @ 2020-12-23 13:50 ` Julien Grall 2020-12-23 13:59 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 13:50 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:27, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >> >> void iommu_domain_destroy(struct domain *d) >> { >> - if ( !is_iommu_enabled(d) ) >> + struct domain_iommu *hd = dom_iommu(d); >> + >> + /* >> + * In case of failure during the domain construction, it would be >> + * possible to reach this function with the IOMMU enabled but not >> + * yet initialized. We assume that hd->platforms will be non-NULL as >> + * soon as we start to initialize the IOMMU. >> + */ >> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >> return; > > From your description I'd rather say is_iommu_enabled() is the > wrong predicate to use here. IOW I'd rather see it be replaced. !hd->platform_ops should be sufficient. So, I think we can drop !is_iommu_enabled(d). Would that be fine with you? > > A couple of additional nits: "hd" isn't really necessary to > have as a local variable, but if you insist on introducing it > despite being used just once, it should be pointer-to-const. > Plus the comment would better spell correctly the field it > talks about. But I consider this comment excessive anyway, as > the check of ->platform_ops is quite clear by itself. Well, I added the comment because I think check hd->platform_ops may not be that obvious (see more below). > And finally "we assume" is in at least latent conflict with > ->platform_ops getting set only after arch_iommu_domain_init() > was called. 
Right now neither x86'es nor Arm's do anything > that would need undoing, but I'd still suggest to replace > "assume" here (if you want to keep the comment in the first > place) and move the assignment up a few lines (right after the > is_iommu_enabled() check there). My initial implementation of this patch moved the initialization of hd->platform_ops before arch_iommu_domain_init(). However, this is going to lead to some issues with Paul's series which adds an IOMMU page-table pool ([1]). The function arch_iommu_domain_init() may now fail. If we initialize hd->platform_ops earlier on, then the ->teardown() will be called before ->init(). This may be an issue if ->teardown() expects some list pointer to be initialized by ->init(). I am not aware of any today, but this seems quite risky to me. So I think it is better if we initialize hd->platform_ops after arch_iommu_domain_init() and expect the function to clean up anything that was allocated before the error. The alternative is we set hd->platform_ops if both arch_iommu_domain_init() and ->init() have succeeded. This means they will both have to clean up in case of an error. Any thoughts? Cheers, [1] <20201005094905.2929-6-paul@xen.org> -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:50 ` Julien Grall @ 2020-12-23 13:59 ` Jan Beulich 2020-12-23 14:51 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:59 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 14:50, Julien Grall wrote: > On 23/12/2020 13:27, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> --- a/xen/drivers/passthrough/iommu.c >>> +++ b/xen/drivers/passthrough/iommu.c >>> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >>> >>> void iommu_domain_destroy(struct domain *d) >>> { >>> - if ( !is_iommu_enabled(d) ) >>> + struct domain_iommu *hd = dom_iommu(d); >>> + >>> + /* >>> + * In case of failure during the domain construction, it would be >>> + * possible to reach this function with the IOMMU enabled but not >>> + * yet initialized. We assume that hd->platforms will be non-NULL as >>> + * soon as we start to initialize the IOMMU. >>> + */ >>> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >>> return; >> >> From your description I'd rather say is_iommu_enabled() is the >> wrong predicate to use here. IOW I'd rather see it be replaced. > > !hd->platform_ops should be sufficient. So, I think we can drop > !is_iommu_enabled(d). Would that be fine with you? Well, that's what I was trying to suggest. >> A couple of additional nits: "hd" isn't really necessary to >> have as a local variable, but if you insist on introducing it >> despite being used just once, it should be pointer-to-const. >> Plus the comment would better spell correctly the field it >> talks about. But I consider this comment excessive anyway, as >> the check of ->platform_ops is quite clear by itself. > > Well, I added the comment because I think check hd->platform_ops may not > be that obvious (see more below). 
> >> And finally "we assume" is in at least latent conflict with >> ->platform_ops getting set only after arch_iommu_domain_init() >> was called. Right now neither x86'es nor Arm's do anything >> that would need undoing, but I'd still suggest to replace >> "assume" here (if you want to keep the comment in the first >> place) and move the assignment up a few lines (right after the >> is_iommu_enabled() check there). > My initial implementation of this patch moved the initialization of > hd->platform_ops before arch_iommu_domain_init(). > > However, this is going to lead to some issues with Paul's series which > adds an IOMMU page-table pool ([1]). > > The function arch_iommu_domain_init() may now fail. If we initialize > hd->platform_ops earlier on, then the ->teardown() will be called before > ->init(). > > This may be an issue if ->teardown() expects some list pointer to be > initialized by ->init(). I am not aware of any today, but this seems > quite risky to me. In such a case the obvious thing is to make the teardown handler check whether its init counterpart has run before. This will then fit our apparently much wider goal of making cleanup functions idempotent. Jan > So I think it is better if we initialize hd->platform_ops after > arch_iommu_domain_init() and expect the function to clean up anything > that was allocated before the error. > > The alternative is we set hd->platform_ops if both > arch_iommu_domain_init() and ->init() have succeeded. This means they > will both have to clean up in case of an error. > > Any thoughts? > > Cheers, > > [1] <20201005094905.2929-6-paul@xen.org> > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 13:59 ` Jan Beulich @ 2020-12-23 14:51 ` Julien Grall 2020-12-23 14:58 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:51 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:59, Jan Beulich wrote: > On 23.12.2020 14:50, Julien Grall wrote: >> On 23/12/2020 13:27, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> --- a/xen/drivers/passthrough/iommu.c >>>> +++ b/xen/drivers/passthrough/iommu.c >>>> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >>>> >>>> void iommu_domain_destroy(struct domain *d) >>>> { >>>> - if ( !is_iommu_enabled(d) ) >>>> + struct domain_iommu *hd = dom_iommu(d); >>>> + >>>> + /* >>>> + * In case of failure during the domain construction, it would be >>>> + * possible to reach this function with the IOMMU enabled but not >>>> + * yet initialized. We assume that hd->platforms will be non-NULL as >>>> + * soon as we start to initialize the IOMMU. >>>> + */ >>>> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >>>> return; >>> >>> From your description I'd rather say is_iommu_enabled() is the >>> wrong predicate to use here. IOW I'd rather see it be replaced. >> >> !hd->platform_ops should be sufficient. So, I think we can drop >> !is_iommu_enabled(d). Would that be fine with you? > > Well, that's what I was trying to suggest. > >>> A couple of additional nits: "hd" isn't really necessary to >>> have as a local variable, but if you insist on introducing it >>> despite being used just once, it should be pointer-to-const. >>> Plus the comment would better spell correctly the field it >>> talks about. But I consider this comment excessive anyway, as >>> the check of ->platform_ops is quite clear by itself. 
>> >> Well, I added the comment because I think check hd->platform_ops may not >> be that obvious (see more below). >> >>> And finally "we assume" is in at least latent conflict with >>> ->platform_ops getting set only after arch_iommu_domain_init() >>> was called. Right now neither x86'es nor Arm's do anything >>> that would need undoing, but I'd still suggest to replace >>> "assume" here (if you want to keep the comment in the first >>> place) and move the assignment up a few lines (right after the >>> is_iommu_enabled() check there). >> My initial implementation of this patch moved the initialization of >> hd->platform_ops before arch_iommu_domain_init(). >> >> However, this is going to lead to some issues with Paul's series which >> add an IOMMU page-table pool ([1]). >> >> The function arch_iommu_domain_init() may now fail. If we initialize >> hd->platform_ops earlier on, then the ->teardown() will be called before >> ->init(). >> >> This may be an issue if ->teardown() expects some list pointer to be >> initialized by ->init(). I am not aware of any today, but this seems >> quite risky to me. > > In such a case the obvious thing is to make the teardown handler > check whether its init counterpart has run before. This will then > fit our apparently much wider goal of making cleanup functions > idempotent. I will have a look. This may require another boolean which I wanted to avoid. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-23 14:51 ` Julien Grall @ 2020-12-23 14:58 ` Jan Beulich 0 siblings, 0 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:58 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:51, Julien Grall wrote: > On 23/12/2020 13:59, Jan Beulich wrote: >> On 23.12.2020 14:50, Julien Grall wrote: >>> On 23/12/2020 13:27, Jan Beulich wrote: >>>> And finally "we assume" is in at least latent conflict with >>>> ->platform_ops getting set only after arch_iommu_domain_init() >>>> was called. Right now neither x86'es nor Arm's do anything >>>> that would need undoing, but I'd still suggest to replace >>>> "assume" here (if you want to keep the comment in the first >>>> place) and move the assignment up a few lines (right after the >>>> is_iommu_enabled() check there). >>> My initial implementation of this patch moved the initialization of >>> hd->platform_ops before arch_iommu_domain_init(). >>> >>> However, this is going to lead to some issues with Paul's series which >>> add an IOMMU page-table pool ([1]). >>> >>> The function arch_iommu_domain_init() may now fail. If we initialize >>> hd->platform_ops earlier on, then the ->teardown() will be called before >>> ->init(). >>> >>> This may be an issue if ->teardown() expects some list pointer to be >>> initialized by ->init(). I am not aware of any today, but this seems >>> quite risky to me. >> >> In such a case the obvious thing is to make the teardown handler >> check whether its init counterpart has run before. This will then >> fit our apparently much wider goal of making cleanup functions >> idempotent. > > I will have a look. This may require another boolean which I wanted to > avoid. I could imagine list_head_is_null() to become handy here. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall 2020-12-23 13:27 ` Jan Beulich @ 2021-01-04 9:28 ` Paul Durrant 2021-01-04 14:33 ` Julien Grall 1 sibling, 1 reply; 40+ messages in thread From: Paul Durrant @ 2021-01-04 9:28 UTC (permalink / raw) To: 'Julien Grall', xen-devel Cc: hongyxia, 'Julien Grall', 'Jan Beulich' > -----Original Message----- > From: Julien Grall <julien@xen.org> > Sent: 22 December 2020 15:44 > To: xen-devel@lists.xenproject.org > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; Jan Beulich <jbeulich@suse.com>; Paul > Durrant <paul@xen.org> > Subject: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down > > From: Julien Grall <jgrall@amazon.com> > > is_iommu_enabled() will return true even if the IOMMU has not been > initialized (e.g. the ops are not set). > > In the case of an early failure in arch_domain_init(), the function > iommu_destroy_domain() will be called even if the IOMMU is not initialized. > > This will result in dereferencing the ops, which will be NULL, and a host > crash. > > Fix the issue by checking that ops has been set before accessing it. > Note that we are assuming that arch_iommu_domain_init() will clean up an > intermediate failure if it failed. 
> > Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") > Signed-off-by: Julien Grall <jgrall@amazon.com> > --- > xen/drivers/passthrough/iommu.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c > index 2358b6eb09f4..f976d5a0b0a5 100644 > --- a/xen/drivers/passthrough/iommu.c > +++ b/xen/drivers/passthrough/iommu.c > @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) > > void iommu_domain_destroy(struct domain *d) > { > - if ( !is_iommu_enabled(d) ) > + struct domain_iommu *hd = dom_iommu(d); > + > + /* > + * In case of failure during the domain construction, it would be > + * possible to reach this function with the IOMMU enabled but not > + * yet initialized. We assume that hd->platforms will be non-NULL as > + * soon as we start to initialize the IOMMU. > + */ > + if ( !is_iommu_enabled(d) || !hd->platform_ops ) > return; TBH, this seems like the wrong way to fix it. The ops dereference is done in iommu_teardown() so that ought to be doing the check, but given it is a single-line function it could be inlined at the same time. That said, I think arch_iommu_domain_destroy() needs to be called unconditionally, as arch_iommu_domain_init() is called before the ops pointer is set. Paul > > iommu_teardown(d); > -- > 2.17.1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down 2021-01-04 9:28 ` Paul Durrant @ 2021-01-04 14:33 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2021-01-04 14:33 UTC (permalink / raw) To: paul, xen-devel; +Cc: hongyxia, 'Julien Grall', 'Jan Beulich' Hi Paul, On 04/01/2021 09:28, Paul Durrant wrote: >> -----Original Message----- >> From: Julien Grall <julien@xen.org> >> Sent: 22 December 2020 15:44 >> To: xen-devel@lists.xenproject.org >> Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; Jan Beulich <jbeulich@suse.com>; Paul >> Durrant <paul@xen.org> >> Subject: [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down >> >> From: Julien Grall <jgrall@amazon.com> >> >> is_iommu_enabled() will return true even if the IOMMU has not been >> initialized (e.g. the ops are not set). >> >> In the case of an early failure in arch_domain_init(), the function >> iommu_destroy_domain() will be called even if the IOMMU is not initialized. >> >> This will result in dereferencing the ops, which will be NULL, and a host >> crash. >> >> Fix the issue by checking that ops has been set before accessing it. >> Note that we are assuming that arch_iommu_domain_init() will clean up an >> intermediate failure if it failed. 
>> >> Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...") >> Signed-off-by: Julien Grall <jgrall@amazon.com> >> --- >> xen/drivers/passthrough/iommu.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c >> index 2358b6eb09f4..f976d5a0b0a5 100644 >> --- a/xen/drivers/passthrough/iommu.c >> +++ b/xen/drivers/passthrough/iommu.c >> @@ -226,7 +226,15 @@ static void iommu_teardown(struct domain *d) >> >> void iommu_domain_destroy(struct domain *d) >> { >> - if ( !is_iommu_enabled(d) ) >> + struct domain_iommu *hd = dom_iommu(d); >> + >> + /* >> + * In case of failure during the domain construction, it would be >> + * possible to reach this function with the IOMMU enabled but not >> + * yet initialized. We assume that hd->platforms will be non-NULL as >> + * soon as we start to initialize the IOMMU. >> + */ >> + if ( !is_iommu_enabled(d) || !hd->platform_ops ) >> return; > > TBH, this seems like the wrong way to fix it. The ops dereference is done in iommu_teardown() so that ought to be doing the check, > but given it is a single-line function it could be inlined at the same time. That said, I think arch_iommu_domain_destroy() needs > to be called unconditionally, as arch_iommu_domain_init() is called before the ops pointer is set. I originally resolved it this way to avoid arch_iommu_domain_init() and iommu_teardown() being called when the failure may happen before iommu_domain_init() is called. IOW, the lists/locks would be uninitialized. After discussing with Jan, it turns out that we could check whether the list was initialized or not. So I have reworked the code to now call arch_iommu_domain_init() unconditionally and move the ops check into iommu_teardown(). Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 13:48 ` Jan Beulich 2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 3 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> The pgtables.lock is protecting access to the page list pgtables.list. However, iommu_free_pgtables() will not hold it. I guess it was assumed that page-tables cannot be allocated while the domain is dying. Unfortunately, there is no guarantee that iommu_map() will not be called while a domain is dying (it looks to be possible from XEN_DOMCTL_memory_mapping). So it would be possible to concurrently allocate memory and free the page-tables. Therefore, we need to hold the lock when freeing the page tables. There are more issues around the IOMMU page-allocator. They will be handled in follow-up patches. 
Signed-off-by: Julien Grall <jgrall@amazon.com> --- xen/drivers/passthrough/x86/iommu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index cea1032b3d02..779dbb5b98ba 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -267,13 +267,18 @@ int iommu_free_pgtables(struct domain *d) struct page_info *pg; unsigned int done = 0; + spin_lock(&hd->arch.pgtables.lock); while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) ) { free_domheap_page(pg); if ( !(++done & 0xff) && general_preempt_check() ) + { + spin_unlock(&hd->arch.pgtables.lock); return -ERESTART; + } } + spin_unlock(&hd->arch.pgtables.lock); return 0; } -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall @ 2020-12-23 13:48 ` Jan Beulich 2020-12-23 14:01 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 13:48 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The pgtables.lock is protecting access to the page list pgtables.list. > However, iommu_free_pgtables() will not hold it. I guess it was assumed > that page-tables cannot be allocated while the domain is dying. > > Unfortunately, there is no guarantee that iommu_map() will not be > called while a domain is dying (it looks to be possible from > XEN_DOMCTL_memory_mapping). I'd rather disallow any new allocations for a dying domain, not the least because ... > So it would be possible to concurrently > allocate memory and free the page-tables. > > Therefore, we need to hold the lock when freeing the page tables. ... we should try to avoid holding locks across allocation / freeing functions wherever possible. As to where to place a respective check - I wonder if we wouldn't be better off disallowing a majority of domctl-s (and perhaps other operations) on dying domains. Thoughts? Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 13:48 ` Jan Beulich @ 2020-12-23 14:01 ` Julien Grall 2020-12-23 14:16 ` Jan Beulich 2021-01-14 19:19 ` Julien Grall 1 sibling, 2 replies; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:01 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 13:48, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The pgtables.lock is protecting access to the page list pgtables.list. >> However, iommu_free_pgtables() will not hold it. I guess it was assumed >> that page-tables cannot be allocated while the domain is dying. >> >> Unfortunately, there is no guarantee that iommu_map() will not be >> called while a domain is dying (it looks to be possible from >> XEN_DOMCTL_memory_mapping). > > I'd rather disallow any new allocations for a dying domain, not > the least because ... Patch #4 will disallow such allocation. However... > >> So it would be possible to concurrently >> allocate memory and free the page-tables. >> >> Therefore, we need to hold the lock when freeing the page tables. > > ... we should try to avoid holding locks across allocation / > freeing functions wherever possible. > > As to where to place a respective check - I wonder if we wouldn't > be better off disallowing a majority of domctl-s (and perhaps > other operations) on dying domains. Thoughts? ... this is still pretty racy because you need to guarantee that d->is_dying is seen by the other processors to prevent allocation. As to whether we can forbid most of the domctl-s, I would agree this is a good move. But this doesn't remove the underlying problem here. We are hoping that a top-level function will protect us against the race. Given the IOMMU code is quite deep in the callstack, this is something pretty hard to guarantee with future changes. 
So I still think we need a way to mitigate the issue. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 14:01 ` Julien Grall @ 2020-12-23 14:16 ` Jan Beulich 0 siblings, 0 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:16 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:01, Julien Grall wrote: > Hi Jan, > > On 23/12/2020 13:48, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The pgtables.lock is protecting access to the page list pgtables.list. >>> However, iommu_free_pgtables() will not hold it. I guess it was assumed >>> that page-tables cannot be allocated while the domain is dying. >>> >>> Unfortunately, there is no guarantee that iommu_map() will not be >>> called while a domain is dying (it looks to be possible from >>> XEN_DOMCTL_memory_mapping). >> >> I'd rather disallow any new allocations for a dying domain, not >> the least because ... > > Patch #4 will disallow such allocation. However... > >> >>> So it would be possible to concurrently >>> allocate memory and free the page-tables. >>> >>> Therefore, we need to hold the lock when freeing the page tables. >> >> ... we should try to avoid holding locks across allocation / >> freeing functions wherever possible. > >> As to where to place a respective check - I wonder if we wouldn't >> be better off disallowing a majority of domctl-s (and perhaps >> other operations) on dying domains. Thoughts? > > ... this is still pretty racy because you need to guarantee that > d->is_dying is seen by the other processors to prevent allocation. The function freeing the page tables will need a spin_barrier() or the like, similar to evtchn_destroy(). Aiui this will eliminate all potential for races. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2020-12-23 14:01 ` Julien Grall 2020-12-23 14:16 ` Jan Beulich @ 2021-01-14 19:19 ` Julien Grall 2021-01-15 11:06 ` Jan Beulich 1 sibling, 1 reply; 40+ messages in thread From: Julien Grall @ 2021-01-14 19:19 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23/12/2020 14:01, Julien Grall wrote: > Hi Jan, > > On 23/12/2020 13:48, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The pgtables.lock is protecting access to the page list pgtables.list. >>> However, iommu_free_pgtables() will not hold it. I guess it was assumed >>> that page-tables cannot be allocated while the domain is dying. >>> >>> Unfortunately, there is no guarantee that iommu_map() will not be >>> called while a domain is dying (it looks to be possible from >>> XEN_DOMCTL_memory_mapping). >> >> I'd rather disallow any new allocations for a dying domain, not >> the least because ... > > Patch #4 will disallow such allocation. However...
It turns out that I can't disallow it because there is at least one call of iommu_map() while an HVM domain is being destroyed: (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 (XEN) [<ffff82d040302a42>] F arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe (XEN) [<ffff82d0402ba0a2>] F arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d (XEN) [<ffff82d0402ba0f9>] F arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b (XEN) [<ffff82d0402afc15>] F hvm_domain_relinquish_resources+0x3e/0x92 (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 This one resulted in a domain crash rather than a clean destruction. I would still like to disallow page-table allocation, so I am thinking of ignoring any request in iommu_map() if the domain is dying. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2021-01-14 19:19 ` Julien Grall @ 2021-01-15 11:06 ` Jan Beulich 2021-01-15 15:18 ` Paul Durrant 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2021-01-15 11:06 UTC (permalink / raw) To: Julien Grall, Paul Durrant Cc: hongyxia, Julien Grall, xen-devel, Andrew Cooper On 14.01.2021 20:19, Julien Grall wrote: > > > On 23/12/2020 14:01, Julien Grall wrote: >> Hi Jan, >> >> On 23/12/2020 13:48, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> The pgtables.lock is protecting access to the page list pgtables.list. >>>> However, iommu_free_pgtables() will not held it. I guess it was assumed >>>> that page-tables cannot be allocated while the domain is dying. >>>> >>>> Unfortunately, there is no guarantee that iommu_map() will not be >>>> called while a domain is dying (it looks like to be possible from >>>> XEN_DOMCTL_memory_mapping). >>> >>> I'd rather disallow any new allocations for a dying domain, not >>> the least because ... >> >> Patch #4 will disallow such allocation. However... 
> > It turns out that I can't disallow it because there is at least one call > of iommu_map() while an HVM domain is destroyed: > > (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 > (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 > (XEN) [<ffff82d040302a42>] F > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 > (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 > (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe > (XEN) [<ffff82d0402ba0a2>] F > arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d > (XEN) [<ffff82d0402ba0f9>] F > arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f > (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b > (XEN) [<ffff82d0402afc15>] F > hvm_domain_relinquish_resources+0x3e/0x92 > (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 > (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 > (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 > (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f > (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 > > This one resulted to a domain crash rather than a clean destruction. A domain crash despite the domain already getting cleaned up is something we may at least want to consider doing better: There already is a !d->is_shutting_down conditional printk() there. What would people think about avoiding the domain_crash() call in similar ways? (It could even be considered to make domain_crash() itself behave like this, but such a step may make it necessary to first audit all use sites.) > I would still like to disallow page-table allocation, so I am think to > ignore any request in iommu_map() if the domain is dying. Ignoring requests there seems fragile to me. Paul - what are your thoughts about bailing early from hvm_add_ioreq_gfn() when the domain is dying? Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held 2021-01-15 11:06 ` Jan Beulich @ 2021-01-15 15:18 ` Paul Durrant 0 siblings, 0 replies; 40+ messages in thread From: Paul Durrant @ 2021-01-15 15:18 UTC (permalink / raw) To: 'Jan Beulich', 'Julien Grall' Cc: hongyxia, 'Julien Grall', xen-devel, 'Andrew Cooper' > -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 15 January 2021 11:07 > To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org> > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org; Andrew > Cooper <andrew.cooper3@citrix.com> > Subject: Re: [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock > held > > On 14.01.2021 20:19, Julien Grall wrote: > > > > > > On 23/12/2020 14:01, Julien Grall wrote: > >> Hi Jan, > >> > >> On 23/12/2020 13:48, Jan Beulich wrote: > >>> On 22.12.2020 16:43, Julien Grall wrote: > >>>> From: Julien Grall <jgrall@amazon.com> > >>>> > >>>> The pgtables.lock is protecting access to the page list pgtables.list. > >>>> However, iommu_free_pgtables() will not held it. I guess it was assumed > >>>> that page-tables cannot be allocated while the domain is dying. > >>>> > >>>> Unfortunately, there is no guarantee that iommu_map() will not be > >>>> called while a domain is dying (it looks like to be possible from > >>>> XEN_DOMCTL_memory_mapping). > >>> > >>> I'd rather disallow any new allocations for a dying domain, not > >>> the least because ... > >> > >> Patch #4 will disallow such allocation. However... 
> > > > It turns out that I can't disallow it because there is at least one call > > of iommu_map() while an HVM domain is destroyed: > > > > (XEN) [<ffff82d04026e399>] R iommu_map+0xf2/0x171 > > (XEN) [<ffff82d04026e43e>] F iommu_legacy_map+0x26/0x63 > > (XEN) [<ffff82d040302a42>] F > > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > > (XEN) [<ffff82d0402f761d>] F p2m_set_entry+0x91/0x128 > > (XEN) [<ffff82d0402f8643>] F guest_physmap_add_entry+0x3d7/0x4e0 > > (XEN) [<ffff82d0402f87cb>] F guest_physmap_add_page+0x7f/0xfe > > (XEN) [<ffff82d0402ba0a2>] F > > arch/x86/hvm/ioreq.c#hvm_add_ioreq_gfn+0x6f/0x8d > > (XEN) [<ffff82d0402ba0f9>] F > > arch/x86/hvm/ioreq.c#hvm_ioreq_server_disable+0x39/0x4f > > (XEN) [<ffff82d0402bb50e>] F hvm_destroy_all_ioreq_servers+0x6f/0x9b > > (XEN) [<ffff82d0402afc15>] F > > hvm_domain_relinquish_resources+0x3e/0x92 > > (XEN) [<ffff82d04031f278>] F domain_relinquish_resources+0x355/0x372 > > (XEN) [<ffff82d040205dd4>] F domain_kill+0xc7/0x150 > > (XEN) [<ffff82d04023baf0>] F do_domctl+0xba7/0x1a09 > > (XEN) [<ffff82d040312c8f>] F pv_hypercall+0x2f0/0x55f > > (XEN) [<ffff82d040391432>] F lstar_enter+0x112/0x120 > > > > This one resulted to a domain crash rather than a clean destruction. > > A domain crash despite the domain already getting cleaned up is > something we may at least want to consider doing better: There > already is a !d->is_shutting_down conditional printk() there. > What would people think about avoiding the domain_crash() call > in similar ways? (It could even be considered to make > domain_crash() itself behave like this, but such a step may make > it necessary to first audit all use sites.) > > > I would still like to disallow page-table allocation, so I am think to > > ignore any request in iommu_map() if the domain is dying. > > Ignoring requests there seems fragile to me. Paul - what are your > thoughts about bailing early from hvm_add_ioreq_gfn() when the > domain is dying? > I think that's ok. 
Really, the only reason for putting the pages back in the P2M is to allow migration to work, so if the domain is dying then we don't really care, do we? Paul > Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teadorwn Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 1/4] xen/iommu: Check if the IOMMU was initialized before tearing down Julien Grall 2020-12-22 15:43 ` [PATCH for-4.15 2/4] xen/iommu: x86: Free the IOMMU page-tables with the pgtables.lock held Julien Grall @ 2020-12-22 15:43 ` Julien Grall 2020-12-23 14:12 ` Jan Beulich 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 3 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw) To: xen-devel; +Cc: hongyxia, Julien Grall, Jan Beulich, Paul Durrant From: Julien Grall <jgrall@amazon.com> The new per-domain IOMMU page-table allocator will now free the page-tables when domain's resources are relinquished. However, the root page-table (i.e. hd->arch.pg_maddr) will not be cleared. 
Xen may access the IOMMU page-tables afterwards at least in the case of a PV domain: (XEN) Xen call trace: (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 (XEN) [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 This will result in a use-after-free and possibly a host crash or memory corruption. 
Freeing the page-tables further down in domain_relinquish_resources() would not work because pages may not be released until later if another domain holds a reference on them. Once all the PCI devices have been de-assigned, it is actually pointless to access or modify the IOMMU page-tables. So we can simply clear the root page-table address. Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") Signed-off-by: Julien Grall <jgrall@amazon.com> --- This is an RFC because it would break the AMD IOMMU driver. One option would be to move the call to the teardown callback earlier on. Any opinions? --- xen/drivers/passthrough/x86/iommu.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index 779dbb5b98ba..99a23177b3d2 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -267,6 +267,16 @@ int iommu_free_pgtables(struct domain *d) struct page_info *pg; unsigned int done = 0; + /* + * Pages will be moved to the free list in a bit. So we want to + * clear the root page-table to avoid any potential use after-free. + * + * XXX: This code only works for Intel VT-d. + */ + spin_lock(&hd->arch.mapping_lock); + hd->arch.vtd.pgd_maddr = 0; + spin_unlock(&hd->arch.mapping_lock); + spin_lock(&hd->arch.pgtables.lock); while ( (pg = page_list_remove_head(&hd->arch.pgtables.list)) ) { -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall @ 2020-12-23 14:12 ` Jan Beulich 2020-12-23 14:56 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:12 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The new per-domain IOMMU page-table allocator will now free the > page-tables when domain's resources are relinquished. However, the root > page-table (i.e. hd->arch.pg_maddr) will not be cleared. > > Xen may access the IOMMU page-tables afterwards at least in the case of > PV domain: > > (XEN) Xen call trace: > (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 > (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 > (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 > (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 > (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 > (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 > (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b > (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf > (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf > (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 > (XEN) 
[<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d > (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc > (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e > (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 > (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 > (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a > (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 > (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 > (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f > (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 > > This will result to a use after-free and possibly an host crash or > memory corruption. > > Freeing the page-tables further down in domain_relinquish_resources() > would not work because pages may not be released until later if another > domain hold a reference on them. > > Once all the PCI devices have been de-assigned, it is actually pointless > to access modify the IOMMU page-tables. So we can simply clear the root > page-table address. > > Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") > Signed-off-by: Julien Grall <jgrall@amazon.com> > > --- > > This is an RFC because it would break AMD IOMMU driver. One option would > be to move the call to the teardown callback earlier on. Any opinions? We already have static void amd_iommu_domain_destroy(struct domain *d) { dom_iommu(d)->arch.amd.root_table = NULL; } and this function is AMD's teardown handler. Hence I suppose doing the same for VT-d would be quite reasonable. And really VT-d's iommu_domain_teardown() also already has hd->arch.vtd.pgd_maddr = 0; I guess what's missing is prevention of the root table getting re-setup. This, I guess, would be helped by the previously suggested preventing of any further modifications to the p2m and alike for dying domains. Note how e.g. the handling of XEN_DOMCTL_assign_device already includes a "dying" check. 
Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 14:12 ` Jan Beulich @ 2020-12-23 14:56 ` Julien Grall 2020-12-23 15:00 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 14:56 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 14:12, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The new per-domain IOMMU page-table allocator will now free the >> page-tables when domain's resources are relinquished. However, the root >> page-table (i.e. hd->arch.pg_maddr) will not be cleared. >> >> Xen may access the IOMMU page-tables afterwards at least in the case of >> PV domain: >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8 >> (XEN) [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8 >> (XEN) [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129 >> (XEN) [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63 >> (XEN) [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144 >> (XEN) [<ffff82d04033c61d>] F put_page+0x4b/0xb3 >> (XEN) [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b >> (XEN) [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf >> (XEN) [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf >> (XEN) [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80 >> (XEN) [<ffff82d04033d6c7>] F 
mm.c#put_page_from_l4e+0x69/0x6d >> (XEN) [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc >> (XEN) [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e >> (XEN) [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15 >> (XEN) [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9 >> (XEN) [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a >> (XEN) [<ffff82d040205cdf>] F domain_kill+0xb8/0x141 >> (XEN) [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5 >> (XEN) [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f >> (XEN) [<ffff82d04039b432>] F lstar_enter+0x112/0x120 >> >> This will result to a use after-free and possibly an host crash or >> memory corruption. >> >> Freeing the page-tables further down in domain_relinquish_resources() >> would not work because pages may not be released until later if another >> domain hold a reference on them. >> >> Once all the PCI devices have been de-assigned, it is actually pointless >> to access modify the IOMMU page-tables. So we can simply clear the root >> page-table address. >> >> Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator") >> Signed-off-by: Julien Grall <jgrall@amazon.com> >> >> --- >> >> This is an RFC because it would break AMD IOMMU driver. One option would >> be to move the call to the teardown callback earlier on. Any opinions? > > We already have > > static void amd_iommu_domain_destroy(struct domain *d) > { > dom_iommu(d)->arch.amd.root_table = NULL; > } > > and this function is AMD's teardown handler. Hence I suppose > doing the same for VT-d would be quite reasonable. And really > VT-d's iommu_domain_teardown() also already has > > hd->arch.vtd.pgd_maddr = 0; Let me have a look if that works. > > I guess what's missing is prevention of the root table > getting re-setup. This is taken care in the follow-up patch by forbidding page-table allocation. I can mention it in the commit message. 
> This, I guess, would be helped by the > previously suggested preventing of any further modifications > to the p2m and alike for dying domains. Note how e.g. the > handling of XEN_DOMCTL_assign_device already includes a > "dying" check. > > Jan > -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 14:56 ` Julien Grall @ 2020-12-23 15:00 ` Jan Beulich 2020-12-23 15:16 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 15:00 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 15:56, Julien Grall wrote: > On 23/12/2020 14:12, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> This is an RFC because it would break AMD IOMMU driver. One option would >>> be to move the call to the teardown callback earlier on. Any opinions? >> >> We already have >> >> static void amd_iommu_domain_destroy(struct domain *d) >> { >> dom_iommu(d)->arch.amd.root_table = NULL; >> } >> >> and this function is AMD's teardown handler. Hence I suppose >> doing the same for VT-d would be quite reasonable. And really >> VT-d's iommu_domain_teardown() also already has >> >> hd->arch.vtd.pgd_maddr = 0; > > Let me have a look if that works. > >> >> I guess what's missing is prevention of the root table >> getting re-setup. > > This is taken care in the follow-up patch by forbidding page-table > allocation. I can mention it in the commit message. My expectation is that with that subsequent change the change here (or any variant of it) would become unnecessary. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 15:00 ` Jan Beulich @ 2020-12-23 15:16 ` Julien Grall 2020-12-23 16:11 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 15:16 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 15:00, Jan Beulich wrote: > On 23.12.2020 15:56, Julien Grall wrote: >> On 23/12/2020 14:12, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> We already have >>> >>> static void amd_iommu_domain_destroy(struct domain *d) >>> { >>> dom_iommu(d)->arch.amd.root_table = NULL; >>> } >>> >>> and this function is AMD's teardown handler. Hence I suppose >>> doing the same for VT-d would be quite reasonable. And really >>> VT-d's iommu_domain_teardown() also already has >>> >>> hd->arch.vtd.pgd_maddr = 0; >> >> Let me have a look if that works. >> >>> >>> I guess what's missing is prevention of the root table >>> getting re-setup. >> >> This is taken care in the follow-up patch by forbidding page-table >> allocation. I can mention it in the commit message. > > My expectation is that with that subsequent change the change here > (or any variant of it) would become unnecessary. I am not so sure. iommu_unmap() would still get called from put_page(). Are you suggesting to gate the code if d->is_dying as well? Even if this patch is deemed unnecessary to fix the issue, the issue was quite hard to chase/reproduce. I think it would still be good to harden the code by zeroing hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the pointer was still "valid". Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 15:16 ` Julien Grall @ 2020-12-23 16:11 ` Jan Beulich 2020-12-23 16:16 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:11 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 16:16, Julien Grall wrote: > On 23/12/2020 15:00, Jan Beulich wrote: >> On 23.12.2020 15:56, Julien Grall wrote: >>> On 23/12/2020 14:12, Jan Beulich wrote: >>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> We already have >>>> >>>> static void amd_iommu_domain_destroy(struct domain *d) >>>> { >>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>> } >>>> >>>> and this function is AMD's teardown handler. Hence I suppose >>>> doing the same for VT-d would be quite reasonable. And really >>>> VT-d's iommu_domain_teardown() also already has >>>> >>>> hd->arch.vtd.pgd_maddr = 0; >>> >>> Let me have a look if that works. >>> >>>> >>>> I guess what's missing is prevention of the root table >>>> getting re-setup. >>> >>> This is taken care in the follow-up patch by forbidding page-table >>> allocation. I can mention it in the commit message. >> >> My expectation is that with that subsequent change the change here >> (or any variant of it) would become unnecessary. > > I am not be sure. iommu_unmap() would still get called from put_page(). > Are you suggesting to gate the code if d->is_dying as well? Unmap shouldn't be allocating any memory right now, as in non-shared-page-table mode we don't install any superpages (unless I misremember). > Even if this patch is deemed to be unecessary to fix the issue. > This issue was quite hard to chase/reproduce. 
> > I think it would still be good to harden the code by zeroing > hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > pointer was still "valid". But my point was that this zeroing already happens. What I suspect is that it gets re-populated after it was zeroed, because of page table manipulation that shouldn't be occurring anymore for a dying domain. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:11 ` Jan Beulich @ 2020-12-23 16:16 ` Julien Grall 2020-12-23 16:24 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:16 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel Hi, On 23/12/2020 16:11, Jan Beulich wrote: > On 23.12.2020 16:16, Julien Grall wrote: >> On 23/12/2020 15:00, Jan Beulich wrote: >>> On 23.12.2020 15:56, Julien Grall wrote: >>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> We already have >>>>> >>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>> { >>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>> } >>>>> >>>>> and this function is AMD's teardown handler. Hence I suppose >>>>> doing the same for VT-d would be quite reasonable. And really >>>>> VT-d's iommu_domain_teardown() also already has >>>>> >>>>> hd->arch.vtd.pgd_maddr = 0; >>>> >>>> Let me have a look if that works. >>>> >>>>> >>>>> I guess what's missing is prevention of the root table >>>>> getting re-setup. >>>> >>>> This is taken care in the follow-up patch by forbidding page-table >>>> allocation. I can mention it in the commit message. >>> >>> My expectation is that with that subsequent change the change here >>> (or any variant of it) would become unnecessary. >> >> I am not be sure. iommu_unmap() would still get called from put_page(). >> Are you suggesting to gate the code if d->is_dying as well? > > Unmap shouldn't be allocating any memory right now, as in > non-shared-page-table mode we don't install any superpages > (unless I misremember). It doesn't allocate memory, but it will try to access the IOMMU page-tables (see more below). 
> >> Even if this patch is deemed to be unecessary to fix the issue. >> This issue was quite hard to chase/reproduce. >> >> I think it would still be good to harden the code by zeroing >> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >> pointer was still "valid". > > But my point was that this zeroing already happens. > What I > suspect is that it gets re-populated after it was zeroed, > because of page table manipulation that shouldn't be > occurring anymore for a dying domain. AFAICT, the zeroing is happening in the ->teardown() helper. It is only called when the domain is fully destroyed (see the call in arch_domain_destroy()). This will happen much later than the resource-relinquish code runs. Could you clarify why you think it is already zeroed and by whom? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:16 ` Julien Grall @ 2020-12-23 16:24 ` Jan Beulich 2020-12-23 16:29 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:24 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23.12.2020 17:16, Julien Grall wrote: > On 23/12/2020 16:11, Jan Beulich wrote: >> On 23.12.2020 16:16, Julien Grall wrote: >>> On 23/12/2020 15:00, Jan Beulich wrote: >>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>> be to move the call to the teardown callback earlier on. Any opinions? Please note this (in your original submission). I simply ... >>>>>> We already have >>>>>> >>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>> { >>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>> } >>>>>> >>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>> VT-d's iommu_domain_teardown() also already has >>>>>> >>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>> >>>>> Let me have a look if that works. >>>>> >>>>>> >>>>>> I guess what's missing is prevention of the root table >>>>>> getting re-setup. >>>>> >>>>> This is taken care in the follow-up patch by forbidding page-table >>>>> allocation. I can mention it in the commit message. >>>> >>>> My expectation is that with that subsequent change the change here >>>> (or any variant of it) would become unnecessary. >>> >>> I am not be sure. iommu_unmap() would still get called from put_page(). >>> Are you suggesting to gate the code if d->is_dying as well? 
>> >> Unmap shouldn't be allocating any memory right now, as in >> non-shared-page-table mode we don't install any superpages >> (unless I misremember). > > It doesn't allocate memory, but it will try to access the IOMMU > page-tables (see more below). > >> >>> Even if this patch is deemed to be unecessary to fix the issue. >>> This issue was quite hard to chase/reproduce. >>> >>> I think it would still be good to harden the code by zeroing >>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>> pointer was still "valid". >> >> But my point was that this zeroing already happens. >> What I >> suspect is that it gets re-populated after it was zeroed, >> because of page table manipulation that shouldn't be >> occurring anymore for a dying domain. > > AFAICT, the zeroing is happening in ->teardown() helper. > > It is only called when the domain is fully destroyed (see call in > arch_domain_destroy()). This will happen much after relinquishing the code. > > Could you clarify why you think it is already zeroed and by who? ... trusted you on what you stated there. But perhaps I somehow misunderstood that sentence to mean you want to put your addition into the teardown functions, when apparently you meant to invoke them earlier in the process. Without clearly identifying why this would be a safe thing to do, I couldn't imagine that's what you suggest as alternative. This is because the interdependencies of the IOMMU code are pretty hard to follow at times, and hence any such re-ordering has a fair risk of breaking something elsewhere. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:24 ` Jan Beulich @ 2020-12-23 16:29 ` Julien Grall 2020-12-23 16:46 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:29 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, Paul Durrant, xen-devel On 23/12/2020 16:24, Jan Beulich wrote: > On 23.12.2020 17:16, Julien Grall wrote: >> On 23/12/2020 16:11, Jan Beulich wrote: >>> On 23.12.2020 16:16, Julien Grall wrote: >>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > > Please note this (in your original submission). I simply ... > >>>>>>> We already have >>>>>>> >>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>> { >>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>> } >>>>>>> >>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>> >>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>> >>>>>> Let me have a look if that works. >>>>>> >>>>>>> >>>>>>> I guess what's missing is prevention of the root table >>>>>>> getting re-setup. >>>>>> >>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>> allocation. I can mention it in the commit message. >>>>> >>>>> My expectation is that with that subsequent change the change here >>>>> (or any variant of it) would become unnecessary. >>>> >>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>> Are you suggesting to gate the code if d->is_dying as well? 
>>> >>> Unmap shouldn't be allocating any memory right now, as in >>> non-shared-page-table mode we don't install any superpages >>> (unless I misremember). >> >> It doesn't allocate memory, but it will try to access the IOMMU >> page-tables (see more below). >> >>> >>>> Even if this patch is deemed to be unecessary to fix the issue. >>>> This issue was quite hard to chase/reproduce. >>>> >>>> I think it would still be good to harden the code by zeroing >>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>> pointer was still "valid". >>> >>> But my point was that this zeroing already happens. >>> What I >>> suspect is that it gets re-populated after it was zeroed, >>> because of page table manipulation that shouldn't be >>> occurring anymore for a dying domain. >> >> AFAICT, the zeroing is happening in ->teardown() helper. >> >> It is only called when the domain is fully destroyed (see call in >> arch_domain_destroy()). This will happen much after relinquishing the code. >> >> Could you clarify why you think it is already zeroed and by who? > > ... trusted you on what you stated there. But perhaps I somehow > misunderstood that sentence to mean you want to put your addition > into the teardown functions, when apparently you meant to invoke > them earlier in the process. Without clearly identifying why this > would be a safe thing to do, I couldn't imagine that's what you > suggest as alternative. This was a wording issue. I meant moving ->teardown() before (or calling from) iommu_free_pgtables(). Shall I introduce a new callback then? > This is because the interdependencies of > the IOMMU code are pretty hard to follow at times, and hence any > such re-ordering has a fair risk of breaking something elsewhere. Right, this is another reason to try to get most of my fix self-contained rather relying on top-layer call to protect against a domain dying. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:29 ` Julien Grall @ 2020-12-23 16:46 ` Jan Beulich 2020-12-23 16:54 ` Julien Grall 2021-01-04 9:53 ` Paul Durrant 0 siblings, 2 replies; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:46 UTC (permalink / raw) To: Julien Grall, Paul Durrant; +Cc: hongyxia, Julien Grall, xen-devel On 23.12.2020 17:29, Julien Grall wrote: > On 23/12/2020 16:24, Jan Beulich wrote: >> On 23.12.2020 17:16, Julien Grall wrote: >>> On 23/12/2020 16:11, Jan Beulich wrote: >>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >> >> Please note this (in your original submission). I simply ... >> >>>>>>>> We already have >>>>>>>> >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>> { >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>> } >>>>>>>> >>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>> >>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>> >>>>>>> Let me have a look if that works. >>>>>>> >>>>>>>> >>>>>>>> I guess what's missing is prevention of the root table >>>>>>>> getting re-setup. >>>>>>> >>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>> allocation. I can mention it in the commit message. >>>>>> >>>>>> My expectation is that with that subsequent change the change here >>>>>> (or any variant of it) would become unnecessary. >>>>> >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). 
>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>> >>>> Unmap shouldn't be allocating any memory right now, as in >>>> non-shared-page-table mode we don't install any superpages >>>> (unless I misremember). >>> >>> It doesn't allocate memory, but it will try to access the IOMMU >>> page-tables (see more below). >>> >>>> >>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>> This issue was quite hard to chase/reproduce. >>>>> >>>>> I think it would still be good to harden the code by zeroing >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>> pointer was still "valid". >>>> >>>> But my point was that this zeroing already happens. >>>> What I >>>> suspect is that it gets re-populated after it was zeroed, >>>> because of page table manipulation that shouldn't be >>>> occurring anymore for a dying domain. >>> >>> AFAICT, the zeroing is happening in ->teardown() helper. >>> >>> It is only called when the domain is fully destroyed (see call in >>> arch_domain_destroy()). This will happen much after relinquishing the code. >>> >>> Could you clarify why you think it is already zeroed and by who? >> >> ... trusted you on what you stated there. But perhaps I somehow >> misunderstood that sentence to mean you want to put your addition >> into the teardown functions, when apparently you meant to invoke >> them earlier in the process. Without clearly identifying why this >> would be a safe thing to do, I couldn't imagine that's what you >> suggest as alternative. > > This was a wording issue. I meant moving ->teardown() before (or calling > from) iommu_free_pgtables(). > > Shall I introduce a new callback then? Earlier zeroing won't help unless you prevent re-population, or unless you make the code capable of telling "still zero" from "already zero". But I have to admit I'd like to also have Paul's opinion on the matter. 
Jan >> This is because the interdependencies of >> the IOMMU code are pretty hard to follow at times, and hence any >> such re-ordering has a fair risk of breaking something elsewhere. > > Right, this is another reason to try to get most of my fix > self-contained rather relying on top-layer call to protect against a > domain dying. > > Cheers, > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:46 ` Jan Beulich @ 2020-12-23 16:54 ` Julien Grall 2020-12-23 17:02 ` Jan Beulich 2021-01-04 9:53 ` Paul Durrant 1 sibling, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:54 UTC (permalink / raw) To: Jan Beulich, Paul Durrant; +Cc: hongyxia, Julien Grall, xen-devel On 23/12/2020 16:46, Jan Beulich wrote: > On 23.12.2020 17:29, Julien Grall wrote: >> On 23/12/2020 16:24, Jan Beulich wrote: >>> On 23.12.2020 17:16, Julien Grall wrote: >>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>> >>> Please note this (in your original submission). I simply ... >>> >>>>>>>>> We already have >>>>>>>>> >>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>> { >>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>> } >>>>>>>>> >>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>> >>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>> >>>>>>>> Let me have a look if that works. >>>>>>>> >>>>>>>>> >>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>> getting re-setup. >>>>>>>> >>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>> allocation. I can mention it in the commit message. >>>>>>> >>>>>>> My expectation is that with that subsequent change the change here >>>>>>> (or any variant of it) would become unnecessary. 
>>>>>> >>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>> >>>>> Unmap shouldn't be allocating any memory right now, as in >>>>> non-shared-page-table mode we don't install any superpages >>>>> (unless I misremember). >>>> >>>> It doesn't allocate memory, but it will try to access the IOMMU >>>> page-tables (see more below). >>>> >>>>> >>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>> This issue was quite hard to chase/reproduce. >>>>>> >>>>>> I think it would still be good to harden the code by zeroing >>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>> pointer was still "valid". >>>>> >>>>> But my point was that this zeroing already happens. >>>>> What I >>>>> suspect is that it gets re-populated after it was zeroed, >>>>> because of page table manipulation that shouldn't be >>>>> occurring anymore for a dying domain. >>>> >>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>> >>>> It is only called when the domain is fully destroyed (see call in >>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>> >>>> Could you clarify why you think it is already zeroed and by who? >>> >>> ... trusted you on what you stated there. But perhaps I somehow >>> misunderstood that sentence to mean you want to put your addition >>> into the teardown functions, when apparently you meant to invoke >>> them earlier in the process. Without clearly identifying why this >>> would be a safe thing to do, I couldn't imagine that's what you >>> suggest as alternative. >> >> This was a wording issue. I meant moving ->teardown() before (or calling >> from) iommu_free_pgtables(). >> >> Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". 
> But I have to admit I'd like to also have Paul's > opinion on the matter. Patch #4 is meant to prevent that with the d->is_dying check in the IOMMU page-table allocation. Do you think this is not enough? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:54 ` Julien Grall @ 2020-12-23 17:02 ` Jan Beulich 2020-12-23 17:26 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 17:02 UTC (permalink / raw) To: Julien Grall; +Cc: hongyxia, Julien Grall, xen-devel, Paul Durrant On 23.12.2020 17:54, Julien Grall wrote: > > > On 23/12/2020 16:46, Jan Beulich wrote: >> On 23.12.2020 17:29, Julien Grall wrote: >>> On 23/12/2020 16:24, Jan Beulich wrote: >>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>> >>>> Please note this (in your original submission). I simply ... >>>> >>>>>>>>>> We already have >>>>>>>>>> >>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>> { >>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>> >>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>> >>>>>>>>> Let me have a look if that works. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>> getting re-setup. >>>>>>>>> >>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>> allocation. I can mention it in the commit message. 
>>>>>>>> >>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>> (or any variant of it) would become unnecessary. >>>>>>> >>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>> >>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>> non-shared-page-table mode we don't install any superpages >>>>>> (unless I misremember). >>>>> >>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>> page-tables (see more below). >>>>> >>>>>> >>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>> This issue was quite hard to chase/reproduce. >>>>>>> >>>>>>> I think it would still be good to harden the code by zeroing >>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>> pointer was still "valid". >>>>>> >>>>>> But my point was that this zeroing already happens. >>>>>> What I >>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>> because of page table manipulation that shouldn't be >>>>>> occurring anymore for a dying domain. >>>>> >>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>> >>>>> It is only called when the domain is fully destroyed (see call in >>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>> >>>>> Could you clarify why you think it is already zeroed and by who? >>>> >>>> ... trusted you on what you stated there. But perhaps I somehow >>>> misunderstood that sentence to mean you want to put your addition >>>> into the teardown functions, when apparently you meant to invoke >>>> them earlier in the process. Without clearly identifying why this >>>> would be a safe thing to do, I couldn't imagine that's what you >>>> suggest as alternative. >>> >>> This was a wording issue. I meant moving ->teardown() before (or calling >>> from) iommu_free_pgtables(). 
>>> >>> Shall I introduce a new callback then? >> >> Earlier zeroing won't help unless you prevent re-population, or >> unless you make the code capable of telling "still zero" from >> "already zero". But I have to admit I'd like to also have Paul's >> opinion on the matter. > > Patch #4 is meant to prevent that with the d->is_dying check in the > IOMMU page-table allocation. > > Do you think this is not enough? It probably is; I think that other patch would want to come first then, or both be folded. Nevertheless I'm not fully convinced putting the check there is the best course of action. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 17:02 ` Jan Beulich @ 2020-12-23 17:26 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2020-12-23 17:26 UTC (permalink / raw) To: Jan Beulich; +Cc: hongyxia, Julien Grall, xen-devel, Paul Durrant Hi Jan, On 23/12/2020 17:02, Jan Beulich wrote: > On 23.12.2020 17:54, Julien Grall wrote: >> >> >> On 23/12/2020 16:46, Jan Beulich wrote: >>> On 23.12.2020 17:29, Julien Grall wrote: >>>> On 23/12/2020 16:24, Jan Beulich wrote: >>>>> On 23.12.2020 17:16, Julien Grall wrote: >>>>>> On 23/12/2020 16:11, Jan Beulich wrote: >>>>>>> On 23.12.2020 16:16, Julien Grall wrote: >>>>>>>> On 23/12/2020 15:00, Jan Beulich wrote: >>>>>>>>> On 23.12.2020 15:56, Julien Grall wrote: >>>>>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: >>>>>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: >>>>>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would >>>>>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? >>>>> >>>>> Please note this (in your original submission). I simply ... >>>>> >>>>>>>>>>> We already have >>>>>>>>>>> >>>>>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) >>>>>>>>>>> { >>>>>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> and this function is AMD's teardown handler. Hence I suppose >>>>>>>>>>> doing the same for VT-d would be quite reasonable. And really >>>>>>>>>>> VT-d's iommu_domain_teardown() also already has >>>>>>>>>>> >>>>>>>>>>> hd->arch.vtd.pgd_maddr = 0; >>>>>>>>>> >>>>>>>>>> Let me have a look if that works. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I guess what's missing is prevention of the root table >>>>>>>>>>> getting re-setup. >>>>>>>>>> >>>>>>>>>> This is taken care in the follow-up patch by forbidding page-table >>>>>>>>>> allocation. I can mention it in the commit message. 
>>>>>>>>> >>>>>>>>> My expectation is that with that subsequent change the change here >>>>>>>>> (or any variant of it) would become unnecessary. >>>>>>>> >>>>>>>> I am not be sure. iommu_unmap() would still get called from put_page(). >>>>>>>> Are you suggesting to gate the code if d->is_dying as well? >>>>>>> >>>>>>> Unmap shouldn't be allocating any memory right now, as in >>>>>>> non-shared-page-table mode we don't install any superpages >>>>>>> (unless I misremember). >>>>>> >>>>>> It doesn't allocate memory, but it will try to access the IOMMU >>>>>> page-tables (see more below). >>>>>> >>>>>>> >>>>>>>> Even if this patch is deemed to be unecessary to fix the issue. >>>>>>>> This issue was quite hard to chase/reproduce. >>>>>>>> >>>>>>>> I think it would still be good to harden the code by zeroing >>>>>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the >>>>>>>> pointer was still "valid". >>>>>>> >>>>>>> But my point was that this zeroing already happens. >>>>>>> What I >>>>>>> suspect is that it gets re-populated after it was zeroed, >>>>>>> because of page table manipulation that shouldn't be >>>>>>> occurring anymore for a dying domain. >>>>>> >>>>>> AFAICT, the zeroing is happening in ->teardown() helper. >>>>>> >>>>>> It is only called when the domain is fully destroyed (see call in >>>>>> arch_domain_destroy()). This will happen much after relinquishing the code. >>>>>> >>>>>> Could you clarify why you think it is already zeroed and by who? >>>>> >>>>> ... trusted you on what you stated there. But perhaps I somehow >>>>> misunderstood that sentence to mean you want to put your addition >>>>> into the teardown functions, when apparently you meant to invoke >>>>> them earlier in the process. Without clearly identifying why this >>>>> would be a safe thing to do, I couldn't imagine that's what you >>>>> suggest as alternative. >>>> >>>> This was a wording issue. 
>>>> I meant moving ->teardown() before (or calling >>>> from) iommu_free_pgtables(). >>>> >>>> Shall I introduce a new callback then? >>> >>> Earlier zeroing won't help unless you prevent re-population, or >>> unless you make the code capable of telling "still zero" from >>> "already zero". But I have to admit I'd like to also have Paul's >>> opinion on the matter. >> >> Patch #4 is meant to prevent that with the d->is_dying check in the >> IOMMU page-table allocation. >> >> Do you think this is not enough? > > It probably is; I think that other patch would want to come first > then, or both be folded. I would like to keep them separated. But I am happy to re-order them. > Nevertheless I'm not fully convinced > putting the check there is the best course of action. As you pointed out in a previous e-mail, the IOMMU code is pretty hard to follow at times. The check in the allocator is quite simple, so I think it would be best to keep it. It doesn't mean this should be the only change, but it will avoid a whole lot of potential issues if we missed any path that may touch the IOMMU page-tables while the domain is dying. The other checks can then just short-circuit to prevent extra work that will result in a failure. I will wait for Paul's input before reworking the series. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables 2020-12-23 16:46 ` Jan Beulich 2020-12-23 16:54 ` Julien Grall @ 2021-01-04 9:53 ` Paul Durrant 1 sibling, 0 replies; 40+ messages in thread From: Paul Durrant @ 2021-01-04 9:53 UTC (permalink / raw) To: 'Jan Beulich', 'Julien Grall' Cc: hongyxia, 'Julien Grall', xen-devel > -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 23 December 2020 16:46 > To: Julien Grall <julien@xen.org>; Paul Durrant <paul@xen.org> > Cc: hongyxia@amazon.co.uk; Julien Grall <jgrall@amazon.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the > page-tables > > On 23.12.2020 17:29, Julien Grall wrote: > > On 23/12/2020 16:24, Jan Beulich wrote: > >> On 23.12.2020 17:16, Julien Grall wrote: > >>> On 23/12/2020 16:11, Jan Beulich wrote: > >>>> On 23.12.2020 16:16, Julien Grall wrote: > >>>>> On 23/12/2020 15:00, Jan Beulich wrote: > >>>>>> On 23.12.2020 15:56, Julien Grall wrote: > >>>>>>> On 23/12/2020 14:12, Jan Beulich wrote: > >>>>>>>> On 22.12.2020 16:43, Julien Grall wrote: > >>>>>>>>> This is an RFC because it would break AMD IOMMU driver. One option would > >>>>>>>>> be to move the call to the teardown callback earlier on. Any opinions? > >> > >> Please note this (in your original submission). I simply ... > >> > >>>>>>>> We already have > >>>>>>>> > >>>>>>>> static void amd_iommu_domain_destroy(struct domain *d) > >>>>>>>> { > >>>>>>>> dom_iommu(d)->arch.amd.root_table = NULL; > >>>>>>>> } > >>>>>>>> > >>>>>>>> and this function is AMD's teardown handler. Hence I suppose > >>>>>>>> doing the same for VT-d would be quite reasonable. And really > >>>>>>>> VT-d's iommu_domain_teardown() also already has > >>>>>>>> > >>>>>>>> hd->arch.vtd.pgd_maddr = 0; > >>>>>>> > >>>>>>> Let me have a look if that works. 
> >>>>>>> > >>>>>>>> > >>>>>>>> I guess what's missing is prevention of the root table > >>>>>>>> getting re-setup. > >>>>>>> > >>>>>>> This is taken care in the follow-up patch by forbidding page-table > >>>>>>> allocation. I can mention it in the commit message. > >>>>>> > >>>>>> My expectation is that with that subsequent change the change here > >>>>>> (or any variant of it) would become unnecessary. > >>>>> > >>>>> I am not be sure. iommu_unmap() would still get called from put_page(). > >>>>> Are you suggesting to gate the code if d->is_dying as well? > >>>> > >>>> Unmap shouldn't be allocating any memory right now, as in > >>>> non-shared-page-table mode we don't install any superpages > >>>> (unless I misremember). > >>> > >>> It doesn't allocate memory, but it will try to access the IOMMU > >>> page-tables (see more below). > >>> > >>>> > >>>>> Even if this patch is deemed to be unecessary to fix the issue. > >>>>> This issue was quite hard to chase/reproduce. > >>>>> > >>>>> I think it would still be good to harden the code by zeroing > >>>>> hd->arch.vtd.pgd_maddr to avoid anyone else wasting 2 days because the > >>>>> pointer was still "valid". > >>>> > >>>> But my point was that this zeroing already happens. > >>>> What I > >>>> suspect is that it gets re-populated after it was zeroed, > >>>> because of page table manipulation that shouldn't be > >>>> occurring anymore for a dying domain. > >>> > >>> AFAICT, the zeroing is happening in ->teardown() helper. > >>> > >>> It is only called when the domain is fully destroyed (see call in > >>> arch_domain_destroy()). This will happen much after relinquishing the code. > >>> > >>> Could you clarify why you think it is already zeroed and by who? > >> > >> ... trusted you on what you stated there. But perhaps I somehow > >> misunderstood that sentence to mean you want to put your addition > >> into the teardown functions, when apparently you meant to invoke > >> them earlier in the process. 
Without clearly identifying why this > >> would be a safe thing to do, I couldn't imagine that's what you > >> suggest as alternative. > > > > This was a wording issue. I meant moving ->teardown() before (or calling > > from) iommu_free_pgtables(). > > > > Shall I introduce a new callback then? > > Earlier zeroing won't help unless you prevent re-population, or > unless you make the code capable of telling "still zero" from > "already zero". But I have to admit I'd like to also have Paul's > opinion on the matter. > I have a pending series to dynamically unshare IOMMU mappings (to allow logdirty to be enabled on domains with currently-shared EPT) which gets rid of the lazy allocation of the root table and uses pgd_maddr == NULL as a test of whether EPT is shared or not. Therefore it would seem best, to fix this immediate problem, to also stop lazy allocation of pgd_maddr, and re-introduce a per-implementation free_pgtables() callback which zeroes pgd_maddr and then calls the common function. Paul ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables
  2020-12-22 15:43 [PATCH for-4.15 0/4] xen/iommu: Collection of bug fixes for IOMMU teardown Julien Grall
  ` (2 preceding siblings ...)
  2020-12-22 15:43 ` [PATCH for-4.15 3/4] [RFC] xen/iommu: x86: Clear the root page-table before freeing the page-tables Julien Grall
@ 2020-12-22 15:43 ` Julien Grall
  2020-12-22 17:12   ` Julien Grall
  2020-12-23 14:34   ` Jan Beulich
  3 siblings, 2 replies; 40+ messages in thread
From: Julien Grall @ 2020-12-22 15:43 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, Paul Durrant

From: Julien Grall <jgrall@amazon.com>

The new IOMMU page-tables allocator will release the pages when
relinquish the domain resources. However, this is not sufficient in two
cases:
    1) domain_relinquish_resources() is not called when the domain
    creation fails.
    2) There is nothing preventing page-table allocations when the
    domain is dying.

In both cases, this can be solved by freeing the page-tables again
when the domain destruction. Although, this may result to an high
number of page-tables to free.

In the second case, it is pointless to allow page-table allocation when
the domain is going to die. iommu_alloc_pgtable() will now return an
error when it is called while the domain is dying.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/x86/domain.c               |  2 +-
 xen/drivers/passthrough/x86/iommu.c | 32 +++++++++++++++++++++++++++--
 xen/include/asm-x86/iommu.h         |  2 +-
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b9ba04633e18..1b7ee5c1a8cb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d)

     PROGRESS(iommu_pagetables):

-        ret = iommu_free_pgtables(d);
+        ret = iommu_free_pgtables(d, false);
         if ( ret )
             return ret;

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 99a23177b3d2..4a083e4b8f11 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -149,6 +149,21 @@ int arch_iommu_domain_init(struct domain *d)

 void arch_iommu_domain_destroy(struct domain *d)
 {
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    /*
+     * The relinquish code will not be executed if the domain creation
+     * failed. To avoid any memory leak, we want to free any IOMMU
+     * page-tables that may have been allocated.
+     */
+    rc = iommu_free_pgtables(d, false);
+
+    /* The preemption was disabled, so the call should never fail. */
+    if ( rc )
+        ASSERT_UNREACHABLE();
+
+    ASSERT(page_list_empty(&hd->arch.pgtables.list));
 }

 static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
@@ -261,7 +276,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         return;
 }

-int iommu_free_pgtables(struct domain *d)
+int iommu_free_pgtables(struct domain *d, bool preempt)
 {
     struct domain_iommu *hd = dom_iommu(d);
     struct page_info *pg;
@@ -282,7 +297,7 @@ int iommu_free_pgtables(struct domain *d)
     {
         free_domheap_page(pg);

-        if ( !(++done & 0xff) && general_preempt_check() )
+        if ( !(++done & 0xff) && preempt && general_preempt_check() )
         {
             spin_unlock(&hd->arch.pgtables.lock);
             return -ERESTART;
@@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d)
     memflags = MEMF_node(hd->node);
 #endif

+    /*
+     * The IOMMU page-tables are freed when relinquishing the domain, but
+     * nothing prevent allocation to happen afterwards. There is no valid
+     * reasons to continue to update the IOMMU page-tables while the
+     * domain is dying.
+     *
+     * So prevent page-table allocation when the domain is dying. Note
+     * this doesn't fully prevent the race because d->is_dying may not
+     * yet be seen.
+     */
+    if ( d->is_dying )
+        return NULL;
+
     pg = alloc_domheap_page(NULL, memflags);
     if ( !pg )
         return NULL;
diff --git a/xen/include/asm-x86/iommu.h b/xen/include/asm-x86/iommu.h
index 970eb06ffac5..874bb5bbfbde 100644
--- a/xen/include/asm-x86/iommu.h
+++ b/xen/include/asm-x86/iommu.h
@@ -135,7 +135,7 @@ int pi_update_irte(const struct pi_desc *pi_desc, const struct pirq *pirq,
         iommu_vcall(ops, sync_cache, addr, size);       \
     })

-int __must_check iommu_free_pgtables(struct domain *d);
+int __must_check iommu_free_pgtables(struct domain *d, bool preempt);
 struct page_info *__must_check iommu_alloc_pgtable(struct domain *d);

 #endif /* !__ARCH_X86_IOMMU_H__ */
--
2.17.1

^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables
  2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall
@ 2020-12-22 17:12   ` Julien Grall
  2020-12-23 14:34   ` Jan Beulich
  1 sibling, 0 replies; 40+ messages in thread
From: Julien Grall @ 2020-12-22 17:12 UTC (permalink / raw)
  To: xen-devel
  Cc: hongyxia, Julien Grall, Jan Beulich, Andrew Cooper,
	Roger Pau Monné, Wei Liu, Paul Durrant

Hi,

On 22/12/2020 15:43, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
>
> The new IOMMU page-tables allocator will release the pages when
> relinquish the domain resources. However, this is not sufficient in two
> cases:
>     1) domain_relinquish_resources() is not called when the domain
>     creation fails.
>     2) There is nothing preventing page-table allocations when the
>     domain is dying.
>
> In both cases, this can be solved by freeing the page-tables again
> when the domain destruction. Although, this may result to an high
> number of page-tables to free.
>
> In the second case, it is pointless to allow page-table allocation when
> the domain is going to die. iommu_alloc_pgtable() will now return an
> error when it is called while the domain is dying.
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

I forgot to mention this is fixing 15bc9a1ef51c "x86/iommu: add common
page-table allocator". I will add a Fixes tag for the next version.

Cheers,

--
Julien Grall

^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-22 15:43 ` [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables Julien Grall 2020-12-22 17:12 ` Julien Grall @ 2020-12-23 14:34 ` Jan Beulich 2020-12-23 16:07 ` Julien Grall 1 sibling, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 14:34 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 22.12.2020 16:43, Julien Grall wrote: > From: Julien Grall <jgrall@amazon.com> > > The new IOMMU page-tables allocator will release the pages when > relinquish the domain resources. However, this is not sufficient in two > cases: > 1) domain_relinquish_resources() is not called when the domain > creation fails. Could you remind me of what IOMMU page table insertions there are during domain creation? No memory got allocated to the domain at that point yet, so it would seem to me there simply is nothing to map. > 2) There is nothing preventing page-table allocations when the > domain is dying. > > In both cases, this can be solved by freeing the page-tables again > when the domain destruction. Although, this may result to an high > number of page-tables to free. Since I've seen this before in this series, and despite me also not being a native speaker, as a nit: I don't think it can typically be other than "result in". > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d) > > PROGRESS(iommu_pagetables): > > - ret = iommu_free_pgtables(d); > + ret = iommu_free_pgtables(d, false); I suppose you mean "true" here, but I also think the other approach (checking for DOMDYING_dead, which you don't seem to like very much) is better, if for no other reason than it already being used elsewhere. 
> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) > memflags = MEMF_node(hd->node); > #endif > > + /* > + * The IOMMU page-tables are freed when relinquishing the domain, but > + * nothing prevent allocation to happen afterwards. There is no valid > + * reasons to continue to update the IOMMU page-tables while the > + * domain is dying. > + * > + * So prevent page-table allocation when the domain is dying. Note > + * this doesn't fully prevent the race because d->is_dying may not > + * yet be seen. > + */ > + if ( d->is_dying ) > + return NULL; > + > pg = alloc_domheap_page(NULL, memflags); > if ( !pg ) > return NULL; As said in reply to an earlier patch - with a suitable spin_barrier() you can place your check further down, along the lines of spin_lock(&hd->arch.pgtables.lock); if ( likely(!d->is_dying) ) { page_list_add(pg, &hd->arch.pgtables.list); p = NULL; } spin_unlock(&hd->arch.pgtables.lock); if ( p ) { free_domheap_page(pg); pg = NULL; } (albeit I'm relatively sure you won't like the re-purposing of p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be nice to use here, but we seem to only have FREE_XENHEAP_PAGE() so far.) Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
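The ordering Jan suggests above (allocate first, then re-check is_dying while holding pgtables.lock before publishing the page, and free it again if the domain died in between) can be modelled outside Xen. The sketch below is hypothetical scaffolding, with a pthread mutex standing in for the spinlock, calloc/free for the domheap allocator, and invented structure names; it is not the real interface:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct page_info { struct page_info *next; };

struct domain {
    bool is_dying;
    pthread_mutex_t pgtables_lock;
    struct page_info *pgtables_list;
};

/*
 * Allocate unconditionally, but only link the page into the per-domain
 * list while holding the lock and after re-checking is_dying.  A
 * domain_kill() that sets is_dying and walks the list under the same
 * lock can then no longer race with a late insertion.
 */
static struct page_info *iommu_alloc_pgtable(struct domain *d)
{
    struct page_info *pg = calloc(1, sizeof(*pg));
    bool published = false;

    if ( !pg )
        return NULL;

    pthread_mutex_lock(&d->pgtables_lock);
    if ( !d->is_dying )
    {
        pg->next = d->pgtables_list;
        d->pgtables_list = pg;
        published = true;
    }
    pthread_mutex_unlock(&d->pgtables_lock);

    if ( !published )
    {
        free(pg);   /* the domain died first: undo the allocation */
        pg = NULL;
    }

    return pg;
}
```

The cost of this variant is one allocation/free pair on the dying path; in exchange, no page can be added to the list after the final teardown pass has started.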
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 14:34 ` Jan Beulich @ 2020-12-23 16:07 ` Julien Grall 2020-12-23 16:15 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:07 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23/12/2020 14:34, Jan Beulich wrote: > On 22.12.2020 16:43, Julien Grall wrote: >> From: Julien Grall <jgrall@amazon.com> >> >> The new IOMMU page-tables allocator will release the pages when >> relinquish the domain resources. However, this is not sufficient in two >> cases: >> 1) domain_relinquish_resources() is not called when the domain >> creation fails. > > Could you remind me of what IOMMU page table insertions there > are during domain creation? No memory got allocated to the > domain at that point yet, so it would seem to me there simply > is nothing to map. The P2M is first modified in hvm_domain_initialise(): (XEN) Xen call trace: (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 (XEN) [<ffff82d04025f9f5>] F drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 (XEN) [<ffff82d04025fcc5>] F drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 (XEN) [<ffff82d040301bdc>] F arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 (XEN) [<ffff82d0402f6b5c>] F arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e (XEN) [<ffff82d04029a080>] F arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 (XEN) [<ffff82d04031eae7>] F arch_domain_create+0x478/0x4ff (XEN) [<ffff82d04020476e>] F domain_create+0x4f2/0x778 (XEN) [<ffff82d04023b0d2>] F do_domctl+0xb1e/0x18b8 (XEN) 
[<ffff82d040311dbf>] F pv_hypercall+0x2f0/0x55f (XEN) [<ffff82d040390432>] F lstar_enter+0x112/0x120 > >> 2) There is nothing preventing page-table allocations when the >> domain is dying. >> >> In both cases, this can be solved by freeing the page-tables again >> when the domain destruction. Although, this may result to an high >> number of page-tables to free. > > Since I've seen this before in this series, and despite me also > not being a native speaker, as a nit: I don't think it can > typically be other than "result in". I think you are right. > >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -2290,7 +2290,7 @@ int domain_relinquish_resources(struct domain *d) >> >> PROGRESS(iommu_pagetables): >> >> - ret = iommu_free_pgtables(d); >> + ret = iommu_free_pgtables(d, false); > > I suppose you mean "true" here, but I also think the other > approach (checking for DOMDYING_dead, which you don't seem to > like very much) is better, if for no other reason than it > already being used elsewhere. I think "don't like very much" is an understatement :). There seems to be more function using an extra parameter (such as hap_set_allocation() which was introduced before your DOMDYING_dead). So I only followed what they did. > >> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >> memflags = MEMF_node(hd->node); >> #endif >> >> + /* >> + * The IOMMU page-tables are freed when relinquishing the domain, but >> + * nothing prevent allocation to happen afterwards. There is no valid >> + * reasons to continue to update the IOMMU page-tables while the >> + * domain is dying. >> + * >> + * So prevent page-table allocation when the domain is dying. Note >> + * this doesn't fully prevent the race because d->is_dying may not >> + * yet be seen. 
>> + */ >> + if ( d->is_dying ) >> + return NULL; >> + >> pg = alloc_domheap_page(NULL, memflags); >> if ( !pg ) >> return NULL; > > As said in reply to an earlier patch - with a suitable > spin_barrier() you can place your check further down, along the > lines of > > spin_lock(&hd->arch.pgtables.lock); > if ( likely(!d->is_dying) ) > { > page_list_add(pg, &hd->arch.pgtables.list); > p = NULL; > } > spin_unlock(&hd->arch.pgtables.lock); > > if ( p ) > { > free_domheap_page(pg); > pg = NULL; > } > > (albeit I'm relatively sure you won't like the re-purposing of > p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be > nice to use here, but we seem to only have FREE_XENHEAP_PAGE() > so far.) In fact I don't mind the re-purposing of p. However, I dislike the allocation and then freeing when the domain is dying. I think I prefer the small race introduced (the pages will still be freed) over this solution. Note that Paul's IOMMU series will completely rework the function. So this is only temporary. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:07 ` Julien Grall @ 2020-12-23 16:15 ` Jan Beulich 2020-12-23 16:19 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:15 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23.12.2020 17:07, Julien Grall wrote: > On 23/12/2020 14:34, Jan Beulich wrote: >> On 22.12.2020 16:43, Julien Grall wrote: >>> From: Julien Grall <jgrall@amazon.com> >>> >>> The new IOMMU page-tables allocator will release the pages when >>> relinquish the domain resources. However, this is not sufficient in two >>> cases: >>> 1) domain_relinquish_resources() is not called when the domain >>> creation fails. >> >> Could you remind me of what IOMMU page table insertions there >> are during domain creation? No memory got allocated to the >> domain at that point yet, so it would seem to me there simply >> is nothing to map. > > The P2M is first modified in hvm_domain_initialise(): > > (XEN) Xen call trace: > (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 > (XEN) [<ffff82d04025f9f5>] F > drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 > (XEN) [<ffff82d04025fcc5>] F > drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b > (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f > (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 > (XEN) [<ffff82d040301bdc>] F > arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 > (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 > (XEN) [<ffff82d0402f6b5c>] F > arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 > (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e > (XEN) [<ffff82d04029a080>] F > arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 > (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 Oh, the infamous APIC access page again. 
>>> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >>> memflags = MEMF_node(hd->node); >>> #endif >>> >>> + /* >>> + * The IOMMU page-tables are freed when relinquishing the domain, but >>> + * nothing prevent allocation to happen afterwards. There is no valid >>> + * reasons to continue to update the IOMMU page-tables while the >>> + * domain is dying. >>> + * >>> + * So prevent page-table allocation when the domain is dying. Note >>> + * this doesn't fully prevent the race because d->is_dying may not >>> + * yet be seen. >>> + */ >>> + if ( d->is_dying ) >>> + return NULL; >>> + >>> pg = alloc_domheap_page(NULL, memflags); >>> if ( !pg ) >>> return NULL; >> >> As said in reply to an earlier patch - with a suitable >> spin_barrier() you can place your check further down, along the >> lines of >> >> spin_lock(&hd->arch.pgtables.lock); >> if ( likely(!d->is_dying) ) >> { >> page_list_add(pg, &hd->arch.pgtables.list); >> p = NULL; >> } >> spin_unlock(&hd->arch.pgtables.lock); >> >> if ( p ) >> { >> free_domheap_page(pg); >> pg = NULL; >> } >> >> (albeit I'm relatively sure you won't like the re-purposing of >> p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be >> nice to use here, but we seem to only have FREE_XENHEAP_PAGE() >> so far.) > > In fact I don't mind the re-purposing of p. However, I dislike the > allocation and then freeing when the domain is dying. > > I think I prefer the small race introduced (the pages will still be > freed) over this solution. The "will still be freed" is because of the 2nd round of freeing you add in an earlier patch? I'd prefer to avoid the race to in turn avoid that 2nd round of freeing. Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:15 ` Jan Beulich @ 2020-12-23 16:19 ` Julien Grall 2020-12-23 16:35 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2020-12-23 16:19 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23/12/2020 16:15, Jan Beulich wrote: > On 23.12.2020 17:07, Julien Grall wrote: >> On 23/12/2020 14:34, Jan Beulich wrote: >>> On 22.12.2020 16:43, Julien Grall wrote: >>>> From: Julien Grall <jgrall@amazon.com> >>>> >>>> The new IOMMU page-tables allocator will release the pages when >>>> relinquish the domain resources. However, this is not sufficient in two >>>> cases: >>>> 1) domain_relinquish_resources() is not called when the domain >>>> creation fails. >>> >>> Could you remind me of what IOMMU page table insertions there >>> are during domain creation? No memory got allocated to the >>> domain at that point yet, so it would seem to me there simply >>> is nothing to map. 
>> >> The P2M is first modified in hvm_domain_initialise(): >> >> (XEN) Xen call trace: >> (XEN) [<ffff82d04026b9ec>] R iommu_alloc_pgtable+0x11/0x137 >> (XEN) [<ffff82d04025f9f5>] F >> drivers/passthrough/vtd/iommu.c#addr_to_dma_page_maddr+0x146/0x1d8 >> (XEN) [<ffff82d04025fcc5>] F >> drivers/passthrough/vtd/iommu.c#intel_iommu_map_page+0x6a/0x14b >> (XEN) [<ffff82d04026d949>] F iommu_map+0x6d/0x16f >> (XEN) [<ffff82d04026da71>] F iommu_legacy_map+0x26/0x63 >> (XEN) [<ffff82d040301bdc>] F >> arch/x86/mm/p2m-ept.c#ept_set_entry+0x6b2/0x730 >> (XEN) [<ffff82d0402f67e7>] F p2m_set_entry+0x91/0x128 >> (XEN) [<ffff82d0402f6b5c>] F >> arch/x86/mm/p2m.c#set_typed_p2m_entry+0xfe/0x3f7 >> (XEN) [<ffff82d0402f7f4c>] F set_mmio_p2m_entry+0x65/0x6e >> (XEN) [<ffff82d04029a080>] F >> arch/x86/hvm/vmx/vmx.c#vmx_domain_initialise+0xf6/0x137 >> (XEN) [<ffff82d0402af421>] F hvm_domain_initialise+0x357/0x4c7 > > Oh, the infamous APIC access page again. > >>>> @@ -305,6 +320,19 @@ struct page_info *iommu_alloc_pgtable(struct domain *d) >>>> memflags = MEMF_node(hd->node); >>>> #endif >>>> >>>> + /* >>>> + * The IOMMU page-tables are freed when relinquishing the domain, but >>>> + * nothing prevent allocation to happen afterwards. There is no valid >>>> + * reasons to continue to update the IOMMU page-tables while the >>>> + * domain is dying. >>>> + * >>>> + * So prevent page-table allocation when the domain is dying. Note >>>> + * this doesn't fully prevent the race because d->is_dying may not >>>> + * yet be seen. 
>>>> + */ >>>> + if ( d->is_dying ) >>>> + return NULL; >>>> + >>>> pg = alloc_domheap_page(NULL, memflags); >>>> if ( !pg ) >>>> return NULL; >>> >>> As said in reply to an earlier patch - with a suitable >>> spin_barrier() you can place your check further down, along the >>> lines of >>> >>> spin_lock(&hd->arch.pgtables.lock); >>> if ( likely(!d->is_dying) ) >>> { >>> page_list_add(pg, &hd->arch.pgtables.list); >>> p = NULL; >>> } >>> spin_unlock(&hd->arch.pgtables.lock); >>> >>> if ( p ) >>> { >>> free_domheap_page(pg); >>> pg = NULL; >>> } >>> >>> (albeit I'm relatively sure you won't like the re-purposing of >>> p, but that's a minor detail). (FREE_DOMHEAP_PAGE() would be >>> nice to use here, but we seem to only have FREE_XENHEAP_PAGE() >>> so far.) >> >> In fact I don't mind the re-purposing of p. However, I dislike the >> allocation and then freeing when the domain is dying. >> >> I think I prefer the small race introduced (the pages will still be >> freed) over this solution. > > The "will still be freed" is because of the 2nd round of freeing > you add in an earlier patch? I'd prefer to avoid the race to in > turn avoid that 2nd round of freeing. The "2nd round" of freeing is necessary at least for the domain creation failure case (it would be the 1rst round). If we can avoid IOMMU page-table allocation in the domain creation path, then yes I agree that we want to avoid that "2nd round". Otherwise, I think it is best to take advantage of it. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:19 ` Julien Grall @ 2020-12-23 16:35 ` Jan Beulich 2021-01-14 18:53 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2020-12-23 16:35 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 23.12.2020 17:19, Julien Grall wrote: > On 23/12/2020 16:15, Jan Beulich wrote: >> On 23.12.2020 17:07, Julien Grall wrote: >>> I think I prefer the small race introduced (the pages will still be >>> freed) over this solution. >> >> The "will still be freed" is because of the 2nd round of freeing >> you add in an earlier patch? I'd prefer to avoid the race to in >> turn avoid that 2nd round of freeing. > > The "2nd round" of freeing is necessary at least for the domain creation > failure case (it would be the 1rst round). > > If we can avoid IOMMU page-table allocation in the domain creation path, > then yes I agree that we want to avoid that "2nd round". Otherwise, I > think it is best to take advantage of it. Well, I'm not really certain either way here. If it's really just VMX'es APIC access page that's the problem here, custom cleanup of this "fallout" from VMX code would certainly be an option. Furthermore I consider it wrong to insert this page in the IOMMU page tables in the first place. This page overlaps with the MSI special address range, and hence accesses will be dealt with by interrupt remapping machinery (if enabled). If interrupt remapping is disabled, this page should be no different for I/O purposes than all other pages in this window - they shouldn't be mapped at all. Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() should gain an is_special_page() check to prevent propagation to the IOMMU for such pages (we can't do anything about them in shared-page-table setups)? 
See also the (PV related) comment in cleanup_page_mappings(), a few lines ahead of a respective use of is_special_page(), Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2020-12-23 16:35 ` Jan Beulich @ 2021-01-14 18:53 ` Julien Grall 2021-01-15 11:24 ` Jan Beulich 0 siblings, 1 reply; 40+ messages in thread From: Julien Grall @ 2021-01-14 18:53 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel Hi Jan, On 23/12/2020 16:35, Jan Beulich wrote: > On 23.12.2020 17:19, Julien Grall wrote: >> On 23/12/2020 16:15, Jan Beulich wrote: >>> On 23.12.2020 17:07, Julien Grall wrote: >>>> I think I prefer the small race introduced (the pages will still be >>>> freed) over this solution. >>> >>> The "will still be freed" is because of the 2nd round of freeing >>> you add in an earlier patch? I'd prefer to avoid the race to in >>> turn avoid that 2nd round of freeing. >> >> The "2nd round" of freeing is necessary at least for the domain creation >> failure case (it would be the 1rst round). >> >> If we can avoid IOMMU page-table allocation in the domain creation path, >> then yes I agree that we want to avoid that "2nd round". Otherwise, I >> think it is best to take advantage of it. > > Well, I'm not really certain either way here. If it's really just > VMX'es APIC access page that's the problem here, custom cleanup > of this "fallout" from VMX code would certainly be an option. From my testing, it looks like the VMX APIC page is the only entry added while the domain is created. > Furthermore I consider it wrong to insert this page in the IOMMU > page tables in the first place. This page overlaps with the MSI > special address range, and hence accesses will be dealt with by > interrupt remapping machinery (if enabled). If interrupt > remapping is disabled, this page should be no different for I/O > purposes than all other pages in this window - they shouldn't > be mapped at all. That's a fair point. I will have a look to see if I can avoid the IOMMU mapping for the VMX APIC page. 
> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() > should gain an is_special_page() check to prevent propagation to > the IOMMU for such pages (we can't do anything about them in > shared-page-table setups)? See also the (PV related) comment in > cleanup_page_mappings(), a few lines ahead of a respective use of > is_special_page(), There seems to be a mismatch between what the comment says and the code adding the page in the IOMMU for PV domain. AFAICT, the IOMMU mapping will be added via _get_page_type(): /* Special pages should not be accessible from devices. */ struct domain *d = page_get_owner(page); if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) { mfn_t mfn = page_to_mfn(page); if ( (x & PGT_type_mask) == PGT_writable_page ) rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), 1ul << PAGE_ORDER_4K); else rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, 1ul << PAGE_ORDER_4K, IOMMUF_readable | IOMMUF_writable); } In this snippet, "special page" is interpreted as a page with no owner. However is_special_page() will return true when PGC_extra is set and the flag implies there is an owner (see assign_pages()). So it looks like to me, any pages with PGC_extra would end up to have a mapping in the IOMMU pages-tables if they are writable. This statement also seems to apply for xenheap pages shared with a domain (e.g grant-table). Did I miss anything? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2021-01-14 18:53 ` Julien Grall @ 2021-01-15 11:24 ` Jan Beulich 2021-01-15 11:30 ` Julien Grall 0 siblings, 1 reply; 40+ messages in thread From: Jan Beulich @ 2021-01-15 11:24 UTC (permalink / raw) To: Julien Grall Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel On 14.01.2021 19:53, Julien Grall wrote: > On 23/12/2020 16:35, Jan Beulich wrote: >> On 23.12.2020 17:19, Julien Grall wrote: >>> On 23/12/2020 16:15, Jan Beulich wrote: >>>> On 23.12.2020 17:07, Julien Grall wrote: >>>>> I think I prefer the small race introduced (the pages will still be >>>>> freed) over this solution. >>>> >>>> The "will still be freed" is because of the 2nd round of freeing >>>> you add in an earlier patch? I'd prefer to avoid the race to in >>>> turn avoid that 2nd round of freeing. >>> >>> The "2nd round" of freeing is necessary at least for the domain creation >>> failure case (it would be the 1rst round). >>> >>> If we can avoid IOMMU page-table allocation in the domain creation path, >>> then yes I agree that we want to avoid that "2nd round". Otherwise, I >>> think it is best to take advantage of it. >> >> Well, I'm not really certain either way here. If it's really just >> VMX'es APIC access page that's the problem here, custom cleanup >> of this "fallout" from VMX code would certainly be an option. > > From my testing, it looks like the VMX APIC page is the only entry > added while the domain is created. > >> Furthermore I consider it wrong to insert this page in the IOMMU >> page tables in the first place. This page overlaps with the MSI >> special address range, and hence accesses will be dealt with by >> interrupt remapping machinery (if enabled). If interrupt >> remapping is disabled, this page should be no different for I/O >> purposes than all other pages in this window - they shouldn't >> be mapped at all. > > That's a fair point. 
I will have a look to see if I can avoid the IOMMU > mapping for the VMX APIC page. > >> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() >> should gain an is_special_page() check to prevent propagation to >> the IOMMU for such pages (we can't do anything about them in >> shared-page-table setups)? See also the (PV related) comment in >> cleanup_page_mappings(), a few lines ahead of a respective use of >> is_special_page(), > > There seems to be a mismatch between what the comment says and the code > adding the page in the IOMMU for PV domain. > > AFAICT, the IOMMU mapping will be added via _get_page_type(): > > /* Special pages should not be accessible from devices. */ > struct domain *d = page_get_owner(page); > > if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) > { > mfn_t mfn = page_to_mfn(page); > > if ( (x & PGT_type_mask) == PGT_writable_page ) > rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), > 1ul << PAGE_ORDER_4K); > else > rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, > 1ul << PAGE_ORDER_4K, > IOMMUF_readable | IOMMUF_writable); > } > > In this snippet, "special page" is interpreted as a page with no owner. > However is_special_page() will return true when PGC_extra is set and the > flag implies there is an owner (see assign_pages()). > > So it looks like to me, any pages with PGC_extra would end up to have a > mapping in the IOMMU pages-tables if they are writable. > > This statement also seems to apply for xenheap pages shared with a > domain (e.g grant-table). > > Did I miss anything? First let me point out that what you quote is not what I had pointed you at - you refer to _get_page_type() when I suggested to look at cleanup_page_mappings(). The important aspect for special pages (here I mean ones having been subject to share_xen_page_with_guest()) is that their type gets "pinned", i.e. they'll never be subject to _get_page_type()'s transitioning of types. 
As you likely will have noticed, cleanup_page_mappings() also only gets called when the last _general_ ref of a page got dropped, i.e. entirely unrelated to type references. If PGC_extra pages have a similar requirement, they may need similar pinning of their types. (Maybe they do; didn't check.) Jan ^ permalink raw reply [flat|nested] 40+ messages in thread
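Jan's explanation reduces to the following property: a type transition in _get_page_type() presupposes that the page's type reference count can drop to zero, and a page handed out via share_xen_page_with_guest() permanently holds one type reference, so the transition (and with it the PV IOMMU map/unmap side effect) can never fire for such a page. A sketch of that property, with invented structures and helpers:

```c
#include <assert.h>
#include <stdbool.h>

enum pg_type { PGT_none, PGT_writable_page, PGT_seg_desc_page };

struct page {
    enum pg_type type;
    unsigned int type_count;
};

/* Hand the page out as Xen-shared, holding a permanent type reference. */
static void share_with_guest(struct page *pg, enum pg_type type)
{
    pg->type = type;
    pg->type_count = 1;   /* never dropped, so the type is "pinned" */
}

/*
 * Sketch of the relevant property of _get_page_type(): a transition to
 * a different type is only possible while no type references are held.
 * A shared ("special") page always holds one, so it never transitions,
 * and the PV IOMMU map/unmap side effect never runs for it.
 */
static bool get_page_type(struct page *pg, enum pg_type type)
{
    if ( pg->type == type )
    {
        pg->type_count++;
        return true;
    }

    if ( pg->type_count != 0 )
        return false;   /* type still in use: transition refused */

    pg->type = type;
    pg->type_count = 1;
    return true;
}
```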
* Re: [PATCH for-4.15 4/4] xen/iommu: x86: Don't leak the IOMMU page-tables 2021-01-15 11:24 ` Jan Beulich @ 2021-01-15 11:30 ` Julien Grall 0 siblings, 0 replies; 40+ messages in thread From: Julien Grall @ 2021-01-15 11:30 UTC (permalink / raw) To: Jan Beulich Cc: hongyxia, Julien Grall, Andrew Cooper, Roger Pau Monné, Wei Liu, Paul Durrant, xen-devel Hi Jan, On 15/01/2021 11:24, Jan Beulich wrote: > On 14.01.2021 19:53, Julien Grall wrote: >> On 23/12/2020 16:35, Jan Beulich wrote: >>> On 23.12.2020 17:19, Julien Grall wrote: >>>> On 23/12/2020 16:15, Jan Beulich wrote: >>>>> On 23.12.2020 17:07, Julien Grall wrote: >>>>>> I think I prefer the small race introduced (the pages will still be >>>>>> freed) over this solution. >>>>> >>>>> The "will still be freed" is because of the 2nd round of freeing >>>>> you add in an earlier patch? I'd prefer to avoid the race to in >>>>> turn avoid that 2nd round of freeing. >>>> >>>> The "2nd round" of freeing is necessary at least for the domain creation >>>> failure case (it would be the 1rst round). >>>> >>>> If we can avoid IOMMU page-table allocation in the domain creation path, >>>> then yes I agree that we want to avoid that "2nd round". Otherwise, I >>>> think it is best to take advantage of it. >>> >>> Well, I'm not really certain either way here. If it's really just >>> VMX'es APIC access page that's the problem here, custom cleanup >>> of this "fallout" from VMX code would certainly be an option. >> >> From my testing, it looks like the VMX APIC page is the only entry >> added while the domain is created. >> >>> Furthermore I consider it wrong to insert this page in the IOMMU >>> page tables in the first place. This page overlaps with the MSI >>> special address range, and hence accesses will be dealt with by >>> interrupt remapping machinery (if enabled). 
If interrupt >>> remapping is disabled, this page should be no different for I/O >>> purposes than all other pages in this window - they shouldn't >>> be mapped at all. >> >> That's a fair point. I will have a look to see if I can avoid the IOMMU >> mapping for the VMX APIC page. >> >>> Perhaps, along the lines of epte_get_entry_emt(), ept_set_entry() >>> should gain an is_special_page() check to prevent propagation to >>> the IOMMU for such pages (we can't do anything about them in >>> shared-page-table setups)? See also the (PV related) comment in >>> cleanup_page_mappings(), a few lines ahead of a respective use of >>> is_special_page(), >> >> There seems to be a mismatch between what the comment says and the code >> adding the page in the IOMMU for PV domain. >> >> AFAICT, the IOMMU mapping will be added via _get_page_type(): >> >> /* Special pages should not be accessible from devices. */ >> struct domain *d = page_get_owner(page); >> >> if ( d && unlikely(need_iommu_pt_sync(d)) && is_pv_domain(d) ) >> { >> mfn_t mfn = page_to_mfn(page); >> >> if ( (x & PGT_type_mask) == PGT_writable_page ) >> rc = iommu_legacy_unmap(d, _dfn(mfn_x(mfn)), >> 1ul << PAGE_ORDER_4K); >> else >> rc = iommu_legacy_map(d, _dfn(mfn_x(mfn)), mfn, >> 1ul << PAGE_ORDER_4K, >> IOMMUF_readable | IOMMUF_writable); >> } >> >> In this snippet, "special page" is interpreted as a page with no owner. >> However is_special_page() will return true when PGC_extra is set and the >> flag implies there is an owner (see assign_pages()). >> >> So it looks like to me, any pages with PGC_extra would end up to have a >> mapping in the IOMMU pages-tables if they are writable. >> >> This statement also seems to apply for xenheap pages shared with a >> domain (e.g grant-table). >> >> Did I miss anything? > > First let me point out that what you quote is not what I had > pointed you at - you refer to _get_page_type() when I suggested > to look at cleanup_page_mappings(). 
I know and that's why I pointed out the mismatch between the comments (in cleanup_page_mappings()) and the code adding the mapping in the IOMMU (_get_page_type()). I only looked at _get_page_type() because I wanted to understand how the IOMMU mapping works for PV. > The important aspect for > special pages (here I mean ones having been subject to > share_xen_page_with_guest()) is that their type gets "pinned", > i.e. they'll never be subject to _get_page_type()'s > transitioning of types. As you likely will have noticed, > cleanup_page_mappings() also only gets called when the last > _general_ ref of a page got dropped, i.e. entirely unrelated > to type references. Ah, that makes sense. I added some debugging in the code and couldn't really figure out why the transition wasn't happening. > > If PGC_extra pages have a similar requirement, they may need > similar pinning of their types. (Maybe they do; didn't check.) I have only looked at the VMX APIC page so far. It effectively pins the types. Thanks for the explanation! Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 40+ messages in thread