* [PATCH v2] x86/PoD: defer nested P2M flushes
@ 2021-10-19 12:52 Jan Beulich
2021-10-19 13:53 ` Roger Pau Monné
2021-10-20 7:27 ` Tian, Kevin
0 siblings, 2 replies; 5+ messages in thread
From: Jan Beulich @ 2021-10-19 12:52 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Roger Pau Monné, Kevin Tian
With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
order violation when the PoD lock is held around it. Hence such flushing
needs to be deferred. Steal the approach from p2m_change_type_range().
(Note that strictly speaking the change at the out_of_memory label is
not needed, as the domain gets crashed there anyway. The change is being
made nevertheless to avoid setting up a trap for someone meaning to
deal with that case better than by domain_crash().)
Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
p2m_flush_nestedp2m() invocation conditional. Note that this then also
alters behavior of p2m_change_type_range() on EPT, deferring the nested
flushes there as well. I think this should have been that way from the
introduction of the flag.
Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Also adjust ept_sync_domain_prepare(). Also convert the flush at the
out_of_memory label. Extend description to cover these.
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
{
if ( p2m_is_nestedp2m(p2m) )
ept = &p2m_get_hostp2m(d)->ept;
- else
+ else if ( !p2m->defer_nested_flush )
p2m_flush_nestedp2m(d);
}
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,6 +24,7 @@
#include <xen/mm.h>
#include <xen/sched.h>
#include <xen/trace.h>
+#include <asm/hvm/nestedhvm.h>
#include <asm/page.h>
#include <asm/paging.h>
#include <asm/p2m.h>
@@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
static int
p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
+static void pod_unlock_and_flush(struct p2m_domain *p2m)
+{
+ pod_unlock(p2m);
+ p2m->defer_nested_flush = false;
+ if ( nestedhvm_enabled(p2m->domain) )
+ p2m_flush_nestedp2m(p2m->domain);
+}
/*
* This function is needed for two reasons:
@@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
gfn_lock(p2m, gfn, order);
pod_lock(p2m);
+ p2m->defer_nested_flush = true;
/*
* If we don't have any outstanding PoD entries, let things take their
@@ -665,7 +674,7 @@ out_entry_check:
}
out_unlock:
- pod_unlock(p2m);
+ pod_unlock_and_flush(p2m);
gfn_unlock(p2m, gfn, order);
return ret;
}
@@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
* won't start until we're done.
*/
if ( unlikely(d->is_dying) )
- goto out_fail;
-
+ {
+ pod_unlock(p2m);
+ return false;
+ }
/*
* Because PoD does not have cache list for 1GB pages, it has to remap
@@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
p2m_populate_on_demand, p2m->default_access);
}
+ p2m->defer_nested_flush = true;
+
/* Only reclaim if we're in actual need of more cache. */
if ( p2m->pod.entry_count > p2m->pod.count )
pod_eager_reclaim(p2m);
@@ -1229,22 +1242,25 @@ p2m_pod_demand_populate(struct p2m_domai
__trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
}
- pod_unlock(p2m);
+ pod_unlock_and_flush(p2m);
return true;
+
out_of_memory:
- pod_unlock(p2m);
+ pod_unlock_and_flush(p2m);
printk("%s: Dom%d out of PoD memory! (tot=%"PRIu32" ents=%ld dom%d)\n",
__func__, d->domain_id, domain_tot_pages(d),
p2m->pod.entry_count, current->domain->domain_id);
domain_crash(d);
return false;
+
out_fail:
- pod_unlock(p2m);
+ pod_unlock_and_flush(p2m);
return false;
+
remap_and_retry:
BUG_ON(order != PAGE_ORDER_2M);
- pod_unlock(p2m);
+ pod_unlock_and_flush(p2m);
/*
* Remap this 2-meg region in singleton chunks. See the comment on the
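[Editorial sketch] The defer-and-flush-on-clear pattern the hunks above apply to the PoD paths can be modelled in isolation. This is a minimal illustrative sketch, not Xen code: `struct p2m`, `write_entry()` and `end_deferral()` are hypothetical stand-ins (loosely mirroring `p2m_domain`, the entry-write paths, and `pod_unlock_and_flush()`), and the locking itself is omitted:

```c
#include <stdbool.h>

/* Hypothetical model: while defer_nested_flush is set, entry writes
 * skip the nested flush (which would violate the lock order under the
 * PoD lock); the party that set the flag must then flush
 * unconditionally when clearing it. */
struct p2m {
    bool defer_nested_flush;
    unsigned int nested_flushes; /* counts flush invocations */
};

static void flush_nestedp2m(struct p2m *p2m)
{
    p2m->nested_flushes++;
}

/* Mirrors the ept_sync_domain_prepare() hunk: flush only if no
 * deferral is in progress. */
static void write_entry(struct p2m *p2m)
{
    if ( !p2m->defer_nested_flush )
        flush_nestedp2m(p2m);
}

/* Mirrors pod_unlock_and_flush(): clear the flag, then flush once,
 * regardless of whether any write was actually skipped meanwhile. */
static void end_deferral(struct p2m *p2m)
{
    p2m->defer_nested_flush = false;
    flush_nestedp2m(p2m);
}
```

With the flag set, any number of `write_entry()` calls cause no flush; the single flush in `end_deferral()` covers them all, which is why the flag-setter must flush on clear.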
* Re: [PATCH v2] x86/PoD: defer nested P2M flushes
2021-10-19 12:52 [PATCH v2] x86/PoD: defer nested P2M flushes Jan Beulich
@ 2021-10-19 13:53 ` Roger Pau Monné
2021-10-19 15:06 ` Jan Beulich
2021-10-20 7:27 ` Tian, Kevin
1 sibling, 1 reply; 5+ messages in thread
From: Roger Pau Monné @ 2021-10-19 13:53 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, George Dunlap, Kevin Tian
On Tue, Oct 19, 2021 at 02:52:27PM +0200, Jan Beulich wrote:
> With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
> write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
> order violation when the PoD lock is held around it. Hence such flushing
> needs to be deferred. Steal the approach from p2m_change_type_range().
> (Note that strictly speaking the change at the out_of_memory label is
> not needed, as the domain gets crashed there anyway. The change is being
> made nevertheless to avoid setting up a trap for someone meaning to
> deal with that case better than by domain_crash().)
>
> Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
> ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
> p2m_flush_nestedp2m() invocation conditional. Note that this then also
> alters behavior of p2m_change_type_range() on EPT, deferring the nested
> flushes there as well. I think this should have been that way from the
> introduction of the flag.
>
> Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> v2: Also adjust ept_sync_domain_prepare(). Also convert the flush at the
> out_of_memory label. Extend description to cover these.
>
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
> {
> if ( p2m_is_nestedp2m(p2m) )
> ept = &p2m_get_hostp2m(d)->ept;
> - else
> + else if ( !p2m->defer_nested_flush )
> p2m_flush_nestedp2m(d);
I find this model slightly concerning, as we don't actually notify the
caller that a nested flush has been deferred, so we must make sure that
whoever sets defer_nested_flush also performs a flush unconditionally
when clearing the flag.
Thanks, Roger.
* Re: [PATCH v2] x86/PoD: defer nested P2M flushes
2021-10-19 13:53 ` Roger Pau Monné
@ 2021-10-19 15:06 ` Jan Beulich
2021-10-20 7:52 ` Roger Pau Monné
0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-10-19 15:06 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, George Dunlap, Kevin Tian
On 19.10.2021 15:53, Roger Pau Monné wrote:
> On Tue, Oct 19, 2021 at 02:52:27PM +0200, Jan Beulich wrote:
>> With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
>> write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
>> order violation when the PoD lock is held around it. Hence such flushing
>> needs to be deferred. Steal the approach from p2m_change_type_range().
>> (Note that strictly speaking the change at the out_of_memory label is
>> not needed, as the domain gets crashed there anyway. The change is being
>> made nevertheless to avoid setting up a trap for someone meaning to
>> deal with that case better than by domain_crash().)
>>
>> Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
>> ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
>> p2m_flush_nestedp2m() invocation conditional. Note that this then also
>> alters behavior of p2m_change_type_range() on EPT, deferring the nested
>> flushes there as well. I think this should have been that way from the
>> introduction of the flag.
>>
>> Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks.
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
>> {
>> if ( p2m_is_nestedp2m(p2m) )
>> ept = &p2m_get_hostp2m(d)->ept;
>> - else
>> + else if ( !p2m->defer_nested_flush )
>> p2m_flush_nestedp2m(d);
>
> I find this model slightly concerning, as we don't actually notify the
> caller that a nested flush has been deferred, so we must make sure that
> whoever sets defer_nested_flush also performs a flush unconditionally
> when clearing the flag.
Well, this _is_ the model used for now. Until this change there was
just a single party setting the flag. And like here, any new party
setting the flag will also need to invoke a flush upon clearing it.
It's not clear to me what alternative model you may have in mind.
Jan
* RE: [PATCH v2] x86/PoD: defer nested P2M flushes
2021-10-19 12:52 [PATCH v2] x86/PoD: defer nested P2M flushes Jan Beulich
2021-10-19 13:53 ` Roger Pau Monné
@ 2021-10-20 7:27 ` Tian, Kevin
1 sibling, 0 replies; 5+ messages in thread
From: Tian, Kevin @ 2021-10-20 7:27 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Cc: Cooper, Andrew, George Dunlap, Roger Pau Monné
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, October 19, 2021 8:52 PM
>
> With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
> write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
> order violation when the PoD lock is held around it. Hence such flushing
> needs to be deferred. Steal the approach from p2m_change_type_range().
> (Note that strictly speaking the change at the out_of_memory label is
> not needed, as the domain gets crashed there anyway. The change is being
> made nevertheless to avoid setting up a trap for someone meaning to
> deal with that case better than by domain_crash().)
>
> Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
> ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
> p2m_flush_nestedp2m() invocation conditional. Note that this then also
> alters behavior of p2m_change_type_range() on EPT, deferring the nested
> flushes there as well. I think this should have been that way from the
> introduction of the flag.
>
> Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> v2: Also adjust ept_sync_domain_prepare(). Also convert the flush at the
> out_of_memory label. Extend description to cover these.
>
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
> {
> if ( p2m_is_nestedp2m(p2m) )
> ept = &p2m_get_hostp2m(d)->ept;
> - else
> + else if ( !p2m->defer_nested_flush )
> p2m_flush_nestedp2m(d);
> }
>
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -24,6 +24,7 @@
> #include <xen/mm.h>
> #include <xen/sched.h>
> #include <xen/trace.h>
> +#include <asm/hvm/nestedhvm.h>
> #include <asm/page.h>
> #include <asm/paging.h>
> #include <asm/p2m.h>
> @@ -494,6 +495,13 @@ p2m_pod_offline_or_broken_replace(struct
> static int
> p2m_pod_zero_check_superpage(struct p2m_domain *p2m, gfn_t gfn);
>
> +static void pod_unlock_and_flush(struct p2m_domain *p2m)
> +{
> + pod_unlock(p2m);
> + p2m->defer_nested_flush = false;
> + if ( nestedhvm_enabled(p2m->domain) )
> + p2m_flush_nestedp2m(p2m->domain);
> +}
>
> /*
> * This function is needed for two reasons:
> @@ -514,6 +522,7 @@ p2m_pod_decrease_reservation(struct doma
>
> gfn_lock(p2m, gfn, order);
> pod_lock(p2m);
> + p2m->defer_nested_flush = true;
>
> /*
> * If we don't have any outstanding PoD entries, let things take their
> @@ -665,7 +674,7 @@ out_entry_check:
> }
>
> out_unlock:
> - pod_unlock(p2m);
> + pod_unlock_and_flush(p2m);
> gfn_unlock(p2m, gfn, order);
> return ret;
> }
> @@ -1144,8 +1153,10 @@ p2m_pod_demand_populate(struct p2m_domai
> * won't start until we're done.
> */
> if ( unlikely(d->is_dying) )
> - goto out_fail;
> -
> + {
> + pod_unlock(p2m);
> + return false;
> + }
>
> /*
> * Because PoD does not have cache list for 1GB pages, it has to remap
> @@ -1167,6 +1178,8 @@ p2m_pod_demand_populate(struct p2m_domai
> p2m_populate_on_demand, p2m->default_access);
> }
>
> + p2m->defer_nested_flush = true;
> +
> /* Only reclaim if we're in actual need of more cache. */
> if ( p2m->pod.entry_count > p2m->pod.count )
> pod_eager_reclaim(p2m);
> @@ -1229,22 +1242,25 @@ p2m_pod_demand_populate(struct p2m_domai
> __trace_var(TRC_MEM_POD_POPULATE, 0, sizeof(t), &t);
> }
>
> - pod_unlock(p2m);
> + pod_unlock_and_flush(p2m);
> return true;
> +
> out_of_memory:
> - pod_unlock(p2m);
> + pod_unlock_and_flush(p2m);
>
> printk("%s: Dom%d out of PoD memory! (tot=%"PRIu32" ents=%ld dom%d)\n",
> __func__, d->domain_id, domain_tot_pages(d),
> p2m->pod.entry_count, current->domain->domain_id);
> domain_crash(d);
> return false;
> +
> out_fail:
> - pod_unlock(p2m);
> + pod_unlock_and_flush(p2m);
> return false;
> +
> remap_and_retry:
> BUG_ON(order != PAGE_ORDER_2M);
> - pod_unlock(p2m);
> + pod_unlock_and_flush(p2m);
>
> /*
> * Remap this 2-meg region in singleton chunks. See the comment on the
* Re: [PATCH v2] x86/PoD: defer nested P2M flushes
2021-10-19 15:06 ` Jan Beulich
@ 2021-10-20 7:52 ` Roger Pau Monné
0 siblings, 0 replies; 5+ messages in thread
From: Roger Pau Monné @ 2021-10-20 7:52 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, George Dunlap, Kevin Tian
On Tue, Oct 19, 2021 at 05:06:27PM +0200, Jan Beulich wrote:
> On 19.10.2021 15:53, Roger Pau Monné wrote:
> > On Tue, Oct 19, 2021 at 02:52:27PM +0200, Jan Beulich wrote:
> >> With NPT or shadow in use, the p2m_set_entry() -> p2m_pt_set_entry() ->
> >> write_p2m_entry() -> p2m_flush_nestedp2m() call sequence triggers a lock
> >> order violation when the PoD lock is held around it. Hence such flushing
> >> needs to be deferred. Steal the approach from p2m_change_type_range().
> >> (Note that strictly speaking the change at the out_of_memory label is
> >> not needed, as the domain gets crashed there anyway. The change is being
> >> made nevertheless to avoid setting up a trap from someone meaning to
> >> deal with that case better than by domain_crash().)
> >>
> >> Similarly for EPT I think ept_set_entry() -> ept_sync_domain() ->
> >> ept_sync_domain_prepare() -> p2m_flush_nestedp2m() is affected. Make its
> >> p2m_flush_nestedp2m() invocation conditional. Note that this then also
> >> alters behavior of p2m_change_type_range() on EPT, deferring the nested
> >> flushes there as well. I think this should have been that way from the
> >> introduction of the flag.
> >>
> >> Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks.
>
> >> --- a/xen/arch/x86/mm/p2m-ept.c
> >> +++ b/xen/arch/x86/mm/p2m-ept.c
> >> @@ -1253,7 +1253,7 @@ static void ept_sync_domain_prepare(stru
> >> {
> >> if ( p2m_is_nestedp2m(p2m) )
> >> ept = &p2m_get_hostp2m(d)->ept;
> >> - else
> >> + else if ( !p2m->defer_nested_flush )
> >> p2m_flush_nestedp2m(d);
> >
> > I find this model slightly concerning, as we don't actually notify the
> > caller that a nested flush has been deferred, so we must make sure that
> > whoever sets defer_nested_flush also performs a flush unconditionally
> > when clearing the flag.
>
> Well, this _is_ the model used for now. Until this change there was
> just a single party setting the flag. And like here, any new party
> setting the flag will also need to invoke a flush upon clearing it.
> It's not clear to me what alternative model you may have in mind.
I was mostly thinking of something similar to the
defer_flush/need_flush pair, where the need for a flush is signaled in
need_flush, so setting defer_flush doesn't automatically imply a flush
when clearing it.
Anyway, this is simple enough that it doesn't warrant the use of the
more complicated logic.
Thanks, Roger.
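[Editorial sketch] The defer_flush/need_flush alternative mentioned above can be modelled the same way. Again these are hypothetical stand-in names patterned on the pair Roger refers to, not actual Xen code; the difference from the patch's model is that clearing the flag flushes only if a flush was actually suppressed:

```c
#include <stdbool.h>

/* Hypothetical model of the defer_flush/need_flush pairing: deferral
 * records that a flush became necessary instead of performing it, so
 * clearing defer_flush triggers a flush only when one was suppressed. */
struct p2m {
    bool defer_flush;
    bool need_flush;
    unsigned int flushes; /* counts flush invocations */
};

static void flush(struct p2m *p2m)
{
    p2m->flushes++;
}

static void write_entry(struct p2m *p2m)
{
    if ( p2m->defer_flush )
        p2m->need_flush = true; /* remember, don't flush under the lock */
    else
        flush(p2m);
}

static void end_deferral(struct p2m *p2m)
{
    p2m->defer_flush = false;
    if ( p2m->need_flush )
    {
        p2m->need_flush = false;
        flush(p2m);
    }
}
```

Unlike the unconditional flush in the patch, a deferral window with no intervening writes ends without any flush at all; the trade-off is an extra field and the bookkeeping the thread concludes is unnecessary here.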