xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Performance regression due to XSA-336
@ 2021-03-29 21:15 Boris Ostrovsky
  2021-03-29 21:15 ` [PATCH for-4.15 v3 1/2] x86/vpt: Do not take pt_migrate rwlock in some cases Boris Ostrovsky
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2021-03-29 21:15 UTC (permalink / raw)
  To: xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, wl, boris.ostrovsky,
	stephen.s.brennan, iwj

The first patch addresses performance regression introduced by XSA-336 fixes.
This patch could be considered as a candidate for inclusion in 4.15.

The second patch is a minor cleanup and can safely wait until after 4.15.


Boris Ostrovsky (2):
  x86/vpt: Do not take pt_migrate rwlock in some cases
  x86/vpt: Simplify locking argument to write_{un}lock

 xen/arch/x86/hvm/vpt.c        | 48 +++++++++++++++++++++++++++++++++----------
 xen/include/asm-x86/hvm/vpt.h | 18 ++++++++++------
 2 files changed, 49 insertions(+), 17 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH for-4.15 v3 1/2] x86/vpt: Do not take pt_migrate rwlock in some cases
  2021-03-29 21:15 [PATCH v3 0/2] Performance regression due to XSA-336 Boris Ostrovsky
@ 2021-03-29 21:15 ` Boris Ostrovsky
  2021-03-29 21:15 ` [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock Boris Ostrovsky
  2021-03-30 10:17 ` [PATCH v3 0/2] Performance regression due to XSA-336 Ian Jackson
  2 siblings, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2021-03-29 21:15 UTC (permalink / raw)
  To: xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, wl, boris.ostrovsky,
	stephen.s.brennan, iwj

Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between
vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was
intended to protect periodic timer during VCPU migration. Since such
migration is an infrequent event no performance impact was expected.

Unfortunately this turned out not to be the case: on a fairly large
guest (92 VCPUs) we've observed as much as 40% TPCC performance
regression with some guest kernels. Further investigation pointed to
pt_migrate read lock taken in pt_update_irq() as the largest contributor
to this regression. With large number of VCPUs and large number of VMEXITs
(from where pt_update_irq() is always called) the update of an atomic in
read_lock() is thought to be the main cause.

Stephen Brennan analyzed locking pattern and classified lock users as
follows:

1. Functions which read (maybe write) all periodic_time instances attached
to a particular vCPU. These are functions which use pt_vcpu_lock() such
as pt_restore_timer(), pt_save_timer(), etc.
2. Functions which want to modify a particular periodic_time object.
These functions lock whichever vCPU the periodic_time is attached to, but
since the vCPU could be modified without holding any lock, they are
vulnerable to XSA-336. Functions in this group use pt_lock(), such as
pt_timer_fn() or destroy_periodic_time().
3. Functions which not only want to modify the periodic_time, but also
would like to modify the =vcpu= fields. These are create_periodic_time()
or pt_adjust_vcpu(). They create XSA-336 conditions for group 2, but we
can't simply hold 2 vcpu locks due to the deadlock risk.

Roger then pointed out that group 1 functions don't really need to hold
the pt_migrate rwlock and that instead groups 2 and 3 should hold per-vcpu
lock whenever they modify per-vcpu timer lists.

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---

v3:
* Updated comments as proposed by Roger
* Updated commit message (Roger, Stephen)
* Added pt->vcpu==v test in pt_adjust_vcpu() (Jan)
* Use v instead of pt->vcpu as argument to locking helpers in create_periodic_time()
  and pt_adjust_vcpu() (Jan)

 xen/arch/x86/hvm/vpt.c        | 44 ++++++++++++++++++++++++++++++++++---------
 xen/include/asm-x86/hvm/vpt.h | 18 ++++++++++++------
 2 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4c2afe2e9154..560fab9cfc60 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -153,32 +153,43 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
+/*
+ * Functions which read (maybe write) all periodic_time instances
+ * attached to a particular vCPU use pt_vcpu_{un}lock locking helpers.
+ *
+ * Such users are explicitly forbidden from changing the value of the
+ * pt->vcpu field, because another thread holding the pt_migrate lock
+ * may already be spinning waiting for your vcpu lock.
+ */
 static void pt_vcpu_lock(struct vcpu *v)
 {
-    read_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&v->arch.hvm.tm_lock);
 }
 
 static void pt_vcpu_unlock(struct vcpu *v)
 {
     spin_unlock(&v->arch.hvm.tm_lock);
-    read_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
 }
 
+/*
+ * Functions which want to modify a particular periodic_time object
+ * use pt_{un}lock locking helpers.
+ *
+ * These users lock whichever vCPU the periodic_time is attached to,
+ * but since the vCPU could be modified without holding any lock, they
+ * need to take an additional lock that protects against pt->vcpu
+ * changing.
+ */
 static void pt_lock(struct periodic_time *pt)
 {
-    /*
-     * We cannot use pt_vcpu_lock here, because we need to acquire the
-     * per-domain lock first and then (re-)fetch the value of pt->vcpu, or
-     * else we might be using a stale value of pt->vcpu.
-     */
     read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
 static void pt_unlock(struct periodic_time *pt)
 {
-    pt_vcpu_unlock(pt->vcpu);
+    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
+    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
 }
 
 static void pt_process_missed_ticks(struct periodic_time *pt)
@@ -543,8 +554,10 @@ void create_periodic_time(
     pt->cb = cb;
     pt->priv = data;
 
+    pt_vcpu_lock(v);
     pt->on_list = 1;
     list_add(&pt->list, &v->arch.hvm.tm_list);
+    pt_vcpu_unlock(v);
 
     init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
     set_timer(&pt->timer, pt->scheduled);
@@ -580,13 +593,26 @@ static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
         return;
 
     write_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+
+    if ( pt->vcpu == v )
+        goto out;
+
+    pt_vcpu_lock(pt->vcpu);
+    if ( pt->on_list )
+        list_del(&pt->list);
+    pt_vcpu_unlock(pt->vcpu);
+
     pt->vcpu = v;
+
+    pt_vcpu_lock(v);
     if ( pt->on_list )
     {
-        list_del(&pt->list);
         list_add(&pt->list, &v->arch.hvm.tm_list);
         migrate_timer(&pt->timer, v->processor);
     }
+    pt_vcpu_unlock(v);
+
+ out:
     write_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
 }
 
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 39d26cbda496..74c0cedd11cc 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -128,12 +128,18 @@ struct pl_time {    /* platform time */
     struct RTCState  vrtc;
     struct HPETState vhpet;
     struct PMTState  vpmt;
-    /*
-     * rwlock to prevent periodic_time vCPU migration. Take the lock in read
-     * mode in order to prevent the vcpu field of periodic_time from changing.
-     * Lock must be taken in write mode when changes to the vcpu field are
-     * performed, as it allows exclusive access to all the timers of a domain.
-     */
+     /*
+      * Functions which want to modify the vcpu field of the vpt need
+      * to hold the global lock (pt_migrate) in write mode together
+      * with the per-vcpu locks of the lists being modified. Functions
+      * that want to lock a periodic_timer that's possibly on a
+      * different vCPU list need to take the lock in read mode first in
+      * order to prevent the vcpu field of periodic_timer from
+      * changing.
+      *
+      * Note that two vcpu locks cannot be held at the same time to
+      * avoid a deadlock.
+      */
     rwlock_t pt_migrate;
     /* guest_time = Xen sys time + stime_offset */
     int64_t stime_offset;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock
  2021-03-29 21:15 [PATCH v3 0/2] Performance regression due to XSA-336 Boris Ostrovsky
  2021-03-29 21:15 ` [PATCH for-4.15 v3 1/2] x86/vpt: Do not take pt_migrate rwlock in some cases Boris Ostrovsky
@ 2021-03-29 21:15 ` Boris Ostrovsky
  2021-03-30  7:36   ` Roger Pau Monné
  2021-03-30 10:17 ` [PATCH v3 0/2] Performance regression due to XSA-336 Ian Jackson
  2 siblings, 1 reply; 7+ messages in thread
From: Boris Ostrovsky @ 2021-03-29 21:15 UTC (permalink / raw)
  To: xen-devel
  Cc: jbeulich, andrew.cooper3, roger.pau, wl, boris.ostrovsky,
	stephen.s.brennan, iwj

Make both create_periodic_time() and pt_adjust_vcpu() call
write_{un}lock with similar arguments.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
New patch.

I ended up doing what Jan asked --- create_periodic_time() is also using different
start pointers in lock() and unlock().


 xen/arch/x86/hvm/vpt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 560fab9cfc60..4cc0a0848bd7 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -592,7 +592,7 @@ static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
     if ( pt->vcpu == NULL )
         return;
 
-    write_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
 
     if ( pt->vcpu == v )
         goto out;
@@ -613,7 +613,7 @@ static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
     pt_vcpu_unlock(v);
 
  out:
-    write_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
 }
 
 void pt_adjust_global_vcpu_target(struct vcpu *v)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock
  2021-03-29 21:15 ` [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock Boris Ostrovsky
@ 2021-03-30  7:36   ` Roger Pau Monné
  2021-03-30 12:49     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Roger Pau Monné @ 2021-03-30  7:36 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: xen-devel, jbeulich, andrew.cooper3, wl, stephen.s.brennan, iwj

On Mon, Mar 29, 2021 at 05:15:02PM -0400, Boris Ostrovsky wrote:
> Make both create_periodic_time() and pt_adjust_vcpu() call
> write_{un}lock with similar arguments.
> 

Might be worth adding that this is not a functional change?

> Requested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Either way:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> New patch.
> 
> I ended up doing what Jan asked --- create_periodic_time() is also using different
> start pointers in lock() and unlock().

Hm, I'm not sure I'm following, create_periodic_time uses 'v' in both
write_{un}lock calls, which doesn't change across the function.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 0/2] Performance regression due to XSA-336
  2021-03-29 21:15 [PATCH v3 0/2] Performance regression due to XSA-336 Boris Ostrovsky
  2021-03-29 21:15 ` [PATCH for-4.15 v3 1/2] x86/vpt: Do not take pt_migrate rwlock in some cases Boris Ostrovsky
  2021-03-29 21:15 ` [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock Boris Ostrovsky
@ 2021-03-30 10:17 ` Ian Jackson
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Jackson @ 2021-03-30 10:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: xen-devel, jbeulich, andrew.cooper3, roger.pau, wl, stephen.s.brennan

Boris Ostrovsky writes ("[PATCH v3 0/2] Performance regression due to XSA-336"):
> The first patch addresses performance regression introduced by XSA-336 fixes.
> This patch could be considered as a candidate for inclusion in 4.15.

Thank you, but I think this is too late for 4.15.

Regards,
Ian.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock
  2021-03-30  7:36   ` Roger Pau Monné
@ 2021-03-30 12:49     ` Jan Beulich
  2021-03-30 14:22       ` Boris Ostrovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2021-03-30 12:49 UTC (permalink / raw)
  To: Roger Pau Monné, Boris Ostrovsky
  Cc: xen-devel, andrew.cooper3, wl, stephen.s.brennan, iwj

On 30.03.2021 09:36, Roger Pau Monné wrote:
> On Mon, Mar 29, 2021 at 05:15:02PM -0400, Boris Ostrovsky wrote:
>> Make both create_periodic_time() and pt_adjust_vcpu() call
>> write_{un}lock with similar arguments.

This makes it sound like you adjust both functions, but really
you bring the latter in line with the former. Would you mind me
adjusting the wording along these lines while (and when)
committing?

> Might be worth adding that this is not a functional change?
> 
>> Requested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> 
> Either way:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
>> ---
>> New patch.
>>
>> I ended up doing what Jan asked --- create_periodic_time() is also using different
>> start pointers in lock() and unlock().
> 
> Hm, I'm not sure I'm following, create_periodic_time uses 'v' in both
> write_{un}lock calls, which doesn't change across the function.

I guess Boris merely meant to express that there's already precedent?

Jan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock
  2021-03-30 12:49     ` Jan Beulich
@ 2021-03-30 14:22       ` Boris Ostrovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Boris Ostrovsky @ 2021-03-30 14:22 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, andrew.cooper3, wl, stephen.s.brennan, iwj


On 3/30/21 8:49 AM, Jan Beulich wrote:
> On 30.03.2021 09:36, Roger Pau Monné wrote:
>> On Mon, Mar 29, 2021 at 05:15:02PM -0400, Boris Ostrovsky wrote:
>>> Make both create_periodic_time() and pt_adjust_vcpu() call
>>> write_{un}lock with similar arguments.
> This makes it sound like you adjust both functions, but really
> you bring the latter in line with the former. Would you mind me
> adjusting the wording along these lines while (and when)
> committing?


Yes, please.


>
>> Might be worth adding that this is not a functional change?
>>
>>> Requested-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Either way:
>>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>>> ---
>>> New patch.
>>>
>>> I ended up doing what Jan asked --- create_periodic_time() is also using different
>>> start pointers in lock() and unlock().
>> Hm, I'm not sure I'm following, create_periodic_time uses 'v' in both
>> write_{un}lock calls, which doesn't change across the function.
> I guess Boris merely meant to express that there's already precedent?


Yes, that's what I wanted to say. But clearly not what I actually said.


-boris



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-30 14:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-29 21:15 [PATCH v3 0/2] Performance regression due to XSA-336 Boris Ostrovsky
2021-03-29 21:15 ` [PATCH for-4.15 v3 1/2] x86/vpt: Do not take pt_migrate rwlock in some cases Boris Ostrovsky
2021-03-29 21:15 ` [PATCH v3 2/2] x86/vpt: Simplify locking argument to write_{un}lock Boris Ostrovsky
2021-03-30  7:36   ` Roger Pau Monné
2021-03-30 12:49     ` Jan Beulich
2021-03-30 14:22       ` Boris Ostrovsky
2021-03-30 10:17 ` [PATCH v3 0/2] Performance regression due to XSA-336 Ian Jackson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).