* [Xen-devel] [PATCH 0/2] vTSC performance improvements
@ 2019-12-13 22:48 Igor Druzhinin
  2019-12-13 22:48 ` [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters Igor Druzhinin
  2019-12-13 22:48 ` [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock Igor Druzhinin
  0 siblings, 2 replies; 12+ messages in thread
From: Igor Druzhinin @ 2019-12-13 22:48 UTC
  To: xen-devel; +Cc: andrew.cooper3, Igor Druzhinin, wl, jbeulich, roger.pau

In our PV shim testing we've noticed constant lockups of guests with a
high number of vCPUs assigned, usually happening when another guest is
running on the same host. Reproducing the problem manually and dumping
the shim state immediately showed that most of the vCPUs were stuck on
vtsc_lock. As a PV shim guest always gets an emulated TSC (L1 Xen
itself is not provided with ITSC), the ideal solution would be to try
dropping the lock entirely.
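
For reference, the contended path (simplified from xen/arch/x86/time.c
as it stands before this series) funnels every emulated RDTSC on every
vCPU through a single per-domain spinlock:

    uint64_t pv_soft_rdtsc(const struct vcpu *v,
                           const struct cpu_user_regs *regs)
    {
        s_time_t now = get_s_time();
        struct domain *d = v->domain;

        spin_lock(&d->arch.vtsc_lock);       /* serialises all vCPUs */

        if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
            d->arch.vtsc_last = now;         /* time moved forward */
        else
            now = ++d->arch.vtsc_last;       /* keep the result monotonic */

        spin_unlock(&d->arch.vtsc_lock);

        return gtime_to_gtsc(d, now);
    }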

Igor Druzhinin (2):
  x86/time: drop vtsc_{kern,user}count debug counters
  x86/time: update vtsc_last with cmpxchg and drop vtsc_lock

 xen/arch/x86/domain.c        |  1 -
 xen/arch/x86/hvm/hvm.c       | 32 ++------------------------------
 xen/arch/x86/time.c          | 28 ++++++----------------------
 xen/include/asm-x86/domain.h |  5 -----
 4 files changed, 8 insertions(+), 58 deletions(-)

-- 
2.7.4



* [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters
  2019-12-13 22:48 [Xen-devel] [PATCH 0/2] vTSC performance improvements Igor Druzhinin
@ 2019-12-13 22:48 ` Igor Druzhinin
  2019-12-16  9:47   ` Roger Pau Monné
  2019-12-13 22:48 ` [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock Igor Druzhinin
  1 sibling, 1 reply; 12+ messages in thread
From: Igor Druzhinin @ 2019-12-13 22:48 UTC
  To: xen-devel; +Cc: andrew.cooper3, Igor Druzhinin, wl, jbeulich, roger.pau

They either need to be transformed to atomics to work correctly
(currently they left unprotected for HVM domains) or dropped entirely
as taking a per-domain spinlock is too expensive for high-vCPU count
domains even for debug build given this lock is taken too often.

Choose the latter as they are not extremely important anyway.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/hvm/hvm.c       | 32 ++------------------------------
 xen/arch/x86/time.c          | 12 ------------
 xen/include/asm-x86/domain.h |  4 ----
 3 files changed, 2 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 47573f7..614ed60 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3405,37 +3405,9 @@ int hvm_vmexit_cpuid(struct cpu_user_regs *regs, unsigned int inst_len)
     return hvm_monitor_cpuid(inst_len, leaf, subleaf);
 }
 
-static uint64_t _hvm_rdtsc_intercept(void)
-{
-    struct vcpu *curr = current;
-#if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-    struct domain *currd = curr->domain;
-
-    if ( currd->arch.vtsc )
-        switch ( hvm_guest_x86_mode(curr) )
-        {
-        case 8:
-        case 4:
-        case 2:
-            if ( unlikely(hvm_get_cpl(curr)) )
-            {
-        case 1:
-                currd->arch.vtsc_usercount++;
-                break;
-            }
-            /* fall through */
-        case 0:
-            currd->arch.vtsc_kerncount++;
-            break;
-        }
-#endif
-
-    return hvm_get_guest_tsc(curr);
-}
-
 void hvm_rdtsc_intercept(struct cpu_user_regs *regs)
 {
-    msr_split(regs, _hvm_rdtsc_intercept());
+    msr_split(regs, hvm_get_guest_tsc(current));
 
     HVMTRACE_2D(RDTSC, regs->eax, regs->edx);
 }
@@ -3464,7 +3436,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         break;
 
     case MSR_IA32_TSC:
-        *msr_content = _hvm_rdtsc_intercept();
+        *msr_content = hvm_get_guest_tsc(v);
         break;
 
     case MSR_IA32_TSC_ADJUST:
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 27a3a10..216169a 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2135,13 +2135,6 @@ uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
 
     spin_lock(&d->arch.vtsc_lock);
 
-#if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-    if ( guest_kernel_mode(v, regs) )
-        d->arch.vtsc_kerncount++;
-    else
-        d->arch.vtsc_usercount++;
-#endif
-
     if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
         d->arch.vtsc_last = now;
     else
@@ -2318,11 +2311,6 @@ static void dump_softtsc(unsigned char key)
             printk(",khz=%"PRIu32, d->arch.tsc_khz);
         if ( d->arch.incarnation )
             printk(",inc=%"PRIu32, d->arch.incarnation);
-#if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-        if ( d->arch.vtsc_kerncount | d->arch.vtsc_usercount )
-            printk(",vtsc count: %"PRIu64" kernel,%"PRIu64" user",
-                   d->arch.vtsc_kerncount, d->arch.vtsc_usercount);
-#endif
         printk("\n");
         domcnt++;
     }
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 212303f..3780287 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -374,10 +374,6 @@ struct arch_domain
                                      hardware TSC scaling cases */
     uint32_t incarnation;    /* incremented every restore or live migrate
                                 (possibly other cases in the future */
-#if !defined(NDEBUG) || defined(CONFIG_PERF_COUNTERS)
-    uint64_t vtsc_kerncount;
-    uint64_t vtsc_usercount;
-#endif
 
     /* Pseudophysical e820 map (XENMEM_memory_map).  */
     spinlock_t e820_lock;
-- 
2.7.4
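
[ For comparison, the rejected alternative mentioned in the commit
  message (a sketch of ours, not code proposed in this series, and
  glossing over Xen's atomic_t being 32 bits wide where the counters
  are uint64_t) would have been something like:

      if ( guest_kernel_mode(v, regs) )
          atomic_inc(&d->arch.vtsc_kerncount);    /* lock-free update */
      else
          atomic_inc(&d->arch.vtsc_usercount);

  Even then, bouncing a shared cache line on every emulated RDTSC is
  exactly the kind of cost this series is trying to avoid. ]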



* [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-13 22:48 [Xen-devel] [PATCH 0/2] vTSC performance improvements Igor Druzhinin
  2019-12-13 22:48 ` [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters Igor Druzhinin
@ 2019-12-13 22:48 ` Igor Druzhinin
  2019-12-16 10:00   ` Roger Pau Monné
  1 sibling, 1 reply; 12+ messages in thread
From: Igor Druzhinin @ 2019-12-13 22:48 UTC
  To: xen-devel; +Cc: andrew.cooper3, Igor Druzhinin, wl, jbeulich, roger.pau

Now that vtsc_last is the only entity protected by vtsc_lock, we can
simply update it using a single atomic operation and drop the spinlock
entirely. This is extremely important for the case of running nested
(e.g. a shim instance with lots of vCPUs assigned): if preemption
happens somewhere inside the critical section, every other vCPU
immediately stops progressing (and is probably preempted as well)
while waiting for the spinlock to be freed.

This fixes constant shim guest boot lockups with ~32 vCPUs if there is
vCPU overcommit present (which increases the likelihood of preemption).

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 xen/arch/x86/domain.c        |  1 -
 xen/arch/x86/time.c          | 16 ++++++----------
 xen/include/asm-x86/domain.h |  1 -
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index bed19fc..94531be 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -539,7 +539,6 @@ int arch_domain_create(struct domain *d,
     INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
 
     spin_lock_init(&d->arch.e820_lock);
-    spin_lock_init(&d->arch.vtsc_lock);
 
     /* Minimal initialisation for the idle domain. */
     if ( unlikely(is_idle_domain(d)) )
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 216169a..202446f 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2130,19 +2130,15 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
 
 uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
 {
-    s_time_t now = get_s_time();
+    s_time_t old, new, now = get_s_time();
     struct domain *d = v->domain;
 
-    spin_lock(&d->arch.vtsc_lock);
-
-    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
-        d->arch.vtsc_last = now;
-    else
-        now = ++d->arch.vtsc_last;
-
-    spin_unlock(&d->arch.vtsc_lock);
+    do {
+        old = d->arch.vtsc_last;
+        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
+    } while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );
 
-    return gtime_to_gtsc(d, now);
+    return gtime_to_gtsc(d, new);
 }
 
 bool clocksource_is_tsc(void)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 3780287..e4da373 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -364,7 +364,6 @@ struct arch_domain
     int tsc_mode;            /* see include/asm-x86/time.h */
     bool_t vtsc;             /* tsc is emulated (may change after migrate) */
     s_time_t vtsc_last;      /* previous TSC value (guarantee monotonicity) */
-    spinlock_t vtsc_lock;
     uint64_t vtsc_offset;    /* adjustment for save/restore/migrate */
     uint32_t tsc_khz;        /* cached guest khz for certain emulated or
                                 hardware TSC scaling cases */
-- 
2.7.4
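
[ As a self-contained illustration of the technique (our sketch using
  GCC's __atomic builtins rather than Xen's cmpxchg(); the names are
  made up), a lock-free monotonic timestamp looks like this:

      #include <stdint.h>

      static int64_t vtsc_last;   /* hypothetical shared state */

      /* Return a strictly increasing stamp: use 'now' if time moved
       * forward, otherwise hand out last + 1, retrying on contention. */
      static int64_t monotonic_stamp(int64_t now)
      {
          int64_t old = __atomic_load_n(&vtsc_last, __ATOMIC_RELAXED);
          int64_t val;

          do
              val = now > old ? now : old + 1;
          /* On failure 'old' is refreshed with the current value. */
          while ( !__atomic_compare_exchange_n(&vtsc_last, &old, val,
                                               false, __ATOMIC_RELAXED,
                                               __ATOMIC_RELAXED) );

          return val;
      }

  Two callers can never observe time going backwards: a compare-exchange
  either succeeds against the exact value the stamp was computed from,
  or fails and the loop recomputes against a strictly newer value. ]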



* Re: [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters
  2019-12-13 22:48 ` [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters Igor Druzhinin
@ 2019-12-16  9:47   ` Roger Pau Monné
  2019-12-16 14:24     ` Andrew Cooper
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2019-12-16  9:47 UTC
  To: Igor Druzhinin; +Cc: xen-devel, wl, jbeulich, andrew.cooper3

On Fri, Dec 13, 2019 at 10:48:01PM +0000, Igor Druzhinin wrote:
> They either need to be transformed to atomics to work correctly
> (currently they left unprotected for HVM domains) or dropped entirely
                  ^ are used
> as taking a per-domain spinlock is too expensive for high-vCPU count
> domains even for debug build given this lock is taken too often.
> 
> Choose the latter as they are not extremely important anyway.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

I don't find those counters especially useful TBH, but I'm not sure whether
others do. The change LGTM, so:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-13 22:48 ` [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock Igor Druzhinin
@ 2019-12-16 10:00   ` Roger Pau Monné
  2019-12-16 11:21     ` Jan Beulich
  2019-12-16 12:53     ` Igor Druzhinin
  0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monné @ 2019-12-16 10:00 UTC
  To: Igor Druzhinin; +Cc: xen-devel, wl, jbeulich, andrew.cooper3

On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
> Now that vtsc_last is the only entity protected by vtsc_lock, we can
> simply update it using a single atomic operation and drop the spinlock
> entirely. This is extremely important for the case of running nested
> (e.g. a shim instance with lots of vCPUs assigned): if preemption
> happens somewhere inside the critical section, every other vCPU
> immediately stops progressing (and is probably preempted as well)
> while waiting for the spinlock to be freed.
> 
> This fixes constant shim guest boot lockups with ~32 vCPUs if there is
> vCPU overcommit present (which increases the likelihood of preemption).
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> [...]
> @@ -2130,19 +2130,15 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
>  
>  uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
>  {
> -    s_time_t now = get_s_time();
> +    s_time_t old, new, now = get_s_time();
>      struct domain *d = v->domain;
>  
> -    spin_lock(&d->arch.vtsc_lock);
> -
> -    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
> -        d->arch.vtsc_last = now;
> -    else
> -        now = ++d->arch.vtsc_last;
> -
> -    spin_unlock(&d->arch.vtsc_lock);
> +    do {
> +        old = d->arch.vtsc_last;
> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;

Why do you need to do this subtraction? Isn't it easier to just do:

new = now > d->arch.vtsc_last ? now : old + 1;

That avoids the cast and the subtraction.

> +    } while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );

I'm not sure if the following would be slightly better performance
wise:

do {
    old = d->arch.vtsc_last;
    if ( d->arch.vtsc_last >= now )
    {
        new = atomic_inc_return(&d->arch.vtsc_last);
        break;
    }
    else
        new = now;
} while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );

In any case I'm fine with your version using cmpxchg exclusively.

Thanks, Roger.


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 10:00   ` Roger Pau Monné
@ 2019-12-16 11:21     ` Jan Beulich
  2019-12-16 12:30       ` Roger Pau Monné
  2019-12-16 12:53     ` Igor Druzhinin
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2019-12-16 11:21 UTC
  To: Roger Pau Monné; +Cc: Igor Druzhinin, andrew.cooper3, wl, xen-devel

On 16.12.2019 11:00, Roger Pau Monné wrote:
> On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
>> [...]
>> @@ -2130,19 +2130,15 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
>>  
>>  uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
>>  {
>> -    s_time_t now = get_s_time();
>> +    s_time_t old, new, now = get_s_time();
>>      struct domain *d = v->domain;
>>  
>> -    spin_lock(&d->arch.vtsc_lock);
>> -
>> -    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
>> -        d->arch.vtsc_last = now;
>> -    else
>> -        now = ++d->arch.vtsc_last;
>> -
>> -    spin_unlock(&d->arch.vtsc_lock);
>> +    do {
>> +        old = d->arch.vtsc_last;
>> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
> 
> Why do you need to do this subtraction? Isn't it easier to just do:
> 
> new = now > d->arch.vtsc_last ? now : old + 1;

This wouldn't be reliable when the TSC wraps. Remember that firmware
may set the TSC, and it has been seen to be set to very large
(effectively negative, if they were signed quantities) values, which
will then eventually wrap (whereas we're not typically concerned about
64-bit counters wrapping when they start from zero).

Jan
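
[ To illustrate the wrap concern with unsigned quantities (our example,
  not from the thread): with a raw 64-bit counter started near the top
  of its range, the difference-based test still orders values correctly
  across the wrap, while a plain comparison does not:

      uint64_t last = 0xfffffffffffffff0ULL;  /* set near the wrap */
      uint64_t now  = 0x10;                   /* shortly after it  */

      /* now > last            == false: wrongly says 'now' is older */
      /* (int64_t)(now - last) == 0x20 > 0:  correct ordering        */

  This is the usual serial-number-arithmetic trick and holds as long as
  the two values are less than 2^63 apart. As the thread goes on to
  establish, it is moot here because get_s_time() counts nanoseconds
  from boot. ]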


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 11:21     ` Jan Beulich
@ 2019-12-16 12:30       ` Roger Pau Monné
  2019-12-16 12:45         ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2019-12-16 12:30 UTC
  To: Jan Beulich; +Cc: Igor Druzhinin, andrew.cooper3, wl, xen-devel

On Mon, Dec 16, 2019 at 12:21:09PM +0100, Jan Beulich wrote:
> On 16.12.2019 11:00, Roger Pau Monné wrote:
> > On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
> >> [...]
> >> +    do {
> >> +        old = d->arch.vtsc_last;
> >> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
> > 
> > Why do you need to do this subtraction? Isn't it easier to just do:
> > 
> > new = now > d->arch.vtsc_last ? now : old + 1;
> 
> This wouldn't be reliable when the TSC wraps. Remember that firmware
> may set the TSC, and it has been seen to be set to very large
> (effectively negative, if they were signed quantities) values,

s_time_t is a signed value AFAICT (s64).

> which
> will then eventually wrap (whereas we're not typically concerned about
> 64-bit counters wrapping when they start from zero).

But get_s_time returns the system time in ns since boot, not the TSC
value, hence it will start from 0 and we shouldn't be concerned about
wraps?

Thanks, Roger.


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 12:30       ` Roger Pau Monné
@ 2019-12-16 12:45         ` Jan Beulich
  2019-12-16 12:55           ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2019-12-16 12:45 UTC
  To: Roger Pau Monné; +Cc: Igor Druzhinin, andrew.cooper3, wl, xen-devel

On 16.12.2019 13:30, Roger Pau Monné wrote:
> On Mon, Dec 16, 2019 at 12:21:09PM +0100, Jan Beulich wrote:
>> On 16.12.2019 11:00, Roger Pau Monné wrote:
>>> On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
>>>> [...]
>>>> @@ -2130,19 +2130,15 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
>>>>  
>>>>  uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
>>>>  {
>>>> -    s_time_t now = get_s_time();
>>>> +    s_time_t old, new, now = get_s_time();
>>>>      struct domain *d = v->domain;
>>>>  
>>>> -    spin_lock(&d->arch.vtsc_lock);
>>>> -
>>>> -    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
>>>> -        d->arch.vtsc_last = now;
>>>> -    else
>>>> -        now = ++d->arch.vtsc_last;
>>>> -
>>>> -    spin_unlock(&d->arch.vtsc_lock);
>>>> +    do {
>>>> +        old = d->arch.vtsc_last;
>>>> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
>>>
>>> Why do you need to do this subtraction? Isn't it easier to just do:
>>>
>>> new = now > d->arch.vtsc_last ? now : old + 1;
>>
>> This wouldn't be reliable when the TSC wraps. Remember that firmware
>> may set the TSC, and it has been seen to be set to very large
>> (effectively negative, if they were signed quantities) values,
> 
> s_time_t is a signed value AFAICT (s64).

Oh, I should have looked at types, rather than inferring uint64_t
in particular for something like vtsc_last.

>> which
>> will then eventually wrap (whereas we're not typically concerned about
>> 64-bit counters wrapping when they start from zero).
> 
> But get_s_time returns the system time in ns since boot, not the TSC
> value, hence it will start from 0 and we shouldn't be concerned about
> wraps?

Good point, seeing that all parts here are s_time_t. Of course
with all parts being so, there's indeed no need for the cast,
but comparing both values is then equivalent to comparing the
difference against zero.

Jan


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 10:00   ` Roger Pau Monné
  2019-12-16 11:21     ` Jan Beulich
@ 2019-12-16 12:53     ` Igor Druzhinin
  2019-12-16 12:57       ` Roger Pau Monné
  1 sibling, 1 reply; 12+ messages in thread
From: Igor Druzhinin @ 2019-12-16 12:53 UTC
  To: Roger Pau Monné; +Cc: xen-devel, wl, jbeulich, andrew.cooper3

On 16/12/2019 10:00, Roger Pau Monné wrote:
> On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
>> [...]
>> @@ -2130,19 +2130,15 @@ u64 gtsc_to_gtime(struct domain *d, u64 tsc)
>>  
>>  uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
>>  {
>> -    s_time_t now = get_s_time();
>> +    s_time_t old, new, now = get_s_time();
>>      struct domain *d = v->domain;
>>  
>> -    spin_lock(&d->arch.vtsc_lock);
>> -
>> -    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
>> -        d->arch.vtsc_last = now;
>> -    else
>> -        now = ++d->arch.vtsc_last;
>> -
>> -    spin_unlock(&d->arch.vtsc_lock);
>> +    do {
>> +        old = d->arch.vtsc_last;
>> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
> 
> Why do you need to do this subtraction? Isn't it easier to just do:
> 
> new = now > d->arch.vtsc_last ? now : old + 1;
> 
> That avoids the cast and the subtraction.

I'm afraid I fell into the same trap as Jan. Given they are both signed, I
will change it in v2.

>> +    } while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );
> 
> I'm not sure if the following would be slightly better performance
> wise:
> 
> do {
>     old = d->arch.vtsc_last;
>     if ( d->arch.vtsc_last >= now )
>     {
>         new = atomic_inc_return(&d->arch.vtsc_last);
>         break;
>     }
>     else
>         new = now;
> } while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );
> 
> In any case I'm fine with your version using cmpxchg exclusively.

That could be marginally better (atomic increment usually performs better
than cmpxchg), but it took me some time to work out that there is no hidden
race here. I'd like a third opinion on whether it's worth changing.

Igor
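
[ Spelling out the interleaving that needs checking (our annotation):
  suppose CPU A reads vtsc_last == 100 with now == 50 and so takes the
  atomic-increment branch, while CPU B races with now == 200:

      A: atomic_inc_return()        ->  vtsc_last == 101, A returns 101
      B: cmpxchg(old=100, new=200)  ->  fails, rereads, stores 200

  or, with the order reversed:

      B: cmpxchg(old=100, new=200)  ->  vtsc_last == 200
      A: atomic_inc_return()        ->  vtsc_last == 201, A returns 201

  In both orders vtsc_last only moves forward and each caller gets a
  distinct value, so monotonicity and uniqueness hold. ]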


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 12:45         ` Jan Beulich
@ 2019-12-16 12:55           ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2019-12-16 12:55 UTC
  To: Jan Beulich; +Cc: Igor Druzhinin, andrew.cooper3, wl, xen-devel

On Mon, Dec 16, 2019 at 01:45:10PM +0100, Jan Beulich wrote:
> On 16.12.2019 13:30, Roger Pau Monné wrote:
> > On Mon, Dec 16, 2019 at 12:21:09PM +0100, Jan Beulich wrote:
> >> On 16.12.2019 11:00, Roger Pau Monné wrote:
> >>> On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
> >>>>  uint64_t pv_soft_rdtsc(const struct vcpu *v, const struct cpu_user_regs *regs)
> >>>>  {
> >>>> -    s_time_t now = get_s_time();
> >>>> +    s_time_t old, new, now = get_s_time();
> >>>>      struct domain *d = v->domain;
> >>>>  
> >>>> -    spin_lock(&d->arch.vtsc_lock);
> >>>> -
> >>>> -    if ( (int64_t)(now - d->arch.vtsc_last) > 0 )
> >>>> -        d->arch.vtsc_last = now;
> >>>> -    else
> >>>> -        now = ++d->arch.vtsc_last;
> >>>> -
> >>>> -    spin_unlock(&d->arch.vtsc_lock);
> >>>> +    do {
> >>>> +        old = d->arch.vtsc_last;
> >>>> +        new = (int64_t)(now - d->arch.vtsc_last) > 0 ? now : old + 1;
> >>>
> >>> Why do you need to do this subtraction? Isn't it easier to just do:
> >>>
> >>> new = now > d->arch.vtsc_last ? now : old + 1;
> >>
> >> This wouldn't be reliable when the TSC wraps. Remember that firmware
> >> may set the TSC, and it has been seen to be set to very large
> >> (effectively negative, if they were signed quantities) values,
> > 
> > s_time_t is a signed value AFAICT (s64).
> 
> Oh, I should have looked at types, rather than inferring uint64_t
> in particular for something like vtsc_last.
> 
> >> which
> >> will then eventually wrap (whereas we're not typically concerned of
> >> 64-bit counters wrapping when they start from zero).
> > 
> > But get_s_time returns the system time in ns since boot, not the TSC
> > value, hence it will start from 0 and we shouldn't be concerned about
> > wraps?
> 
> Good point, seeing that all parts here are s_time_t. Of course
> with all parts being so, there's indeed no need for the cast,
> but comparing both values is then equivalent to comparing the
> difference against zero.

Right, I just think it's easier to compare the two values directly than
to compare their difference against zero (and likely less expensive in
terms of performance).

Anyway, I prefer comparing the values, but the difference-based form is
also correct and I would be fine with it as long as the cast is
dropped.

Thanks, Roger.


* Re: [Xen-devel] [PATCH 2/2] x86/time: update vtsc_last with cmpxchg and drop vtsc_lock
  2019-12-16 12:53     ` Igor Druzhinin
@ 2019-12-16 12:57       ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2019-12-16 12:57 UTC
  To: Igor Druzhinin; +Cc: xen-devel, wl, jbeulich, andrew.cooper3

On Mon, Dec 16, 2019 at 12:53:40PM +0000, Igor Druzhinin wrote:
> On 16/12/2019 10:00, Roger Pau Monné wrote:
> > On Fri, Dec 13, 2019 at 10:48:02PM +0000, Igor Druzhinin wrote:
> > I'm not sure if the following would be slightly better performance
> > wise:
> > 
> > do {
> >     old = d->arch.vtsc_last;
> >     if ( d->arch.vtsc_last >= now )
> >     {
> >         new = atomic_inc_return(&d->arch.vtsc_last);
> >         break;
> >     }
> >     else
> >         new = now;
> > } while ( cmpxchg(&d->arch.vtsc_last, old, new) != old );
> > 
> > In any case I'm fine with your version using cmpxchg exclusively.
> 
> That could be marginally better (atomic increment usually performs better
> than cmpxchg), but it took me some time to work out that there is no hidden
> race here. I'd like a third opinion on whether it's worth changing.

Anyway, your proposed approach using cmpxchg is fine IMO; we can leave
the atomic increment for a future improvement if there's a need for
it.

Thanks, Roger.


* Re: [Xen-devel] [PATCH 1/2] x86/time: drop vtsc_{kern, user}count debug counters
  2019-12-16  9:47   ` Roger Pau Monné
@ 2019-12-16 14:24     ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2019-12-16 14:24 UTC
  To: Roger Pau Monné, Igor Druzhinin; +Cc: xen-devel, wl, jbeulich

On 16/12/2019 09:47, Roger Pau Monné wrote:
> On Fri, Dec 13, 2019 at 10:48:01PM +0000, Igor Druzhinin wrote:
>> They either need to be transformed to atomics to work correctly
>> (currently they left unprotected for HVM domains) or dropped entirely
>                   ^ are used
>> as taking a per-domain spinlock is too expensive for high-vCPU count
>> domains even for debug build given this lock is taken too often.
>>
>> Choose the latter as they are not extremely important anyway.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> I don't find those counters especially useful TBH, but I'm not sure whether
> others do. The change LGTM, so:
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Jan and I already considered dropping them once (because of the HVM
observation), but saw no harm in keeping them, seeing as they were
diagnostic-only.

I suspect they were put in for PVRDTSCP, which has since been dropped.

We now have a clear case where dropping them is of use to Xen.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

