xen-devel.lists.xenproject.org archive mirror
* [SPAM] A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Song Xiang @ 2010-11-13 10:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Haibo Chen



Hi all,

This is a patch to arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and
Xen 4.0.1 that mitigates the heavy contention on handle_pmt_io when
running an HVM guest configured with 32 cores on a 48-core HP machine
(8 x 6-core 2.4GHz AMD Opteron chips).

We used seven applications to profile a guest HVM (Debian GNU/Linux
5.0, kernel version 2.6.35-rc5) on Xen 4.0.0 and Xen 4.0.1, and found
that all of them encountered heavy contention on the spinlock taken
inside handle_pmt_io in Xen.

The patch is a workaround that eliminates the contention in
handle_pmt_io. Since the virtual time must stay fresh, someone has to
keep updating it; but there is no need for a VCPU to update the
virtual time while another VCPU is already doing so. The update can
therefore be skipped when a VCPU finds someone else updating the
virtual time. The patch substitutes spin_trylock for spin_lock to
check whether someone is already holding the lock. If so, that holder
is refreshing the virtual time, and the others can simply skip the
update. Otherwise, the lock is acquired and the current VCPU updates
the virtual time.

The performance improvements of each application (running on a
32-core HVM with one thread/process pinned to each core) after
applying the patch to Xen 4.0.0 and Xen 4.0.1 are as follows:


                                  original    patched    improvement
gmake (sec)                          91.2       86.4         5.6%
phoenix-histogram (sec)             32.79      27.43        19.5%
phoenix-wordcount (sec)            279.22     232.85        19.9%
phoenix-linear_regression (sec)      2.57       2.4          7.2%
specjvm-compress (ops/min)         774.37     923.71        19.0%
specjvm-crypto (ops/min)           209.55     251.79        20.0%
specjvm-xml-validation (ops/min)   624.46     785.51        26.0%

			Performance of each application on a 32-core HVM on Xen 4.0.0


                                  original    patched    improvement
gmake (sec)                         89.04      85.93         3.6%
phoenix-histogram (sec)             42.63      28.27        50.8%
phoenix-wordcount (sec)            280.38     238.93        17.3%
phoenix-linear_regression (sec)      2.58       2.42         6.5%
specjvm-compress (ops/min)         751.33     923.84        23.0%
specjvm-crypto (ops/min)           209.33     243.28        16.2%
specjvm-xml-validation (ops/min)   620.41     772.25        24.5%

			Performance of each application on a 32-core HVM on Xen 4.0.1


For more details, please refer to our technical report:
http://ppi.fudan.edu.cn/_media/xen-profile-report.pdf?id=system_research_group&cache=cache


The patch is the same for Xen 4.0.0 and Xen 4.0.1:
Index: arch/x86/hvm/pmtimer.c
===================================================================
--- arch/x86/hvm/pmtimer.c	(revision 4651)
+++ arch/x86/hvm/pmtimer.c	(working copy)
@@ -206,10 +206,17 @@

      if ( dir == IOREQ_READ )
      {
-        spin_lock(&s->lock);
-        pmt_update_time(s);
-        *val = s->pm.tmr_val;
-        spin_unlock(&s->lock);
+        /*
+         * If we can acquire the PMTState lock, update the time here;
+         * otherwise another VCPU is updating it, so it is up to date.
+         */
+        if (spin_trylock(&s->lock)) {
+            pmt_update_time(s);
+            *val = s->pm.tmr_val;
+            spin_unlock(&s->lock);
+        }
+        else
+            *val = (s->pm.tmr_val & TMR_VAL_MASK);
          return X86EMUL_OKAY;
      }



* Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Jan Beulich @ 2010-11-15  8:28 UTC (permalink / raw)
  To: Song Xiang; +Cc: xen-devel, Haibo Chen

>>> On 13.11.10 at 11:56, Song Xiang <classicxsong@gmail.com> wrote:
> --- arch/x86/hvm/pmtimer.c	(revision 4651)
> +++ arch/x86/hvm/pmtimer.c	(working copy)
> @@ -206,10 +206,17 @@
> 
>       if ( dir == IOREQ_READ )
>       {
> -        spin_lock(&s->lock);
> -        pmt_update_time(s);
> -        *val = s->pm.tmr_val;
> -        spin_unlock(&s->lock);
> +        /*
> +         * If we can acquire the PMTState lock, update the time here;
> +         * otherwise another VCPU is updating it, so it is up to date.
> +         */
> +        if (spin_trylock(&s->lock)) {
> +            pmt_update_time(s);
> +            *val = s->pm.tmr_val;
> +            spin_unlock(&s->lock);
> +        }
> +        else
> +            *val = (s->pm.tmr_val & TMR_VAL_MASK);
>           return X86EMUL_OKAY;
>       }

I don't think this is correct: While it seems reasonable to skip the
global update, returning potentially stale time to the guest isn't.
You'd need to mimic what pmt_update_time() does, just without
updating global state. That, however, isn't going to be that
trivial as you need to also read global state for doing the
calculations.
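
As a rough, untested sketch (assuming the 4.0 shape of
pmt_update_time(); the helper name pmt_read_time() is made up), such
a read-only variant would be something like:

    /* Compute the current TMR value without writing any global state.
     * NB: reading s->last_gtime and s->pm.tmr_val without holding
     * s->lock is racy -- the pair can be observed mid-update -- which
     * is exactly why this is less trivial than it looks. */
    static uint32_t pmt_read_time(PMTState *s)
    {
        uint64_t curr_gtime = hvm_get_guest_time(s->vcpu);

        return (s->pm.tmr_val +
                (uint32_t)(((curr_gtime - s->last_gtime) * s->scale) >> 32))
               & TMR_VAL_MASK;
    }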

Jan


* Re: [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Keir Fraser @ 2010-11-16  7:48 UTC (permalink / raw)
  To: Song Xiang, Jan Beulich; +Cc: xen-devel, Haibo Chen

On 16/11/2010 14:51, "Song Xiang" <classicxsong@gmail.com> wrote:

> +        /*
> +         * If we can acquire the PMTState lock, update the time here;
> +         * otherwise another VCPU is updating it, so it is up to date.
> +         */
> +        tmp = atomic_read(&s->ownership);
> +        if (spin_trylock(&s->lock)) {
> +            pmt_update_time(s);
> +            *val = s->pm.tmr_val;
> +            spin_unlock(&s->lock);
> +            atomic_inc(&s->ownership);
> +        }
> +        else {
> +            while (tmp == atomic_read(&s->ownership));

You've kind of implemented a spin_barrier(). What you implemented could be
better and equivalently done as something like:

if (spin_trylock(&s->lock)) {
  ...
} else {
  spin_barrier(&s->lock);
}

No need for your new field at all! It initially seems weird that this
performs much better than the original code, but I guess it might: if all
VCPUs are piling in here at the same time, rather than having to execute one
by one, we'll have one go first and then the others will all execute
simultaneously read-only in convoy...
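
Spelled out against your hunk, the read path would then be something
like this (untested sketch; spin_barrier() as in xen/common/spinlock.c,
which returns once the lock has been observed free):

      if ( dir == IOREQ_READ )
      {
          if ( spin_trylock(&s->lock) )
          {
              /* We won the race: update the shared state ourselves. */
              pmt_update_time(s);
              *val = s->pm.tmr_val;
              spin_unlock(&s->lock);
          }
          else
          {
              /* Someone else is mid-update: wait for them to finish,
               * then the shared value is as fresh as ours would be. */
              spin_barrier(&s->lock);
              *val = s->pm.tmr_val;
          }
          return X86EMUL_OKAY;
      }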

Anyway, please modify your patch as I suggest above, and then also confirm
what speedups you get with the revised patch. I want to know what the wins
are before applying optimisation hacks. I don't particularly have an issue
with them as long as they work! :-)

 -- Keir


* [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Jan Beulich @ 2010-11-16  7:50 UTC (permalink / raw)
  To: Song Xiang; +Cc: xen-devel, Haibo Chen

>>> On 16.11.10 at 15:51, Song Xiang <classicxsong@gmail.com> wrote:
> We updated our patch as follows:
> 
> Index: arch/x86/hvm/pmtimer.c
> ===================================================================
> --- arch/x86/hvm/pmtimer.c
> +++ arch/x86/hvm/pmtimer.c
> @@ -206,13 +206,23 @@
> 
>       if ( dir == IOREQ_READ )
>       {
> -        spin_lock(&s->lock);
> -        pmt_update_time(s);
> -        *val = s->pm.tmr_val;
> -        spin_unlock(&s->lock);
> +        uint32_t tmp;
> +        /*
> +         * If we can acquire the PMTState lock, update the time here;
> +         * otherwise another VCPU is updating it, so it is up to date.
> +         */
> +        tmp = atomic_read(&s->ownership);
> +        if (spin_trylock(&s->lock)) {
> +            pmt_update_time(s);
> +            *val = s->pm.tmr_val;
> +            spin_unlock(&s->lock);
> +            atomic_inc(&s->ownership);
> +        }
> +        else {
> +            while (tmp == atomic_read(&s->ownership));
> +
> +            *val = s->pm.tmr_val;
> +        }
>           return X86EMUL_OKAY;
>       }
> 
> Index: include/asm-x86/hvm/vpt.h
> ===================================================================
> --- include/asm-x86/hvm/vpt.h
> +++ include/asm-x86/hvm/vpt.h
> @@ -120,6 +120,7 @@
>       uint64_t scale;             /* Multiplier to get from tsc to timer ticks */
>       struct timer timer;         /* To make sure we send SCIs */
>       spinlock_t lock;
> +    atomic_t ownership;
>   } PMTState;
> 
>   struct pl_time {    /* platform time */
> 
> 
> The key idea is that, to keep the returned time fresh, any VCPU that
> fails to acquire the PMTState lock waits until the PMTState lock
> holder has updated the global virtual time, and then returns the
> fresh value.
>
> We add a field in PMTState, named ownership. PMTState->ownership can
> only be updated by the PMTState lock holder. It is incremented after
> the lock holder has updated the global virtual time and released the
> PMTState lock. Other VCPUs that fail to acquire the PMTState lock
> spin in a while loop until PMTState->ownership changes. Once
> PMTState->ownership has changed, the global virtual time must be
> fresh and can be returned to the guest OS.
>
> The time returned to the guest by this patch is fresher than in the
> previous one we proposed.

That's not better: Depending on timing, the reader may well loop
until the *next* acquirer releases the lock (and increments
"ownership").

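Concretely, one such interleaving (vCPU R being the unlucky reader):

    /*
     * vCPU A: pmt_update_time(); spin_unlock(); atomic_inc() -> N+1
     * vCPU R: tmp = atomic_read() == N+1; spin_trylock() fails,
     *         because vCPU B has just taken the lock for the next
     *         update;
     * vCPU R: spins on (tmp == ownership) until B finishes and
     *         increments to N+2 -- i.e. R waits out an entire second
     *         update rather than just the one it lost the race to.
     */
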
Jan


* Re: [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Keir Fraser @ 2010-11-16  7:51 UTC (permalink / raw)
  To: Song Xiang, Jan Beulich; +Cc: xen-devel, Haibo Chen

On 16/11/2010 07:48, "Keir Fraser" <keir@xen.org> wrote:

> Anyway, please modify your patch as I suggest above, and then also confirm
> what speedups you get with the revised patch. I want to know what the wins
> are before applying optimisation hacks. I don't particularly have an issue
> with them as long as they work! :-)

Oh, also, if your patch is intended to be applied straight away, you also
need to always provide a suitable patch description and Signed-off-by
line(s) directly above the patch in your email. Otherwise I tend to silently
ignore the email.

 -- Keir



* Re: [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Jan Beulich @ 2010-11-16  7:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Song Xiang, xen-devel, Haibo Chen

>>> On 16.11.10 at 08:48, Keir Fraser <keir@xen.org> wrote:
> On 16/11/2010 14:51, "Song Xiang" <classicxsong@gmail.com> wrote:
> 
>> +        /*
>> +         * If we can acquire the PMTState lock, update the time here;
>> +         * otherwise another VCPU is updating it, so it is up to date.
>> +         */
>> +        tmp = atomic_read(&s->ownership);
>> +        if (spin_trylock(&s->lock)) {
>> +            pmt_update_time(s);
>> +            *val = s->pm.tmr_val;
>> +            spin_unlock(&s->lock);
>> +            atomic_inc(&s->ownership);
>> +        }
>> +        else {
>> +            while (tmp == atomic_read(&s->ownership));
> 
> You've kind of implemented a spin_barrier(). What you implemented could be
> better and equivalently done as something like:
> 
> if (spin_trylock(&s->lock)) {
>   ...
> } else {
>   spin_barrier(&s->lock);
> }
> 
> No need for your new field at all! It initially seems weird that this
> performs much better than the original code, but I guess it might: if all
> VCPUs are piling in here at the same time, rather than having to execute one
> by one, we'll have one go first and then the others will all execute
> simultaneously read-only in convoy...

They'd need to re-measure whether this still provides a meaningful
benefit, I think.

Jan


* Re: [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Keir Fraser @ 2010-11-16  8:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Song Xiang, xen-devel, Haibo Chen

On 16/11/2010 07:54, "Jan Beulich" <JBeulich@novell.com> wrote:

>> No need for your new field at all! It initially seems weird that this
>> performs much better than the original code, but I guess it might: if all
>> VCPUs are piling in here at the same time, rather than having to execute one
>> by one, we'll have one go first and then the others will all execute
>> simultaneously read-only in convoy...
> 
> They'd need to re-measure whether this still provides a meaningful
> benefit, I think.

Definitely. I explicitly asked for that.

Oh, also the approach is currently buggy. Although s->pm.tmr_val is
atomic to read, it is not atomically updated in pmt_update_time(). You
need to calculate the new value for tmr_val in a local variable and
then update the shared field with an atomic_set()-alike
*(volatile u32 *)&s->pm.tmr_val = x type of arrangement.
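
Something of this shape, say (a sketch only, against the 4.0 code,
untested; assumes the existing TMR_VAL_MSB / pmt_update_sci() logic
in pmtimer.c):

    static void pmt_update_time(PMTState *s)
    {
        uint64_t curr_gtime;
        uint32_t tmr_val = s->pm.tmr_val;     /* work on a local copy */
        uint32_t msb = tmr_val & TMR_VAL_MSB;

        /* Advance the local copy by the elapsed guest time. */
        curr_gtime = hvm_get_guest_time(s->vcpu);
        tmr_val += ((curr_gtime - s->last_gtime) * s->scale) >> 32;
        tmr_val &= TMR_VAL_MASK;
        s->last_gtime = curr_gtime;

        /* Publish with a single aligned 32-bit store, so that lock-free
         * readers can never observe a torn value. */
        *(volatile uint32_t *)&s->pm.tmr_val = tmr_val;

        /* If the counter's MSB changed, latch the status bit. */
        if ( (tmr_val & TMR_VAL_MSB) != msb )
        {
            s->pm.pm1a_sts |= TMR_STS;
            pmt_update_sci(s);
        }
    }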

For all this extra fragility, the perf win needs to be compelling.

 -- Keir


* [SPAM] Re: A patch to xen/arch/x86/hvm/pmtimer.c for both Xen 4.0.0 and Xen 4.0.1 to improve HVM scalability
From: Song Xiang @ 2010-11-16 14:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Haibo Chen

We updated our patch as follows:

Index: arch/x86/hvm/pmtimer.c
===================================================================
--- arch/x86/hvm/pmtimer.c
+++ arch/x86/hvm/pmtimer.c
@@ -206,13 +206,23 @@

      if ( dir == IOREQ_READ )
      {
-        spin_lock(&s->lock);
-        pmt_update_time(s);
-        *val = s->pm.tmr_val;
-        spin_unlock(&s->lock);
+        uint32_t tmp;
+        /*
+         * If we can acquire the PMTState lock, update the time here;
+         * otherwise another VCPU is updating it, so it is up to date.
+         */
+        tmp = atomic_read(&s->ownership);
+        if (spin_trylock(&s->lock)) {
+            pmt_update_time(s);
+            *val = s->pm.tmr_val;
+            spin_unlock(&s->lock);
+            atomic_inc(&s->ownership);
+        }
+        else {
+            while (tmp == atomic_read(&s->ownership));
+
+            *val = s->pm.tmr_val;
+        }
          return X86EMUL_OKAY;
      }

Index: include/asm-x86/hvm/vpt.h
===================================================================
--- include/asm-x86/hvm/vpt.h
+++ include/asm-x86/hvm/vpt.h
@@ -120,6 +120,7 @@
      uint64_t scale;             /* Multiplier to get from tsc to timer ticks */
      struct timer timer;         /* To make sure we send SCIs */
      spinlock_t lock;
+    atomic_t ownership;
  } PMTState;

  struct pl_time {    /* platform time */


The key idea is that, to keep the returned time fresh, any VCPU that
fails to acquire the PMTState lock waits until the PMTState lock
holder has updated the global virtual time, and then returns the
fresh value.

We add a field in PMTState, named ownership. PMTState->ownership can
only be updated by the PMTState lock holder. It is incremented after
the lock holder has updated the global virtual time and released the
PMTState lock. Other VCPUs that fail to acquire the PMTState lock
spin in a while loop until PMTState->ownership changes. Once
PMTState->ownership has changed, the global virtual time must be
fresh and can be returned to the guest OS.

The time returned to the guest by this patch is fresher than in the
previous one we proposed.



