* Xen Performance Results
From: Sergej Proskurin @ 2017-12-15 15:14 UTC (permalink / raw)
  To: Xen-devel

Hi all,

I have a question concerning a 'correct' Xen configuration for measuring
performance, as I am currently experiencing quite unexpected behavior.

My overall setup comprises a system based on the Skylake
microarchitecture, with Debian Buster and Linux kernel 4.13.16 running on
top of Xen v4.8. For performance measurements, I use the Phoronix Test
Suite v7.6.0 and SPECint 2017. I compare the results of the test suites
run in a Xen domU with results measured natively (with the
"performance" CPU governor on bare metal). Since my test case requires
the performance measurements to run on only one CPU, I limit the Linux
running on bare metal to a single CPU (maxcpus=1). I do the same with
Xen and additionally pin the domU to the same CPU that runs dom0, so as
to measure the entire overhead that comes with Xen.
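
For illustration, a configuration along these lines would match the
setup described above (the exact mechanism and CPU numbers are only an
example, not necessarily the precise settings used):

  # Bare metal: boot Linux with a single CPU (kernel command line)
  maxcpus=1

  # Xen: give dom0 a single vCPU and pin it (Xen command line)
  dom0_max_vcpus=1 dom0_vcpus_pin

  # domU config (xl.cfg): one vCPU, pinned to the same physical CPU as dom0
  vcpus = 1
  cpus  = "0"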

The odd thing is that the resulting CPU-intensive performance
measurements on Xen seem to be partially faster than on bare metal. The
affected measurements inside the domU seem to be between ~6% and ~8%
faster than the ones measured on bare metal. Normally, I would say this
is a caching issue, but the results are quite stable. BTW: there is no
such issue when running KVM.

My first assumption was that the benchmark suites make use of the TSC,
which might have been incorrectly adjusted by the hypervisor. Yet, after
playing around with the domain config option "tsc_mode" (and also the
options "no_migrate" and "timer_mode"), I did not observe any changes.
Besides, since the performance benchmarks seemingly use wallclock time,
the TSC would not affect the stated timing issues anyway. Thus, I have
set up the domU to use the same local time as dom0 (localtime=1), while
at the same time dom0 uses NTP. Nevertheless, the tests showed the same
results as before.
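
For reference, the relevant domU settings (xl.cfg) look roughly like
this; the concrete tsc_mode value shown is just one of the settings
that were tried:

  tsc_mode  = "native"   # other valid values include "default" and "always_emulate"
  localtime = 1          # domU wall clock follows dom0's local time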

In my opinion, it is rather odd that (despite the virtualization
overhead of Xen) the domU appears to be faster than the same Linux
kernel running on bare metal. Thus, I wanted to ask whether you have any
advice regarding the stated issue, as I am sure that I am missing a
configuration option (also, I can't be the only one experiencing such
behavior, yet I did not find any useful hints on the Internet).

It would be great if you could help me with this :) Thank you very much
in advance :)

Thanks and best regards,

~Sergej





* Re: Xen Performance Results
From: Sergej Proskurin @ 2017-12-21 14:42 UTC (permalink / raw)
  To: xen-devel

Hi all,

For the sake of completeness: the solution to the issue stated in my
last email was to deactivate Intel's Turbo Boost technology directly in
the UEFI (deactivating Turbo Boost through xenpm was not enough).
Apparently, Turbo Boost affects Linux and KVM differently than Xen,
which led to the phenomenon in which the benchmark execution on Xen
appeared faster than on bare metal.
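
For anyone running into the same thing, this is roughly what is meant
above; only the firmware setting made a difference here (the sysfs path
applies to the intel_pstate driver on bare metal):

  # Under Xen -- tried first, but not sufficient in this case:
  xenpm disable-turbo-mode      # optionally per CPU, e.g. "xenpm disable-turbo-mode 0"

  # On bare-metal Linux with intel_pstate, for comparison:
  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

  # What actually helped: disabling Turbo Boost in the UEFI setup itself.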

Thanks,

~Sergej


On 12/15/2017 04:14 PM, Sergej Proskurin wrote:
> Hi all,
>
> I have a question concerning a 'correct' Xen configuration for measuring
> performance, as I am currently experiencing quite unexpected behavior.
>
> My overall setup comprises a system based on the Skylake
> microarchitecture, with Debian Buster and Linux kernel 4.13.16 running on
> top of Xen v4.8. For performance measurements, I use the Phoronix Test
> Suite v7.6.0 and SPECint 2017. I compare the results of the test suites
> run in a Xen domU with results measured natively (with the
> "performance" CPU governor on bare metal). Since my test case requires
> the performance measurements to run on only one CPU, I limit the Linux
> running on bare metal to a single CPU (maxcpus=1). I do the same with
> Xen and additionally pin the domU to the same CPU that runs dom0, so as
> to measure the entire overhead that comes with Xen.
>
> The odd thing is that the resulting CPU-intensive performance
> measurements on Xen seem to be partially faster than on bare metal. The
> affected measurements inside the domU seem to be between ~6% and ~8%
> faster than the ones measured on bare metal. Normally, I would say this
> is a caching issue, but the results are quite stable. BTW: there is no
> such issue when running KVM.
>
> My first assumption was that the benchmark suites make use of the TSC,
> which might have been incorrectly adjusted by the hypervisor. Yet, after
> playing around with the domain config option "tsc_mode" (and also the
> options "no_migrate" and "timer_mode"), I did not observe any changes.
> Besides, since the performance benchmarks seemingly use wallclock time,
> the TSC would not affect the stated timing issues anyway. Thus, I have
> set up the domU to use the same local time as dom0 (localtime=1), while
> at the same time dom0 uses NTP. Nevertheless, the tests showed the same
> results as before.
>
> In my opinion, it is rather odd that (despite the virtualization
> overhead of Xen) the domU appears to be faster than the same Linux
> kernel running on bare metal. Thus, I wanted to ask whether you have any
> advice regarding the stated issue, as I am sure that I am missing a
> configuration option (also, I can't be the only one experiencing such
> behavior, yet I did not find any useful hints on the Internet).
>
> It would be great if you could help me with this :) Thank you very much
> in advance :)
>
> Thanks and best regards,
>
> ~Sergej
>
>
>
>

-- 
Sergej Proskurin, M.Sc.
Wissenschaftlicher Mitarbeiter

Technische Universität München
Fakultät für Informatik
Lehrstuhl für Sicherheit in der Informatik

Boltzmannstraße 3
85748 Garching (bei München)

Tel. +49 (0)89 289-18592
Fax +49 (0)89 289-18579




* Re: Xen Performance Results
From: George Dunlap @ 2017-12-22 10:26 UTC (permalink / raw)
  To: Sergej Proskurin; +Cc: xen-devel

On Thu, Dec 21, 2017 at 2:42 PM, Sergej Proskurin
<proskurin@sec.in.tum.de> wrote:
> Hi all,
>
> For the sake of completeness: the solution to the issue stated in my
> last email was to deactivate Intel's Turbo Boost technology directly in
> the UEFI (deactivating Turbo Boost through xenpm was not enough).
> Apparently, Turbo Boost affects Linux and KVM differently than Xen,
> which led to the phenomenon in which the benchmark execution on Xen
> appeared faster than on bare metal.

*Appeared* faster than on bare metal, or *was* faster than on bare metal?

If the source of the change was the Turbo Boost, it's entirely
possible that the difference is due to the placement of workers on
cpus -- i.e., that Linux's bare metal scheduler makes a worse choice
for this particular workload than Xen's scheduler does.
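
One way to sanity-check that, as a sketch (adjust names and CPU numbers
to the setup at hand):

  # Bare metal: pin the benchmark so the Linux scheduler cannot move it
  taskset -c 0 <benchmark command>

  # Xen: see where dom0's and the domU's vCPUs actually end up
  xl vcpu-list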

 -George


* Re: Xen Performance Results
From: Sergej Proskurin @ 2017-12-22 10:41 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

Hi George,

Thank you for your reply.

On 12/22/2017 11:26 AM, George Dunlap wrote:
> On Thu, Dec 21, 2017 at 2:42 PM, Sergej Proskurin
> <proskurin@sec.in.tum.de> wrote:
>> Hi all,
>>
>> For the sake of completeness: the solution to the issue stated in my
>> last email was to deactivate Intel's Turbo Boost technology directly in
>> the UEFI (deactivating Turbo Boost through xenpm was not enough).
>> Apparently, Turbo Boost affects Linux and KVM differently than Xen,
>> which led to the phenomenon in which the benchmark execution on Xen
>> appeared faster than on bare metal.
> 
> *Appeared* faster than on bare metal, or *was* faster than on bare metal?
> 
> If the source of the change was the Turbo Boost, it's entirely
> possible that the difference is due to the placement of workers on
> cpus -- i.e., that Linux's bare metal scheduler makes a worse choice
> for this particular workload than Xen's scheduler does.
> 

Given that for this particular benchmark I configured both dom0 and the
domU to use only one core (in fact, I have pinned both domains to the
same physical core), I do not believe that the performance increase was
due to better placement on all available CPUs. However, I absolutely
agree that there might be a difference in handling Turbo Boost between
Linux and Xen.
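
For concreteness, pinning of this kind can be expressed as follows (the
domain name and pCPU number are illustrative, not necessarily the exact
commands used):

  xl vcpu-pin Domain-0 0 0
  xl vcpu-pin <domU name> 0 0
  xl vcpu-list                 # confirm that both vCPUs sit on the same pCPU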

Thanks,
~Sergej


* Re: Xen Performance Results
From: George Dunlap @ 2017-12-22 12:18 UTC (permalink / raw)
  To: Sergej Proskurin; +Cc: xen-devel

On Fri, Dec 22, 2017 at 10:41 AM, Sergej Proskurin
<proskurin@sec.in.tum.de> wrote:
> Hi George,
>
> Thank you for your reply.
>
> On 12/22/2017 11:26 AM, George Dunlap wrote:
>> On Thu, Dec 21, 2017 at 2:42 PM, Sergej Proskurin
>> <proskurin@sec.in.tum.de> wrote:
>>> Hi all,
>>>
>>> For the sake of completeness: the solution to the issue stated in my
>>> last email was to deactivate Intel's Turbo Boost technology directly in
>>> the UEFI (deactivating Turbo Boost through xenpm was not enough).
>>> Apparently, Turbo Boost affects Linux and KVM differently than Xen,
>>> which led to the phenomenon in which the benchmark execution on Xen
>>> appeared faster than on bare metal.
>>
>> *Appeared* faster than on bare metal, or *was* faster than on bare metal?
>>
>> If the source of the change was the Turbo Boost, it's entirely
>> possible that the difference is due to the placement of workers on
>> cpus -- i.e., that Linux's bare metal scheduler makes a worse choice
>> for this particular workload than Xen's scheduler does.
>>
>
> Given that for this particular benchmark I configured both dom0 and the
> domU to use only one core (in fact, I have pinned both domains to the
> same physical core), I do not believe that the performance increase was
> due to better placement on all available CPUs. However, I absolutely
> agree that there might be a difference in handling Turbo Boost between
> Linux and Xen.

in which case, *that* may actually be your problem. :-)

I've got a "scheduler microbenchmark" program that I wrote that has
unikernel workloads doing simplistic "burn / sleep" cycles.  There's a
certain point -- somewhere between 50% busy and 80% busy -- where
adding more work actually *improves* the performance of existing
workloads. Presumably when the cpu is busy more of the time, the
microcode puts it into a higher performance state.
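
As a toy illustration of such a burn/sleep cycle (not the actual
microbenchmark, just the idea of a fixed duty cycle):

  # Burn the CPU for ~800 ms, then idle for ~200 ms, i.e. roughly 80% busy
  while true; do
      timeout 0.8 sh -c 'while :; do :; done'
      sleep 0.2
  done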

If you pinned them to different sockets, you might actually get a
result more in line with your expectations.

 -George


* Re: Xen Performance Results
From: Sergej Proskurin @ 2017-12-22 13:53 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel



On 12/22/2017 01:18 PM, George Dunlap wrote:
> On Fri, Dec 22, 2017 at 10:41 AM, Sergej Proskurin
> <proskurin@sec.in.tum.de> wrote:
>> Hi George,
>>
>> Thank you for your reply.
>>
>> On 12/22/2017 11:26 AM, George Dunlap wrote:
>>> On Thu, Dec 21, 2017 at 2:42 PM, Sergej Proskurin
>>> <proskurin@sec.in.tum.de> wrote:
>>>> Hi all,
>>>>
>>>> For the sake of completeness: the solution to the issue stated in my
>>>> last email was to deactivate Intel's Turbo Boost technology directly in
>>>> the UEFI (deactivating Turbo Boost through xenpm was not enough).
>>>> Apparently, Turbo Boost affects Linux and KVM differently than Xen,
>>>> which led to the phenomenon in which the benchmark execution on Xen
>>>> appeared faster than on bare metal.
>>>
>>> *Appeared* faster than on bare metal, or *was* faster than on bare metal?
>>>
>>> If the source of the change was the Turbo Boost, it's entirely
>>> possible that the difference is due to the placement of workers on
>>> cpus -- i.e., that Linux's bare metal scheduler makes a worse choice
>>> for this particular workload than Xen's scheduler does.
>>>
>>
>> Given that for this particular benchmark I configured both dom0 and the
>> domU to use only one core (in fact, I have pinned both domains to the
>> same physical core), I do not believe that the performance increase was
>> due to better placement on all available CPUs. However, I absolutely
>> agree that there might be a difference in handling Turbo Boost between
>> Linux and Xen.
> 
> in which case, *that* may actually be your problem. :-)
> 
> I've got a "scheduler microbenchmark" program that I wrote that has
> unikernel workloads doing simplistic "burn / sleep" cycles.  There's a
> certain point -- somewhere between 50% busy and 80% busy -- where
> adding more work actually *improves* the performance of existing
> workloads. Presumably when the cpu is busy more of the time, the
> microcode puts it into a higher performance state.
> 
> If you pinned them to different sockets, you might actually get a
> result more in line with your expectations.
> 
>  

I see your point. Yet, I just ran the benchmark with the domU pinned to
a different physical core. Unfortunately, the results were the same as
before. Nevertheless, it is a good hint, thank you :)
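
For reference, "pinned to a different physical core" here means
something like the following (the pCPU number is illustrative):

  xl vcpu-pin <domU name> 0 2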

Besides, I also believe that if the additional CPU load caused by dom0
running on the same core had been the issue, we would have seen a
similar effect with KVM, since in my experiment the KVM guest runs on
the same physical core as the underlying Linux host.
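
For the KVM comparison, pinning a guest to a given physical core can be
done with something along these lines (purely illustrative; the exact
commands depend on how the guest is started):

  taskset -c 0 qemu-system-x86_64 ...   # plain QEMU/KVM process
  virsh vcpupin <domain> 0 0            # libvirt-managed guest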

Thanks,
~Sergej

