* SVM: vmload/vmsave-free VM exits?
@ 2015-04-05  8:31 Jan Kiszka
  2015-04-05 17:12 ` Valentine Sinitsyn
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-05  8:31 UTC (permalink / raw)
  To: kvm, Jailhouse; +Cc: Valentine Sinitsyn

Hi,

studying the VM exit logic of Jailhouse, I was wondering when AMD's
vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
these instructions unconditionally. However, I think both only need
GS.base, i.e. the per-cpu base address, to be saved and restored if
neither a user space exit nor a CPU migration is involved (both
conditions always hold for Jailhouse). Xen avoids vmload/vmsave on
lightweight exits, but it also still uses rsp-based per-cpu variables.

So the question boils down to what is generally faster:

A) vmload
   vmrun
   vmsave

B) wrmsrl(MSR_GS_BASE, guest_gs_base)
   vmrun
   rdmsrl(MSR_GS_BASE, guest_gs_base)

Of course, KVM also has to take into account that heavyweight exits
still require vmload/vmsave and thus become more expensive with B) due
to the additional MSR accesses.

Any thoughts or results of previous experiments?

Jan

-- 
You received this message because you are subscribed to the Google Groups "Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jailhouse-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-05  8:31 SVM: vmload/vmsave-free VM exits? Jan Kiszka
@ 2015-04-05 17:12 ` Valentine Sinitsyn
  2015-04-07  5:43   ` Jan Kiszka
  0 siblings, 1 reply; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-05 17:12 UTC (permalink / raw)
  To: Jan Kiszka, kvm, Jailhouse

Hi Jan,

On 05.04.2015 13:31, Jan Kiszka wrote:
> studying the VM exit logic of Jailhouse, I was wondering when AMD's
> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
> these instructions unconditionally. However, I think both only need
> GS.base, i.e. the per-cpu base address, to be saved and restored if no
> user space exit or no CPU migration is involved (both is always true for
> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
> still uses rsp-based per-cpu variables.
>
> So the question boils down to what is generally faster:
>
> A) vmload
>     vmrun
>     vmsave
>
> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>     vmrun
>     rdmsrl(MSR_GS_BASE, guest_gs_base)
>
> Of course, KVM also has to take into account that heavyweight exits
> still require vmload/vmsave, thus become more expensive with B) due to
> the additional MSR accesses.
>
> Any thoughts or results of previous experiments?
That's a good question; I also thought about it when I was finalizing
the Jailhouse AMD port. I tried "lightweight exits" with apic-demo, but
it didn't seem to affect the latency in any noticeable way. That's why I
decided not to push the patch (in fact, I was even unable to find it now).

Note, however, that how AMD chips store host state during VM switches is
implementation-specific. I did my quick experiments on one CPU only, so
your mileage may vary.

Regarding your question, I feel B will be faster anyway, but again I'm
afraid the gain could be within the statistical error of the experiment.

Valentine


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-05 17:12 ` Valentine Sinitsyn
@ 2015-04-07  5:43   ` Jan Kiszka
  2015-04-07  6:10     ` Valentine Sinitsyn
  2015-04-13  7:01     ` Jan Kiszka
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Kiszka @ 2015-04-07  5:43 UTC (permalink / raw)
  To: Valentine Sinitsyn, kvm, Jailhouse

[-- Attachment #1: Type: text/plain, Size: 2300 bytes --]

On 2015-04-05 19:12, Valentine Sinitsyn wrote:
> Hi Jan,
> 
> On 05.04.2015 13:31, Jan Kiszka wrote:
>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>> these instructions unconditionally. However, I think both only need
>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>> user space exit or no CPU migration is involved (both is always true for
>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>> still uses rsp-based per-cpu variables.
>>
>> So the question boils down to what is generally faster:
>>
>> A) vmload
>>     vmrun
>>     vmsave
>>
>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>     vmrun
>>     rdmsrl(MSR_GS_BASE, guest_gs_base)
>>
>> Of course, KVM also has to take into account that heavyweight exits
>> still require vmload/vmsave, thus become more expensive with B) due to
>> the additional MSR accesses.
>>
>> Any thoughts or results of previous experiments?
> That's a good question, I also thought about it when I was finalizing
> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
> didn't seem to affect the latency in any noticeable way. That's why I
> decided not to push the patch (in fact, I was even unable to find it now).
> 
> Note however that how AMD chips store host state during VM switches are
> implementation-specific. I did my quick experiments on one CPU only, so
> your mileage may vary.
> 
> Regarding your question, I feel B will be faster anyways but again I'm
> afraid that the gain could be within statistical error of the experiment.

It is faster: by at least 160 cycles with hot caches on an AMD A6-5200
APU, and more like 600 if the caches are colder (I added some usleep to
each loop iteration in the test).

I've tested via vmmcall from guest userspace under Jailhouse; KVM should
be adjustable in a similar way. The benchmark is attached, and the patch
will be in the Jailhouse next branch soon. We need to check more CPU
types, though.

Jan


[-- Attachment #2: vmexit-bench.c --]
[-- Type: text/x-csrc, Size: 1220 bytes --]

/*
 * VM exit benchmark using a hypercall
 *
 * Copyright (c) Siemens AG, 2015
 *
 * Authors:
 *  Jan Kiszka <jan.kiszka@siemens.com>
 *
 * This work is licensed under the terms of the GNU GPL, version 2.  See
 * the COPYING file in the top-level directory.
 */

#ifndef __x86_64__
#error only x86-64 supported
#endif

#include <stdbool.h>
#include <stdio.h>

#define LOOPS			1000000

#define X86_FEATURE_VMX		(1UL << 5)

static inline unsigned long cpuid_ecx(void)
{
	unsigned long eax, val;

	/* EAX must be listed as an output: CPUID overwrites it. */
	asm volatile("cpuid"
		     : "=a" (eax), "=c" (val)
		     : "0" (1)
		     : "ebx", "edx");
	return val;
}

static inline __attribute__((always_inline)) unsigned long long read_tsc(void)
{
	unsigned long long hi, lo;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return (hi << 32) | lo;
}

int main(int argc, char *argv[])
{
	bool use_vmcall = !!(cpuid_ecx() & X86_FEATURE_VMX);
	unsigned long long start, sum = 0;
	unsigned int n;

	for (n = 0; n < LOOPS; n++) {
		if (use_vmcall) {
			start = read_tsc();
			asm volatile("vmcall" : : "a" (-1));
			sum += read_tsc() - start;
		} else {
			start = read_tsc();
			asm volatile("vmmcall" : : "a" (-1));
			sum += read_tsc() - start;
		}
	}
	printf("Null hypercall: %llu cycles\n", sum / LOOPS);

	return 0;
}


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  5:43   ` Jan Kiszka
@ 2015-04-07  6:10     ` Valentine Sinitsyn
  2015-04-07  6:13       ` Jan Kiszka
  2015-04-13  7:01     ` Jan Kiszka
  1 sibling, 1 reply; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-07  6:10 UTC (permalink / raw)
  To: Jan Kiszka, kvm, Jailhouse

Hi Jan,

On 07.04.2015 10:43, Jan Kiszka wrote:
> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>> Hi Jan,
>>
>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>> these instructions unconditionally. However, I think both only need
>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>> user space exit or no CPU migration is involved (both is always true for
>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>> still uses rsp-based per-cpu variables.
>>>
>>> So the question boils down to what is generally faster:
>>>
>>> A) vmload
>>>      vmrun
>>>      vmsave
>>>
>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>      vmrun
>>>      rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>
>>> Of course, KVM also has to take into account that heavyweight exits
>>> still require vmload/vmsave, thus become more expensive with B) due to
>>> the additional MSR accesses.
>>>
>>> Any thoughts or results of previous experiments?
>> That's a good question, I also thought about it when I was finalizing
>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>> didn't seem to affect the latency in any noticeable way. That's why I
>> decided not to push the patch (in fact, I was even unable to find it now).
>>
>> Note however that how AMD chips store host state during VM switches are
>> implementation-specific. I did my quick experiments on one CPU only, so
>> your mileage may vary.
>>
>> Regarding your question, I feel B will be faster anyways but again I'm
>> afraid that the gain could be within statistical error of the experiment.
>
> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
> towards 600 if they are colder (added some usleep to each loop in the test).
Great, thanks. Could you post absolute numbers, i.e. how long A and B
take on your CPU?

Valentine


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  6:10     ` Valentine Sinitsyn
@ 2015-04-07  6:13       ` Jan Kiszka
  2015-04-07  6:19         ` Valentine Sinitsyn
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-07  6:13 UTC (permalink / raw)
  To: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-07 08:10, Valentine Sinitsyn wrote:
> Hi Jan,
> 
> On 07.04.2015 10:43, Jan Kiszka wrote:
>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>> Hi Jan,
>>>
>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>> these instructions unconditionally. However, I think both only need
>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>> user space exit or no CPU migration is involved (both is always true
>>>> for
>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>>> still uses rsp-based per-cpu variables.
>>>>
>>>> So the question boils down to what is generally faster:
>>>>
>>>> A) vmload
>>>>      vmrun
>>>>      vmsave
>>>>
>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>      vmrun
>>>>      rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>
>>>> Of course, KVM also has to take into account that heavyweight exits
>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>> the additional MSR accesses.
>>>>
>>>> Any thoughts or results of previous experiments?
>>> That's a good question, I also thought about it when I was finalizing
>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>> didn't seem to affect the latency in any noticeable way. That's why I
>>> decided not to push the patch (in fact, I was even unable to find it
>>> now).
>>>
>>> Note however that how AMD chips store host state during VM switches are
>>> implementation-specific. I did my quick experiments on one CPU only, so
>>> your mileage may vary.
>>>
>>> Regarding your question, I feel B will be faster anyways but again I'm
>>> afraid that the gain could be within statistical error of the
>>> experiment.
>>
>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>> towards 600 if they are colder (added some usleep to each loop in the
>> test).
> Great, thanks. Could you post absolute numbers, i.e how long do A and B
> take on your CPU?

A is around 1910 cycles, B about 1750.

Jan


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  6:13       ` Jan Kiszka
@ 2015-04-07  6:19         ` Valentine Sinitsyn
  2015-04-07  6:23           ` Jan Kiszka
  0 siblings, 1 reply; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-07  6:19 UTC (permalink / raw)
  To: Jan Kiszka, kvm, Jailhouse

On 07.04.2015 11:13, Jan Kiszka wrote:
>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>> towards 600 if they are colder (added some usleep to each loop in the
>>> test).
>> Great, thanks. Could you post absolute numbers, i.e how long do A and B
>> take on your CPU?
>
> A is around 1910 cycles, B about 1750.
That's with hot caches, I guess? Not bad anyway; it's a pity I didn't
observe this myself and didn't include the optimization from day one.

Valentine


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  6:19         ` Valentine Sinitsyn
@ 2015-04-07  6:23           ` Jan Kiszka
  2015-04-07  6:29             ` Valentine Sinitsyn
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-07  6:23 UTC (permalink / raw)
  To: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-07 08:19, Valentine Sinitsyn wrote:
> On 07.04.2015 11:13, Jan Kiszka wrote:
>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>> test).
>>> Great, thanks. Could you post absolute numbers, i.e how long do A and B
>>> take on your CPU?
>>
>> A is around 1910 cycles, B about 1750.
> It's with hot caches I guess? Not bad anyways, it's a pity I didn't
> observe this and didn't include this optimization from the day one.

Yes, that is with the unmodified benchmark I sent. When I add, say,
usleep(1000) to the loop body, the cycle count jumps to about 4k (IIRC).

BTW, this is the Jailhouse patch:
https://github.com/siemens/jailhouse/commit/dbf2fe479ac07a677462dfa87e008e37a4e72858

Jan


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  6:23           ` Jan Kiszka
@ 2015-04-07  6:29             ` Valentine Sinitsyn
  2015-04-07  6:35               ` Jan Kiszka
  0 siblings, 1 reply; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-07  6:29 UTC (permalink / raw)
  To: Jan Kiszka, kvm, Jailhouse

On 07.04.2015 11:23, Jan Kiszka wrote:
> On 2015-04-07 08:19, Valentine Sinitsyn wrote:
>> On 07.04.2015 11:13, Jan Kiszka wrote:
>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>> test).
>>>> Great, thanks. Could you post absolute numbers, i.e how long do A and B
>>>> take on your CPU?
>>>
>>> A is around 1910 cycles, B about 1750.
>> It's with hot caches I guess? Not bad anyways, it's a pity I didn't
>> observe this and didn't include this optimization from the day one.
>
> Yes, that is with the unmodified benchmark I sent. When I add, say
> usleep(1000) to that loop body, the cycles jumped to 4k (IIRC).
>
> BTW, this is the Jailhouse patch:
> https://github.com/siemens/jailhouse/commit/dbf2fe479ac07a677462dfa87e008e37a4e72858
I guess it's getting off-topic here, but wouldn't it be cleaner to
simply use wrmsr and rdmsr instead of vmload and vmsave in svm-vmexit.S?
This would require fewer changes and would keep all entry/exit setup
code in one place.

Valentine


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  6:29             ` Valentine Sinitsyn
@ 2015-04-07  6:35               ` Jan Kiszka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kiszka @ 2015-04-07  6:35 UTC (permalink / raw)
  To: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-07 08:29, Valentine Sinitsyn wrote:
> On 07.04.2015 11:23, Jan Kiszka wrote:
>> On 2015-04-07 08:19, Valentine Sinitsyn wrote:
>>> On 07.04.2015 11:13, Jan Kiszka wrote:
>>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU,
>>>>>> more
>>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>>> test).
>>>>> Great, thanks. Could you post absolute numbers, i.e how long do A
>>>>> and B
>>>>> take on your CPU?
>>>>
>>>> A is around 1910 cycles, B about 1750.
>>> It's with hot caches I guess? Not bad anyways, it's a pity I didn't
>>> observe this and didn't include this optimization from the day one.
>>
>> Yes, that is with the unmodified benchmark I sent. When I add, say
>> usleep(1000) to that loop body, the cycles jumped to 4k (IIRC).
>>
>> BTW, this is the Jailhouse patch:
>> https://github.com/siemens/jailhouse/commit/dbf2fe479ac07a677462dfa87e008e37a4e72858
>>
> I guess, it's getting off-topic here, but wouldn't it be cleaner to
> simply use wrmsr and rdmsr instead of vmload and vmsave in svm-vmexit.S?
> This would require less changes and will keep all entry/exit setup code
> in one place.

It's a tradeoff between assembly lines and C statements. My feeling is
that it's easier done in C, but you can prove me wrong.
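
For illustration, the C side of such a lightweight exit path might look
like the sketch below. The privileged operations are stubbed out so the
control flow can be compiled and followed in user space; all names here
are illustrative, not Jailhouse's actual API:

```c
#include <stdint.h>

#define MSR_GS_BASE 0xc0000101u

/* User-space stubs standing in for the privileged wrmsr/rdmsr and
 * vmrun; a real hypervisor would use the actual instructions. */
static uint64_t fake_gs_base;

static void wrmsrl(uint32_t msr, uint64_t val) { (void)msr; fake_gs_base = val; }
static uint64_t rdmsrl(uint32_t msr) { (void)msr; return fake_gs_base; }
static void vmrun(void) { /* guest runs until the next #VMEXIT */ }

struct vcpu_state {
	uint64_t guest_gs_base;
	uint64_t host_gs_base;
};

/* Variant B from this thread: swap GS.base by hand around vmrun
 * instead of executing vmload/vmsave. */
static void lightweight_vm_entry(struct vcpu_state *v)
{
	wrmsrl(MSR_GS_BASE, v->guest_gs_base);   /* instead of vmload */
	vmrun();                                 /* #VMEXIT returns here */
	v->guest_gs_base = rdmsrl(MSR_GS_BASE);  /* instead of vmsave */
	wrmsrl(MSR_GS_BASE, v->host_gs_base);    /* restore per-cpu base */
}
```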

Jan


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-07  5:43   ` Jan Kiszka
  2015-04-07  6:10     ` Valentine Sinitsyn
@ 2015-04-13  7:01     ` Jan Kiszka
  2015-04-13 17:29       ` Avi Kivity
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-13  7:01 UTC (permalink / raw)
  To: Joel Schopp, Avi Kivity; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-07 07:43, Jan Kiszka wrote:
> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>> Hi Jan,
>>
>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>> these instructions unconditionally. However, I think both only need
>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>> user space exit or no CPU migration is involved (both is always true for
>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>> still uses rsp-based per-cpu variables.
>>>
>>> So the question boils down to what is generally faster:
>>>
>>> A) vmload
>>>     vmrun
>>>     vmsave
>>>
>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>     vmrun
>>>     rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>
>>> Of course, KVM also has to take into account that heavyweight exits
>>> still require vmload/vmsave, thus become more expensive with B) due to
>>> the additional MSR accesses.
>>>
>>> Any thoughts or results of previous experiments?
>> That's a good question, I also thought about it when I was finalizing
>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>> didn't seem to affect the latency in any noticeable way. That's why I
>> decided not to push the patch (in fact, I was even unable to find it now).
>>
>> Note however that how AMD chips store host state during VM switches are
>> implementation-specific. I did my quick experiments on one CPU only, so
>> your mileage may vary.
>>
>> Regarding your question, I feel B will be faster anyways but again I'm
>> afraid that the gain could be within statistical error of the experiment.
> 
> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
> towards 600 if they are colder (added some usleep to each loop in the test).
> 
> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
> be adjustable in a similar way. Attached the benchmark, patch will be in
> the Jailhouse next branch soon. We need to check more CPU types, though.

Avi, I found some preparatory patches of yours from 2010 [1]. Do you
happen to remember whether it was never completed for a technical reason?

Joel, can you comment on the benefit of variant B) for the various AMD
CPUs? Is it always positive?

Thanks,
Jan

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61455

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13  7:01     ` Jan Kiszka
@ 2015-04-13 17:29       ` Avi Kivity
  2015-04-13 17:35         ` Jan Kiszka
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2015-04-13 17:29 UTC (permalink / raw)
  To: Jan Kiszka, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 04/13/2015 10:01 AM, Jan Kiszka wrote:
> On 2015-04-07 07:43, Jan Kiszka wrote:
>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>> Hi Jan,
>>>
>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>> these instructions unconditionally. However, I think both only need
>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>> user space exit or no CPU migration is involved (both is always true for
>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>>> still uses rsp-based per-cpu variables.
>>>>
>>>> So the question boils down to what is generally faster:
>>>>
>>>> A) vmload
>>>>      vmrun
>>>>      vmsave
>>>>
>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>      vmrun
>>>>      rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>
>>>> Of course, KVM also has to take into account that heavyweight exits
>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>> the additional MSR accesses.
>>>>
>>>> Any thoughts or results of previous experiments?
>>> That's a good question, I also thought about it when I was finalizing
>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>> didn't seem to affect the latency in any noticeable way. That's why I
>>> decided not to push the patch (in fact, I was even unable to find it now).
>>>
>>> Note however that how AMD chips store host state during VM switches are
>>> implementation-specific. I did my quick experiments on one CPU only, so
>>> your mileage may vary.
>>>
>>> Regarding your question, I feel B will be faster anyways but again I'm
>>> afraid that the gain could be within statistical error of the experiment.
>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>> towards 600 if they are colder (added some usleep to each loop in the test).
>>
>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>> be adjustable in a similar way. Attached the benchmark, patch will be in
>> the Jailhouse next branch soon. We need to check more CPU types, though.
> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
> happen to remember if it was never completed for a technical reason?

IIRC, I came to the conclusion that it was impossible.  Something about 
TR.size not receiving a reasonable value.  Let me see.

> Joel, can you comment on the benefit of variant B) for the various AMD
> CPUs? Is it always positive?
>
> Thanks,
> Jan
>
> [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/61455
>


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:29       ` Avi Kivity
@ 2015-04-13 17:35         ` Jan Kiszka
  2015-04-13 17:41           ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-13 17:35 UTC (permalink / raw)
  To: Avi Kivity, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-13 19:29, Avi Kivity wrote:
> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>> Hi Jan,
>>>>
>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>> these instructions unconditionally. However, I think both only need
>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>>> user space exit or no CPU migration is involved (both is always
>>>>> true for
>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>>>> still uses rsp-based per-cpu variables.
>>>>>
>>>>> So the question boils down to what is generally faster:
>>>>>
>>>>> A) vmload
>>>>>      vmrun
>>>>>      vmsave
>>>>>
>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>      vmrun
>>>>>      rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>
>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>>> the additional MSR accesses.
>>>>>
>>>>> Any thoughts or results of previous experiments?
>>>> That's a good question, I also thought about it when I was finalizing
>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>>> didn't seem to affect the latency in any noticeable way. That's why I
>>>> decided not to push the patch (in fact, I was even unable to find it
>>>> now).
>>>>
>>>> Note however that how AMD chips store host state during VM switches are
>>>> implementation-specific. I did my quick experiments on one CPU only, so
>>>> your mileage may vary.
>>>>
>>>> Regarding your question, I feel B will be faster anyways but again I'm
>>>> afraid that the gain could be within statistical error of the
>>>> experiment.
>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>> towards 600 if they are colder (added some usleep to each loop in the
>>> test).
>>>
>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>>> be adjustable in a similar way. Attached the benchmark, patch will be in
>>> the Jailhouse next branch soon. We need to check more CPU types, though.
>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>> happen to remember if it was never completed for a technical reason?
> 
> IIRC, I came to the conclusion that it was impossible.  Something about
> TR.size not receiving a reasonable value.  Let me see.

To my understanding, TR doesn't play a role until we leave ring 0 again.
Or what could make the CPU look for any of the fields in the 64-bit TSS
before that?

Jan


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:35         ` Jan Kiszka
@ 2015-04-13 17:41           ` Avi Kivity
  2015-04-13 17:48             ` Avi Kivity
  2015-04-14  6:39             ` Valentine Sinitsyn
  0 siblings, 2 replies; 20+ messages in thread
From: Avi Kivity @ 2015-04-13 17:41 UTC (permalink / raw)
  To: Jan Kiszka, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 04/13/2015 08:35 PM, Jan Kiszka wrote:
> On 2015-04-13 19:29, Avi Kivity wrote:
>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>> Hi Jan,
>>>>>
>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>>> these instructions unconditionally. However, I think both only need
>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored if no
>>>>>> user space exit or no CPU migration is involved (both is always
>>>>>> true for
>>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it also
>>>>>> still uses rsp-based per-cpu variables.
>>>>>>
>>>>>> So the question boils down to what is generally faster:
>>>>>>
>>>>>> A) vmload
>>>>>>       vmrun
>>>>>>       vmsave
>>>>>>
>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>       vmrun
>>>>>>       rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>
>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>> still require vmload/vmsave, thus become more expensive with B) due to
>>>>>> the additional MSR accesses.
>>>>>>
>>>>>> Any thoughts or results of previous experiments?
>>>>> That's a good question, I also thought about it when I was finalizing
>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>>>> didn't seem to affect the latency in any noticeable way. That's why I
>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>> now).
>>>>>
>>>>> Note however that how AMD chips store host state during VM switches are
>>>>> implementation-specific. I did my quick experiments on one CPU only, so
>>>>> your mileage may vary.
>>>>>
>>>>> Regarding your question, I feel B will be faster anyways but again I'm
>>>>> afraid that the gain could be within statistical error of the
>>>>> experiment.
>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>> test).
>>>>
>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM should
>>>> be adjustable in a similar way. Attached the benchmark, patch will be in
>>>> the Jailhouse next branch soon. We need to check more CPU types, though.
>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>> happen to remember if it was never completed for a technical reason?
>> IIRC, I came to the conclusion that it was impossible.  Something about
>> TR.size not receiving a reasonable value.  Let me see.
> To my understanding, TR doesn't play a role until we leave ring 0 again.
> Or what could make the CPU look for any of the fields in the 64-bit TSS
> before that?

Exceptions that utilize the IST.  I found a writeup [17] that describes 
this, but I think it's even more impossible than that writeup implies.

[17]  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/


> Jan
>
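
For readers following along: option B from Jan's original question can be sketched in plain C. This is only an illustrative model with a simulated MSR and made-up names (real code would use wrmsrl()/rdmsrl() on MSR_GS_BASE, MSR 0xC0000101, around an actual vmrun), not the Jailhouse or KVM exit path:

```c
#include <stdint.h>

/* Simulated MSR_GS_BASE; on hardware these accesses would be
 * wrmsrl()/rdmsrl() on MSR 0xC0000101. */
static uint64_t msr_gs_base;

struct vcpu {
    uint64_t guest_gs_base;
    uint64_t host_gs_base;
};

/* Option B: swap only GS.base around vmrun on a lightweight exit,
 * instead of running the full vmload/vmsave pair. */
static void lightweight_vmrun(struct vcpu *v)
{
    v->host_gs_base = msr_gs_base;   /* save host per-cpu base */
    msr_gs_base = v->guest_gs_base;  /* wrmsrl(MSR_GS_BASE, guest) */
    /* vmrun would execute here; the guest may move its own GS.base */
    v->guest_gs_base = msr_gs_base;  /* rdmsrl back after #VMEXIT */
    msr_gs_base = v->host_gs_base;   /* restore host per-cpu base */
}
```

A heavyweight exit (user-space return or CPU migration) would still need vmload/vmsave for the remaining hidden state (FS/GS attributes, TR, LDTR, KernelGSBase, SYSCALL/SYSENTER MSRs).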

-- 
You received this message because you are subscribed to the Google Groups "Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jailhouse-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:41           ` Avi Kivity
@ 2015-04-13 17:48             ` Avi Kivity
  2015-04-13 17:57               ` Jan Kiszka
  2015-04-14  6:39             ` Valentine Sinitsyn
  1 sibling, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2015-04-13 17:48 UTC (permalink / raw)
  To: Jan Kiszka, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 04/13/2015 08:41 PM, Avi Kivity wrote:
> On 04/13/2015 08:35 PM, Jan Kiszka wrote:
>> On 2015-04-13 19:29, Avi Kivity wrote:
>>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>>> Hi Jan,
>>>>>>
>>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently 
>>>>>>> use
>>>>>>> these instructions unconditionally. However, I think both only need
>>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored 
>>>>>>> if no
>>>>>>> user space exit or no CPU migration is involved (both are always
>>>>>>> true for
>>>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it 
>>>>>>> also
>>>>>>> still uses rsp-based per-cpu variables.
>>>>>>>
>>>>>>> So the question boils down to what is generally faster:
>>>>>>>
>>>>>>> A) vmload
>>>>>>>       vmrun
>>>>>>>       vmsave
>>>>>>>
>>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>       vmrun
>>>>>>>       rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>
>>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>>> still require vmload/vmsave, thus become more expensive with B) 
>>>>>>> due to
>>>>>>> the additional MSR accesses.
>>>>>>>
>>>>>>> Any thoughts or results of previous experiments?
>>>>>> That's a good question, I also thought about it when I was 
>>>>>> finalizing
>>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo 
>>>>>> but it
>>>>>> didn't seem to affect the latency in any noticeable way. That's 
>>>>>> why I
>>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>>> now).
>>>>>>
>>>>>> Note however that how AMD chips store host state during VM 
>>>>>> switches is
>>>>>> implementation-specific. I did my quick experiments on one CPU 
>>>>>> only, so
>>>>>> your mileage may vary.
>>>>>>
>>>>>> Regarding your question, I feel B will be faster anyways but 
>>>>>> again I'm
>>>>>> afraid that the gain could be within statistical error of the
>>>>>> experiment.
>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, 
>>>>> more
>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>> test).
>>>>>
>>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM 
>>>>> should
>>>>> be adjustable in a similar way. Attached the benchmark, patch will 
>>>>> be in
>>>>> the Jailhouse next branch soon. We need to check more CPU types, 
>>>>> though.
>>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>>> happen to remember if it was never completed for a technical reason?
>>> IIRC, I came to the conclusion that it was impossible. Something about
>>> TR.size not receiving a reasonable value.  Let me see.
>> To my understanding, TR doesn't play a role until we leave ring 0 again.
>> Or what could make the CPU look for any of the fields in the 64-bit TSS
>> before that?
>
> Exceptions that utilize the IST.  I found a writeup [17] that 
> describes this, but I think it's even more impossible than that 
> writeup implies.
>

I think that Xen does (or did) something along the lines of disabling 
IST usage (by playing with the descriptors in the IDT) and then 
re-enabling them when exiting to userspace.


> [17] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/
>
>
>> Jan
>>
>


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:48             ` Avi Kivity
@ 2015-04-13 17:57               ` Jan Kiszka
  2015-04-13 18:07                 ` Avi Kivity
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-13 17:57 UTC (permalink / raw)
  To: Avi Kivity, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-13 19:48, Avi Kivity wrote:
> On 04/13/2015 08:41 PM, Avi Kivity wrote:
>> On 04/13/2015 08:35 PM, Jan Kiszka wrote:
>>> On 2015-04-13 19:29, Avi Kivity wrote:
>>>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently
>>>>>>>> use
>>>>>>>> these instructions unconditionally. However, I think both only need
>>>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored
>>>>>>>> if no
>>>>>>>> user space exit or no CPU migration is involved (both are always
>>>>>>>> true for
>>>>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it
>>>>>>>> also
>>>>>>>> still uses rsp-based per-cpu variables.
>>>>>>>>
>>>>>>>> So the question boils down to what is generally faster:
>>>>>>>>
>>>>>>>> A) vmload
>>>>>>>>       vmrun
>>>>>>>>       vmsave
>>>>>>>>
>>>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>>       vmrun
>>>>>>>>       rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>>
>>>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>>>> still require vmload/vmsave, thus become more expensive with B)
>>>>>>>> due to
>>>>>>>> the additional MSR accesses.
>>>>>>>>
>>>>>>>> Any thoughts or results of previous experiments?
>>>>>>> That's a good question, I also thought about it when I was
>>>>>>> finalizing
>>>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo
>>>>>>> but it
>>>>>>> didn't seem to affect the latency in any noticeable way. That's
>>>>>>> why I
>>>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>>>> now).
>>>>>>>
>>>>>>> Note however that how AMD chips store host state during VM
>>>>>>> switches is
>>>>>>> implementation-specific. I did my quick experiments on one CPU
>>>>>>> only, so
>>>>>>> your mileage may vary.
>>>>>>>
>>>>>>> Regarding your question, I feel B will be faster anyways but
>>>>>>> again I'm
>>>>>>> afraid that the gain could be within statistical error of the
>>>>>>> experiment.
>>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU,
>>>>>> more
>>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>>> test).
>>>>>>
>>>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM
>>>>>> should
>>>>>> be adjustable in a similar way. Attached the benchmark, patch will
>>>>>> be in
>>>>>> the Jailhouse next branch soon. We need to check more CPU types,
>>>>>> though.
>>>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>>>> happen to remember if it was never completed for a technical reason?
>>>> IIRC, I came to the conclusion that it was impossible. Something about
>>>> TR.size not receiving a reasonable value.  Let me see.
>>> To my understanding, TR doesn't play a role until we leave ring 0 again.
>>> Or what could make the CPU look for any of the fields in the 64-bit TSS
>>> before that?
>>
>> Exceptions that utilize the IST.  I found a writeup [17] that
>> describes this, but I think it's even more impossible than that
>> writeup implies.
>>
> 
> I think that Xen does (or did) something along the lines of disabling
> IST usage (by playing with the descriptors in the IDT) and then
> re-enabling them when exiting to userspace.

So we would reuse that active stack for the current IST users until
then. But I bet there are subtle details that prevent a simple switch at
IDT level. Hmm, no low-hanging fruit it seems...
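
The IDT-level trick would, as a rough sketch, amount to clearing and later restoring the 3-bit IST index in the 64-bit gate descriptor. The struct layout below follows the AMD64 long-mode gate format; the helper names are hypothetical, and this deliberately ignores the subtle details (atomicity against incoming NMIs, per-CPU IDT copies) that make it hard in practice:

```c
#include <stdint.h>

/* 64-bit (long mode) IDT gate descriptor. The IST index lives in
 * bits 0-2 of the byte at offset 4; index 0 means "no IST stack
 * switch", i.e. use the normal stack-switch mechanism. */
struct idt_gate {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;        /* bits 0-2: IST index, bits 3-7: reserved */
    uint8_t  type_attr;  /* P / DPL / gate type */
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
} __attribute__((packed));

/* Hypothetical helpers: park the gate on the normal stack-switch
 * path, remember the old IST index, and put it back before exiting
 * to user space. */
static uint8_t idt_disable_ist(struct idt_gate *gate)
{
    uint8_t old = gate->ist & 0x7;

    gate->ist &= (uint8_t)~0x7;  /* IST index 0: stay on current stack */
    return old;
}

static void idt_restore_ist(struct idt_gate *gate, uint8_t ist)
{
    gate->ist = (uint8_t)((gate->ist & ~0x7) | (ist & 0x7));
}
```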

> 
> 
>> [17] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/

That thread proposed the complete IST removal. But, given that we still
have it 7 years later, I suppose that was not very welcome in general.

Thanks,
Jan

PS: For the Jailhouse readers: we don't use IST.

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:57               ` Jan Kiszka
@ 2015-04-13 18:07                 ` Avi Kivity
  2015-04-13 18:14                   ` Jan Kiszka
  0 siblings, 1 reply; 20+ messages in thread
From: Avi Kivity @ 2015-04-13 18:07 UTC (permalink / raw)
  To: Jan Kiszka, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 04/13/2015 08:57 PM, Jan Kiszka wrote:
> On 2015-04-13 19:48, Avi Kivity wrote:
>> I think that Xen does (or did) something along the lines of disabling
>> IST usage (by playing with the descriptors in the IDT) and then
>> re-enabling them when exiting to userspace.
> So we would reuse that active stack for the current IST users until
> then.

Yes.

> But I bet there are subtle details that prevent a simple switch at
> IDT level. Hmm, no low-hanging fruit it seems...


For sure. It's not insurmountable, but fairly hard.

>>
>>> [17] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/
> That thread proposed the complete IST removal. But, given that we still
> have it 7 years later,

Well, it's not as if a crack team of kernel hackers was laboring night 
and day to remove it, but...

>   I suppose that was not very welcome in general.

Simply removing it is impossible, or an NMI happening immediately after 
SYSCALL will hit user-provided %rsp.

> Thanks,
> Jan
>
> PS: For the Jailhouse readers: we don't use IST.
>

You don't have userspace, yes?  Only guests?


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 18:07                 ` Avi Kivity
@ 2015-04-13 18:14                   ` Jan Kiszka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kiszka @ 2015-04-13 18:14 UTC (permalink / raw)
  To: Avi Kivity, Joel Schopp; +Cc: Valentine Sinitsyn, kvm, Jailhouse

On 2015-04-13 20:07, Avi Kivity wrote:
> On 04/13/2015 08:57 PM, Jan Kiszka wrote:
>> On 2015-04-13 19:48, Avi Kivity wrote:
>>> I think that Xen does (or did) something along the lines of disabling
>>> IST usage (by playing with the descriptors in the IDT) and then
>>> re-enabling them when exiting to userspace.
>> So we would reuse that active stack for the current IST users until
>> then.
> 
> Yes.
> 
>> But I bet there are subtle details that prevent a simple switch at
>> IDT level. Hmm, no low-hanging fruit it seems...
> 
> 
> For sure. It's not insurmountable, but fairly hard.
> 
>>>
>>>> [17] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/26712/
>> That thread proposed the complete IST removal. But, given that we still
>> have it 7 years later,
> 
> Well, it's not as if a crack team of kernel hackers was laboring night
> and day to remove it, but...
> 
>>   I suppose that was not very welcome in general.
> 
> Simply removing it is impossible, or an NMI happening immediately after
> SYSCALL will hit user-provided %rsp.
> 
>> Thanks,
>> Jan
>>
>> PS: For the Jailhouse readers: we don't use IST.
>>
> 
> You don't have userspace, yes?  Only guests?

Exactly. The day someone adds userspace, I guess I'll have to create a
new hypervisor.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-13 17:41           ` Avi Kivity
  2015-04-13 17:48             ` Avi Kivity
@ 2015-04-14  6:39             ` Valentine Sinitsyn
  2015-04-14  7:02               ` Jan Kiszka
  1 sibling, 1 reply; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-14  6:39 UTC (permalink / raw)
  To: Avi Kivity, Jan Kiszka, Joel Schopp; +Cc: kvm, Jailhouse

Hi all,

On 13.04.2015 22:41, Avi Kivity wrote:
> On 04/13/2015 08:35 PM, Jan Kiszka wrote:
>> On 2015-04-13 19:29, Avi Kivity wrote:
>>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>>> Hi Jan,
>>>>>>
>>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently use
>>>>>>> these instructions unconditionally. However, I think both only need
>>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored
>>>>>>> if no
>>>>>>> user space exit or no CPU migration is involved (both are always
>>>>>>> true for
>>>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it
>>>>>>> also
>>>>>>> still uses rsp-based per-cpu variables.
>>>>>>>
>>>>>>> So the question boils down to what is generally faster:
>>>>>>>
>>>>>>> A) vmload
>>>>>>>       vmrun
>>>>>>>       vmsave
>>>>>>>
>>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>       vmrun
>>>>>>>       rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>
>>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>>> still require vmload/vmsave, thus become more expensive with B)
>>>>>>> due to
>>>>>>> the additional MSR accesses.
>>>>>>>
>>>>>>> Any thoughts or results of previous experiments?
>>>>>> That's a good question, I also thought about it when I was finalizing
>>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo but it
>>>>>> didn't seem to affect the latency in any noticeable way. That's why I
>>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>>> now).
>>>>>>
>>>>>> Note however that how AMD chips store host state during VM
>>>>>> switches is
>>>>>> implementation-specific. I did my quick experiments on one CPU
>>>>>> only, so
>>>>>> your mileage may vary.
>>>>>>
>>>>>> Regarding your question, I feel B will be faster anyways but again
>>>>>> I'm
>>>>>> afraid that the gain could be within statistical error of the
>>>>>> experiment.
>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more
>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>> test).
>>>>>
>>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM
>>>>> should
>>>>> be adjustable in a similar way. Attached the benchmark, patch will
>>>>> be in
>>>>> the Jailhouse next branch soon. We need to check more CPU types,
>>>>> though.
>>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>>> happen to remember if it was never completed for a technical reason?
>>> IIRC, I came to the conclusion that it was impossible.  Something about
>>> TR.size not receiving a reasonable value.  Let me see.
>> To my understanding, TR doesn't play a role until we leave ring 0 again.
>> Or what could make the CPU look for any of the fields in the 64-bit TSS
>> before that?
>
> Exceptions that utilize the IST.  I found a writeup [17] that describes
> this, but I think it's even more impossible than that writeup implies.
Pardon my slowness, but how does it affect Jailhouse running on AMD? For 
NMI, we do #VMEXIT, but we can disable IST (I'm not sure it's enabled 
already, in fact). Double faults don't cause #VMEXIT, so there is no 
VMLOAD/VMSAVE issue. I'm not sure about MCE, but for now they are sort 
of flawed in Jailhouse anyways IIRC.

What am I missing here?

Thanks,
Valentine


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-14  6:39             ` Valentine Sinitsyn
@ 2015-04-14  7:02               ` Jan Kiszka
  2015-04-14  7:11                 ` Valentine Sinitsyn
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2015-04-14  7:02 UTC (permalink / raw)
  To: Valentine Sinitsyn, Avi Kivity, Joel Schopp; +Cc: kvm, Jailhouse

On 2015-04-14 08:39, Valentine Sinitsyn wrote:
> Hi all,
> 
> On 13.04.2015 22:41, Avi Kivity wrote:
>> On 04/13/2015 08:35 PM, Jan Kiszka wrote:
>>> On 2015-04-13 19:29, Avi Kivity wrote:
>>>> On 04/13/2015 10:01 AM, Jan Kiszka wrote:
>>>>> On 2015-04-07 07:43, Jan Kiszka wrote:
>>>>>> On 2015-04-05 19:12, Valentine Sinitsyn wrote:
>>>>>>> Hi Jan,
>>>>>>>
>>>>>>> On 05.04.2015 13:31, Jan Kiszka wrote:
>>>>>>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's
>>>>>>>> vmload/vmsave can be avoided. Jailhouse as well as KVM currently
>>>>>>>> use
>>>>>>>> these instructions unconditionally. However, I think both only need
>>>>>>>> GS.base, i.e. the per-cpu base address, to be saved and restored
>>>>>>>> if no
>>>>>>>> user space exit or no CPU migration is involved (both are always
>>>>>>>> true for
>>>>>>>> Jailhouse). Xen avoids vmload/vmsave on lightweight exits but it
>>>>>>>> also
>>>>>>>> still uses rsp-based per-cpu variables.
>>>>>>>>
>>>>>>>> So the question boils down to what is generally faster:
>>>>>>>>
>>>>>>>> A) vmload
>>>>>>>>       vmrun
>>>>>>>>       vmsave
>>>>>>>>
>>>>>>>> B) wrmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>>       vmrun
>>>>>>>>       rdmsrl(MSR_GS_BASE, guest_gs_base)
>>>>>>>>
>>>>>>>> Of course, KVM also has to take into account that heavyweight exits
>>>>>>>> still require vmload/vmsave, thus become more expensive with B)
>>>>>>>> due to
>>>>>>>> the additional MSR accesses.
>>>>>>>>
>>>>>>>> Any thoughts or results of previous experiments?
>>>>>>> That's a good question, I also thought about it when I was
>>>>>>> finalizing
>>>>>>> Jailhouse AMD port. I tried "lightweight exits" with apic-demo
>>>>>>> but it
>>>>>>> didn't seem to affect the latency in any noticeable way. That's
>>>>>>> why I
>>>>>>> decided not to push the patch (in fact, I was even unable to find it
>>>>>>> now).
>>>>>>>
>>>>>>> Note however that how AMD chips store host state during VM
>>>>>>> switches is
>>>>>>> implementation-specific. I did my quick experiments on one CPU
>>>>>>> only, so
>>>>>>> your mileage may vary.
>>>>>>>
>>>>>>> Regarding your question, I feel B will be faster anyways but again
>>>>>>> I'm
>>>>>>> afraid that the gain could be within statistical error of the
>>>>>>> experiment.
>>>>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU,
>>>>>> more
>>>>>> towards 600 if they are colder (added some usleep to each loop in the
>>>>>> test).
>>>>>>
>>>>>> I've tested via vmmcall from guest userspace under Jailhouse. KVM
>>>>>> should
>>>>>> be adjustable in a similar way. Attached the benchmark, patch will
>>>>>> be in
>>>>>> the Jailhouse next branch soon. We need to check more CPU types,
>>>>>> though.
>>>>> Avi, I found some preparatory patches of yours from 2010 [1]. Do you
>>>>> happen to remember if it was never completed for a technical reason?
>>>> IIRC, I came to the conclusion that it was impossible.  Something about
>>>> TR.size not receiving a reasonable value.  Let me see.
>>> To my understanding, TR doesn't play a role until we leave ring 0 again.
>>> Or what could make the CPU look for any of the fields in the 64-bit TSS
>>> before that?
>>
>> Exceptions that utilize the IST.  I found a writeup [17] that describes
>> this, but I think it's even more impossible than that writeup implies.
> Pardon my slowness, but how does it affect Jailhouse running on AMD? For
> NMI, we do #VMEXIT, but we can disable IST (I'm not sure it's enabled
> already, in fact). Double faults don't cause #VMEXIT, so there is no
> VMLOAD/VMSAVE issue. I'm not sure about MCE, but for now they are sort
> of flawed in Jailhouse anyways IIRC.
> 
> What am I missing here?

Nothing. As I said in the other branch of this thread, Jailhouse is not
affected as it doesn't use the IST. Only KVM is because Linux - in host
mode - requires it for the cases Avi mentioned.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: SVM: vmload/vmsave-free VM exits?
  2015-04-14  7:02               ` Jan Kiszka
@ 2015-04-14  7:11                 ` Valentine Sinitsyn
  0 siblings, 0 replies; 20+ messages in thread
From: Valentine Sinitsyn @ 2015-04-14  7:11 UTC (permalink / raw)
  To: Jan Kiszka, Avi Kivity, Joel Schopp; +Cc: kvm, Jailhouse

On 14.04.2015 12:02, Jan Kiszka wrote:
>> What am I missing here?
>
> Nothing. As I said in the other branch of this thread, Jailhouse is not
> affected as it doesn't use the IST. Only KVM is because Linux - in host
> mode - requires it for the cases Avi mentioned.
Then what I am missing is the correct discussion branch. :) Thanks for 
explaining.

Valentine


end of thread, other threads:[~2015-04-14  7:11 UTC | newest]

Thread overview: 20+ messages
2015-04-05  8:31 SVM: vmload/vmsave-free VM exits? Jan Kiszka
2015-04-05 17:12 ` Valentine Sinitsyn
2015-04-07  5:43   ` Jan Kiszka
2015-04-07  6:10     ` Valentine Sinitsyn
2015-04-07  6:13       ` Jan Kiszka
2015-04-07  6:19         ` Valentine Sinitsyn
2015-04-07  6:23           ` Jan Kiszka
2015-04-07  6:29             ` Valentine Sinitsyn
2015-04-07  6:35               ` Jan Kiszka
2015-04-13  7:01     ` Jan Kiszka
2015-04-13 17:29       ` Avi Kivity
2015-04-13 17:35         ` Jan Kiszka
2015-04-13 17:41           ` Avi Kivity
2015-04-13 17:48             ` Avi Kivity
2015-04-13 17:57               ` Jan Kiszka
2015-04-13 18:07                 ` Avi Kivity
2015-04-13 18:14                   ` Jan Kiszka
2015-04-14  6:39             ` Valentine Sinitsyn
2015-04-14  7:02               ` Jan Kiszka
2015-04-14  7:11                 ` Valentine Sinitsyn
