All of lore.kernel.org
 help / color / mirror / Atom feed
* PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
@ 2016-12-02  2:32 Don Bowman
  2016-12-02 15:07 ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Don Bowman @ 2016-12-02  2:32 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, rkrcmar

Add an enable_tsc_scaling parameter to kvm

Signed-off-by: Don Bowman <db@donbowman.ca>
---

My system has what i thought were two identical processors (same
stepping ID etc).
However, bafflingly, one of them has the ability to do TSC scaling,
and one does not (as reported in the vmcs).

This in turn causes kvm to give up entirely.

I feel a better solution is to mask off this capability on the one processor.
If enable_tsc_scaling is set false, the feature is ignored, and all
processors end up matching each other, enabling acceleration.

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..5ace575 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -103,6 +103,9 @@ module_param_named(enable_shadow_vmcs,
enable_shadow_vmcs, bool, S_IRUGO);
 static bool __read_mostly nested = 0;
 module_param(nested, bool, S_IRUGO);

+static bool __read_mostly enable_tsc_scaling = true;
+module_param(enable_tsc_scaling, bool, S_IRUGO);
+
 static u64 __read_mostly host_xss;

 static bool __read_mostly enable_pml = 1;
@@ -3449,6 +3452,8 @@ static __init int setup_vmcs_config(struct
vmcs_config *vmcs_conf)
        vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
        vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
        vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
+        if (!enable_tsc_scaling)
+            vmcs_conf->cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;
        vmcs_conf->vmexit_ctrl         = _vmexit_control;
        vmcs_conf->vmentry_ctrl        = _vmentry_control;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-02  2:32 PATCH: setup_vmcs_config: disable TSC scaling on unlike processors Don Bowman
@ 2016-12-02 15:07 ` Radim Krčmář
  2016-12-02 19:10   ` Don Bowman
  2016-12-06  8:49   ` David Hildenbrand
  0 siblings, 2 replies; 17+ messages in thread
From: Radim Krčmář @ 2016-12-02 15:07 UTC (permalink / raw)
  To: Don Bowman; +Cc: kvm, pbonzini

Subjects that touch only arch/x86/kvm/vmx.c are usually prefixed with
"[PATCH] KVM: VMX:".

2016-12-01 21:32-0500, Don Bowman:
> Add an enable_tsc_scaling parameter to kvm
> 
> Signed-off-by: Don Bowman <db@donbowman.ca>
> ---
> 
> My system has what i thought were two identical processors (same
> stepping ID etc).
> However, bafflingly, one of them has the ability to do TSC scaling,
> and one does not (as reported in the vmcs).

Wow, the chip doesn't sound trustworthy.

> This in turn causes kvm to give up entirely.
> 
> I feel a better solution is to mask off this capability on the one processor.

I like the global toggle better -- it is less code with more uses.

> If enable_tsc_scaling is set false, the feature is ignored, and all
> processors end up matching each other, enabling acceleration.
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -103,6 +103,9 @@ module_param_named(enable_shadow_vmcs,
> enable_shadow_vmcs, bool, S_IRUGO);
>  static bool __read_mostly nested = 0;
>  module_param(nested, bool, S_IRUGO);
> 
> +static bool __read_mostly enable_tsc_scaling = true;
> +module_param(enable_tsc_scaling, bool, S_IRUGO);

The "enable_" prefix is not needed, please do

  module_param_named(tsc_scaling, ...)

> +
>  static u64 __read_mostly host_xss;
> 
>  static bool __read_mostly enable_pml = 1;
> @@ -3449,6 +3452,8 @@ static __init int setup_vmcs_config(struct
> vmcs_config *vmcs_conf)
>         vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
>         vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
>         vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
> +        if (!enable_tsc_scaling)
> +            vmcs_conf->cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;

Move this to vmx_secondary_exec_control(), where we disable all other
features.

You'll notice kvm_has_tsc_control in hardware_setup() when clearing the
flag if it isn't supported by hardware, so I think your change would
make sense as a x86 kvm module option based on kvm_has_tsc_control.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-02 15:07 ` Radim Krčmář
@ 2016-12-02 19:10   ` Don Bowman
  2016-12-02 20:58     ` Don Bowman
  2016-12-06  8:49   ` David Hildenbrand
  1 sibling, 1 reply; 17+ messages in thread
From: Don Bowman @ 2016-12-02 19:10 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: kvm, pbonzini

On 2 December 2016 at 10:07, Radim Krčmář <rkrcmar@redhat.com> wrote:
> Subjects that touch only arch/x86/kvm/vmx.c are usually prefixed with
> "[PATCH] KVM: VMX:".
>
> 2016-12-01 21:32-0500, Don Bowman:
>> Add an enable_tsc_scaling parameter to kvm
>>
>> Signed-off-by: Don Bowman <db@donbowman.ca>
>> ---
>>
>> My system has what i thought were two identical processors (same
>> stepping ID etc).
>> However, bafflingly, one of them has the ability to do TSC scaling,
>> and one does not (as reported in the vmcs).
>
> Wow, the chip doesn't sound trustworthy.

Its TSC scaling. The tray-version doesn't have it, the retail one has it
(2696 v3 vs 2699 v3).
The TSC is reliable on both as are the other flags.
As to why, I don't know. the stepping is identical.
This happens if you have an OEM server w/ 1 socket, and then populate
the other. I'm not sure Its a problem(?)

 ...

>
> You'll notice kvm_has_tsc_control in hardware_setup() when clearing the
> flag if it isn't supported by hardware, so I think your change would
> make sense as a x86 kvm module option based on kvm_has_tsc_control.
>
> Thanks.

I will look @ this and supply new potential patch.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-02 19:10   ` Don Bowman
@ 2016-12-02 20:58     ` Don Bowman
  2016-12-05 16:37       ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Don Bowman @ 2016-12-02 20:58 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: kvm, pbonzini

On 2 December 2016 at 14:10, Don Bowman <db@donbowman.ca> wrote:
> On 2 December 2016 at 10:07, Radim Krčmář <rkrcmar@redhat.com> wrote:
>> Subjects that touch only arch/x86/kvm/vmx.c are usually prefixed with
>> "[PATCH] KVM: VMX:".
>>
>> 2016-12-01 21:32-0500, Don Bowman:
>>> Add an enable_tsc_scaling parameter to kvm
>>>
>>> Signed-off-by: Don Bowman <db@donbowman.ca>
>>> ---
>>>
>>> My system has what i thought were two identical processors (same
>>> stepping ID etc).
>>> However, bafflingly, one of them has the ability to do TSC scaling,
>>> and one does not (as reported in the vmcs).
>>
>> Wow, the chip doesn't sound trustworthy.
>

OK, how about this? The check has to be in setup_vmcs_config() not
elsewhere I think. This is where the rdmsr occurs, and immediately
following that is the compare against the other processor(s). Unless
I'm missing something I don't see how vmx_secondary_exec_control()
could work. For the enable I was following the 'enable_pml' which is
already there, but have changed it below. vmx_check_processor_compat()
calls setup_vmcs_config() and then the memcmp() immediately
afterwards.

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..503ee1e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -103,6 +103,9 @@ module_param_named(enable_shadow_vmcs,
enable_shadow_vmcs, bool, S_IRUGO);
 static bool __read_mostly nested = 0;
 module_param(nested, bool, S_IRUGO);

+static bool __read_mostly tsc_scaling = true;
+module_param(tsc_scaling, bool, S_IRUGO);
+
 static u64 __read_mostly host_xss;

 static bool __read_mostly enable_pml = 1;
@@ -3449,6 +3452,8 @@ static __init int setup_vmcs_config(struct
vmcs_config *vmcs_conf)
        vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
        vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
        vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
+        if (!tsc_scaling)
+            vmcs_conf->cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;
        vmcs_conf->vmexit_ctrl         = _vmexit_control;
        vmcs_conf->vmentry_ctrl        = _vmentry_control;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-02 20:58     ` Don Bowman
@ 2016-12-05 16:37       ` Radim Krčmář
  0 siblings, 0 replies; 17+ messages in thread
From: Radim Krčmář @ 2016-12-05 16:37 UTC (permalink / raw)
  To: Don Bowman; +Cc: kvm, pbonzini

2016-12-02 15:58-0500, Don Bowman:
> On 2 December 2016 at 14:10, Don Bowman <db@donbowman.ca> wrote:
>> On 2 December 2016 at 10:07, Radim Krčmář <rkrcmar@redhat.com> wrote:
>>> 2016-12-01 21:32-0500, Don Bowman:
>>>> My system has what i thought were two identical processors (same
>>>> stepping ID etc).
>>>> However, bafflingly, one of them has the ability to do TSC scaling,
>>>> and one does not (as reported in the vmcs).
>>>
> OK, how about this? The check has to be in setup_vmcs_config() not
> elsewhere I think. This is where the rdmsr occurs, and immediately
> following that is the compare against the other processor(s). Unless
> I'm missing something I don't see how vmx_secondary_exec_control()
> could work. For the enable I was following the 'enable_pml' which is
> already there, but have changed it below. vmx_check_processor_compat()
> calls setup_vmcs_config() and then the memcmp() immediately
> afterwards.

Right, KVM checks this early ... I don't like that the patch treats
tsc_scaling specially while the same could happen with some other
feature.  What about warning in vmx_check_processor_compat() if features
that won't be used don't match, but letting the check pass?

(I'd even prefer an unsafe option to disable the check than to treat
 tsc_scaling differently.)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-02 15:07 ` Radim Krčmář
  2016-12-02 19:10   ` Don Bowman
@ 2016-12-06  8:49   ` David Hildenbrand
  2016-12-06  9:09     ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2016-12-06  8:49 UTC (permalink / raw)
  To: Radim Krčmář, Don Bowman; +Cc: kvm, pbonzini

Am 02.12.2016 um 16:07 schrieb Radim Krčmář:
> Subjects that touch only arch/x86/kvm/vmx.c are usually prefixed with
> "[PATCH] KVM: VMX:".
>
> 2016-12-01 21:32-0500, Don Bowman:
>> Add an enable_tsc_scaling parameter to kvm
>>
>> Signed-off-by: Don Bowman <db@donbowman.ca>
>> ---
>>
>> My system has what i thought were two identical processors (same
>> stepping ID etc).
>> However, bafflingly, one of them has the ability to do TSC scaling,
>> and one does not (as reported in the vmcs).
>
> Wow, the chip doesn't sound trustworthy.
>
>> This in turn causes kvm to give up entirely.
>>
>> I feel a better solution is to mask off this capability on the one processor.
>
> I like the global toggle better -- it is less code with more uses.

(not having much insight) what speaks against allowing such features 
only if available on all CPUs symmetrically? I actually prefer having an 
automatism to some magic toggle.

-- 

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-06  8:49   ` David Hildenbrand
@ 2016-12-06  9:09     ` Paolo Bonzini
  2016-12-06 11:08       ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2016-12-06  9:09 UTC (permalink / raw)
  To: David Hildenbrand, Radim Krčmář, Don Bowman; +Cc: kvm



On 06/12/2016 09:49, David Hildenbrand wrote:
>>>
>>> I feel a better solution is to mask off this capability on the one
>>> processor.
>>
>> I like the global toggle better -- it is less code with more uses.
> 
> (not having much insight) what speaks against allowing such features
> only if available on all CPUs symmetrically? I actually prefer having an
> automatism to some magic toggle.

I agree with David.  Just warn and proceed with the minimum common set
of features.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-06  9:09     ` Paolo Bonzini
@ 2016-12-06 11:08       ` Radim Krčmář
  2016-12-07 11:37         ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2016-12-06 11:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: David Hildenbrand, Don Bowman, kvm

2016-12-06 10:09+0100, Paolo Bonzini:
> On 06/12/2016 09:49, David Hildenbrand wrote:
>>>>
>>>> I feel a better solution is to mask off this capability on the one
>>>> processor.
>>>
>>> I like the global toggle better -- it is less code with more uses.
>> 
>> (not having much insight) what speaks against allowing such features
>> only if available on all CPUs symmetrically? I actually prefer having an
>> automatism to some magic toggle.
> 
> I agree with David.  Just warn and proceed with the minimum common set
> of features.

Yes, that is reasonable -- I though David wanted to use the feature when
running on CPUs that support it (only mask off on the one).

I'd start with changing the check in vmx_check_processor_compat() to
ignore disabled features and then extend KVM to enable only the common
set.

Finding the minimal set of common features is complicated by hotplug ...
I think that the check we have is not working perfectly, so if you
currently

 1) offline all cores on the worse CPU
 2) load kvm module
 3) online all CPUs

then KVM is going enable tsc-scaling and not complain, but guests
running on onlined CPUs are not going to work.
Can you confirm that?

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-06 11:08       ` Radim Krčmář
@ 2016-12-07 11:37         ` David Hildenbrand
  2016-12-07 15:25           ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2016-12-07 11:37 UTC (permalink / raw)
  To: Radim Krčmář, Paolo Bonzini; +Cc: Don Bowman, kvm


> Yes, that is reasonable -- I though David wanted to use the feature when
> running on CPUs that support it (only mask off on the one).
>
> I'd start with changing the check in vmx_check_processor_compat() to
> ignore disabled features and then extend KVM to enable only the common
> set.
>
> Finding the minimal set of common features is complicated by hotplug ...
> I think that the check we have is not working perfectly, so if you
> currently
>
>  1) offline all cores on the worse CPU
>  2) load kvm module
>  3) online all CPUs
>
> then KVM is going enable tsc-scaling and not complain, but guests
> running on onlined CPUs are not going to work.
> Can you confirm that?

My intuition tells me that whenever we hotplug CPUs that have less 
features than used, we are in trouble. So tsc scaling might just be the 
tip of the iceberg. I think this is a general problem.

What should happen if we hotplug such CPUs? We can't run existing VCPUs 
on them. And isn't this even a general problem, also for other tasks in 
the system (how is that problem solved with cpuid features?)?

(I am currently thinking about "virsh capabilities", could it happen 
that our "host" cpu model is no longer valid after we hotplugged cpus 
(as the common feature set got smaller)? that would be very ugly)

-- 

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-07 11:37         ` David Hildenbrand
@ 2016-12-07 15:25           ` Radim Krčmář
  2016-12-08 11:46             ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2016-12-07 15:25 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Paolo Bonzini, Don Bowman, kvm

2016-12-07 12:37+0100, David Hildenbrand:
>> Yes, that is reasonable -- I though David wanted to use the feature when
>> running on CPUs that support it (only mask off on the one).
>> 
>> I'd start with changing the check in vmx_check_processor_compat() to
>> ignore disabled features and then extend KVM to enable only the common
>> set.
>> 
>> Finding the minimal set of common features is complicated by hotplug ...
>> I think that the check we have is not working perfectly, so if you
>> currently
>> 
>>  1) offline all cores on the worse CPU
>>  2) load kvm module
>>  3) online all CPUs
>> 
>> then KVM is going enable tsc-scaling and not complain, but guests
>> running on onlined CPUs are not going to work.
>> Can you confirm that?
> 
> My intuition tells me that whenever we hotplug CPUs that have less features
> than used, we are in trouble. So tsc scaling might just be the tip of the
> iceberg. I think this is a general problem.

Definitely.  It was not handled in KVM probably because it doesn't have
a simple solution.

Finding the minimal common subset on hotplug will take extra work, which
is why relaxing the check and having a toggle for features that
shouldn't be enabled nor checked is easier.

> What should happen if we hotplug such CPUs? We can't run existing VCPUs on
> them. And isn't this even a general problem, also for other tasks in the
> system (how is that problem solved with cpuid features?)?

1) Prevent the hotplug -- admin can notice the error, kill guests or
   decide to let them finish, and then online hotplugged CPU.

2) Just warn and trust that admin knows what hotplugging to non-SMP
   means.

(2) is less work and give a bit more freedom to an undesirable case, so
I slightly prefer it.  I wouldn't for example limit existing VCPUs to
compatible CPUs or cleanly kill all guests from the kernel.

> (I am currently thinking about "virsh capabilities", could it happen that
> our "host" cpu model is no longer valid after we hotplugged cpus (as the
> common feature set got smaller)? that would be very ugly)

I think it could, but I'd continue in thinking only about SMP.  VMX
features are not even noticed by `virsh capabilities`, so finding the
minimal common subset in KVM would not affect the output.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-07 15:25           ` Radim Krčmář
@ 2016-12-08 11:46             ` David Hildenbrand
  2016-12-08 14:32               ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2016-12-08 11:46 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Paolo Bonzini, Don Bowman, kvm


>> My intuition tells me that whenever we hotplug CPUs that have less features
>> than used, we are in trouble. So tsc scaling might just be the tip of the
>> iceberg. I think this is a general problem.
>
> Definitely.  It was not handled in KVM probably because it doesn't have
> a simple solution.
>
> Finding the minimal common subset on hotplug will take extra work, which
> is why relaxing the check and having a toggle for features that
> shouldn't be enabled nor checked is easier.

But even with a toggle, the problem of not knowing what will be 
hotplugged persists. Before hotplugging, the admin would have to restart 
all guests after flipping the toggle.

>
>> What should happen if we hotplug such CPUs? We can't run existing VCPUs on
>> them. And isn't this even a general problem, also for other tasks in the
>> system (how is that problem solved with cpuid features?)?
>
> 1) Prevent the hotplug -- admin can notice the error, kill guests or
>    decide to let them finish, and then online hotplugged CPU.
>
> 2) Just warn and trust that admin knows what hotplugging to non-SMP
>    means.
>
> (2) is less work and give a bit more freedom to an undesirable case, so
> I slightly prefer it.  I wouldn't for example limit existing VCPUs to
> compatible CPUs or cleanly kill all guests from the kernel.

And I assume that such scenarios are also quite unlikely. Or is it a 
common practice in production to hotplug random CPUs from the shelf? I 
doubt it.

>
>> (I am currently thinking about "virsh capabilities", could it happen that
>> our "host" cpu model is no longer valid after we hotplugged cpus (as the
>> common feature set got smaller)? that would be very ugly)
>
> I think it could, but I'd continue in thinking only about SMP.  VMX
> features are not even noticed by `virsh capabilities`, so finding the
> minimal common subset in KVM would not affect the output.

Do you know if we can hotplug CPUs with differing CPUID features on x86?

-- 

David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-08 11:46             ` David Hildenbrand
@ 2016-12-08 14:32               ` Radim Krčmář
  2016-12-09 15:12                 ` Don Bowman
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2016-12-08 14:32 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Paolo Bonzini, Don Bowman, kvm

2016-12-08 12:46+0100, David Hildenbrand:
>> > My intuition tells me that whenever we hotplug CPUs that have less features
>> > than used, we are in trouble. So tsc scaling might just be the tip of the
>> > iceberg. I think this is a general problem.
>> 
>> Definitely.  It was not handled in KVM probably because it doesn't have
>> a simple solution.
>> 
>> Finding the minimal common subset on hotplug will take extra work, which
>> is why relaxing the check and having a toggle for features that
>> shouldn't be enabled nor checked is easier.
> 
> But even with a toggle, the problem of not knowing what will be hotplugged
> persists. Before hotplugging, the admin would have to restart all guests
> after flipping the toggle.

Yes.  The admin would need that situation and disable incompatible
features in advance (KVM would just provide the option to do so).
The toggle would be a readonly kvm_module parameter, like we have for
ept and other features.

>> > What should happen if we hotplug such CPUs? We can't run existing VCPUs on
>> > them. And isn't this even a general problem, also for other tasks in the
>> > system (how is that problem solved with cpuid features?)?
>> 
>> 1) Prevent the hotplug -- admin can notice the error, kill guests or
>>    decide to let them finish, and then online hotplugged CPU.
>> 
>> 2) Just warn and trust that admin knows what hotplugging to non-SMP
>>    means.
>> 
>> (2) is less work and give a bit more freedom to an undesirable case, so
>> I slightly prefer it.  I wouldn't for example limit existing VCPUs to
>> compatible CPUs or cleanly kill all guests from the kernel.
> 
> And I assume that such scenarios are also quite unlikely. Or is it a common
> practice in production to hotplug random CPUs from the shelf? I doubt it.

Absurdly unlikely.  I hope it never happens with serious intentions.
(It is amusing to think about, but considering this situation is a waste
 of time for practical purposes.)

>> > (I am currently thinking about "virsh capabilities", could it happen that
>> > our "host" cpu model is no longer valid after we hotplugged cpus (as the
>> > common feature set got smaller)? that would be very ugly)
>> 
>> I think it could, but I'd continue in thinking only about SMP.  VMX
>> features are not even noticed by `virsh capabilities`, so finding the
>> minimal common subset in KVM would not affect the output.
> 
> Do you know if we can hotplug CPUs with differing CPUID features on x86?

Linux doesn't seem to do any sanity checks and it uses globals for CPUID
features and assumes that they are all the same, so we'd be getting what
we deserve.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-08 14:32               ` Radim Krčmář
@ 2016-12-09 15:12                 ` Don Bowman
  2016-12-13 15:43                   ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Don Bowman @ 2016-12-09 15:12 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: David Hildenbrand, Paolo Bonzini, kvm

OK, based on previous feedback, this patch version simply ignores any
inconsistency if a knowing and trusting user wishes.

In my case, two identical processors in stepping and version and all
others have 1 flag missing (retail vs tray version of chip), but the
next person might have another flag.

Comments?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..264870e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -103,6 +103,9 @@ module_param_named(enable_shadow_vmcs,
enable_shadow_vmcs, bool, S_IRUGO);
 static bool __read_mostly nested = 0;
 module_param(nested, bool, S_IRUGO);

+static bool __read_mostly ignore_inconsistency = false;
+module_param(ignore_inconsistency, bool, S_IRUGO);
+
 static u64 __read_mostly host_xss;

 static bool __read_mostly enable_pml = 1;
@@ -3449,6 +3452,7 @@ static __init int setup_vmcs_config(struct
vmcs_config *vmcs_conf)
        vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
        vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
        vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
+       vmcs_conf->cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;
        vmcs_conf->vmexit_ctrl         = _vmexit_control;
        vmcs_conf->vmentry_ctrl        = _vmentry_control;

@@ -9202,9 +9206,11 @@ static void __init vmx_check_processor_compat(void *rtn)
        if (setup_vmcs_config(&vmcs_conf) < 0)
                *(int *)rtn = -EIO;
        if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
-               printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
-                               smp_processor_id());
-               *(int *)rtn = -EIO;
+               printk(KERN_ERR "kvm: CPU %d feature inconsistency%s!\n",
+                               smp_processor_id(),
+                               ignore_inconsistency ? " -- ignored" : "");
+               if (!ignore_inconsistency)
+                       *(int *)rtn = -EIO;
        }
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-09 15:12                 ` Don Bowman
@ 2016-12-13 15:43                   ` Radim Krčmář
  2016-12-14  4:07                     ` Don Bowman
  2016-12-14 12:46                     ` Paolo Bonzini
  0 siblings, 2 replies; 17+ messages in thread
From: Radim Krčmář @ 2016-12-13 15:43 UTC (permalink / raw)
  To: Don Bowman; +Cc: David Hildenbrand, Paolo Bonzini, kvm

2016-12-09 10:12-0500, Don Bowman:
> OK, based on previous feedback, this patch version simply ignores any
> inconsistency if a knowing and trusting user wishes.
> 
> In my case, two identical processors in stepping and version and all
> others have 1 flag missing (retail vs tray version of chip), but the
> next person might have another flag.
> 
> Comments?
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5382b82..264870e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -103,6 +103,9 @@ module_param_named(enable_shadow_vmcs,
> enable_shadow_vmcs, bool, S_IRUGO);
>  static bool __read_mostly nested = 0;
>  module_param(nested, bool, S_IRUGO);
> 
> +static bool __read_mostly ignore_inconsistency = false;
> +module_param(ignore_inconsistency, bool, S_IRUGO);
> +
>  static u64 __read_mostly host_xss;
> 
>  static bool __read_mostly enable_pml = 1;
> @@ -3449,6 +3452,7 @@ static __init int setup_vmcs_config(struct
> vmcs_config *vmcs_conf)
>         vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
>         vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
>         vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
> +       vmcs_conf->cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;

This would disable TSC_SCALING for everyone.  You need a second patch
that makes tsc_scaling optional.

>         vmcs_conf->vmexit_ctrl         = _vmexit_control;
>         vmcs_conf->vmentry_ctrl        = _vmentry_control;
> 
> @@ -9202,9 +9206,11 @@ static void __init vmx_check_processor_compat(void *rtn)
>         if (setup_vmcs_config(&vmcs_conf) < 0)
>                 *(int *)rtn = -EIO;
>         if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
> -               printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
> -                               smp_processor_id());
> -               *(int *)rtn = -EIO;
> +               printk(KERN_ERR "kvm: CPU %d feature inconsistency%s!\n",
> +                               smp_processor_id(),
> +                               ignore_inconsistency ? " -- ignored" : "");
> +               if (!ignore_inconsistency)
> +                       *(int *)rtn = -EIO;

Please add a note in the spirit of: "KVM is going to fail unless you
explicitly disable features that are not present on all CPUs."

If this becomes a normal feature, we'd remove the toggle, check that all
enabled features are supported, and pass if KVM will work with selected
features ... the check seems like a waste of code for this rarity of
machines.

Paolo, are you ok with the toggle?
(We can just make it the default without any significant issues.)

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-13 15:43                   ` Radim Krčmář
@ 2016-12-14  4:07                     ` Don Bowman
  2016-12-14 12:30                       ` Paolo Bonzini
  2016-12-14 12:46                     ` Paolo Bonzini
  1 sibling, 1 reply; 17+ messages in thread
From: Don Bowman @ 2016-12-14  4:07 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: David Hildenbrand, Paolo Bonzini, kvm

On 13 December 2016 at 10:43, Radim Krčmář <rkrcmar@redhat.com> wrote:
> 2016-12-09 10:12-0500, Don Bowman:
>> OK, based on previous feedback, this patch version simply ignores any
>> inconsistency if a knowing and trusting user wishes.
>>
>> In my case, two identical processors in stepping and version and all
>> others have 1 flag missing (retail vs tray version of chip), but the
>> next person might have another flag.
>>
>> Comments?
 ...

> Please add a note in the spirit of: "KVM is going to fail unless you
> explicitly disable features that are not present on all CPUs."
>
> If this becomes a normal feature, we'd remove the toggle, check that all
> enabled features are supported, and pass if KVM will work with selected
> features ... the check seems like a waste of code for this rarity of
> machines.
>

OK, how about this?
As a side note, i'm still not clear on why this happens. cpuid cannot
tell the difference between these processors (same model hex-id,
stepping, extended family, extended model). The only difference is one
was an OEM 'tray' and one was the retail (to upgrade the single-socket
machine to dual socket).

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5382b82..84733b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -103,6 +103,14 @@ module_param_named(enable_shadow_vmcs,
enable_shadow_vmcs, bool, S_IRUGO);
 static bool __read_mostly nested = 0;
 module_param(nested, bool, S_IRUGO);

+/*
+ * Override the 'all cpu have identical flag' check.
+ * Note: KVM is going to fail unless you explicitly disable
+ * features that are not present on all CPUs
+*/
+static bool __read_mostly ignore_inconsistency = false;
+module_param(ignore_inconsistency, bool, S_IRUGO);
+
 static u64 __read_mostly host_xss;

 static bool __read_mostly enable_pml = 1;
@@ -9202,9 +9210,11 @@ static void __init vmx_check_processor_compat(void *rtn)
        if (setup_vmcs_config(&vmcs_conf) < 0)
                *(int *)rtn = -EIO;
        if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
-               printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
-                               smp_processor_id());
-               *(int *)rtn = -EIO;
+               printk(KERN_ERR "kvm: CPU %d feature inconsistency%s!\n",
+                               smp_processor_id(),
+                               ignore_inconsistency ? " -- ignored" : "");
+               if (!ignore_inconsistency)
+                       *(int *)rtn = -EIO;
        }
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-14  4:07                     ` Don Bowman
@ 2016-12-14 12:30                       ` Paolo Bonzini
  0 siblings, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2016-12-14 12:30 UTC (permalink / raw)
  To: Don Bowman, Radim Krčmář; +Cc: David Hildenbrand, kvm

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]



On 14/12/2016 05:07, Don Bowman wrote:
> OK, how about this?
> As a side note, i'm still not clear on why this happens. cpuid cannot
> tell the difference between these processors (same model hex-id,
> stepping, extended family, extended model). The only difference is one
> was an OEM 'tray' and one was the retail (to upgrade the single-socket
> machine to dual socket).

Can you please provide the exact model, and the output of the attached
program for both CPUs.  The argument to the program is a CPU number, so
you should run it once with a CPU id from the OEM processor, and once
with a CPU id from the retail processor (so two ids that have a
different "physical id" in /proc/cpuinfo).

Also please include /proc/cpuinfo contents, and the boot log to check
the microcode revision.  I'd like to pass this info to Intel.

Thanks,

Paolo

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5382b82..84733b3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -103,6 +103,14 @@ module_param_named(enable_shadow_vmcs,
> enable_shadow_vmcs, bool, S_IRUGO);
>  static bool __read_mostly nested = 0;
>  module_param(nested, bool, S_IRUGO);
> 
> +/*
> + * Override the 'all cpu have identical flag' check.
> + * Note: KVM is going to fail unless you explicitly disable
> + * features that are not present on all CPUs
> +*/
> +static bool __read_mostly ignore_inconsistency = false;
> +module_param(ignore_inconsistency, bool, S_IRUGO);
> +
>  static u64 __read_mostly host_xss;
> 
>  static bool __read_mostly enable_pml = 1;
> @@ -9202,9 +9210,11 @@ static void __init vmx_check_processor_compat(void *rtn)
>         if (setup_vmcs_config(&vmcs_conf) < 0)
>                 *(int *)rtn = -EIO;
>         if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
> -               printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
> -                               smp_processor_id());
> -               *(int *)rtn = -EIO;
> +               printk(KERN_ERR "kvm: CPU %d feature inconsistency%s!\n",
> +                               smp_processor_id(),
> +                               ignore_inconsistency ? " -- ignored" : "");
> +               if (!ignore_inconsistency)
> +                       *(int *)rtn = -EIO;
>         }
>  }

[-- Attachment #2: vmxcap --]
[-- Type: text/plain, Size: 8290 bytes --]

#!/usr/bin/python
#
# tool for querying VMX capabilities
#
# Copyright 2009-2010 Red Hat, Inc.
#
# Authors:
#  Avi Kivity <avi@redhat.com>
#
# This work is licensed under the terms of the GNU GPL, version 2.  See
# the COPYING file in the top-level directory.

MSR_IA32_VMX_BASIC = 0x480
MSR_IA32_VMX_PINBASED_CTLS = 0x481
MSR_IA32_VMX_PROCBASED_CTLS = 0x482
MSR_IA32_VMX_EXIT_CTLS = 0x483
MSR_IA32_VMX_ENTRY_CTLS = 0x484
MSR_IA32_VMX_MISC_CTLS = 0x485
MSR_IA32_VMX_PROCBASED_CTLS2 = 0x48B
MSR_IA32_VMX_EPT_VPID_CAP = 0x48C
MSR_IA32_VMX_TRUE_PINBASED_CTLS = 0x48D
MSR_IA32_VMX_TRUE_PROCBASED_CTLS = 0x48E
MSR_IA32_VMX_TRUE_EXIT_CTLS = 0x48F
MSR_IA32_VMX_TRUE_ENTRY_CTLS = 0x490
MSR_IA32_VMX_VMFUNC = 0x491

import sys

class msr(object):
    def __init__(self):
        self.f = open('/dev/cpu/' + sys.argv[1] + '/msr', 'r', 0)
    def read(self, index, default = None):
        import struct
        self.f.seek(index)
        try:
            return struct.unpack('Q', self.f.read(8))[0]
        except:
            return default

class Control(object):
    def __init__(self, name, bits, cap_msr, true_cap_msr = None):
        self.name = name
        self.bits = bits
        self.cap_msr = cap_msr
        self.true_cap_msr = true_cap_msr
    def read2(self, nr):
        m = msr()
        val = m.read(nr, 0)
        return (val & 0xffffffff, val >> 32)
    def show(self):
        print self.name
        mbz, mb1 = self.read2(self.cap_msr)
        tmbz, tmb1 = 0, 0
        if self.true_cap_msr:
            tmbz, tmb1 = self.read2(self.true_cap_msr)
        for bit in sorted(self.bits.keys()):
            zero = not (mbz & (1 << bit))
            one = mb1 & (1 << bit)
            true_zero = not (tmbz & (1 << bit))
            true_one = tmb1 & (1 << bit)
            s= '?'
            if (self.true_cap_msr and true_zero and true_one
                and one and not zero):
                s = 'default'
            elif zero and not one:
                s = 'no'
            elif one and not zero:
                s = 'forced'
            elif one and zero:
                s = 'yes'
            print '  %-40s %s' % (self.bits[bit], s)

class Misc(object):
    def __init__(self, name, bits, msr):
        self.name = name
        self.bits = bits
        self.msr = msr
    def show(self):
        print self.name
        value = msr().read(self.msr, 0)
        print '  Hex: 0x%x' % (value)
        def first_bit(key):
            if type(key) is tuple:
                return key[0]
            else:
                return key
        for bits in sorted(self.bits.keys(), key = first_bit):
            if type(bits) is tuple:
                lo, hi = bits
                fmt = int
            else:
                lo = hi = bits
                def fmt(x):
                    return { True: 'yes', False: 'no' }[x]
            v = (value >> lo) & ((1 << (hi - lo + 1)) - 1)
            print '  %-40s %s' % (self.bits[bits], fmt(v))

controls = [
    Misc(
        name = 'Basic VMX Information',
        bits = {
            (0, 30): 'Revision',
            (32,44): 'VMCS size',
            48: 'VMCS restricted to 32 bit addresses',
            49: 'Dual-monitor support',
            (50, 53): 'VMCS memory type',
            54: 'INS/OUTS instruction information',
            55: 'IA32_VMX_TRUE_*_CTLS support',
            },
        msr = MSR_IA32_VMX_BASIC,
        ),
    Control(
        name = 'pin-based controls',
        bits = {
            0: 'External interrupt exiting',
            3: 'NMI exiting',
            5: 'Virtual NMIs',
            6: 'Activate VMX-preemption timer',
            7: 'Process posted interrupts',
            },
        cap_msr = MSR_IA32_VMX_PINBASED_CTLS,
        true_cap_msr = MSR_IA32_VMX_TRUE_PINBASED_CTLS,
        ),

    Control(
        name = 'primary processor-based controls',
        bits = {
            2: 'Interrupt window exiting',
            3: 'Use TSC offsetting',
            7: 'HLT exiting',
            9: 'INVLPG exiting',
            10: 'MWAIT exiting',
            11: 'RDPMC exiting',
            12: 'RDTSC exiting',
            15: 'CR3-load exiting',
            16: 'CR3-store exiting',
            19: 'CR8-load exiting',
            20: 'CR8-store exiting',
            21: 'Use TPR shadow',
            22: 'NMI-window exiting',
            23: 'MOV-DR exiting',
            24: 'Unconditional I/O exiting',
            25: 'Use I/O bitmaps',
            27: 'Monitor trap flag',
            28: 'Use MSR bitmaps',
            29: 'MONITOR exiting',
            30: 'PAUSE exiting',
            31: 'Activate secondary control',
            },
        cap_msr = MSR_IA32_VMX_PROCBASED_CTLS,
        true_cap_msr = MSR_IA32_VMX_TRUE_PROCBASED_CTLS,
        ),

    Control(
        name = 'secondary processor-based controls',
        bits = {
            0: 'Virtualize APIC accesses',
            1: 'Enable EPT',
            2: 'Descriptor-table exiting',
            3: 'Enable RDTSCP',
            4: 'Virtualize x2APIC mode',
            5: 'Enable VPID',
            6: 'WBINVD exiting',
            7: 'Unrestricted guest',
            8: 'APIC register emulation',
            9: 'Virtual interrupt delivery',
            10: 'PAUSE-loop exiting',
            11: 'RDRAND exiting',
            12: 'Enable INVPCID',
            13: 'Enable VM functions',
            14: 'VMCS shadowing',
            16: 'RDSEED exiting',
            18: 'EPT-violation #VE',
            20: 'Enable XSAVES/XRSTORS',
            25: 'TSC scaling',
            },
        cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
        ),

    Control(
        name = 'VM-Exit controls',
        bits = {
            2: 'Save debug controls',
            9: 'Host address-space size',
            12: 'Load IA32_PERF_GLOBAL_CTRL',
            15: 'Acknowledge interrupt on exit',
            18: 'Save IA32_PAT',
            19: 'Load IA32_PAT',
            20: 'Save IA32_EFER',
            21: 'Load IA32_EFER',
            22: 'Save VMX-preemption timer value',
            },
        cap_msr = MSR_IA32_VMX_EXIT_CTLS,
        true_cap_msr = MSR_IA32_VMX_TRUE_EXIT_CTLS,
        ),

    Control(
        name = 'VM-Entry controls',
        bits = {
            2: 'Load debug controls',
            9: 'IA-32e mode guest',
            10: 'Entry to SMM',
            11: 'Deactivate dual-monitor treatment',
            13: 'Load IA32_PERF_GLOBAL_CTRL',
            14: 'Load IA32_PAT',
            15: 'Load IA32_EFER',
            },
        cap_msr = MSR_IA32_VMX_ENTRY_CTLS,
        true_cap_msr = MSR_IA32_VMX_TRUE_ENTRY_CTLS,
        ),

    Misc(
        name = 'Miscellaneous data',
        bits = {
            (0,4): 'VMX-preemption timer scale (log2)',
            5: 'Store EFER.LMA into IA-32e mode guest control',
            6: 'HLT activity state',
            7: 'Shutdown activity state',
            8: 'Wait-for-SIPI activity state',
            15: 'IA32_SMBASE support',
            (16,24): 'Number of CR3-target values',
            (25,27): 'MSR-load/store count recommendation',
            28: 'IA32_SMM_MONITOR_CTL[2] can be set to 1',
            29: 'VMWRITE to VM-exit information fields',
            (32,63): 'MSEG revision identifier',
            },
        msr = MSR_IA32_VMX_MISC_CTLS,
        ),

    Misc(
        name = 'VPID and EPT capabilities',
        bits = {
            0: 'Execute-only EPT translations',
            6: 'Page-walk length 4',
            8: 'Paging-structure memory type UC',
            14: 'Paging-structure memory type WB',
            16: '2MB EPT pages',
            17: '1GB EPT pages',
            20: 'INVEPT supported',
            21: 'EPT accessed and dirty flags',
            25: 'Single-context INVEPT',
            26: 'All-context INVEPT',
            32: 'INVVPID supported',
            40: 'Individual-address INVVPID',
            41: 'Single-context INVVPID',
            42: 'All-context INVVPID',
            43: 'Single-context-retaining-globals INVVPID',
            },
        msr = MSR_IA32_VMX_EPT_VPID_CAP,
        ),
    Misc(
        name = 'VM Functions',
        bits = {
            0: 'EPTP Switching',
            },
        msr = MSR_IA32_VMX_VMFUNC,
        ),
    ]

for c in controls:
    c.show()

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: PATCH: setup_vmcs_config: disable TSC scaling on unlike processors
  2016-12-13 15:43                   ` Radim Krčmář
  2016-12-14  4:07                     ` Don Bowman
@ 2016-12-14 12:46                     ` Paolo Bonzini
  1 sibling, 0 replies; 17+ messages in thread
From: Paolo Bonzini @ 2016-12-14 12:46 UTC (permalink / raw)
  To: Radim Krčmář, Don Bowman; +Cc: David Hildenbrand, kvm



On 13/12/2016 16:43, Radim Krčmář wrote:
> Paolo, are you ok with the toggle?
> (We can just make it the default without any significant issues.)

I don't know... It's ugly and it seems like nothing but an Intel
mess-up.  On ark.intel.com 2696v3 doesn't exist and 2699v3 is listed as
Announced rather than Launched.

I'm okay with a simple patch to disable TSC scaling, but I'm not sure
it's worth doing anything more than that.

Paolo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-12-14 12:46 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-02  2:32 PATCH: setup_vmcs_config: disable TSC scaling on unlike processors Don Bowman
2016-12-02 15:07 ` Radim Krčmář
2016-12-02 19:10   ` Don Bowman
2016-12-02 20:58     ` Don Bowman
2016-12-05 16:37       ` Radim Krčmář
2016-12-06  8:49   ` David Hildenbrand
2016-12-06  9:09     ` Paolo Bonzini
2016-12-06 11:08       ` Radim Krčmář
2016-12-07 11:37         ` David Hildenbrand
2016-12-07 15:25           ` Radim Krčmář
2016-12-08 11:46             ` David Hildenbrand
2016-12-08 14:32               ` Radim Krčmář
2016-12-09 15:12                 ` Don Bowman
2016-12-13 15:43                   ` Radim Krčmář
2016-12-14  4:07                     ` Don Bowman
2016-12-14 12:30                       ` Paolo Bonzini
2016-12-14 12:46                     ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.