linux-kernel.vger.kernel.org archive mirror
* [PATCH] KVM: SVM: Add support for AMD's OSVW feature in guests
@ 2011-12-19 17:46 Boris Ostrovsky
  2011-12-26 13:53 ` Avi Kivity
  0 siblings, 1 reply; 4+ messages in thread
From: Boris Ostrovsky @ 2011-12-19 17:46 UTC (permalink / raw)
  To: avi, mtosatti, Joerg.Roedel; +Cc: kvm, linux-kernel, boris.ostrovsky

From: Boris Ostrovsky <boris.ostrovsky@amd.com>

In some cases guests should not provide workarounds for errata even when the
physical processor is affected. For example, because of erratum 400 on family
10h processors a Linux guest will read an MSR (resulting in a VMEXIT) before
going idle in order to avoid getting stuck in a non-C0 state. This is not
necessary: HLT and IO instructions are intercepted, and therefore there is no
reason for the erratum 400 workaround in the guest.

This patch allows us to present a guest with certain errata as fixed,
regardless of the state of actual hardware.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Acked-by: Joerg.Roedel <Joerg.Roedel@amd.com>
---
 arch/x86/kvm/svm.c |   74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c |    2 +-
 2 files changed, 75 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e32243e..4bd41b7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -110,6 +110,13 @@ struct nested_state {
 #define MSRPM_OFFSETS	16
 static u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 
+/*
+ * Set osvw_len to higher value when updated Revision Guides
+ * are published and we know what the new status bits are
+ */
+static uint64_t osvw_len = 4, osvw_status;
+static DEFINE_SPINLOCK(svm_lock);
+
 struct vcpu_svm {
 	struct kvm_vcpu vcpu;
 	struct vmcb *vmcb;
@@ -556,6 +563,35 @@ static void svm_init_erratum_383(void)
 	erratum_383_found = true;
 }
 
+
+static int svm_handle_osvw(struct kvm_vcpu *vcpu,
+			   uint32_t msr, uint64_t *val)
+{
+	struct kvm_cpuid_entry2 *cpuid_entry;
+
+	/* Guest OSVW support */
+	cpuid_entry = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
+	if (!cpuid_entry || !(cpuid_entry->ecx & bit(X86_FEATURE_OSVW)))
+		return -1;
+
+	/*
+	 * Guests should see errata 400 and 415 as fixed (assuming that
+	 * HLT and IO instructions are intercepted).
+	 */
+	if (msr == MSR_AMD64_OSVW_ID_LENGTH)
+		*val = (osvw_len >= 3) ? (osvw_len) : 3;
+	else {
+		*val = osvw_status & ~(6ULL);
+
+		if (osvw_len == 0 && boot_cpu_data.x86 == 0x10)
+			/* Mark erratum 298 as present */
+			*val |= 1;
+	}
+
+	return 0;
+}
+
+
 static int has_svm(void)
 {
 	const char *msg;
@@ -620,6 +656,37 @@ static int svm_hardware_enable(void *garbage)
 		__get_cpu_var(current_tsc_ratio) = TSC_RATIO_DEFAULT;
 	}
 
+
+	/*
+	 * Get OSVW bits.
+	 *
+	 * Note that it is possible to have a system with mixed processor
+	 * revisions and therefore different OSVW bits. If bits are not the same
+	 * on different processors then choose the worst case (i.e. if erratum
+	 * is present on one processor and not on another then assume that the
+	 * erratum is present everywhere).
+	 */
+	if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
+		uint64_t len, status;
+		int err;
+
+		len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
+		if (!err)
+			status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
+						      &err);
+
+		spin_lock(&svm_lock);
+		if (err)
+			osvw_status = osvw_len = 0;
+		else {
+			if (len < osvw_len)
+				osvw_len = len;
+			osvw_status |= status;
+			osvw_status &= (1ULL << osvw_len) - 1;
+		}
+		spin_unlock(&svm_lock);
+	}
+
 	svm_init_erratum_383();
 
 	return 0;
@@ -2982,6 +3049,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
 	case MSR_IA32_UCODE_REV:
 		*data = 0x01000065;
 		break;
+	case MSR_AMD64_OSVW_ID_LENGTH:
+	case MSR_AMD64_OSVW_STATUS:
+		return svm_handle_osvw(vcpu, ecx, data);
 	default:
 		return kvm_get_msr_common(vcpu, ecx, data);
 	}
@@ -3092,6 +3162,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 	case MSR_VM_IGNNE:
 		pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
+	case MSR_AMD64_OSVW_ID_LENGTH:
+	case MSR_AMD64_OSVW_STATUS:
+		/* Writes are ignored */
+		break;
 	default:
 		return kvm_set_msr_common(vcpu, ecx, data);
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..0a84162 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2452,7 +2452,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	const u32 kvm_supported_word6_x86_features =
 		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
 		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
-		F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(XOP) |
+		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
 	/* cpuid 0xC0000001.edx */
-- 
1.7.3.4



* Re: [PATCH] KVM: SVM: Add support for AMD's OSVW feature in guests
  2011-12-19 17:46 [PATCH] KVM: SVM: Add support for AMD's OSVW feature in guests Boris Ostrovsky
@ 2011-12-26 13:53 ` Avi Kivity
  2011-12-27  6:51   ` Boris Ostrovsky
  0 siblings, 1 reply; 4+ messages in thread
From: Avi Kivity @ 2011-12-26 13:53 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: mtosatti, Joerg.Roedel, kvm, linux-kernel, boris.ostrovsky

On 12/19/2011 07:46 PM, Boris Ostrovsky wrote:
> From: Boris Ostrovsky <boris.ostrovsky@amd.com>
>
> In some cases guests should not provide workarounds for errata even when the
> physical processor is affected. For example, because of erratum 400 on family
> 10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
> going to idle in order to avoid getting stuck in a non-C0 state. This is not
> necessary: HLT and IO instructions are intercepted and therefore there is no
> reason for erratum 400 workaround in the guest.
>
> This patch allows us to present a guest with certain errata as fixed,
> regardless of the state of actual hardware.
>
>  
> +
> +static int svm_handle_osvw(struct kvm_vcpu *vcpu,
> +			   uint32_t msr, uint64_t *val)
> +{
> +	struct kvm_cpuid_entry2 *cpuid_entry;
> +
> +	/* Guest OSVW support */
> +	cpuid_entry = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
> +	if (!cpuid_entry || !(cpuid_entry->ecx & bit(X86_FEATURE_OSVW)))
> +		return -1;
> +
> +	/*
> +	 * Guests should see errata 400 and 415 as fixed (assuming that
> +	 * HLT and IO instructions are intercepted).
> +	 */
> +	if (msr == MSR_AMD64_OSVW_ID_LENGTH)
> +		*val = (osvw_len >= 3) ? (osvw_len) : 3;
> +	else {
> +		*val = osvw_status & ~(6ULL);
> +
> +		if (osvw_len == 0 && boot_cpu_data.x86 == 0x10)
> +			/* Mark erratum 298 as present */
> +			*val |= 1;
> +	}
> +
> +	return 0;
> +}

Please move this to common code, to support cross-vendor migration.

> +
> +
>  static int has_svm(void)
>  {
>  	const char *msg;
> @@ -620,6 +656,37 @@ static int svm_hardware_enable(void *garbage)
>  		__get_cpu_var(current_tsc_ratio) = TSC_RATIO_DEFAULT;
>  	}
>  
> +
> +	/*
> +	 * Get OSVW bits.
> +	 *
> +	 * Note that it is possible to have a system with mixed processor
> +	 * revisions and therefore different OSVW bits. If bits are not the same
> +	 * on different processors then choose the worst case (i.e. if erratum
> +	 * is present on one processor and not on another then assume that the
> +	 * erratum is present everywhere).
> +	 */
> +	if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
> +		uint64_t len, status;
> +		int err;
> +
> +		len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
> +		if (!err)
> +			status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
> +						      &err);
> +
> +		spin_lock(&svm_lock);
> +		if (err)
> +			osvw_status = osvw_len = 0;
> +		else {
> +			if (len < osvw_len)
> +				osvw_len = len;

This implies that if a bit is inside len, then the OS must apply the
workaround?

> +			osvw_status |= status;
> +			osvw_status &= (1ULL << osvw_len) - 1;
> +		}
> +		spin_unlock(&svm_lock);
> +	}
> +
>  	svm_init_erratum_383();
>  
> @@ -3092,6 +3162,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
>  	case MSR_VM_IGNNE:
>  		pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>  		break;
> +	case MSR_AMD64_OSVW_ID_LENGTH:
> +	case MSR_AMD64_OSVW_STATUS:
> +		/* Writes are ignored */
> +		break;
>  	default:
>  		return kvm_set_msr_common(vcpu, ecx, data);
>  	}

Best to allow writes, the manual says writes are allowed for bios code,
and the OS should just avoid it.

Not sure what to do here about live migration, since the guest will not
adjust its behaviour.  Should management software read those MSRs from
userspace and check that they're consistent across a cluster?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] KVM: SVM: Add support for AMD's OSVW feature in guests
  2011-12-26 13:53 ` Avi Kivity
@ 2011-12-27  6:51   ` Boris Ostrovsky
  2011-12-27  9:13     ` Avi Kivity
  0 siblings, 1 reply; 4+ messages in thread
From: Boris Ostrovsky @ 2011-12-27  6:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Boris Ostrovsky, mtosatti, Joerg.Roedel, kvm, linux-kernel

On 12/26/11 08:53, Avi Kivity wrote:
> On 12/19/2011 07:46 PM, Boris Ostrovsky wrote:
>> From: Boris Ostrovsky <boris.ostrovsky@amd.com>
>>
>> In some cases guests should not provide workarounds for errata even when the
>> physical processor is affected. For example, because of erratum 400 on family
>> 10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
>> going to idle in order to avoid getting stuck in a non-C0 state. This is not
>> necessary: HLT and IO instructions are intercepted and therefore there is no
>> reason for erratum 400 workaround in the guest.
>>
>> This patch allows us to present a guest with certain errata as fixed,
>> regardless of the state of actual hardware.
>>
>>
>> +
>> +static int svm_handle_osvw(struct kvm_vcpu *vcpu,
>> +			   uint32_t msr, uint64_t *val)
>> +{
>> +	struct kvm_cpuid_entry2 *cpuid_entry;
>> +
>> +	/* Guest OSVW support */
>> +	cpuid_entry = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>> +	if (!cpuid_entry || !(cpuid_entry->ecx & bit(X86_FEATURE_OSVW)))
>> +		return -1;
>> +
>> +	/*
>> +	 * Guests should see errata 400 and 415 as fixed (assuming that
>> +	 * HLT and IO instructions are intercepted).
>> +	 */
>> +	if (msr == MSR_AMD64_OSVW_ID_LENGTH)
>> +		*val = (osvw_len>= 3) ? (osvw_len) : 3;
>> +	else {
>> +		*val = osvw_status&  ~(6ULL);
>> +
>> +		if (osvw_len == 0&&  boot_cpu_data.x86 == 0x10)
>> +			/* Mark erratum 298 as present */
>> +			*val |= 1;
>> +	}
>> +
>> +	return 0;
>> +}
>
> Please move this to common code, to support cross-vendor migration.

OK. (Note though that the OSVW registers are typically checked during 
system boot, so if you migrate a running guest it is unlikely to read 
them again.)

>
>> +
>> +
>>   static int has_svm(void)
>>   {
>>   	const char *msg;
>> @@ -620,6 +656,37 @@ static int svm_hardware_enable(void *garbage)
>>   		__get_cpu_var(current_tsc_ratio) = TSC_RATIO_DEFAULT;
>>   	}
>>
>> +
>> +	/*
>> +	 * Get OSVW bits.
>> +	 *
>> +	 * Note that it is possible to have a system with mixed processor
>> +	 * revisions and therefore different OSVW bits. If bits are not the same
>> +	 * on different processors then choose the worst case (i.e. if erratum
>> +	 * is present on one processor and not on another then assume that the
>> +	 * erratum is present everywhere).
>> +	 */
>> +	if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
>> +		uint64_t len, status;
>> +		int err;
>> +
>> +		len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
>> +		if (!err)
>> +			status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
>> +						&err);
>> +
>> +		spin_lock(&svm_lock);
>> +		if (err)
>> +			osvw_status = osvw_len = 0;
>> +		else {
>> +			if (len < osvw_len)
>> +				osvw_len = len;
>
> This implies that if a bit is inside len, then the OS must apply the
> workaround?

Almost: when an OSVW bit is inside len, the OS workaround is applied 
only if the bit is set. And if the bit is outside len, the OS workaround 
should always be applied.


>
>> +			osvw_status |= status;
>> +			osvw_status &= (1ULL << osvw_len) - 1;
>> +		}
>> +		spin_unlock(&svm_lock);
>> +	}
>> +
>>   	svm_init_erratum_383();
>>
>> @@ -3092,6 +3162,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
>>   	case MSR_VM_IGNNE:
>>   		pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
>>   		break;
>> +	case MSR_AMD64_OSVW_ID_LENGTH:
>> +	case MSR_AMD64_OSVW_STATUS:
>> +		/* Writes are ignored */
>> +		break;
>>   	default:
>>   		return kvm_set_msr_common(vcpu, ecx, data);
>>   	}
>
> Best to allow writes, the manual says writes are allowed for bios code,
> and the OS should just avoid it.

The BIOS is usually the one that sets these bits, and the OS indeed 
shouldn't try to write the registers.

The reason I decided to ignore the writes is to cover the case where we 
are running on a buggy BIOS and a guest writing through to the registers 
could alter system state.

>
> Not sure what to do here about live migration, since the guest will not
> adjust its behaviour.  Should management software read those MSRs from
> userspace and check that they're consistent across a cluster?
>

Either that, or start the guest on the most "broken" processor (assuming 
that the OSVW registers are only read during boot).

Alternatively, the management SW would set emulated values of the MSRs 
for all systems in the cluster, but I am not sure whether this is 
supported yet.

-boris



* Re: [PATCH] KVM: SVM: Add support for AMD's OSVW feature in guests
  2011-12-27  6:51   ` Boris Ostrovsky
@ 2011-12-27  9:13     ` Avi Kivity
  0 siblings, 0 replies; 4+ messages in thread
From: Avi Kivity @ 2011-12-27  9:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Boris Ostrovsky, mtosatti, Joerg.Roedel, kvm, linux-kernel

On 12/27/2011 08:51 AM, Boris Ostrovsky wrote:
> On 12/26/11 08:53, Avi Kivity wrote:
>> On 12/19/2011 07:46 PM, Boris Ostrovsky wrote:
>>> From: Boris Ostrovsky <boris.ostrovsky@amd.com>
>>>
>>> In some cases guests should not provide workarounds for errata even
>>> when the
>>> physical processor is affected. For example, because of erratum 400
>>> on family
>>> 10h processors a Linux guest will read an MSR (resulting in VMEXIT)
>>> before
>>> going to idle in order to avoid getting stuck in a non-C0 state.
>>> This is not
>>> necessary: HLT and IO instructions are intercepted and therefore
>>> there is no
>>> reason for erratum 400 workaround in the guest.
>>>
>>> This patch allows us to present a guest with certain errata as fixed,
>>> regardless of the state of actual hardware.
>>>
>>>
>>> +
>>> +static int svm_handle_osvw(struct kvm_vcpu *vcpu,
>>> +               uint32_t msr, uint64_t *val)
>>> +{
>>> +    struct kvm_cpuid_entry2 *cpuid_entry;
>>> +
>>> +    /* Guest OSVW support */
>>> +    cpuid_entry = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>>> +    if (!cpuid_entry || !(cpuid_entry->ecx & bit(X86_FEATURE_OSVW)))
>>> +        return -1;
>>> +
>>> +    /*
>>> +     * Guests should see errata 400 and 415 as fixed (assuming that
>>> +     * HLT and IO instructions are intercepted).
>>> +     */
>>> +    if (msr == MSR_AMD64_OSVW_ID_LENGTH)
>>> +        *val = (osvw_len >= 3) ? (osvw_len) : 3;
>>> +    else {
>>> +        *val = osvw_status & ~(6ULL);
>>> +
>>> +        if (osvw_len == 0 && boot_cpu_data.x86 == 0x10)
>>> +            /* Mark erratum 298 as present */
>>> +            *val |= 1;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>
>> Please move this to common code, to support cross-vendor migration.
>
> OK. (Note though that the OSVW registers are typically checked during
> system boot so if you migrate a running guest it is unlikely that it
> will read them again)

It will, if it reboots.   Of course that removes any consistency issues,
but we like to be consistent even if there are no issues.

>>> +    if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
>>> +        uint64_t len, status;
>>> +        int err;
>>> +
>>> +        len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
>>> +        if (!err)
>>> +            status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
>>> +                        &err);
>>> +
>>> +        spin_lock(&svm_lock);
>>> +        if (err)
>>> +            osvw_status = osvw_len = 0;
>>> +        else {
>>> +            if (len < osvw_len)
>>> +                osvw_len = len;
>>
>> This implies that if a bit is inside len, then the OS must apply the
>> workaround?
>
>
> Almost: when osvw bit is inside len then OS workaround is applied if
> the bit is set. And if the bit is outside len then OS workaround
> should always be applied.

Ok.

Note that hardware_enable() can be called as part of host cpu hotplug,
so the values can change dynamically.  Not really sure what to do in
this case.

>
>
>>
>>> +            osvw_status |= status;
>>> +            osvw_status &= (1ULL << osvw_len) - 1;
>>> +        }
>>> +        spin_unlock(&svm_lock);
>>> +    }
>>> +
>>>       svm_init_erratum_383();
>>>
>>> @@ -3092,6 +3162,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu,
>>> unsigned ecx, u64 data)
>>>       case MSR_VM_IGNNE:
>>>           pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n",
>>> ecx, data);
>>>           break;
>>> +    case MSR_AMD64_OSVW_ID_LENGTH:
>>> +    case MSR_AMD64_OSVW_STATUS:
>>> +        /* Writes are ignored */
>>> +        break;
>>>       default:
>>>           return kvm_set_msr_common(vcpu, ecx, data);
>>>       }
>>
>> Best to allow writes, the manual says writes are allowed for bios code,
>> and the OS should just avoid it.
>
> BIOS is usually the one who sets these bits and OS indeed shouldn't
> try to write the registers.
>
> The reason I decided to ignore the writes is for cases when we are on
> a buggy BIOS and a guest writing through to the registers can alter
> system state.

They'd write to a virtual set of MSRs, not the real ones.

>
>>
>> Not sure what to do here about live migration, since the guest will not
>> adjust its behaviour.  Should management software read those MSRs from
>> userspace and check that they're consistent across a cluster?
>>
>
> Either that or start the guest on the most "broken" processor
> (assuming that OSVW registers are only read during boot).
>
> Alternatively the management SW would set emulated values of the MSRs
> for all systems in the cluster but I am not sure whether this is
> supported yet.

If you allow writing, then host userspace can coordinate across the
cluster and use KVM_SET_MSR to set up the most strict values.

-- 
error compiling committee.c: too many arguments to function


