linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
@ 2019-03-20 17:43 Vitaly Kuznetsov
  2019-04-18  7:14 ` Vitaly Kuznetsov
  2019-04-18 14:17 ` Sean Christopherson
  0 siblings, 2 replies; 7+ messages in thread
From: Vitaly Kuznetsov @ 2019-03-20 17:43 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Radim Krčmář, Roman Kagan, linux-kernel

It was reported that with some special Multi Processor Group configuration,
e.g:
 bcdedit.exe /set groupsize 1
 bcdedit.exe /set maxgroup on
 bcdedit.exe /set groupaware on
for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
is in use.

Tracing kvm_hv_flush_tlb immediately reveals the issue:

 kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2

The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
We don't flush anything and apparently it's not what Windows expects.

TLFS doesn't say anything about such requests and newer Windows versions
seem to be unaffected. This all feels like a WS2012 bug, which is, however,
easy to workaround in KVM: let's flush everything when we see an empty
flush request, over-flushing doesn't hurt.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/hyperv.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 421899f6ad7b..5887f7d22ac6 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1371,7 +1371,17 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
 
 		valid_bank_mask = BIT_ULL(0);
 		sparse_banks[0] = flush.processor_mask;
-		all_cpus = flush.flags & HV_FLUSH_ALL_PROCESSORS;
+
+		/*
+		 * WS2012 seems to be buggy, under certain conditions it is
+		 * possible to observe requests with processor_mask = 0x0 and
+		 * no HV_FLUSH_ALL_PROCESSORS flag set. It also seems that
+		 * Windows actually expects us to flush something and crashes
+		 * otherwise. Let's treat processor_mask == 0 same as
+		 * HV_FLUSH_ALL_PROCESSORS.
+		 */
+		all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
+			(flush.processor_mask == 0);
 	} else {
 		if (unlikely(kvm_read_guest(kvm, ingpa, &flush_ex,
 					    sizeof(flush_ex))))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-03-20 17:43 [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 Vitaly Kuznetsov
@ 2019-04-18  7:14 ` Vitaly Kuznetsov
  2019-04-18 14:17 ` Sean Christopherson
  1 sibling, 0 replies; 7+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-18  7:14 UTC (permalink / raw)
  To: kvm; +Cc: Paolo Bonzini, Radim Krčmář, Roman Kagan, linux-kernel

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> It was reported that with some special Multi Processor Group configuration,
> e.g:
>  bcdedit.exe /set groupsize 1
>  bcdedit.exe /set maxgroup on
>  bcdedit.exe /set groupaware on
> for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
> is in use.
>
> Tracing kvm_hv_flush_tlb immediately reveals the issue:
>
>  kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
>
> The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
> however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
> We don't flush anything and apparently it's not what Windows expects.
>
> TLFS doesn't say anything about such requests and newer Windows versions
> seem to be unaffected. This all feels like a WS2012 bug, which is, however,
> easy to workaround in KVM: let's flush everything when we see an empty
> flush request, over-flushing doesn't hurt.
>

Ping) I understand the lack of interest towards this ugly hack but I
don't see a better option ...

-- 
Vitaly

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-03-20 17:43 [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 Vitaly Kuznetsov
  2019-04-18  7:14 ` Vitaly Kuznetsov
@ 2019-04-18 14:17 ` Sean Christopherson
  2019-04-18 16:47   ` Paolo Bonzini
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-04-18 14:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, linux-kernel

On Wed, Mar 20, 2019 at 06:43:20PM +0100, Vitaly Kuznetsov wrote:
> It was reported that with some special Multi Processor Group configuration,
> e.g:
>  bcdedit.exe /set groupsize 1
>  bcdedit.exe /set maxgroup on
>  bcdedit.exe /set groupaware on
> for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
> is in use.
> 
> Tracing kvm_hv_flush_tlb immediately reveals the issue:
> 
>  kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
> 
> The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
> however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
> We don't flush anything and apparently it's not what Windows expects.
> 
> TLFS doesn't say anything about such requests and newer Windows versions
> seem to be unaffected. This all feels like a WS2012 bug, which is, however,
> easy to workaround in KVM: let's flush everything when we see an empty
> flush request, over-flushing doesn't hurt.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/hyperv.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 421899f6ad7b..5887f7d22ac6 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1371,7 +1371,17 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
>  
>  		valid_bank_mask = BIT_ULL(0);
>  		sparse_banks[0] = flush.processor_mask;
> -		all_cpus = flush.flags & HV_FLUSH_ALL_PROCESSORS;
> +
> +		/*
> +		 * WS2012 seems to be buggy, under certain conditions it is
> +		 * possible to observe requests with processor_mask = 0x0 and
> +		 * no HV_FLUSH_ALL_PROCESSORS flag set. It also seems that

"and no HV_FLUSH_ALL_PROCESSORS flag set" is awkward, and probably
extraneous.  The whole comment is a probably a bit more verbose than it
needs to be, e.g. most readers won't care how we came to the conclusion
that 'processor_mask == 0', and those that care about the background will
read the changelog anyways.

Maybe something like this:

		/*
		 * Some Windows versions, e.g. WS2012, use processor_mask = 0
		 * in lieu of the dedicated flag to flush all processors.
		 */


> +		 * Windows actually expects us to flush something and crashes
> +		 * otherwise. Let's treat processor_mask == 0 same as
> +		 * HV_FLUSH_ALL_PROCESSORS.
> +		 */
> +		all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
> +			(flush.processor_mask == 0);

Nits:

Personal preference, but I like '!flush.processor_mask' in this case as it
immediately conveys that we're handling the scenario where the guest didn't
set a mask.  Then there wouldn't be a visual need for the second set of
parentheses.

Aligning its indentation with the first first chunk of the statement would
also be nice, but again, personal preference.  :-)

>  	} else {
>  		if (unlikely(kvm_read_guest(kvm, ingpa, &flush_ex,
>  					    sizeof(flush_ex))))
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-04-18 14:17 ` Sean Christopherson
@ 2019-04-18 16:47   ` Paolo Bonzini
  2019-04-18 16:56     ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2019-04-18 16:47 UTC (permalink / raw)
  To: Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, Radim Krčmář, Roman Kagan, linux-kernel

On 18/04/19 16:17, Sean Christopherson wrote:
> On Wed, Mar 20, 2019 at 06:43:20PM +0100, Vitaly Kuznetsov wrote:
>> It was reported that with some special Multi Processor Group configuration,
>> e.g:
>>  bcdedit.exe /set groupsize 1
>>  bcdedit.exe /set maxgroup on
>>  bcdedit.exe /set groupaware on
>> for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
>> is in use.
>>
>> Tracing kvm_hv_flush_tlb immediately reveals the issue:
>>
>>  kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
>>
>> The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
>> however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
>> We don't flush anything and apparently it's not what Windows expects.
>>
>> TLFS doesn't say anything about such requests and newer Windows versions
>> seem to be unaffected. This all feels like a WS2012 bug, which is, however,
>> easy to workaround in KVM: let's flush everything when we see an empty
>> flush request, over-flushing doesn't hurt.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kvm/hyperv.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 421899f6ad7b..5887f7d22ac6 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -1371,7 +1371,17 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
>>  
>>  		valid_bank_mask = BIT_ULL(0);
>>  		sparse_banks[0] = flush.processor_mask;
>> -		all_cpus = flush.flags & HV_FLUSH_ALL_PROCESSORS;
>> +
>> +		/*
>> +		 * WS2012 seems to be buggy, under certain conditions it is
>> +		 * possible to observe requests with processor_mask = 0x0 and
>> +		 * no HV_FLUSH_ALL_PROCESSORS flag set. It also seems that
> 
> "and no HV_FLUSH_ALL_PROCESSORS flag set" is awkward, and probably
> extraneous.  The whole comment is a probably a bit more verbose than it
> needs to be, e.g. most readers won't care how we came to the conclusion
> that 'processor_mask == 0', and those that care about the background will
> read the changelog anyways.
> 
> Maybe something like this:
> 
> 		/*
> 		 * Some Windows versions, e.g. WS2012, use processor_mask = 0
> 		 * in lieu of the dedicated flag to flush all processors.
> 		 */

Hmm, not really.  "In lieu" seems intentional.  "without" is more accurate.
My take:

                 * Work around possible WS2012 bug: it sends hypercalls
                 * with processor_mask = 0x0 and HV_FLUSH_ALL_PROCESSORS clear,
                 * while also expecting us to flush something and crashing if
                 * we don't. Let's treat processor_mask == 0 same as
                 * HV_FLUSH_ALL_PROCESSORS.
                 */

Paolo


> 
> 
>> +		 * Windows actually expects us to flush something and crashes
>> +		 * otherwise. Let's treat processor_mask == 0 same as
>> +		 * HV_FLUSH_ALL_PROCESSORS.
>> +		 */
>> +		all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
>> +			(flush.processor_mask == 0);
> 
> Nits:
> 
> Personal preference, but I like '!flush.processor_mask' in this case as it
> immediately conveys that we're handling the scenario where the guest didn't
> set a mask.  Then there wouldn't be a visual need for the second set of
> parentheses.
> 
> Aligning its indentation with the first first chunk of the statement would
> also be nice, but again, personal preference.  :-)
> 
>>  	} else {
>>  		if (unlikely(kvm_read_guest(kvm, ingpa, &flush_ex,
>>  					    sizeof(flush_ex))))
>> -- 
>> 2.20.1
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-04-18 16:47   ` Paolo Bonzini
@ 2019-04-18 16:56     ` Sean Christopherson
  2019-04-18 16:58       ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-04-18 16:56 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, Radim Krčmář,
	Roman Kagan, linux-kernel

On Thu, Apr 18, 2019 at 06:47:56PM +0200, Paolo Bonzini wrote:
> On 18/04/19 16:17, Sean Christopherson wrote:
> > On Wed, Mar 20, 2019 at 06:43:20PM +0100, Vitaly Kuznetsov wrote:
> >> It was reported that with some special Multi Processor Group configuration,
> >> e.g:
> >>  bcdedit.exe /set groupsize 1
> >>  bcdedit.exe /set maxgroup on
> >>  bcdedit.exe /set groupaware on
> >> for a 16-vCPU guest WS2012 shows BSOD on boot when PV TLB flush mechanism
> >> is in use.
> >>
> >> Tracing kvm_hv_flush_tlb immediately reveals the issue:
> >>
> >>  kvm_hv_flush_tlb: processor_mask 0x0 address_space 0x0 flags 0x2
> >>
> >> The only flag set in this request is HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES,
> >> however, processor_mask is 0x0 and no HV_FLUSH_ALL_PROCESSORS is specified.
> >> We don't flush anything and apparently it's not what Windows expects.
> >>
> >> TLFS doesn't say anything about such requests and newer Windows versions
> >> seem to be unaffected. This all feels like a WS2012 bug, which is, however,
> >> easy to workaround in KVM: let's flush everything when we see an empty
> >> flush request, over-flushing doesn't hurt.
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/kvm/hyperv.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> >> index 421899f6ad7b..5887f7d22ac6 100644
> >> --- a/arch/x86/kvm/hyperv.c
> >> +++ b/arch/x86/kvm/hyperv.c
> >> @@ -1371,7 +1371,17 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *current_vcpu, u64 ingpa,
> >>  
> >>  		valid_bank_mask = BIT_ULL(0);
> >>  		sparse_banks[0] = flush.processor_mask;
> >> -		all_cpus = flush.flags & HV_FLUSH_ALL_PROCESSORS;
> >> +
> >> +		/*
> >> +		 * WS2012 seems to be buggy, under certain conditions it is
> >> +		 * possible to observe requests with processor_mask = 0x0 and
> >> +		 * no HV_FLUSH_ALL_PROCESSORS flag set. It also seems that
> > 
> > "and no HV_FLUSH_ALL_PROCESSORS flag set" is awkward, and probably
> > extraneous.  The whole comment is a probably a bit more verbose than it
> > needs to be, e.g. most readers won't care how we came to the conclusion
> > that 'processor_mask == 0', and those that care about the background will
> > read the changelog anyways.
> > 
> > Maybe something like this:
> > 
> > 		/*
> > 		 * Some Windows versions, e.g. WS2012, use processor_mask = 0
> > 		 * in lieu of the dedicated flag to flush all processors.
> > 		 */
> 
> Hmm, not really.  "In lieu" seems intentional.  "without" is more accurate.

True, probably isn't exactly by design.

> My take:
> 
>                  * Work around possible WS2012 bug: it sends hypercalls
>                  * with processor_mask = 0x0 and HV_FLUSH_ALL_PROCESSORS clear,
>                  * while also expecting us to flush something and crashing if
>                  * we don't. Let's treat processor_mask == 0 same as
>                  * HV_FLUSH_ALL_PROCESSORS.

Nice.  What about this for the last sentence?  Either way, LGTM.

  For simplicity, flush all CPUs if the guest neglected to set processor_mask.

>                  */
> 
> Paolo
> 
> 
> > 
> > 
> >> +		 * Windows actually expects us to flush something and crashes
> >> +		 * otherwise. Let's treat processor_mask == 0 same as
> >> +		 * HV_FLUSH_ALL_PROCESSORS.
> >> +		 */
> >> +		all_cpus = (flush.flags & HV_FLUSH_ALL_PROCESSORS) ||
> >> +			(flush.processor_mask == 0);
> > 
> > Nits:
> > 
> > Personal preference, but I like '!flush.processor_mask' in this case as it
> > immediately conveys that we're handling the scenario where the guest didn't
> > set a mask.  Then there wouldn't be a visual need for the second set of
> > parentheses.
> > 
> > Aligning its indentation with the first first chunk of the statement would
> > also be nice, but again, personal preference.  :-)
> > 
> >>  	} else {
> >>  		if (unlikely(kvm_read_guest(kvm, ingpa, &flush_ex,
> >>  					    sizeof(flush_ex))))
> >> -- 
> >> 2.20.1
> >>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-04-18 16:56     ` Sean Christopherson
@ 2019-04-18 16:58       ` Paolo Bonzini
  2019-04-18 17:00         ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2019-04-18 16:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, kvm, Radim Krčmář,
	Roman Kagan, linux-kernel

On 18/04/19 18:56, Sean Christopherson wrote:
>   For simplicity, flush all CPUs if the guest neglected to set processor_mask.

We don't know if it neglected to set processor_mask or the flag...

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
  2019-04-18 16:58       ` Paolo Bonzini
@ 2019-04-18 17:00         ` Sean Christopherson
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2019-04-18 17:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, Radim Krčmář,
	Roman Kagan, linux-kernel

On Thu, Apr 18, 2019 at 06:58:01PM +0200, Paolo Bonzini wrote:
> On 18/04/19 18:56, Sean Christopherson wrote:
> >   For simplicity, flush all CPUs if the guest neglected to set processor_mask.
> 
> We don't know if it neglected to set processor_mask or the flag...

Were you a lawyer before becoming a maintainer? :-)

Yeah, go with your version, I'll get out of the way...

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-04-18 17:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-20 17:43 [PATCH] x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012 Vitaly Kuznetsov
2019-04-18  7:14 ` Vitaly Kuznetsov
2019-04-18 14:17 ` Sean Christopherson
2019-04-18 16:47   ` Paolo Bonzini
2019-04-18 16:56     ` Sean Christopherson
2019-04-18 16:58       ` Paolo Bonzini
2019-04-18 17:00         ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).