[PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume
       [not found] <20220301201344.18191-1-sashal@kernel.org>
@ 2022-03-01 20:13 ` Sasha Levin
  2022-03-01 20:19   ` Paolo Bonzini
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2022-03-01 20:13 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Wanpeng Li, Paolo Bonzini, Sasha Levin, kvm

From: Wanpeng Li <wanpengli@tencent.com>

[ Upstream commit 4cb9a998b1ce25fad74a82f5a5c45a4ef40de337 ]

I saw the below splatting after the host suspended and resumed.

   WARNING: CPU: 0 PID: 2943 at kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:5531 kvm_resume+0x2c/0x30 [kvm]
   CPU: 0 PID: 2943 Comm: step_after_susp Tainted: G        W IOE     5.17.0-rc3+ #4
   RIP: 0010:kvm_resume+0x2c/0x30 [kvm]
   Call Trace:
    <TASK>
    syscore_resume+0x90/0x340
    suspend_devices_and_enter+0xaee/0xe90
    pm_suspend.cold+0x36b/0x3c2
    state_store+0x82/0xf0
    kernfs_fop_write_iter+0x1b6/0x260
    new_sync_write+0x258/0x370
    vfs_write+0x33f/0x510
    ksys_write+0xc9/0x160
    do_syscall_64+0x3b/0xc0
    entry_SYSCALL_64_after_hwframe+0x44/0xae

lockdep_is_held() can return -1 when lockdep is disabled which triggers
this warning. Let's use lockdep_assert_not_held() which can detect
incorrect calls while holding a lock and it also avoids false negatives
when lockdep is disabled.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1644920142-81249-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 virt/kvm/kvm_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 71ddc7a8bc302..6ae9e04d0585e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5347,9 +5347,7 @@ static int kvm_suspend(void)
 static void kvm_resume(void)
 {
 	if (kvm_usage_count) {
-#ifdef CONFIG_LOCKDEP
-		WARN_ON(lockdep_is_held(&kvm_count_lock));
-#endif
+		lockdep_assert_not_held(&kvm_count_lock);
 		hardware_enable_nolock(NULL);
 	}
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
       [not found] <20220301201344.18191-1-sashal@kernel.org>
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume Sasha Levin
@ 2022-03-01 20:13 ` Sasha Levin
  2022-03-01 20:19   ` Paolo Bonzini
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 Sasha Levin
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU Sasha Levin
  3 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2022-03-01 20:13 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Anton Romanov, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, kvm

From: Anton Romanov <romanton@google.com>

[ Upstream commit 3a55f729240a686aa8af00af436306c0cd532522 ]

If vcpu has tsc_always_catchup set each request updates pvclock data.
KVM_HC_CLOCK_PAIRING consumers such as ptp_kvm_x86 rely on tsc read on
host's side and do hypercall inside pvclock_read_retry loop leading to
infinite loop in such situation.

v3:
    Removed warn
    Changed return code to KVM_EFAULT
v2:
    Added warn

Signed-off-by: Anton Romanov <romanton@google.com>
Message-Id: <20220216182653.506850-1-romanton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kvm/x86.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0714fa0e7ede0..18fc0367ef21a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8769,6 +8769,13 @@ static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
 	if (clock_type != KVM_CLOCK_PAIRING_WALLCLOCK)
 		return -KVM_EOPNOTSUPP;
 
+	/*
+	 * When tsc is in permanent catchup mode guests won't be able to use
+	 * pvclock_read_retry loop to get consistent view of pvclock
+	 */
+	if (vcpu->arch.tsc_always_catchup)
+		return -KVM_EOPNOTSUPP;
+
 	if (!kvm_get_walltime_and_clockread(&ts, &cycle))
 		return -KVM_EOPNOTSUPP;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
       [not found] <20220301201344.18191-1-sashal@kernel.org>
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume Sasha Levin
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode Sasha Levin
@ 2022-03-01 20:13 ` Sasha Levin
  2022-03-01 20:22   ` Paolo Bonzini
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU Sasha Levin
  3 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2022-03-01 20:13 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Leonardo Bras, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, chang.seok.bae, luto, kvm

From: Leonardo Bras <leobras@redhat.com>

[ Upstream commit ad856280ddea3401e1f5060ef20e6de9f6122c76 ]

During host/guest switch (like in kvm_arch_vcpu_ioctl_run()), the kernel
swaps the fpu between host/guest contexts, by using fpu_swap_kvm_fpstate().

When xsave feature is available, the fpu swap is done by:
- xsave(s) instruction, with guest's fpstate->xfeatures as mask, is used
  to store the current state of the fpu registers to a buffer.
- xrstor(s) instruction, with (fpu_kernel_cfg.max_features &
  XFEATURE_MASK_FPSTATE) as mask, is used to put the buffer into fpu regs.

For xsave(s) the mask is used to limit what parts of the fpu regs will
be copied to the buffer. Likewise on xrstor(s), the mask is used to
limit what parts of the fpu regs will be changed.

The mask for xsave(s), the guest's fpstate->xfeatures, is defined on
kvm_arch_vcpu_create(), which (in summary) sets it to all features
supported by the cpu which are enabled on kernel config.

This means that xsave(s) will save to guest buffer all the fpu regs
contents the cpu has enabled when the guest is paused, even if they
are not used.

This would not be an issue, if xrstor(s) would also do that.

xrstor(s)'s mask for host/guest swap is basically every valid feature
contained in kernel config, except XFEATURE_MASK_PKRU.
Accordingto kernel src, it is instead switched in switch_to() and
flush_thread().

Then, the following happens with a host supporting PKRU starts a
guest that does not support it:
1 - Host has XFEATURE_MASK_PKRU set. 1st switch to guest,
2 - xsave(s) fpu regs to host fpustate (buffer has XFEATURE_MASK_PKRU)
3 - xrstor(s) guest fpustate to fpu regs (fpu regs have XFEATURE_MASK_PKRU)
4 - guest runs, then switch back to host,
5 - xsave(s) fpu regs to guest fpstate (buffer now have XFEATURE_MASK_PKRU)
6 - xrstor(s) host fpstate to fpu regs.
7 - kvm_vcpu_ioctl_x86_get_xsave() copy guest fpstate to userspace (with
    XFEATURE_MASK_PKRU, which should not be supported by guest vcpu)

On 5, even though the guest does not support PKRU, it does have the flag
set on guest fpstate, which is transferred to userspace via vcpu ioctl
KVM_GET_XSAVE.

This becomes a problem when the user decides on migrating the above guest
to another machine that does not support PKRU: the new host restores
guest's fpu regs to as they were before (xrstor(s)), but since the new
host don't support PKRU, a general-protection exception ocurs in xrstor(s)
and that crashes the guest.

This can be solved by making the guest's fpstate->user_xfeatures hold
a copy of guest_supported_xcr0. This way, on 7 the only flags copied to
userspace will be the ones compatible to guest requirements, and thus
there will be no issue during migration.

As a bonus, it will also fail if userspace tries to set fpu features
(with the KVM_SET_XSAVE ioctl) that are not compatible to the guest
configuration.  Such features will never be returned by KVM_GET_XSAVE
or KVM_GET_XSAVE2.

Also, since kvm_vcpu_after_set_cpuid() now sets fpstate->user_xfeatures,
there is not need to set it in kvm_check_cpuid(). So, change
fpstate_realloc() so it does not touch fpstate->user_xfeatures if a
non-NULL guest_fpu is passed, which is the case when kvm_check_cpuid()
calls it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220217053028.96432-2-leobras@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/fpu/xstate.c | 5 ++++-
 arch/x86/kvm/cpuid.c         | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index d28829403ed08..6ac01f9828530 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1563,7 +1563,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
 		fpregs_restore_userregs();

 	newfps->xfeatures = curfps->xfeatures | xfeatures;
-	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+
+	if (!guest_fpu)
+		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
+
 	newfps->xfd = curfps->xfd & ~xfeatures;

 	curfps = fpu_install_fpstate(fpu, newfps);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bf18679757c70..875dce4aa2d28 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -276,6 +276,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);

+	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
+
 	kvm_update_pv_runtime(vcpu);

 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
       [not found] <20220301201344.18191-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 Sasha Levin
@ 2022-03-01 20:13 ` Sasha Levin
  2022-03-01 20:17   ` Paolo Bonzini
  3 siblings, 1 reply; 19+ messages in thread
From: Sasha Levin @ 2022-03-01 20:13 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Wanpeng Li, Paolo Bonzini, Sasha Levin, tglx, mingo, bp,
	dave.hansen, x86, kvm

From: Wanpeng Li <wanpengli@tencent.com>

[ Upstream commit ec756e40e271866f951d77c5e923d8deb6002b15 ]

Inspired by commit 3553ae5690a (x86/kvm: Don't use pvqspinlock code if
only 1 vCPU), on a VM with only 1 vCPU, there is no need to enable
pv tlb/ipi/sched_yield and we can save the memory for __pv_cpu_mask.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1645171838-2855-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/kvm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 59abbdad7729c..ff3db164e52cb 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -462,19 +462,22 @@ static bool pv_tlb_flush_supported(void)
 {
 	return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
 		!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
-		kvm_para_has_feature(KVM_FEATURE_STEAL_TIME));
+		kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) &&
+		(num_possible_cpus() != 1));
 }
 
 static bool pv_ipi_supported(void)
 {
-	return kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI);
+	return (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
+	       (num_possible_cpus() != 1));
 }
 
 static bool pv_sched_yield_supported(void)
 {
 	return (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
 		!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
-	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME));
+	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) &&
+	    (num_possible_cpus() != 1));
 }
 
 #define KVM_IPI_CLUSTER_SIZE	(2 * BITS_PER_LONG)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU Sasha Levin
@ 2022-03-01 20:17   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-03-01 20:17 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Wanpeng Li, tglx, mingo, bp, dave.hansen, x86, kvm

On 3/1/22 21:13, Sasha Levin wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> [ Upstream commit ec756e40e271866f951d77c5e923d8deb6002b15 ]
> 
> Inspired by commit 3553ae5690a (x86/kvm: Don't use pvqspinlock code if
> only 1 vCPU), on a VM with only 1 vCPU, there is no need to enable
> pv tlb/ipi/sched_yield and we can save the memory for __pv_cpu_mask.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Message-Id: <1645171838-2855-1-git-send-email-wanpengli@tencent.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kernel/kvm.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 59abbdad7729c..ff3db164e52cb 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -462,19 +462,22 @@ static bool pv_tlb_flush_supported(void)
>   {
>   	return (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
>   		!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
> -		kvm_para_has_feature(KVM_FEATURE_STEAL_TIME));
> +		kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) &&
> +		(num_possible_cpus() != 1));
>   }
>   
>   static bool pv_ipi_supported(void)
>   {
> -	return kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI);
> +	return (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> +	       (num_possible_cpus() != 1));
>   }
>   
>   static bool pv_sched_yield_supported(void)
>   {
>   	return (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
>   		!kvm_para_has_hint(KVM_HINTS_REALTIME) &&
> -	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME));
> +	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME) &&
> +	    (num_possible_cpus() != 1));
>   }
>   
>   #define KVM_IPI_CLUSTER_SIZE	(2 * BITS_PER_LONG)

NACK

Not really necessary.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume Sasha Levin
@ 2022-03-01 20:19   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-03-01 20:19 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable; +Cc: Wanpeng Li, kvm

On 3/1/22 21:13, Sasha Levin wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> [ Upstream commit 4cb9a998b1ce25fad74a82f5a5c45a4ef40de337 ]
> 
> I saw the below splatting after the host suspended and resumed.
> 
>     WARNING: CPU: 0 PID: 2943 at kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:5531 kvm_resume+0x2c/0x30 [kvm]
>     CPU: 0 PID: 2943 Comm: step_after_susp Tainted: G        W IOE     5.17.0-rc3+ #4
>     RIP: 0010:kvm_resume+0x2c/0x30 [kvm]
>     Call Trace:
>      <TASK>
>      syscore_resume+0x90/0x340
>      suspend_devices_and_enter+0xaee/0xe90
>      pm_suspend.cold+0x36b/0x3c2
>      state_store+0x82/0xf0
>      kernfs_fop_write_iter+0x1b6/0x260
>      new_sync_write+0x258/0x370
>      vfs_write+0x33f/0x510
>      ksys_write+0xc9/0x160
>      do_syscall_64+0x3b/0xc0
>      entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> lockdep_is_held() can return -1 when lockdep is disabled which triggers
> this warning. Let's use lockdep_assert_not_held() which can detect
> incorrect calls while holding a lock and it also avoids false negatives
> when lockdep is disabled.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> Message-Id: <1644920142-81249-1-git-send-email-wanpengli@tencent.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   virt/kvm/kvm_main.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 71ddc7a8bc302..6ae9e04d0585e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5347,9 +5347,7 @@ static int kvm_suspend(void)
>   static void kvm_resume(void)
>   {
>   	if (kvm_usage_count) {
> -#ifdef CONFIG_LOCKDEP
> -		WARN_ON(lockdep_is_held(&kvm_count_lock));
> -#endif
> +		lockdep_assert_not_held(&kvm_count_lock);
>   		hardware_enable_nolock(NULL);
>   	}
>   }

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode Sasha Levin
@ 2022-03-01 20:19   ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2022-03-01 20:19 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Anton Romanov, tglx, mingo, bp, dave.hansen, x86, kvm

On 3/1/22 21:13, Sasha Levin wrote:
> From: Anton Romanov <romanton@google.com>
> 
> [ Upstream commit 3a55f729240a686aa8af00af436306c0cd532522 ]
> 
> If vcpu has tsc_always_catchup set each request updates pvclock data.
> KVM_HC_CLOCK_PAIRING consumers such as ptp_kvm_x86 rely on tsc read on
> host's side and do hypercall inside pvclock_read_retry loop leading to
> infinite loop in such situation.
> 
> v3:
>      Removed warn
>      Changed return code to KVM_EFAULT
> v2:
>      Added warn
> 
> Signed-off-by: Anton Romanov <romanton@google.com>
> Message-Id: <20220216182653.506850-1-romanton@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   arch/x86/kvm/x86.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0714fa0e7ede0..18fc0367ef21a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8769,6 +8769,13 @@ static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
>   	if (clock_type != KVM_CLOCK_PAIRING_WALLCLOCK)
>   		return -KVM_EOPNOTSUPP;
>   
> +	/*
> +	 * When tsc is in permanent catchup mode guests won't be able to use
> +	 * pvclock_read_retry loop to get consistent view of pvclock
> +	 */
> +	if (vcpu->arch.tsc_always_catchup)
> +		return -KVM_EOPNOTSUPP;
> +
>   	if (!kvm_get_walltime_and_clockread(&ts, &cycle))
>   		return -KVM_EOPNOTSUPP;
>   

Acked-by: Paolo Bonzini <pbonzini@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 Sasha Levin
@ 2022-03-01 20:22   ` Paolo Bonzini
  2022-06-03 18:40     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2022-03-01 20:22 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Leonardo Bras, tglx, mingo, bp, dave.hansen, x86, chang.seok.bae,
	luto, kvm

On 3/1/22 21:13, Sasha Levin wrote:
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index d28829403ed08..6ac01f9828530 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1563,7 +1563,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
>   		fpregs_restore_userregs();
>   
>   	newfps->xfeatures = curfps->xfeatures | xfeatures;
> -	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> +
> +	if (!guest_fpu)
> +		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> +
>   	newfps->xfd = curfps->xfd & ~xfeatures;
>   
>   	curfps = fpu_install_fpstate(fpu, newfps);
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index bf18679757c70..875dce4aa2d28 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -276,6 +276,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	vcpu->arch.guest_supported_xcr0 =
>   		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>   
> +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
> +
>   	kvm_update_pv_runtime(vcpu);
>   
>   	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);

Leonardo, was this also buggy in 5.16?  (I should have asked for a Fixes 
tag...).

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-03-01 20:22   ` Paolo Bonzini
@ 2022-06-03 18:40     ` Peter Xu
  2022-06-06 16:18       ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2022-06-03 18:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sasha Levin, linux-kernel, stable, Leonardo Bras, tglx, mingo,
	bp, dave.hansen, x86, chang.seok.bae, luto, kvm,
	Sean Christopherson

On Tue, Mar 01, 2022 at 09:22:10PM +0100, Paolo Bonzini wrote:
> On 3/1/22 21:13, Sasha Levin wrote:
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index d28829403ed08..6ac01f9828530 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -1563,7 +1563,10 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
> >   		fpregs_restore_userregs();
> >   	newfps->xfeatures = curfps->xfeatures | xfeatures;
> > -	newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> > +
> > +	if (!guest_fpu)
> > +		newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
> > +
> >   	newfps->xfd = curfps->xfd & ~xfeatures;
> >   	curfps = fpu_install_fpstate(fpu, newfps);
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index bf18679757c70..875dce4aa2d28 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -276,6 +276,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >   	vcpu->arch.guest_supported_xcr0 =
> >   		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> > +	vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0;
> > +
> >   	kvm_update_pv_runtime(vcpu);
> >   	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> 
> Leonardo, was this also buggy in 5.16?  (I should have asked for a Fixes
> tag...).

I just stumbled over this patch on some migration tests in the past few
days..

In short, I was migrating a VM from 5.15 host to 5.18 host and the guest
trigger double fault immediately after the switch-over (I think that's when
it's trying to do vmenter, a VECTOR_DF was injected), with either precopy
or postcopy.

After I upgrade 5.15 src host to 5.18 host, problem goes away. I did a
bisect on dest and surprisingly it points to this commit.

Side note: I'm using two hosts that have the same processor model, so no
case of missing features on either side - they just match.

I'm not really sure whether this is a bug or by design - do we require this
patch to be applied to all stable branches to make the guest not crash
after migration, or it is unexpected?

FWICT, this patch modifies user_xfeatures while we don't do that trick
before. It sounds reasonable to me from the 1st glance, say if the guest
didn't enable some of the fpu features so we don't need to migrate those
fpu state chunks as we're migrating things based on user_xfeatures, and it
sounds good to solve the migration issue on "has-pksu" host to "no-pksu"
host as described in the patch commit message.

However there seems to be something missing at least to me, on why it'll
fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
patch, but 0x0 if with it.

I think what it should be happening is user_xfeatures will be set on src
with 0x7 (old kernel), so we should have migrated some more chunks to dest,
but I just don't quickly understand why that's a problem there because
fundamentally when we restore the fpu status (fpu_swap_kvm_fpstate) we use
the max feature bitmask anyway, and the dest hardware should support all of
them.  I don't quickly see how that could trigger a double fault, though.

I'll continue the dig probably next week, before that, any thoughts?

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-03 18:40     ` Peter Xu
@ 2022-06-06 16:18       ` Paolo Bonzini
  2022-06-06 21:27         ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2022-06-06 16:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: Sasha Levin, linux-kernel, stable, Leonardo Bras, tglx, mingo,
	bp, dave.hansen, x86, chang.seok.bae, luto, kvm,
	Sean Christopherson

On 6/3/22 20:40, Peter Xu wrote:
> I'm not really sure whether this is a bug or by design - do we require this
> patch to be applied to all stable branches to make the guest not crash
> after migration, or it is unexpected?

Yes, we do, though the only reported bug was for PKRU.

> However there seems to be something missing at least to me, on why it'll
> fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> patch, but 0x0 if with it.

What CPU model are you using for the VM?  For example, if the source 
lacks this patch but the destination has it, the source will transmit 
YMM registers, but the destination will fail to set them if they are not 
available for the selected CPU model.

See the commit message: "As a bonus, it will also fail if userspace 
tries to set fpu features (with the KVM_SET_XSAVE ioctl) that are not 
compatible to the guest configuration.  Such features will never be 
returned by KVM_GET_XSAVE or KVM_GET_XSAVE2."

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-06 16:18       ` Paolo Bonzini
@ 2022-06-06 21:27         ` Peter Xu
  2022-06-07 12:54           ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2022-06-06 21:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sasha Levin, linux-kernel, stable, Leonardo Bras, tglx, mingo,
	bp, dave.hansen, x86, chang.seok.bae, luto, kvm,
	Sean Christopherson

On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > However there seems to be something missing at least to me, on why it'll
> > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > patch, but 0x0 if with it.
> 
> What CPU model are you using for the VM?

I didn't specify it, assuming it's qemu64 with no extra parameters.

I just tried two other options with: (1) -cpu host, and (2) -cpu Haswell
(the choice of Haswell was really random..), with the same 5.15->5.18
migration scenario, both of them will not trigger the same guest kernel
crash.  Only qemu64 will.

Both hosts have Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.

> For example, if the source lacks this patch but the destination has it,
> the source will transmit YMM registers, but the destination will fail to
> set them if they are not available for the selected CPU model.
> 
> See the commit message: "As a bonus, it will also fail if userspace tries to
> set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> the guest configuration.  Such features will never be returned by
> KVM_GET_XSAVE or KVM_GET_XSAVE2."

IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
(probably by failing validate_user_xstate_header when checking against the
user_xfeatures on dest host). But that's probably not my case, because here
KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
the precopy migration completes (or for postcopy when the switchover is
done).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-06 21:27         ` Peter Xu
@ 2022-06-07 12:54           ` Paolo Bonzini
  2022-06-07 15:04             ` Sean Christopherson
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2022-06-07 12:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: Sasha Levin, linux-kernel, stable, Leonardo Bras, tglx, mingo,
	bp, dave.hansen, x86, chang.seok.bae, luto, kvm,
	Sean Christopherson

On 6/6/22 23:27, Peter Xu wrote:
> On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
>>> However there seems to be something missing at least to me, on why it'll
>>> fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
>>> In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
>>> patch, but 0x0 if with it.
>>
>> What CPU model are you using for the VM?
> 
> I didn't specify it, assuming it's qemu64 with no extra parameters.

Ok, so indeed it lacks AVX and this patch can have an effect.

>> For example, if the source lacks this patch but the destination has it,
>> the source will transmit YMM registers, but the destination will fail to
>> set them if they are not available for the selected CPU model.
>>
>> See the commit message: "As a bonus, it will also fail if userspace tries to
>> set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
>> the guest configuration.  Such features will never be returned by
>> KVM_GET_XSAVE or KVM_GET_XSAVE2."
> 
> IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> (probably by failing validate_user_xstate_header when checking against the
> user_xfeatures on dest host). But that's probably not my case, because here
> KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> the precopy migration completes (or for postcopy when the switchover is
> done).

Difficult to say what's happening without seeing at least the guest code 
around the double fault (above you said "fail a migration" and I thought 
that was a different scenario than the double fault), and possibly which 
was the first exception that contributed to the double fault.

Paolo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 12:54           ` Paolo Bonzini
@ 2022-06-07 15:04             ` Sean Christopherson
  2022-06-07 18:17               ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2022-06-07 15:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, Sasha Levin, linux-kernel, stable, Leonardo Bras, tglx,
	mingo, bp, dave.hansen, x86, chang.seok.bae, luto, kvm

On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> On 6/6/22 23:27, Peter Xu wrote:
> > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > However there seems to be something missing at least to me, on why it'll
> > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > patch, but 0x0 if with it.
> > > 
> > > What CPU model are you using for the VM?
> > 
> > I didn't specify it, assuming it's qemu64 with no extra parameters.
> 
> Ok, so indeed it lacks AVX and this patch can have an effect.
> 
> > > For example, if the source lacks this patch but the destination has it,
> > > the source will transmit YMM registers, but the destination will fail to
> > > set them if they are not available for the selected CPU model.
> > > 
> > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > the guest configuration.  Such features will never be returned by
> > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > 
> > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > (probably by failing validate_user_xstate_header when checking against the
> > user_xfeatures on dest host). But that's probably not my case, because here
> > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > the precopy migration completes (or for postcopy when the switchover is
> > done).
> 
> Difficult to say what's happening without seeing at least the guest code
> around the double fault (above you said "fail a migration" and I thought
> that was a different scenario than the double fault), and possibly which was
> the first exception that contributed to the double fault.

Regardless of why the guest explodes in the way it does, is someone planning on
bisecting this (if necessary?) and sending a backport to v5.15?  There's another
bug report that is more than likely hitting the same bug.

https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 15:04             ` Sean Christopherson
@ 2022-06-07 18:17               ` Peter Xu
  2022-06-07 18:47                 ` Sean Christopherson
  2022-06-07 18:55                 ` Peter Xu
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Xu @ 2022-06-07 18:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Sasha Levin, linux-kernel, stable, Leonardo Bras,
	tglx, mingo, bp, dave.hansen, x86, chang.seok.bae, luto, kvm

On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > On 6/6/22 23:27, Peter Xu wrote:
> > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > However there seems to be something missing at least to me, on why it'll
> > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > patch, but 0x0 if with it.
> > > > 
> > > > What CPU model are you using for the VM?
> > > 
> > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > 
> > Ok, so indeed it lacks AVX and this patch can have an effect.
> > 
> > > > For example, if the source lacks this patch but the destination has it,
> > > > the source will transmit YMM registers, but the destination will fail to
> > > > set them if they are not available for the selected CPU model.
> > > > 
> > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > the guest configuration.  Such features will never be returned by
> > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > 
> > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > (probably by failing validate_user_xstate_header when checking against the
> > > user_xfeatures on dest host). But that's probably not my case, because here
> > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > the precopy migration completes (or for postcopy when the switchover is
> > > done).
> > 
> > Difficult to say what's happening without seeing at least the guest code
> > around the double fault (above you said "fail a migration" and I thought
> > that was a different scenario than the double fault), and possibly which was
> > the first exception that contributed to the double fault.
> 
> Regardless of why the guest explodes in the way it does, is someone planning on
> bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> bug report that is more than likely hitting the same bug.

What's the bisection you mentioned?  I actually did a bisection and I also
checked reverting Leo's change can also fix this issue.  Or do you mean
something else?

> 
> https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net

That is kvm64, and I agree it could be the same problem since both qemu64
and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
so potentially we could be migrating some fpu states to it even with
user_xfeatures==0 on dest host.

So today I continued the investigation, and I think what's really missing
is qemu seems to be ignoring the user_xfeatures check for KVM_SET_XSAVE and
continues even if it returns -EINVAL.  IOW, I'm wondering whether we should
fail properly and start to check kvm_arch_put_registers() retcode.  But
that'll be a QEMU fix, and it'll at least not causing random faults
(e.g. double faults) in guest but we should fail the migration gracefully.

Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():

	WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));

It'll be great if you'd like to check that up.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 18:17               ` Peter Xu
@ 2022-06-07 18:47                 ` Sean Christopherson
  2022-06-07 21:01                   ` Peter Xu
  2022-06-07 18:55                 ` Peter Xu
  1 sibling, 1 reply; 19+ messages in thread
From: Sean Christopherson @ 2022-06-07 18:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Paolo Bonzini, Sasha Levin, linux-kernel, stable, Leonardo Bras,
	tglx, mingo, bp, dave.hansen, x86, chang.seok.bae, luto, kvm

On Tue, Jun 07, 2022, Peter Xu wrote:
> On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > On 6/6/22 23:27, Peter Xu wrote:
> > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > patch, but 0x0 if with it.
> > > > > 
> > > > > What CPU model are you using for the VM?
> > > > 
> > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > > 
> > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > > 
> > > > > For example, if the source lacks this patch but the destination has it,
> > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > set them if they are not available for the selected CPU model.
> > > > > 
> > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > the guest configuration.  Such features will never be returned by
> > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > > 
> > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > (probably by failing validate_user_xstate_header when checking against the
> > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > the precopy migration completes (or for postcopy when the switchover is
> > > > done).
> > > 
> > > Difficult to say what's happening without seeing at least the guest code
> > > around the double fault (above you said "fail a migration" and I thought
> > > that was a different scenario than the double fault), and possibly which was
> > > the first exception that contributed to the double fault.
> > 
> > Regardless of why the guest explodes in the way it does, is someone planning on
> > bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> > bug report that is more than likely hitting the same bug.
> 
> What's the bisection you mentioned?  I actually did a bisection and I also
> checked reverting Leo's change can also fix this issue.  Or do you mean
> something else?

Oooooh, sorry!  I got completely turned around.  You ran into a bug with the
fix.  I thought that you were hitting the same issues as Mike where migrating
between hosts with different capabilities is broken in v5.15, but works in v5.18.

> > https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net
> 
> That is kvm64, and I agree it could be the same problem since both qemu64
> and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
> so potentially we could be migrating some fpu states to it even with
> user_xfeatures==0 on dest host.
> 
> So today I continued the investigation, and I think what's really missing
> is qemu seems to be ignoring the user_xfeatures check for KVM_SET_XSAVE and
> continues even if it returns -EINVAL.  IOW, I'm wondering whether we should
> fail properly and start to check kvm_arch_put_registers() retcode.  But
> that'll be a QEMU fix, and it'll at least not causing random faults
> (e.g. double faults) in guest but we should fail the migration gracefully.
> 
> Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
> your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():
> 
> 	WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
> 
> It'll be great if you'd like to check that up.

Ugh, userspace can force KVM_MP_STATE_UNINITIALIZED via KVM_SET_MP_STATE.  Looks
like QEMU does that when emulating RESET.

Logically, a full RESET of the xAPIC seems like the right thing to do.  I think
we can get away with that without breaking ABI?  And kvm_lapic_reset() has a
related bug where it stops the HR timer but not doesn't handle the HV timer :-/

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e69b83708f05..948aba894245 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2395,7 +2395,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
                return;

        /* Stop the timer in case it's a reset to an active apic */
-       hrtimer_cancel(&apic->lapic_timer.timer);
+       cancel_apic_timer(&apic->lapic_timer.timer);

        /* The xAPIC ID is set at RESET even if the APIC was already enabled. */
        if (!init_event)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 540651cd28d7..ed2c7cb1642d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10912,6 +10912,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
             mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED))
                goto out;

+       if (mp_state->mp_state == KVM_MP_STATE_UNINITIALIZED)
+               kvm_lapic_reset(vcpu, false);
+
        if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
                vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
                set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 18:17               ` Peter Xu
  2022-06-07 18:47                 ` Sean Christopherson
@ 2022-06-07 18:55                 ` Peter Xu
  2022-06-08 20:34                   ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Xu @ 2022-06-07 18:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Sasha Levin, linux-kernel, stable, Leonardo Bras,
	tglx, mingo, bp, dave.hansen, x86, chang.seok.bae, luto, kvm

On Tue, Jun 07, 2022 at 02:17:54PM -0400, Peter Xu wrote:
> On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > On 6/6/22 23:27, Peter Xu wrote:
> > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > patch, but 0x0 if with it.
> > > > > 
> > > > > What CPU model are you using for the VM?
> > > > 
> > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > > 
> > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > > 
> > > > > For example, if the source lacks this patch but the destination has it,
> > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > set them if they are not available for the selected CPU model.
> > > > > 
> > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > the guest configuration.  Such features will never be returned by
> > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > > 
> > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > (probably by failing validate_user_xstate_header when checking against the
> > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > the precopy migration completes (or for postcopy when the switchover is
> > > > done).
> > > 
> > > Difficult to say what's happening without seeing at least the guest code
> > > around the double fault (above you said "fail a migration" and I thought
> > > that was a different scenario than the double fault), and possibly which was
> > > the first exception that contributed to the double fault.
> > 
> > Regardless of why the guest explodes in the way it does, is someone planning on
> > bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> > bug report that is more than likely hitting the same bug.
> 
> What's the bisection you mentioned?  I actually did a bisection and I also
> checked reverting Leo's change can also fix this issue.  Or do you mean
> something else?

Ah, I forgot to mention on the "stable tree decisions": IIUC it also means
we should apply Leo's patch to all the stable trees if possible, then
migrations between them won't trigger the misterous faults anymore,
including when migrating to the latest Linux versions.

However there's the delimma that other kernels (any kernel that does not
have Leo's patch) will start to fail migrations to the stable branches that
apply Leo's patch too..  So that's kind of a slight pity.  It's just IIUC
the stable trees are more important, because it should have a broader
audience (most Linux distros)?

> 
> > 
> > https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net
> 
> That is kvm64, and I agree it could be the same problem since both qemu64
> and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
> so potentially we could be migrating some fpu states to it even with
> user_xfeatures==0 on dest host.
> 
> So today I continued the investigation, and I think what's really missing
> is qemu seems to be ignoring the user_xfeatures check for KVM_SET_XSAVE and
> continues even if it returns -EINVAL.  IOW, I'm wondering whether we should
> fail properly and start to check kvm_arch_put_registers() retcode.  But
> that'll be a QEMU fix, and it'll at least not causing random faults
> (e.g. double faults) in guest but we should fail the migration gracefully.
> 
> Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
> your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():
> 
> 	WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
> 
> It'll be great if you'd like to check that up.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 18:47                 ` Sean Christopherson
@ 2022-06-07 21:01                   ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2022-06-07 21:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Sasha Levin, linux-kernel, stable, Leonardo Bras,
	tglx, mingo, bp, dave.hansen, x86, chang.seok.bae, luto, kvm

On Tue, Jun 07, 2022 at 06:47:41PM +0000, Sean Christopherson wrote:
> On Tue, Jun 07, 2022, Peter Xu wrote:
> > On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > > On 6/6/22 23:27, Peter Xu wrote:
> > > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > > patch, but 0x0 if with it.
> > > > > > 
> > > > > > What CPU model are you using for the VM?
> > > > > 
> > > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > > > 
> > > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > > > 
> > > > > > For example, if the source lacks this patch but the destination has it,
> > > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > > set them if they are not available for the selected CPU model.
> > > > > > 
> > > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > > the guest configuration.  Such features will never be returned by
> > > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > > > 
> > > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > > (probably by failing validate_user_xstate_header when checking against the
> > > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > > the precopy migration completes (or for postcopy when the switchover is
> > > > > done).
> > > > 
> > > > Difficult to say what's happening without seeing at least the guest code
> > > > around the double fault (above you said "fail a migration" and I thought
> > > > that was a different scenario than the double fault), and possibly which was
> > > > the first exception that contributed to the double fault.
> > > 
> > > Regardless of why the guest explodes in the way it does, is someone planning on
> > > bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> > > bug report that is more than likely hitting the same bug.
> > 
> > What's the bisection you mentioned?  I actually did a bisection and I also
> > checked reverting Leo's change can also fix this issue.  Or do you mean
> > something else?
> 
> Oooooh, sorry!  I got completely turned around.  You ran into a bug with the
> fix.  I thought that you were hitting the same issues as Mike where migrating
> between hosts with different capabilities is broken in v5.15, but works in v5.18.

Aha, no worry.

> 
> > > https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net
> > 
> > That is kvm64, and I agree it could be the same problem since both qemu64
> > and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
> > so potentially we could be migrating some fpu states to it even with
> > user_xfeatures==0 on dest host.
> > 
> > So today I continued the investigation, and I think what's really missing
> > is qemu seems to be ignoring the user_xfeatures check for KVM_SET_XSAVE and
> > continues even if it returns -EINVAL.  IOW, I'm wondering whether we should
> > fail properly and start to check kvm_arch_put_registers() retcode.  But
> > that'll be a QEMU fix, and it'll at least not causing random faults
> > (e.g. double faults) in guest but we should fail the migration gracefully.
> > 
> > Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
> > your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():
> > 
> > 	WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
> > 
> > It'll be great if you'd like to check that up.
> 
> Ugh, userspace can force KVM_MP_STATE_UNINITIALIZED via KVM_SET_MP_STATE.  Looks
> like QEMU does that when emulating RESET.
> 
> Logically, a full RESET of the xAPIC seems like the right thing to do.  I think
> we can get away with that without breaking ABI?  And kvm_lapic_reset() has a
> related bug where it stops the HR timer but not doesn't handle the HV timer :-/
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e69b83708f05..948aba894245 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2395,7 +2395,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>                 return;
> 
>         /* Stop the timer in case it's a reset to an active apic */
> -       hrtimer_cancel(&apic->lapic_timer.timer);
> +       cancel_apic_timer(&apic->lapic_timer.timer);

Needs to be:

  +       cancel_apic_timer(apic);

> 
>         /* The xAPIC ID is set at RESET even if the APIC was already enabled. */
>         if (!init_event)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 540651cd28d7..ed2c7cb1642d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10912,6 +10912,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>              mp_state->mp_state == KVM_MP_STATE_INIT_RECEIVED))
>                 goto out;
> 
> +       if (mp_state->mp_state == KVM_MP_STATE_UNINITIALIZED)
> +               kvm_lapic_reset(vcpu, false);
> +
>         if (mp_state->mp_state == KVM_MP_STATE_SIPI_RECEIVED) {
>                 vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>                 set_bit(KVM_APIC_SIPI, &vcpu->arch.apic->pending_events);
> 

The change looks reasonable, but sadly I did a quick run and it still
triggers.. :-/ So there seems to be something else missing.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-07 18:55                 ` Peter Xu
@ 2022-06-08 20:34                   ` Leonardo Bras Soares Passos
  2022-06-08 20:53                     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Leonardo Bras Soares Passos @ 2022-06-08 20:34 UTC (permalink / raw)
  To: Peter Xu
  Cc: Sean Christopherson, Paolo Bonzini, Sasha Levin, linux-kernel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Chang S. Bae, Andy Lutomirski, kvm

Hello Peter,

On Tue, Jun 7, 2022 at 5:07 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Jun 07, 2022 at 02:17:54PM -0400, Peter Xu wrote:
> > On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > > On 6/6/22 23:27, Peter Xu wrote:
> > > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > > patch, but 0x0 if with it.
> > > > > >
> > > > > > What CPU model are you using for the VM?
> > > > >
> > > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > > >
> > > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > > >
> > > > > > For example, if the source lacks this patch but the destination has it,
> > > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > > set them if they are not available for the selected CPU model.
> > > > > >
> > > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > > the guest configuration.  Such features will never be returned by
> > > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > > >
> > > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > > (probably by failing validate_user_xstate_header when checking against the
> > > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > > the precopy migration completes (or for postcopy when the switchover is
> > > > > done).
> > > >
> > > > Difficult to say what's happening without seeing at least the guest code
> > > > around the double fault (above you said "fail a migration" and I thought
> > > > that was a different scenario than the double fault), and possibly which was
> > > > the first exception that contributed to the double fault.
> > >
> > > Regardless of why the guest explodes in the way it does, is someone planning on
> > > bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> > > bug report that is more than likely hitting the same bug.
> >
> > What's the bisection you mentioned?  I actually did a bisection and I also
> > checked reverting Leo's change can also fix this issue.  Or do you mean
> > something else?
>
> Ah, I forgot to mention on the "stable tree decisions": IIUC it also means
> we should apply Leo's patch to all the stable trees if possible, then
> migrations between them won't trigger the misterous faults anymore,
> including when migrating to the latest Linux versions.
>
> However there's the delimma that other kernels (any kernel that does not
> have Leo's patch) will start to fail migrations to the stable branches that
> apply Leo's patch too..

IIUC, you commented before that the migration issue should be solved with a
QEMU fix, is that correct? That would mean something like 'QEMU is relying on a
kernel bug to work', and should be no blocker for fixing the kernel.

If that's the case, I think we should apply the fix to every supported
stable branch that
have the fpku issue, and in parallel come with a qemu fix for that.

What do you think about it?

Best regards,
Leo

> So that's kind of a slight pity.  It's just IIUC
> the stable trees are more important, because it should have a broader
> audience (most Linux distros)?
>
> >
> > >
> > > https://lore.kernel.org/all/48353e0d-e771-8a97-21d4-c65ff3bc4192@sentex.net
> >
> > That is kvm64, and I agree it could be the same problem since both qemu64
> > and kvm64 models do not have any xsave feature bit declared in cpuid 0xd,
> > so potentially we could be migrating some fpu states to it even with
> > user_xfeatures==0 on dest host.
> >
> > So today I continued the investigation, and I think what's really missing
> > is qemu seems to be ignoring the user_xfeatures check for KVM_SET_XSAVE and
> > continues even if it returns -EINVAL.  IOW, I'm wondering whether we should
> > fail properly and start to check kvm_arch_put_registers() retcode.  But
> > that'll be a QEMU fix, and it'll at least not causing random faults
> > (e.g. double faults) in guest but we should fail the migration gracefully.
> >
> > Sean: a side note is that I can also easily trigger one WARN_ON_ONCE() in
> > your commit 98c25ead5eda5 in kvm_arch_vcpu_ioctl_run():
> >
> >       WARN_ON_ONCE(kvm_lapic_hv_timer_in_use(vcpu));
> >
> > It'll be great if you'd like to check that up.
> >
> > Thanks,
> >
> > --
> > Peter Xu
>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  2022-06-08 20:34                   ` Leonardo Bras Soares Passos
@ 2022-06-08 20:53                     ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2022-06-08 20:53 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Sean Christopherson, Paolo Bonzini, Sasha Levin, linux-kernel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Chang S. Bae, Andy Lutomirski, kvm

On Wed, Jun 08, 2022 at 05:34:18PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter,
> 
> On Tue, Jun 7, 2022 at 5:07 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Jun 07, 2022 at 02:17:54PM -0400, Peter Xu wrote:
> > > On Tue, Jun 07, 2022 at 03:04:27PM +0000, Sean Christopherson wrote:
> > > > On Tue, Jun 07, 2022, Paolo Bonzini wrote:
> > > > > On 6/6/22 23:27, Peter Xu wrote:
> > > > > > On Mon, Jun 06, 2022 at 06:18:12PM +0200, Paolo Bonzini wrote:
> > > > > > > > However there seems to be something missing at least to me, on why it'll
> > > > > > > > fail a migration from 5.15 (without this patch) to 5.18 (with this patch).
> > > > > > > > In my test case, user_xfeatures will be 0x7 (FP|SSE|YMM) if without this
> > > > > > > > patch, but 0x0 if with it.
> > > > > > >
> > > > > > > What CPU model are you using for the VM?
> > > > > >
> > > > > > I didn't specify it, assuming it's qemu64 with no extra parameters.
> > > > >
> > > > > Ok, so indeed it lacks AVX and this patch can have an effect.
> > > > >
> > > > > > > For example, if the source lacks this patch but the destination has it,
> > > > > > > the source will transmit YMM registers, but the destination will fail to
> > > > > > > set them if they are not available for the selected CPU model.
> > > > > > >
> > > > > > > See the commit message: "As a bonus, it will also fail if userspace tries to
> > > > > > > set fpu features (with the KVM_SET_XSAVE ioctl) that are not compatible to
> > > > > > > the guest configuration.  Such features will never be returned by
> > > > > > > KVM_GET_XSAVE or KVM_GET_XSAVE2."
> > > > > >
> > > > > > IIUC you meant we should have failed KVM_SET_XSAVE when they're not aligned
> > > > > > (probably by failing validate_user_xstate_header when checking against the
> > > > > > user_xfeatures on dest host). But that's probably not my case, because here
> > > > > > KVM_SET_XSAVE succeeded, it's just that the guest gets a double fault after
> > > > > > the precopy migration completes (or for postcopy when the switchover is
> > > > > > done).
> > > > >
> > > > > Difficult to say what's happening without seeing at least the guest code
> > > > > around the double fault (above you said "fail a migration" and I thought
> > > > > that was a different scenario than the double fault), and possibly which was
> > > > > the first exception that contributed to the double fault.
> > > >
> > > > Regardless of why the guest explodes in the way it does, is someone planning on
> > > > bisecting this (if necessary?) and sending a backport to v5.15?  There's another
> > > > bug report that is more than likely hitting the same bug.
> > >
> > > What's the bisection you mentioned?  I actually did a bisection and I also
> > > checked reverting Leo's change can also fix this issue.  Or do you mean
> > > something else?
> >
> > Ah, I forgot to mention on the "stable tree decisions": IIUC it also means
> > we should apply Leo's patch to all the stable trees if possible, then
> > migrations between them won't trigger the misterous faults anymore,
> > including when migrating to the latest Linux versions.
> >
> > However there's the delimma that other kernels (any kernel that does not
> > have Leo's patch) will start to fail migrations to the stable branches that
> > apply Leo's patch too..
> 
> IIUC, you commented before that the migration issue should be solved with a
> QEMU fix, is that correct? That would mean something like 'QEMU is relying on a
> kernel bug to work', and should be no blocker for fixing the kernel.

The QEMU fix (that I posted [1]) is not a real fix, only the kernel fix is.

The QEMU patchset only allows the migration to fail early, the kernel patch
allows the migration to go through with no problem as long as both sides
are applied with the fix (or both are not..).  So there're two issues we're
tackling with and IMHO we should fix both.

[1] https://lore.kernel.org/qemu-devel/20220607230645.53950-1-peterx@redhat.com/

> 
> If that's the case, I think we should apply the fix to every supported
> stable branch that
> have the fpku issue, and in parallel come with a qemu fix for that.
> 
> What do you think about it?

Yes I mostly agree with you. I think your patch still does the right thing
by not migrating anything the guest doesn't even support, and that seems to
be the only way to fix the pksu-like issue on migrations between hosts with
different processor configurations.  But it'll also bring other unwanted
side effects, that's why IMHO we need some careful thoughts and I hope I
didn't miss anything important.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-06-08 20:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220301201344.18191-1-sashal@kernel.org>
2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 05/28] KVM: Fix lockdep false negative during host resume Sasha Levin
2022-03-01 20:19   ` Paolo Bonzini
2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 06/28] kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode Sasha Levin
2022-03-01 20:19   ` Paolo Bonzini
2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 07/28] x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0 Sasha Levin
2022-03-01 20:22   ` Paolo Bonzini
2022-06-03 18:40     ` Peter Xu
2022-06-06 16:18       ` Paolo Bonzini
2022-06-06 21:27         ` Peter Xu
2022-06-07 12:54           ` Paolo Bonzini
2022-06-07 15:04             ` Sean Christopherson
2022-06-07 18:17               ` Peter Xu
2022-06-07 18:47                 ` Sean Christopherson
2022-06-07 21:01                   ` Peter Xu
2022-06-07 18:55                 ` Peter Xu
2022-06-08 20:34                   ` Leonardo Bras Soares Passos
2022-06-08 20:53                     ` Peter Xu
2022-03-01 20:13 ` [PATCH AUTOSEL 5.16 12/28] x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU Sasha Levin
2022-03-01 20:17   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).