* [PATCH] x86/vpmu: fix race-condition in vpmu_load
@ 2022-09-15 14:01 Tamas K Lengyel
  2022-09-16 12:52 ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-15 14:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Tamas K Lengyel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu

While experimenting with the vPMU subsystem an ASSERT failure was
observed in vmx_find_msr because the vcpu_runnable state was true.

The root cause of the bug appears to be the fact that the vPMU subsystem
doesn't save its state on context_switch. The vpmu_load function will attempt
to gather the PMU state, if it's still loaded, in two different ways:
    1. if the current pcpu is not where the vcpu ran before, by doing a remote save
    2. if the current pcpu had another vcpu active before, by doing a local save

However, if the prev vcpu is being rescheduled on another pcpu, its state has
already changed and vcpu_runnable returns true, thus #2 will trip the ASSERT.
The only way to avoid this race condition is to make sure the prev vcpu is
paused while it is being checked and its context saved. Once the prev vcpu is
resumed and does #1 it will find its state already saved.
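
For reference, the relevant shape of vpmu_load() is roughly the following
(a simplified sketch of the two save paths described above, using the names
visible in the hunk below and in this thread, not the verbatim Xen source):

    int vpmu_load(struct vcpu *v, bool_t from_guest)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(v);
        int pcpu = smp_processor_id();
        struct vcpu *prev;

        /* #1: v last ran on a different pcpu - remote save of v's state */
        if ( vpmu->last_pcpu != pcpu && vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
            on_selected_cpus(cpumask_of(vpmu->last_pcpu),
                             vpmu_save_force, (void *)v, 1);

        /* #2: another vcpu's state is still loaded on this pcpu - local save */
        prev = per_cpu(last_vcpu, pcpu);
        if ( prev != NULL && prev != v )
        {
            vpmu = vcpu_vpmu(prev);

            /* Someone ran here before us */
            vpmu_save_force(prev);          /* may reach vmx_find_msr(prev) */
            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);

            vpmu = vcpu_vpmu(v);
        }
        ...
    }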

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
---
 xen/arch/x86/cpu/vpmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index cacc24a30f..076c2e5a8d 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
         vpmu = vcpu_vpmu(prev);
 
         /* Someone ran here before us */
+        vcpu_pause(prev);
         vpmu_save_force(prev);
         vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+        vcpu_unpause(prev);
 
         vpmu = vcpu_vpmu(v);
     }
-- 
2.34.1




* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-15 14:01 [PATCH] x86/vpmu: fix race-condition in vpmu_load Tamas K Lengyel
@ 2022-09-16 12:52 ` Jan Beulich
  2022-09-16 21:35   ` Boris Ostrovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-16 12:52 UTC (permalink / raw)
  To: Tamas K Lengyel, Boris Ostrovsky
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 15.09.2022 16:01, Tamas K Lengyel wrote:
> While experimenting with the vPMU subsystem an ASSERT failure was
> observed in vmx_find_msr because the vcpu_runnable state was true.
> 
> The root cause of the bug appears to be the fact that the vPMU subsystem
> doesn't save its state on context_switch. The vpmu_load function will attempt
> to gather the PMU state if its still loaded two different ways:
>     1. if the current pcpu is not where the vcpu ran before doing a remote save
>     2. if the current pcpu had another vcpu active before doing a local save
> 
> However, in case the prev vcpu is being rescheduled on another pcpu its state
> has already changed and vcpu_runnable is returning true, thus #2 will trip the
> ASSERT. The only way to avoid this race condition is to make sure the
> prev vcpu is paused while being checked and its context saved. Once the prev
> vcpu is resumed and does #1 it will find its state already saved.

While I consider this explanation plausible, I'm worried:

> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>          vpmu = vcpu_vpmu(prev);
>  
>          /* Someone ran here before us */
> +        vcpu_pause(prev);
>          vpmu_save_force(prev);
>          vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> +        vcpu_unpause(prev);
>  
>          vpmu = vcpu_vpmu(v);
>      }

We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
to actually be de-scheduled. Even with IRQs on this is already a
relatively heavy operation (also including its impact on the remote
side). Additionally the function is called from context_switch(), and
I'm unsure of the usability of vcpu_pause() on such a path. In
particular: Is there a risk of two CPUs doing this mutually to one
another? If so, is deadlocking excluded?

Hence at the very least I think the description wants extending, to
discuss the safety of the change.

Boris - any chance you could comment here? Iirc that's code you did
introduce.

Jan



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-16 12:52 ` Jan Beulich
@ 2022-09-16 21:35   ` Boris Ostrovsky
  2022-09-19  9:27     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2022-09-16 21:35 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel


On 9/16/22 8:52 AM, Jan Beulich wrote:
> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>> While experimenting with the vPMU subsystem an ASSERT failure was
>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>
>> The root cause of the bug appears to be the fact that the vPMU subsystem
>> doesn't save its state on context_switch. The vpmu_load function will attempt
>> to gather the PMU state if its still loaded two different ways:
>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
>>      2. if the current pcpu had another vcpu active before doing a local save
>>
>> However, in case the prev vcpu is being rescheduled on another pcpu its state
>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
>> ASSERT. The only way to avoid this race condition is to make sure the
>> prev vcpu is paused while being checked and its context saved. Once the prev
>> vcpu is resumed and does #1 it will find its state already saved.
> While I consider this explanation plausible, I'm worried:
>
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>           vpmu = vcpu_vpmu(prev);
>>   
>>           /* Someone ran here before us */
>> +        vcpu_pause(prev);
>>           vpmu_save_force(prev);
>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>> +        vcpu_unpause(prev);
>>   
>>           vpmu = vcpu_vpmu(v);
>>       }
> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
> to actually be de-scheduled. Even with IRQs on this is already a
> relatively heavy operation (also including its impact on the remote
> side). Additionally the function is called from context_switch(), and
> I'm unsure of the usability of vcpu_pause() on such a path. In
> particular: Is there a risk of two CPUs doing this mutually to one
> another? If so, is deadlocking excluded?
>
> Hence at the very least I think the description wants extending, to
> discuss the safety of the change.
>
> Boris - any chance you could comment here? Iirc that's code you did
> introduce.


Does the assertion in vmx_find_msr() really need to be for a runnable vcpu, or
can it be a check on whether the vcpu is actually running (e.g.
RUNSTATE_running)? I think the call chain
vpmu_load()->..->*_vpmu_save()->...->vmx_find_msr() is the only one where we
are doing it for a non-current vcpu. If we can guarantee that the run state is
set after the vpmu_load() call (maybe it is already, I haven't checked) then
testing the state may avoid the assertion.
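
For illustration, the check in question is along these lines (an approximate
sketch, not the exact source):

    /* Current form (approximately): the vcpu must either be current or
     * must not be runnable. */
    ASSERT(v == current || !vcpu_runnable(v));

    /* Possible relaxation suggested above (sketch only): require that the
     * vcpu is not actually running right now. */
    ASSERT(v == current || v->runstate.state != RUNSTATE_running);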


-boris




* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-16 21:35   ` Boris Ostrovsky
@ 2022-09-19  9:27     ` Jan Beulich
  2022-09-19 12:25       ` Tamas K Lengyel
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-19  9:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel, Tamas K Lengyel

On 16.09.2022 23:35, Boris Ostrovsky wrote:
> 
> On 9/16/22 8:52 AM, Jan Beulich wrote:
>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>
>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>> doesn't save its state on context_switch. The vpmu_load function will attempt
>>> to gather the PMU state if its still loaded two different ways:
>>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
>>>      2. if the current pcpu had another vcpu active before doing a local save
>>>
>>> However, in case the prev vcpu is being rescheduled on another pcpu its state
>>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
>>> ASSERT. The only way to avoid this race condition is to make sure the
>>> prev vcpu is paused while being checked and its context saved. Once the prev
>>> vcpu is resumed and does #1 it will find its state already saved.
>> While I consider this explanation plausible, I'm worried:
>>
>>> --- a/xen/arch/x86/cpu/vpmu.c
>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>           vpmu = vcpu_vpmu(prev);
>>>   
>>>           /* Someone ran here before us */
>>> +        vcpu_pause(prev);
>>>           vpmu_save_force(prev);
>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>> +        vcpu_unpause(prev);
>>>   
>>>           vpmu = vcpu_vpmu(v);
>>>       }
>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>> to actually be de-scheduled. Even with IRQs on this is already a
>> relatively heavy operation (also including its impact on the remote
>> side). Additionally the function is called from context_switch(), and
>> I'm unsure of the usability of vcpu_pause() on such a path. In
>> particular: Is there a risk of two CPUs doing this mutually to one
>> another? If so, is deadlocking excluded?
>>
>> Hence at the very least I think the description wants extending, to
>> discuss the safety of the change.
>>
>> Boris - any chance you could comment here? Iirc that's code you did
>> introduce.
> 
> 
> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a check on whether vcpu is actually running (e.g. RUNSTATE_running)?

You cannot safely check for "running", as "runnable" may transition
to/from "running" behind your back.

Jan

> I think call chain vpmu_load()->..->*_vpmu_save()->...->vmx_find_msr() is the only one where we are doing it for non-current vcpu. If we can guarantee that run state is set after vpmu_load() call (maybe it is already, I haven't checked) then testing the state may avoid the assertion.
> 
> 
> -boris
> 




* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19  9:27     ` Jan Beulich
@ 2022-09-19 12:25       ` Tamas K Lengyel
  2022-09-19 13:21         ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-19 12:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >
> > On 9/16/22 8:52 AM, Jan Beulich wrote:
> >> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>
> >>> The root cause of the bug appears to be the fact that the vPMU subsystem
> >>> doesn't save its state on context_switch. The vpmu_load function will attempt
> >>> to gather the PMU state if its still loaded two different ways:
> >>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
> >>>      2. if the current pcpu had another vcpu active before doing a local save
> >>>
> >>> However, in case the prev vcpu is being rescheduled on another pcpu its state
> >>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
> >>> ASSERT. The only way to avoid this race condition is to make sure the
> >>> prev vcpu is paused while being checked and its context saved. Once the prev
> >>> vcpu is resumed and does #1 it will find its state already saved.
> >> While I consider this explanation plausible, I'm worried:
> >>
> >>> --- a/xen/arch/x86/cpu/vpmu.c
> >>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >>>           vpmu = vcpu_vpmu(prev);
> >>>
> >>>           /* Someone ran here before us */
> >>> +        vcpu_pause(prev);
> >>>           vpmu_save_force(prev);
> >>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>> +        vcpu_unpause(prev);
> >>>
> >>>           vpmu = vcpu_vpmu(v);
> >>>       }
> >> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
> >> to actually be de-scheduled. Even with IRQs on this is already a
> >> relatively heavy operation (also including its impact on the remote
> >> side). Additionally the function is called from context_switch(), and
> >> I'm unsure of the usability of vcpu_pause() on such a path. In
> >> particular: Is there a risk of two CPUs doing this mutually to one
> >> another? If so, is deadlocking excluded?
> >>
> >> Hence at the very least I think the description wants extending, to
> >> discuss the safety of the change.
> >>
> >> Boris - any chance you could comment here? Iirc that's code you did
> >> introduce.
> >
> >
> > Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a check on whether vcpu is actually running (e.g. RUNSTATE_running)?
>
> You cannot safely check for "running", as "runnable" may transition
> to/from "running" behind your back.

The more I look at this the more I think the only sensible solution is
to have the vPMU state be saved on vmexit for all vCPUs. That way all
this having to figure out where and when a context needs saving during
scheduling goes away. Yes, it adds a bit of overhead for cases where
the vCPU will resume on the same pCPU and the context save could have
been skipped, but it means the vCPU can be resumed on any pCPU without
needing evidently fragile checks that may potentially lead to deadlocks
(TBH I don't know if that's a real concern at the moment because the
current setup is very hard to reason about). We can still keep track of
whether the context needs reloading from the saved context and skip
that if we know the state is still active. Any objection to that change
in light of these issues?

Tamas



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 12:25       ` Tamas K Lengyel
@ 2022-09-19 13:21         ` Jan Beulich
  2022-09-19 13:24           ` Tamas K Lengyel
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-19 13:21 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On 19.09.2022 14:25, Tamas K Lengyel wrote:
> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>
>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>
>>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>>>> doesn't save its state on context_switch.

For the further reply below - is this actually true? What is the
vpmu_switch_from() (resolving to vpmu_save()) doing then early
in context_switch()?

>>>>> The vpmu_load function will attempt
>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
>>>>>      2. if the current pcpu had another vcpu active before doing a local save
>>>>>
>>>>> However, in case the prev vcpu is being rescheduled on another pcpu its state
>>>>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
>>>>> ASSERT. The only way to avoid this race condition is to make sure the
>>>>> prev vcpu is paused while being checked and its context saved. Once the prev
>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>> While I consider this explanation plausible, I'm worried:
>>>>
>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>>>           vpmu = vcpu_vpmu(prev);
>>>>>
>>>>>           /* Someone ran here before us */
>>>>> +        vcpu_pause(prev);
>>>>>           vpmu_save_force(prev);
>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>> +        vcpu_unpause(prev);
>>>>>
>>>>>           vpmu = vcpu_vpmu(v);
>>>>>       }
>>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>> relatively heavy operation (also including its impact on the remote
>>>> side). Additionally the function is called from context_switch(), and
>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>> another? If so, is deadlocking excluded?
>>>>
>>>> Hence at the very least I think the description wants extending, to
>>>> discuss the safety of the change.
>>>>
>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>> introduce.
>>>
>>>
>>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a check on whether vcpu is actually running (e.g. RUNSTATE_running)?
>>
>> You cannot safely check for "running", as "runnable" may transition
>> to/from "running" behind your back.
> 
> The more I look at this the more I think the only sensible solution is
> to have the vPMU state be saved on vmexit for all vCPUs.

Do you really mean vmexit? It would suffice if state was reliably
saved during context-switch-out, wouldn't it? At that point the
vCPU can't be resumed on another pCPU, yet.

> That way all
> this having to figure out where and when a context needs saving during
> scheduling goes away. Yes, it adds a bit of overhead for cases where
> the vCPU will resume on the same pCPU and that context saved could
> have been skipped,

If you really mean vmexit, then I'm inclined to say that's more
than just "a bit of overhead". I'd agree if you really meant
context-switch-out, but as said further up it looks to me as if
that was already occurring. Apparently I'm overlooking something
crucial ...

Jan

> but it makes it so that the vCPU can be resumed on
> any pCPU without having to have evidently fragile checks that may
> potentially lead to deadlocks (TBH I don't know if that's a real
> concern at the moment because the current setup is very hard to reason
> about). We can still keep track if the context needs reloading from
> the saved context and skip that if we know the state is still active.
> Any objection to that change in light of these issues?
> 
> Tamas




* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 13:21         ` Jan Beulich
@ 2022-09-19 13:24           ` Tamas K Lengyel
  2022-09-19 13:58             ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-19 13:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 19.09.2022 14:25, Tamas K Lengyel wrote:
> > On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >>>
> >>> On 9/16/22 8:52 AM, Jan Beulich wrote:
> >>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>>>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>>>
> >>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
> >>>>> doesn't save its state on context_switch.
>
> For the further reply below - is this actually true? What is the
> vpmu_switch_from() (resolving to vpmu_save()) doing then early
> in context_switch()?
>
> >>>>> The vpmu_load function will attempt
> >>>>> to gather the PMU state if its still loaded two different ways:
> >>>>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
> >>>>>      2. if the current pcpu had another vcpu active before doing a local save
> >>>>>
> >>>>> However, in case the prev vcpu is being rescheduled on another pcpu its state
> >>>>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
> >>>>> ASSERT. The only way to avoid this race condition is to make sure the
> >>>>> prev vcpu is paused while being checked and its context saved. Once the prev
> >>>>> vcpu is resumed and does #1 it will find its state already saved.
> >>>> While I consider this explanation plausible, I'm worried:
> >>>>
> >>>>> --- a/xen/arch/x86/cpu/vpmu.c
> >>>>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
> >>>>>           vpmu = vcpu_vpmu(prev);
> >>>>>
> >>>>>           /* Someone ran here before us */
> >>>>> +        vcpu_pause(prev);
> >>>>>           vpmu_save_force(prev);
> >>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>>>> +        vcpu_unpause(prev);
> >>>>>
> >>>>>           vpmu = vcpu_vpmu(v);
> >>>>>       }
> >>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
> >>>> to actually be de-scheduled. Even with IRQs on this is already a
> >>>> relatively heavy operation (also including its impact on the remote
> >>>> side). Additionally the function is called from context_switch(), and
> >>>> I'm unsure of the usability of vcpu_pause() on such a path. In
> >>>> particular: Is there a risk of two CPUs doing this mutually to one
> >>>> another? If so, is deadlocking excluded?
> >>>>
> >>>> Hence at the very least I think the description wants extending, to
> >>>> discuss the safety of the change.
> >>>>
> >>>> Boris - any chance you could comment here? Iirc that's code you did
> >>>> introduce.
> >>>
> >>>
> >>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a check on whether vcpu is actually running (e.g. RUNSTATE_running)?
> >>
> >> You cannot safely check for "running", as "runnable" may transition
> >> to/from "running" behind your back.
> >
> > The more I look at this the more I think the only sensible solution is
> > to have the vPMU state be saved on vmexit for all vCPUs.
>
> Do you really mean vmexit? It would suffice if state was reliably
> saved during context-switch-out, wouldn't it? At that point the
> vCPU can't be resumed on another pCPU, yet.
>
> > That way all
> > this having to figure out where and when a context needs saving during
> > scheduling goes away. Yes, it adds a bit of overhead for cases where
> > the vCPU will resume on the same pCPU and that context saved could
> > have been skipped,
>
> If you really mean vmexit, then I'm inclined to say that's more
> than just "a bit of overhead". I'd agree if you really meant
> context-switch-out, but as said further up it looks to me as if
> that was already occurring. Apparently I'm overlooking something
> crucial ...

Yes, the current setup is doing exactly that, saving the vPMU context
on context-switch-out, and that's where the ASSERT failure occurs
because the vCPU it needs to save the context for is already runnable
on another pCPU.

Tamas



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 13:24           ` Tamas K Lengyel
@ 2022-09-19 13:58             ` Jan Beulich
  2022-09-19 14:11               ` Tamas K Lengyel
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-19 13:58 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On 19.09.2022 15:24, Tamas K Lengyel wrote:
> On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 19.09.2022 14:25, Tamas K Lengyel wrote:
>>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>>>
>>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>>>
>>>>>>> The root cause of the bug appears to be the fact that the vPMU subsystem
>>>>>>> doesn't save its state on context_switch.
>>
>> For the further reply below - is this actually true? What is the
>> vpmu_switch_from() (resolving to vpmu_save()) doing then early
>> in context_switch()?
>>
>>>>>>> The vpmu_load function will attempt
>>>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>>>      1. if the current pcpu is not where the vcpu ran before doing a remote save
>>>>>>>      2. if the current pcpu had another vcpu active before doing a local save
>>>>>>>
>>>>>>> However, in case the prev vcpu is being rescheduled on another pcpu its state
>>>>>>> has already changed and vcpu_runnable is returning true, thus #2 will trip the
>>>>>>> ASSERT. The only way to avoid this race condition is to make sure the
>>>>>>> prev vcpu is paused while being checked and its context saved. Once the prev
>>>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>>>> While I consider this explanation plausible, I'm worried:
>>>>>>
>>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t from_guest)
>>>>>>>           vpmu = vcpu_vpmu(prev);
>>>>>>>
>>>>>>>           /* Someone ran here before us */
>>>>>>> +        vcpu_pause(prev);
>>>>>>>           vpmu_save_force(prev);
>>>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>>>> +        vcpu_unpause(prev);
>>>>>>>
>>>>>>>           vpmu = vcpu_vpmu(v);
>>>>>>>       }
>>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the vcpu
>>>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>>>> relatively heavy operation (also including its impact on the remote
>>>>>> side). Additionally the function is called from context_switch(), and
>>>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>>>> another? If so, is deadlocking excluded?
>>>>>>
>>>>>> Hence at the very least I think the description wants extending, to
>>>>>> discuss the safety of the change.
>>>>>>
>>>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>>>> introduce.
>>>>>
>>>>>
>>>>> Is the assertion in vmx_find_msr() really needs to be for runnable vcpu or can it be a check on whether vcpu is actually running (e.g. RUNSTATE_running)?
>>>>
>>>> You cannot safely check for "running", as "runnable" may transition
>>>> to/from "running" behind your back.
>>>
>>> The more I look at this the more I think the only sensible solution is
>>> to have the vPMU state be saved on vmexit for all vCPUs.
>>
>> Do you really mean vmexit? It would suffice if state was reliably
>> saved during context-switch-out, wouldn't it? At that point the
>> vCPU can't be resumed on another pCPU, yet.
>>
>>> That way all
>>> this having to figure out where and when a context needs saving during
>>> scheduling goes away. Yes, it adds a bit of overhead for cases where
>>> the vCPU will resume on the same pCPU and that context saved could
>>> have been skipped,
>>
>> If you really mean vmexit, then I'm inclined to say that's more
>> than just "a bit of overhead". I'd agree if you really meant
>> context-switch-out, but as said further up it looks to me as if
>> that was already occurring. Apparently I'm overlooking something
>> crucial ...
> 
> Yes, the current setup is doing exactly that, saving the vPMU context
> on context-switch-out, and that's where the ASSERT failure occurs
> because the vCPU it needs to save the context for is already runnable
> on another pCPU.

Well, if that's the scenario (sorry for not understanding it that
way earlier on), then the assertion is too strict: While in context
switch, the vCPU may be runnable, but certainly won't actually run
anywhere. Therefore I'd be inclined to suggest to relax the
condition just enough to cover this case, without actually going to
checking for "running".

Jan



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 13:58             ` Jan Beulich
@ 2022-09-19 14:11               ` Tamas K Lengyel
  2022-09-19 14:56                 ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-19 14:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On Mon, Sep 19, 2022 at 9:58 AM Jan Beulich <jbeulich@suse.com> wrote:

> On 19.09.2022 15:24, Tamas K Lengyel wrote:
> > On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 19.09.2022 14:25, Tamas K Lengyel wrote:
> >>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>>>
> >>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
> >>>>>
> >>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
> >>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
> >>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
> >>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
> >>>>>>>
> >>>>>>> The root cause of the bug appears to be the fact that the vPMU
> subsystem
> >>>>>>> doesn't save its state on context_switch.
> >>
> >> For the further reply below - is this actually true? What is the
> >> vpmu_switch_from() (resolving to vpmu_save()) doing then early
> >> in context_switch()?
> >>
> >>>>>>> The vpmu_load function will attempt
> >>>>>>> to gather the PMU state if its still loaded two different ways:
> >>>>>>>      1. if the current pcpu is not where the vcpu ran before doing
> a remote save
> >>>>>>>      2. if the current pcpu had another vcpu active before doing a
> local save
> >>>>>>>
> >>>>>>> However, in case the prev vcpu is being rescheduled on another
> pcpu its state
> >>>>>>> has already changed and vcpu_runnable is returning true, thus #2
> will trip the
> >>>>>>> ASSERT. The only way to avoid this race condition is to make sure
> the
> >>>>>>> prev vcpu is paused while being checked and its context saved.
> Once the prev
> >>>>>>> vcpu is resumed and does #1 it will find its state already saved.
> >>>>>> While I consider this explanation plausible, I'm worried:
> >>>>>>
> >>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
> >>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
> >>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t
> from_guest)
> >>>>>>>           vpmu = vcpu_vpmu(prev);
> >>>>>>>
> >>>>>>>           /* Someone ran here before us */
> >>>>>>> +        vcpu_pause(prev);
> >>>>>>>           vpmu_save_force(prev);
> >>>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
> >>>>>>> +        vcpu_unpause(prev);
> >>>>>>>
> >>>>>>>           vpmu = vcpu_vpmu(v);
> >>>>>>>       }
> >>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the
> vcpu
> >>>>>> to actually be de-scheduled. Even with IRQs on this is already a
> >>>>>> relatively heavy operation (also including its impact on the remote
> >>>>>> side). Additionally the function is called from context_switch(),
> and
> >>>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
> >>>>>> particular: Is there a risk of two CPUs doing this mutually to one
> >>>>>> another? If so, is deadlocking excluded?
> >>>>>>
> >>>>>> Hence at the very least I think the description wants extending, to
> >>>>>> discuss the safety of the change.
> >>>>>>
> >>>>>> Boris - any chance you could comment here? Iirc that's code you did
> >>>>>> introduce.
> >>>>>
> >>>>>
> >>>>> Is the assertion in vmx_find_msr() really needs to be for runnable
> vcpu or can it be a check on whether vcpu is actually running (e.g.
> RUNSTATE_running)?
> >>>>
> >>>> You cannot safely check for "running", as "runnable" may transition
> >>>> to/from "running" behind your back.
> >>>
> >>> The more I look at this the more I think the only sensible solution is
> >>> to have the vPMU state be saved on vmexit for all vCPUs.
> >>
> >> Do you really mean vmexit? It would suffice if state was reliably
> >> saved during context-switch-out, wouldn't it? At that point the
> >> vCPU can't be resumed on another pCPU, yet.
> >>
> >>> That way all
> >>> this having to figure out where and when a context needs saving during
> >>> scheduling goes away. Yes, it adds a bit of overhead for cases where
> >>> the vCPU will resume on the same pCPU and that context saved could
> >>> have been skipped,
> >>
> >> If you really mean vmexit, then I'm inclined to say that's more
> >> than just "a bit of overhead". I'd agree if you really meant
> >> context-switch-out, but as said further up it looks to me as if
> >> that was already occurring. Apparently I'm overlooking something
> >> crucial ...
> >
> > Yes, the current setup is doing exactly that, saving the vPMU context
> > on context-switch-out, and that's where the ASSERT failure occurs
> > because the vCPU it needs to save the context for is already runnable
> > on another pCPU.
>
> Well, if that's the scenario (sorry for not understanding it that
> way earlier on), then the assertion is too strict: While in context
> switch, the vCPU may be runnable, but certainly won't actually run
> anywhere. Therefore I'd be inclined to suggest to relax the
> condition just enough to cover this case, without actually going to
> checking for "running".
>

What ensures the vCPU won't actually run anywhere if it's in the runnable
state? And how do I relax the condition just enough to cover just this case?

Tamas



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 14:11               ` Tamas K Lengyel
@ 2022-09-19 14:56                 ` Jan Beulich
  2022-09-19 22:42                   ` Boris Ostrovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-19 14:56 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Boris Ostrovsky, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel

On 19.09.2022 16:11, Tamas K Lengyel wrote:
> On Mon, Sep 19, 2022 at 9:58 AM Jan Beulich <jbeulich@suse.com> wrote:
> 
>> On 19.09.2022 15:24, Tamas K Lengyel wrote:
>>> On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 19.09.2022 14:25, Tamas K Lengyel wrote:
>>>>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>>>>>
>>>>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>>>>>
>>>>>>>>> The root cause of the bug appears to be the fact that the vPMU
>> subsystem
>>>>>>>>> doesn't save its state on context_switch.
>>>>
>>>> For the further reply below - is this actually true? What is the
>>>> vpmu_switch_from() (resolving to vpmu_save()) doing then early
>>>> in context_switch()?
>>>>
>>>>>>>>> The vpmu_load function will attempt
>>>>>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>>>>>      1. if the current pcpu is not where the vcpu ran before doing
>> a remote save
>>>>>>>>>      2. if the current pcpu had another vcpu active before doing a
>> local save
>>>>>>>>>
>>>>>>>>> However, in case the prev vcpu is being rescheduled on another
>> pcpu its state
>>>>>>>>> has already changed and vcpu_runnable is returning true, thus #2
>> will trip the
>>>>>>>>> ASSERT. The only way to avoid this race condition is to make sure
>> the
>>>>>>>>> prev vcpu is paused while being checked and its context saved.
>> Once the prev
>>>>>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>>>>>> While I consider this explanation plausible, I'm worried:
>>>>>>>>
>>>>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t
>> from_guest)
>>>>>>>>>           vpmu = vcpu_vpmu(prev);
>>>>>>>>>
>>>>>>>>>           /* Someone ran here before us */
>>>>>>>>> +        vcpu_pause(prev);
>>>>>>>>>           vpmu_save_force(prev);
>>>>>>>>>           vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>>>>>> +        vcpu_unpause(prev);
>>>>>>>>>
>>>>>>>>>           vpmu = vcpu_vpmu(v);
>>>>>>>>>       }
>>>>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the
>> vcpu
>>>>>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>>>>>> relatively heavy operation (also including its impact on the remote
>>>>>>>> side). Additionally the function is called from context_switch(),
>> and
>>>>>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>>>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>>>>>> another? If so, is deadlocking excluded?
>>>>>>>>
>>>>>>>> Hence at the very least I think the description wants extending, to
>>>>>>>> discuss the safety of the change.
>>>>>>>>
>>>>>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>>>>>> introduce.
>>>>>>>
>>>>>>>
>>>>>>> Is the assertion in vmx_find_msr() really needs to be for runnable
>> vcpu or can it be a check on whether vcpu is actually running (e.g.
>> RUNSTATE_running)?
>>>>>>
>>>>>> You cannot safely check for "running", as "runnable" may transition
>>>>>> to/from "running" behind your back.
>>>>>
>>>>> The more I look at this the more I think the only sensible solution is
>>>>> to have the vPMU state be saved on vmexit for all vCPUs.
>>>>
>>>> Do you really mean vmexit? It would suffice if state was reliably
>>>> saved during context-switch-out, wouldn't it? At that point the
>>>> vCPU can't be resumed on another pCPU, yet.
>>>>
>>>>> That way all
>>>>> this having to figure out where and when a context needs saving during
>>>>> scheduling goes away. Yes, it adds a bit of overhead for cases where
>>>>> the vCPU will resume on the same pCPU and that context saved could
>>>>> have been skipped,
>>>>
>>>> If you really mean vmexit, then I'm inclined to say that's more
>>>> than just "a bit of overhead". I'd agree if you really meant
>>>> context-switch-out, but as said further up it looks to me as if
>>>> that was already occurring. Apparently I'm overlooking something
>>>> crucial ...
>>>
>>> Yes, the current setup is doing exactly that, saving the vPMU context
>>> on context-switch-out, and that's where the ASSERT failure occurs
>>> because the vCPU it needs to save the context for is already runnable
>>> on another pCPU.
>>
>> Well, if that's the scenario (sorry for not understanding it that
>> way earlier on), then the assertion is too strict: While in context
>> switch, the vCPU may be runnable, but certainly won't actually run
>> anywhere. Therefore I'd be inclined to suggest to relax the
>> condition just enough to cover this case, without actually going to
>> checking for "running".
>>
> 
> What ensures the vCPU won't actually run anywhere if it's in the runnable
> state?

The fact that the vCPU is the subject of context_switch().

> And how do I relax the condition just enough to cover just this case?

That's the more difficult question. The immediate solution, passing a
boolean or flag to vpmu_switch_from(), may not be practical, as it
would likely mean passing this through many layers.

Utilizing that I have Jürgen sitting next to me, I've discussed this
with him, and he suggested to simply check for v == current. And
indeed - set_current() in context_switch() happens a few lines after
vpmu_switch_from().
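
I.e. the ordering in question is roughly as follows (a much simplified
sketch of context_switch(), with unrelated details omitted):

    void context_switch(struct vcpu *prev, struct vcpu *next)
    {
        ...
        vpmu_switch_from(prev);   /* prev is still 'current' on this pCPU */
        ...
        set_current(next);        /* only here does 'current' change      */
        ...
        vpmu_switch_to(next);     /* eventually leads to vpmu_load(next)  */
        ...
    }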

However, going back to vmx_find_msr() I find that the v == current
case is already included there. Which makes me wonder again - what
exactly is the scenario that you're observing the assertion
triggering? Would you mind spelling out the call chain, perhaps by
way of the call stack from the assertion?

Jan



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 14:56                 ` Jan Beulich
@ 2022-09-19 22:42                   ` Boris Ostrovsky
  2022-09-20  8:01                     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2022-09-19 22:42 UTC (permalink / raw)
  To: Jan Beulich, Tamas K Lengyel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel, Tamas K Lengyel



On 9/19/22 10:56 AM, Jan Beulich wrote:
> On 19.09.2022 16:11, Tamas K Lengyel wrote:
>> On Mon, Sep 19, 2022 at 9:58 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>>> On 19.09.2022 15:24, Tamas K Lengyel wrote:
>>>> On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>
>>>>> On 19.09.2022 14:25, Tamas K Lengyel wrote:
>>>>>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>
>>>>>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>>>>>>
>>>>>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>>>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>>>>>>
>>>>>>>>>> The root cause of the bug appears to be the fact that the vPMU
>>> subsystem
>>>>>>>>>> doesn't save its state on context_switch.
>>>>>
>>>>> For the further reply below - is this actually true? What is the
>>>>> vpmu_switch_from() (resolving to vpmu_save()) doing then early
>>>>> in context_switch()?
>>>>>
>>>>>>>>>> The vpmu_load function will attempt
>>>>>>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>>>>>>       1. if the current pcpu is not where the vcpu ran before doing
>>> a remote save
>>>>>>>>>>       2. if the current pcpu had another vcpu active before doing a
>>> local save
>>>>>>>>>>
>>>>>>>>>> However, in case the prev vcpu is being rescheduled on another
>>> pcpu its state
>>>>>>>>>> has already changed and vcpu_runnable is returning true, thus #2
>>> will trip the
>>>>>>>>>> ASSERT. The only way to avoid this race condition is to make sure
>>> the
>>>>>>>>>> prev vcpu is paused while being checked and its context saved.
>>> Once the prev
>>>>>>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>>>>>>> While I consider this explanation plausible, I'm worried:
>>>>>>>>>
>>>>>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t
>>> from_guest)
>>>>>>>>>>            vpmu = vcpu_vpmu(prev);
>>>>>>>>>>
>>>>>>>>>>            /* Someone ran here before us */
>>>>>>>>>> +        vcpu_pause(prev);
>>>>>>>>>>            vpmu_save_force(prev);
>>>>>>>>>>            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>>>>>>> +        vcpu_unpause(prev);
>>>>>>>>>>
>>>>>>>>>>            vpmu = vcpu_vpmu(v);
>>>>>>>>>>        }
>>>>>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the
>>> vcpu
>>>>>>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>>>>>>> relatively heavy operation (also including its impact on the remote
>>>>>>>>> side). Additionally the function is called from context_switch(),
>>> and
>>>>>>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>>>>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>>>>>>> another? If so, is deadlocking excluded?
>>>>>>>>>
>>>>>>>>> Hence at the very least I think the description wants extending, to
>>>>>>>>> discuss the safety of the change.
>>>>>>>>>
>>>>>>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>>>>>>> introduce.
>>>>>>>>
>>>>>>>>
>>>>>>>> Is the assertion in vmx_find_msr() really needs to be for runnable
>>> vcpu or can it be a check on whether vcpu is actually running (e.g.
>>> RUNSTATE_running)?
>>>>>>>
>>>>>>> You cannot safely check for "running", as "runnable" may transition
>>>>>>> to/from "running" behind your back.
>>>>>>
>>>>>> The more I look at this the more I think the only sensible solution is
>>>>>> to have the vPMU state be saved on vmexit for all vCPUs.
>>>>>
>>>>> Do you really mean vmexit? It would suffice if state was reliably
>>>>> saved during context-switch-out, wouldn't it? At that point the
>>>>> vCPU can't be resumed on another pCPU, yet.
>>>>>
>>>>>> That way all
>>>>>> this having to figure out where and when a context needs saving during
>>>>>> scheduling goes away. Yes, it adds a bit of overhead for cases where
>>>>>> the vCPU will resume on the same pCPU and that context saved could
>>>>>> have been skipped,
>>>>>
>>>>> If you really mean vmexit, then I'm inclined to say that's more
>>>>> than just "a bit of overhead". I'd agree if you really meant
>>>>> context-switch-out, but as said further up it looks to me as if
>>>>> that was already occurring. Apparently I'm overlooking something
>>>>> crucial ...
>>>>
>>>> Yes, the current setup is doing exactly that, saving the vPMU context
>>>> on context-switch-out, and that's where the ASSERT failure occurs
>>>> because the vCPU it needs to save the context for is already runnable
>>>> on another pCPU.
>>>
>>> Well, if that's the scenario (sorry for not understanding it that
>>> way earlier on), then the assertion is too strict: While in context
>>> switch, the vCPU may be runnable, but certainly won't actually run
>>> anywhere. Therefore I'd be inclined to suggest to relax the
>>> condition just enough to cover this case, without actually going to
>>> checking for "running".
>>>
>>
>> What ensures the vCPU won't actually run anywhere if it's in the runnable
>> state?
> 
> The fact that the vCPU is the subject of context_switch().
> 
>> And how do I relax the condition just enough to cover just this case?
> 
> That's the more difficult question. The immediate solution, passing a
> boolean or flag to vpmu_switch_from(), may not be practical, as it
> would likely mean passing this through many layers.
> 
> Utilizing that I have Jürgen sitting next to me, I've discussed this
> with him, and he suggested to simply check for v == current. And
> indeed - set_current() in context_switch() happens a few lines after
> vpmu_switch_from().



It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
in vmx_find_msr() is not @current:

      vpmu_load()
          ...
          prev = per_cpu(last_vcpu, pcpu);
          vpmu_save_force(prev)
              core2_vpmu_save()
                  __core2_vpmu_save()
                      vmx_read_guest_msr()
                          vmx_find_msr()


The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
this call is needed when the code path above is executed (i.e. when we are saving
a remote vcpu).


-boris

> 
> However, going back to vmx_find_msr() I find that the v == current
> case is already included there. Which makes me wonder again - what
> exactly is the scenario that you're observing the assertion
> triggering? Would you mind spelling out the call chain, perhaps by
> way of the call stack from the assertion?
> 
> Jan




* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-19 22:42                   ` Boris Ostrovsky
@ 2022-09-20  8:01                     ` Jan Beulich
  2022-09-20 14:26                       ` Boris Ostrovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-20  8:01 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel, Tamas K Lengyel

On 20.09.2022 00:42, Boris Ostrovsky wrote:
> 
> 
> On 9/19/22 10:56 AM, Jan Beulich wrote:
>> On 19.09.2022 16:11, Tamas K Lengyel wrote:
>>> On Mon, Sep 19, 2022 at 9:58 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>>> On 19.09.2022 15:24, Tamas K Lengyel wrote:
>>>>> On Mon, Sep 19, 2022 at 9:21 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>
>>>>>> On 19.09.2022 14:25, Tamas K Lengyel wrote:
>>>>>>> On Mon, Sep 19, 2022 at 5:28 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>>>
>>>>>>>> On 16.09.2022 23:35, Boris Ostrovsky wrote:
>>>>>>>>>
>>>>>>>>> On 9/16/22 8:52 AM, Jan Beulich wrote:
>>>>>>>>>> On 15.09.2022 16:01, Tamas K Lengyel wrote:
>>>>>>>>>>> While experimenting with the vPMU subsystem an ASSERT failure was
>>>>>>>>>>> observed in vmx_find_msr because the vcpu_runnable state was true.
>>>>>>>>>>>
>>>>>>>>>>> The root cause of the bug appears to be the fact that the vPMU
>>>> subsystem
>>>>>>>>>>> doesn't save its state on context_switch.
>>>>>>
>>>>>> For the further reply below - is this actually true? What is the
>>>>>> vpmu_switch_from() (resolving to vpmu_save()) doing then early
>>>>>> in context_switch()?
>>>>>>
>>>>>>>>>>> The vpmu_load function will attempt
>>>>>>>>>>> to gather the PMU state if its still loaded two different ways:
>>>>>>>>>>>       1. if the current pcpu is not where the vcpu ran before doing
>>>> a remote save
>>>>>>>>>>>       2. if the current pcpu had another vcpu active before doing a
>>>> local save
>>>>>>>>>>>
>>>>>>>>>>> However, in case the prev vcpu is being rescheduled on another
>>>> pcpu its state
>>>>>>>>>>> has already changed and vcpu_runnable is returning true, thus #2
>>>> will trip the
>>>>>>>>>>> ASSERT. The only way to avoid this race condition is to make sure
>>>> the
>>>>>>>>>>> prev vcpu is paused while being checked and its context saved.
>>>> Once the prev
>>>>>>>>>>> vcpu is resumed and does #1 it will find its state already saved.
>>>>>>>>>> While I consider this explanation plausible, I'm worried:
>>>>>>>>>>
>>>>>>>>>>> --- a/xen/arch/x86/cpu/vpmu.c
>>>>>>>>>>> +++ b/xen/arch/x86/cpu/vpmu.c
>>>>>>>>>>> @@ -419,8 +419,10 @@ int vpmu_load(struct vcpu *v, bool_t
>>>> from_guest)
>>>>>>>>>>>            vpmu = vcpu_vpmu(prev);
>>>>>>>>>>>
>>>>>>>>>>>            /* Someone ran here before us */
>>>>>>>>>>> +        vcpu_pause(prev);
>>>>>>>>>>>            vpmu_save_force(prev);
>>>>>>>>>>>            vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
>>>>>>>>>>> +        vcpu_unpause(prev);
>>>>>>>>>>>
>>>>>>>>>>>            vpmu = vcpu_vpmu(v);
>>>>>>>>>>>        }
>>>>>>>>>> We're running with IRQs off here, yet vcpu_pause() waits for the
>>>> vcpu
>>>>>>>>>> to actually be de-scheduled. Even with IRQs on this is already a
>>>>>>>>>> relatively heavy operation (also including its impact on the remote
>>>>>>>>>> side). Additionally the function is called from context_switch(),
>>>> and
>>>>>>>>>> I'm unsure of the usability of vcpu_pause() on such a path. In
>>>>>>>>>> particular: Is there a risk of two CPUs doing this mutually to one
>>>>>>>>>> another? If so, is deadlocking excluded?
>>>>>>>>>>
>>>>>>>>>> Hence at the very least I think the description wants extending, to
>>>>>>>>>> discuss the safety of the change.
>>>>>>>>>>
>>>>>>>>>> Boris - any chance you could comment here? Iirc that's code you did
>>>>>>>>>> introduce.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is the assertion in vmx_find_msr() really needs to be for runnable
>>>> vcpu or can it be a check on whether vcpu is actually running (e.g.
>>>> RUNSTATE_running)?
>>>>>>>>
>>>>>>>> You cannot safely check for "running", as "runnable" may transition
>>>>>>>> to/from "running" behind your back.
>>>>>>>
>>>>>>> The more I look at this the more I think the only sensible solution is
>>>>>>> to have the vPMU state be saved on vmexit for all vCPUs.
>>>>>>
>>>>>> Do you really mean vmexit? It would suffice if state was reliably
>>>>>> saved during context-switch-out, wouldn't it? At that point the
>>>>>> vCPU can't be resumed on another pCPU, yet.
>>>>>>
>>>>>>> That way all
>>>>>>> this having to figure out where and when a context needs saving during
>>>>>>> scheduling goes away. Yes, it adds a bit of overhead for cases where
>>>>>>> the vCPU will resume on the same pCPU and that context saved could
>>>>>>> have been skipped,
>>>>>>
>>>>>> If you really mean vmexit, then I'm inclined to say that's more
>>>>>> than just "a bit of overhead". I'd agree if you really meant
>>>>>> context-switch-out, but as said further up it looks to me as if
>>>>>> that was already occurring. Apparently I'm overlooking something
>>>>>> crucial ...
>>>>>
>>>>> Yes, the current setup is doing exactly that, saving the vPMU context
>>>>> on context-switch-out, and that's where the ASSERT failure occurs
>>>>> because the vCPU it needs to save the context for is already runnable
>>>>> on another pCPU.
>>>>
>>>> Well, if that's the scenario (sorry for not understanding it that
>>>> way earlier on), then the assertion is too strict: While in context
>>>> switch, the vCPU may be runnable, but certainly won't actually run
>>>> anywhere. Therefore I'd be inclined to suggest to relax the
>>>> condition just enough to cover this case, without actually going to
>>>> checking for "running".
>>>>
>>>
>>> What ensures the vCPU won't actually run anywhere if it's in the runnable
>>> state?
>>
>> The fact that the vCPU is the subject of context_switch().
>>
>>> And how do I relax the condition just enough to cover just this case?
>>
>> That's the more difficult question. The immediate solution, passing a
>> boolean or flag to vpmu_switch_from(), may not be practical, as it
>> would likely mean passing this through many layers.
>>
>> Utilizing that I have Jürgen sitting next to me, I've discussed this
>> with him, and he suggested to simply check for v == current. And
>> indeed - set_current() in context_switch() happens a few lines after
>> vpmu_switch_from().
> 
> 
> 
> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
> in vmx_find_msr() is not @current:
> 
>       vpmu_load()
>           ...
>           prev = per_cpu(last_vcpu, pcpu);
>           vpmu_save_force(prev)
>               core2_vpmu_save()
>                   __core2_vpmu_save()
>                       vmx_read_guest_msr()
>                           vmx_find_msr()
> 
> 
> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
> this call is needed when code path above is executed (i.e. when we are saving
> remove vcpu)

How could it not be needed? We need to obtain the guest value. The
thing I don't understand is why this forced saving is necessary,
when context_switch() unconditionally calls vpmu_switch_from().

Jan



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-20  8:01                     ` Jan Beulich
@ 2022-09-20 14:26                       ` Boris Ostrovsky
  2022-09-20 14:54                         ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2022-09-20 14:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel, Tamas K Lengyel



On 9/20/22 4:01 AM, Jan Beulich wrote:
> On 20.09.2022 00:42, Boris Ostrovsky wrote:

>>
>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
>> in vmx_find_msr() is not @current:
>>
>>        vpmu_load()
>>            ...
>>            prev = per_cpu(last_vcpu, pcpu);
>>            vpmu_save_force(prev)
>>                core2_vpmu_save()
>>                    __core2_vpmu_save()
>>                        vmx_read_guest_msr()
>>                            vmx_find_msr()
>>
>>
>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
>> this call is needed when code path above is executed (i.e. when we are saving
>> remove vcpu)
> 
> How could it not be needed? We need to obtain the guest value. The
> thing I don't understand is why this forced saving is necessary,
> when context_switch() unconditionally calls vpmu_switch_from().


IIRC the logic is:

1. vcpuA runs on pcpu0
2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not yet called vpmu_load() from pcpu1
3. vcpuB is ready to run on pcpu0, calls vpmu_load()
4. vcpuB discovers that pcpu0's MSRs are still holding values from vcpuA
5. vcpuB calls vpmu_save_force(vcpuA), which stashes pcpu0's MSRs into vcpuA's vpmu context.
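
For illustration, a minimal sketch of the vpmu_load() path these steps go
through, paraphrased from the call chain quoted earlier in the thread rather
than copied from the Xen sources (names and details may differ from the real
code):

    /* Sketch only: the "someone ran here before us" path of vpmu_load(),
     * corresponding to steps 3-5 above. */
    int vpmu_load(struct vcpu *v, bool_t from_guest)      /* v == vcpuB */
    {
        unsigned int pcpu = smp_processor_id();           /* pcpu0 */
        struct vcpu *prev = per_cpu(last_vcpu, pcpu);      /* vcpuA */

        if ( prev != NULL && prev != v )
        {
            /* Step 4: pcpu0's MSRs still hold vcpuA's counter values.
             * Step 5: stash them into vcpuA's vpmu context; this is the
             * path that ends up in vmx_find_msr() and trips the ASSERT. */
            vpmu_save_force(prev);
        }

        /* Only after that are vcpuB's own MSR values loaded (elided). */
        return 0;
    }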


-boris





* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-20 14:26                       ` Boris Ostrovsky
@ 2022-09-20 14:54                         ` Jan Beulich
  2022-09-20 18:12                           ` Boris Ostrovsky
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-09-20 14:54 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel, Tamas K Lengyel

On 20.09.2022 16:26, Boris Ostrovsky wrote:
> On 9/20/22 4:01 AM, Jan Beulich wrote:
>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
>>> in vmx_find_msr() is not @current:
>>>
>>>        vpmu_load()
>>>            ...
>>>            prev = per_cpu(last_vcpu, pcpu);
>>>            vpmu_save_force(prev)
>>>                core2_vpmu_save()
>>>                    __core2_vpmu_save()
>>>                        vmx_read_guest_msr()
>>>                            vmx_find_msr()
>>>
>>>
>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
>>> this call is needed when code path above is executed (i.e. when we are saving
>>> remove vcpu)
>>
>> How could it not be needed? We need to obtain the guest value. The
>> thing I don't understand is why this forced saving is necessary,
>> when context_switch() unconditionally calls vpmu_switch_from().
> 
> 
> IIRC the logic is:
> 
> 1. vcpuA runs on pcpu0
> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not yet called vpmu_load() from pcpu1

The calling of vpmu_load() shouldn't matter here. What does matter is
that vpmu_save() was necessarily called already. Therefore I'm having
trouble seeing why ...

> 3. vcpuB is ready to run on pcpu0, calls vpmu_load()
> 4. vcpuB discovers that pcpu0's MSRs are still holding values from vcpuA
> 5. vcpuB calls vpmu_force_save(vcpuA) which stashes pcpu0's MSRs into vcpuA's vpmu context.

... forced saving would be necessary here. What's necessary at this
point is only the loading of vcpuB's MSR values.

Jan



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-20 14:54                         ` Jan Beulich
@ 2022-09-20 18:12                           ` Boris Ostrovsky
  2022-09-20 18:27                             ` Tamas K Lengyel
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Ostrovsky @ 2022-09-20 18:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel, Tamas K Lengyel


On 9/20/22 10:54 AM, Jan Beulich wrote:
> On 20.09.2022 16:26, Boris Ostrovsky wrote:
>> On 9/20/22 4:01 AM, Jan Beulich wrote:
>>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
>>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so @v
>>>> in vmx_find_msr() is not @current:
>>>>
>>>>         vpmu_load()
>>>>             ...
>>>>             prev = per_cpu(last_vcpu, pcpu);
>>>>             vpmu_save_force(prev)
>>>>                 core2_vpmu_save()
>>>>                     __core2_vpmu_save()
>>>>                         vmx_read_guest_msr()
>>>>                             vmx_find_msr()
>>>>
>>>>
>>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder though whether
>>>> this call is needed when code path above is executed (i.e. when we are saving
>>>> remove vcpu)
>>> How could it not be needed? We need to obtain the guest value. The
>>> thing I don't understand is why this forced saving is necessary,
>>> when context_switch() unconditionally calls vpmu_switch_from().
>>
>> IIRC the logic is:
>>
>> 1. vcpuA runs on pcpu0
>> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not yet called vpmu_load() from pcpu1
> The calling of vpmu_load() shouldn't matter here. What does matter is
> that vpmu_save() was necessarily called already.


I thought we don't always save MSRs on context switch because we optimize for the case when the same vcpu gets scheduled again. But I am not sure I see this now that I am looking at the code.


-boris



>   Therefore I'm having
> trouble seeing why ...
>
>> 3. vcpuB is ready to run on pcpu0, calls vpmu_load()
>> 4. vcpuB discovers that pcpu0's MSRs are still holding values from vcpuA
>> 5. vcpuB calls vpmu_force_save(vcpuA) which stashes pcpu0's MSRs into vcpuA's vpmu context.
> ... forced saving would be necessary here. What's necessary at this
> point is only the loading of vcpuB's MSR values.



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-20 18:12                           ` Boris Ostrovsky
@ 2022-09-20 18:27                             ` Tamas K Lengyel
  2022-09-22 13:35                               ` Tamas K Lengyel
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-20 18:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel


On Tue, Sep 20, 2022 at 2:12 PM Boris Ostrovsky <boris.ostrovsky@oracle.com>
wrote:

>
> On 9/20/22 10:54 AM, Jan Beulich wrote:
> > On 20.09.2022 16:26, Boris Ostrovsky wrote:
> >> On 9/20/22 4:01 AM, Jan Beulich wrote:
> >>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
> >>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so
> @v
> >>>> in vmx_find_msr() is not @current:
> >>>>
> >>>>         vpmu_load()
> >>>>             ...
> >>>>             prev = per_cpu(last_vcpu, pcpu);
> >>>>             vpmu_save_force(prev)
> >>>>                 core2_vpmu_save()
> >>>>                     __core2_vpmu_save()
> >>>>                         vmx_read_guest_msr()
> >>>>                             vmx_find_msr()
> >>>>
> >>>>
> >>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder
> though whether
> >>>> this call is needed when code path above is executed (i.e. when we
> are saving
> >>>> remove vcpu)
> >>> How could it not be needed? We need to obtain the guest value. The
> >>> thing I don't understand is why this forced saving is necessary,
> >>> when context_switch() unconditionally calls vpmu_switch_from().
> >>
> >> IIRC the logic is:
> >>
> >> 1. vcpuA runs on pcpu0
> >> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not
> yet called vpmu_load() from pcpu1
> > The calling of vpmu_load() shouldn't matter here. What does matter is
> > that vpmu_save() was necessarily called already.
>
>
> I thought we don't always save MSRs on context switch because we optimize
> for case when the same vcpu gets scheduled again. But I am not sure I see
> this now that I am looking at the code.
>

I see context_switch calling vpmu_switch_from before __context_switch, but if
that had actually saved the vPMU state it would have reset
VPMU_CONTEXT_LOADED, so by the time vpmu_load calls vpmu_save_force the
latter would simply have bailed out before ever reaching the ASSERT in
vmx_find_msr. The context must therefore still be loaded at that point (I'm
trying to verify that right now by adding some printks).
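
For reference, the bail-out being referred to looks roughly like this (a
simplified sketch of vpmu_save_force(), not the exact Xen code):

    void vpmu_save_force(void *arg)
    {
        struct vcpu *v = arg;
        struct vpmu_struct *vpmu = vcpu_vpmu(v);

        /* Had the switch-out path really saved the state, this flag would
         * already be clear and we would bail out here, never reaching the
         * ASSERT in vmx_find_msr(). */
        if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
            return;

        vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
        /* arch save handler: core2_vpmu_save() -> __core2_vpmu_save()
         * -> vmx_read_guest_msr() -> vmx_find_msr()   (elided) */
        vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
    }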

Tamas



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-20 18:27                             ` Tamas K Lengyel
@ 2022-09-22 13:35                               ` Tamas K Lengyel
  2022-09-22 19:04                                 ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tamas K Lengyel @ 2022-09-22 13:35 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel


On Tue, Sep 20, 2022 at 2:27 PM Tamas K Lengyel <tamas.k.lengyel@gmail.com>
wrote:

>
>
> On Tue, Sep 20, 2022 at 2:12 PM Boris Ostrovsky <
> boris.ostrovsky@oracle.com> wrote:
>
>>
>> On 9/20/22 10:54 AM, Jan Beulich wrote:
>> > On 20.09.2022 16:26, Boris Ostrovsky wrote:
>> >> On 9/20/22 4:01 AM, Jan Beulich wrote:
>> >>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
>> >>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so
>> @v
>> >>>> in vmx_find_msr() is not @current:
>> >>>>
>> >>>>         vpmu_load()
>> >>>>             ...
>> >>>>             prev = per_cpu(last_vcpu, pcpu);
>> >>>>             vpmu_save_force(prev)
>> >>>>                 core2_vpmu_save()
>> >>>>                     __core2_vpmu_save()
>> >>>>                         vmx_read_guest_msr()
>> >>>>                             vmx_find_msr()
>> >>>>
>> >>>>
>> >>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder
>> though whether
>> >>>> this call is needed when code path above is executed (i.e. when we
>> are saving
>> >>>> remove vcpu)
>> >>> How could it not be needed? We need to obtain the guest value. The
>> >>> thing I don't understand is why this forced saving is necessary,
>> >>> when context_switch() unconditionally calls vpmu_switch_from().
>> >>
>> >> IIRC the logic is:
>> >>
>> >> 1. vcpuA runs on pcpu0
>> >> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not
>> yet called vpmu_load() from pcpu1
>> > The calling of vpmu_load() shouldn't matter here. What does matter is
>> > that vpmu_save() was necessarily called already.
>>
>>
>> I thought we don't always save MSRs on context switch because we optimize
>> for case when the same vcpu gets scheduled again. But I am not sure I see
>> this now that I am looking at the code.
>>
>
> I see context_switch calling vpmu_save_from before __context_switch, but
> if that did actually save the vPMU state it would have reset
> VPMU_CONTEXT_LOADED, so by the time vpmu_load calls vpmu_save_force it
> would have just bailed before hitting the ASSERT failure in vmx_find_msrs.
> The context must still be loaded at that point (I'm trying to verify right
> now by adding some printks).
>

OK, Boris was correct above: MSRs are not saved on context switch
automatically, because of that optimization. core2_vpmu_save only saves
anything if both VPMU_CONTEXT_SAVE and VPMU_CONTEXT_LOADED are set, and
vpmu_switch_from doesn't set VPMU_CONTEXT_SAVE, so the only thing it ends up
doing is zeroing global control for a PV vCPU. For HVM vCPUs it does
nothing, which is why the context is still loaded by the time we get to
vpmu_load.
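
In code terms, the gate being described is roughly the following (a
simplified sketch of core2_vpmu_save(), not the literal implementation):

    static int core2_vpmu_save(struct vcpu *v)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(v);

        if ( !is_hvm_vcpu(v) )
            wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);  /* PV: zero global control */

        /* Both flags must be set for anything to actually be saved.  The
         * switch-out path never sets VPMU_CONTEXT_SAVE, so for HVM vCPUs
         * we return here and the context stays loaded until vpmu_load(). */
        if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
            return 0;

        /* __core2_vpmu_save() reads the counter/control MSRs (elided). */
        return 1;
    }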

It would be a simple enough change to make vpmu_switch_from always save the
context; then vpmu_load would no longer need to do it later and run into
that ASSERT failure. Thoughts?
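
Purely as an illustration of that proposal (not a tested patch, and assuming
vpmu_switch_from keeps roughly its current shape), the change could look
something like:

    static inline void vpmu_switch_from(struct vcpu *prev)
    {
        /* existing vpmu_mode check elided */

        /* Always do the full save on switch-out, so vpmu_load() never has
         * to save a remote vCPU's state later.  vpmu_save_force() sets
         * VPMU_CONTEXT_SAVE so the arch handler really saves; whether
         * VPMU_CONTEXT_LOADED should also be cleared here is one of the
         * details a real patch would need to settle. */
        vpmu_save_force(prev);
    }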

Tamas



* Re: [PATCH] x86/vpmu: fix race-condition in vpmu_load
  2022-09-22 13:35                               ` Tamas K Lengyel
@ 2022-09-22 19:04                                 ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-09-22 19:04 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, xen-devel, Tamas K Lengyel, Boris Ostrovsky

On 22.09.2022 15:35, Tamas K Lengyel wrote:
> On Tue, Sep 20, 2022 at 2:27 PM Tamas K Lengyel <tamas.k.lengyel@gmail.com>
> wrote:
> 
>>
>>
>> On Tue, Sep 20, 2022 at 2:12 PM Boris Ostrovsky <
>> boris.ostrovsky@oracle.com> wrote:
>>
>>>
>>> On 9/20/22 10:54 AM, Jan Beulich wrote:
>>>> On 20.09.2022 16:26, Boris Ostrovsky wrote:
>>>>> On 9/20/22 4:01 AM, Jan Beulich wrote:
>>>>>> On 20.09.2022 00:42, Boris Ostrovsky wrote:
>>>>>>> It is saving vpmu data from current pcpu's MSRs for a remote vcpu so
>>> @v
>>>>>>> in vmx_find_msr() is not @current:
>>>>>>>
>>>>>>>         vpmu_load()
>>>>>>>             ...
>>>>>>>             prev = per_cpu(last_vcpu, pcpu);
>>>>>>>             vpmu_save_force(prev)
>>>>>>>                 core2_vpmu_save()
>>>>>>>                     __core2_vpmu_save()
>>>>>>>                         vmx_read_guest_msr()
>>>>>>>                             vmx_find_msr()
>>>>>>>
>>>>>>>
>>>>>>> The call to vmx_find_msr() was introduced by 755087eb9b10c. I wonder
>>> though whether
>>>>>>> this call is needed when code path above is executed (i.e. when we
>>> are saving
>>>>>>> remove vcpu)
>>>>>> How could it not be needed? We need to obtain the guest value. The
>>>>>> thing I don't understand is why this forced saving is necessary,
>>>>>> when context_switch() unconditionally calls vpmu_switch_from().
>>>>>
>>>>> IIRC the logic is:
>>>>>
>>>>> 1. vcpuA runs on pcpu0
>>>>> 2. vcpuA is de-scheduled and is selected to run on pcpu1. It has not
>>> yet called vpmu_load() from pcpu1
>>>> The calling of vpmu_load() shouldn't matter here. What does matter is
>>>> that vpmu_save() was necessarily called already.
>>>
>>>
>>> I thought we don't always save MSRs on context switch because we optimize
>>> for case when the same vcpu gets scheduled again. But I am not sure I see
>>> this now that I am looking at the code.
>>>
>>
>> I see context_switch calling vpmu_save_from before __context_switch, but
>> if that did actually save the vPMU state it would have reset
>> VPMU_CONTEXT_LOADED, so by the time vpmu_load calls vpmu_save_force it
>> would have just bailed before hitting the ASSERT failure in vmx_find_msrs.
>> The context must still be loaded at that point (I'm trying to verify right
>> now by adding some printks).
>>
> 
> OK, Boris was correct above, MSRs are not saved on context switch
> automatically because of that optimization. VPMU_CONTEXT_SAVE isn't set, so
> the only thing vpmu_switch_from does is set global control to 0 in case it
> was a PV vCPU (see core2_vpmu_save checks for both VPMU_CONTEXT_SAVE and
> VPMU_CONTEXT_LOADED) and vpmu_switch_from doesn't set VPMU_CONTEXT_SAVE. So
> for HVM vCPUs it does nothing, that's why we still see the context still
> loaded when we get to vpmu_load.
> 
> It would be a simple enough change to make vpmu_switch_from always save the
> context and then we could get rid of vpmu_load trying to do it later and
> getting into that ASSERT failure. Thoughts?

I'd much prefer that over e.g. the vCPU-pausing approach. I also think
vpmu_save() is misnamed if it might not really save anything.

Jan



