* [PATCH 0/2] KVM: x86: fix lockdep splat due to Xen runstate update @ 2021-03-30 16:59 Paolo Bonzini 2021-03-30 16:59 ` [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections Paolo Bonzini 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini 0 siblings, 2 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-03-30 16:59 UTC (permalink / raw) To: linux-kernel, kvm; +Cc: mtosatti, vkuznets, dwmw pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Before doing that, patch 1 just makes the pvclock_gtod_sync_lock critical sections as small as possible so that the resulting irq-disabled sections are easier to review. Paolo Paolo Bonzini (2): KVM: x86: reduce pvclock_gtod_sync_lock critical sections KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken arch/x86/kvm/x86.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-03-30 16:59 [PATCH 0/2] KVM: x86: fix lockdep splat due to Xen runstate update Paolo Bonzini @ 2021-03-30 16:59 ` Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li 2021-04-07 17:40 ` Marcelo Tosatti 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini 1 sibling, 2 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-03-30 16:59 UTC (permalink / raw) To: linux-kernel, kvm; +Cc: mtosatti, vkuznets, dwmw There is no need to include changes to vcpu->requests into the pvclock_gtod_sync_lock critical section. The changes to the shared data structures (in pvclock_update_vm_gtod_copy) already occur under the lock. Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/kvm/x86.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fe806e894212..0a83eff40b43 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2562,10 +2562,12 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) kvm_hv_invalidate_tsc_page(kvm); - spin_lock(&ka->pvclock_gtod_sync_lock); kvm_make_mclock_inprogress_request(kvm); + /* no guest entries from this point */ + spin_lock(&ka->pvclock_gtod_sync_lock); pvclock_update_vm_gtod_copy(kvm); + spin_unlock(&ka->pvclock_gtod_sync_lock); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -2573,8 +2575,6 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) /* guest entries allowed */ kvm_for_each_vcpu(i, vcpu, kvm) kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); - - spin_unlock(&ka->pvclock_gtod_sync_lock); #endif } @@ -7740,16 +7740,14 @@ static void kvm_hyperv_tsc_notifier(void) struct kvm_arch *ka = &kvm->arch; spin_lock(&ka->pvclock_gtod_sync_lock); - pvclock_update_vm_gtod_copy(kvm); + spin_unlock(&ka->pvclock_gtod_sync_lock); kvm_for_each_vcpu(cpu, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); kvm_for_each_vcpu(cpu, vcpu, kvm) kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); - - spin_unlock(&ka->pvclock_gtod_sync_lock); } mutex_unlock(&kvm_lock); } -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-03-30 16:59 ` [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections Paolo Bonzini @ 2021-03-31 1:41 ` Wanpeng Li 2021-04-07 17:40 ` Marcelo Tosatti 1 sibling, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-03-31 1:41 UTC (permalink / raw) To: Paolo Bonzini Cc: LKML, kvm, Marcelo Tosatti, Vitaly Kuznetsov, David Woodhouse On Wed, 31 Mar 2021 at 01:02, Paolo Bonzini <pbonzini@redhat.com> wrote: > > There is no need to include changes to vcpu->requests into > the pvclock_gtod_sync_lock critical section. The changes to > the shared data structures (in pvclock_update_vm_gtod_copy) > already occur under the lock. > > Cc: David Woodhouse <dwmw@amazon.co.uk> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> > --- > arch/x86/kvm/x86.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index fe806e894212..0a83eff40b43 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2562,10 +2562,12 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > > kvm_hv_invalidate_tsc_page(kvm); > > - spin_lock(&ka->pvclock_gtod_sync_lock); > kvm_make_mclock_inprogress_request(kvm); > + > /* no guest entries from this point */ > + spin_lock(&ka->pvclock_gtod_sync_lock); > pvclock_update_vm_gtod_copy(kvm); > + spin_unlock(&ka->pvclock_gtod_sync_lock); > > kvm_for_each_vcpu(i, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > @@ -2573,8 +2575,6 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > /* guest entries allowed */ > kvm_for_each_vcpu(i, vcpu, kvm) > kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); > - > - spin_unlock(&ka->pvclock_gtod_sync_lock); > #endif > } > > @@ -7740,16 +7740,14 @@ static void kvm_hyperv_tsc_notifier(void) > struct kvm_arch *ka = &kvm->arch; > > spin_lock(&ka->pvclock_gtod_sync_lock); > - > pvclock_update_vm_gtod_copy(kvm); > + spin_unlock(&ka->pvclock_gtod_sync_lock); > > kvm_for_each_vcpu(cpu, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > > kvm_for_each_vcpu(cpu, vcpu, kvm) > kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); > - > - spin_unlock(&ka->pvclock_gtod_sync_lock); > } > mutex_unlock(&kvm_lock); > } > -- > 2.26.2 > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-03-30 16:59 ` [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li @ 2021-04-07 17:40 ` Marcelo Tosatti 2021-04-08 8:15 ` Paolo Bonzini 1 sibling, 1 reply; 14+ messages in thread From: Marcelo Tosatti @ 2021-04-07 17:40 UTC (permalink / raw) To: Paolo Bonzini; +Cc: linux-kernel, kvm, vkuznets, dwmw On Tue, Mar 30, 2021 at 12:59:57PM -0400, Paolo Bonzini wrote: > There is no need to include changes to vcpu->requests into > the pvclock_gtod_sync_lock critical section. The changes to > the shared data structures (in pvclock_update_vm_gtod_copy) > already occur under the lock. > > Cc: David Woodhouse <dwmw@amazon.co.uk> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > arch/x86/kvm/x86.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index fe806e894212..0a83eff40b43 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2562,10 +2562,12 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > > kvm_hv_invalidate_tsc_page(kvm); > > - spin_lock(&ka->pvclock_gtod_sync_lock); > kvm_make_mclock_inprogress_request(kvm); > + Might be good to serialize against two kvm_gen_update_masterclock callers? Otherwise one caller could clear KVM_REQ_MCLOCK_INPROGRESS, while the other is still at pvclock_update_vm_gtod_copy(). Otherwise, looks good. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-04-07 17:40 ` Marcelo Tosatti @ 2021-04-08 8:15 ` Paolo Bonzini 2021-04-08 12:00 ` Marcelo Tosatti 0 siblings, 1 reply; 14+ messages in thread From: Paolo Bonzini @ 2021-04-08 8:15 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: linux-kernel, kvm, vkuznets, dwmw On 07/04/21 19:40, Marcelo Tosatti wrote: >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index fe806e894212..0a83eff40b43 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -2562,10 +2562,12 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) >> >> kvm_hv_invalidate_tsc_page(kvm); >> >> - spin_lock(&ka->pvclock_gtod_sync_lock); >> kvm_make_mclock_inprogress_request(kvm); >> + > Might be good to serialize against two kvm_gen_update_masterclock > callers? Otherwise one caller could clear KVM_REQ_MCLOCK_INPROGRESS, > while the other is still at pvclock_update_vm_gtod_copy(). Makes sense, but this stuff has always seemed unnecessarily complicated to me. KVM_REQ_MCLOCK_INPROGRESS is only needed to kick running vCPUs out of the execution loop; clearing it in kvm_gen_update_masterclock is unnecessary, because KVM_REQ_CLOCK_UPDATE takes pvclock_gtod_sync_lock too and thus will already wait for pvclock_update_vm_gtod_copy to end. I think it's possible to use a seqcount in KVM_REQ_CLOCK_UPDATE instead of KVM_REQ_MCLOCK_INPROGRESS. Both cause the vCPUs to spin. I'll take a look. Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-04-08 8:15 ` Paolo Bonzini @ 2021-04-08 12:00 ` Marcelo Tosatti 2021-04-08 12:25 ` Paolo Bonzini 0 siblings, 1 reply; 14+ messages in thread From: Marcelo Tosatti @ 2021-04-08 12:00 UTC (permalink / raw) To: Paolo Bonzini; +Cc: linux-kernel, kvm, vkuznets, dwmw Hi Paolo, On Thu, Apr 08, 2021 at 10:15:16AM +0200, Paolo Bonzini wrote: > On 07/04/21 19:40, Marcelo Tosatti wrote: > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index fe806e894212..0a83eff40b43 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -2562,10 +2562,12 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > > > kvm_hv_invalidate_tsc_page(kvm); > > > - spin_lock(&ka->pvclock_gtod_sync_lock); > > > kvm_make_mclock_inprogress_request(kvm); > > > + > > Might be good to serialize against two kvm_gen_update_masterclock > > callers? Otherwise one caller could clear KVM_REQ_MCLOCK_INPROGRESS, > > while the other is still at pvclock_update_vm_gtod_copy(). > > Makes sense, but this stuff has always seemed unnecessarily complicated to > me. > > KVM_REQ_MCLOCK_INPROGRESS is only needed to kick running vCPUs out of the > execution loop; We do not want vcpus with different system_timestamp/tsc_timestamp pair: * To avoid that problem, do not allow visibility of distinct * system_timestamp/tsc_timestamp values simultaneously: use a master * copy of host monotonic time values. Update that master copy * in lockstep. So KVM_REQ_MCLOCK_INPROGRESS also ensures that no vcpu enters guest mode (via vcpu->requests check before VM-entry) with a different system_timestamp/tsc_timestamp pair. > clearing it in kvm_gen_update_masterclock is unnecessary, > because KVM_REQ_CLOCK_UPDATE takes pvclock_gtod_sync_lock too and thus will > already wait for pvclock_update_vm_gtod_copy to end. > > I think it's possible to use a seqcount in KVM_REQ_CLOCK_UPDATE instead of > KVM_REQ_MCLOCK_INPROGRESS. Both cause the vCPUs to spin. I'll take a look. > > Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections 2021-04-08 12:00 ` Marcelo Tosatti @ 2021-04-08 12:25 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-04-08 12:25 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: linux-kernel, kvm, vkuznets, dwmw On 08/04/21 14:00, Marcelo Tosatti wrote: >> >> KVM_REQ_MCLOCK_INPROGRESS is only needed to kick running vCPUs out of the >> execution loop; > We do not want vcpus with different system_timestamp/tsc_timestamp > pair: > > * To avoid that problem, do not allow visibility of distinct > * system_timestamp/tsc_timestamp values simultaneously: use a master > * copy of host monotonic time values. Update that master copy > * in lockstep. > > So KVM_REQ_MCLOCK_INPROGRESS also ensures that no vcpu enters > guest mode (via vcpu->requests check before VM-entry) with a > different system_timestamp/tsc_timestamp pair. Yes this is what KVM_REQ_MCLOCK_INPROGRESS does, but it does not have to be done that way. All you really need is the IPI with KVM_REQUEST_WAIT, which ensures that updates happen after the vCPUs have exited guest mode. You don't need to loop on vcpu->requests for example, because kvm_guest_time_update could just spin on pvclock_gtod_sync_lock until pvclock_update_vm_gtod_copy is done. So this morning I tried protecting the kvm->arch fields for kvmclock using a seqcount, which is nice also because get_kvmclock_ns() does not have to bounce the cacheline of pvclock_gtod_sync_lock anymore. I'll post it tomorrow or next week. Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken 2021-03-30 16:59 [PATCH 0/2] KVM: x86: fix lockdep splat due to Xen runstate update Paolo Bonzini 2021-03-30 16:59 ` [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections Paolo Bonzini @ 2021-03-30 16:59 ` Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li ` (2 more replies) 1 sibling, 3 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-03-30 16:59 UTC (permalink / raw) To: linux-kernel, kvm; +Cc: mtosatti, vkuznets, dwmw, syzbot+b282b65c2c68492df769 pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- arch/x86/kvm/x86.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0a83eff40b43..2bfd00da465f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2329,7 +2329,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) kvm_vcpu_write_tsc_offset(vcpu, offset); raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); - spin_lock(&kvm->arch.pvclock_gtod_sync_lock); + spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { kvm->arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { @@ -2337,7 +2337,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) } kvm_track_tsc_matching(vcpu); - spin_unlock(&kvm->arch.pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, @@ -2559,15 +2559,16 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) int i; struct kvm_vcpu *vcpu; struct kvm_arch *ka = &kvm->arch; + unsigned long flags; kvm_hv_invalidate_tsc_page(kvm); kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ - spin_lock(&ka->pvclock_gtod_sync_lock); + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); pvclock_update_vm_gtod_copy(kvm); - spin_unlock(&ka->pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -2582,17 +2583,18 @@ u64 get_kvmclock_ns(struct kvm *kvm) { struct kvm_arch *ka = &kvm->arch; struct pvclock_vcpu_time_info hv_clock; + unsigned long flags; u64 ret; - spin_lock(&ka->pvclock_gtod_sync_lock); + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); if (!ka->use_master_clock) { - spin_unlock(&ka->pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); return get_kvmclock_base_ns() + ka->kvmclock_offset; } hv_clock.tsc_timestamp = ka->master_cycle_now; hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; - spin_unlock(&ka->pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); /* both __this_cpu_read() and rdtsc() should be on the same cpu */ get_cpu(); @@ -2686,13 +2688,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) * If the host uses TSC clock, then passthrough TSC as stable * to the guest. */ - spin_lock(&ka->pvclock_gtod_sync_lock); + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); use_master_clock = ka->use_master_clock; if (use_master_clock) { host_tsc = ka->master_cycle_now; kernel_ns = ka->master_kernel_ns; } - spin_unlock(&ka->pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); @@ -7724,6 +7726,7 @@ static void kvm_hyperv_tsc_notifier(void) struct kvm *kvm; struct kvm_vcpu *vcpu; int cpu; + unsigned long flags; mutex_lock(&kvm_lock); list_for_each_entry(kvm, &vm_list, vm_list) @@ -7739,9 +7742,9 @@ static void kvm_hyperv_tsc_notifier(void) list_for_each_entry(kvm, &vm_list, vm_list) { struct kvm_arch *ka = &kvm->arch; - spin_lock(&ka->pvclock_gtod_sync_lock); + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); pvclock_update_vm_gtod_copy(kvm); - spin_unlock(&ka->pvclock_gtod_sync_lock); + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); kvm_for_each_vcpu(cpu, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); -- 2.26.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini @ 2021-03-31 1:41 ` Wanpeng Li 2021-04-01 15:27 ` [EXTERNAL] " David Woodhouse 2021-10-23 19:33 ` David Woodhouse 2 siblings, 0 replies; 14+ messages in thread From: Wanpeng Li @ 2021-03-31 1:41 UTC (permalink / raw) To: Paolo Bonzini Cc: LKML, kvm, Marcelo Tosatti, Vitaly Kuznetsov, David Woodhouse, syzbot On Wed, 31 Mar 2021 at 01:01, Paolo Bonzini <pbonzini@redhat.com> wrote: > > pvclock_gtod_sync_lock can be taken with interrupts disabled if the > preempt notifier calls get_kvmclock_ns to update the Xen > runstate information: > > spin_lock include/linux/spinlock.h:354 [inline] > get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 > kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 > kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 > kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] > kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 > > So change the users of the spinlock to spin_lock_irqsave and > spin_unlock_irqrestore. > > Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com > Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") > Cc: David Woodhouse <dwmw@amazon.co.uk> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> > --- > arch/x86/kvm/x86.c | 25 ++++++++++++++----------- > 1 file changed, 14 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 0a83eff40b43..2bfd00da465f 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2329,7 +2329,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) > kvm_vcpu_write_tsc_offset(vcpu, offset); > raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); > > - spin_lock(&kvm->arch.pvclock_gtod_sync_lock); > + spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); > if (!matched) { > kvm->arch.nr_vcpus_matched_tsc = 0; > } else if (!already_matched) { > @@ -2337,7 +2337,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) > } > > kvm_track_tsc_matching(vcpu); > - spin_unlock(&kvm->arch.pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); > } > > static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, > @@ -2559,15 +2559,16 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > int i; > struct kvm_vcpu *vcpu; > struct kvm_arch *ka = &kvm->arch; > + unsigned long flags; > > kvm_hv_invalidate_tsc_page(kvm); > > kvm_make_mclock_inprogress_request(kvm); > > /* no guest entries from this point */ > - spin_lock(&ka->pvclock_gtod_sync_lock); > + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > pvclock_update_vm_gtod_copy(kvm); > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > kvm_for_each_vcpu(i, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > @@ -2582,17 +2583,18 @@ u64 get_kvmclock_ns(struct kvm *kvm) > { > struct kvm_arch *ka = &kvm->arch; > struct pvclock_vcpu_time_info hv_clock; > + unsigned long flags; > u64 ret; > > - spin_lock(&ka->pvclock_gtod_sync_lock); > + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > if (!ka->use_master_clock) { > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > return get_kvmclock_base_ns() + ka->kvmclock_offset; > } > > hv_clock.tsc_timestamp = ka->master_cycle_now; > hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > /* both __this_cpu_read() and rdtsc() should be on the same cpu */ > get_cpu(); > @@ -2686,13 +2688,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > * If the host uses TSC clock, then passthrough TSC as stable > * to the guest. > */ > - spin_lock(&ka->pvclock_gtod_sync_lock); > + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > use_master_clock = ka->use_master_clock; > if (use_master_clock) { > host_tsc = ka->master_cycle_now; > kernel_ns = ka->master_kernel_ns; > } > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > /* Keep irq disabled to prevent changes to the clock */ > local_irq_save(flags); > @@ -7724,6 +7726,7 @@ static void kvm_hyperv_tsc_notifier(void) > struct kvm *kvm; > struct kvm_vcpu *vcpu; > int cpu; > + unsigned long flags; > > mutex_lock(&kvm_lock); > list_for_each_entry(kvm, &vm_list, vm_list) > @@ -7739,9 +7742,9 @@ static void kvm_hyperv_tsc_notifier(void) > list_for_each_entry(kvm, &vm_list, vm_list) { > struct kvm_arch *ka = &kvm->arch; > > - spin_lock(&ka->pvclock_gtod_sync_lock); > + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > pvclock_update_vm_gtod_copy(kvm); > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > kvm_for_each_vcpu(cpu, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > -- > 2.26.2 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [EXTERNAL] [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li @ 2021-04-01 15:27 ` David Woodhouse 2021-04-01 16:36 ` Paolo Bonzini 2021-10-23 19:33 ` David Woodhouse 2 siblings, 1 reply; 14+ messages in thread From: David Woodhouse @ 2021-04-01 15:27 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel, kvm Cc: mtosatti, vkuznets, syzbot+b282b65c2c68492df769 [-- Attachment #1: Type: text/plain, Size: 1225 bytes --] On Tue, 2021-03-30 at 12:59 -0400, Paolo Bonzini wrote: > @@ -2686,13 +2688,13 @@ static int kvm_guest_time_update(struct > kvm_vcpu *v) > * If the host uses TSC clock, then passthrough TSC as stable > * to the guest. > */ > - spin_lock(&ka->pvclock_gtod_sync_lock); > + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > use_master_clock = ka->use_master_clock; > if (use_master_clock) { > host_tsc = ka->master_cycle_now; > kernel_ns = ka->master_kernel_ns; > } > - spin_unlock(&ka->pvclock_gtod_sync_lock); > + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > /* Keep irq disabled to prevent changes to the clock */ > local_irq_save(flags); That seems a little gratuitous at the end; restoring the flags as part of the spin_unlock_irqrestore() and then immediately calling local_irq_save(). Is something going to complain if we just use spin_unlock() there and then later restore the flags with the existing local_irq_restore()? Or should we move the local_irq_save() up above the existing spin_lock() and leave the spin lock/unlock as they are? [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5174 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [EXTERNAL] [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken 2021-04-01 15:27 ` [EXTERNAL] " David Woodhouse @ 2021-04-01 16:36 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-04-01 16:36 UTC (permalink / raw) To: David Woodhouse, linux-kernel, kvm Cc: mtosatti, vkuznets, syzbot+b282b65c2c68492df769 On 01/04/21 17:27, David Woodhouse wrote: >> - spin_lock(&ka->pvclock_gtod_sync_lock); >> + spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); >> use_master_clock = ka->use_master_clock; >> if (use_master_clock) { >> host_tsc = ka->master_cycle_now; >> kernel_ns = ka->master_kernel_ns; >> } >> - spin_unlock(&ka->pvclock_gtod_sync_lock); >> + spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); >> >> /* Keep irq disabled to prevent changes to the clock */ >> local_irq_save(flags); > That seems a little gratuitous at the end; restoring the flags as part > of the spin_unlock_irqrestore() and then immediately calling > local_irq_save(). > > Is something going to complain if we just use spin_unlock() there and > then later restore the flags with the existing local_irq_restore()? Yes, I think it breaks on RT kernels. > Or should we move the local_irq_save() up above the existing > spin_lock() and leave the spin lock/unlock as they are? Nope, also breaks on RT (and this one is explicitly called out by Documentation/locking/locktypes.rst). Since it's necessary to use spin_lock_irqsave and spin_unlock_irqrestore, one would need flags and flags2 variables which is really horrible. I thought of a similar one which is to move the critical section within the IRQ-disabled region: local_irq_save(flags); ... spin_lock(&ka->pvclock_gtod_sync_lock); use_master_clock = ka->use_master_clock; if (use_master_clock) { host_tsc = ka->master_cycle_now; kernel_ns = ka->master_kernel_ns; } else { host_tsc = rdtsc(); kernel_ns = get_kvmclock_base_ns(); } spin_unlock(&ka->pvclock_gtod_sync_lock); ... local_irq_restore(flags); ... but didn't do it because of RT again. Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [EXTERNAL] [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li 2021-04-01 15:27 ` [EXTERNAL] " David Woodhouse @ 2021-10-23 19:33 ` David Woodhouse 2021-10-23 20:29 ` [PATCH] KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock David Woodhouse 2 siblings, 1 reply; 14+ messages in thread From: David Woodhouse @ 2021-10-23 19:33 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel, kvm Cc: mtosatti, vkuznets, syzbot+b282b65c2c68492df769 [-- Attachment #1: Type: text/plain, Size: 3901 bytes --] On Tue, 2021-03-30 at 12:59 -0400, Paolo Bonzini wrote: > pvclock_gtod_sync_lock can be taken with interrupts disabled if the > preempt notifier calls get_kvmclock_ns to update the Xen > runstate information: > > spin_lock include/linux/spinlock.h:354 [inline] > get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 > kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 > kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 > kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] > kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 > > So change the users of the spinlock to spin_lock_irqsave and > spin_unlock_irqrestore. Apologies, I didn't spot this at the time. Looks sane enough (if we ignore the elephant in the room that kvm_xen_update_runstate_guest() is also writing to userspace with interrupts disabled on this preempted code path, but I have a fix for that in the works¹). However, in 5.15-rc5 I'm still seeing the warning below when I run xen_shinfo_test. I confess I'm not entirely sure what it's telling me. [ 89.138354] ============================= [ 89.138356] [ BUG: Invalid wait context ] [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E [ 89.138360] ----------------------------- [ 89.138361] xen_shinfo_test/2575 is trying to lock: [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138442] other info that might help us debug this: [ 89.138444] context-{5:5} [ 89.138445] 4 locks held by xen_shinfo_test/2575: [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] [ 89.138576] stack backtrace: [ 89.138577] CPU: 27 PID: 2575 Comm: xen_shinfo_test Tainted: G S I E 5.15.0-rc5+ #834 [ 89.138580] Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [ 89.138582] Call Trace: [ 89.138585] dump_stack_lvl+0x6a/0x9a [ 89.138592] __lock_acquire.cold+0x2ac/0x2d5 [ 89.138597] ? __lock_acquire+0x578/0x1f80 [ 89.138604] lock_acquire+0xc0/0x2d0 [ 89.138608] ? get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138648] ? find_held_lock+0x2b/0x80 [ 89.138653] _raw_spin_lock_irqsave+0x48/0x60 [ 89.138656] ? get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] [ 89.138900] __schedule+0x5de/0xbd0 [ 89.138904] ? kvm_mmu_topup_memory_cache+0x21/0x70 [kvm] [ 89.138937] __cond_resched+0x34/0x50 [ 89.138941] kmem_cache_alloc+0x228/0x2e0 [ 89.138946] kvm_mmu_topup_memory_cache+0x21/0x70 [kvm] [ 89.138979] mmu_topup_memory_caches+0x1d/0x70 [kvm] [ 89.139024] kvm_mmu_load+0x2d/0x750 [kvm] [ 89.139070] ? kvm_cpu_has_extint+0x15/0x90 [kvm] [ 89.139113] ? kvm_cpu_has_injectable_intr+0xe/0x50 [kvm] [ 89.139155] vcpu_enter_guest+0xc77/0x1210 [kvm] [ 89.139195] ? kvm_arch_vcpu_ioctl_run+0x146/0x8b0 [kvm] [ 89.139235] kvm_arch_vcpu_ioctl_run+0x146/0x8b0 [kvm] [ 89.139274] kvm_vcpu_ioctl+0x279/0x6f0 [kvm] [ 89.139306] ? find_held_lock+0x2b/0x80 [ 89.139312] __x64_sys_ioctl+0x83/0xb0 [ 89.139316] do_syscall_64+0x3b/0x90 [ 89.139320] entry_SYSCALL_64_after_hwframe+0x44/0xae ¹ https://git.infradead.org/users/dwmw2/linux.git/commitdiff/ec22c08258 [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5174 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock 2021-10-23 19:33 ` David Woodhouse @ 2021-10-23 20:29 ` David Woodhouse 2021-10-25 12:15 ` Paolo Bonzini 0 siblings, 1 reply; 14+ messages in thread From: David Woodhouse @ 2021-10-23 20:29 UTC (permalink / raw) To: Paolo Bonzini, linux-kernel, kvm Cc: mtosatti, vkuznets, syzbot+b282b65c2c68492df769 [-- Attachment #1: Type: text/plain, Size: 6906 bytes --] From: David Woodhouse <dwmw@amazon.co.uk> On the preemption path when updating a Xen guest's runstate times, this lock is taken inside the scheduler rq->lock, which is a raw spinlock. This was shown in a lockdep warning: [ 89.138354] ============================= [ 89.138356] [ BUG: Invalid wait context ] [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E [ 89.138360] ----------------------------- [ 89.138361] xen_shinfo_test/2575 is trying to lock: [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138442] other info that might help us debug this: [ 89.138444] context-{5:5} [ 89.138445] 4 locks held by xen_shinfo_test/2575: [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] ... [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] [ 89.138900] __schedule+0x5de/0xbd0 Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> --- > However, in 5.15-rc5 I'm still seeing the warning below when I run > xen_shinfo_test. I confess I'm not entirely sure what it's telling > me. I think this is what it was telling me... arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/x86.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f8f48a7ec577..70771376e246 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1097,7 +1097,7 @@ struct kvm_arch { u64 cur_tsc_generation; int nr_vcpus_matched_tsc; - spinlock_t pvclock_gtod_sync_lock; + raw_spinlock_t pvclock_gtod_sync_lock; bool use_master_clock; u64 master_kernel_ns; u64 master_cycle_now; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index aabd3a2ec1bc..f0874c3d12ce 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2542,7 +2542,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) kvm_vcpu_write_tsc_offset(vcpu, offset); raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); + raw_spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { kvm->arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { @@ -2550,7 +2550,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) } kvm_track_tsc_matching(vcpu); - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, @@ -2780,9 +2780,9 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) kvm_make_mclock_inprogress_request(kvm); /* no guest entries from this point */ - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); pvclock_update_vm_gtod_copy(kvm); - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -2800,15 +2800,15 @@ u64 get_kvmclock_ns(struct kvm *kvm) unsigned long flags; u64 ret; - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); if (!ka->use_master_clock) { - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); return get_kvmclock_base_ns() + ka->kvmclock_offset; } hv_clock.tsc_timestamp = ka->master_cycle_now; hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); /* both __this_cpu_read() and rdtsc() should be on the same cpu */ get_cpu(); @@ -2902,13 +2902,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) * If the host uses TSC clock, then passthrough TSC as stable * to the guest. */ - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); use_master_clock = ka->use_master_clock; if (use_master_clock) { host_tsc = ka->master_cycle_now; kernel_ns = ka->master_kernel_ns; } - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); @@ -6100,13 +6100,13 @@ long kvm_arch_vm_ioctl(struct file *filp, * is slightly ahead) here we risk going negative on unsigned * 'system_time' when 'user_ns.clock' is very small. */ - spin_lock_irq(&ka->pvclock_gtod_sync_lock); + raw_spin_lock_irq(&ka->pvclock_gtod_sync_lock); if (kvm->arch.use_master_clock) now_ns = ka->master_kernel_ns; else now_ns = get_kvmclock_base_ns(); ka->kvmclock_offset = user_ns.clock - now_ns; - spin_unlock_irq(&ka->pvclock_gtod_sync_lock); + raw_spin_unlock_irq(&ka->pvclock_gtod_sync_lock); kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); break; @@ -8139,9 +8139,9 @@ static void kvm_hyperv_tsc_notifier(void) list_for_each_entry(kvm, &vm_list, vm_list) { struct kvm_arch *ka = &kvm->arch; - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); pvclock_update_vm_gtod_copy(kvm); - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); kvm_for_each_vcpu(cpu, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); @@ -11182,7 +11182,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) raw_spin_lock_init(&kvm->arch.tsc_write_lock); mutex_init(&kvm->arch.apic_map_lock); - spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); + raw_spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); pvclock_update_vm_gtod_copy(kvm); -- 2.25.1 [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5174 bytes --] ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock 2021-10-23 20:29 ` [PATCH] KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock David Woodhouse @ 2021-10-25 12:15 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2021-10-25 12:15 UTC (permalink / raw) To: David Woodhouse, linux-kernel, kvm Cc: mtosatti, vkuznets, syzbot+b282b65c2c68492df769 On 23/10/21 22:29, David Woodhouse wrote: > From: David Woodhouse <dwmw@amazon.co.uk> > > On the preemption path when updating a Xen guest's runstate times, this > lock is taken inside the scheduler rq->lock, which is a raw spinlock. > This was shown in a lockdep warning: > > [ 89.138354] ============================= > [ 89.138356] [ BUG: Invalid wait context ] > [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E > [ 89.138360] ----------------------------- > [ 89.138361] xen_shinfo_test/2575 is trying to lock: > [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] > [ 89.138442] other info that might help us debug this: > [ 89.138444] context-{5:5} > [ 89.138445] 4 locks held by xen_shinfo_test/2575: > [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] > [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] > [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 > [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] > ... > [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] > [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] > [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] > [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] > [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] > [ 89.138900] __schedule+0x5de/0xbd0 > > Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > --- >> However, in 5.15-rc5 I'm still seeing the warning below when I run >> xen_shinfo_test. I confess I'm not entirely sure what it's telling >> me. > > I think this is what it was telling me... Yes, indeed - I will commit this to 5.15 (with a 'fake' merge commit to kvm/next to inform git that the pvclock_gtod_sync_lock is gone in 5.16 - and the replacement is already a raw spinlock for partly unrelated reasons). Paolo > arch/x86/include/asm/kvm_host.h | 2 +- > arch/x86/kvm/x86.c | 28 ++++++++++++++-------------- > 2 files changed, 15 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index f8f48a7ec577..70771376e246 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1097,7 +1097,7 @@ struct kvm_arch { > u64 cur_tsc_generation; > int nr_vcpus_matched_tsc; > > - spinlock_t pvclock_gtod_sync_lock; > + raw_spinlock_t pvclock_gtod_sync_lock; > bool use_master_clock; > u64 master_kernel_ns; > u64 master_cycle_now; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index aabd3a2ec1bc..f0874c3d12ce 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2542,7 +2542,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) > kvm_vcpu_write_tsc_offset(vcpu, offset); > raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); > > - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); > + raw_spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); > if (!matched) { > kvm->arch.nr_vcpus_matched_tsc = 0; > } else if (!already_matched) { > @@ -2550,7 +2550,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) > } > > kvm_track_tsc_matching(vcpu); > - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); > } > > static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, > @@ -2780,9 +2780,9 @@ static void kvm_gen_update_masterclock(struct kvm *kvm) > kvm_make_mclock_inprogress_request(kvm); > > /* no guest entries from this point */ > - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > pvclock_update_vm_gtod_copy(kvm); > - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > kvm_for_each_vcpu(i, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > @@ -2800,15 +2800,15 @@ u64 get_kvmclock_ns(struct kvm *kvm) > unsigned long flags; > u64 ret; > > - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > if (!ka->use_master_clock) { > - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > return get_kvmclock_base_ns() + ka->kvmclock_offset; > } > > hv_clock.tsc_timestamp = ka->master_cycle_now; > hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; > - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > /* both __this_cpu_read() and rdtsc() should be on the same cpu */ > get_cpu(); > @@ -2902,13 +2902,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > * If the host uses TSC clock, then passthrough TSC as stable > * to the guest. > */ > - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > use_master_clock = ka->use_master_clock; > if (use_master_clock) { > host_tsc = ka->master_cycle_now; > kernel_ns = ka->master_kernel_ns; > } > - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > /* Keep irq disabled to prevent changes to the clock */ > local_irq_save(flags); > @@ -6100,13 +6100,13 @@ long kvm_arch_vm_ioctl(struct file *filp, > * is slightly ahead) here we risk going negative on unsigned > * 'system_time' when 'user_ns.clock' is very small. > */ > - spin_lock_irq(&ka->pvclock_gtod_sync_lock); > + raw_spin_lock_irq(&ka->pvclock_gtod_sync_lock); > if (kvm->arch.use_master_clock) > now_ns = ka->master_kernel_ns; > else > now_ns = get_kvmclock_base_ns(); > ka->kvmclock_offset = user_ns.clock - now_ns; > - spin_unlock_irq(&ka->pvclock_gtod_sync_lock); > + raw_spin_unlock_irq(&ka->pvclock_gtod_sync_lock); > > kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); > break; > @@ -8139,9 +8139,9 @@ static void kvm_hyperv_tsc_notifier(void) > list_for_each_entry(kvm, &vm_list, vm_list) { > struct kvm_arch *ka = &kvm->arch; > > - spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); > pvclock_update_vm_gtod_copy(kvm); > - spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > + raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); > > kvm_for_each_vcpu(cpu, vcpu, kvm) > kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); > @@ -11182,7 +11182,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) > > raw_spin_lock_init(&kvm->arch.tsc_write_lock); > mutex_init(&kvm->arch.apic_map_lock); > - spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); > + raw_spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); > > kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); > pvclock_update_vm_gtod_copy(kvm); > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-10-25 12:15 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-03-30 16:59 [PATCH 0/2] KVM: x86: fix lockdep splat due to Xen runstate update Paolo Bonzini 2021-03-30 16:59 ` [PATCH 1/2] KVM: x86: reduce pvclock_gtod_sync_lock critical sections Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li 2021-04-07 17:40 ` Marcelo Tosatti 2021-04-08 8:15 ` Paolo Bonzini 2021-04-08 12:00 ` Marcelo Tosatti 2021-04-08 12:25 ` Paolo Bonzini 2021-03-30 16:59 ` [PATCH 2/2] KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken Paolo Bonzini 2021-03-31 1:41 ` Wanpeng Li 2021-04-01 15:27 ` [EXTERNAL] " David Woodhouse 2021-04-01 16:36 ` Paolo Bonzini 2021-10-23 19:33 ` David Woodhouse 2021-10-23 20:29 ` [PATCH] KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock David Woodhouse 2021-10-25 12:15 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).