* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-06 20:26 [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set Eduardo Valentin
@ 2017-11-07 12:23 ` Paolo Bonzini
2017-11-07 12:39 ` Eduardo Valentin
2017-11-08 17:36 ` Radim Krčmář
2017-11-09 12:43 ` Wanpeng Li
2 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-07 12:23 UTC (permalink / raw)
To: Eduardo Valentin, rkrcmar
Cc: Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, x86, Peter Zijlstra, Waiman Long, kvm, linux-doc,
linux-kernel, Jan H . Schoenherr, Anthony Liguori
On 06/11/2017 21:26, Eduardo Valentin wrote:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
>
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
>
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
>
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
Hi Eduardo,
besides the suggestion to use a separate word than the one for features,
is this still needed after Waiman's patch to adaptively switch between
tas and pvqspinlock?
Paolo
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
> - When PV_DEDICATED is set (1), qspinlock is selected,
> regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> - Refreshed on top of tip/master.
> V2:
> - rebase on top of tip/master
>
> Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> arch/x86/include/asm/qspinlock.h | 4 ++++
> arch/x86/include/uapi/asm/kvm_para.h | 1 +
> arch/x86/kernel/kvm.c | 2 ++
> 4 files changed, 13 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> || || before enabling paravirtualized
> || || spinlock support.
> ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> + || || to determine if they run on
> + || || dedicated vCPUs, allowing opti-
> + || || mizations such as usage of
> + || || qspinlocks.
> +------------------------------------------------------------------------------
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> || || per-cpu warps are expected in
> || || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
> #define _ASM_X86_QSPINLOCK_H
>
> #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
> #include <asm/cpufeature.h>
> #include <asm-generic/qspinlock_types.h>
> #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> if (!static_branch_likely(&virt_spin_lock_key))
> return false;
>
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> + return false;
> /*
> * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> * back to a Test-and-Set spinlock, because fair locks have
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 554aa8f..85a9875 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
> #define KVM_FEATURE_STEAL_TIME 5
> #define KVM_FEATURE_PV_EOI 6
> #define KVM_FEATURE_PV_UNHALT 7
> +#define KVM_FEATURE_PV_DEDICATED 8
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594..dacd7cf 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> {
> if (!kvm_para_available())
> return;
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> + return;
> /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> return;
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-07 12:23 ` Paolo Bonzini
@ 2017-11-07 12:39 ` Eduardo Valentin
2017-11-07 12:43 ` Paolo Bonzini
0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Valentin @ 2017-11-07 12:39 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Eduardo Valentin, rkrcmar, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
On Tue, Nov 07, 2017 at 01:23:56PM +0100, Paolo Bonzini wrote:
> On 06/11/2017 21:26, Eduardo Valentin wrote:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>
> Hi Eduardo,
>
> besides the suggestion to use a separate word than the one for features,
Ok. I will take a look.
> is this still needed after Waiman's patch to adaptively switch between
> tas and pvqspinlock?
Can you please point me to it ? Is it already in tip/master?
>
> Paolo
>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> > - When PV_DEDICATED is set (1), qspinlock is selected,
> > regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > - Refreshed on top of tip/master.
> > V2:
> > - rebase on top of tip/master
> >
> > Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> > arch/x86/include/asm/qspinlock.h | 4 ++++
> > arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > arch/x86/kernel/kvm.c | 2 ++
> > 4 files changed, 13 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> > || || before enabling paravirtualized
> > || || spinlock support.
> > ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> > + || || to determine if they run on
> > + || || dedicated vCPUs, allowing opti-
> > + || || mizations such as usage of
> > + || || qspinlocks.
> > +------------------------------------------------------------------------------
> > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> > || || per-cpu warps are expected in
> > || || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> > #define _ASM_X86_QSPINLOCK_H
> >
> > #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> > #include <asm/cpufeature.h>
> > #include <asm-generic/qspinlock_types.h>
> > #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > if (!static_branch_likely(&virt_spin_lock_key))
> > return false;
> >
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > + return false;
> > /*
> > * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> > * back to a Test-and-Set spinlock, because fair locks have
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > index 554aa8f..85a9875 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -25,6 +25,7 @@
> > #define KVM_FEATURE_STEAL_TIME 5
> > #define KVM_FEATURE_PV_EOI 6
> > #define KVM_FEATURE_PV_UNHALT 7
> > +#define KVM_FEATURE_PV_DEDICATED 8
> >
> > /* The last 8 bits are used to indicate how to interpret the flags field
> > * in pvclock structure. If no bits are set, all flags are ignored.
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8bb9594..dacd7cf 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> > {
> > if (!kvm_para_available())
> > return;
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > + return;
> > /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> > if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> > return;
> >
>
>
--
All the best,
Eduardo Valentin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-07 12:39 ` Eduardo Valentin
@ 2017-11-07 12:43 ` Paolo Bonzini
2017-11-08 8:45 ` Eduardo Valentin
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-07 12:43 UTC (permalink / raw)
To: Eduardo Valentin
Cc: rkrcmar, Matt Wilson, Jonathan Corbet, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra, Waiman Long,
kvm, linux-doc, linux-kernel, Jan H . Schoenherr,
Anthony Liguori
On 07/11/2017 13:39, Eduardo Valentin wrote:
>> is this still needed after Waiman's patch to adaptively switch between
>> tas and pvqspinlock?
> Can you please point me to it ? Is it already in tip/master?
>
No, he just posted it:
https://marc.info/?l=linux-kernel&m=150972337909996&w=2
Paolo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-07 12:43 ` Paolo Bonzini
@ 2017-11-08 8:45 ` Eduardo Valentin
0 siblings, 0 replies; 27+ messages in thread
From: Eduardo Valentin @ 2017-11-08 8:45 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Eduardo Valentin, rkrcmar, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
Paolo,
On Tue, Nov 07, 2017 at 01:43:15PM +0100, Paolo Bonzini wrote:
> On 07/11/2017 13:39, Eduardo Valentin wrote:
> >> is this still needed after Waiman's patch to adaptively switch between
> >> tas and pvqspinlock?
> > Can you please point me to it ? Is it already in tip/master?
> >
>
> No, he just posted it:
>
> https://marc.info/?l=linux-kernel&m=150972337909996&w=2
OK, Thanks. I've reviewed his V2. I think the patch to have PV_DEDICATED is still interesting to have,
for the case the guest is in autoselect mode and honors what the host gives as hints.
>
> Paolo
--
All the best,
Eduardo Valentin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-06 20:26 [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set Eduardo Valentin
2017-11-07 12:23 ` Paolo Bonzini
@ 2017-11-08 17:36 ` Radim Krčmář
2017-11-09 8:55 ` Eduardo Valentin
2017-11-09 12:43 ` Wanpeng Li
2 siblings, 1 reply; 27+ messages in thread
From: Radim Krčmář @ 2017-11-08 17:36 UTC (permalink / raw)
To: Eduardo Valentin
Cc: Paolo Bonzini, Matt Wilson, Jonathan Corbet, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra, Waiman Long,
kvm, linux-doc, linux-kernel, Jan H . Schoenherr,
Anthony Liguori
2017-11-06 12:26-0800, Eduardo Valentin:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
>
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
>
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
>
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
> - When PV_DEDICATED is set (1), qspinlock is selected,
> regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> - Refreshed on top of tip/master.
> V2:
> - rebase on top of tip/master
>
> Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> arch/x86/include/asm/qspinlock.h | 4 ++++
> arch/x86/include/uapi/asm/kvm_para.h | 1 +
> arch/x86/kernel/kvm.c | 2 ++
> 4 files changed, 13 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> || || before enabling paravirtualized
> || || spinlock support.
> ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> + || || to determine if they run on
> + || || dedicated vCPUs, allowing opti-
> + || || mizations such as usage of
> + || || qspinlocks.
> +------------------------------------------------------------------------------
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> || || per-cpu warps are expected in
> || || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
> #define _ASM_X86_QSPINLOCK_H
>
> #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
> #include <asm/cpufeature.h>
> #include <asm-generic/qspinlock_types.h>
> #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> if (!static_branch_likely(&virt_spin_lock_key))
> return false;
>
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> + return false;
Hm, every spinlock slowpath calls cpuid, which causes a VM exit, so I
wouldn't expect it to be faster than the existing implementations.
(Using the static key would be better.)
How does this patch perform compared to user-forced qspinlock and hybrid
pvqspinlock?
Thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-08 17:36 ` Radim Krčmář
@ 2017-11-09 8:55 ` Eduardo Valentin
2017-11-09 14:17 ` Radim Krčmář
0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Valentin @ 2017-11-09 8:55 UTC (permalink / raw)
To: Radim Krčmář
Cc: Eduardo Valentin, Paolo Bonzini, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
Hello,
On Wed, Nov 08, 2017 at 06:36:52PM +0100, Radim Krčmář wrote:
> 2017-11-06 12:26-0800, Eduardo Valentin:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> > - When PV_DEDICATED is set (1), qspinlock is selected,
> > regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > - Refreshed on top of tip/master.
> > V2:
> > - rebase on top of tip/master
> >
> > Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> > arch/x86/include/asm/qspinlock.h | 4 ++++
> > arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > arch/x86/kernel/kvm.c | 2 ++
> > 4 files changed, 13 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> > || || before enabling paravirtualized
> > || || spinlock support.
> > ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> > + || || to determine if they run on
> > + || || dedicated vCPUs, allowing opti-
> > + || || mizations such as usage of
> > + || || qspinlocks.
> > +------------------------------------------------------------------------------
> > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> > || || per-cpu warps are expected in
> > || || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> > #define _ASM_X86_QSPINLOCK_H
> >
> > #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> > #include <asm/cpufeature.h>
> > #include <asm-generic/qspinlock_types.h>
> > #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > if (!static_branch_likely(&virt_spin_lock_key))
> > return false;
> >
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > + return false;
>
> Hm, every spinlock slowpath calls cpuid, which causes a VM exit, so I
> wouldn't expect it to be faster than the existing implementations.
> (Using the static key would be better.)
>
> How does this patch perform compared to user-forced qspinlock and hybrid
> pvqspinlock?
This patch should have same effect as user-forced qspinlock. However, the key aspect
here is this patch gives a way for the host to instruct the guest to use qspinlock.
Even with Longman's patch which allows guest to select the spinlock implementation,
there should still be the auto-select mode. In such mode, PV_DEDICATED should
allow the host to get the guest to use qspinlock, without, the guest will fallback
to tas when PV_UNHALT == 0.
>
> Thanks.
>
--
All the best,
Eduardo Valentin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 8:55 ` Eduardo Valentin
@ 2017-11-09 14:17 ` Radim Krčmář
2017-11-16 4:54 ` Eduardo Valentin
0 siblings, 1 reply; 27+ messages in thread
From: Radim Krčmář @ 2017-11-09 14:17 UTC (permalink / raw)
To: Eduardo Valentin
Cc: Paolo Bonzini, Matt Wilson, Jonathan Corbet, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, Peter Zijlstra, Waiman Long,
kvm, linux-doc, linux-kernel, Jan H . Schoenherr,
Anthony Liguori
2017-11-09 00:55-0800, Eduardo Valentin:
> Hello,
>
> On Wed, Nov 08, 2017 at 06:36:52PM +0100, Radim Krčmář wrote:
> > 2017-11-06 12:26-0800, Eduardo Valentin:
> > > Currently, the existing qspinlock implementation will fallback to
> > > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> > >
> > > This patch gives the opportunity to guest kernels to select
> > > between test-and-set and the regular queueu fair lock implementation
> > > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > > flag is not set, the code will still fall back to test-and-set,
> > > but when the PV_DEDICATED flag is set, the code will use
> > > the regular queue spinlock implementation.
> > >
> > > With this patch, when in autoselect mode, the guest will
> > > use the default spinlock implementation based on host feature
> > > flags as follows:
> > >
> > > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: x86@kernel.org
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Waiman Long <longman@redhat.com>
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-doc@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > > Cc: Anthony Liguori <aliguori@amazon.com>
> > > Suggested-by: Matt Wilson <msw@amazon.com>
> > > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > > ---
> > > V3:
> > > - When PV_DEDICATED is set (1), qspinlock is selected,
> > > regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > > - Refreshed on top of tip/master.
> > > V2:
> > > - rebase on top of tip/master
> > >
> > > Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> > > arch/x86/include/asm/qspinlock.h | 4 ++++
> > > arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > > arch/x86/kernel/kvm.c | 2 ++
> > > 4 files changed, 13 insertions(+)
> > >
> > > diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> > > index 3c65feb..117066a 100644
> > > --- a/Documentation/virtual/kvm/cpuid.txt
> > > +++ b/Documentation/virtual/kvm/cpuid.txt
> > > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> > > || || before enabling paravirtualized
> > > || || spinlock support.
> > > ------------------------------------------------------------------------------
> > > +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> > > + || || to determine if they run on
> > > + || || dedicated vCPUs, allowing opti-
> > > + || || mizations such as usage of
> > > + || || qspinlocks.
> > > +------------------------------------------------------------------------------
> > > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> > > || || per-cpu warps are expected in
> > > || || kvmclock.
> > > diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> > > index 5e16b5d..de42694 100644
> > > --- a/arch/x86/include/asm/qspinlock.h
> > > +++ b/arch/x86/include/asm/qspinlock.h
> > > @@ -3,6 +3,8 @@
> > > #define _ASM_X86_QSPINLOCK_H
> > >
> > > #include <linux/jump_label.h>
> > > +#include <linux/kvm_para.h>
> > > +
> > > #include <asm/cpufeature.h>
> > > #include <asm-generic/qspinlock_types.h>
> > > #include <asm/paravirt.h>
> > > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > > if (!static_branch_likely(&virt_spin_lock_key))
> > > return false;
> > >
> > > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > > + return false;
> >
> > Hm, every spinlock slowpath calls cpuid, which causes a VM exit, so I
> > wouldn't expect it to be faster than the existing implementations.
> > (Using the static key would be better.)
> >
> > How does this patch perform compared to user-forced qspinlock and hybrid
> > pvqspinlock?
>
> This patch should have same effect as user-forced qspinlock.
This is what I'm doubting, because the patch is adding about two
thousand cycles to every spinlock-taken path.
Doesn't this patch yield better results?
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3df743b60c80..d9225e48c11a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -676,6 +676,12 @@ void __init kvm_spinlock_init(void)
{
if (!kvm_para_available())
return;
+
+ if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
+ static_branch_disable(&virt_spin_lock_key);
+ return;
+ }
+
/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
> However, the key aspect
> here is this patch gives a way for the host to instruct the guest to use qspinlock.
> Even with Longman's patch which allows guest to select the spinlock implementation,
> there should still be the auto-select mode. In such mode, PV_DEDICATED should
> allow the host to get the guest to use qspinlock, without, the guest will fallback
> to tas when PV_UNHALT == 0.
I agree that a flag can be useful for certains setups.
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 14:17 ` Radim Krčmář
@ 2017-11-16 4:54 ` Eduardo Valentin
2017-11-27 0:43 ` Wanpeng Li
0 siblings, 1 reply; 27+ messages in thread
From: Eduardo Valentin @ 2017-11-16 4:54 UTC (permalink / raw)
To: Radim Krčmář
Cc: Eduardo Valentin, Paolo Bonzini, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
Hey Radim,
On Thu, Nov 09, 2017 at 03:17:33PM +0100, Radim Krčmář wrote:
<cut>
>
> This is what I'm doubting, because the patch is adding about two
> thousand cycles to every spinlock-taken path.
> Doesn't this patch yield better results?
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 3df743b60c80..d9225e48c11a 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -676,6 +676,12 @@ void __init kvm_spinlock_init(void)
> {
> if (!kvm_para_available())
> return;
> +
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
> + static_branch_disable(&virt_spin_lock_key);
> + return;
> + }
> +
Yes, the above suggestion is a much better approach. The code has probably changed from the time I wrote the first version. I will refresh with the above suggestion.
> /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> return;
>
> > However, the key aspect
> > here is this patch gives a way for the host to instruct the guest to use qspinlock.
> > Even with Longman's patch which allows guest to select the spinlock implementation,
> > there should still be the auto-select mode. In such mode, PV_DEDICATED should
> > allow the host to get the guest to use qspinlock, without, the guest will fallback
> > to tas when PV_UNHALT == 0.
>
> I agree that a flag can be useful for certains setups.
Cool!
>
--
All the best,
Eduardo Valentin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-16 4:54 ` Eduardo Valentin
@ 2017-11-27 0:43 ` Wanpeng Li
0 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2017-11-27 0:43 UTC (permalink / raw)
To: Eduardo Valentin
Cc: Radim Krčmář,
Paolo Bonzini, Matt Wilson, Jonathan Corbet, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
Hi Eduardo,
2017-11-16 12:54 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> Hey Radim,
>
> On Thu, Nov 09, 2017 at 03:17:33PM +0100, Radim Krčmář wrote:
>
> <cut>
>
>>
>> This is what I'm doubting, because the patch is adding about two
>> thousand cycles to every spinlock-taken path.
>> Doesn't this patch yield better results?
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 3df743b60c80..d9225e48c11a 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -676,6 +676,12 @@ void __init kvm_spinlock_init(void)
>> {
>> if (!kvm_para_available())
>> return;
>> +
>> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
>> + static_branch_disable(&virt_spin_lock_key);
>> + return;
>> + }
>> +
>
> Yes, the above suggestion is a much better approach. The code has probably changed from the time I wrote the first version. I will refresh with the above suggestion.
Do you mind to send a new version since the merge window is closed?
Regards,
Wanpeng Li
>
>
>> /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>> if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>> return;
>>
>> > However, the key aspect
>> > here is this patch gives a way for the host to instruct the guest to use qspinlock.
>> > Even with Longman's patch which allows guest to select the spinlock implementation,
>> > there should still be the auto-select mode. In such mode, PV_DEDICATED should
>> > allow the host to get the guest to use qspinlock, without, the guest will fallback
>> > to tas when PV_UNHALT == 0.
>>
>> I agree that a flag can be useful for certains setups.
>
> Cool!
>
>>
>
> --
> All the best,
> Eduardo Valentin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-06 20:26 [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set Eduardo Valentin
2017-11-07 12:23 ` Paolo Bonzini
2017-11-08 17:36 ` Radim Krčmář
@ 2017-11-09 12:43 ` Wanpeng Li
2017-11-09 15:53 ` Pankaj Gupta
2017-11-09 16:00 ` Radim Krcmar
2 siblings, 2 replies; 27+ messages in thread
From: Wanpeng Li @ 2017-11-09 12:43 UTC (permalink / raw)
To: Eduardo Valentin
Cc: Paolo Bonzini, Radim Krcmar, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
the arch/x86 maintainers, Peter Zijlstra, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> Currently, the existing qspinlock implementation will fallback to
> test-and-set if the hypervisor has not set the PV_UNHALT flag.
>
> This patch gives the opportunity to guest kernels to select
> between test-and-set and the regular queueu fair lock implementation
> based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> flag is not set, the code will still fall back to test-and-set,
> but when the PV_DEDICATED flag is set, the code will use
> the regular queue spinlock implementation.
>
> With this patch, when in autoselect mode, the guest will
> use the default spinlock implementation based on host feature
> flags as follows:
>
> PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Suggested-by: Matt Wilson <msw@amazon.com>
> Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> ---
> V3:
> - When PV_DEDICATED is set (1), qspinlock is selected,
> regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> - Refreshed on top of tip/master.
> V2:
> - rebase on top of tip/master
>
> Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> arch/x86/include/asm/qspinlock.h | 4 ++++
> arch/x86/include/uapi/asm/kvm_para.h | 1 +
> arch/x86/kernel/kvm.c | 2 ++
> 4 files changed, 13 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 3c65feb..117066a 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit
> || || before enabling paravirtualized
> || || spinlock support.
> ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
> + || || to determine if they run on
> + || || dedicated vCPUs, allowing opti-
> + || || mizations such as usage of
> + || || qspinlocks.
> +------------------------------------------------------------------------------
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side
> || || per-cpu warps are expected in
> || || kvmclock.
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 5e16b5d..de42694 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -3,6 +3,8 @@
> #define _ASM_X86_QSPINLOCK_H
>
> #include <linux/jump_label.h>
> +#include <linux/kvm_para.h>
> +
> #include <asm/cpufeature.h>
> #include <asm-generic/qspinlock_types.h>
> #include <asm/paravirt.h>
> @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> if (!static_branch_likely(&virt_spin_lock_key))
> return false;
>
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> + return false;
> /*
> * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> * back to a Test-and-Set spinlock, because fair locks have
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 554aa8f..85a9875 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
> #define KVM_FEATURE_STEAL_TIME 5
> #define KVM_FEATURE_PV_EOI 6
> #define KVM_FEATURE_PV_UNHALT 7
> +#define KVM_FEATURE_PV_DEDICATED 8
>
> /* The last 8 bits are used to indicate how to interpret the flags field
> * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594..dacd7cf 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> {
> if (!kvm_para_available())
> return;
> + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> + return;
> /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> return;
> --
> 2.7.4
>
You should also add a cpuid flag in kvm part.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 12:43 ` Wanpeng Li
@ 2017-11-09 15:53 ` Pankaj Gupta
2017-11-09 16:05 ` Radim Krcmar
2017-11-09 16:00 ` Radim Krcmar
1 sibling, 1 reply; 27+ messages in thread
From: Pankaj Gupta @ 2017-11-09 15:53 UTC (permalink / raw)
To: Wanpeng Li
Cc: Eduardo Valentin, Paolo Bonzini, Radim Krcmar, Matt Wilson,
Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
the arch/x86 maintainers, Peter Zijlstra, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
> > V3:
> > - When PV_DEDICATED is set (1), qspinlock is selected,
> > regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > - Refreshed on top of tip/master.
> > V2:
> > - rebase on top of tip/master
> >
> > Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> > arch/x86/include/asm/qspinlock.h | 4 ++++
> > arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > arch/x86/kernel/kvm.c | 2 ++
> > 4 files changed, 13 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/cpuid.txt
> > b/Documentation/virtual/kvm/cpuid.txt
> > index 3c65feb..117066a 100644
> > --- a/Documentation/virtual/kvm/cpuid.txt
> > +++ b/Documentation/virtual/kvm/cpuid.txt
> > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest
> > checks this feature bit
> > || || before enabling
> > || || paravirtualized
> > || || spinlock support.
> > ------------------------------------------------------------------------------
> > +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature
> > bit
> > + || || to determine if they run on
> > + || || dedicated vCPUs, allowing
> > opti-
> > + || || mizations such as usage of
> > + || || qspinlocks.
> > +------------------------------------------------------------------------------
> > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no
> > guest-side
> > || || per-cpu warps are expected
> > || || in
> > || || kvmclock.
> > diff --git a/arch/x86/include/asm/qspinlock.h
> > b/arch/x86/include/asm/qspinlock.h
> > index 5e16b5d..de42694 100644
> > --- a/arch/x86/include/asm/qspinlock.h
> > +++ b/arch/x86/include/asm/qspinlock.h
> > @@ -3,6 +3,8 @@
> > #define _ASM_X86_QSPINLOCK_H
> >
> > #include <linux/jump_label.h>
> > +#include <linux/kvm_para.h>
> > +
> > #include <asm/cpufeature.h>
> > #include <asm-generic/qspinlock_types.h>
> > #include <asm/paravirt.h>
> > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > if (!static_branch_likely(&virt_spin_lock_key))
> > return false;
> >
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > + return false;
> > /*
> > * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> > * back to a Test-and-Set spinlock, because fair locks have
> > diff --git a/arch/x86/include/uapi/asm/kvm_para.h
> > b/arch/x86/include/uapi/asm/kvm_para.h
> > index 554aa8f..85a9875 100644
> > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > @@ -25,6 +25,7 @@
> > #define KVM_FEATURE_STEAL_TIME 5
> > #define KVM_FEATURE_PV_EOI 6
> > #define KVM_FEATURE_PV_UNHALT 7
> > +#define KVM_FEATURE_PV_DEDICATED 8
> >
> > /* The last 8 bits are used to indicate how to interpret the flags field
> > * in pvclock structure. If no bits are set, all flags are ignored.
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 8bb9594..dacd7cf 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> > {
> > if (!kvm_para_available())
> > return;
> > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > + return;
> > /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> > if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> > return;
> > --
> > 2.7.4
> >
>
> You should also add a cpuid flag in kvm part.
Also, I am thinking if PV_DEDICATED helps in performance and with conjunction
with PV TLB patch in other thread. For use-case e.g KVM-RT where we don't overcommit
vCPU's and pin vCPU:pCPU 1:1 we need a way from host side with which user can decide
to enable PV_DEDICATED option. Such that if vCPU's are unlikely going to preempt or
sleep we should avoid traversing the cpulist in PV TLB code.
So, two things:
1] A way to configure PV_DEDICATED from host.
2] PV TLB should also behave as per option PV_DEDICATED for better performance.
Or I am missing any context here?
>
> Regards,
> Wanpeng Li
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 15:53 ` Pankaj Gupta
@ 2017-11-09 16:05 ` Radim Krcmar
2017-11-09 16:17 ` Peter Zijlstra
0 siblings, 1 reply; 27+ messages in thread
From: Radim Krcmar @ 2017-11-09 16:05 UTC (permalink / raw)
To: Pankaj Gupta
Cc: Wanpeng Li, Eduardo Valentin, Paolo Bonzini, Matt Wilson,
Jonathan Corbet, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
the arch/x86 maintainers, Peter Zijlstra, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-09 10:53-0500, Pankaj Gupta:
>
>
> > 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > > Currently, the existing qspinlock implementation will fallback to
> > > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> > >
> > > This patch gives the opportunity to guest kernels to select
> > > between test-and-set and the regular queueu fair lock implementation
> > > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > > flag is not set, the code will still fall back to test-and-set,
> > > but when the PV_DEDICATED flag is set, the code will use
> > > the regular queue spinlock implementation.
> > >
> > > With this patch, when in autoselect mode, the guest will
> > > use the default spinlock implementation based on host feature
> > > flags as follows:
> > >
> > > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: x86@kernel.org
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Waiman Long <longman@redhat.com>
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-doc@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > > Cc: Anthony Liguori <aliguori@amazon.com>
> > > Suggested-by: Matt Wilson <msw@amazon.com>
> > > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > > ---
> > > V3:
> > > - When PV_DEDICATED is set (1), qspinlock is selected,
> > > regardless of the value of PV_UNHAULT. Suggested by Paolo Bonzini.
> > > - Refreshed on top of tip/master.
> > > V2:
> > > - rebase on top of tip/master
> > >
> > > Documentation/virtual/kvm/cpuid.txt | 6 ++++++
> > > arch/x86/include/asm/qspinlock.h | 4 ++++
> > > arch/x86/include/uapi/asm/kvm_para.h | 1 +
> > > arch/x86/kernel/kvm.c | 2 ++
> > > 4 files changed, 13 insertions(+)
> > >
> > > diff --git a/Documentation/virtual/kvm/cpuid.txt
> > > b/Documentation/virtual/kvm/cpuid.txt
> > > index 3c65feb..117066a 100644
> > > --- a/Documentation/virtual/kvm/cpuid.txt
> > > +++ b/Documentation/virtual/kvm/cpuid.txt
> > > @@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT || 7 || guest
> > > checks this feature bit
> > > || || before enabling
> > > || || paravirtualized
> > > || || spinlock support.
> > > ------------------------------------------------------------------------------
> > > +KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature
> > > bit
> > > + || || to determine if they run on
> > > + || || dedicated vCPUs, allowing
> > > opti-
> > > + || || mizations such as usage of
> > > + || || qspinlocks.
> > > +------------------------------------------------------------------------------
> > > KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no
> > > guest-side
> > > || || per-cpu warps are expected
> > > || || in
> > > || || kvmclock.
> > > diff --git a/arch/x86/include/asm/qspinlock.h
> > > b/arch/x86/include/asm/qspinlock.h
> > > index 5e16b5d..de42694 100644
> > > --- a/arch/x86/include/asm/qspinlock.h
> > > +++ b/arch/x86/include/asm/qspinlock.h
> > > @@ -3,6 +3,8 @@
> > > #define _ASM_X86_QSPINLOCK_H
> > >
> > > #include <linux/jump_label.h>
> > > +#include <linux/kvm_para.h>
> > > +
> > > #include <asm/cpufeature.h>
> > > #include <asm-generic/qspinlock_types.h>
> > > #include <asm/paravirt.h>
> > > @@ -58,6 +60,8 @@ static inline bool virt_spin_lock(struct qspinlock *lock)
> > > if (!static_branch_likely(&virt_spin_lock_key))
> > > return false;
> > >
> > > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > > + return false;
> > > /*
> > > * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> > > * back to a Test-and-Set spinlock, because fair locks have
> > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h
> > > b/arch/x86/include/uapi/asm/kvm_para.h
> > > index 554aa8f..85a9875 100644
> > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > @@ -25,6 +25,7 @@
> > > #define KVM_FEATURE_STEAL_TIME 5
> > > #define KVM_FEATURE_PV_EOI 6
> > > #define KVM_FEATURE_PV_UNHALT 7
> > > +#define KVM_FEATURE_PV_DEDICATED 8
> > >
> > > /* The last 8 bits are used to indicate how to interpret the flags field
> > > * in pvclock structure. If no bits are set, all flags are ignored.
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 8bb9594..dacd7cf 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -642,6 +642,8 @@ void __init kvm_spinlock_init(void)
> > > {
> > > if (!kvm_para_available())
> > > return;
> > > + if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> > > + return;
> > > /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> > > if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> > > return;
> > > --
> > > 2.7.4
> > >
> >
> > You should also add a cpuid flag in kvm part.
>
> Also, I am thinking if PV_DEDICATED helps in performance and with conjunction
> with PV TLB patch in other thread. For use-case e.g KVM-RT where we don't overcommit
> vCPU's and pin vCPU:pCPU 1:1 we need a way from host side with which user can decide
> to enable PV_DEDICATED option. Such that if vCPU's are unlikely going to preempt or
> sleep we should avoid traversing the cpulist in PV TLB code.
>
> So, two things:
>
> 1] A way to configure PV_DEDICATED from host.
Userspace can already configure it through KVM_SET_CPUID2 ioctl
(regardless of KVM version).
> 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
Right,
thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 16:05 ` Radim Krcmar
@ 2017-11-09 16:17 ` Peter Zijlstra
2017-11-09 16:37 ` Pankaj Gupta
2017-11-09 16:45 ` Radim Krcmar
0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2017-11-09 16:17 UTC (permalink / raw)
To: Radim Krcmar
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> 2017-11-09 10:53-0500, Pankaj Gupta:
> > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
>
> Right,
Shouldn't KVM do flush_tlb_other() in any case? Not sure how
PV_DEDICATED can help with that.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 16:17 ` Peter Zijlstra
@ 2017-11-09 16:37 ` Pankaj Gupta
2017-11-09 16:45 ` Radim Krcmar
1 sibling, 0 replies; 27+ messages in thread
From: Pankaj Gupta @ 2017-11-09 16:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Radim Krcmar, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
> Subject: Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
>
> On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > 2] PV TLB should also behave as per option PV_DEDICATED for better
> > > performance.
> >
> > Right,
>
> Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> PV_DEDICATED can help with that.
Yes.
If I understand it correctly, It will just avoid traversing all the
cpus another time just to check/set 'KVM_VCPU_PREEMPTED'/FLUSH value and
clear the cpu mask.
Thanks,
Pankaj
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 16:17 ` Peter Zijlstra
2017-11-09 16:37 ` Pankaj Gupta
@ 2017-11-09 16:45 ` Radim Krcmar
2017-11-09 17:12 ` Peter Zijlstra
1 sibling, 1 reply; 27+ messages in thread
From: Radim Krcmar @ 2017-11-09 16:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-09 17:17+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> >
> > Right,
>
> Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> PV_DEDICATED can help with that.
It will, the suggestion was based on recent extension of the
flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
IPI if the target VCPU is not running. This would be a waste of time
with PV_DEDICATED as all VCPUs are expected to always running.
With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 16:45 ` Radim Krcmar
@ 2017-11-09 17:12 ` Peter Zijlstra
2017-11-09 17:15 ` Peter Zijlstra
2017-11-09 17:31 ` Radim Krcmar
0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2017-11-09 17:12 UTC (permalink / raw)
To: Radim Krcmar
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> 2017-11-09 17:17+0100, Peter Zijlstra:
> > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > >
> > > Right,
> >
> > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > PV_DEDICATED can help with that.
>
> It will, the suggestion was based on recent extension of the
> flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
>
> PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> IPI if the target VCPU is not running. This would be a waste of time
> with PV_DEDICATED as all VCPUs are expected to always running.
>
> With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
Is saving that for_each_cpu() really worth the effort compared to the
cost of actually doing the IPIs and CR3 write?
Also, you should not put cpumask_t on stack, that's 'broken'.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 17:12 ` Peter Zijlstra
@ 2017-11-09 17:15 ` Peter Zijlstra
2017-11-09 17:28 ` Peter Zijlstra
2017-11-10 2:07 ` Wanpeng Li
2017-11-09 17:31 ` Radim Krcmar
1 sibling, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2017-11-09 17:15 UTC (permalink / raw)
To: Radim Krcmar
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > >
> > > > Right,
> > >
> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > PV_DEDICATED can help with that.
> >
> > It will, the suggestion was based on recent extension of the
> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> >
> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > IPI if the target VCPU is not running. This would be a waste of time
> > with PV_DEDICATED as all VCPUs are expected to always running.
> >
> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
>
> Is saving that for_each_cpu() really worth the effort compared to the
> cost of actually doing the IPIs and CR3 write?
>
> Also, you should not put cpumask_t on stack, that's 'broken'.
Also, you'll want to use __cpumask_clear_cpu() there.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 17:15 ` Peter Zijlstra
@ 2017-11-09 17:28 ` Peter Zijlstra
2017-11-09 17:35 ` Radim Krcmar
2017-11-10 2:07 ` Wanpeng Li
1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2017-11-09 17:28 UTC (permalink / raw)
To: Radim Krcmar
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
On Thu, Nov 09, 2017 at 06:15:11PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > >
> > > > > Right,
> > > >
> > > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > > PV_DEDICATED can help with that.
> > >
> > > It will, the suggestion was based on recent extension of the
> > > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > >
> > > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > > IPI if the target VCPU is not running. This would be a waste of time
> > > with PV_DEDICATED as all VCPUs are expected to always running.
> > >
> > > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> >
> > Is saving that for_each_cpu() really worth the effort compared to the
> > cost of actually doing the IPIs and CR3 write?
> >
> > Also, you should not put cpumask_t on stack, that's 'broken'.
>
> Also, you'll want to use __cpumask_clear_cpu() there.
Also^2, that patch split is crazy, after patch 2/3 your machine is
broken due to lost TLB flushes. You have to first add the SHOULD_FLUSH
handling and then clear CPUs from the native_flush_tlb_other() mask.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 17:28 ` Peter Zijlstra
@ 2017-11-09 17:35 ` Radim Krcmar
0 siblings, 0 replies; 27+ messages in thread
From: Radim Krcmar @ 2017-11-09 17:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-09 18:28+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 06:15:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > > On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > > > >
> > > > > > Right,
> > > > >
> > > > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > > > PV_DEDICATED can help with that.
> > > >
> > > > It will, the suggestion was based on recent extension of the
> > > > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> > > >
> > > > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > > > IPI if the target VCPU is not running. This would be a waste of time
> > > > with PV_DEDICATED as all VCPUs are expected to always running.
> > > >
> > > > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
> > >
> > > Is saving that for_each_cpu() really worth the effort compared to the
> > > cost of actually doing the IPIs and CR3 write?
> > >
> > > Also, you should not put cpumask_t on stack, that's 'broken'.
> >
> > Also, you'll want to use __cpumask_clear_cpu() there.
>
> Also^2, that patch split is crazy, after patch 2/3 your machine is
> broken due to lost TLB flushes. You have to first add the SHOULD_FLUSH
> handling and then clear CPUs from the native_flush_tlb_other() mask.
That should be fixed in v2 -- [2/3] must not enable this feature if the
host has not exposed it and [3/3] has to expose it. (The ordering of
those two doesn't matter as they are separate kernel.)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 17:15 ` Peter Zijlstra
2017-11-09 17:28 ` Peter Zijlstra
@ 2017-11-10 2:07 ` Wanpeng Li
2017-11-10 7:59 ` Peter Zijlstra
1 sibling, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2017-11-10 2:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Radim Krcmar, Pankaj Gupta, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-10 1:15 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Nov 09, 2017 at 06:12:41PM +0100, Peter Zijlstra wrote:
>> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
>> > 2017-11-09 17:17+0100, Peter Zijlstra:
>> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
>> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
>> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
>> > > >
>> > > > Right,
>> > >
>> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
>> > > PV_DEDICATED can help with that.
>> >
>> > It will, the suggestion was based on recent extension of the
>> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
>> >
>> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
>> > IPI if the target VCPU is not running. This would be a waste of time
>> > with PV_DEDICATED as all VCPUs are expected to always running.
>> >
>> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
>>
>> Is saving that for_each_cpu() really worth the effort compared to the
>> cost of actually doing the IPIs and CR3 write?
>>
>> Also, you should not put cpumask_t on stack, that's 'broken'.
Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
/* These two declarations are only used in check_irq_vectors_for_cpu_disable()
* below, which is protected by stop_machine(). Putting them on the stack
* results in a stack frame overflow. Dynamically allocating could result in a
* failure so declare these two cpumasks as global.
*/
static struct cpumask affinity_new, online_new;
>
> Also, you'll want to use __cpumask_clear_cpu() there.
Will do.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-10 2:07 ` Wanpeng Li
@ 2017-11-10 7:59 ` Peter Zijlstra
2017-11-10 8:04 ` Wanpeng Li
0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2017-11-10 7:59 UTC (permalink / raw)
To: Wanpeng Li
Cc: Radim Krcmar, Pankaj Gupta, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:
> >> Also, you should not put cpumask_t on stack, that's 'broken'.
>
> Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
>
> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
> * below, which is protected by stop_machine(). Putting them on the stack
> * results in a stack frame overflow. Dynamically allocating could result in a
> * failure so declare these two cpumasks as global.
> */
> static struct cpumask affinity_new, online_new;
That code no longer exists.. Also not entirely sure how it would be
helpful.
What you probably want to do is have a per-cpu cpumask, since
flush_tlb_others() is called with preemption disabled. But you probably
don't want an unconditionally allocated one, since most kernels will not
in fact be PV.
So you'll want something like:
static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
And then you need something like:
for_each_possible_cpu(cpu) {
zalloc_cpumask_var_node(per_cpu_ptr(&__pb_tlb_mask, cpu),
GFP_KERNEL, cpu_to_node(cpu));
}
before you set the pv-op or so.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-10 7:59 ` Peter Zijlstra
@ 2017-11-10 8:04 ` Wanpeng Li
0 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2017-11-10 8:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Radim Krcmar, Pankaj Gupta, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-10 15:59 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Fri, Nov 10, 2017 at 10:07:56AM +0800, Wanpeng Li wrote:
>
>> >> Also, you should not put cpumask_t on stack, that's 'broken'.
>>
>> Thanks pointing out this. I found a useful comments in arch/x86/kernel/irq.c:
>>
>> /* These two declarations are only used in check_irq_vectors_for_cpu_disable()
>> * below, which is protected by stop_machine(). Putting them on the stack
>> * results in a stack frame overflow. Dynamically allocating could result in a
>> * failure so declare these two cpumasks as global.
>> */
>> static struct cpumask affinity_new, online_new;
>
> That code no longer exists.. Also not entirely sure how it would be
> helpful.
>
> What you probably want to do is have a per-cpu cpumask, since
> flush_tlb_others() is called with preemption disabled. But you probably
> don't want an unconditionally allocated one, since most kernels will not
> in fact be PV.
>
> So you'll want something like:
>
> static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
>
> And then you need something like:
>
> for_each_possible_cpu(cpu) {
> zalloc_cpumask_var_node(per_cpu_ptr(&__pb_tlb_mask, cpu),
> GFP_KERNEL, cpu_to_node(cpu));
> }
>
> before you set the pv-op or so.
Thanks Peterz, :) I will have a try.
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 17:12 ` Peter Zijlstra
2017-11-09 17:15 ` Peter Zijlstra
@ 2017-11-09 17:31 ` Radim Krcmar
1 sibling, 0 replies; 27+ messages in thread
From: Radim Krcmar @ 2017-11-09 17:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pankaj Gupta, Wanpeng Li, Eduardo Valentin, Paolo Bonzini,
Matt Wilson, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, the arch/x86 maintainers, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-09 18:12+0100, Peter Zijlstra:
> On Thu, Nov 09, 2017 at 05:45:23PM +0100, Radim Krcmar wrote:
> > 2017-11-09 17:17+0100, Peter Zijlstra:
> > > On Thu, Nov 09, 2017 at 05:05:36PM +0100, Radim Krcmar wrote:
> > > > 2017-11-09 10:53-0500, Pankaj Gupta:
> > > > > 2] PV TLB should also behave as per option PV_DEDICATED for better performance.
> > > >
> > > > Right,
> > >
> > > Shouldn't KVM do flush_tlb_other() in any case? Not sure how
> > > PV_DEDICATED can help with that.
> >
> > It will, the suggestion was based on recent extension of the
> > flush_tlb_others implementaion, https://lkml.org/lkml/2017/11/8/1146.
> >
> > PV_TLB_FLUSH allows a guest to set a flush bit instead of sending flush
> > IPI if the target VCPU is not running. This would be a waste of time
> > with PV_DEDICATED as all VCPUs are expected to always running.
> >
> > With PV_DEDICATED, the guest should keep using native_flush_tlb_others.
>
> Is saving that for_each_cpu() really worth the effort compared to the
> cost of actually doing the IPIs and CR3 write?
It is one line for a few percent (hopefully better for AMD with AVIC).
Still, keeping the decision completely on userspace would be cleaner.
> Also, you should not put cpumask_t on stack, that's 'broken'.
Good catch, thanks.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 12:43 ` Wanpeng Li
2017-11-09 15:53 ` Pankaj Gupta
@ 2017-11-09 16:00 ` Radim Krcmar
2017-11-10 6:07 ` Wanpeng Li
1 sibling, 1 reply; 27+ messages in thread
From: Radim Krcmar @ 2017-11-09 16:00 UTC (permalink / raw)
To: Wanpeng Li
Cc: Eduardo Valentin, Paolo Bonzini, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
the arch/x86 maintainers, Peter Zijlstra, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-09 20:43+0800, Wanpeng Li:
> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
> > Currently, the existing qspinlock implementation will fallback to
> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
> >
> > This patch gives the opportunity to guest kernels to select
> > between test-and-set and the regular queueu fair lock implementation
> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
> > flag is not set, the code will still fall back to test-and-set,
> > but when the PV_DEDICATED flag is set, the code will use
> > the regular queue spinlock implementation.
> >
> > With this patch, when in autoselect mode, the guest will
> > use the default spinlock implementation based on host feature
> > flags as follows:
> >
> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: x86@kernel.org
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: kvm@vger.kernel.org
> > Cc: linux-doc@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
> > Cc: Anthony Liguori <aliguori@amazon.com>
> > Suggested-by: Matt Wilson <msw@amazon.com>
> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
> > ---
>
> You should also add a cpuid flag in kvm part.
It is better without that. The flag has no dependency on KVM (kernel
hypervisor) code.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-09 16:00 ` Radim Krcmar
@ 2017-11-10 6:07 ` Wanpeng Li
2017-11-10 8:19 ` Paolo Bonzini
0 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2017-11-10 6:07 UTC (permalink / raw)
To: Radim Krcmar
Cc: Eduardo Valentin, Paolo Bonzini, Matt Wilson, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
the arch/x86 maintainers, Peter Zijlstra, Waiman Long, kvm,
linux-doc, linux-kernel, Jan H . Schoenherr, Anthony Liguori
2017-11-10 0:00 GMT+08:00 Radim Krcmar <rkrcmar@redhat.com>:
> 2017-11-09 20:43+0800, Wanpeng Li:
>> 2017-11-07 4:26 GMT+08:00 Eduardo Valentin <eduval@amazon.com>:
>> > Currently, the existing qspinlock implementation will fallback to
>> > test-and-set if the hypervisor has not set the PV_UNHALT flag.
>> >
>> > This patch gives the opportunity to guest kernels to select
>> > between test-and-set and the regular queueu fair lock implementation
>> > based on the PV_DEDICATED KVM feature flag. When the PV_DEDICATED
>> > flag is not set, the code will still fall back to test-and-set,
>> > but when the PV_DEDICATED flag is set, the code will use
>> > the regular queue spinlock implementation.
>> >
>> > With this patch, when in autoselect mode, the guest will
>> > use the default spinlock implementation based on host feature
>> > flags as follows:
>> >
>> > PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 1: default is pvqspinlock
>> > PV_DEDICATED = 0, PV_UNHALT = 0: default is tas
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: "Radim Krčmář" <rkrcmar@redhat.com>
>> > Cc: Jonathan Corbet <corbet@lwn.net>
>> > Cc: Thomas Gleixner <tglx@linutronix.de>
>> > Cc: Ingo Molnar <mingo@redhat.com>
>> > Cc: "H. Peter Anvin" <hpa@zytor.com>
>> > Cc: x86@kernel.org
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Waiman Long <longman@redhat.com>
>> > Cc: kvm@vger.kernel.org
>> > Cc: linux-doc@vger.kernel.org
>> > Cc: linux-kernel@vger.kernel.org
>> > Cc: Jan H. Schoenherr <jschoenh@amazon.de>
>> > Cc: Anthony Liguori <aliguori@amazon.com>
>> > Suggested-by: Matt Wilson <msw@amazon.com>
>> > Signed-off-by: Eduardo Valentin <eduval@amazon.com>
>> > ---
>>
>> You should also add a cpuid flag in kvm part.
>
> It is better without that. The flag has no dependency on KVM (kernel
> hypervisor) code.
Do you mean -cpu host, +xxxx,I think it will result in "warning: host
doesn't support requested feature: CPUID.40000001H:eax.XXXXXX"
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCHv3 1/1] locking/qspinlock/x86: Avoid test-and-set when PV_DEDICATED is set
2017-11-10 6:07 ` Wanpeng Li
@ 2017-11-10 8:19 ` Paolo Bonzini
0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2017-11-10 8:19 UTC (permalink / raw)
To: Wanpeng Li, Radim Krcmar
Cc: Eduardo Valentin, Matt Wilson, Jonathan Corbet, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers,
Peter Zijlstra, Waiman Long, kvm, linux-doc, linux-kernel,
Jan H . Schoenherr, Anthony Liguori
On 10/11/2017 07:07, Wanpeng Li wrote:
>>> You should also add a cpuid flag in kvm part.
>> It is better without that. The flag has no dependency on KVM (kernel
>> hypervisor) code.
> Do you mean -cpu host, +xxxx,I think it will result in "warning: host
> doesn't support requested feature: CPUID.40000001H:eax.XXXXXX"
There are some exceptions where QEMU overrides the values of
KVM_GET_SUPPORTED_CPUID.
I think it is better to add the flag to KVM *and* to QEMU's override in
kvm_arch_get_supported_cpuid (target/i386/kvm.c).
Paolo
^ permalink raw reply [flat|nested] 27+ messages in thread