* [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y
@ 2024-03-12 19:39 Sean Christopherson
  2024-03-12 19:39 ` [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h Sean Christopherson
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-03-12 19:39 UTC (permalink / raw)
  To: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon
  Cc: linux-doc, linux-kernel, Sean Christopherson, Valentin Schneider,
	Marco Elver, Frederic Weisbecker, David Matlack, Friedrich Weber,
	Ankur Arora, Thomas Gleixner

Fix a bug in dynamic preemption where the kernel will yield contended
spinlocks (and rwlocks) even if the selected preemption model is "none" or
"voluntary".  I say "bug" because this divergence from PREEMPT_DYNAMIC=n
behavior effectively broke existing KVM configurations, e.g. vCPUs would
get stuck and become unresponsive for multiple seconds if there was heavy
KSM or NUMA balancing activity in the host.

This isn't super urgent, as 6.8 has a fix in KVM for the over-aggressive
yielding (commit d02c357e5bfa ("KVM: x86/mmu: Retry fault before acquiring
mmu_lock if mapping is changing")), but I wouldn't be surprised if the
behavior is causing other performance issues/regressions that are less
severe and/or less visible.
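
To illustrate the mechanism with a hypothetical snippet (the lock, list and
helper names below are made up): the yielding happens via helpers such as
cond_resched_lock() and cond_resched_rwlock_read/write(), which consult
spin_needbreak()/rwlock_needbreak() to decide whether to drop a held lock:

  spin_lock(&big_lock);
  list_for_each_entry(item, &big_list, node) {
	process_item(item);		/* hypothetical per-entry work */
	/*
	 * Drops and reacquires the lock if spin_needbreak() reports
	 * contention or a reschedule is pending.  With PREEMPT_DYNAMIC=y
	 * and preempt=none/voluntary, this currently yields far more often
	 * than a statically built PREEMPT_NONE kernel would.
	 */
	cond_resched_lock(&big_lock);
  }
  spin_unlock(&big_lock);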

v2:
 - Rebase onto Linus' tree to deal with the code movement to spinlock.h.
 - Opportunistically document the behavior.
 - Add the PREEMPT_AUTO folks to Cc to get their eyeballs/input.

v1: https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com

Sean Christopherson (2):
  sched/core: Move preempt_model_*() helpers from sched.h to preempt.h
  sched/core: Drop spinlocks on contention iff kernel is preemptible

 .../admin-guide/kernel-parameters.txt         |  4 +-
 include/linux/preempt.h                       | 41 +++++++++++++++++++
 include/linux/sched.h                         | 41 -------------------
 include/linux/spinlock.h                      | 14 +++----
 4 files changed, 50 insertions(+), 50 deletions(-)


base-commit: b29f377119f68b942369a9366bdcb1fec82b2cda
-- 
2.44.0.278.ge034bb2e1d-goog



* [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h
  2024-03-12 19:39 [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y Sean Christopherson
@ 2024-03-12 19:39 ` Sean Christopherson
  2024-04-25  6:19   ` Ankur Arora
  2024-03-12 19:39 ` [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible Sean Christopherson
  2024-04-24 20:08 ` [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y Sean Christopherson
  2 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2024-03-12 19:39 UTC (permalink / raw)
  To: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon
  Cc: linux-doc, linux-kernel, Sean Christopherson, Valentin Schneider,
	Marco Elver, Frederic Weisbecker, David Matlack, Friedrich Weber,
	Ankur Arora, Thomas Gleixner

Move the declarations and inlined implementations of the preempt_model_*()
helpers to preempt.h so that they can be referenced in spinlock.h without
creating a potential circular dependency between spinlock.h and sched.h.

No functional change intended.
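
For context, a rough illustration of why preempt.h is the right home
(spinlock.h already includes preempt.h, so no new includes are needed): with
the helpers moved, the next patch can simply do the below without spinlock.h
having to pull in sched.h:

  static inline int spin_needbreak(spinlock_t *lock)
  {
	if (!preempt_model_preemptible())
		return 0;

	return spin_is_contended(lock);
  }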

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/preempt.h | 41 +++++++++++++++++++++++++++++++++++++++++
 include/linux/sched.h   | 41 -----------------------------------------
 2 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 7233e9cf1bab..ce76f1a45722 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -481,4 +481,45 @@ DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
 DEFINE_LOCK_GUARD_0(preempt_notrace, preempt_disable_notrace(), preempt_enable_notrace())
 DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+
+extern bool preempt_model_none(void);
+extern bool preempt_model_voluntary(void);
+extern bool preempt_model_full(void);
+
+#else
+
+static inline bool preempt_model_none(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_NONE);
+}
+static inline bool preempt_model_voluntary(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY);
+}
+static inline bool preempt_model_full(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT);
+}
+
+#endif
+
+static inline bool preempt_model_rt(void)
+{
+	return IS_ENABLED(CONFIG_PREEMPT_RT);
+}
+
+/*
+ * Does the preemption model allow non-cooperative preemption?
+ *
+ * For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with
+ * CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the
+ * kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the
+ * PREEMPT_NONE model.
+ */
+static inline bool preempt_model_preemptible(void)
+{
+	return preempt_model_full() || preempt_model_rt();
+}
+
 #endif /* __LINUX_PREEMPT_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 17cb0761ff65..e9dc10f7a463 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2058,47 +2058,6 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
 	__cond_resched_rwlock_write(lock);					\
 })
 
-#ifdef CONFIG_PREEMPT_DYNAMIC
-
-extern bool preempt_model_none(void);
-extern bool preempt_model_voluntary(void);
-extern bool preempt_model_full(void);
-
-#else
-
-static inline bool preempt_model_none(void)
-{
-	return IS_ENABLED(CONFIG_PREEMPT_NONE);
-}
-static inline bool preempt_model_voluntary(void)
-{
-	return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY);
-}
-static inline bool preempt_model_full(void)
-{
-	return IS_ENABLED(CONFIG_PREEMPT);
-}
-
-#endif
-
-static inline bool preempt_model_rt(void)
-{
-	return IS_ENABLED(CONFIG_PREEMPT_RT);
-}
-
-/*
- * Does the preemption model allow non-cooperative preemption?
- *
- * For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with
- * CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the
- * kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the
- * PREEMPT_NONE model.
- */
-static inline bool preempt_model_preemptible(void)
-{
-	return preempt_model_full() || preempt_model_rt();
-}
-
 static __always_inline bool need_resched(void)
 {
 	return unlikely(tif_need_resched());
-- 
2.44.0.278.ge034bb2e1d-goog



* [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible
  2024-03-12 19:39 [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y Sean Christopherson
  2024-03-12 19:39 ` [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h Sean Christopherson
@ 2024-03-12 19:39 ` Sean Christopherson
  2024-04-25  6:18   ` Ankur Arora
  2024-04-25  7:41   ` Chen Yu
  2024-04-24 20:08 ` [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y Sean Christopherson
  2 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-03-12 19:39 UTC (permalink / raw)
  To: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon
  Cc: linux-doc, linux-kernel, Sean Christopherson, Valentin Schneider,
	Marco Elver, Frederic Weisbecker, David Matlack, Friedrich Weber,
	Ankur Arora, Thomas Gleixner

Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary".  In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.

Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes.  E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read).  For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.

But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.

In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.

Opportunistically document this side effect of preempt=full, as yielding
contended spinlocks can have significant, user-visible impact.
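
As a rough summary of the resulting decision logic (illustrative only, nothing
beyond what the diff below implements):

  preempt=none or preempt=voluntary (PREEMPT_DYNAMIC=y)  -> spin_needbreak() == 0
  preempt=full, static CONFIG_PREEMPT=y, or PREEMPT_RT=y -> spin_needbreak() == spin_is_contended()

and likewise for rwlock_needbreak().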

Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Marco Elver <elver@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Friedrich Weber <f.weber@proxmox.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  4 +++-
 include/linux/spinlock.h                        | 14 ++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 825398d66c69..fdeddb066439 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4689,7 +4689,9 @@
 			none - Limited to cond_resched() calls
 			voluntary - Limited to cond_resched() and might_sleep() calls
 			full - Any section that isn't explicitly preempt disabled
-			       can be preempted anytime.
+			       can be preempted anytime.  Tasks will also yield
+			       contended spinlocks (if the critical section isn't
+			       explicitly preempt disabled beyond the lock itself).
 
 	print-fatal-signals=
 			[KNL] debug: print fatal signals
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 3fcd20de6ca8..63dd8cf3c3c2 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
  */
 static inline int spin_needbreak(spinlock_t *lock)
 {
-#ifdef CONFIG_PREEMPTION
+	if (!preempt_model_preemptible())
+		return 0;
+
 	return spin_is_contended(lock);
-#else
-	return 0;
-#endif
 }
 
 /*
@@ -479,11 +478,10 @@ static inline int spin_needbreak(spinlock_t *lock)
  */
 static inline int rwlock_needbreak(rwlock_t *lock)
 {
-#ifdef CONFIG_PREEMPTION
+	if (!preempt_model_preemptible())
+		return 0;
+
 	return rwlock_is_contended(lock);
-#else
-	return 0;
-#endif
 }
 
 /*
-- 
2.44.0.278.ge034bb2e1d-goog



* Re: [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y
  2024-03-12 19:39 [PATCH v2 0/2] sched/core: Fix spinlocks vs. PREEMPT_DYNAMIC=y Sean Christopherson
  2024-03-12 19:39 ` [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h Sean Christopherson
  2024-03-12 19:39 ` [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible Sean Christopherson
@ 2024-04-24 20:08 ` Sean Christopherson
  2 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-04-24 20:08 UTC (permalink / raw)
  To: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner

On Tue, Mar 12, 2024, Sean Christopherson wrote:
> Fix a bug in dynamic preemption where the kernel will yield contended
> spinlocks (and rwlocks) even if the selected preemption model is "none" or
> "voluntary".  I say "bug" because this divergence from PREEMPT_DYNAMIC=n
> behavior effectively broke existing KVM configurations, e.g. vCPUs would
> get stuck and become unresponsive for multiple seconds if there was heavy
> KSM or NUMA balancing activity in the host.
> 
> This isn't super urgent, as 6.8 has a fix in KVM for the over-aggressive
> yielding (commit d02c357e5bfa ("KVM: x86/mmu: Retry fault before acquiring
> mmu_lock if mapping is changing")), but I wouldn't be surprised if the
> behavior is causing other performance issues/regressions that are less
> severe and/or less visible.

Anyone have any thoughts on how to move this forward?  I have a hard time
believing no one has opinions on this code :-)

> v2:
>  - Rebase onto Linus' tree to deal with the code movement to spinlock.h.
>  - Opportunistically document the behavior.
>  - Add the PREEMPT_AUTO folks to Cc to get their eyeballs/input.
> 
> v1: https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
> 
> Sean Christopherson (2):
>   sched/core: Move preempt_model_*() helpers from sched.h to preempt.h
>   sched/core: Drop spinlocks on contention iff kernel is preemptible
> 
>  .../admin-guide/kernel-parameters.txt         |  4 +-
>  include/linux/preempt.h                       | 41 +++++++++++++++++++
>  include/linux/sched.h                         | 41 -------------------
>  include/linux/spinlock.h                      | 14 +++----
>  4 files changed, 50 insertions(+), 50 deletions(-)
> 
> 
> base-commit: b29f377119f68b942369a9366bdcb1fec82b2cda
> -- 
> 2.44.0.278.ge034bb2e1d-goog
> 


* Re: [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible
  2024-03-12 19:39 ` [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible Sean Christopherson
@ 2024-04-25  6:18   ` Ankur Arora
  2024-04-25  7:41   ` Chen Yu
  1 sibling, 0 replies; 9+ messages in thread
From: Ankur Arora @ 2024-04-25  6:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner


Sean Christopherson <seanjc@google.com> writes:

> Use preempt_model_preemptible() to detect a preemptible kernel when
> deciding whether or not to reschedule in order to drop a contended
> spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
> built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
> preemption model is "none" or "voluntary".  In short, make kernels with
> dynamically selected models behave the same as kernels with statically
> selected models.

Agreed. This behaviour makes sense. Should also be useful for PREEMPT_AUTO.

The only thing that gives me pause is that now there is an extra
call+ret even when we don't yield the lock.

But maybe that could be addressed separately by converting
preempt_model_* to use a static key or similar.
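
E.g. something along these lines (purely a sketch, nothing below exists today
and the key name/wiring is made up):

  DECLARE_STATIC_KEY_FALSE(sk_preempt_model_full);

  static __always_inline bool preempt_model_preemptible(void)
  {
	/* Key flipped whenever the dynamic preemption model is switched. */
	return static_branch_unlikely(&sk_preempt_model_full) ||
	       IS_ENABLED(CONFIG_PREEMPT_RT);
  }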

> Somewhat counter-intuitively, NOT yielding a lock can provide better
> latency for the relevant tasks/processes.  E.g. KVM x86's mmu_lock, a
> rwlock, is often contended between an invalidation event (takes mmu_lock
> for write) and a vCPU servicing a guest page fault (takes mmu_lock for
> read).  For _some_ setups, letting the invalidation task complete even
> if there is mmu_lock contention provides lower latency for *all* tasks,
> i.e. the invalidation completes sooner *and* the vCPU services the guest
> page fault sooner.
>
> But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
> can vary depending on the host VMM, the guest workload, the number of
> vCPUs, the number of pCPUs in the host, why there is lock contention, etc.
>
> In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
> opposite and removing contention yielding entirely) needs to come with a
> big pile of data proving that changing the status quo is a net positive.
>
> Opportunistically document this side effect of preempt=full, as yielding
> contended spinlocks can have significant, user-visible impact.
>
> Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
> Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Friedrich Weber <f.weber@proxmox.com>
> Cc: Ankur Arora <ankur.a.arora@oracle.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>

> ---
>  Documentation/admin-guide/kernel-parameters.txt |  4 +++-
>  include/linux/spinlock.h                        | 14 ++++++--------
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 825398d66c69..fdeddb066439 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4689,7 +4689,9 @@
>  			none - Limited to cond_resched() calls
>  			voluntary - Limited to cond_resched() and might_sleep() calls
>  			full - Any section that isn't explicitly preempt disabled
> -			       can be preempted anytime.
> +			       can be preempted anytime.  Tasks will also yield
> +			       contended spinlocks (if the critical section isn't
> +			       explicitly preempt disabled beyond the lock itself).

This seems to read a bit better:

+			       can be preempted anytime.  Tasks will also yield
+			       contended spinlocks (unless the critical section is
+			       explicitly preempt disabled beyond the lock itself).


Ankur

>  	print-fatal-signals=
>  			[KNL] debug: print fatal signals
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 3fcd20de6ca8..63dd8cf3c3c2 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
>   */
>  static inline int spin_needbreak(spinlock_t *lock)
>  {
> -#ifdef CONFIG_PREEMPTION
> +	if (!preempt_model_preemptible())
> +		return 0;
> +
>  	return spin_is_contended(lock);
> -#else
> -	return 0;
> -#endif
>  }
>
>  /*
> @@ -479,11 +478,10 @@ static inline int spin_needbreak(spinlock_t *lock)
>   */
>  static inline int rwlock_needbreak(rwlock_t *lock)
>  {
> -#ifdef CONFIG_PREEMPTION
> +	if (!preempt_model_preemptible())
> +		return 0;
> +
>  	return rwlock_is_contended(lock);
> -#else
> -	return 0;
> -#endif
>  }
>
>  /*


* Re: [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h
  2024-03-12 19:39 ` [PATCH v2 1/2] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h Sean Christopherson
@ 2024-04-25  6:19   ` Ankur Arora
  0 siblings, 0 replies; 9+ messages in thread
From: Ankur Arora @ 2024-04-25  6:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner


Sean Christopherson <seanjc@google.com> writes:

> Move the declarations and inlined implementations of the preempt_model_*()
> helpers to preempt.h so that they can be referenced in spinlock.h without
> creating a potential circular dependency between spinlock.h and sched.h.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>

Ankur

> ---
>  include/linux/preempt.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/sched.h   | 41 -----------------------------------------
>  2 files changed, 41 insertions(+), 41 deletions(-)
>
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index 7233e9cf1bab..ce76f1a45722 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -481,4 +481,45 @@ DEFINE_LOCK_GUARD_0(preempt, preempt_disable(), preempt_enable())
>  DEFINE_LOCK_GUARD_0(preempt_notrace, preempt_disable_notrace(), preempt_enable_notrace())
>  DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
>
> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +
> +extern bool preempt_model_none(void);
> +extern bool preempt_model_voluntary(void);
> +extern bool preempt_model_full(void);
> +
> +#else
> +
> +static inline bool preempt_model_none(void)
> +{
> +	return IS_ENABLED(CONFIG_PREEMPT_NONE);
> +}
> +static inline bool preempt_model_voluntary(void)
> +{
> +	return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY);
> +}
> +static inline bool preempt_model_full(void)
> +{
> +	return IS_ENABLED(CONFIG_PREEMPT);
> +}
> +
> +#endif
> +
> +static inline bool preempt_model_rt(void)
> +{
> +	return IS_ENABLED(CONFIG_PREEMPT_RT);
> +}
> +
> +/*
> + * Does the preemption model allow non-cooperative preemption?
> + *
> + * For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with
> + * CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the
> + * kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the
> + * PREEMPT_NONE model.
> + */
> +static inline bool preempt_model_preemptible(void)
> +{
> +	return preempt_model_full() || preempt_model_rt();
> +}
> +
>  #endif /* __LINUX_PREEMPT_H */
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 17cb0761ff65..e9dc10f7a463 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2058,47 +2058,6 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
>  	__cond_resched_rwlock_write(lock);					\
>  })
>
> -#ifdef CONFIG_PREEMPT_DYNAMIC
> -
> -extern bool preempt_model_none(void);
> -extern bool preempt_model_voluntary(void);
> -extern bool preempt_model_full(void);
> -
> -#else
> -
> -static inline bool preempt_model_none(void)
> -{
> -	return IS_ENABLED(CONFIG_PREEMPT_NONE);
> -}
> -static inline bool preempt_model_voluntary(void)
> -{
> -	return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY);
> -}
> -static inline bool preempt_model_full(void)
> -{
> -	return IS_ENABLED(CONFIG_PREEMPT);
> -}
> -
> -#endif
> -
> -static inline bool preempt_model_rt(void)
> -{
> -	return IS_ENABLED(CONFIG_PREEMPT_RT);
> -}
> -
> -/*
> - * Does the preemption model allow non-cooperative preemption?
> - *
> - * For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with
> - * CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the
> - * kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the
> - * PREEMPT_NONE model.
> - */
> -static inline bool preempt_model_preemptible(void)
> -{
> -	return preempt_model_full() || preempt_model_rt();
> -}
> -
>  static __always_inline bool need_resched(void)
>  {
>  	return unlikely(tif_need_resched());


* Re: [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible
  2024-03-12 19:39 ` [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible Sean Christopherson
  2024-04-25  6:18   ` Ankur Arora
@ 2024-04-25  7:41   ` Chen Yu
  2024-04-25 16:47     ` Sean Christopherson
  1 sibling, 1 reply; 9+ messages in thread
From: Chen Yu @ 2024-04-25  7:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner

Hi Sean,

On 2024-03-12 at 12:39:11 -0700, Sean Christopherson wrote:
> Use preempt_model_preemptible() to detect a preemptible kernel when
> deciding whether or not to reschedule in order to drop a contended
> spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels

It took me a while to figure out why PREEMPT_DYNAMIC selecting PREEMPTION
in Kconfig matters here.  I assume you mean that the kernel is built with
CONFIG_PREEMPTION, but the live preemption model is "none" or "voluntary",
which makes the static check of CONFIG_PREEMPTION in spin_needbreak() and
rwlock_needbreak() invalid?

> built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
> preemption model is "none" or "voluntary".


> In short, make kernels with
> dynamically selected models behave the same as kernels with statically
> selected models.
> 
> Somewhat counter-intuitively, NOT yielding a lock can provide better
> latency for the relevant tasks/processes.  E.g. KVM x86's mmu_lock, a
> rwlock, is often contended between an invalidation event (takes mmu_lock
> for write) and a vCPU servicing a guest page fault (takes mmu_lock for
> read).  For _some_ setups, letting the invalidation task complete even
> if there is mmu_lock contention provides lower latency for *all* tasks,
> i.e. the invalidation completes sooner *and* the vCPU services the guest
> page fault sooner.
> 
> But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
> can vary depending on the host VMM, the guest workload, the number of
> vCPUs, the number of pCPUs in the host, why there is lock contention, etc.
> 
> In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
> opposite and removing contention yielding entirely) needs to come with a
> big pile of data proving that changing the status quo is a net positive.
> 
> Opportunistically document this side effect of preempt=full, as yielding
> contended spinlocks can have significant, user-visible impact.
> 
> Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
> Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@proxmox.com
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Friedrich Weber <f.weber@proxmox.com>
> Cc: Ankur Arora <ankur.a.arora@oracle.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  4 +++-
>  include/linux/spinlock.h                        | 14 ++++++--------
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 825398d66c69..fdeddb066439 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4689,7 +4689,9 @@
>  			none - Limited to cond_resched() calls
>  			voluntary - Limited to cond_resched() and might_sleep() calls
>  			full - Any section that isn't explicitly preempt disabled
> -			       can be preempted anytime.
> +			       can be preempted anytime.  Tasks will also yield
> +			       contended spinlocks (if the critical section isn't
> +			       explicitly preempt disabled beyond the lock itself).
>  
>  	print-fatal-signals=
>  			[KNL] debug: print fatal signals
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 3fcd20de6ca8..63dd8cf3c3c2 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
>   */
>  static inline int spin_needbreak(spinlock_t *lock)
>  {
> -#ifdef CONFIG_PREEMPTION
> +	if (!preempt_model_preemptible())

The old version checks against static CONFIG_PREEMPTION, now we check
the live CONFIG_PREEMPTION and static CONFIG_PREEMPT_RT, just wonder
if the rt check is needed here?

thanks,
Chenyu 



* Re: [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible
  2024-04-25  7:41   ` Chen Yu
@ 2024-04-25 16:47     ` Sean Christopherson
  2024-04-26  3:41       ` Chen Yu
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2024-04-25 16:47 UTC (permalink / raw)
  To: Chen Yu
  Cc: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner

On Thu, Apr 25, 2024, Chen Yu wrote:
> Hi Sean,
> 
> On 2024-03-12 at 12:39:11 -0700, Sean Christopherson wrote:
> > Use preempt_model_preemptible() to detect a preemptible kernel when
> > deciding whether or not to reschedule in order to drop a contended
> > spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
> 
> It took me a while to figure out why PREEMPT_DYNAMIC selecting PREEMPTION
> in Kconfig matters here.  I assume you mean that the kernel is built with
> CONFIG_PREEMPTION, but the live preemption model is "none" or "voluntary",
> which makes the static check of CONFIG_PREEMPTION in spin_needbreak() and
> rwlock_needbreak() invalid?

Yep, exactly.

> > diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> > index 3fcd20de6ca8..63dd8cf3c3c2 100644
> > --- a/include/linux/spinlock.h
> > +++ b/include/linux/spinlock.h
> > @@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
> >   */
> >  static inline int spin_needbreak(spinlock_t *lock)
> >  {
> > -#ifdef CONFIG_PREEMPTION
> > +	if (!preempt_model_preemptible())
> 
> The old version checks against static CONFIG_PREEMPTION, now we check
> the live CONFIG_PREEMPTION and static CONFIG_PREEMPT_RT, just wonder
> if the rt check is needed here?

It's required, as CONFIG_PREEMPT_RT=y doesn't imply CONFIG_PREEMPT, and
CONFIG_PREEMPT_RT=y is mutually exclusive with CONFIG_PREEMPT_DYNAMIC.  I.e. a
CONFIG_PREEMPT_RT=y kernel will look like:

  CONFIG_PREEMPT_RT=y
  CONFIG_PREEMPT_DYNAMIC=n
  CONFIG_PREEMPT=n

which in turn generates:

  static inline bool preempt_model_full(void)
  {
	return IS_ENABLED(CONFIG_PREEMPT);
  }

and so just checking preempt_model_full() would incorrectly return false for
CONFIG_PREEMPT_RT=y.
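
For completeness, that's why the helper added in patch 1 explicitly ORs in the
RT check:

  static inline bool preempt_model_preemptible(void)
  {
	return preempt_model_full() || preempt_model_rt();
  }

where preempt_model_rt() is simply IS_ENABLED(CONFIG_PREEMPT_RT).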


* Re: [PATCH v2 2/2] sched/core: Drop spinlocks on contention iff kernel is preemptible
  2024-04-25 16:47     ` Sean Christopherson
@ 2024-04-26  3:41       ` Chen Yu
  0 siblings, 0 replies; 9+ messages in thread
From: Chen Yu @ 2024-04-26  3:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Jonathan Corbet, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Will Deacon, linux-doc, linux-kernel,
	Valentin Schneider, Marco Elver, Frederic Weisbecker,
	David Matlack, Friedrich Weber, Ankur Arora, Thomas Gleixner

On 2024-04-25 at 09:47:52 -0700, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Chen Yu wrote:
> > Hi Sean,
> > 
> > On 2024-03-12 at 12:39:11 -0700, Sean Christopherson wrote:
> > > Use preempt_model_preemptible() to detect a preemptible kernel when
> > > deciding whether or not to reschedule in order to drop a contended
> > > spinlock or rwlock.  Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
> > 
> > It took me a while to figure out why PREEMPT_DYNAMIC selecting PREEMPTION
> > in Kconfig matters here.  I assume you mean that the kernel is built with
> > CONFIG_PREEMPTION, but the live preemption model is "none" or "voluntary",
> > which makes the static check of CONFIG_PREEMPTION in spin_needbreak() and
> > rwlock_needbreak() invalid?
> 
> Yep, exactly.
> 
> > > diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> > > index 3fcd20de6ca8..63dd8cf3c3c2 100644
> > > --- a/include/linux/spinlock.h
> > > +++ b/include/linux/spinlock.h
> > > @@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
> > >   */
> > >  static inline int spin_needbreak(spinlock_t *lock)
> > >  {
> > > -#ifdef CONFIG_PREEMPTION
> > > +	if (!preempt_model_preemptible())
> > 
> > The old version checks against static CONFIG_PREEMPTION, now we check
> > the live CONFIG_PREEMPTION and static CONFIG_PREEMPT_RT, just wonder
> > if the rt check is needed here?
> 
> It's required, as CONFIG_PREEMPT_RT=y doesn't imply CONFIG_PREEMPT, and
> CONFIG_PREEMPT_RT=y is mutually exclusive with CONFIG_PREEMPT_DYNAMIC.  I.e. a
> CONFIG_PREEMPT_RT=y kernel will look yield:
> 
>   CONFIG_PREEMPT_RT=y
>   CONFIG_PREEMPT_DYNAMIC=n
>   CONFIG_PREEMPT=n
> 
> which in turn generates:
> 
>   static inline bool preempt_model_full(void)
>   {
> 	return IS_ENABLED(CONFIG_PREEMPT);
>   }
> 
> and so just checking preempt_model_full() would incorrectly return false for
> CONFIG_PREEMPT_RT=y.

You are right, I misunderstood the definition of preempt_model_full().  With
that cleared up, for this patch:

Reviewed-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Chenyu

