linux-kernel.vger.kernel.org archive mirror
* [PATCH tip/core/rcu 0/15]
@ 2020-09-28 23:30 Paul E. McKenney
  2020-09-28 23:30 ` [PATCH tip/core/rcu 01/15] lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies paulmck
                   ` (14 more replies)
  0 siblings, 15 replies; 30+ messages in thread
From: Paul E. McKenney @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds

Hello!

This is a repost of Thomas Gleixner's "Make preempt count unconditional"
series [1], with the addition of a kvfree_rcu() bug-fix patch that makes
use of the now-unconditional PREEMPT_COUNT.  This series is intended for
the upcoming v5.10 merge window.

This addition fixes the -rt bug reported by Sebastian Andrzej Siewior
[2].  Please note that with the advent of the new lockdep Kconfig
option CONFIG_PROVE_RAW_LOCK_NESTING, this is now also a mainline bug.
In happy contrast to the surprisingly large number of earlier versions
of this fix, this version uses only pre-existing kernel interfaces,
and furthermore uses them in conventional ways.

1-13.	Thomas's series [1].

14.	Use the newly accurate preemptible() macro to cause kvfree_rcu()
	to do its allocations only when it is safe to do so, even in
	CONFIG_PREEMPT_NONE=y kernels, courtesy of Uladzislau Rezki.
	Again, this fixes the bug reported by Sebastian [2].

15.	Checkpatch fix removing NULL check guarding kfree(), courtesy
	of kernel test robot and Julia Lawall.

In addition, this series reduces the size of the kernel by 100 lines
of code, about two thirds of which is from Thomas's original series and
the remaining one third from the bug fix.

Changes from v1 [1]:

o	Fix trivial !SMP build failure.
o	Apply checkpatch spelling suggestions.

							Thanx, Paul

[1]	https://lore.kernel.org/linux-mm/20200914204209.256266093@linutronix.de/
[2]	https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/

------------------------------------------------------------------------

 arch/arm/include/asm/assembler.h                                 |   11 -
 arch/arm/kernel/iwmmxt.S                                         |    2 
 arch/arm/mach-ep93xx/crunch-bits.S                               |    2 
 arch/xtensa/kernel/entry.S                                       |    2 
 drivers/gpu/drm/i915/Kconfig.debug                               |    1 
 drivers/gpu/drm/i915/i915_utils.h                                |    3 
 include/linux/bit_spinlock.h                                     |    4 
 include/linux/lockdep.h                                          |    6 
 include/linux/pagemap.h                                          |    4 
 include/linux/preempt.h                                          |   37 -----
 include/linux/uaccess.h                                          |    6 
 kernel/Kconfig.preempt                                           |    6 
 kernel/rcu/tree.c                                                |   73 ++--------
 kernel/sched/core.c                                              |    6 
 lib/Kconfig.debug                                                |    5 
 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t            |    1 
 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u            |    1 
 tools/testing/selftests/rcutorture/configs/rcu/TINY01            |    1 
 tools/testing/selftests/rcutorture/doc/TINY_RCU.txt              |    5 
 tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt      |    1 
 tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h |    1 
 21 files changed, 39 insertions(+), 139 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 01/15] lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 02/15] preempt: Make preempt count unconditional paulmck
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

ARCH_NO_PREEMPT disables the selection of CONFIG_PREEMPT_VOLUNTARY and
CONFIG_PREEMPT, but architectures which set this config option still
support preempt count for hard and softirq accounting.

There is absolutely no reason to prevent lockdep from using the preempt
counter nor is there a reason to prevent the enablement of
CONFIG_DEBUG_ATOMIC_SLEEP on such architectures.

Remove the dependencies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 lib/Kconfig.debug | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e068c3c..f50fbcf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1161,7 +1161,7 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
-	select PREEMPT_COUNT if !ARCH_NO_PREEMPT
+	select PREEMPT_COUNT
 	select TRACE_IRQFLAGS
 	default n
 	help
@@ -1323,7 +1323,6 @@ config DEBUG_ATOMIC_SLEEP
 	bool "Sleep inside atomic section checking"
 	select PREEMPT_COUNT
 	depends on DEBUG_KERNEL
-	depends on !ARCH_NO_PREEMPT
 	help
 	  If you say Y here, various routines which may sleep will become very
 	  noisy if they are called inside atomic sections: when a spinlock is
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 02/15] preempt: Make preempt count unconditional
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
  2020-09-28 23:30 ` [PATCH tip/core/rcu 01/15] lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 03/15] preempt: Cleanup PREEMPT_COUNT leftovers paulmck
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

The handling of preempt_count() is inconsistent across kernel
configurations. On kernels built with PREEMPT_COUNT=n,
preempt_disable/enable() and the lock/unlock functions do not affect
the preempt count; only local_bh_disable/enable() and the _bh variants of
locking, soft interrupt delivery, hard interrupt and NMI context affect it.
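
For illustration (a sketch only, not part of the patch; the lock name is
made up), consider what does and does not reach the preempt count on such
a kernel:

	static DEFINE_RAW_SPINLOCK(my_lock);	/* hypothetical lock */

	preempt_disable();		/* expands to barrier(); count unchanged */
	raw_spin_lock(&my_lock);	/* likewise leaves the count untouched */
	/*
	 * At this point in_atomic() is still false and preemptible() is
	 * hard-wired to 0, so no context check can see the two calls above.
	 */
	local_bh_disable();		/* this one does add SOFTIRQ_DISABLE_OFFSET */
	local_bh_enable();
	raw_spin_unlock(&my_lock);
	preempt_enable();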

It's therefore impossible to have a consistent set of checks which provide
information about the context in which a function is called. In many cases
it makes sense to have separate functions for separate contexts, but there
are valid reasons to avoid that and handle different calling contexts
conditionally.

The lack of such indicators that work on all kernel configurations is a
constant source of trouble, because developers either do not understand the
implications or try to work around this inconsistency in weird ways.
Nor are these issues reliably caught by reviewers and testing.

Recently merged code does:

	 gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;

This looks obviously correct, except for the fact that preemptible() is
unconditionally false for CONFIG_PREEMPT_COUNT=n, i.e. all allocations in
that code use GFP_ATOMIC on such kernels.
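
The underlying reason is visible in the (slightly simplified) pre-series
definition from include/linux/preempt.h:

	#ifdef CONFIG_PREEMPT_COUNT
	# define preemptible()	(preempt_count() == 0 && !irqs_disabled())
	#else
	# define preemptible()	0	/* the GFP_KERNEL branch above is dead code */
	#endif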

Attempts to make preempt count unconditional and consistent have been
rejected in the past with handwaving performance arguments.

Freshly conducted benchmarks did not reveal any measurable impact from
enabling preempt count unconditionally. On kernels with CONFIG_PREEMPT_NONE
or CONFIG_PREEMPT_VOLUNTARY the preempt count is only incremented and
decremented but the result of the decrement is not tested.  In contrast,
enabling CONFIG_PREEMPT, which does test the result, has a small but
measurable impact due to the conditional branch/call.
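
For reference, a simplified sketch of preempt_enable() (adapted from
include/linux/preempt.h) shows where that conditional branch/call comes
from; only CONFIG_PREEMPTION tests the result of the decrement:

	#ifdef CONFIG_PREEMPTION
	#define preempt_enable() \
	do { \
		barrier(); \
		if (unlikely(preempt_count_dec_and_test())) \
			__preempt_schedule(); \
	} while (0)
	#else	/* PREEMPT_NONE / PREEMPT_VOLUNTARY: decrement only, no test */
	#define preempt_enable() \
	do { \
		barrier(); \
		preempt_count_dec(); \
	} while (0)
	#endif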

It's about time to make essential functionality of the kernel consistent
across the various preemption models.

Enable CONFIG_PREEMPT_COUNT unconditionally.  Follow-up changes will remove
the #ifdeffery and remove the config option at the end.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/Kconfig.preempt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index bf82259..3f4712f 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -75,8 +75,7 @@ config PREEMPT_RT
 endchoice
 
 config PREEMPT_COUNT
-       bool
+       def_bool y
 
 config PREEMPTION
        bool
-       select PREEMPT_COUNT
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 03/15] preempt: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
  2020-09-28 23:30 ` [PATCH tip/core/rcu 01/15] lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 02/15] preempt: Make preempt count unconditional paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 04/15] lockdep: " paulmck
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/preempt.h | 37 ++++---------------------------------
 1 file changed, 4 insertions(+), 33 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 7d9c1c0..513769b 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -56,8 +56,7 @@
 #define PREEMPT_DISABLED	(PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
 
 /*
- * Disable preemption until the scheduler is running -- use an unconditional
- * value so that it also works on !PREEMPT_COUNT kernels.
+ * Disable preemption until the scheduler is running.
  *
  * Reset by start_kernel()->sched_init()->init_idle()->init_idle_preempt_count().
  */
@@ -69,7 +68,6 @@
  *
  *    preempt_count() == 2*PREEMPT_DISABLE_OFFSET
  *
- * Note: PREEMPT_DISABLE_OFFSET is 0 for !PREEMPT_COUNT kernels.
  * Note: See finish_task_switch().
  */
 #define FORK_PREEMPT_COUNT	(2*PREEMPT_DISABLE_OFFSET + PREEMPT_ENABLED)
@@ -106,11 +104,7 @@
 /*
  * The preempt_count offset after preempt_disable();
  */
-#if defined(CONFIG_PREEMPT_COUNT)
-# define PREEMPT_DISABLE_OFFSET	PREEMPT_OFFSET
-#else
-# define PREEMPT_DISABLE_OFFSET	0
-#endif
+#define PREEMPT_DISABLE_OFFSET	PREEMPT_OFFSET
 
 /*
  * The preempt_count offset after spin_lock()
@@ -122,8 +116,8 @@
  *
  *  spin_lock_bh()
  *
- * Which need to disable both preemption (CONFIG_PREEMPT_COUNT) and
- * softirqs, such that unlock sequences of:
+ * Which need to disable both preemption and softirqs, such that unlock
+ * sequences of:
  *
  *  spin_unlock();
  *  local_bh_enable();
@@ -164,8 +158,6 @@ extern void preempt_count_sub(int val);
 #define preempt_count_inc() preempt_count_add(1)
 #define preempt_count_dec() preempt_count_sub(1)
 
-#ifdef CONFIG_PREEMPT_COUNT
-
 #define preempt_disable() \
 do { \
 	preempt_count_inc(); \
@@ -231,27 +223,6 @@ do { \
 	__preempt_count_dec(); \
 } while (0)
 
-#else /* !CONFIG_PREEMPT_COUNT */
-
-/*
- * Even if we don't have any preemption, we need preempt disable/enable
- * to be barriers, so that we don't have things like get_user/put_user
- * that can cause faults and scheduling migrate into our preempt-protected
- * region.
- */
-#define preempt_disable()			barrier()
-#define sched_preempt_enable_no_resched()	barrier()
-#define preempt_enable_no_resched()		barrier()
-#define preempt_enable()			barrier()
-#define preempt_check_resched()			do { } while (0)
-
-#define preempt_disable_notrace()		barrier()
-#define preempt_enable_no_resched_notrace()	barrier()
-#define preempt_enable_notrace()		barrier()
-#define preemptible()				0
-
-#endif /* CONFIG_PREEMPT_COUNT */
-
 #ifdef MODULE
 /*
  * Modules have no business playing preemption tricks.
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 04/15] lockdep: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (2 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 03/15] preempt: Cleanup PREEMPT_COUNT leftovers paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 05/15] mm/pagemap: " paulmck
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Will Deacon, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/lockdep.h | 6 ++----
 lib/Kconfig.debug       | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6a584b3..879de69 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -560,16 +560,14 @@ do {									\
 
 #define lockdep_assert_preemption_enabled()				\
 do {									\
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
-		     debug_locks			&&		\
+	WARN_ON_ONCE(debug_locks			&&		\
 		     (preempt_count() != 0		||		\
 		      !raw_cpu_read(hardirqs_enabled)));		\
 } while (0)
 
 #define lockdep_assert_preemption_disabled()				\
 do {									\
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
-		     debug_locks			&&		\
+	WARN_ON_ONCE(debug_locks			&&		\
 		     (preempt_count() == 0		&&		\
 		      raw_cpu_read(hardirqs_enabled)));			\
 } while (0)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f50fbcf..d4d0574 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1161,7 +1161,6 @@ config PROVE_LOCKING
 	select DEBUG_RWSEMS
 	select DEBUG_WW_MUTEX_SLOWPATH
 	select DEBUG_LOCK_ALLOC
-	select PREEMPT_COUNT
 	select TRACE_IRQFLAGS
 	default n
 	help
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 05/15] mm/pagemap: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (3 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 04/15] lockdep: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 06/15] locking/bitspinlock: " paulmck
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	linux-mm, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
[ paulmck: Fix !SMP build error per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/pagemap.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 7de11dc..b3d9d92 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -168,9 +168,7 @@ void release_pages(struct page **pages, int nr);
 static inline int __page_cache_add_speculative(struct page *page, int count)
 {
 #ifdef CONFIG_TINY_RCU
-# ifdef CONFIG_PREEMPT_COUNT
-	VM_BUG_ON(!in_atomic() && !irqs_disabled());
-# endif
+	VM_BUG_ON(preemptible());
 	/*
 	 * Preempt must be disabled here - we rely on rcu_read_lock doing
 	 * this for us.
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 06/15] locking/bitspinlock: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (4 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 05/15] mm/pagemap: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 07/15] uaccess: " paulmck
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/bit_spinlock.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/bit_spinlock.h b/include/linux/bit_spinlock.h
index bbc4730..1e03d54 100644
--- a/include/linux/bit_spinlock.h
+++ b/include/linux/bit_spinlock.h
@@ -90,10 +90,8 @@ static inline int bit_spin_is_locked(int bitnum, unsigned long *addr)
 {
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
 	return test_bit(bitnum, addr);
-#elif defined CONFIG_PREEMPT_COUNT
-	return preempt_count();
 #else
-	return 1;
+	return preempt_count();
 #endif
 }
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 07/15] uaccess: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (5 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 06/15] locking/bitspinlock: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 08/15] sched: " paulmck
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/uaccess.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 94b2854..b79a5a9 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -230,9 +230,9 @@ static inline bool pagefault_disabled(void)
  *
  * This function should only be used by the fault handlers. Other users should
  * stick to pagefault_disabled().
- * Please NEVER use preempt_disable() to disable the fault handler. With
- * !CONFIG_PREEMPT_COUNT, this is like a NOP. So the handler won't be disabled.
- * in_atomic() will report different values based on !CONFIG_PREEMPT_COUNT.
+ *
+ * Please NEVER use preempt_disable() or local_irq_disable() to disable the
+ * fault handler.
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 08/15] sched: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (6 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 07/15] uaccess: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 09/15] ARM: " paulmck
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/sched/core.c | 6 +-----
 lib/Kconfig.debug   | 1 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d95dc3..1c304a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3706,8 +3706,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	 * finish_task_switch() for details.
 	 *
 	 * finish_task_switch() will drop rq->lock() and lower preempt_count
-	 * and the preempt_enable() will end up enabling preemption (on
-	 * PREEMPT_COUNT kernels).
+	 * and the preempt_enable() will end up enabling preemption.
 	 */
 
 	rq = finish_task_switch(prev);
@@ -7308,9 +7307,6 @@ void __cant_sleep(const char *file, int line, int preempt_offset)
 	if (irqs_disabled())
 		return;
 
-	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
-		return;
-
 	if (preempt_count() > preempt_offset)
 		return;
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d4d0574..52af6ad 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1320,7 +1320,6 @@ config DEBUG_LOCKDEP
 
 config DEBUG_ATOMIC_SLEEP
 	bool "Sleep inside atomic section checking"
-	select PREEMPT_COUNT
 	depends on DEBUG_KERNEL
 	help
 	  If you say Y here, various routines which may sleep will become very
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 09/15] ARM: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (7 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 08/15] sched: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 10/15] xtensa: " paulmck
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Russell King, linux-arm-kernel, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/arm/include/asm/assembler.h   | 11 -----------
 arch/arm/kernel/iwmmxt.S           |  2 --
 arch/arm/mach-ep93xx/crunch-bits.S |  2 --
 3 files changed, 15 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index feac2c8..fce52eed 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -212,7 +212,6 @@
 /*
  * Increment/decrement the preempt count.
  */
-#ifdef CONFIG_PREEMPT_COUNT
 	.macro	inc_preempt_count, ti, tmp
 	ldr	\tmp, [\ti, #TI_PREEMPT]	@ get preempt count
 	add	\tmp, \tmp, #1			@ increment it
@@ -229,16 +228,6 @@
 	get_thread_info \ti
 	dec_preempt_count \ti, \tmp
 	.endm
-#else
-	.macro	inc_preempt_count, ti, tmp
-	.endm
-
-	.macro	dec_preempt_count, ti, tmp
-	.endm
-
-	.macro	dec_preempt_count_ti, ti, tmp
-	.endm
-#endif
 
 #define USERL(l, x...)				\
 9999:	x;					\
diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S
index 0dcae78..1f845be 100644
--- a/arch/arm/kernel/iwmmxt.S
+++ b/arch/arm/kernel/iwmmxt.S
@@ -94,9 +94,7 @@ ENTRY(iwmmxt_task_enable)
 	mov	r2, r2				@ cpwait
 	bl	concan_save
 
-#ifdef CONFIG_PREEMPT_COUNT
 	get_thread_info r10
-#endif
 4:	dec_preempt_count r10, r3
 	ret	r9				@ normal exit from exception
 
diff --git a/arch/arm/mach-ep93xx/crunch-bits.S b/arch/arm/mach-ep93xx/crunch-bits.S
index fb2dbf7..0aabcf4 100644
--- a/arch/arm/mach-ep93xx/crunch-bits.S
+++ b/arch/arm/mach-ep93xx/crunch-bits.S
@@ -191,9 +191,7 @@ crunch_load:
 	cfldr64		mvdx15, [r0, #CRUNCH_MVDX15]
 
 1:
-#ifdef CONFIG_PREEMPT_COUNT
 	get_thread_info r10
-#endif
 2:	dec_preempt_count r10, r3
 	ret	lr
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 10/15] xtensa: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (8 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 09/15] ARM: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:30 ` [PATCH tip/core/rcu 11/15] drm/i915: " paulmck
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Chris Zankel, Max Filippov, linux-xtensa, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-xtensa@linux-xtensa.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/xtensa/kernel/entry.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/xtensa/kernel/entry.S b/arch/xtensa/kernel/entry.S
index 703cf62..5a27dd3 100644
--- a/arch/xtensa/kernel/entry.S
+++ b/arch/xtensa/kernel/entry.S
@@ -819,7 +819,7 @@ ENTRY(debug_exception)
 	 * preemption if we have HW breakpoints to preserve DEBUGCAUSE.DBNUM
 	 * meaning.
 	 */
-#if defined(CONFIG_PREEMPT_COUNT) && defined(CONFIG_HAVE_HW_BREAKPOINT)
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
 	GET_THREAD_INFO(a2, a1)
 	l32i	a3, a2, TI_PRE_COUNT
 	addi	a3, a3, 1
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 11/15] drm/i915: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (9 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 10/15] xtensa: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-10-01  7:17   ` Joonas Lahtinen
  2020-09-28 23:30 ` [PATCH tip/core/rcu 12/15] rcutorture: " paulmck
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	Daniel Vetter, intel-gfx, dri-devel, Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 drivers/gpu/drm/i915/Kconfig.debug | 1 -
 drivers/gpu/drm/i915/i915_utils.h  | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 1cb28c2..17d9b00 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -20,7 +20,6 @@ config DRM_I915_DEBUG
 	bool "Enable additional driver debugging"
 	depends on DRM_I915
 	select DEBUG_FS
-	select PREEMPT_COUNT
 	select I2C_CHARDEV
 	select STACKDEPOT
 	select DRM_DP_AUX_CHARDEV
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 5477337..ecfed86 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -337,8 +337,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 						   (Wmax))
 #define wait_for(COND, MS)		_wait_for((COND), (MS) * 1000, 10, 1000)
 
-/* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#ifdef CONFIG_DRM_I915_DEBUG
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
 #else
 # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 12/15] rcutorture: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (10 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 11/15] drm/i915: " paulmck
@ 2020-09-28 23:30 ` paulmck
  2020-09-28 23:31 ` [PATCH tip/core/rcu 13/15] preempt: Remove PREEMPT_COUNT from Kconfig paulmck
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E. McKenney, Shuah Khan, linux-kselftest

From: Thomas Gleixner <tglx@linutronix.de>

CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Clean up the leftovers before doing so.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: rcu@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t            | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u            | 1 -
 tools/testing/selftests/rcutorture/configs/rcu/TINY01            | 1 -
 tools/testing/selftests/rcutorture/doc/TINY_RCU.txt              | 5 ++---
 tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt      | 1 -
 tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h | 1 -
 6 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
index 6c78022..553cf65 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
 CONFIG_DEBUG_ATOMIC_SLEEP=y
-#CHECK#CONFIG_PREEMPT_COUNT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
index c15ada8..99563da 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
 CONFIG_DEBUG_LOCK_ALLOC=y
 CONFIG_PROVE_LOCKING=y
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TINY01 b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
index 6db705e..9b22b8e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TINY01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
@@ -10,4 +10,3 @@ CONFIG_RCU_TRACE=n
 #CHECK#CONFIG_RCU_STALL_COMMON=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
index a75b169..d30cedf 100644
--- a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
+++ b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
@@ -3,11 +3,10 @@ This document gives a brief rationale for the TINY_RCU test cases.
 
 Kconfig Parameters:
 
-CONFIG_DEBUG_LOCK_ALLOC -- Do all three and none of the three.
-CONFIG_PREEMPT_COUNT
+CONFIG_DEBUG_LOCK_ALLOC -- Do both and none of the two.
 CONFIG_RCU_TRACE
 
-The theory here is that randconfig testing will hit the other six possible
+The theory here is that randconfig testing will hit the other two possible
 combinations of these parameters.
 
 
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 1b96d68..cfdd48f 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -43,7 +43,6 @@ CONFIG_64BIT
 
 	Used only to check CONFIG_RCU_FANOUT value, inspection suffices.
 
-CONFIG_PREEMPT_COUNT
 CONFIG_PREEMPT_RCU
 
 	Redundant with CONFIG_PREEMPT, ignore.
diff --git a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
index 283d710..d0d485d 100644
--- a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
+++ b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
@@ -8,7 +8,6 @@
 #undef CONFIG_HOTPLUG_CPU
 #undef CONFIG_MODULES
 #undef CONFIG_NO_HZ_FULL_SYSIDLE
-#undef CONFIG_PREEMPT_COUNT
 #undef CONFIG_PREEMPT_RCU
 #undef CONFIG_PROVE_RCU
 #undef CONFIG_RCU_NOCB_CPU
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 13/15] preempt: Remove PREEMPT_COUNT from Kconfig
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (11 preceding siblings ...)
  2020-09-28 23:30 ` [PATCH tip/core/rcu 12/15] rcutorture: " paulmck
@ 2020-09-28 23:31 ` paulmck
  2020-09-28 23:31 ` [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible paulmck
  2020-09-28 23:31 ` [PATCH tip/core/rcu 15/15] kvfree_rcu(): Fix ifnullfree.cocci warnings paulmck
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:31 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Paul E . McKenney

From: Thomas Gleixner <tglx@linutronix.de>

All conditionals and irritations are gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/Kconfig.preempt | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 3f4712f..120b63f 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -74,8 +74,5 @@ config PREEMPT_RT
 
 endchoice
 
-config PREEMPT_COUNT
-       def_bool y
-
 config PREEMPTION
        bool
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (12 preceding siblings ...)
  2020-09-28 23:31 ` [PATCH tip/core/rcu 13/15] preempt: Remove PREEMPT_COUNT from Kconfig paulmck
@ 2020-09-28 23:31 ` paulmck
  2020-09-29 12:07   ` Michal Hocko
  2020-09-28 23:31 ` [PATCH tip/core/rcu 15/15] kvfree_rcu(): Fix ifnullfree.cocci warnings paulmck
  14 siblings, 1 reply; 30+ messages in thread
From: paulmck @ 2020-09-28 23:31 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

The current memory-allocation interface poses the following challenges:

a)	In kernels built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep
	complains ("BUG: Invalid wait context").  This complaint is due
	to the memory allocator acquiring non-raw spinlocks while a raw
	spinlock is held.  This problem can also arise if kvfree_rcu()
	is invoked while holding a raw spinlock.

b)	In -rt kernels built with CONFIG_PREEMPT_RT, the situation
	described in (a) above results in an attempt to acquire a
	sleeplock while holding a spinlock, which is of course forbidden.
	This can lead to "BUG: scheduling while atomic".

c)	Please note that call_rcu() is invoked from raw atomic context,
	so kfree_rcu() and kvfree_rcu() are therefore also expected to be
	callable from raw atomic context.

However given that CONFIG_PREEMPT_COUNT is unconditionally enabled
by the earlier commits in this series, the preemptible() macro now
properly detects preempt-disable code regions even in kernels built
with CONFIG_PREEMPT_NONE.

This commit therefore uses preemptible() to determine whether allocation
is possible at all for double-argument kvfree_rcu().  If !preemptible(),
then allocation is not possible, and kvfree_rcu() falls back to using
the less cache-friendly rcu_head approach.  Even when preemptible(),
the caller might be involved in reclaim, so the GFP_ flags used by
double-argument kvfree_rcu() must avoid invoking reclaim processing.

Note that single-argument kvfree_rcu() must be invoked in sleepable
contexts, and that its fallback is the relatively high latency
synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
memory allocator.

Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/
Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 70 ++++++++++++++++---------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9..cc998d7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3166,7 +3166,7 @@ static void kfree_rcu_work(struct work_struct *work)
 			krc_this_cpu_unlock(krcp, flags);
 
 			if (bkvhead[i])
-				free_page((unsigned long) bkvhead[i]);
+				kfree(bkvhead[i]);
 
 			cond_resched_tasks_rcu_qs();
 		}
@@ -3291,43 +3291,28 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 
 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_sleep)
 {
 	struct kvfree_rcu_bulk_data *bnode;
+	bool can_alloc_page = preemptible();
+	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
 	int idx;
 
-	if (unlikely(!krcp->initialized))
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
 		return false;
 
-	lockdep_assert_held(&krcp->lock);
 	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bkvhead[idx] ||
-			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(krcp);
-		if (!bnode) {
-			/*
-			 * To keep this path working on raw non-preemptible
-			 * sections, prevent the optional entry into the
-			 * allocator as it uses sleeping locks. In fact, even
-			 * if the caller of kfree_rcu() is preemptible, this
-			 * path still is not, as krcp->lock is a raw spinlock.
-			 * With additional page pre-allocation in the works,
-			 * hitting this return is going to be much less likely.
-			 */
-			if (IS_ENABLED(CONFIG_PREEMPT_RT))
-				return false;
-
-			/*
-			 * NOTE: For one argument of kvfree_rcu() we can
-			 * drop the lock and get the page in sleepable
-			 * context. That would allow to maintain an array
-			 * for the CONFIG_PREEMPT_RT as well if no cached
-			 * pages are available.
-			 */
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+	if (!(*krcp)->bkvhead[idx] ||
+			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc_page) {
+			krc_this_cpu_unlock(*krcp, *flags);
+			bnode = kmalloc(PAGE_SIZE, gfp);
+			*krcp = krc_this_cpu_lock(flags);
 		}
 
 		/* Switch to emergency path. */
@@ -3336,15 +3321,15 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bkvhead[idx];
+		bnode->next = (*krcp)->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bkvhead[idx] = bnode;
+		(*krcp)->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bkvhead[idx]->records
-		[krcp->bkvhead[idx]->nr_records++] = ptr;
+	(*krcp)->bkvhead[idx]->records
+		[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
@@ -3382,24 +3367,20 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		ptr = (unsigned long *) func;
 	}
 
-	krcp = krc_this_cpu_lock(&flags);
-
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
 
-		// Mark as success and leave.
-		success = true;
-		goto unlock_return;
+		return;
 	}
 
 	/*
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
 	if (!success) {
 		if (head == NULL)
 			// Inline if kvfree_rcu(one_arg) call.
@@ -4394,23 +4375,12 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
-		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-
-			if (bnode)
-				put_cached_bnode(krcp, bnode);
-			else
-				pr_err("Failed to preallocate for %d CPU!\n", cpu);
-		}
-
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH tip/core/rcu 15/15] kvfree_rcu(): Fix ifnullfree.cocci warnings
  2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
                   ` (13 preceding siblings ...)
  2020-09-28 23:31 ` [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible paulmck
@ 2020-09-28 23:31 ` paulmck
  14 siblings, 0 replies; 30+ messages in thread
From: paulmck @ 2020-09-28 23:31 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	kernel test robot, Julia Lawall, Paul E . McKenney

From: kernel test robot <lkp@intel.com>

NULL check before kfree is not needed.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Fixes: 9f7e887e648c ("rcu/tree: Allocate a page when caller is preemptible")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc998d7..4a85ea2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3165,8 +3165,7 @@ static void kfree_rcu_work(struct work_struct *work)
 				bkvhead[i] = NULL;
 			krc_this_cpu_unlock(krcp, flags);
 
-			if (bkvhead[i])
-				kfree(bkvhead[i]);
+			kfree(bkvhead[i]);
 
 			cond_resched_tasks_rcu_qs();
 		}
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-28 23:31 ` [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible paulmck
@ 2020-09-29 12:07   ` Michal Hocko
  2020-09-30  1:53     ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-09-29 12:07 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Mon 28-09-20 16:31:01, paulmck@kernel.org wrote:
[...]
> This commit therefore uses preemptible() to determine whether allocation
> is possible at all for double-argument kvfree_rcu().

This deserves a comment. Because GFP_ATOMIC is possible for many
!preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
others that are a problem. You are taking a conservative approach which
is fine but it would be good to articulate that explicitly.

> If !preemptible(),
> then allocation is not possible, and kvfree_rcu() falls back to using
> the less cache-friendly rcu_head approach.  Even when preemptible(),
> the caller might be involved in reclaim, so the GFP_ flags used by
> double-argument kvfree_rcu() must avoid invoking reclaim processing.

Could you be more specific? Is this about being called directly in the
reclaim context and you want to prevent a recursion? If that is the
case, do you really need to special case this in any way? Any memory
reclaim will set PF_MEMALLOC so allocations called from that context
will not perform reclaim. So if you are called from the reclaim directly
then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN.
That should handle both the from-the-reclaim and outside-of-reclaim contexts
just fine (assuming you don't allocate from a !preemptible() context).
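
I.e. (sketch only) something along the lines of:

	/*
	 * PF_MEMALLOC is already set when running inside direct reclaim, so
	 * this allocation will not recurse into reclaim; __GFP_NOMEMALLOC
	 * additionally keeps it away from the memory reserves.
	 */
	bnode = kmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN);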

> Note that single-argument kvfree_rcu() must be invoked in sleepable
> contexts, and that its fallback is the relatively high latency
> synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
> GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
> memory allocator.
[...]
>  static inline bool
> -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> +	unsigned long *flags, void *ptr, bool can_sleep)
>  {
>  	struct kvfree_rcu_bulk_data *bnode;
> +	bool can_alloc_page = preemptible();
> +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;

This is quite confusing IMHO. At least without a further explanation.
can_sleep is not as much about sleeping as it is about the reclaim
recursion AFAIU your changelog, right?

>  	int idx;
>  
> -	if (unlikely(!krcp->initialized))
> +	*krcp = krc_this_cpu_lock(flags);
> +	if (unlikely(!(*krcp)->initialized))
>  		return false;
>  
> -	lockdep_assert_held(&krcp->lock);
>  	idx = !!is_vmalloc_addr(ptr);
>  
>  	/* Check if a new block is required. */
> -	if (!krcp->bkvhead[idx] ||
> -			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> -		bnode = get_cached_bnode(krcp);
> -		if (!bnode) {
> -			/*
> -			 * To keep this path working on raw non-preemptible
> -			 * sections, prevent the optional entry into the
> -			 * allocator as it uses sleeping locks. In fact, even
> -			 * if the caller of kfree_rcu() is preemptible, this
> -			 * path still is not, as krcp->lock is a raw spinlock.
> -			 * With additional page pre-allocation in the works,
> -			 * hitting this return is going to be much less likely.
> -			 */
> -			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -				return false;
> -
> -			/*
> -			 * NOTE: For one argument of kvfree_rcu() we can
> -			 * drop the lock and get the page in sleepable
> -			 * context. That would allow to maintain an array
> -			 * for the CONFIG_PREEMPT_RT as well if no cached
> -			 * pages are available.
> -			 */
> -			bnode = (struct kvfree_rcu_bulk_data *)
> -				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> +	if (!(*krcp)->bkvhead[idx] ||
> +			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> +		bnode = get_cached_bnode(*krcp);
> +		if (!bnode && can_alloc_page) {
> +			krc_this_cpu_unlock(*krcp, *flags);
> +			bnode = kmalloc(PAGE_SIZE, gfp);

What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
using the page allocator directly be better?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-29 12:07   ` Michal Hocko
@ 2020-09-30  1:53     ` Paul E. McKenney
  2020-09-30  8:41       ` Michal Hocko
  0 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2020-09-30  1:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> On Mon 28-09-20 16:31:01, paulmck@kernel.org wrote:
> [...]

Apologies for the delay, but today has not been boring.

> > This commit therefore uses preemptible() to determine whether allocation
> > is possible at all for double-argument kvfree_rcu().
> 
> This deserves a comment. Because GFP_ATOMIC is possible for many
> !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> others that are a problem. You are taking a conservative approach which
> is fine but it would be good to articulate that explicitly.

Good point, and so I have added the following as a header comment to
the add_ptr_to_bulk_krc_lock() function:

// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
// state specified by flags.  If can_sleep is true, the caller must
// be schedulable and not be holding any locks or mutexes that might be
// acquired by the memory allocator or anything that it might invoke.
// If !can_sleep, then if !preemptible() no allocation will be undertaken,
// otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
// the aforementioned deadlock possibilities.  Returns true iff ptr was
// successfully recorded, else the caller must use a fallback.

The updated patch is shown below.

Of course, to Vlastimil's point, lockless access to the allocator
would significantly reduce the level of confusion posed by this code.
But that is a separate issue at the moment.

> > If !preemptible(),
> > then allocation is not possible, and kvfree_rcu() falls back to using
> > the less cache-friendly rcu_head approach.  Even when preemptible(),
> > the caller might be involved in reclaim, so the GFP_ flags used by
> > double-argument kvfree_rcu() must avoid invoking reclaim processing.
> 
> Could you be more specific? Is this about being called directly in the
> reclaim context and you want to prevent a recursion? If that is the
> case, do you really need to special case this in any way? Any memory
> reclaim will set PF_MEMALLOC so allocations called from that context
> will not perform reclaim. So if you are called from the reclaim directly
> then you might want to do GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN.
> That should handle both the from-the-reclaim and outside-of-reclaim contexts
> just fine (assuming you don't allocate from a !preemptible() context).

My thought was to point you at Daniel Vetter, but you already got in
touch with him, so thank you for that!

> > Note that single-argument kvfree_rcu() must be invoked in sleepable
> > contexts, and that its fallback is the relatively high latency
> > synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
> > GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
> > memory allocator.
> [...]
> >  static inline bool
> > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > +	unsigned long *flags, void *ptr, bool can_sleep)
> >  {
> >  	struct kvfree_rcu_bulk_data *bnode;
> > +	bool can_alloc_page = preemptible();
> > +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
> 
> This is quite confusing IMHO. At least without a further explanation.
> can_sleep is not as much about sleeping as it is about the reclaim
> recursion AFAIU your changelog, right?

No argument on it being confusing, and I hope that the added header
comment helps.  But specifically, can_sleep==true is a promise by the
caller to be schedulable and not to be holding any lock/mutex/whatever
that might possibly be acquired by the memory allocator or by anything
else that the memory allocator might invoke, to your point, including
for but one example the reclaim logic.

The only way that can_sleep==true is if this function was invoked due
to a call to single-argument kvfree_rcu(), which must be schedulable
because its fallback is to invoke synchronize_rcu().
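
To make the two cases concrete (illustration only; the struct and the
pointer names below are made up), the callers look like this:

	struct foo {
		struct rcu_head rh;
		int data;
	};

	/*
	 * Double-argument form: legal in atomic context.  If no page can
	 * be allocated, the object is queued via p->rh instead.
	 */
	kvfree_rcu(p, rh);

	/*
	 * Single-argument (headless) form: the caller must be able to
	 * sleep, because the fallback is a direct synchronize_rcu().
	 */
	kvfree_rcu(q);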

> >  	int idx;
> >  
> > -	if (unlikely(!krcp->initialized))
> > +	*krcp = krc_this_cpu_lock(flags);
> > +	if (unlikely(!(*krcp)->initialized))
> >  		return false;
> >  
> > -	lockdep_assert_held(&krcp->lock);
> >  	idx = !!is_vmalloc_addr(ptr);
> >  
> >  	/* Check if a new block is required. */
> > -	if (!krcp->bkvhead[idx] ||
> > -			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > -		bnode = get_cached_bnode(krcp);
> > -		if (!bnode) {
> > -			/*
> > -			 * To keep this path working on raw non-preemptible
> > -			 * sections, prevent the optional entry into the
> > -			 * allocator as it uses sleeping locks. In fact, even
> > -			 * if the caller of kfree_rcu() is preemptible, this
> > -			 * path still is not, as krcp->lock is a raw spinlock.
> > -			 * With additional page pre-allocation in the works,
> > -			 * hitting this return is going to be much less likely.
> > -			 */
> > -			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > -				return false;
> > -
> > -			/*
> > -			 * NOTE: For one argument of kvfree_rcu() we can
> > -			 * drop the lock and get the page in sleepable
> > -			 * context. That would allow to maintain an array
> > -			 * for the CONFIG_PREEMPT_RT as well if no cached
> > -			 * pages are available.
> > -			 */
> > -			bnode = (struct kvfree_rcu_bulk_data *)
> > -				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > +	if (!(*krcp)->bkvhead[idx] ||
> > +			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > +		bnode = get_cached_bnode(*krcp);
> > +		if (!bnode && can_alloc_page) {
> > +			krc_this_cpu_unlock(*krcp, *flags);
> > +			bnode = kmalloc(PAGE_SIZE, gfp);
> 
> What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> using the page allocator directly be better?

Well, you guys gave me considerable heat about abusing internal allocator
interfaces, and kmalloc() and kfree() seem to be about as non-internal
as you can get and still be invoking the allocator.  ;-)

Again, please see below for the updated patch.  The only change from
the earlier version is the addition of the header comment mentioned above.

							Thanx, Paul

------------------------------------------------------------------------

commit 3a8e10ea2a2ecf59a33d7d80ad6c4372c0c27f78
Author: Uladzislau Rezki (Sony) <urezki@gmail.com>
Date:   Tue Sep 22 21:06:22 2020 +0200

    rcu/tree: Allocate a page when caller is preemptible
    
    The current memory-allocation interface poses the following challenges:
    
    a)      In kernels built with CONFIG_PROVE_RAW_LOCK_NESTING, lockdep
            complains ("BUG: Invalid wait context").  This complaint is due
            to the memory allocator acquiring non-raw spinlocks while a raw
            spinlock is held.  This problem can also arise if kvfree_rcu()
            is invoked while holding a raw spinlock.
    
    b)      In -rt kernels built with CONFIG_PREEMPT_RT, the situation
            described in (a) above results in an attempt to acquire a
            sleeplock while holding a spinlock, which is of course forbidden.
            This can lead to "BUG: scheduling while atomic".
    
    c)      Please note that call_rcu() is invoked from raw atomic context,
            so kfree_rcu() and kvfree_rcu() are therefore expected to be
            callable from raw atomic context as well.
    
    However given that CONFIG_PREEMPT_COUNT is unconditionally enabled
    by the earlier commits in this series, the preemptible() macro now
    properly detects preempt-disable code regions even in kernels built
    with CONFIG_PREEMPT_NONE.
    
    This commit therefore uses preemptible() to determine whether allocation
    is possible at all for double-argument kvfree_rcu().  If !preemptible(),
    then allocation is not possible, and kvfree_rcu() falls back to using
    the less cache-friendly rcu_head approach.  Even when preemptible(),
    the caller might be involved in reclaim, so the GFP_ flags used by
    double-argument kvfree_rcu() must avoid invoking reclaim processing.
    
    Note that single-argument kvfree_rcu() must be invoked in sleepable
    contexts, and that its fallback is the relatively high latency
    synchronize_rcu().  Single-argument kvfree_rcu() therefore uses
    GFP_KERNEL|__GFP_RETRY_MAYFAIL to allow limited sleeping within the
    memory allocator.
    
    Link: https://lore.kernel.org/lkml/20200630164543.4mdcf6zb4zfclhln@linutronix.de/
    Fixes: 3042f83f19be ("rcu: Support reclaim for head-less object")
    Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    [ paulmck: Add add_ptr_to_bulk_krc_lock header comment per Michal Hocko. ]
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9..39ac930 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3166,7 +3166,7 @@ static void kfree_rcu_work(struct work_struct *work)
 			krc_this_cpu_unlock(krcp, flags);
 
 			if (bkvhead[i])
-				free_page((unsigned long) bkvhead[i]);
+				kfree(bkvhead[i]);
 
 			cond_resched_tasks_rcu_qs();
 		}
@@ -3290,44 +3290,37 @@ static void kfree_rcu_monitor(struct work_struct *work)
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
+// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
+// state specified by flags.  If can_sleep is true, the caller must
+// be schedulable and not be holding any locks or mutexes that might be
+// acquired by the memory allocator or anything that it might invoke.
+// If !can_sleep, then if !preemptible() no allocation will be undertaken,
+// otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
+// the aforementioned deadlock possibilities.  Returns true iff ptr was
+// successfully recorded, else the caller must use a fallback.
 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_sleep)
 {
 	struct kvfree_rcu_bulk_data *bnode;
+	bool can_alloc_page = preemptible();
+	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
 	int idx;
 
-	if (unlikely(!krcp->initialized))
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
 		return false;
 
-	lockdep_assert_held(&krcp->lock);
 	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bkvhead[idx] ||
-			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(krcp);
-		if (!bnode) {
-			/*
-			 * To keep this path working on raw non-preemptible
-			 * sections, prevent the optional entry into the
-			 * allocator as it uses sleeping locks. In fact, even
-			 * if the caller of kfree_rcu() is preemptible, this
-			 * path still is not, as krcp->lock is a raw spinlock.
-			 * With additional page pre-allocation in the works,
-			 * hitting this return is going to be much less likely.
-			 */
-			if (IS_ENABLED(CONFIG_PREEMPT_RT))
-				return false;
-
-			/*
-			 * NOTE: For one argument of kvfree_rcu() we can
-			 * drop the lock and get the page in sleepable
-			 * context. That would allow to maintain an array
-			 * for the CONFIG_PREEMPT_RT as well if no cached
-			 * pages are available.
-			 */
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+	if (!(*krcp)->bkvhead[idx] ||
+			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc_page) {
+			krc_this_cpu_unlock(*krcp, *flags);
+			bnode = kmalloc(PAGE_SIZE, gfp);
+			*krcp = krc_this_cpu_lock(flags);
 		}
 
 		/* Switch to emergency path. */
@@ -3336,15 +3329,15 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bkvhead[idx];
+		bnode->next = (*krcp)->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bkvhead[idx] = bnode;
+		(*krcp)->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bkvhead[idx]->records
-		[krcp->bkvhead[idx]->nr_records++] = ptr;
+	(*krcp)->bkvhead[idx]->records
+		[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
@@ -3382,24 +3375,20 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		ptr = (unsigned long *) func;
 	}
 
-	krcp = krc_this_cpu_lock(&flags);
-
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
 
-		// Mark as success and leave.
-		success = true;
-		goto unlock_return;
+		return;
 	}
 
 	/*
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
 	if (!success) {
 		if (head == NULL)
 			// Inline if kvfree_rcu(one_arg) call.
@@ -4394,23 +4383,12 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
-		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-
-			if (bnode)
-				put_cached_bnode(krcp, bnode);
-			else
-				pr_err("Failed to preallocate for %d CPU!\n", cpu);
-		}
-
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-30  1:53     ` Paul E. McKenney
@ 2020-09-30  8:41       ` Michal Hocko
  2020-09-30 12:31         ` Uladzislau Rezki
  2020-09-30 23:21         ` Paul E. McKenney
  0 siblings, 2 replies; 30+ messages in thread
From: Michal Hocko @ 2020-09-30  8:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> > On Mon 28-09-20 16:31:01, paulmck@kernel.org wrote:
> > [...]
> 
> Apologies for the delay, but today has not been boring.
> 
> > > This commit therefore uses preemptible() to determine whether allocation
> > > is possible at all for double-argument kvfree_rcu().
> > 
> > This deserves a comment. Because GFP_ATOMIC is possible for many
> > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> > others that are a problem. You are taking a conservative approach which
> > is fine but it would be good to articulate that explicitly.
> 
> Good point, and so I have added the following as a header comment to
> the add_ptr_to_bulk_krc_lock() function:
> 
> // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
> // state specified by flags.  If can_sleep is true, the caller must
> // be schedulable and not be holding any locks or mutexes that might be
> // acquired by the memory allocator or anything that it might invoke.
> // If !can_sleep, then if !preemptible() no allocation will be undertaken,
> // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
> // the aforementioned deadlock possibilities.  Returns true iff ptr was
> // successfully recorded, else the caller must use a fallback.

OK, not trivial to follow but at least verbose enough to understand the
intention after some mulling. Definitely an improvement, thanks!

[...]
> > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > > +	unsigned long *flags, void *ptr, bool can_sleep)
> > >  {
> > >  	struct kvfree_rcu_bulk_data *bnode;
> > > +	bool can_alloc_page = preemptible();
> > > +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
> > 
> > This is quite confusing IMHO. At least without a further explanation.
> > can_sleep is not as much about sleeping as it is about the reclaim
> > recursion AFAIU your changelog, right?
> 
> No argument on it being confusing, and I hope that the added header
> comment helps.  But specifically, can_sleep==true is a promise by the
> caller to be schedulable and not to be holding any lock/mutex/whatever
> that might possibly be acquired by the memory allocator or by anything
> else that the memory allocator might invoke, to your point, including
> for but one example the reclaim logic.
> 
> The only way that can_sleep==true is if this function was invoked due
> to a call to single-argument kvfree_rcu(), which must be schedulable
> because its fallback is to invoke synchronize_rcu().

OK. I have to say that it is still not clear to me whether this call
path can be called from the memory reclaim context. If yes then you need
__GFP_NOMEMALLOC as well.

[...]

> > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > using the page allocator directly be better?
> 
> Well, you guys gave me considerable heat about abusing internal allocator
> interfaces, and kmalloc() and kfree() seem to be about as non-internal
> as you can get and still be invoking the allocator.  ;-)

alloc_pages resp. __get_free_pages is a normal page allocator interface
to use for page size granular allocations. kmalloc is for more fine
grained allocations.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-30  8:41       ` Michal Hocko
@ 2020-09-30 12:31         ` Uladzislau Rezki
  2020-09-30 23:21         ` Paul E. McKenney
  1 sibling, 0 replies; 30+ messages in thread
From: Uladzislau Rezki @ 2020-09-30 12:31 UTC (permalink / raw)
  To: Michal Hocko, Paul E. McKenney
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, akpm, mathieu.desnoyers, josh, tglx, peterz,
	rostedt, dhowells, edumazet, fweisbec, oleg, joel, mgorman,
	torvalds, Uladzislau Rezki (Sony)

>
> > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > using the page allocator directly be better?
> > 
> > Well, you guys gave me considerable heat about abusing internal allocator
> > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > as you can get and still be invoking the allocator.  ;-)
> 
> alloc_pages resp. __get_free_pages is a normal page allocator interface
> to use for page size granular allocations. kmalloc is for more fine
> grained allocations.
>
I do not have a strong opinion here, but I would tend toward using the page
allocator directly, because we operate at the page level, where the page
allocator is the intended interface. On the other hand, I saw slightly
better performance in the SLAB case only; I think that is because of
a) slab caches and b) more objects which can be cached. But I have one more
concern about using kmalloc(): the fragmentation it can
cause with PAGE_SIZE requests.

As for SLUB, performance was identical, whereas SLOB I did not check.

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-30  8:41       ` Michal Hocko
  2020-09-30 12:31         ` Uladzislau Rezki
@ 2020-09-30 23:21         ` Paul E. McKenney
  2020-10-01  9:02           ` Michal Hocko
  1 sibling, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2020-09-30 23:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> > On Tue, Sep 29, 2020 at 02:07:56PM +0200, Michal Hocko wrote:
> > > On Mon 28-09-20 16:31:01, paulmck@kernel.org wrote:
> > > [...]
> > 
> > Apologies for the delay, but today has not been boring.
> > 
> > > > This commit therefore uses preemptible() to determine whether allocation
> > > > is possible at all for double-argument kvfree_rcu().
> > > 
> > > This deserves a comment. Because GFP_ATOMIC is possible for many
> > > !preemptible() contexts. It is the raw_spin_lock, NMIs and likely few
> > > others that are a problem. You are taking a conservative approach which
> > > is fine but it would be good to articulate that explicitly.
> > 
> > Good point, and so I have added the following as a header comment to
> > the add_ptr_to_bulk_krc_lock() function:
> > 
> > // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
> > // state specified by flags.  If can_sleep is true, the caller must
> > // be schedulable and not be holding any locks or mutexes that might be
> > // acquired by the memory allocator or anything that it might invoke.
> > // If !can_sleep, then if !preemptible() no allocation will be undertaken,
> > // otherwise the allocation will use GFP_ATOMIC to avoid the remainder of
> > // the aforementioned deadlock possibilities.  Returns true iff ptr was
> > // successfully recorded, else the caller must use a fallback.
> 
> OK, not trivial to follow but at least verbose enough to understand the
> intention after some mulling. Definitely an improvement, thanks!

Glad it helped!  With some luck, perhaps it will improve with time...

> [...]
> > > > -kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > +add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> > > > +	unsigned long *flags, void *ptr, bool can_sleep)
> > > >  {
> > > >  	struct kvfree_rcu_bulk_data *bnode;
> > > > +	bool can_alloc_page = preemptible();
> > > > +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
> > > 
> > > This is quite confusing IMHO. At least without a further explanation.
> > > can_sleep is not as much about sleeping as it is about the reclaim
> > > recursion AFAIU your changelog, right?
> > 
> > No argument on it being confusing, and I hope that the added header
> > comment helps.  But specifically, can_sleep==true is a promise by the
> > caller to be schedulable and not to be holding any lock/mutex/whatever
> > that might possibly be acquired by the memory allocator or by anything
> > else that the memory allocator might invoke, to your point, including
> > for but one example the reclaim logic.
> > 
> > The only way that can_sleep==true is if this function was invoked due
> > to a call to single-argument kvfree_rcu(), which must be schedulable
> > because its fallback is to invoke synchronize_rcu().
> 
> OK. I have to say that it is still not clear to me whether this call
> path can be called from the memory reclaim context. If yes then you need
> __GFP_NOMEMALLOC as well.

Right now the restriction is that single-argument (AKA can_sleep==true)
kvfree_rcu() cannot be invoked from memory reclaim context.

But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
allow us to remove this restriction?  If so, I will queue a separate
patch making this change.  The improved ease of use would be well
worth it, if I understand correctly (ha!!!).
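
For concreteness, here is a hypothetical illustration (not taken from this
thread; my_obj, its rh field, and my_isolate_one() are made-up names) of what
"memory reclaim context" means for that restriction.  A shrinker's scan
callback runs during reclaim, so the single-argument form is forbidden there
today, while the double-argument form stays usable because it never sleeps:

	/* Sketch only. */
	static unsigned long my_shrink_scan(struct shrinker *sh,
					    struct shrink_control *sc)
	{
		struct my_obj *obj = my_isolate_one();	/* hypothetical helper */

		if (!obj)
			return SHRINK_STOP;
		kvfree_rcu(obj, rh);	/* OK: double-argument, uses obj->rh (struct rcu_head) */
		/* kvfree_rcu(obj); */	/* not OK here today: single-argument may sleep */
		return 1;
	}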

> [...]
> 
> > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > using the page allocator directly be better?
> > 
> > Well, you guys gave me considerable heat about abusing internal allocator
> > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > as you can get and still be invoking the allocator.  ;-)
> 
> alloc_pages resp. __get_free_pages is a normal page allocator interface
> to use for page size granular allocations. kmalloc is for more fine
> grained allocations.

OK, in the short term, both work, but I have queued a separate patch
making this change and recording the tradeoffs.  This is not yet a
promise to push this patch, but it is a promise not to lose this part
of the picture.  Please see below.

You mentioned alloc_pages().  I reverted to __get_free_pages(), but
alloc_pages() of course looks nicer.  What are the tradeoffs between
__get_free_pages() and alloc_pages()?

							Thanx, Paul

------------------------------------------------------------------------

commit 490b638d7c241ac06cee168ccf8688bb8b872478
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Sep 30 16:16:39 2020 -0700

    kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
    
    The advantages of using kmalloc() and kfree() are a possible small speedup
    on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
    more-familiar API members.  The advantages of using __get_free_page()
    and free_page() are a possible reduction in fragmentation and direct
    access to the buddy allocator.
    
    To help settle the question as to which to use, this commit switches
    from kmalloc() and kfree() to __get_free_page() and free_page().
    
    Suggested-by: Michal Hocko <mhocko@suse.com>
    Suggested-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2886e81..242f0f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work)
 				bkvhead[i] = NULL;
 			krc_this_cpu_unlock(krcp, flags);
 
-			kfree(bkvhead[i]);
+			if (bkvhead[i])
+				free_page((unsigned long)bkvhead[i]);
 
 			cond_resched_tasks_rcu_qs();
 		}
@@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 		bnode = get_cached_bnode(*krcp);
 		if (!bnode && can_alloc_page) {
 			krc_this_cpu_unlock(*krcp, *flags);
-			bnode = kmalloc(PAGE_SIZE, gfp);
+			bnode = (struct kvfree_rcu_bulk_data *)__get_free_page(gfp);
 			*krcp = krc_this_cpu_lock(flags);
 		}
 

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 11/15] drm/i915: Cleanup PREEMPT_COUNT leftovers
  2020-09-28 23:30 ` [PATCH tip/core/rcu 11/15] drm/i915: " paulmck
@ 2020-10-01  7:17   ` Joonas Lahtinen
  2020-10-01  8:25     ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Joonas Lahtinen @ 2020-10-01  7:17 UTC (permalink / raw)
  To: paulmck, rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mhocko, mgorman, torvalds,
	Jani Nikula, Rodrigo Vivi, David Airlie, Daniel Vetter,
	intel-gfx, dri-devel, Paul E . McKenney

Quoting paulmck@kernel.org (2020-09-29 02:30:58)
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> removed. Cleanup the leftovers before doing so.

Change looks fine:

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Are you looking for us to merge or merge through another tree?

If us, did the base patch always enabling PREEMPT_COUNT go into 5.9 or is
it heading to 5.10? We can queue this earliest for 5.11 as drm-next closed
for 5.10 at week of -rc5.

Regards, Joonas

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: intel-gfx@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  drivers/gpu/drm/i915/Kconfig.debug | 1 -
>  drivers/gpu/drm/i915/i915_utils.h  | 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
> index 1cb28c2..17d9b00 100644
> --- a/drivers/gpu/drm/i915/Kconfig.debug
> +++ b/drivers/gpu/drm/i915/Kconfig.debug
> @@ -20,7 +20,6 @@ config DRM_I915_DEBUG
>         bool "Enable additional driver debugging"
>         depends on DRM_I915
>         select DEBUG_FS
> -       select PREEMPT_COUNT
>         select I2C_CHARDEV
>         select STACKDEPOT
>         select DRM_DP_AUX_CHARDEV
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 5477337..ecfed86 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -337,8 +337,7 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
>                                                    (Wmax))
>  #define wait_for(COND, MS)             _wait_for((COND), (MS) * 1000, 10, 1000)
>  
> -/* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */
> -#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
> +#ifdef CONFIG_DRM_I915_DEBUG
>  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
>  #else
>  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 11/15] drm/i915: Cleanup PREEMPT_COUNT leftovers
  2020-10-01  7:17   ` Joonas Lahtinen
@ 2020-10-01  8:25     ` Thomas Gleixner
  2020-10-01 16:03       ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2020-10-01  8:25 UTC (permalink / raw)
  To: Joonas Lahtinen, paulmck, rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, peterz, rostedt, dhowells, edumazet,
	fweisbec, oleg, joel, mhocko, mgorman, torvalds, Jani Nikula,
	Rodrigo Vivi, David Airlie, Daniel Vetter, intel-gfx, dri-devel,
	Paul E . McKenney

On Thu, Oct 01 2020 at 10:17, Joonas Lahtinen wrote:
> Quoting paulmck@kernel.org (2020-09-29 02:30:58)
>> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
>> removed. Cleanup the leftovers before doing so.
>
> Change looks fine:
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>
> Are you looking for us to merge or merge through another tree?
>
> If us, did the base patch always enabling PREEMPT_COUNT go into 5.9 or is
> it heading to 5.10? We can queue this earliest for 5.11 as drm-next closed
> for 5.10 at week of -rc5.

If at all it goes through rcu/tip because it depends on the earlier patches.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-09-30 23:21         ` Paul E. McKenney
@ 2020-10-01  9:02           ` Michal Hocko
  2020-10-01 16:27             ` Paul E. McKenney
                               ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Michal Hocko @ 2020-10-01  9:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
[...]
> > > No argument on it being confusing, and I hope that the added header
> > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > that might possibly be acquired by the memory allocator or by anything
> > > else that the memory allocator might invoke, to your point, including
> > > for but one example the reclaim logic.
> > > 
> > > The only way that can_sleep==true is if this function was invoked due
> > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > because its fallback is to invoke synchronize_rcu().
> > 
> > OK. I have to say that it is still not clear to me whether this call
> > path can be called from the memory reclaim context. If yes then you need
> > __GFP_NOMEMALLOC as well.
> 
> Right now the restriction is that single-argument (AKA can_sleep==true)
> kvfree_rcu() cannot be invoked from memory reclaim context.
> 
> But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> allow us to remove this restriction?  If so, I will queue a separate
> patch making this change.  The improved ease of use would be well
> worth it, if I understand correctly (ha!!!).

It would be quite daring to claim it will be ok but it will certainly be
less problematic. Adding the flag will not hurt in any case. As this is
a shared path that might be called from many contexts, I think it will
be safer to have it there. The justification is that it will prevent
consumption of memory reserves from MEMALLOC contexts.
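
As a minimal sketch of what that means in practice (illustrative only; the
allocation site and size are invented), a task with PF_MEMALLOC set, for
example via memalloc_noreclaim_save(), may normally dip into the emergency
reserves, and __GFP_NOMEMALLOC opts a given allocation out of that:

	unsigned int noreclaim_flag;
	void *p;

	noreclaim_flag = memalloc_noreclaim_save();	/* sets PF_MEMALLOC */
	/* Even in this context, the allocation below leaves the reserves alone. */
	p = kmalloc(64, GFP_KERNEL | __GFP_NOMEMALLOC | __GFP_NOWARN);
	memalloc_noreclaim_restore(noreclaim_flag);
	kfree(p);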

> 
> > [...]
> > 
> > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > using the page allocator directly be better?
> > > 
> > > Well, you guys gave me considerable heat about abusing internal allocator
> > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > as you can get and still be invoking the allocator.  ;-)
> > 
> > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > to use for page size granular allocations. kmalloc is for more fine
> > grained allocations.
> 
> OK, in the short term, both work, but I have queued a separate patch
> making this change and recording the tradeoffs.  This is not yet a
> promise to push this patch, but it is a promise not to lose this part
> of the picture.  Please see below.

It doesn't matter all that much. Both allocators will work. It is just a
matter of using the optimal tool for the specific purpose.

> You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> alloc_pages() of course looks nicer.  What are the tradeoffs between
> __get_free_pages() and alloc_pages()?

alloc_pages will return struct page but you need a kernel pointer. That
is what __get_free_pages will give you (or you can call page_address
directly).
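
In code, the difference is roughly the following (illustrative sketch; gfp
stands in for the flags already computed in the patch):

	/* alloc_pages() hands back a struct page, which then needs converting. */
	struct page *page = alloc_pages(gfp, 0);
	void *va = page ? page_address(page) : NULL;

	/* __get_free_page() hands back a kernel virtual address directly. */
	unsigned long addr = __get_free_page(gfp);

	if (addr)
		free_page(addr);
	if (page)
		__free_pages(page, 0);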

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 490b638d7c241ac06cee168ccf8688bb8b872478
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Sep 30 16:16:39 2020 -0700
> 
>     kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
>     
>     The advantages of using kmalloc() and kfree() are a possible small speedup
>     on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
>     more-familiar API members.  The advantages of using __get_free_page()
>     and free_page() are a possible reduction in fragmentation and direct
>     access to the buddy allocator.
>     
>     To help settle the question as to which to use, this commit switches
>     from kmalloc() and kfree() to __get_free_page() and free_page().
>     
>     Suggested-by: Michal Hocko <mhocko@suse.com>
>     Suggested-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Yes, looks good to me. I am not entirely sure about the fragmentation
argument. It really depends on the SL.B allocator internals. The same
applies to the potential speed up. I would even be surprised if the
SLAB was faster on average considering it has to use the page allocator
as well. So to me the primary motivation would be "use the right tool
for the purpose".

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2886e81..242f0f0 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work)
>  				bkvhead[i] = NULL;
>  			krc_this_cpu_unlock(krcp, flags);
>  
> -			kfree(bkvhead[i]);
> +			if (bkvhead[i])
> +				free_page((unsigned long)bkvhead[i]);
>  
>  			cond_resched_tasks_rcu_qs();
>  		}
> @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>  		bnode = get_cached_bnode(*krcp);
>  		if (!bnode && can_alloc_page) {
>  			krc_this_cpu_unlock(*krcp, *flags);
> -			bnode = kmalloc(PAGE_SIZE, gfp);
> +			bnode = (struct kvfree_rcu_bulk_data *)__get_free_page(gfp);
>  			*krcp = krc_this_cpu_lock(flags);
>  		}
>  

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 11/15] drm/i915: Cleanup PREEMPT_COUNT leftovers
  2020-10-01  8:25     ` Thomas Gleixner
@ 2020-10-01 16:03       ` Paul E. McKenney
  0 siblings, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2020-10-01 16:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joonas Lahtinen, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, akpm, mathieu.desnoyers, josh, peterz, rostedt,
	dhowells, edumazet, fweisbec, oleg, joel, mhocko, mgorman,
	torvalds, Jani Nikula, Rodrigo Vivi, David Airlie, Daniel Vetter,
	intel-gfx, dri-devel

On Thu, Oct 01, 2020 at 10:25:06AM +0200, Thomas Gleixner wrote:
> On Thu, Oct 01 2020 at 10:17, Joonas Lahtinen wrote:
> > Quoting paulmck@kernel.org (2020-09-29 02:30:58)
> >> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> >> removed. Cleanup the leftovers before doing so.
> >
> > Change looks fine:
> >
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Applied, thank you!

> > Are you looking for us to merge or merge through another tree?
> >
> > If us, did the base patch always enabling PREEMPT_COUNT go into 5.9 or is
> > it heading to 5.10? We can queue this earliest for 5.11 as drm-next closed
> > for 5.10 at week of -rc5.
> 
> If at all it goes through rcu/tip because it depends on the earlier patches.

I was figuring on sending a pull request later today, Pacific Time.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-10-01  9:02           ` Michal Hocko
@ 2020-10-01 16:27             ` Paul E. McKenney
  2020-10-02  6:57               ` Michal Hocko
  2020-10-01 16:28             ` Paul E. McKenney
  2020-10-01 20:03             ` Uladzislau Rezki
  2 siblings, 1 reply; 30+ messages in thread
From: Paul E. McKenney @ 2020-10-01 16:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> [...]
> > > > No argument on it being confusing, and I hope that the added header
> > > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > > that might possibly be acquired by the memory allocator or by anything
> > > > else that the memory allocator might invoke, to your point, including
> > > > for but one example the reclaim logic.
> > > > 
> > > > The only way that can_sleep==true is if this function was invoked due
> > > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > > because its fallback is to invoke synchronize_rcu().
> > > 
> > > OK. I have to say that it is still not clear to me whether this call
> > > path can be called from the memory reclaim context. If yes then you need
> > > __GFP_NOMEMALLOC as well.
> > 
> > Right now the restriction is that single-argument (AKA can_sleep==true)
> > kvfree_rcu() cannot be invoked from memory reclaim context.
> > 
> > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> > allow us to remove this restriction?  If so, I will queue a separate
> > patch making this change.  The improved ease of use would be well
> > worth it, if I understand correctly (ha!!!).
> 
> It would be quite daring to claim it will be ok but it will certainly be
> less problematic. Adding the flag will not hurt in any case. As this is
> a shared path that might be called from many contexts, I think it will
> be safer to have it there. The justification is that it will prevent
> consumption of memory reserves from MEMALLOC contexts.

Ah, so a different goal (and yes, I finally went over and read the
relevant documentation).  Agreed, the can_sleep path does not really
need to be dipping into the emergency reserves.  And it looks like the
not-from-reclaim restriction is still at least partially in effect,
but one step at a time.

The patch is shown below, which I have queued for a later release.

> > > [...]
> > > 
> > > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > > using the page allocator directly be better?
> > > > 
> > > > Well, you guys gave me considerable heat about abusing internal allocator
> > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > > as you can get and still be invoking the allocator.  ;-)
> > > 
> > > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > > to use for page size granular allocations. kmalloc is for more fine
> > > grained allocations.
> > 
> > OK, in the short term, both work, but I have queued a separate patch
> > making this change and recording the tradeoffs.  This is not yet a
> > promise to push this patch, but it is a promise not to lose this part
> > of the picture.  Please see below.
> 
> It doesn't matter all that much. Both allocators will work. It is just a
> matter of using the optimal tool for the specific purpose.
> 
> > You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> > alloc_pages() of course looks nicer.  What are the tradeoffs between
> > __get_free_pages() and alloc_pages()?
> 
> alloc_pages will return struct page but you need a kernel pointer. That
> is what __get_free_pages will give you (or you can call page_address
> directly).

Thank you, looks like __get_free_pages() is the tool for this job.

Please see below for the aforementioned patch.

							Thanx, Paul

------------------------------------------------------------------------

commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Oct 1 09:24:40 2020 -0700

    kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
    
    This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
    carried out by the single-argument variant of kvfree_rcu(), thus preventing
    this can-sleep code path from dipping into the emergency reserves.
    
    Suggested-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 242f0f0..6132452 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 {
 	struct kvfree_rcu_bulk_data *bnode;
 	bool can_alloc_page = preemptible();
-	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
+	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC
+			       : GFP_ATOMIC) | __GFP_NOWARN;
 	int idx;
 
 	*krcp = krc_this_cpu_lock(flags);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-10-01  9:02           ` Michal Hocko
  2020-10-01 16:27             ` Paul E. McKenney
@ 2020-10-01 16:28             ` Paul E. McKenney
  2020-10-01 20:03             ` Uladzislau Rezki
  2 siblings, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2020-10-01 16:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:

[ . . . ]

Hit "send" too soon, apologies...

> > ------------------------------------------------------------------------
> > 
> > commit 490b638d7c241ac06cee168ccf8688bb8b872478
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Wed Sep 30 16:16:39 2020 -0700
> > 
> >     kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
> >     
> >     The advantages of using kmalloc() and kfree() are a possible small speedup
> >     on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
> >     more-familiar API members.  The advantages of using __get_free_page()
> >     and free_page() are a possible reduction in fragmentation and direct
> >     access to the buddy allocator.
> >     
> >     To help settle the question as to which to use, this commit switches
> >     from kmalloc() and kfree() to __get_free_page() and free_page().
> >     
> >     Suggested-by: Michal Hocko <mhocko@suse.com>
> >     Suggested-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Yes, looks good to me. I am not entirely sure about the fragmentation
> argument. It really depends on the SL.B allocator internals. The same
> applies to the potential speed up. I would even be surprised if the
> SLAB was faster on average considering it has to use the page allocator
> as well. So to me the primary motivation would be "use the right tool
> for the purpose".

Very well, I will update the commit message, and thank you!

							Thanx, Paul

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 2886e81..242f0f0 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3225,7 +3225,8 @@ static void kfree_rcu_work(struct work_struct *work)
> >  				bkvhead[i] = NULL;
> >  			krc_this_cpu_unlock(krcp, flags);
> >  
> > -			kfree(bkvhead[i]);
> > +			if (bkvhead[i])
> > +				free_page((unsigned long)bkvhead[i]);
> >  
> >  			cond_resched_tasks_rcu_qs();
> >  		}
> > @@ -3378,7 +3379,7 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> >  		bnode = get_cached_bnode(*krcp);
> >  		if (!bnode && can_alloc_page) {
> >  			krc_this_cpu_unlock(*krcp, *flags);
> > -			bnode = kmalloc(PAGE_SIZE, gfp);
> > +			bnode = (struct kvfree_rcu_bulk_data *)__get_free_page(gfp);
> >  			*krcp = krc_this_cpu_lock(flags);
> >  		}
> >  
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-10-01  9:02           ` Michal Hocko
  2020-10-01 16:27             ` Paul E. McKenney
  2020-10-01 16:28             ` Paul E. McKenney
@ 2020-10-01 20:03             ` Uladzislau Rezki
  2 siblings, 0 replies; 30+ messages in thread
From: Uladzislau Rezki @ 2020-10-01 20:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, mingo,
	jiangshanlai, akpm, mathieu.desnoyers, josh, tglx, peterz,
	rostedt, dhowells, edumazet, fweisbec, oleg, joel, mgorman,
	torvalds, Uladzislau Rezki (Sony)

On Thu, Oct 01, 2020 at 11:02:20AM +0200, Michal Hocko wrote:
> On Wed 30-09-20 16:21:54, Paul E. McKenney wrote:
> > On Wed, Sep 30, 2020 at 10:41:39AM +0200, Michal Hocko wrote:
> > > On Tue 29-09-20 18:53:27, Paul E. McKenney wrote:
> [...]
> > > > No argument on it being confusing, and I hope that the added header
> > > > comment helps.  But specifically, can_sleep==true is a promise by the
> > > > caller to be schedulable and not to be holding any lock/mutex/whatever
> > > > that might possibly be acquired by the memory allocator or by anything
> > > > else that the memory allocator might invoke, to your point, including
> > > > for but one example the reclaim logic.
> > > > 
> > > > The only way that can_sleep==true is if this function was invoked due
> > > > to a call to single-argument kvfree_rcu(), which must be schedulable
> > > > because its fallback is to invoke synchronize_rcu().
> > > 
> > > OK. I have to say that it is still not clear to me whether this call
> > > path can be called from the memory reclaim context. If yes then you need
> > > __GFP_NOMEMALLOC as well.
> > 
> > Right now the restriction is that single-argument (AKA can_sleep==true)
> > kvfree_rcu() cannot be invoked from memory reclaim context.
> > 
> > But would adding __GFP_NOMEMALLOC to the can_sleep==true GFP_ flags
> > allow us to remove this restriction?  If so, I will queue a separate
> > patch making this change.  The improved ease of use would be well
> > worth it, if I understand correctly (ha!!!).
> 
> It would be quite daring to claim it will be ok but it will certainly be
> less problematic. Adding the flag will not hurt in any case. As this is
> a shared path that might be called from many contexts, I think it will
> be safer to have it there. The justification is that it will prevent
> consumption of memory reserves from MEMALLOC contexts.
> 
> > 
> > > [...]
> > > 
> > > > > What is the point of calling kmalloc  for a PAGE_SIZE object? Wouldn't
> > > > > using the page allocator directly be better?
> > > > 
> > > > Well, you guys gave me considerable heat about abusing internal allocator
> > > > interfaces, and kmalloc() and kfree() seem to be about as non-internal
> > > > as you can get and still be invoking the allocator.  ;-)
> > > 
> > > alloc_pages resp. __get_free_pages is a normal page allocator interface
> > > to use for page size granular allocations. kmalloc is for more fine
> > > grained allocations.
> > 
> > OK, in the short term, both work, but I have queued a separate patch
> > making this change and recording the tradeoffs.  This is not yet a
> > promise to push this patch, but it is a promise not to lose this part
> > of the picture.  Please see below.
> 
> It doesn't matter all that much. Both allocators will work. It is just a
> matter of using the optimal tool for the specific purpose.
> 
> > You mentioned alloc_pages().  I reverted to __get_free_pages(), but
> > alloc_pages() of course looks nicer.  What are the tradeoffs between
> > __get_free_pages() and alloc_pages()?
> 
> alloc_pages will return struct page but you need a kernel pointer. That
> is what __get_free_pages will give you (or you can call page_address
> directly).
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 490b638d7c241ac06cee168ccf8688bb8b872478
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Wed Sep 30 16:16:39 2020 -0700
> > 
> >     kvfree_rcu(): Switch from kmalloc/kfree to __get_free_page/free_page.
> >     
> >     The advantages of using kmalloc() and kfree() are a possible small speedup
> >     on CONFIG_SLAB=y systems, avoiding the allocation-side cast, and use of
> >     more-familiar API members.  The advantages of using __get_free_page()
> >     and free_page() are a possible reduction in fragmentation and direct
> >     access to the buddy allocator.
> >     
> >     To help settle the question as to which to use, this commit switches
> >     from kmalloc() and kfree() to __get_free_page() and free_page().
> >     
> >     Suggested-by: Michal Hocko <mhocko@suse.com>
> >     Suggested-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Yes, looks good to me. I am not entirely sure about the fragmentation
> argument. It really depends on the SL.B allocator internals. The same
> applies to the potential speed up. I would even be surprised if the
> SLAB was faster on average considering it has to use the page allocator
> as well. So to me the primary motivation would be "use the right tool
> for the purpose".
> 
As for the concern I raised about fragmentation, I was mostly thinking about
the fact that the SLABs were not designed for efficient allocations of sizes
>= PAGE_SIZE. But that depends on the three different
implementations; actually, that is also a good argument for switching to the
page allocator, i.e. to get rid of such a dependency.

On the other side, the SLABs, at least SLAB and SLUB, use slab caches, and
the sizes they support go up to:

<snip>
kmalloc-8k           420    420   8192    4
kmalloc-4k          1372   1392   4096    8    8 : tunables    0    0
...
<snip>

I would not be surprised if SLAB were faster than using the page allocator
in _some_ sense. If needed, I can double-check. I can explain it
simply by the dynamic caching that can grow based on demand, so there
is no need to go down to the page allocator if the kmalloc-4k cache has extra
objects. But the worst case will of course be slower :)

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-10-01 16:27             ` Paul E. McKenney
@ 2020-10-02  6:57               ` Michal Hocko
  2020-10-02 14:12                 ` Paul E. McKenney
  0 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2020-10-02  6:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Thu 01-10-20 09:27:09, Paul E. McKenney wrote:
[...]
> commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Oct 1 09:24:40 2020 -0700
> 
>     kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
>     
>     This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
>     carried out by the single-argument variant of kvfree_rcu(), thus preventing
>     this can-sleep code path from dipping into the emergency reserves.
>     
>     Suggested-by: Michal Hocko <mhocko@suse.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

LGTM. At least for this one I feel competent to give you
Acked-by: Michal Hocko <mhocko@suse.com>

> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 242f0f0..6132452 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
>  {
>  	struct kvfree_rcu_bulk_data *bnode;
>  	bool can_alloc_page = preemptible();
> -	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
> +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC
> +			       : GFP_ATOMIC) | __GFP_NOWARN;
>  	int idx;
>  
>  	*krcp = krc_this_cpu_lock(flags);

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible
  2020-10-02  6:57               ` Michal Hocko
@ 2020-10-02 14:12                 ` Paul E. McKenney
  0 siblings, 0 replies; 30+ messages in thread
From: Paul E. McKenney @ 2020-10-02 14:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: rcu, linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, mgorman, torvalds,
	Uladzislau Rezki (Sony)

On Fri, Oct 02, 2020 at 08:57:46AM +0200, Michal Hocko wrote:
> On Thu 01-10-20 09:27:09, Paul E. McKenney wrote:
> [...]
> > commit ea5c19d21233b5e8d3d06c0d4ecd6be9f2829dc3
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Oct 1 09:24:40 2020 -0700
> > 
> >     kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
> >     
> >     This commit applies the __GFP_NOMEMALLOC gfp flag to memory allocations
> >     carried out by the single-argument variant of kvfree_rcu(), thus preventing
> >     this can-sleep code path from dipping into the emergency reserves.
> >     
> >     Suggested-by: Michal Hocko <mhocko@suse.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> LGTM. At least for this one I feel competent to give you
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you very much!  I will apply this on the next rebase later today,
Pacific Time.

						Thanx, Paul

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 242f0f0..6132452 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3364,7 +3364,8 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> >  {
> >  	struct kvfree_rcu_bulk_data *bnode;
> >  	bool can_alloc_page = preemptible();
> > -	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL : GFP_ATOMIC) | __GFP_NOWARN;
> > +	gfp_t gfp = (can_sleep ? GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOMEMALLOC
> > +			       : GFP_ATOMIC) | __GFP_NOWARN;
> >  	int idx;
> >  
> >  	*krcp = krc_this_cpu_lock(flags);
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-10-02 14:12 UTC | newest]

Thread overview: 30+ messages
2020-09-28 23:30 [PATCH tip/core/rcu 0/15] Paul E. McKenney
2020-09-28 23:30 ` [PATCH tip/core/rcu 01/15] lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 02/15] preempt: Make preempt count unconditional paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 03/15] preempt: Cleanup PREEMPT_COUNT leftovers paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 04/15] lockdep: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 05/15] mm/pagemap: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 06/15] locking/bitspinlock: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 07/15] uaccess: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 08/15] sched: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 09/15] ARM: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 10/15] xtensa: " paulmck
2020-09-28 23:30 ` [PATCH tip/core/rcu 11/15] drm/i915: " paulmck
2020-10-01  7:17   ` Joonas Lahtinen
2020-10-01  8:25     ` Thomas Gleixner
2020-10-01 16:03       ` Paul E. McKenney
2020-09-28 23:30 ` [PATCH tip/core/rcu 12/15] rcutorture: " paulmck
2020-09-28 23:31 ` [PATCH tip/core/rcu 13/15] preempt: Remove PREEMPT_COUNT from Kconfig paulmck
2020-09-28 23:31 ` [PATCH tip/core/rcu 14/15] rcu/tree: Allocate a page when caller is preemptible paulmck
2020-09-29 12:07   ` Michal Hocko
2020-09-30  1:53     ` Paul E. McKenney
2020-09-30  8:41       ` Michal Hocko
2020-09-30 12:31         ` Uladzislau Rezki
2020-09-30 23:21         ` Paul E. McKenney
2020-10-01  9:02           ` Michal Hocko
2020-10-01 16:27             ` Paul E. McKenney
2020-10-02  6:57               ` Michal Hocko
2020-10-02 14:12                 ` Paul E. McKenney
2020-10-01 16:28             ` Paul E. McKenney
2020-10-01 20:03             ` Uladzislau Rezki
2020-09-28 23:31 ` [PATCH tip/core/rcu 15/15] kvfree_rcu(): Fix ifnullfree.cocci warnings paulmck
