* [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested().
@ 2022-08-25 16:41 Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox

Folks,

this is v2 of "Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested()."
The v1 is at
   https://lore.kernel.org/all/20220817162703.728679-1-bigeasy@linutronix.de/

v1…v2:
   - The SLUB patch got s/use_lockless_fast_path/USE_LOCKLESS_FAST_PATH/ as
     Linus asked for; it was handed over to Vlastimil and is sitting in his
     slab tree:
        https://git.kernel.org/vbabka/slab/c/591570a7

   - The compaction patch got a "depends on COMPACTION" so the option is not
     listed in .config if compaction is disabled (Rasmus Villemoes).

   - The removal of u64_stats_fetch_begin_irq() has been excluded from the
     series and staged for later. Jakub asked for this so that it can be
     included in a later merge window.

   - A patch for flex_proportions has been added, which also needs
     preempt_disable_nested() since the merge window.

Original cover letter:

this is the follow-up to the "vfs.git pile 3 - dcache" pull request [0].
It was concluded that the introduction of
	preempt_disable_nested()

in general makes sense and that it should be used in places where
preemption on !RT is disabled by other means and PREEMPT_RT needs to
disable it explicitly.

This series introduces the macro and converts the already existing users
to it. The u64_stats interface was simplified to make the change simpler
and the code easier to follow.
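
As a rough sketch of the conversion pattern (illustrative only, not a
quote of any particular patch in the series), open coded conditionals
like

	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_disable();
	/* ... CPU local critical section ... */
	if (IS_ENABLED(CONFIG_PREEMPT_RT))
		preempt_enable();

become

	preempt_disable_nested();
	/* ... CPU local critical section ... */
	preempt_enable_nested();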

[0] https://lore.kernel.org/all/YurA3aSb4GRr4wlW@ZenIV/

Sebastian




* [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested()
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  2022-08-25 16:41 ` [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman,
	Valentin Schneider, Vincent Guittot, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

On PREEMPT_RT enabled kernels, spinlocks and rwlocks disable neither
preemption nor interrupts. However, there are a few places which depend on
the implicit preemption/interrupt disabling of those locks, e.g. seqcount
write sections, per CPU statistics updates etc.

To avoid sprinkling CONFIG_PREEMPT_RT conditionals all over the place, add
preempt_disable_nested() and preempt_enable_nested() which should be
descriptive enough.

Add a lockdep assertion for the !PREEMPT_RT case to catch callers which
do not have preemption disabled.
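
A minimal usage sketch (the data structure is hypothetical, not part of
this patch):

	/* seqcount write section inside a spinlock held region */
	spin_lock(&obj->lock);			/* preemptible on PREEMPT_RT */
	preempt_disable_nested();		/* RT: preempt_disable(); !RT: lockdep assert */
	write_seqcount_begin(&obj->seq);
	/* update the data protected by obj->seq */
	write_seqcount_end(&obj->seq);
	preempt_enable_nested();
	spin_unlock(&obj->lock);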

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/preempt.h | 42 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index b4381f255a5ca..0df425bf9bd75 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -421,4 +421,46 @@ static inline void migrate_enable(void) { }
 
 #endif /* CONFIG_SMP */
 
+/**
+ * preempt_disable_nested - Disable preemption inside a normally preempt disabled section
+ *
+ * Use for code which requires preemption protection inside a critical
+ * section which has preemption disabled implicitly on non-PREEMPT_RT
+ * enabled kernels, by e.g.:
+ *  - holding a spinlock/rwlock
+ *  - soft interrupt context
+ *  - regular interrupt handlers
+ *
+ * On PREEMPT_RT enabled kernels spinlock/rwlock held sections, soft
+ * interrupt context and regular interrupt handlers are preemptible and
+ * only prevent migration. preempt_disable_nested() ensures that preemption
+ * is disabled for cases which require CPU local serialization even on
+ * PREEMPT_RT. For non-PREEMPT_RT kernels this is a NOP.
+ *
+ * The use cases are code sequences which are not serialized by a
+ * particular lock instance, e.g.:
+ *  - seqcount write side critical sections where the seqcount is not
+ *    associated to a particular lock and therefore the automatic
+ *    protection mechanism does not work. This prevents a live lock
+ *    against a preempting high priority reader.
+ *  - RMW per CPU variable updates like vmstat.
+ */
+/* Macro to avoid header recursion hell vs. lockdep */
+#define preempt_disable_nested()				\
+do {								\
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))			\
+		preempt_disable();				\
+	else							\
+		lockdep_assert_preemption_disabled();		\
+} while (0)
+
+/**
+ * preempt_enable_nested - Undo the effect of preempt_disable_nested()
+ */
+static __always_inline void preempt_enable_nested(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+}
+
 #endif /* __LINUX_PREEMPT_H */
-- 
2.37.2



* [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-08-26  7:52   ` Christian Brauner
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  2022-08-25 16:41 ` [PATCH v2 3/8] mm/vmstat: " Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Alexander Viro, linux-fsdevel,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Replace the open coded CONFIG_PREEMPT_RT conditional
preempt_disable/enable() with the new helper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/dcache.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bb0c4d0038dbd..2ee8636016ee9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2597,15 +2597,7 @@ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-	/*
-	 * The caller holds a spinlock (dentry::d_lock). On !PREEMPT_RT
-	 * kernels spin_lock() implicitly disables preemption, but not on
-	 * PREEMPT_RT.  So for RT it has to be done explicitly to protect
-	 * the sequence count write side critical section against a reader
-	 * or another writer preempting, which would result in a live lock.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@@ -2618,8 +2610,7 @@ static inline void end_dir_add(struct inode *dir, unsigned int n,
 			       wait_queue_head_t *d_wait)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 	wake_up_all(d_wait);
 }
 
-- 
2.37.2



* [PATCH v2 3/8] mm/vmstat: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-01 14:41   ` Michal Hocko
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  2022-08-25 16:41 ` [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Replace the open coded CONFIG_PREEMPT_RT conditional
preempt_enable/disable() pairs with the new helper functions which hide
the underlying implementation details.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 mm/vmstat.c | 36 ++++++++++++------------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 373d2730fcf21..d514fe7f90af0 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -355,8 +355,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 	 * CPU migrations and preemption potentially corrupts a counter so
 	 * disable preemption.
 	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	x = delta + __this_cpu_read(*p);
 
@@ -368,8 +367,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 	}
 	__this_cpu_write(*p, x);
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -393,8 +391,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
 	}
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	x = delta + __this_cpu_read(*p);
 
@@ -406,8 +403,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
 	}
 	__this_cpu_write(*p, x);
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -441,8 +437,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 	s8 v, t;
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_inc_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -453,8 +448,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 		__this_cpu_write(*p, -overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -466,8 +460,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_inc_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -478,8 +471,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 		__this_cpu_write(*p, -overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -501,8 +493,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 	s8 v, t;
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_dec_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -513,8 +504,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 		__this_cpu_write(*p, overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -526,8 +516,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_dec_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -538,8 +527,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 		__this_cpu_write(*p, overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
-- 
2.37.2



* [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2022-08-25 16:41 ` [PATCH v2 3/8] mm/vmstat: " Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-01 14:41   ` Michal Hocko
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  2022-08-25 16:41   ` Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Some places in the VM code expect interrupts to be disabled, which is a
valid expectation on non-PREEMPT_RT kernels, but it does not always hold
on RT kernels because the RT spinlock substitution does not disable
interrupts.

To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and
PREEMPT_RT=n.
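
A usage sketch (hypothetical caller, not part of this patch):

	static void stats_update_locked(struct foo_stats *stats)
	{
		/* Caller holds a spinlock which disables interrupts on !RT. */
		VM_WARN_ON_IRQS_ENABLED();	/* compiled out unless DEBUG_VM=y && PREEMPT_RT=n */
		stats->events++;
	}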

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mmdebug.h | 6 ++++++
 lib/Kconfig.debug       | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 15ae78cd28536..b8728d11c9490 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -94,6 +94,12 @@ void dump_mm(const struct mm_struct *mm);
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
 
+#ifdef CONFIG_DEBUG_VM_IRQSOFF
+#define VM_WARN_ON_IRQS_ENABLED() WARN_ON_ONCE(!irqs_disabled())
+#else
+#define VM_WARN_ON_IRQS_ENABLED() do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_VIRTUAL
 #define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
 #else
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 072e4b289c13e..c96fc6820544c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -803,6 +803,9 @@ config ARCH_HAS_DEBUG_VM_PGTABLE
 	  An architecture should select this when it can successfully
 	  build and run DEBUG_VM_PGTABLE.
 
+config DEBUG_VM_IRQSOFF
+	def_bool DEBUG_VM && !PREEMPT_RT
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL
-- 
2.37.2



* [PATCH v2 5/8] mm/memcontrol: Replace the PREEMPT_RT conditionals
@ 2022-08-25 16:41   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, cgroups, linux-mm,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
replace the CONFIG_PREEMPT_RT #ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 mm/memcontrol.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b69979c9ced5c..d35b6fa560f0a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -597,25 +597,18 @@ static u64 flush_next_time;
  */
 static void memcg_stats_lock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_disable();
-#else
-      VM_BUG_ON(!irqs_disabled());
-#endif
+	preempt_disable_nested();
+	VM_WARN_ON_IRQS_ENABLED();
 }
 
 static void __memcg_stats_lock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_disable();
-#endif
+	preempt_disable_nested();
 }
 
 static void memcg_stats_unlock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_enable();
-#endif
+	preempt_enable_nested();
 }
 
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
@@ -715,7 +708,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	 * interrupt context while other caller need to have disabled interrupt.
 	 */
 	__memcg_stats_lock();
-	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
 		switch (idx) {
 		case NR_ANON_MAPPED:
 		case NR_FILE_MAPPED:
@@ -725,7 +718,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			WARN_ON_ONCE(!in_task());
 			break;
 		default:
-			WARN_ON_ONCE(!irqs_disabled());
+			VM_WARN_ON_IRQS_ENABLED();
 		}
 	}
 
-- 
2.37.2


* [PATCH v2 6/8] mm/compaction: Get rid of RT ifdeffery
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2022-08-25 16:41   ` Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  2022-08-25 16:41 ` [PATCH v2 7/8] flex_proportions: Disable preemption entering the write section Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 8/8] u64_stats: Streamline the implementation Sebastian Andrzej Siewior
  7 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Andrew Morton, Nick Terrell, linux-mm,
	Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

Move the RT dependency for the initial value of
sysctl_compact_unevictable_allowed into Kconfig.
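
This only moves the compile time default; the resulting value still lands
in sysctl_compact_unevictable_allowed and remains runtime-tunable via the
vm.compact_unevictable_allowed sysctl.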

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Terrell <terrelln@fb.com>
Cc: linux-mm@kvack.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 mm/Kconfig      | 6 ++++++
 mm/compaction.c | 6 +-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 0331f1461f81c..3897e924e40f2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -579,6 +579,12 @@ config COMPACTION
 	  it and then we would be really interested to hear about that at
 	  linux-mm@kvack.org.
 
+config COMPACT_UNEVICTABLE_DEFAULT
+	int
+	depends on COMPACTION
+	default 0 if PREEMPT_RT
+	default 1
+
 #
 # support for free page reporting
 config PAGE_REPORTING
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76228dd9..10561cb1aaad9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1727,11 +1727,7 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
-#ifdef CONFIG_PREEMPT_RT
-int sysctl_compact_unevictable_allowed __read_mostly = 0;
-#else
-int sysctl_compact_unevictable_allowed __read_mostly = 1;
-#endif
+int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.37.2



* [PATCH v2 7/8] flex_proportions: Disable preemption entering the write section.
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2022-08-25 16:41 ` [PATCH v2 6/8] mm/compaction: Get rid of RT ifdeffery Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Sebastian Andrzej Siewior
  2022-08-25 16:41 ` [PATCH v2 8/8] u64_stats: Streamline the implementation Sebastian Andrzej Siewior
  7 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, Sebastian Andrzej Siewior

The seqcount fprop_global::sequence is not associated with a lock. The
write section (fprop_new_period()) is invoked from a timer and, since
softirq context is preemptible on PREEMPT_RT, it is possible to preempt
the write section, which is not desired.

Disable preemption around the write section on PREEMPT_RT.
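
For context, the reader side spins until the sequence count is even
again, so a writer preempted inside the section by a higher priority
reader would make that reader spin forever. The generic read pattern, as
a sketch:

	unsigned int seq;

	do {
		seq = read_seqcount_begin(&p->sequence);
		/* read the proportion values */
	} while (read_seqcount_retry(&p->sequence, seq));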

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 lib/flex_proportions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
index 05cccbcf1661a..83332fefa6f42 100644
--- a/lib/flex_proportions.c
+++ b/lib/flex_proportions.c
@@ -70,6 +70,7 @@ bool fprop_new_period(struct fprop_global *p, int periods)
 	 */
 	if (events <= 1)
 		return false;
+	preempt_disable_nested();
 	write_seqcount_begin(&p->sequence);
 	if (periods < 64)
 		events -= events >> periods;
@@ -77,6 +78,7 @@ bool fprop_new_period(struct fprop_global *p, int periods)
 	percpu_counter_add(&p->events, -events);
 	p->period += periods;
 	write_seqcount_end(&p->sequence);
+	preempt_enable_nested();
 
 	return true;
 }
-- 
2.37.2



* [PATCH v2 8/8] u64_stats: Streamline the implementation
  2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2022-08-25 16:41 ` [PATCH v2 7/8] flex_proportions: Disable preemption entering the write section Sebastian Andrzej Siewior
@ 2022-08-25 16:41 ` Sebastian Andrzej Siewior
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  7 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-25 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Linus Torvalds,
	Matthew Wilcox, netdev, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The u64 stats code handles 3 different cases:

  - 32bit UP
  - 32bit SMP
  - 64bit

with an unreadable #ifdef maze, which was recently expanded with PREEMPT_RT
conditionals.

Reduce it to two cases (32bit and 64bit) and drop the optimization for
32bit UP as suggested by Linus.

Use the new preempt_disable/enable_nested() helpers to get rid of the
CONFIG_PREEMPT_RT conditionals.
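
The caller visible function signatures are unchanged. A minimal sketch of
the writer/reader pairing (the stats structure is hypothetical):

	struct pcpu_stats {
		u64_stats_t		bytes;
		struct u64_stats_sync	syncp;
	};

	/* writer */
	u64_stats_update_begin(&s->syncp);
	u64_stats_add(&s->bytes, len);
	u64_stats_update_end(&s->syncp);

	/* reader */
	unsigned int start;
	u64 bytes;

	do {
		start = u64_stats_fetch_begin(&s->syncp);
		bytes = u64_stats_read(&s->bytes);
	} while (u64_stats_fetch_retry(&s->syncp, start));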

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/u64_stats_sync.h | 147 +++++++++++++++------------------
 1 file changed, 65 insertions(+), 82 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 6ad4e9032d538..46040d66334a8 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -8,7 +8,7 @@
  *
  * Key points :
  *
- * -  Use a seqcount on 32-bit SMP, only disable preemption for 32-bit UP.
+ * -  Use a seqcount on 32-bit
  * -  The whole thing is a no-op on 64-bit architectures.
  *
  * Usage constraints:
@@ -20,7 +20,8 @@
  *    writer and also spin forever.
  *
  * 3) Write side must use the _irqsave() variant if other writers, or a reader,
- *    can be invoked from an IRQ context.
+ *    can be invoked from an IRQ context. On 64bit systems this variant does not
+ *    disable interrupts.
  *
  * 4) If reader fetches several counters, there is no guarantee the whole values
  *    are consistent w.r.t. each other (remember point #2: seqcounts are not
@@ -29,11 +30,6 @@
  * 5) Readers are allowed to sleep or be preempted/interrupted: they perform
  *    pure reads.
  *
- * 6) Readers must use both u64_stats_fetch_{begin,retry}_irq() if the stats
- *    might be updated from a hardirq or softirq context (remember point #1:
- *    seqcounts are not used for UP kernels). 32-bit UP stat readers could read
- *    corrupted 64-bit values otherwise.
- *
  * Usage :
  *
  * Stats producer (writer) should use following template granted it already got
@@ -66,7 +62,7 @@
 #include <linux/seqlock.h>
 
 struct u64_stats_sync {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
+#if BITS_PER_LONG == 32
 	seqcount_t	seq;
 #endif
 };
@@ -98,7 +94,22 @@ static inline void u64_stats_inc(u64_stats_t *p)
 	local64_inc(&p->v);
 }
 
-#else
+static inline void u64_stats_init(struct u64_stats_sync *syncp) { }
+static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { }
+static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { }
+static inline unsigned long __u64_stats_irqsave(void) { return 0; }
+static inline void __u64_stats_irqrestore(unsigned long flags) { }
+static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
+{
+	return 0;
+}
+static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
+					   unsigned int start)
+{
+	return false;
+}
+
+#else /* 64 bit */
 
 typedef struct {
 	u64		v;
@@ -123,123 +134,95 @@ static inline void u64_stats_inc(u64_stats_t *p)
 {
 	p->v++;
 }
-#endif
 
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-#define u64_stats_init(syncp)	seqcount_init(&(syncp)->seq)
-#else
 static inline void u64_stats_init(struct u64_stats_sync *syncp)
 {
+	seqcount_init(&syncp->seq);
 }
-#endif
 
-static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
+static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 	write_seqcount_begin(&syncp->seq);
-#endif
 }
 
-static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
+static inline void __u64_stats_update_end(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
 	write_seqcount_end(&syncp->seq);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
-#endif
+	preempt_enable_nested();
 }
 
-static inline unsigned long
-u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
+static inline unsigned long __u64_stats_irqsave(void)
 {
-	unsigned long flags = 0;
+	unsigned long flags;
 
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
-	else
-		local_irq_save(flags);
-	write_seqcount_begin(&syncp->seq);
-#endif
+	local_irq_save(flags);
 	return flags;
 }
 
-static inline void
-u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp,
-				unsigned long flags)
+static inline void __u64_stats_irqrestore(unsigned long flags)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	write_seqcount_end(&syncp->seq);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
-	else
-		local_irq_restore(flags);
-#endif
+	local_irq_restore(flags);
 }
 
 static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
 	return read_seqcount_begin(&syncp->seq);
-#else
-	return 0;
-#endif
+}
+
+static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
+					   unsigned int start)
+{
+	return read_seqcount_retry(&syncp->seq, start);
+}
+#endif /* !64 bit */
+
+static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
+{
+	__u64_stats_update_begin(syncp);
+}
+
+static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
+{
+	__u64_stats_update_end(syncp);
+}
+
+static inline unsigned long u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
+{
+	unsigned long flags = __u64_stats_irqsave();
+
+	__u64_stats_update_begin(syncp);
+	return flags;
+}
+
+static inline void u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp,
+						   unsigned long flags)
+{
+	__u64_stats_update_end(syncp);
+	__u64_stats_irqrestore(flags);
 }
 
 static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT))
-	preempt_disable();
-#endif
 	return __u64_stats_fetch_begin(syncp);
 }
 
-static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
-					 unsigned int start)
-{
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	return read_seqcount_retry(&syncp->seq, start);
-#else
-	return false;
-#endif
-}
-
 static inline bool u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
 					 unsigned int start)
 {
-#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT))
-	preempt_enable();
-#endif
 	return __u64_stats_fetch_retry(syncp, start);
 }
 
-/*
- * In case irq handlers can update u64 counters, readers can use following helpers
- * - SMP 32bit arches use seqcount protection, irq safe.
- * - UP 32bit must disable irqs.
- * - 64bit have no problem atomically reading u64 values, irq safe.
- */
+/* Obsolete interfaces */
 static inline unsigned int u64_stats_fetch_begin_irq(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT)
-	preempt_disable();
-#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP)
-	local_irq_disable();
-#endif
-	return __u64_stats_fetch_begin(syncp);
+	return u64_stats_fetch_begin(syncp);
 }
 
 static inline bool u64_stats_fetch_retry_irq(const struct u64_stats_sync *syncp,
 					     unsigned int start)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT)
-	preempt_enable();
-#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP)
-	local_irq_enable();
-#endif
-	return __u64_stats_fetch_retry(syncp, start);
+	return u64_stats_fetch_retry(syncp, start);
 }
 
 #endif /* _LINUX_U64_STATS_SYNC_H */
-- 
2.37.2



* Re: [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 ` [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
@ 2022-08-26  7:52   ` Christian Brauner
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2022-08-26  7:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Linus Torvalds, Matthew Wilcox, Alexander Viro, linux-fsdevel

On Thu, Aug 25, 2022 at 06:41:25PM +0200, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Replace the open coded CONFIG_PREEMPT_RT conditional
> preempt_disable/enable() with the new helper.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH v2 3/8] mm/vmstat: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 ` [PATCH v2 3/8] mm/vmstat: " Sebastian Andrzej Siewior
@ 2022-09-01 14:41   ` Michal Hocko
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2022-09-01 14:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Linus Torvalds, Matthew Wilcox, Andrew Morton, linux-mm

On Thu 25-08-22 18:41:26, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Replace the open coded CONFIG_PREEMPT_RT conditional
> preempt_enable/disable() pairs with the new helper functions which hide
> the underlying implementation details.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!

> ---
>  mm/vmstat.c | 36 ++++++++++++------------------------
>  1 file changed, 12 insertions(+), 24 deletions(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 373d2730fcf21..d514fe7f90af0 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -355,8 +355,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  	 * CPU migrations and preemption potentially corrupts a counter so
>  	 * disable preemption.
>  	 */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	x = delta + __this_cpu_read(*p);
>  
> @@ -368,8 +367,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
>  	}
>  	__this_cpu_write(*p, x);
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  EXPORT_SYMBOL(__mod_zone_page_state);
>  
> @@ -393,8 +391,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>  	}
>  
>  	/* See __mod_node_page_state */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	x = delta + __this_cpu_read(*p);
>  
> @@ -406,8 +403,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
>  	}
>  	__this_cpu_write(*p, x);
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  EXPORT_SYMBOL(__mod_node_page_state);
>  
> @@ -441,8 +437,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>  	s8 v, t;
>  
>  	/* See __mod_node_page_state */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	v = __this_cpu_inc_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -453,8 +448,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
>  		__this_cpu_write(*p, -overstep);
>  	}
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  
>  void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -466,8 +460,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>  
>  	/* See __mod_node_page_state */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	v = __this_cpu_inc_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -478,8 +471,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  		__this_cpu_write(*p, -overstep);
>  	}
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  
>  void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
> @@ -501,8 +493,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>  	s8 v, t;
>  
>  	/* See __mod_node_page_state */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	v = __this_cpu_dec_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -513,8 +504,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
>  		__this_cpu_write(*p, overstep);
>  	}
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  
>  void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
> @@ -526,8 +516,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
>  
>  	/* See __mod_node_page_state */
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_disable();
> +	preempt_disable_nested();
>  
>  	v = __this_cpu_dec_return(*p);
>  	t = __this_cpu_read(pcp->stat_threshold);
> @@ -538,8 +527,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
>  		__this_cpu_write(*p, overstep);
>  	}
>  
> -	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> -		preempt_enable();
> +	preempt_enable_nested();
>  }
>  
>  void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
> -- 
> 2.37.2

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
  2022-08-25 16:41 ` [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() Sebastian Andrzej Siewior
@ 2022-09-01 14:41   ` Michal Hocko
  2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2022-09-01 14:41 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Linus Torvalds, Matthew Wilcox, Andrew Morton, linux-mm

On Thu 25-08-22 18:41:27, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Some places in the VM code expect interrupts to be disabled, which is a
> valid expectation on non-PREEMPT_RT kernels, but it does not always hold
> on RT kernels because the RT spinlock substitution does not disable
> interrupts.
> 
> To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
> provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and
> PREEMPT_RT=n.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!

> ---
>  include/linux/mmdebug.h | 6 ++++++
>  lib/Kconfig.debug       | 3 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 15ae78cd28536..b8728d11c9490 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -94,6 +94,12 @@ void dump_mm(const struct mm_struct *mm);
>  #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
>  #endif
>  
> +#ifdef CONFIG_DEBUG_VM_IRQSOFF
> +#define VM_WARN_ON_IRQS_ENABLED() WARN_ON_ONCE(!irqs_disabled())
> +#else
> +#define VM_WARN_ON_IRQS_ENABLED() do { } while (0)
> +#endif
> +
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  #define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
>  #else
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 072e4b289c13e..c96fc6820544c 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -803,6 +803,9 @@ config ARCH_HAS_DEBUG_VM_PGTABLE
>  	  An architecture should select this when it can successfully
>  	  build and run DEBUG_VM_PGTABLE.
>  
> +config DEBUG_VM_IRQSOFF
> +	def_bool DEBUG_VM && !PREEMPT_RT
> +
>  config DEBUG_VM
>  	bool "Debug VM"
>  	depends on DEBUG_KERNEL
> -- 
> 2.37.2

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2 5/8] mm/memcontrol: Replace the PREEMPT_RT conditionals
@ 2022-09-01 14:45     ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2022-09-01 14:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Steven Rostedt,
	Linus Torvalds, Matthew Wilcox, Johannes Weiner, Roman Gushchin,
	Shakeel Butt, Muchun Song, cgroups, linux-mm

On Thu 25-08-22 18:41:28, Sebastian Andrzej Siewior wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
> replace the CONFIG_PREEMPT_RT #ifdeffery.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <songmuchun@bytedance.com>
> Cc: cgroups@vger.kernel.org
> Cc: linux-mm@kvack.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!

> ---
>  mm/memcontrol.c | 19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b69979c9ced5c..d35b6fa560f0a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -597,25 +597,18 @@ static u64 flush_next_time;
>   */
>  static void memcg_stats_lock(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT
> -      preempt_disable();
> -#else
> -      VM_BUG_ON(!irqs_disabled());
> -#endif
> +	preempt_disable_nested();
> +	VM_WARN_ON_IRQS_ENABLED();
>  }
>  
>  static void __memcg_stats_lock(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT
> -      preempt_disable();
> -#endif
> +	preempt_disable_nested();
>  }
>  
>  static void memcg_stats_unlock(void)
>  {
> -#ifdef CONFIG_PREEMPT_RT
> -      preempt_enable();
> -#endif
> +	preempt_enable_nested();
>  }
>  
>  static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
> @@ -715,7 +708,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  	 * interrupt context while other caller need to have disabled interrupt.
>  	 */
>  	__memcg_stats_lock();
> -	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
>  		switch (idx) {
>  		case NR_ANON_MAPPED:
>  		case NR_FILE_MAPPED:
> @@ -725,7 +718,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>  			WARN_ON_ONCE(!in_task());
>  			break;
>  		default:
> -			WARN_ON_ONCE(!irqs_disabled());
> +			VM_WARN_ON_IRQS_ENABLED();
>  		}
>  	}
>  
> -- 
> 2.37.2

-- 
Michal Hocko
SUSE Labs


* [tip: sched/rt] u64_stats: Streamline the implementation
  2022-08-25 16:41 ` [PATCH v2 8/8] u64_stats: Streamline the implementation Sebastian Andrzej Siewior
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     44b0c2957adc62b86fcd51adeaf8e993171bc319
Gitweb:        https://git.kernel.org/tip/44b0c2957adc62b86fcd51adeaf8e993171bc319
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:31 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

u64_stats: Streamline the implementation

The u64 stats code handles 3 different cases:

  - 32bit UP
  - 32bit SMP
  - 64bit

with an unreadable #ifdef maze, which was recently expanded with PREEMPT_RT
conditionals.

Reduce it to two cases (32bit and 64bit) and drop the optimization for
32bit UP as suggested by Linus.

Use the new preempt_disable/enable_nested() helpers to get rid of the
CONFIG_PREEMPT_RT conditionals.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825164131.402717-9-bigeasy@linutronix.de

---
 include/linux/u64_stats_sync.h | 145 ++++++++++++++------------------
 1 file changed, 64 insertions(+), 81 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 6ad4e90..46040d6 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -8,7 +8,7 @@
  *
  * Key points :
  *
- * -  Use a seqcount on 32-bit SMP, only disable preemption for 32-bit UP.
+ * -  Use a seqcount on 32-bit
  * -  The whole thing is a no-op on 64-bit architectures.
  *
  * Usage constraints:
@@ -20,7 +20,8 @@
  *    writer and also spin forever.
  *
  * 3) Write side must use the _irqsave() variant if other writers, or a reader,
- *    can be invoked from an IRQ context.
+ *    can be invoked from an IRQ context. On 64bit systems this variant does not
+ *    disable interrupts.
  *
  * 4) If reader fetches several counters, there is no guarantee the whole values
  *    are consistent w.r.t. each other (remember point #2: seqcounts are not
@@ -29,11 +30,6 @@
  * 5) Readers are allowed to sleep or be preempted/interrupted: they perform
  *    pure reads.
  *
- * 6) Readers must use both u64_stats_fetch_{begin,retry}_irq() if the stats
- *    might be updated from a hardirq or softirq context (remember point #1:
- *    seqcounts are not used for UP kernels). 32-bit UP stat readers could read
- *    corrupted 64-bit values otherwise.
- *
  * Usage :
  *
  * Stats producer (writer) should use following template granted it already got
@@ -66,7 +62,7 @@
 #include <linux/seqlock.h>
 
 struct u64_stats_sync {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
+#if BITS_PER_LONG == 32
 	seqcount_t	seq;
 #endif
 };
@@ -98,7 +94,22 @@ static inline void u64_stats_inc(u64_stats_t *p)
 	local64_inc(&p->v);
 }
 
-#else
+static inline void u64_stats_init(struct u64_stats_sync *syncp) { }
+static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { }
+static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { }
+static inline unsigned long __u64_stats_irqsave(void) { return 0; }
+static inline void __u64_stats_irqrestore(unsigned long flags) { }
+static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
+{
+	return 0;
+}
+static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
+					   unsigned int start)
+{
+	return false;
+}
+
+#else /* 64 bit */
 
 typedef struct {
 	u64		v;
@@ -123,123 +134,95 @@ static inline void u64_stats_inc(u64_stats_t *p)
 {
 	p->v++;
 }
-#endif
 
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-#define u64_stats_init(syncp)	seqcount_init(&(syncp)->seq)
-#else
 static inline void u64_stats_init(struct u64_stats_sync *syncp)
 {
+	seqcount_init(&syncp->seq);
 }
-#endif
 
-static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
+static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 	write_seqcount_begin(&syncp->seq);
-#endif
 }
 
-static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
+static inline void __u64_stats_update_end(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
 	write_seqcount_end(&syncp->seq);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
-#endif
+	preempt_enable_nested();
 }
 
-static inline unsigned long
-u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
+static inline unsigned long __u64_stats_irqsave(void)
 {
-	unsigned long flags = 0;
+	unsigned long flags;
 
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
-	else
-		local_irq_save(flags);
-	write_seqcount_begin(&syncp->seq);
-#endif
+	local_irq_save(flags);
 	return flags;
 }
 
-static inline void
-u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp,
-				unsigned long flags)
+static inline void __u64_stats_irqrestore(unsigned long flags)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	write_seqcount_end(&syncp->seq);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
-	else
-		local_irq_restore(flags);
-#endif
+	local_irq_restore(flags);
 }
 
 static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
 	return read_seqcount_begin(&syncp->seq);
-#else
-	return 0;
-#endif
 }
 
-static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
+static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
+					   unsigned int start)
 {
-#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT))
-	preempt_disable();
-#endif
-	return __u64_stats_fetch_begin(syncp);
+	return read_seqcount_retry(&syncp->seq, start);
 }
+#endif /* !64 bit */
 
-static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
-					 unsigned int start)
+static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT))
-	return read_seqcount_retry(&syncp->seq, start);
-#else
-	return false;
-#endif
+	__u64_stats_update_begin(syncp);
+}
+
+static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
+{
+	__u64_stats_update_end(syncp);
+}
+
+static inline unsigned long u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
+{
+	unsigned long flags = __u64_stats_irqsave();
+
+	__u64_stats_update_begin(syncp);
+	return flags;
+}
+
+static inline void u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp,
+						   unsigned long flags)
+{
+	__u64_stats_update_end(syncp);
+	__u64_stats_irqrestore(flags);
+}
+
+static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp)
+{
+	return __u64_stats_fetch_begin(syncp);
 }
 
 static inline bool u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
 					 unsigned int start)
 {
-#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT))
-	preempt_enable();
-#endif
 	return __u64_stats_fetch_retry(syncp, start);
 }
 
-/*
- * In case irq handlers can update u64 counters, readers can use following helpers
- * - SMP 32bit arches use seqcount protection, irq safe.
- * - UP 32bit must disable irqs.
- * - 64bit have no problem atomically reading u64 values, irq safe.
- */
+/* Obsolete interfaces */
 static inline unsigned int u64_stats_fetch_begin_irq(const struct u64_stats_sync *syncp)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT)
-	preempt_disable();
-#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP)
-	local_irq_disable();
-#endif
-	return __u64_stats_fetch_begin(syncp);
+	return u64_stats_fetch_begin(syncp);
 }
 
 static inline bool u64_stats_fetch_retry_irq(const struct u64_stats_sync *syncp,
 					     unsigned int start)
 {
-#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT)
-	preempt_enable();
-#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP)
-	local_irq_enable();
-#endif
-	return __u64_stats_fetch_retry(syncp, start);
+	return u64_stats_fetch_retry(syncp, start);
 }
 
 #endif /* _LINUX_U64_STATS_SYNC_H */

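For illustration, a minimal usage sketch of the streamlined interface (the
my_stats container and helper names are hypothetical; the u64_stats_* calls
are the ones defined above, and u64_stats_init() must run once before use):

#include <linux/u64_stats_sync.h>

struct my_stats {
	u64_stats_t		packets;
	u64_stats_t		bytes;
	struct u64_stats_sync	syncp;
};

/* Writer: on 32bit the seqcount write side rules from the header apply. */
static void my_stats_add(struct my_stats *s, unsigned int len)
{
	u64_stats_update_begin(&s->syncp);
	u64_stats_inc(&s->packets);
	u64_stats_add(&s->bytes, len);
	u64_stats_update_end(&s->syncp);
}

/* Reader: retries until it observes a consistent snapshot of both values. */
static void my_stats_read(struct my_stats *s, u64 *packets, u64 *bytes)
{
	unsigned int start;

	do {
		start = u64_stats_fetch_begin(&s->syncp);
		*packets = u64_stats_read(&s->packets);
		*bytes = u64_stats_read(&s->bytes);
	} while (u64_stats_fetch_retry(&s->syncp, start));
}
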
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] flex_proportions: Disable preemption entering the write section.
  2022-08-25 16:41 ` [PATCH v2 7/8] flex_proportions: Disable preemption entering the write section Sebastian Andrzej Siewior
@ 2022-09-19 12:37   ` tip-bot2 for Sebastian Andrzej Siewior
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Sebastian Andrzej Siewior @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     9458e0a78c45bc6537ce11eb9f03489eab92f9c2
Gitweb:        https://git.kernel.org/tip/9458e0a78c45bc6537ce11eb9f03489eab92f9c2
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:30 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

flex_proportions: Disable preemption entering the write section.

The seqcount fprop_global::sequence is not associated with a lock. The
write section (fprop_new_period()) is invoked from a timer, and since the
softirq is preemptible on PREEMPT_RT it is possible to preempt the write
section, which is not desired.

Disable preemption around the write section on PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220825164131.402717-8-bigeasy@linutronix.de

---
 lib/flex_proportions.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
index 05cccbc..83332fe 100644
--- a/lib/flex_proportions.c
+++ b/lib/flex_proportions.c
@@ -70,6 +70,7 @@ bool fprop_new_period(struct fprop_global *p, int periods)
 	 */
 	if (events <= 1)
 		return false;
+	preempt_disable_nested();
 	write_seqcount_begin(&p->sequence);
 	if (periods < 64)
 		events -= events >> periods;
@@ -77,6 +78,7 @@ bool fprop_new_period(struct fprop_global *p, int periods)
 	percpu_counter_add(&p->events, -events);
 	p->period += periods;
 	write_seqcount_end(&p->sequence);
+	preempt_enable_nested();
 
 	return true;
 }

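The same recipe applies to any seqcount that is not associated with a lock;
a minimal sketch under that assumption (stats_seq and period_update() are
hypothetical names):

#include <linux/preempt.h>
#include <linux/seqlock.h>

static seqcount_t stats_seq = SEQCNT_ZERO(stats_seq);

/*
 * The caller is expected to run preemption-disabled on !PREEMPT_RT
 * (e.g. from timer/softirq context as above); preempt_disable_nested()
 * then only asserts that, while on PREEMPT_RT it disables preemption.
 */
static void period_update(void)
{
	preempt_disable_nested();
	write_seqcount_begin(&stats_seq);
	/* ... update the data protected by stats_seq ... */
	write_seqcount_end(&stats_seq);
	preempt_enable_nested();
}
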
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] mm/compaction: Get rid of RT ifdeffery
  2022-08-25 16:41 ` [PATCH v2 6/8] mm/compaction: Get rid of RT ifdeffery Sebastian Andrzej Siewior
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     c7e0b3d088717d148707cd6fcb12f97c6fd961c1
Gitweb:        https://git.kernel.org/tip/c7e0b3d088717d148707cd6fcb12f97c6fd961c1
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:29 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

mm/compaction: Get rid of RT ifdeffery

Move the RT dependency for the initial value of
sysctl_compact_unevictable_allowed into Kconfig.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825164131.402717-7-bigeasy@linutronix.de

---
 mm/Kconfig      | 6 ++++++
 mm/compaction.c | 6 +-----
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 0331f14..3897e92 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -579,6 +579,12 @@ config COMPACTION
 	  it and then we would be really interested to hear about that at
 	  linux-mm@kvack.org.
 
+config COMPACT_UNEVICTABLE_DEFAULT
+	int
+	depends on COMPACTION
+	default 0 if PREEMPT_RT
+	default 1
+
 #
 # support for free page reporting
 config PAGE_REPORTING
diff --git a/mm/compaction.c b/mm/compaction.c
index 640fa76..10561cb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1727,11 +1727,7 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
-#ifdef CONFIG_PREEMPT_RT
-int sysctl_compact_unevictable_allowed __read_mostly = 0;
-#else
-int sysctl_compact_unevictable_allowed __read_mostly = 1;
-#endif
+int sysctl_compact_unevictable_allowed __read_mostly = CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)

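For readers less familiar with Kconfig int symbols: the symbol surfaces as a
CONFIG_* define in the generated include/generated/autoconf.h, so the
initializer compiles down to a plain constant. Roughly (a sketch, not the
literal generated file):

/*
 * Sketch of what Kconfig generates in include/generated/autoconf.h:
 *
 *   PREEMPT_RT=y:  #define CONFIG_COMPACT_UNEVICTABLE_DEFAULT 0
 *   PREEMPT_RT=n:  #define CONFIG_COMPACT_UNEVICTABLE_DEFAULT 1
 *
 * so the initializer compiles to a plain constant without ifdeffery:
 */
int sysctl_compact_unevictable_allowed __read_mostly =
		CONFIG_COMPACT_UNEVICTABLE_DEFAULT;
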
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] mm/memcontrol: Replace the PREEMPT_RT conditionals
  2022-08-25 16:41   ` Sebastian Andrzej Siewior
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  -1 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior, Muchun Song,
	Johannes Weiner, Peter Zijlstra (Intel),
	Michal Hocko, x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     e575d401583273a7ac5dfb27520e41c821e81816
Gitweb:        https://git.kernel.org/tip/e575d401583273a7ac5dfb27520e41c821e81816
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:28 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

mm/memcontrol: Replace the PREEMPT_RT conditionals

Use VM_WARN_ON_IRQS_ENABLED() and preempt_disable/enable_nested() to
replace the CONFIG_PREEMPT_RT #ifdeffery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/r/20220825164131.402717-6-bigeasy@linutronix.de

---
 mm/memcontrol.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b69979c..d35b6fa 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -597,25 +597,18 @@ static u64 flush_next_time;
  */
 static void memcg_stats_lock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_disable();
-#else
-      VM_BUG_ON(!irqs_disabled());
-#endif
+	preempt_disable_nested();
+	VM_WARN_ON_IRQS_ENABLED();
 }
 
 static void __memcg_stats_lock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_disable();
-#endif
+	preempt_disable_nested();
 }
 
 static void memcg_stats_unlock(void)
 {
-#ifdef CONFIG_PREEMPT_RT
-      preempt_enable();
-#endif
+	preempt_enable_nested();
 }
 
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
@@ -715,7 +708,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	 * interrupt context while other caller need to have disabled interrupt.
 	 */
 	__memcg_stats_lock();
-	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
+	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
 		switch (idx) {
 		case NR_ANON_MAPPED:
 		case NR_FILE_MAPPED:
@@ -725,7 +718,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			WARN_ON_ONCE(!in_task());
 			break;
 		default:
-			WARN_ON_ONCE(!irqs_disabled());
+			VM_WARN_ON_IRQS_ENABLED();
 		}
 	}
 

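A condensed sketch of how an update path uses the reworked helpers, as it
would appear inside mm/memcontrol.c (the function below is hypothetical; a
real caller is __mod_memcg_lruvec_state() shown above):

/*
 * On !PREEMPT_RT the caller has interrupts disabled and the helper only
 * asserts that; on PREEMPT_RT it disables preemption around the RMW.
 */
static void hypothetical_mod_state(struct mem_cgroup *memcg, int idx, int val)
{
	memcg_stats_lock();
	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
	memcg_stats_unlock();
}
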
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
  2022-08-25 16:41 ` [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() Sebastian Andrzej Siewior
  2022-09-01 14:41   ` Michal Hocko
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel),
	Michal Hocko, x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     a738e9bad6030d4fd33bfd7db3399a250b7e94d8
Gitweb:        https://git.kernel.org/tip/a738e9bad6030d4fd33bfd7db3399a250b7e94d8
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:27 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()

Some places in the VM code expect interrupts to be disabled. That is a valid
expectation on non-PREEMPT_RT kernels, but it does not hold on RT kernels
because the RT spinlock substitution does not disable interrupts.

To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
provide VM_WARN_ON_IRQS_ENABLED(), which is only enabled when DEBUG_VM=y and
PREEMPT_RT=n.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/r/20220825164131.402717-5-bigeasy@linutronix.de

---
 include/linux/mmdebug.h | 6 ++++++
 lib/Kconfig.debug       | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 15ae78c..b8728d1 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -94,6 +94,12 @@ void dump_mm(const struct mm_struct *mm);
 #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
 #endif
 
+#ifdef CONFIG_DEBUG_VM_IRQSOFF
+#define VM_WARN_ON_IRQS_ENABLED() WARN_ON_ONCE(!irqs_disabled())
+#else
+#define VM_WARN_ON_IRQS_ENABLED() do { } while (0)
+#endif
+
 #ifdef CONFIG_DEBUG_VIRTUAL
 #define VIRTUAL_BUG_ON(cond) BUG_ON(cond)
 #else
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index bcbe60d..cdb4b27 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -803,6 +803,9 @@ config ARCH_HAS_DEBUG_VM_PGTABLE
 	  An architecture should select this when it can successfully
 	  build and run DEBUG_VM_PGTABLE.
 
+config DEBUG_VM_IRQSOFF
+	def_bool DEBUG_VM && !PREEMPT_RT
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL

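Usage then reduces to a single line in code that relies on hard interrupt
disable on !PREEMPT_RT; a minimal sketch (the function name is hypothetical):

#include <linux/mmdebug.h>

static void update_assumes_irqs_off(void)
{
	/* Compiles away unless CONFIG_DEBUG_VM_IRQSOFF=y (DEBUG_VM && !PREEMPT_RT). */
	VM_WARN_ON_IRQS_ENABLED();
	/* ... RMW sequence which relies on IRQs being off on !RT ... */
}
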
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] mm/vmstat: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 ` [PATCH v2 3/8] mm/vmstat: " Sebastian Andrzej Siewior
  2022-09-01 14:41   ` Michal Hocko
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel),
	Michal Hocko, x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     7a025e91abd23effe869a05d037b26770ffa0309
Gitweb:        https://git.kernel.org/tip/7a025e91abd23effe869a05d037b26770ffa0309
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:26 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:08 +02:00

mm/vmstat: Use preempt_[dis|en]able_nested()

Replace the open-coded CONFIG_PREEMPT_RT conditional
preempt_enable/disable() pairs with the new helper functions, which hide
the underlying implementation details.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/r/20220825164131.402717-4-bigeasy@linutronix.de

---
 mm/vmstat.c | 36 ++++++++++++------------------------
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 90af9a8..7a2d73f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -355,8 +355,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 	 * CPU migrations and preemption potentially corrupts a counter so
 	 * disable preemption.
 	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	x = delta + __this_cpu_read(*p);
 
@@ -368,8 +367,7 @@ void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 	}
 	__this_cpu_write(*p, x);
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -393,8 +391,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
 	}
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	x = delta + __this_cpu_read(*p);
 
@@ -406,8 +403,7 @@ void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
 	}
 	__this_cpu_write(*p, x);
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -441,8 +437,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 	s8 v, t;
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_inc_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -453,8 +448,7 @@ void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 		__this_cpu_write(*p, -overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -466,8 +460,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_inc_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -478,8 +471,7 @@ void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 		__this_cpu_write(*p, -overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -501,8 +493,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 	s8 v, t;
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_dec_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -513,8 +504,7 @@ void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 		__this_cpu_write(*p, overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -526,8 +516,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 	VM_WARN_ON_ONCE(vmstat_item_in_bytes(item));
 
 	/* See __mod_node_page_state */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 
 	v = __this_cpu_dec_return(*p);
 	t = __this_cpu_read(pcp->stat_threshold);
@@ -538,8 +527,7 @@ void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
 		__this_cpu_write(*p, overstep);
 	}
 
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)

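All six converted functions share one shape; condensed into a single
hypothetical helper for illustration:

#include <linux/math.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

static void mod_pcp_counter(s8 __percpu *p, s8 delta, s8 threshold)
{
	s8 v;

	/*
	 * RT: disable preemption around the per-CPU RMW.
	 * !RT: no-op plus a lockdep assert, the caller already runs
	 * with preemption disabled.
	 */
	preempt_disable_nested();

	v = __this_cpu_add_return(*p, delta);
	if (abs(v) > threshold) {
		/* ... fold the per-CPU delta into the global counter ... */
		__this_cpu_write(*p, 0);
	}

	preempt_enable_nested();
}
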
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] dentry: Use preempt_[dis|en]able_nested()
  2022-08-25 16:41 ` [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
  2022-08-26  7:52   ` Christian Brauner
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel), Christian Brauner (Microsoft),
	x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     93f6d4e1893657f07ba3c9e2bfa74b355a0b32f9
Gitweb:        https://git.kernel.org/tip/93f6d4e1893657f07ba3c9e2bfa74b355a0b32f9
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:25 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:07 +02:00

dentry: Use preempt_[dis|en]able_nested()

Replace the open-coded CONFIG_PREEMPT_RT conditional
preempt_disable/enable() with the new helper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20220825164131.402717-3-bigeasy@linutronix.de

---
 fs/dcache.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bb0c4d0..2ee8636 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2597,15 +2597,7 @@ EXPORT_SYMBOL(d_rehash);
 
 static inline unsigned start_dir_add(struct inode *dir)
 {
-	/*
-	 * The caller holds a spinlock (dentry::d_lock). On !PREEMPT_RT
-	 * kernels spin_lock() implicitly disables preemption, but not on
-	 * PREEMPT_RT.  So for RT it has to be done explicitly to protect
-	 * the sequence count write side critical section against a reader
-	 * or another writer preempting, which would result in a live lock.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_disable();
+	preempt_disable_nested();
 	for (;;) {
 		unsigned n = dir->i_dir_seq;
 		if (!(n & 1) && cmpxchg(&dir->i_dir_seq, n, n + 1) == n)
@@ -2618,8 +2610,7 @@ static inline void end_dir_add(struct inode *dir, unsigned int n,
 			       wait_queue_head_t *d_wait)
 {
 	smp_store_release(&dir->i_dir_seq, n + 2);
-	if (IS_ENABLED(CONFIG_PREEMPT_RT))
-		preempt_enable();
+	preempt_enable_nested();
 	wake_up_all(d_wait);
 }
 

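For context, the reader which start_dir_add() must not live-lock against is
an open-coded sequence retry on inode::i_dir_seq; a simplified sketch of that
side (helper names hypothetical, the real loop lives in d_alloc_parallel()):

#include <linux/fs.h>

static unsigned int dir_seq_begin(struct inode *dir)
{
	unsigned int seq;

	/* Spin until no writer holds the sequence (odd value). */
	do {
		seq = smp_load_acquire(&dir->i_dir_seq);
	} while (seq & 1);
	return seq;
}

static bool dir_seq_retry(struct inode *dir, unsigned int seq)
{
	/* A changed sequence means a directory update raced the lookup. */
	return READ_ONCE(dir->i_dir_seq) != seq;
}

A writer preempted inside start_dir_add() would leave the sequence odd, and a
higher-priority reader would spin in dir_seq_begin() forever. That is exactly
the live lock the preempt disable prevents on RT.
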
^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/rt] preempt: Provide preempt_[dis|en]able_nested()
  2022-08-25 16:41 ` [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
@ 2022-09-19 12:37   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-09-19 12:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Thomas Gleixner, Sebastian Andrzej Siewior,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the sched/rt branch of tip:

Commit-ID:     555bb4ccd1dd78d0263eae31629fe1fdd65c1fb5
Gitweb:        https://git.kernel.org/tip/555bb4ccd1dd78d0263eae31629fe1fdd65c1fb5
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 25 Aug 2022 18:41:24 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 19 Sep 2022 14:35:07 +02:00

preempt: Provide preempt_[dis|en]able_nested()

On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither disabling
preemption nor interrupts. Though there are a few places which depend on
the implicit preemption/interrupt disable of those locks, e.g. seqcount
write sections, per CPU statistics updates etc.

To avoid sprinkling CONFIG_PREEMPT_RT conditionals all over the place, add
preempt_disable_nested() and preempt_enable_nested() which should be
descriptive enough.

Add a lockdep assertion for the !PREEMPT_RT case to catch callers which
do not have preemption disabled.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825164131.402717-2-bigeasy@linutronix.de

---
 include/linux/preempt.h | 42 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index b4381f2..0df425b 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -421,4 +421,46 @@ static inline void migrate_enable(void) { }
 
 #endif /* CONFIG_SMP */
 
+/**
+ * preempt_disable_nested - Disable preemption inside a normally preempt disabled section
+ *
+ * Use for code which requires preemption protection inside a critical
+ * section which has preemption disabled implicitly on non-PREEMPT_RT
+ * enabled kernels, by e.g.:
+ *  - holding a spinlock/rwlock
+ *  - soft interrupt context
+ *  - regular interrupt handlers
+ *
+ * On PREEMPT_RT enabled kernels spinlock/rwlock held sections, soft
+ * interrupt context and regular interrupt handlers are preemptible and
+ * only prevent migration. preempt_disable_nested() ensures that preemption
+ * is disabled for cases which require CPU local serialization even on
+ * PREEMPT_RT. For non-PREEMPT_RT kernels this is a NOP.
+ *
+ * The use cases are code sequences which are not serialized by a
+ * particular lock instance, e.g.:
+ *  - seqcount write side critical sections where the seqcount is not
+ *    associated to a particular lock and therefore the automatic
+ *    protection mechanism does not work. This prevents a live lock
+ *    against a preempting high priority reader.
+ *  - RMW per CPU variable updates like vmstat.
+ */
+/* Macro to avoid header recursion hell vs. lockdep */
+#define preempt_disable_nested()				\
+do {								\
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))			\
+		preempt_disable();				\
+	else							\
+		lockdep_assert_preemption_disabled();		\
+} while (0)
+
+/**
+ * preempt_enable_nested - Undo the effect of preempt_disable_nested()
+ */
+static __always_inline void preempt_enable_nested(void)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+}
+
 #endif /* __LINUX_PREEMPT_H */

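A minimal caller sketch under the documented contract (all names are
hypothetical): on !PREEMPT_RT the surrounding spin_lock() already disables
preemption and the helper merely asserts that; on PREEMPT_RT it supplies the
missing disable for the seqcount write side:

#include <linux/preempt.h>
#include <linux/seqlock.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(stats_lock);
static seqcount_t stats_seq = SEQCNT_ZERO(stats_seq);

static void stats_write(void)
{
	spin_lock(&stats_lock);		/* on RT: sleeping lock, preemptible */
	preempt_disable_nested();
	write_seqcount_begin(&stats_seq);
	/* ... write side critical section ... */
	write_seqcount_end(&stats_seq);
	preempt_enable_nested();
	spin_unlock(&stats_lock);
}

In new code a seqcount associated with stats_lock (seqcount_spinlock_t) would
get this protection automatically; per the kernel-doc above, the helpers exist
for the places where no such association is possible.
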
^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2022-08-25 16:41 [PATCH v2 0/8] Replace PREEMPT_RT ifdefs with preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
2022-08-25 16:41 ` [PATCH v2 1/8] preempt: Provide preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 2/8] dentry: Use preempt_[dis|en]able_nested() Sebastian Andrzej Siewior
2022-08-26  7:52   ` Christian Brauner
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 3/8] mm/vmstat: " Sebastian Andrzej Siewior
2022-09-01 14:41   ` Michal Hocko
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 4/8] mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() Sebastian Andrzej Siewior
2022-09-01 14:41   ` Michal Hocko
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 5/8] mm/memcontrol: Replace the PREEMPT_RT conditionals Sebastian Andrzej Siewior
2022-08-25 16:41   ` Sebastian Andrzej Siewior
2022-09-01 14:45   ` Michal Hocko
2022-09-01 14:45     ` Michal Hocko
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 6/8] mm/compaction: Get rid of RT ifdeffery Sebastian Andrzej Siewior
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
2022-08-25 16:41 ` [PATCH v2 7/8] flex_proportions: Disable preemption entering the write section Sebastian Andrzej Siewior
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Sebastian Andrzej Siewior
2022-08-25 16:41 ` [PATCH v2 8/8] u64_stats: Streamline the implementation Sebastian Andrzej Siewior
2022-09-19 12:37   ` [tip: sched/rt] " tip-bot2 for Thomas Gleixner
