* [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu

Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
 https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/

I just started working on this again while I have some time during paternity
leave ;-) So I thought I'd send it out again. There are no changes other than
the one debug patch added on top.

Next, I am going to refine the power results, as mentioned in Paul's comments
on the last cover letter.

Joel Fernandes (Google) (5):
rcu: Introduce call_rcu_lazy() API implementation
rcuscale: Add laziness and kfree tests
fs: Move call_rcu() to call_rcu_lazy() in some paths
rcutorture: Add test code for call_rcu_lazy()
debug: Toggle lazy at runtime and change flush jiffies

Vineeth Pillai (1):
rcu: shrinker for lazy rcu

fs/dcache.c                                   |   4 +-
fs/eventpoll.c                                |   2 +-
fs/file_table.c                               |   2 +-
fs/inode.c                                    |   2 +-
include/linux/rcu_segcblist.h                 |   1 +
include/linux/rcupdate.h                      |   6 +
include/linux/sched/sysctl.h                  |   3 +
kernel/rcu/Kconfig                            |   8 +
kernel/rcu/rcu.h                              |  12 +
kernel/rcu/rcu_segcblist.c                    |  15 +-
kernel/rcu/rcu_segcblist.h                    |  20 +-
kernel/rcu/rcuscale.c                         |  74 +++++-
kernel/rcu/rcutorture.c                       |  60 ++++-
kernel/rcu/tree.c                             | 131 ++++++----
kernel/rcu/tree.h                             |  10 +-
kernel/rcu/tree_nocb.h                        | 246 +++++++++++++++---
kernel/sysctl.c                               |  17 ++
.../selftests/rcutorture/configs/rcu/CFLIST   |   1 +
.../selftests/rcutorture/configs/rcu/TREE11   |  18 ++
.../rcutorture/configs/rcu/TREE11.boot        |   8 +
20 files changed, 536 insertions(+), 104 deletions(-)
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot

--
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 1/6] rcu: Introduce call_rcu_lazy() API implementation
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	Paul McKenney, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, rcu

Implement timer-based RCU lazy callback batching. The batch is flushed
whenever a certain amount of time has passed, or when the batch on a
particular CPU grows too big. A later patch will also flush the batch
under memory pressure.

To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we reuse the bypass lists to hold lazy CBs. The bypass list
length includes the lazy CBs, and a separate lazy_len counter is
introduced to track how many of the bypass CBs are lazy.
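
As an illustration, here is a minimal usage sketch (my_obj and
my_obj_free_rcu are made-up names for illustration, not part of this
patch):

  struct my_obj {
  	struct rcu_head rh;
  	/* ... payload ... */
  };

  static void my_obj_free_rcu(struct rcu_head *rh)
  {
  	kfree(container_of(rh, struct my_obj, rh));
  }

  /* Given struct my_obj *obj: the callback only frees memory, so
   * extra delay is harmless.
   */
  call_rcu_lazy(&obj->rh, my_obj_free_rcu);

On kernels built without CONFIG_RCU_LAZY, the rcupdate.h fallback maps
call_rcu_lazy() to plain call_rcu().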

Suggested-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_segcblist.h |   1 +
 include/linux/rcupdate.h      |   6 ++
 kernel/rcu/Kconfig            |   8 ++
 kernel/rcu/rcu.h              |  11 ++
 kernel/rcu/rcu_segcblist.c    |  15 ++-
 kernel/rcu/rcu_segcblist.h    |  20 +++-
 kernel/rcu/tree.c             | 130 +++++++++++++++---------
 kernel/rcu/tree.h             |  10 +-
 kernel/rcu/tree_nocb.h        | 185 ++++++++++++++++++++++++++--------
 9 files changed, 289 insertions(+), 97 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 659d13a7ddaa..9a992707917b 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -22,6 +22,7 @@ struct rcu_cblist {
 	struct rcu_head *head;
 	struct rcu_head **tail;
 	long len;
+	long lazy_len;
 };
 
 #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1a32036c918c..9191a3d88087 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_RCU_LAZY
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+#else
+#define call_rcu_lazy(head, func) call_rcu(head, func)
+#endif
+
 /* Internal to kernel */
 void rcu_init(void);
 extern int rcu_scheduler_active;
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4c..779b6e84006b 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -293,4 +293,12 @@ config TASKS_TRACE_RCU_READ_MB
 	  Say N here if you hate read-side memory barriers.
 	  Take the default if you are unsure.
 
+config RCU_LAZY
+	bool "RCU callback lazy invocation functionality"
+	depends on RCU_NOCB_CPU
+	default n
+	help
+	  To save power, batch RCU callbacks and flush after delay, memory
+	  pressure or callback list growing too big.
+
 endmenu # "RCU Subsystem"
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4916077119f3..608f6ab76c7f 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -463,6 +463,14 @@ enum rcutorture_type {
 	INVALID_RCU_FLAVOR
 };
 
+#if defined(CONFIG_RCU_LAZY)
+unsigned long rcu_lazy_get_jiffies_till_flush(void);
+void rcu_lazy_set_jiffies_till_flush(unsigned long j);
+#else
+static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
+static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
+#endif
+
 #if defined(CONFIG_TREE_RCU)
 void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 			    unsigned long *gp_seq);
@@ -472,6 +480,8 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 			       unsigned long c_old,
 			       unsigned long c);
 void rcu_gp_set_torture_wait(int duration);
+void rcu_force_call_rcu_to_lazy(bool force);
+
 #else
 static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
 					  int *flags, unsigned long *gp_seq)
@@ -490,6 +500,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 	do { } while (0)
 #endif
 static inline void rcu_gp_set_torture_wait(int duration) { }
+static inline void rcu_force_call_rcu_to_lazy(bool force) { }
 #endif
 
 #if IS_ENABLED(CONFIG_RCU_TORTURE_TEST) || IS_MODULE(CONFIG_RCU_TORTURE_TEST)
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index c54ea2b6a36b..776647cd2d6c 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -20,16 +20,21 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
 	rclp->head = NULL;
 	rclp->tail = &rclp->head;
 	rclp->len = 0;
+	rclp->lazy_len = 0;
 }
 
 /*
  * Enqueue an rcu_head structure onto the specified callback list.
  */
-void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
+void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp,
+			bool lazy)
 {
 	*rclp->tail = rhp;
 	rclp->tail = &rhp->next;
 	WRITE_ONCE(rclp->len, rclp->len + 1);
+
+	if (IS_ENABLED(CONFIG_RCU_LAZY) && lazy)
+		WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
 }
 
 /*
@@ -38,11 +43,12 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
  * element of the second rcu_cblist structure, but ensuring that the second
  * rcu_cblist structure, if initially non-empty, always appears non-empty
  * throughout the process.  If rdp is NULL, the second rcu_cblist structure
- * is instead initialized to empty.
+ * is instead initialized to empty. Also account for lazy_len for lazy CBs.
  */
 void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
 			      struct rcu_cblist *srclp,
-			      struct rcu_head *rhp)
+			      struct rcu_head *rhp,
+			      bool lazy)
 {
 	drclp->head = srclp->head;
 	if (drclp->head)
@@ -58,6 +64,9 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
 		srclp->tail = &rhp->next;
 		WRITE_ONCE(srclp->len, 1);
 	}
+
+	if (IS_ENABLED(CONFIG_RCU_LAZY) && rhp && lazy)
+		WRITE_ONCE(srclp->lazy_len, 1);
 }
 
 /*
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 431cee212467..8e90b34adb00 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -15,14 +15,30 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
 	return READ_ONCE(rclp->len);
 }
 
+/* Return number of lazy callbacks in the specified callback list. */
+static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
+{
+	if (IS_ENABLED(CONFIG_RCU_LAZY))
+		return READ_ONCE(rclp->lazy_len);
+	return 0;
+}
+
+static inline void rcu_cblist_reset_lazy_len(struct rcu_cblist *rclp)
+{
+	if (IS_ENABLED(CONFIG_RCU_LAZY))
+		WRITE_ONCE(rclp->lazy_len, 0);
+}
+
 /* Return number of callbacks in segmented callback list by summing seglen. */
 long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
 
 void rcu_cblist_init(struct rcu_cblist *rclp);
-void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
+void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp,
+			bool lazy);
 void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
 			      struct rcu_cblist *srclp,
-			      struct rcu_head *rhp);
+			      struct rcu_head *rhp,
+			      bool lazy);
 struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp);
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a..e76fef8031be 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3058,47 +3058,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
 	raw_spin_unlock_rcu_node(rnp);
 }
 
-/**
- * call_rcu() - Queue an RCU callback for invocation after a grace period.
- * @head: structure to be used for queueing the RCU updates.
- * @func: actual callback function to be invoked after the grace period
- *
- * The callback function will be invoked some time after a full grace
- * period elapses, in other words after all pre-existing RCU read-side
- * critical sections have completed.  However, the callback function
- * might well execute concurrently with RCU read-side critical sections
- * that started after call_rcu() was invoked.
- *
- * RCU read-side critical sections are delimited by rcu_read_lock()
- * and rcu_read_unlock(), and may be nested.  In addition, but only in
- * v5.0 and later, regions of code across which interrupts, preemption,
- * or softirqs have been disabled also serve as RCU read-side critical
- * sections.  This includes hardware interrupt handlers, softirq handlers,
- * and NMI handlers.
- *
- * Note that all CPUs must agree that the grace period extended beyond
- * all pre-existing RCU read-side critical section.  On systems with more
- * than one CPU, this means that when "func()" is invoked, each CPU is
- * guaranteed to have executed a full memory barrier since the end of its
- * last RCU read-side critical section whose beginning preceded the call
- * to call_rcu().  It also means that each CPU executing an RCU read-side
- * critical section that continues beyond the start of "func()" must have
- * executed a memory barrier after the call_rcu() but before the beginning
- * of that RCU read-side critical section.  Note that these guarantees
- * include CPUs that are offline, idle, or executing in user mode, as
- * well as CPUs that are executing in the kernel.
- *
- * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
- * resulting RCU callback function "func()", then both CPU A and CPU B are
- * guaranteed to execute a full memory barrier during the time interval
- * between the call to call_rcu() and the invocation of "func()" -- even
- * if CPU A and CPU B are the same CPU (but again only if the system has
- * more than one CPU).
- *
- * Implementation of these memory-ordering guarantees is described here:
- * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
- */
-void call_rcu(struct rcu_head *head, rcu_callback_t func)
+static void
+__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
 {
 	static atomic_t doublefrees;
 	unsigned long flags;
@@ -3139,7 +3100,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
@@ -3161,8 +3122,86 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 		local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(call_rcu);
 
+#ifdef CONFIG_RCU_LAZY
+/**
+ * call_rcu_lazy() - Lazily queue RCU callback for invocation after grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed.
+ *
+ * Use this API instead of call_rcu() if you don't mind the callback being
+ * invoked after very long periods of time on systems without memory pressure
+ * and on systems which are lightly loaded or mostly idle.
+ *
+ * Other than the extra delay in callbacks being invoked, this function is
+ * identical to, and reuses call_rcu()'s logic. Refer to call_rcu() for more
+ * details about memory ordering and other functionality.
+ */
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, true);
+}
+EXPORT_SYMBOL_GPL(call_rcu_lazy);
+#endif
+
+static bool force_call_rcu_to_lazy;
+
+void rcu_force_call_rcu_to_lazy(bool force)
+{
+	if (IS_ENABLED(CONFIG_RCU_SCALE_TEST))
+		WRITE_ONCE(force_call_rcu_to_lazy, force);
+}
+EXPORT_SYMBOL_GPL(rcu_force_call_rcu_to_lazy);
+
+/**
+ * call_rcu() - Queue an RCU callback for invocation after a grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed.  However, the callback function
+ * might well execute concurrently with RCU read-side critical sections
+ * that started after call_rcu() was invoked.
+ *
+ * RCU read-side critical sections are delimited by rcu_read_lock()
+ * and rcu_read_unlock(), and may be nested.  In addition, but only in
+ * v5.0 and later, regions of code across which interrupts, preemption,
+ * or softirqs have been disabled also serve as RCU read-side critical
+ * sections.  This includes hardware interrupt handlers, softirq handlers,
+ * and NMI handlers.
+ *
+ * Note that all CPUs must agree that the grace period extended beyond
+ * all pre-existing RCU read-side critical section.  On systems with more
+ * than one CPU, this means that when "func()" is invoked, each CPU is
+ * guaranteed to have executed a full memory barrier since the end of its
+ * last RCU read-side critical section whose beginning preceded the call
+ * to call_rcu().  It also means that each CPU executing an RCU read-side
+ * critical section that continues beyond the start of "func()" must have
+ * executed a memory barrier after the call_rcu() but before the beginning
+ * of that RCU read-side critical section.  Note that these guarantees
+ * include CPUs that are offline, idle, or executing in user mode, as
+ * well as CPUs that are executing in the kernel.
+ *
+ * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
+ * resulting RCU callback function "func()", then both CPU A and CPU B are
+ * guaranteed to execute a full memory barrier during the time interval
+ * between the call to call_rcu() and the invocation of "func()" -- even
+ * if CPU A and CPU B are the same CPU (but again only if the system has
+ * more than one CPU).
+ *
+ * Implementation of these memory-ordering guarantees is described here:
+ * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
+ */
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, force_call_rcu_to_lazy);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
 
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
@@ -4056,7 +4095,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false,
+		     /* wake gp thread */ true));
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -4476,7 +4516,7 @@ void rcutree_migrate_callbacks(int cpu)
 	my_rdp = this_cpu_ptr(&rcu_data);
 	my_rnp = my_rdp->mynode;
 	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false, false));
 	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
 	/* Leverage recent GPs and set GP for new callbacks. */
 	needwake = rcu_advance_cbs(my_rnp, rdp) ||
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2ccf5845957d..7b1ddee6a159 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -267,8 +267,9 @@ struct rcu_data {
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE_BYPASS	1
-#define RCU_NOCB_WAKE		2
-#define RCU_NOCB_WAKE_FORCE	3
+#define RCU_NOCB_WAKE_LAZY	2
+#define RCU_NOCB_WAKE		3
+#define RCU_NOCB_WAKE_FORCE	4
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
@@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j);
+				  unsigned long j, bool lazy, bool wakegp);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags);
+				bool *was_alldone, unsigned long flags,
+				bool lazy);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e369efe94fda..61b1409bd4d6 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -256,6 +256,31 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
 }
 
+/*
+ * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
+ * can elapse before lazy callbacks are flushed. Lazy callbacks
+ * could be flushed much earlier for a number of other reasons;
+ * however, LAZY_FLUSH_JIFFIES ensures that no lazy callbacks
+ * remain unsubmitted to RCU for longer than this many jiffies.
+ */
+#define LAZY_FLUSH_JIFFIES (10 * HZ)
+unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+
+#ifdef CONFIG_RCU_LAZY
+// To be called only from test code.
+void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
+{
+	jiffies_till_flush = jif;
+}
+EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
+
+unsigned long rcu_lazy_get_jiffies_till_flush(void)
+{
+	return jiffies_till_flush;
+}
+EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
+#endif
+
 /*
  * Arrange to wake the GP kthread for this NOCB group at some future
  * time when it is safe to do so.
@@ -265,6 +290,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 {
 	unsigned long flags;
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+	unsigned long mod_jif = 0;
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
@@ -272,16 +298,32 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 * Bypass wakeup overrides previous deferments. In case
 	 * of callback storm, no need to wake up too early.
 	 */
-	if (waketype == RCU_NOCB_WAKE_BYPASS) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
-		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
-	} else {
+	switch (waketype) {
+	case RCU_NOCB_WAKE_LAZY:
+		if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
+			mod_jif = jiffies_till_flush;
+		break;
+
+	case RCU_NOCB_WAKE_BYPASS:
+		mod_jif = 2;
+		break;
+
+	case RCU_NOCB_WAKE:
+	case RCU_NOCB_WAKE_FORCE:
+		// If the type of deferred wake is "stronger"
+		// than it was before, make it wake up the soonest.
 		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
-			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
-		if (rdp_gp->nocb_defer_wakeup < waketype)
-			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+			mod_jif = 1;
+		break;
 	}
 
+	if (mod_jif)
+		mod_timer(&rdp_gp->nocb_timer, jiffies + mod_jif);
+
+	// If new type of wake up is stronger than before, promote.
+	if (rdp_gp->nocb_defer_wakeup < waketype)
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+
 	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
@@ -296,7 +338,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				     unsigned long j)
+				     unsigned long j, bool lazy)
 {
 	struct rcu_cblist rcl;
 
@@ -310,7 +352,9 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
 	if (rhp)
 		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
-	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
+
+	/* The lazy CBs are being flushed, but a new one might be enqueued. */
+	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp, lazy);
 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
 	rcu_nocb_bypass_unlock(rdp);
@@ -326,13 +370,20 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy, bool wake_gp)
 {
+	bool ret;
+
 	if (!rcu_rdp_is_offloaded(rdp))
 		return true;
 	rcu_lockdep_assert_cblist_protected(rdp);
 	rcu_nocb_bypass_lock(rdp);
-	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
+	ret = rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
+
+	if (wake_gp)
+		wake_nocb_gp(rdp, true);
+
+	return ret;
 }
 
 /*
@@ -345,7 +396,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
 	if (!rcu_rdp_is_offloaded(rdp) ||
 	    !rcu_nocb_bypass_trylock(rdp))
 		return;
-	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
+	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
 }
 
 /*
@@ -367,12 +418,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
  * there is only one CPU in operation.
  */
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags,
+				bool lazy)
 {
 	unsigned long c;
 	unsigned long cur_gp_seq;
 	unsigned long j = jiffies;
 	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	long n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 
 	lockdep_assert_irqs_disabled();
 
@@ -414,30 +467,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	}
 	WRITE_ONCE(rdp->nocb_nobypass_count, c);
 
-	// If there hasn't yet been all that many ->cblist enqueues
-	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
-	// ->nocb_bypass first.
-	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
+	// If caller passed a non-lazy CB and there hasn't yet been all that
+	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
+	// onto ->cblist.  But flush ->nocb_bypass first.  Also do so if the
+	// total number of CBs (lazy + non-lazy) grows too large.
+	//
+	// Note that if the bypass list has lazy CBs, and the main list is
+	// empty, and rhp happens to be non-lazy, then we end up flushing all
+	// the lazy CBs to the main list as well. That's the right thing to do,
+	// since we are kick-starting RCU GP processing anyway for the non-lazy
+	// one, we can just reuse that GP for the already queued-up lazy ones.
+	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
+	    (lazy && n_lazy_cbs >= qhimark)) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		if (*was_alldone)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-					    TPS("FirstQ"));
-		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
+					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, lazy, false));
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 		return false; // Caller must enqueue the callback.
 	}
 
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
-	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
-	    ncbs >= qhimark) {
+	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
-		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
+		if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
 			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 			if (*was_alldone)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-						    TPS("FirstQ"));
+						    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
 			WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 			return false; // Caller must enqueue the callback.
 		}
@@ -455,12 +515,18 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	rcu_nocb_wait_contended(rdp);
 	rcu_nocb_bypass_lock(rdp);
 	ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
-	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
+	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp, lazy);
+
 	if (!ncbs) {
 		WRITE_ONCE(rdp->nocb_bypass_first, j);
-		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstBQ"));
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
+				    lazy ? TPS("FirstLazyBQ") : TPS("FirstBQ"));
+	} else if (!n_lazy_cbs && lazy) {
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstLazyBQ"));
 	}
+
 	rcu_nocb_bypass_unlock(rdp);
 	smp_mb(); /* Order enqueue before wake. */
 	if (ncbs) {
@@ -493,7 +559,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 {
 	unsigned long cur_gp_seq;
 	unsigned long j;
-	long len;
+	long len, lazy_len, bypass_len;
 	struct task_struct *t;
 
 	// If we are being polled or there is no kthread, just leave.
@@ -506,9 +572,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	}
 	// Need to actually to a wakeup.
 	len = rcu_segcblist_n_cbs(&rdp->cblist);
+	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
-		if (!irqs_disabled_flags(flags)) {
+		// Only lazy CBs in bypass list
+		if (lazy_len && bypass_len == lazy_len) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
+					   TPS("WakeLazy"));
+		} else if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp(rdp, false);
@@ -599,8 +672,8 @@ static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
  */
 static void nocb_gp_wait(struct rcu_data *my_rdp)
 {
-	bool bypass = false;
-	long bypass_ncbs;
+	bool bypass = false, lazy = false;
+	long bypass_ncbs, lazy_ncbs;
 	int __maybe_unused cpu = my_rdp->cpu;
 	unsigned long cur_gp_seq;
 	unsigned long flags;
@@ -636,6 +709,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 */
 	list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
 		bool needwake_state = false;
+		bool flush_bypass = false;
 
 		if (!nocb_gp_enabled_cb(rdp))
 			continue;
@@ -648,22 +722,37 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			continue;
 		}
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-		if (bypass_ncbs &&
+		lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+
+		if (lazy_ncbs &&
+		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
+		     bypass_ncbs > 2 * qhimark)) {
+			flush_bypass = true;
+		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
 		     bypass_ncbs > 2 * qhimark)) {
-			// Bypass full or old, so flush it.
-			(void)rcu_nocb_try_flush_bypass(rdp, j);
-			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			flush_bypass = true;
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			if (needwake_state)
 				swake_up_one(&rdp->nocb_state_wq);
 			continue; /* No callbacks here, try next. */
 		}
+
+		if (flush_bypass) {
+			// Bypass full or old, so flush it.
+			(void)rcu_nocb_try_flush_bypass(rdp, j);
+			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+		}
+
 		if (bypass_ncbs) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-					    TPS("Bypass"));
-			bypass = true;
+				    bypass_ncbs == lazy_ncbs ? TPS("Lazy") : TPS("Bypass"));
+			if (bypass_ncbs == lazy_ncbs)
+				lazy = true;
+			else
+				bypass = true;
 		}
 		rnp = rdp->mynode;
 
@@ -713,12 +802,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
 
-	if (bypass && !rcu_nocb_poll) {
-		// At least one child with non-empty ->nocb_bypass, so set
-		// timer in order to avoid stranding its callbacks.
-		wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
-				   TPS("WakeBypassIsDeferred"));
+	// At least one child with non-empty ->nocb_bypass, so set
+	// timer in order to avoid stranding its callbacks.
+	if (!rcu_nocb_poll) {
+		// If bypass list only has lazy CBs. Add a deferred
+		// lazy wake up.
+		if (lazy && !bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
+					TPS("WakeLazyIsDeferred"));
+		// Otherwise add a deferred bypass wake up.
+		} else if (bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
+					TPS("WakeBypassIsDeferred"));
+		}
 	}
+
 	if (rcu_nocb_poll) {
 		/* Polling, so trace if first poll in the series. */
 		if (gotcbs)
@@ -999,7 +1097,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * return false, which means that future calls to rcu_nocb_try_bypass()
 	 * will refuse to put anything into the bypass.
 	 */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false, false));
 	/*
 	 * Start with invoking rcu_core() early. This way if the current thread
 	 * happens to preempt an ongoing call to rcu_core() in the middle,
@@ -1500,13 +1598,14 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 }
 
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy, bool wakegp)
 {
 	return true;
 }
 
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags,
+				bool lazy)
 {
 	return false;
 }
-- 
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 2/6] rcu: shrinker for lazy rcu
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Vineeth Pillai, Joel Fernandes, rushikesh.s.kadam, urezki,
	neeraj.iitr10, frederic, paulmck, rostedt, rcu

From: Vineeth Pillai <vineeth@bitbyteword.org>

The shrinker is used to speed up the freeing of memory potentially held
by RCU lazy callbacks. RCU kernel module test cases show this to be
effective. The test is introduced in a later patch.
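
For background, a shrinker's ->count_objects() reports how many objects
could be freed and ->scan_objects() frees up to sc->nr_to_scan of them;
here "freeing" means handing the lazy bypass CBs to RCU so any memory
they hold is reclaimed sooner. A rough way to exercise it from
userspace (an assumption on my part: drop_caches invokes all registered
shrinkers via drop_slab(), not just this one) would be:

  # Trigger registered shrinkers, including lazy_rcu_shrinker:
  echo 2 > /proc/sys/vm/drop_caches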

Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_nocb.h | 52 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 61b1409bd4d6..f8d7255f4f0a 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1249,6 +1249,55 @@ int rcu_nocb_cpu_offload(int cpu)
 }
 EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
 
+static unsigned long
+lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+		count += rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+	}
+
+	return count ? count : SHRINK_EMPTY;
+}
+
+static unsigned long
+lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long flags;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		int _count = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+
+		if (_count == 0)
+			continue;
+		rcu_nocb_lock_irqsave(rdp, flags);
+		rcu_cblist_reset_lazy_len(&rdp->nocb_bypass);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		wake_nocb_gp(rdp, false);
+		sc->nr_to_scan -= _count;
+		count += _count;
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+	return count ? count : SHRINK_STOP;
+}
+
+static struct shrinker lazy_rcu_shrinker = {
+	.count_objects = lazy_rcu_shrink_count,
+	.scan_objects = lazy_rcu_shrink_scan,
+	.batch = 0,
+	.seeks = DEFAULT_SEEKS,
+};
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -1286,6 +1335,9 @@ void __init rcu_init_nohz(void)
 	if (!rcu_state.nocb_is_setup)
 		return;
 
+	if (register_shrinker(&lazy_rcu_shrinker))
+		pr_err("Failed to register lazy_rcu shrinker!\n");
+
 #if defined(CONFIG_NO_HZ_FULL)
 	if (tick_nohz_full_running)
 		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
-- 
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 3/6] rcuscale: Add laziness and kfree tests
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu

We add 2 tests to rcuscale. One is a startup test that checks whether
we are neither too lazy nor too hard-working. The other emulates
kfree_rcu() using call_rcu_lazy() and checks the resulting memory
pressure. In my testing, call_rcu_lazy() keeps memory pressure under
control, similar to kfree_rcu().
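
For example, the new mode can be exercised with boot/module parameters
along these lines (this exact invocation is mine, not from the series;
all three parameters exist in rcuscale after this patch):

  rcuscale.kfree_rcu_test=1 rcuscale.kfree_rcu_by_lazy=1 rcuscale.kfree_mult=2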

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcuscale.c | 74 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 277a5bfb37d4..ed5544227f4d 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -95,6 +95,7 @@ torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
 torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
+torture_param(int, kfree_rcu_by_lazy, 0, "Use call_rcu_lazy() to emulate kfree_rcu()?");
 
 static char *scale_type = "rcu";
 module_param(scale_type, charp, 0444);
@@ -658,6 +659,14 @@ struct kfree_obj {
 	struct rcu_head rh;
 };
 
+/* Used if doing RCU-kfree'ing via call_rcu_lazy(). */
+static void kfree_rcu_lazy(struct rcu_head *rh)
+{
+	struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);
+
+	kfree(obj);
+}
+
 static int
 kfree_scale_thread(void *arg)
 {
@@ -695,6 +704,11 @@ kfree_scale_thread(void *arg)
 			if (!alloc_ptr)
 				return -ENOMEM;
 
+			if (kfree_rcu_by_lazy) {
+				call_rcu_lazy(&(alloc_ptr->rh), kfree_rcu_lazy);
+				continue;
+			}
+
 			// By default kfree_rcu_test_single and kfree_rcu_test_double are
 			// initialized to false. If both have the same value (false or true)
 			// both are randomly tested, otherwise only the one with value true
@@ -737,6 +751,9 @@ kfree_scale_cleanup(void)
 {
 	int i;
 
+	if (kfree_rcu_by_lazy)
+		rcu_force_call_rcu_to_lazy(false);
+
 	if (torture_cleanup_begin())
 		return;
 
@@ -766,11 +783,64 @@ kfree_scale_shutdown(void *arg)
 	return -EINVAL;
 }
 
+// Used if doing RCU-kfree'ing via call_rcu_lazy().
+static unsigned long jiffies_at_lazy_cb;
+static struct rcu_head lazy_test1_rh;
+static int rcu_lazy_test1_cb_called;
+static void call_rcu_lazy_test1(struct rcu_head *rh)
+{
+	jiffies_at_lazy_cb = jiffies;
+	WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
+}
+
 static int __init
 kfree_scale_init(void)
 {
 	long i;
 	int firsterr = 0;
+	unsigned long orig_jif, jif_start;
+
+	// If lazy-rcu based kfree'ing is requested, then for kernels that
+	// support it, force all call_rcu() to call_rcu_lazy() so that non-lazy
+	// CBs do not remove laziness of the lazy ones (since the test tries to
+	// stress call_rcu_lazy() for OOM).
+	//
+	// Also, do a quick self-test to ensure laziness is as much as
+	// expected.
+	if (kfree_rcu_by_lazy && !IS_ENABLED(CONFIG_RCU_LAZY)) {
+		pr_alert("CONFIG_RCU_LAZY is disabled, falling back to kfree_rcu() "
+			 "for delayed RCU kfree'ing\n");
+		kfree_rcu_by_lazy = 0;
+	}
+
+	if (kfree_rcu_by_lazy) {
+		/* do a test to check the timeout. */
+		orig_jif = rcu_lazy_get_jiffies_till_flush();
+
+		rcu_force_call_rcu_to_lazy(true);
+		rcu_lazy_set_jiffies_till_flush(2 * HZ);
+		rcu_barrier();
+
+		jif_start = jiffies;
+		jiffies_at_lazy_cb = 0;
+		call_rcu_lazy(&lazy_test1_rh, call_rcu_lazy_test1);
+
+		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
+
+		rcu_lazy_set_jiffies_till_flush(orig_jif);
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
+			pr_alert("ERROR: Lazy CBs are not being lazy as expected!\n");
+			WARN_ON_ONCE(1);
+			return -1;
+		}
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
+			pr_alert("ERROR: Lazy CBs are being too lazy!\n");
+			WARN_ON_ONCE(1);
+			return -1;
+		}
+	}
 
 	kfree_nrealthreads = compute_real(kfree_nthreads);
 	/* Start up the kthreads. */
@@ -783,7 +853,9 @@ kfree_scale_init(void)
 		schedule_timeout_uninterruptible(1);
 	}
 
-	pr_alert("kfree object size=%zu\n", kfree_mult * sizeof(struct kfree_obj));
+	pr_alert("kfree object size=%zu, kfree_rcu_by_lazy=%d\n",
+			kfree_mult * sizeof(struct kfree_obj),
+			kfree_rcu_by_lazy);
 
 	kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
 			       GFP_KERNEL);
-- 
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu

This is required to prevent callbacks from triggering the RCU machinery
too quickly and too often, which increases the system's power
consumption.

When testing, we found that these paths were invoked often while the
system was not doing anything (screen ON but otherwise idle).
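
The rule of thumb applied here (my phrasing, not from the patch):
convert only callbacks that purely free memory and that nothing waits
on synchronously. For example (wake_up_waiter_cb is a hypothetical
counter-example, not code from this series):

  /* Safe: i_callback() only frees the inode; nobody waits on it. */
  call_rcu_lazy(&inode->i_rcu, i_callback);

  /* Not safe to convert: a waiter would eat up to the full lazy
   * duration of extra latency, so this must stay non-lazy.
   */
  call_rcu(&obj->rh, wake_up_waiter_cb);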

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 fs/dcache.c     | 4 ++--
 fs/eventpoll.c  | 2 +-
 fs/file_table.c | 2 +-
 fs/inode.c      | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 93f4f5ee07bf..7f51bac390c8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -366,7 +366,7 @@ static void dentry_free(struct dentry *dentry)
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
 		if (likely(atomic_dec_and_test(&p->u.count))) {
-			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
+			call_rcu_lazy(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
 	}
@@ -374,7 +374,7 @@ static void dentry_free(struct dentry *dentry)
 	if (dentry->d_flags & DCACHE_NORCU)
 		__d_free(&dentry->d_u.d_rcu);
 	else
-		call_rcu(&dentry->d_u.d_rcu, __d_free);
+		call_rcu_lazy(&dentry->d_u.d_rcu, __d_free);
 }
 
 /*
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 971f98af48ff..57b3f781760c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -729,7 +729,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	 * ep->mtx. The rcu read side, reverse_path_check_proc(), does not make
 	 * use of the rbn field.
 	 */
-	call_rcu(&epi->rcu, epi_rcu_free);
+	call_rcu_lazy(&epi->rcu, epi_rcu_free);
 
 	percpu_counter_dec(&ep->user->epoll_watches);
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 5424e3a8df5f..417f57e9cb30 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
 	security_file_free(f);
 	if (!(f->f_mode & FMODE_NOACCOUNT))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+	call_rcu_lazy(&f->f_u.fu_rcuhead, file_free_rcu);
 }
 
 /*
diff --git a/fs/inode.c b/fs/inode.c
index bd4da9c5207e..38fe040ddbd6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -312,7 +312,7 @@ static void destroy_inode(struct inode *inode)
 			return;
 	}
 	inode->free_inode = ops->free_inode;
-	call_rcu(&inode->i_rcu, i_callback);
+	call_rcu_lazy(&inode->i_rcu, i_callback);
 }
 
 /**
-- 
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 5/6] rcutorture: Add test code for call_rcu_lazy()
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu

We add a new RCU torture type to test call_rcu_lazy(). This allows us
to just override the '.call' callback. To compensate for the laziness,
we force the flush timeout down to a small number of jiffies. The idea
of this test is to stress the new code paths for stability and to
ensure the behavior is at least in parity with, or similar to, that of
call_rcu(). The actual check for the amount of laziness is in another
test (rcuscale).
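
Assuming the standard rcutorture tooling, the new scenario can be run
with something like:

  tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 \
  	--configs TREE11 --duration 10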

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h                              |  1 +
 kernel/rcu/rcutorture.c                       | 60 ++++++++++++++++++-
 kernel/rcu/tree.c                             |  1 +
 .../selftests/rcutorture/configs/rcu/CFLIST   |  1 +
 .../selftests/rcutorture/configs/rcu/TREE11   | 18 ++++++
 .../rcutorture/configs/rcu/TREE11.boot        |  8 +++
 6 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 608f6ab76c7f..aa3243e49506 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -460,6 +460,7 @@ enum rcutorture_type {
 	RCU_TASKS_TRACING_FLAVOR,
 	RCU_TRIVIAL_FLAVOR,
 	SRCU_FLAVOR,
+	RCU_LAZY_FLAVOR,
 	INVALID_RCU_FLAVOR
 };
 
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7120165a9342..c52cc4c064f9 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -872,6 +872,64 @@ static struct rcu_torture_ops tasks_rude_ops = {
 
 #endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
 
+#ifdef CONFIG_RCU_LAZY
+
+/*
+ * Definitions for lazy RCU torture testing.
+ */
+static unsigned long orig_jiffies_till_flush;
+
+static void rcu_sync_torture_init_lazy(void)
+{
+	rcu_sync_torture_init();
+
+	orig_jiffies_till_flush = rcu_lazy_get_jiffies_till_flush();
+	rcu_lazy_set_jiffies_till_flush(50);
+}
+
+static void rcu_lazy_cleanup(void)
+{
+	rcu_lazy_set_jiffies_till_flush(orig_jiffies_till_flush);
+}
+
+static struct rcu_torture_ops rcu_lazy_ops = {
+	.ttype			= RCU_LAZY_FLAVOR,
+	.init			= rcu_sync_torture_init_lazy,
+	.cleanup		= rcu_lazy_cleanup,
+	.readlock		= rcu_torture_read_lock,
+	.read_delay		= rcu_read_delay,
+	.readunlock		= rcu_torture_read_unlock,
+	.readlock_held		= torture_readlock_not_held,
+	.get_gp_seq		= rcu_get_gp_seq,
+	.gp_diff		= rcu_seq_diff,
+	.deferred_free		= rcu_torture_deferred_free,
+	.sync			= synchronize_rcu,
+	.exp_sync		= synchronize_rcu_expedited,
+	.get_gp_state		= get_state_synchronize_rcu,
+	.start_gp_poll		= start_poll_synchronize_rcu,
+	.poll_gp_state		= poll_state_synchronize_rcu,
+	.cond_sync		= cond_synchronize_rcu,
+	.call			= call_rcu_lazy,
+	.cb_barrier		= rcu_barrier,
+	.fqs			= rcu_force_quiescent_state,
+	.stats			= NULL,
+	.gp_kthread_dbg		= show_rcu_gp_kthreads,
+	.check_boost_failed	= rcu_check_boost_fail,
+	.stall_dur		= rcu_jiffies_till_stall_check,
+	.irq_capable		= 1,
+	.can_boost		= IS_ENABLED(CONFIG_RCU_BOOST),
+	.extendables		= RCUTORTURE_MAX_EXTEND,
+	.name			= "rcu_lazy"
+};
+
+#define LAZY_OPS &rcu_lazy_ops,
+
+#else // #ifdef CONFIG_RCU_LAZY
+
+#define LAZY_OPS
+
+#endif // #else #ifdef CONFIG_RCU_LAZY
+
 
 #ifdef CONFIG_TASKS_TRACE_RCU
 
@@ -3145,7 +3203,7 @@ rcu_torture_init(void)
 	unsigned long gp_seq = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
-		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
+		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS LAZY_OPS
 		&trivial_ops,
 	};
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e76fef8031be..67026382dc21 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -600,6 +600,7 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 {
 	switch (test_type) {
 	case RCU_FLAVOR:
+	case RCU_LAZY_FLAVOR:
 		*flags = READ_ONCE(rcu_state.gp_flags);
 		*gp_seq = rcu_seq_current(&rcu_state.gp_seq);
 		break;
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/CFLIST b/tools/testing/selftests/rcutorture/configs/rcu/CFLIST
index 98b6175e5aa0..609c3370616f 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/CFLIST
+++ b/tools/testing/selftests/rcutorture/configs/rcu/CFLIST
@@ -5,6 +5,7 @@ TREE04
 TREE05
 TREE07
 TREE09
+TREE11
 SRCU-N
 SRCU-P
 SRCU-T
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11 b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
new file mode 100644
index 000000000000..436013f3e015
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
@@ -0,0 +1,18 @@
+CONFIG_SMP=y
+CONFIG_PREEMPT_NONE=n
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=y
+#CHECK#CONFIG_PREEMPT_RCU=y
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_RCU_TRACE=y
+CONFIG_HOTPLUG_CPU=y
+CONFIG_MAXSMP=y
+CONFIG_CPUMASK_OFFSTACK=y
+CONFIG_RCU_NOCB_CPU=y
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_RCU_BOOST=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_LAZY=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
new file mode 100644
index 000000000000..9b6f720d4ccd
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
@@ -0,0 +1,8 @@
+maxcpus=8 nr_cpus=43
+rcutree.gp_preinit_delay=3
+rcutree.gp_init_delay=3
+rcutree.gp_cleanup_delay=3
+rcu_nocbs=0-7
+rcutorture.torture_type=rcu_lazy
+rcutorture.nocbs_nthreads=8
+rcutorture.fwd_progress=0
-- 
2.37.1.559.g78731f0fdb-goog



* [PATCH v3 resend 6/6] debug: Toggle lazy at runtime and change flush jiffies
From: Joel Fernandes (Google) @ 2022-08-09  3:45 UTC
  To: linux-kernel
  Cc: Joel Fernandes (Google),
	rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu

Enable/disable this feature by writing 1 or 0 to /proc/sys/vm/rcu_lazy.

Change the value of /proc/sys/vm/rcu_lazy_jiffies to change the maximum
duration before a flush.

Do not merge; this patch is only to help reviewers debug.
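
Example usage (the values are illustrative; rcu_lazy_jiffies is in
jiffies):

  # Disable lazy batching at runtime:
  echo 0 > /proc/sys/vm/rcu_lazy

  # Re-enable it and flush after at most ~5s on a HZ=1000 kernel:
  echo 1 > /proc/sys/vm/rcu_lazy
  echo 5000 > /proc/sys/vm/rcu_lazy_jiffies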

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Change-Id: Ia733937531f7a75e75f7b6807319e18c94d70750
---
 include/linux/sched/sysctl.h |  3 +++
 kernel/rcu/tree_nocb.h       |  9 +++++++++
 kernel/sysctl.c              | 17 +++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 2cd928f15df6..54610f9cd962 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -14,6 +14,9 @@ extern unsigned long sysctl_hung_task_timeout_secs;
 enum { sysctl_hung_task_timeout_secs = 0 };
 #endif
 
+extern unsigned int sysctl_rcu_lazy;
+extern unsigned int sysctl_rcu_lazy_jiffies;
+
 enum sched_tunable_scaling {
 	SCHED_TUNABLESCALING_NONE,
 	SCHED_TUNABLESCALING_LOG,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f8d7255f4f0a..2b7d3d87e70b 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -266,6 +266,9 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 #define LAZY_FLUSH_JIFFIES (10 * HZ)
 unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
 
+unsigned int sysctl_rcu_lazy_jiffies = LAZY_FLUSH_JIFFIES;
+unsigned int sysctl_rcu_lazy = 1;
+
 #ifdef CONFIG_RCU_LAZY
 // To be called only from test code.
 void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
@@ -292,6 +295,9 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 	unsigned long mod_jif = 0;
 
+	/* debug: not for merge */
+	rcu_lazy_set_jiffies_till_flush(sysctl_rcu_lazy_jiffies);
+
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
 	/*
@@ -687,6 +693,9 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
 	bool wasempty = false;
 
+	/* debug: not for merge */
+	rcu_lazy_set_jiffies_till_flush(sysctl_rcu_lazy_jiffies);
+
 	/*
 	 * Each pass through the following loop checks for CBs and for the
 	 * nearest grace period (if any) to wait for next.  The CB kthreads
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b00f92df0af5..bbe25d635dc0 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2450,6 +2450,23 @@ static struct ctl_table vm_table[] = {
 		.extra2		= SYSCTL_ONE,
 	},
 #endif
+#ifdef CONFIG_RCU_LAZY
+	{
+		.procname	= "rcu_lazy",
+		.data		= &sysctl_rcu_lazy,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "rcu_lazy_jiffies",
+		.data		= &sysctl_rcu_lazy_jiffies,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
+
 	{ }
 };
 
-- 
2.37.1.559.g78731f0fdb-goog



* Re: [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
From: Joel Fernandes @ 2022-08-11  2:23 UTC
  To: linux-kernel
  Cc: rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu



On 8/8/2022 11:45 PM, Joel Fernandes (Google) wrote:
> Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
>  https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/
> 
> I just started working on this again while I have some time during paternity
> leave ;-) So I thought I'd send it out again. There are no changes other than
> the one debug patch added on top.
> 
> Next, I am going to refine the power results, as mentioned in Paul's comments
> on the last cover letter.

Side note: Here is another big selling point for call_rcu_lazy().
Instead of _lazy(), if you just increased jiffies_till_first_fqs and
slowed *all* call_rcu() invocations down to achieve the same effect,
that would affect percpu refcounters switching to atomic mode, for
example.

They switch to atomic mode by calling __percpu_ref_switch_mode(), which
is called by percpu_ref_switch_to_atomic_sync(). The _sync() variant
blocks until the RCU callback that completes the switch has run, so its
latency tracks call_rcu() latency directly, as sketched below.

Slowing call_rcu() down would therefore slow this call down for the
full lazy duration, which would slow down suspend in
blk_pre_runtime_suspend().
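
Roughly, the flow looks like this (a simplified sketch from memory, not
the verbatim upstream code in lib/percpu-refcount.c):

  void percpu_ref_switch_to_atomic_sync(struct percpu_ref *ref)
  {
  	/* Internally queues an RCU callback via call_rcu(): */
  	percpu_ref_switch_to_atomic(ref, NULL);
  	/* Blocks until that callback has run: */
  	wait_event(percpu_ref_switch_waitq, !ref->data->confirm_switch);
  }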

This is why we cannot assume that call_rcu() users mostly just want to
free memory. There could be cases just like this one, and a blanket
slowdown of call_rcu() might bite at unexpected times.

I am going to add this as a selling point for selective lazyfication
(hey, I get to invent words while I'm inventing new features) to my
cover letter and slides.

 - Joel



> 
> Joel Fernandes (Google) (5):
> rcu: Introduce call_rcu_lazy() API implementation
> rcuscale: Add laziness and kfree tests
> fs: Move call_rcu() to call_rcu_lazy() in some paths
> rcutorture: Add test code for call_rcu_lazy()
> debug: Toggle lazy at runtime and change flush jiffies
> 
> Vineeth Pillai (1):
> rcu: shrinker for lazy rcu
> 
> fs/dcache.c                                   |   4 +-
> fs/eventpoll.c                                |   2 +-
> fs/file_table.c                               |   2 +-
> fs/inode.c                                    |   2 +-
> include/linux/rcu_segcblist.h                 |   1 +
> include/linux/rcupdate.h                      |   6 +
> include/linux/sched/sysctl.h                  |   3 +
> kernel/rcu/Kconfig                            |   8 +
> kernel/rcu/rcu.h                              |  12 +
> kernel/rcu/rcu_segcblist.c                    |  15 +-
> kernel/rcu/rcu_segcblist.h                    |  20 +-
> kernel/rcu/rcuscale.c                         |  74 +++++-
> kernel/rcu/rcutorture.c                       |  60 ++++-
> kernel/rcu/tree.c                             | 131 ++++++----
> kernel/rcu/tree.h                             |  10 +-
> kernel/rcu/tree_nocb.h                        | 246 +++++++++++++++---
> kernel/sysctl.c                               |  17 ++
> .../selftests/rcutorture/configs/rcu/CFLIST   |   1 +
> .../selftests/rcutorture/configs/rcu/TREE11   |  18 ++
> .../rcutorture/configs/rcu/TREE11.boot        |   8 +
> 20 files changed, 536 insertions(+), 104 deletions(-)
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> 
> --
> 2.37.1.559.g78731f0fdb-goog
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
  2022-08-11  2:23 ` [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes
@ 2022-08-11  2:31   ` Joel Fernandes
  2022-08-11  2:51     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-11  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: rushikesh.s.kadam, urezki, neeraj.iitr10, frederic, paulmck,
	rostedt, rcu



On 8/10/2022 10:23 PM, Joel Fernandes wrote:
> 
> 
> On 8/8/2022 11:45 PM, Joel Fernandes (Google) wrote:
>> Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
>>  https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/
>>
>> I just started working on this again while I have some time during paternity
>> leave ;-) So I thought I'll just send it out again. No other changes other
>> than that 1 debug patch I added on the top.
>>
>> Next I am going to go refine the power results as mentioned in Paul's comments
>> on the last cover letter.
> 
> Side note: Here is another big selling point for call_rcu_lazy().
> Instead of _lazy(), if you just increased jiffies_till_first_fqs, and
> slowed *all* call_rcu() down to achieve the same effect, that would
> affect percpu refcounters switching to atomic-mode, for example.
> 
> They switch to atomic mode by calling __percpu_ref_switch_mode() which
> is called by percpu_ref_switch_to_atomic_sync().
>
> This will slow this call down for the full lazy duration which will slow
> down suspend in blk_pre_runtime_suspend().

Correction while I am going on the record (got to be careful these
days). It *might* slow down RCU for the full lazy duration, unless of
course a fly-by rescue call_rcu() comes in.

- Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
  2022-08-11  2:31   ` Joel Fernandes
@ 2022-08-11  2:51     ` Paul E. McKenney
  2022-08-11  3:22       ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-11  2:51 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	rostedt, rcu

On Wed, Aug 10, 2022 at 10:31:56PM -0400, Joel Fernandes wrote:
> 
> 
> On 8/10/2022 10:23 PM, Joel Fernandes wrote:
> > 
> > 
> > On 8/8/2022 11:45 PM, Joel Fernandes (Google) wrote:
> >> Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
> >>  https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/
> >>
> >> I just started working on this again while I have some time during paternity
> >> leave ;-) So I thought I'll just send it out again. No other changes other
> >> than that 1 debug patch I added on the top.
> >>
> >> Next I am going to go refine the power results as mentioned in Paul's comments
> >> on the last cover letter.
> > 
> > Side note: Here is another big selling point for call_rcu_lazy().
> > Instead of _lazy(), if you just increased jiffies_till_first_fqs, and
> > slowed *all* call_rcu() down to achieve the same effect, that would
> > affect percpu refcounters switching to atomic-mode, for example.
> > 
> > They switch to atomic mode by calling __percpu_ref_switch_mode() which
> > is called by percpu_ref_switch_to_atomic_sync().
> >
> > This will slow this call down for the full lazy duration which will slow
> > down suspend in blk_pre_runtime_suspend().
> 
> Correction while I am going on the record (got to be careful these
> days). It *might* slow down RCU for the full lazy duration, unless of
> course a fly-by rescue call_rcu() comes in.

Just unload a module, which if I remember correctly invokes rcu_barrier().
Lots of rescue callbacks.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
  2022-08-11  2:51     ` Paul E. McKenney
@ 2022-08-11  3:22       ` Joel Fernandes
  2022-08-11  3:46         ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-11  3:22 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	rostedt, rcu



On 8/10/2022 10:51 PM, Paul E. McKenney wrote:
> On Wed, Aug 10, 2022 at 10:31:56PM -0400, Joel Fernandes wrote:
>>
>>
>> On 8/10/2022 10:23 PM, Joel Fernandes wrote:
>>>
>>>
>>> On 8/8/2022 11:45 PM, Joel Fernandes (Google) wrote:
>>>> Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
>>>>  https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/
>>>>
>>>> I just started working on this again while I have some time during paternity
>>>> leave ;-) So I thought I'll just send it out again. No other changes other
>>>> than that 1 debug patch I added on the top.
>>>>
>>>> Next I am going to go refine the power results as mentioned in Paul's comments
>>>> on the last cover letter.
>>>
>>> Side note: Here is another big selling point for call_rcu_lazy().
>>> Instead of _lazy(), if you just increased jiffies_till_first_fqs, and
>>> slowed *all* call_rcu() down to achieve the same effect, that would
>>> affect percpu refcounters switching to atomic-mode, for example.
>>>
>>> They switch to atomic mode by calling __percpu_ref_switch_mode() which
>>> is called by percpu_ref_switch_to_atomic_sync().
>>>
>>> This will slow this call down for the full lazy duration which will slow
>>> down suspend in blk_pre_runtime_suspend().
>>
>> Correction while I am going on the record (got to be careful these
>> days). It *might* slow down RCU for the full lazy duration, unless of
>> course a fly-by rescue call_rcu() comes in.
> 
> Just unload a module, which if I remember correctly invokes rcu_barrier().
> Lots of rescue callbacks.  ;-)

Haha. Yes I suppose the per-cpu atomic switch paths can also invoke
rcu_barrier() but I suspect somebody might complain about IPIs :-P

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes
  2022-08-11  3:22       ` Joel Fernandes
@ 2022-08-11  3:46         ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-11  3:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	rostedt, rcu

On Wed, Aug 10, 2022 at 11:22:13PM -0400, Joel Fernandes wrote:
> 
> 
> On 8/10/2022 10:51 PM, Paul E. McKenney wrote:
> > On Wed, Aug 10, 2022 at 10:31:56PM -0400, Joel Fernandes wrote:
> >>
> >>
> >> On 8/10/2022 10:23 PM, Joel Fernandes wrote:
> >>>
> >>>
> >>> On 8/8/2022 11:45 PM, Joel Fernandes (Google) wrote:
> >>>> Just a refresh of v3 with one additional debug patch. v3's cover letter is here:
> >>>>  https://lore.kernel.org/all/20220713213237.1596225-1-joel@joelfernandes.org/
> >>>>
> >>>> I just started working on this again while I have some time during paternity
> >>>> leave ;-) So I thought I'll just send it out again. No other changes other
> >>>> than that 1 debug patch I added on the top.
> >>>>
> >>>> Next I am going to go refine the power results as mentioned in Paul's comments
> >>>> on the last cover letter.
> >>>
> >>> Side note: Here is another big selling point for call_rcu_lazy().
> >>> Instead of _lazy(), if you just increased jiffies_till_first_fqs, and
> >>> slowed *all* call_rcu() down to achieve the same effect, that would
> >>> affect percpu refcounters switching to atomic-mode, for example.
> >>>
> >>> They switch to atomic mode by calling __percpu_ref_switch_mode() which
> >>> is called by percpu_ref_switch_to_atomic_sync().
> >>>
> >>> This will slow this call down for the full lazy duration which will slow
> >>> down suspend in blk_pre_runtime_suspend().
> >>
> >> Correction while I am going on the record (got to be careful these
> >> days). It *might* slow down RCU for the full lazy duration, unless of
> >> course a fly-by rescue call_rcu() comes in.
> > 
> > Just unload a module, which if I remember correctly invokes rcu_barrier().
> > Lots of rescue callbacks.  ;-)
> 
> Haha. Yes I suppose the per-cpu atomic switch paths can also invoke
> rcu_barrier() but I suspect somebody might complain about IPIs :-P

There is always a critic!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-09  3:45 ` [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
@ 2022-08-18 17:22   ` Joel Fernandes
  2022-08-18 17:23   ` Joel Fernandes
  1 sibling, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-18 17:22 UTC (permalink / raw)
  To: LKML

On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> This is required to prevent callbacks from triggering the RCU machinery
> too quickly and too often, which consumes more system power.
>
> When testing, we found that these paths were invoked often when the
> system is not doing anything (screen is ON but otherwise idle).

Unfortunately, I am seeing a slowdown in ChromeOS boot performance
after applying this particular patch. It is the first time I could
test ChromeOS boot times with the series since it was hard to find a
ChromeOS device that runs the upstream kernel.

Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
patch? I wonder if the issue is with wake up interaction with the nocb
GP threads.

We ought to disable lazy RCU during boot since it would have little
benefit anyway. But I am also concerned about some deeper problem I
did not catch before.

I'll look into tracing the fs paths to see if I can narrow down what's
causing it. Will also try a newer kernel, I am currently testing on
5.19-rc4.

Thanks,

 - Joel

>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  fs/dcache.c     | 4 ++--
>  fs/eventpoll.c  | 2 +-
>  fs/file_table.c | 2 +-
>  fs/inode.c      | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 93f4f5ee07bf..7f51bac390c8 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -366,7 +366,7 @@ static void dentry_free(struct dentry *dentry)
>         if (unlikely(dname_external(dentry))) {
>                 struct external_name *p = external_name(dentry);
>                 if (likely(atomic_dec_and_test(&p->u.count))) {
> -                       call_rcu(&dentry->d_u.d_rcu, __d_free_external);
> +                       call_rcu_lazy(&dentry->d_u.d_rcu, __d_free_external);
>                         return;
>                 }
>         }
> @@ -374,7 +374,7 @@ static void dentry_free(struct dentry *dentry)
>         if (dentry->d_flags & DCACHE_NORCU)
>                 __d_free(&dentry->d_u.d_rcu);
>         else
> -               call_rcu(&dentry->d_u.d_rcu, __d_free);
> +               call_rcu_lazy(&dentry->d_u.d_rcu, __d_free);
>  }
>
>  /*
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 971f98af48ff..57b3f781760c 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -729,7 +729,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
>          * ep->mtx. The rcu read side, reverse_path_check_proc(), does not make
>          * use of the rbn field.
>          */
> -       call_rcu(&epi->rcu, epi_rcu_free);
> +       call_rcu_lazy(&epi->rcu, epi_rcu_free);
>
>         percpu_counter_dec(&ep->user->epoll_watches);
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 5424e3a8df5f..417f57e9cb30 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
>         security_file_free(f);
>         if (!(f->f_mode & FMODE_NOACCOUNT))
>                 percpu_counter_dec(&nr_files);
> -       call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> +       call_rcu_lazy(&f->f_u.fu_rcuhead, file_free_rcu);
>  }
>
>  /*
> diff --git a/fs/inode.c b/fs/inode.c
> index bd4da9c5207e..38fe040ddbd6 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -312,7 +312,7 @@ static void destroy_inode(struct inode *inode)
>                         return;
>         }
>         inode->free_inode = ops->free_inode;
> -       call_rcu(&inode->i_rcu, i_callback);
> +       call_rcu_lazy(&inode->i_rcu, i_callback);
>  }
>
>  /**
> --
> 2.37.1.559.g78731f0fdb-goog
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-09  3:45 ` [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
  2022-08-18 17:22   ` Joel Fernandes
@ 2022-08-18 17:23   ` Joel Fernandes
  2022-08-18 23:05     ` Joel Fernandes
  1 sibling, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-18 17:23 UTC (permalink / raw)
  To: LKML
  Cc: Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Paul E. McKenney,
	Steven Rostedt, rcu

[Sorry, adding back the CC list]

On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> This is required to prevent callbacks from triggering the RCU machinery
> too quickly and too often, which consumes more system power.
>
> When testing, we found that these paths were invoked often when the
> system is not doing anything (screen is ON but otherwise idle).

Unfortunately, I am seeing a slowdown in ChromeOS boot performance
after applying this particular patch. It is the first time I could
test ChromeOS boot times with the series since it was hard to find a
ChromeOS device that runs the upstream kernel.

Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
patch? I wonder if the issue is with wake up interaction with the nocb
GP threads.

We ought to disable lazy RCU during boot since it would have little
benefit anyway. But I am also concerned about some deeper problem I
did not catch before.

I'll look into tracing the fs paths to see if I can narrow down what's
causing it. Will also try a newer kernel, I am currently testing on
5.19-rc4.

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-18 17:23   ` Joel Fernandes
@ 2022-08-18 23:05     ` Joel Fernandes
  2022-08-19  1:21       ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-18 23:05 UTC (permalink / raw)
  To: LKML
  Cc: Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Paul E. McKenney,
	Steven Rostedt, rcu

On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> [Sorry, adding back the CC list]
>
> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > This is required to prevent callbacks from triggering the RCU machinery
> > too quickly and too often, which consumes more system power.
> >
> > When testing, we found that these paths were invoked often when the
> > system is not doing anything (screen is ON but otherwise idle).
>
> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
> after applying this particular patch. It is the first time I could
> test ChromeOS boot times with the series since it was hard to find a
> ChromeOS device that runs the upstream kernel.
>
> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> patch? I wonder if the issue is with wake up interaction with the nocb
> GP threads.
>
> We ought to disable lazy RCU during boot since it would have little
> benefit anyway. But I am also concerned about some deeper problem I
> did not catch before.
>
> I'll look into tracing the fs paths to see if I can narrow down what's
> causing it. Will also try a newer kernel, I am currently testing on
> 5.19-rc4.

I got somewhere with this. It looks like queuing CBs as lazy CBs
instead of normal CBs is triggering expedited stalls during the boot
process:

[   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.

No idea how/why lazy RCU CBs would be related to expedited GP issues,
but maybe something hangs and causes that side-effect.

initcall_debug did not help, as it seems initcalls all work fine, and
then 8 seconds after the boot, it starts slowing down a lot, followed
by the RCU stall messages. As a next step I'll enable ftrace during
the boot to see if I can get more insight. But I believe it's not the
FS layer, the FS layer just triggers lazy CBs, but there is something
wrong with the core lazy-RCU work itself.

This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
recent kernels and debug.

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-18 23:05     ` Joel Fernandes
@ 2022-08-19  1:21       ` Joel Fernandes
  2022-08-19  1:29         ` Joel Fernandes
  2022-08-19  2:35         ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19  1:21 UTC (permalink / raw)
  To: LKML
  Cc: Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Paul E. McKenney,
	Steven Rostedt, rcu

On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > [Sorry, adding back the CC list]
> >
> > On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> > <joel@joelfernandes.org> wrote:
> > >
> > > This is required to prevent callbacks from triggering the RCU machinery
> > > too quickly and too often, which consumes more system power.
> > >
> > > When testing, we found that these paths were invoked often when the
> > > system is not doing anything (screen is ON but otherwise idle).
> >
> > Unfortunately, I am seeing a slowdown in ChromeOS boot performance
> > after applying this particular patch. It is the first time I could
> > test ChromeOS boot times with the series since it was hard to find a
> > ChromeOS device that runs the upstream kernel.
> >
> > Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> > patch? I wonder if the issue is with wake up interaction with the nocb
> > GP threads.
> >
> > We ought to disable lazy RCU during boot since it would have little
> > benefit anyway. But I am also concerned about some deeper problem I
> > did not catch before.
> >
> > I'll look into tracing the fs paths to see if I can narrow down what's
> > causing it. Will also try a newer kernel, I am currently testing on
> > 5.19-rc4.
>
> I got somewhere with this. It looks like queuing CBs as lazy CBs
> instead of normal CBs is triggering expedited stalls during the boot
> process:
>
> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>
> No idea how/why lazy RCU CBs would be related to expedited GP issues,
> but maybe something hangs and causes that side-effect.
>
> initcall_debug did not help, as it seems initcalls all work fine, and
> then 8 seconds after the boot, it starts slowing down a lot, followed
> by the RCU stall messages. As a next step I'll enable ftrace during
> the boot to see if I can get more insight. But I believe it's not the
> FS layer, the FS layer just triggers lazy CBs, but there is something
> wrong with the core lazy-RCU work itself.
>
> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
> recent kernels and debug.

More digging: thanks to the trace_event= boot option, I find that the
boot process does have some synchronous waits, and though these are
"non-lazy", for some reason the lazy CBs that were previously queued
are making them wait for the *full* lazy duration. Which points to a
likely bug in the lazy RCU logic. These synchronous CBs should never
be waiting like the lazy ones:

[   17.715904]  => trace_dump_stack
[   17.715904]  => __wait_rcu_gp
[   17.715904]  => synchronize_rcu
[   17.715904]  => selinux_netcache_avc_callback
[   17.715904]  => avc_ss_reset
[   17.715904]  => sel_write_enforce
[   17.715904]  => vfs_write
[   17.715904]  => ksys_write
[   17.715904]  => do_syscall_64
[   17.715904]  => entry_SYSCALL_64_after_hwframe
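
For context, synchronize_rcu() boils down to queuing a wake-up callback
and sleeping on it, roughly (paraphrased from __wait_rcu_gp(), details
elided):

	struct rcu_synchronize rs;

	init_rcu_head_on_stack(&rs.head);
	init_completion(&rs.completion);
	call_rcu(&rs.head, wakeme_after_rcu);	/* non-lazy by design */
	wait_for_completion(&rs.completion);
	destroy_rcu_head_on_stack(&rs.head);

so if queuing that callback does not kick the bypass list out of lazy
mode, the caller sleeps for the full lazy duration.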

I'm tired so I'll resume the debug later.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19  1:21       ` Joel Fernandes
@ 2022-08-19  1:29         ` Joel Fernandes
  2022-08-19  2:35         ` Paul E. McKenney
  1 sibling, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19  1:29 UTC (permalink / raw)
  To: LKML
  Cc: Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Paul E. McKenney,
	Steven Rostedt, rcu

On Thu, Aug 18, 2022 at 9:21 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > [Sorry, adding back the CC list]
> > >
> > > On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> > > <joel@joelfernandes.org> wrote:
> > > >
> > > > This is required to prevent callbacks from triggering the RCU machinery
> > > > too quickly and too often, which consumes more system power.
> > > >
> > > > When testing, we found that these paths were invoked often when the
> > > > system is not doing anything (screen is ON but otherwise idle).
> > >
> > > Unfortunately, I am seeing a slowdown in ChromeOS boot performance
> > > after applying this particular patch. It is the first time I could
> > > test ChromeOS boot times with the series since it was hard to find a
> > > ChromeOS device that runs the upstream kernel.
> > >
> > > Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> > > patch? I wonder if the issue is with wake up interaction with the nocb
> > > GP threads.
> > >
> > > We ought to disable lazy RCU during boot since it would have little
> > > benefit anyway. But I am also concerned about some deeper problem I
> > > did not catch before.
> > >
> > > I'll look into tracing the fs paths to see if I can narrow down what's
> > > causing it. Will also try a newer kernel, I am currently testing on
> > > 5.19-rc4.
> >
> > I got somewhere with this. It looks like queuing CBs as lazy CBs
> > instead of normal CBs is triggering expedited stalls during the boot
> > process:
> >
> > [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
> > CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
> >
> > No idea how/why lazy RCU CBs would be related to expedited GP issues,
> > but maybe something hangs and causes that side-effect.
> >
> > initcall_debug did not help, as it seems initcalls all work fine, and
> > then 8 seconds after the boot, it starts slowing down a lot, followed
> > by the RCU stall messages. As a next step I'll enable ftrace during
> > the boot to see if I can get more insight. But I believe it's not the
> > FS layer, the FS layer just triggers lazy CBs, but there is something
> > wrong with the core lazy-RCU work itself.
> >
> > This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
> > recent kernels and debug.
>
> More digging: thanks to the trace_event= boot option, I find that the
> boot process does have some synchronous waits, and though these are
> "non-lazy", for some reason the lazy CBs that were previously queued
> are making them wait for the *full* lazy duration. Which points to a
> likely bug in the lazy RCU logic. These synchronous CBs should never
> be waiting like the lazy ones:
>

[   17.715904]     init-1         2..... 8917039us : <stack trace>
> [   17.715904]  => trace_dump_stack
> [   17.715904]  => __wait_rcu_gp
> [   17.715904]  => synchronize_rcu
> [   17.715904]  => selinux_netcache_avc_callback
> [   17.715904]  => avc_ss_reset
> [   17.715904]  => sel_write_enforce
> [   17.715904]  => vfs_write
> [   17.715904]  => ksys_write
> [   17.715904]  => do_syscall_64
> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>
> I'm tired so I'll resume the debug later.

Rushikesh, BTW, do note that using jiffies_till_first_fqs would also
likely slow these synchronous waits down, and thus slow down the boot
process for ChromeOS. This could be what you're seeing with the
jiffies option. No idea why you were not seeing this in earlier
experiments; maybe there was a change in ChromeOS in how the selinux
enforce option is written out, or something. This write syscall is
being done by the init process. I'll add this to the slides as well,
as another call_rcu_lazy() selling point (vs using jiffies) :)

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19  1:21       ` Joel Fernandes
  2022-08-19  1:29         ` Joel Fernandes
@ 2022-08-19  2:35         ` Paul E. McKenney
  2022-08-19  2:45           ` Joel Fernandes
  1 sibling, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-19  2:35 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > [Sorry, adding back the CC list]
> > >
> > > On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> > > <joel@joelfernandes.org> wrote:
> > > >
> > > > This is required to prevent callbacks from triggering the RCU machinery
> > > > too quickly and too often, which consumes more system power.
> > > >
> > > > When testing, we found that these paths were invoked often when the
> > > > system is not doing anything (screen is ON but otherwise idle).
> > >
> > > Unfortunately, I am seeing a slowdown in ChromeOS boot performance
> > > after applying this particular patch. It is the first time I could
> > > test ChromeOS boot times with the series since it was hard to find a
> > > ChromeOS device that runs the upstream kernel.
> > >
> > > Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> > > patch? I wonder if the issue is with wake up interaction with the nocb
> > > GP threads.
> > >
> > > We ought to disable lazy RCU during boot since it would have little
> > > benefit anyway. But I am also concerned about some deeper problem I
> > > did not catch before.
> > >
> > > I'll look into tracing the fs paths to see if I can narrow down what's
> > > causing it. Will also try a newer kernel, I am currently testing on
> > > 5.19-rc4.
> >
> > I got somewhere with this. It looks like queuing CBs as lazy CBs
> > instead of normal CBs is triggering expedited stalls during the boot
> > process:
> >
> > [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
> > CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
> >
> > No idea how/why lazy RCU CBs would be related to expedited GP issues,
> > but maybe something hangs and causes that side-effect.
> >
> > initcall_debug did not help, as it seems initcalls all work fine, and
> > then 8 seconds after the boot, it starts slowing down a lot, followed
> > by the RCU stall messages. As a next step I'll enable ftrace during
> > the boot to see if I can get more insight. But I believe it's not the
> > FS layer, the FS layer just triggers lazy CBs, but there is something
> > wrong with the core lazy-RCU work itself.
> >
> > This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
> > recent kernels and debug.
> 
> More digging: thanks to the trace_event= boot option, I find that the
> boot process does have some synchronous waits, and though these are
> "non-lazy", for some reason the lazy CBs that were previously queued
> are making them wait for the *full* lazy duration. Which points to a
> likely bug in the lazy RCU logic. These synchronous CBs should never
> be waiting like the lazy ones:
> 
> [   17.715904]  => trace_dump_stack
> [   17.715904]  => __wait_rcu_gp
> [   17.715904]  => synchronize_rcu
> [   17.715904]  => selinux_netcache_avc_callback
> [   17.715904]  => avc_ss_reset
> [   17.715904]  => sel_write_enforce
> [   17.715904]  => vfs_write
> [   17.715904]  => ksys_write
> [   17.715904]  => do_syscall_64
> [   17.715904]  => entry_SYSCALL_64_after_hwframe
> 
> I'm tired so I'll resume the debug later.

At times like this, I often pull the suspect code into userspace and
run it through its paces.  In this case, a bunch of call_rcu_lazy()
invocations into an empty bypass list, followed by a call_rcu()
invocation, then a check to make sure that the bypass list is no longer
lazy.
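
In userspace mock-up form, something like this, where all the types and
names are stand-ins rather than the kernel's actual API:

#include <assert.h>
#include <stdbool.h>

/* Stand-in for rcu_cblist plus its lazy accounting. */
struct bypass { int len; int lazy_len; };

static void call_rcu_lazy(struct bypass *b) { b->len++; b->lazy_len++; }

static void call_rcu(struct bypass *b)
{
	b->len++;
	b->lazy_len = 0;	/* a non-lazy CB must de-lazify the list */
}

static bool bypass_is_lazy(const struct bypass *b)
{
	return b->len && b->len == b->lazy_len;
}

int main(void)
{
	struct bypass b = { 0, 0 };
	int i;

	for (i = 0; i < 1000; i++)
		call_rcu_lazy(&b);
	assert(bypass_is_lazy(&b));

	call_rcu(&b);
	assert(!bypass_is_lazy(&b));	/* the property under test */
	return 0;
}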

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19  2:35         ` Paul E. McKenney
@ 2022-08-19  2:45           ` Joel Fernandes
  2022-08-19 16:30             ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19  2:45 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu



On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>
>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>
>>>> [Sorry, adding back the CC list]
>>>>
>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
>>>> <joel@joelfernandes.org> wrote:
>>>>>
>>>>> This is required to prevent callbacks from triggering the RCU machinery
>>>>> too quickly and too often, which consumes more system power.
>>>>>
>>>>> When testing, we found that these paths were invoked often when the
>>>>> system is not doing anything (screen is ON but otherwise idle).
>>>>
>>>> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
>>>> after applying this particular patch. It is the first time I could
>>>> test ChromeOS boot times with the series since it was hard to find a
>>>> ChromeOS device that runs the upstream kernel.
>>>>
>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
>>>> patch? I wonder if the issue is with wake up interaction with the nocb
>>>> GP threads.
>>>>
>>>> We ought to disable lazy RCU during boot since it would have little
>>>> benefit anyway. But I am also concerned about some deeper problem I
>>>> did not catch before.
>>>>
>>>> I'll look into tracing the fs paths to see if I can narrow down what's
>>>> causing it. Will also try a newer kernel, I am currently testing on
>>>> 5.19-rc4.
>>>
>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
>>> instead of normal CBs is triggering expedited stalls during the boot
>>> process:
>>>
>>> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
>>> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>>>
>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
>>> but maybe something hangs and causes that side-effect.
>>>
>>> initcall_debug did not help, as it seems initcalls all work fine, and
>>> then 8 seconds after the boot, it starts slowing down a lot, followed
>>> by the RCU stall messages. As a next step I'll enable ftrace during
>>> the boot to see if I can get more insight. But I believe it's not the
>>> FS layer, the FS layer just triggers lazy CBs, but there is something
>>> wrong with the core lazy-RCU work itself.
>>>
>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
>>> recent kernels and debug.
>>
>> More digging: thanks to the trace_event= boot option, I find that the
>> boot process does have some synchronous waits, and though these are
>> "non-lazy", for some reason the lazy CBs that were previously queued
>> are making them wait for the *full* lazy duration. Which points to a
>> likely bug in the lazy RCU logic. These synchronous CBs should never
>> be waiting like the lazy ones:
>>
>> [   17.715904]  => trace_dump_stack
>> [   17.715904]  => __wait_rcu_gp
>> [   17.715904]  => synchronize_rcu
>> [   17.715904]  => selinux_netcache_avc_callback
>> [   17.715904]  => avc_ss_reset
>> [   17.715904]  => sel_write_enforce
>> [   17.715904]  => vfs_write
>> [   17.715904]  => ksys_write
>> [   17.715904]  => do_syscall_64
>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>>
>> I'm tired so I'll resume the debug later.
> 
> At times like this, I often pull the suspect code into userspace and
> run it through its paces.  In this case, a bunch of call_rcu_lazy()
> invocations into an empty bypass list, followed by a call_rcu()
> invocation, then a check to make sure that the bypass list is no longer
> lazy.

Thanks a lot for this great debug idea, I will look into it.

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19  2:45           ` Joel Fernandes
@ 2022-08-19 16:30             ` Joel Fernandes
  2022-08-19 17:12               ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 16:30 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu



On 8/18/2022 10:45 PM, Joel Fernandes wrote:
> 
> 
> On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
>> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
>>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>
>>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>
>>>>> [Sorry, adding back the CC list]
>>>>>
>>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
>>>>> <joel@joelfernandes.org> wrote:
>>>>>>
>>>>>> This is required to prevent callbacks from triggering the RCU machinery
>>>>>> too quickly and too often, which consumes more system power.
>>>>>>
>>>>>> When testing, we found that these paths were invoked often when the
>>>>>> system is not doing anything (screen is ON but otherwise idle).
>>>>>
>>>>> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
>>>>> after applying this particular patch. It is the first time I could
>>>>> test ChromeOS boot times with the series since it was hard to find a
>>>>> ChromeOS device that runs the upstream kernel.
>>>>>
>>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
>>>>> patch? I wonder if the issue is with wake up interaction with the nocb
>>>>> GP threads.
>>>>>
>>>>> We ought to disable lazy RCU during boot since it would have little
>>>>> benefit anyway. But I am also concerned about some deeper problem I
>>>>> did not catch before.
>>>>>
>>>>> I'll look into tracing the fs paths to see if I can narrow down what's
>>>>> causing it. Will also try a newer kernel, I am currently testing on
>>>>> 5.19-rc4.
>>>>
>>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
>>>> instead of normal CBs is triggering expedited stalls during the boot
>>>> process:
>>>>
>>>> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
>>>> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>>>>
>>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
>>>> but maybe something hangs and causes that side-effect.
>>>>
>>>> initcall_debug did not help, as it seems initcalls all work fine, and
>>>> then 8 seconds after the boot, it starts slowing down a lot, followed
>>>> by the RCU stall messages. As a next step I'll enable ftrace during
>>>> the boot to see if I can get more insight. But I believe it's not the
>>>> FS layer, the FS layer just triggers lazy CBs, but there is something
>>>> wrong with the core lazy-RCU work itself.
>>>>
>>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
>>>> recent kernels and debug.
>>>
>>> More digging: thanks to the trace_event= boot option, I find that the
>>> boot process does have some synchronous waits, and though these are
>>> "non-lazy", for some reason the lazy CBs that were previously queued
>>> are making them wait for the *full* lazy duration. Which points to a
>>> likely bug in the lazy RCU logic. These synchronous CBs should never
>>> be waiting like the lazy ones:
>>>
>>> [   17.715904]  => trace_dump_stack
>>> [   17.715904]  => __wait_rcu_gp
>>> [   17.715904]  => synchronize_rcu
>>> [   17.715904]  => selinux_netcache_avc_callback
>>> [   17.715904]  => avc_ss_reset
>>> [   17.715904]  => sel_write_enforce
>>> [   17.715904]  => vfs_write
>>> [   17.715904]  => ksys_write
>>> [   17.715904]  => do_syscall_64
>>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>>>
>>> I'm tired so I'll resume the debug later.
>>
>> At times like this, I often pull the suspect code into userspace and
>> run it through its paces.  In this case, a bunch of call_rcu_lazy()
>> invocations into an empty bypass list, followed by a call_rcu()
>> invocation, then a check to make sure that the bypass list is no longer
>> lazy.
> 
> Thanks a lot for this great debug idea, I will look into it.

It seems to be a subtle issue when a large number of callbacks are
queued, triggering the lock-contention code, which happens at boot. It
appears the non-lazy ones and lazy ones collide, so you have the lazy
timer which wins, and then the regular bypass lock-contention timer is
not allowed to do its thing. Due to this, the rcuog thread wakes up much
later than a jiffie.

Things are much better with the following change. However, this brings
me to a question about lock-contention-based (or any other) deferral and boot time.

If you have a path like selinux doing a synchronize_rcu(), shouldn't we
skip the jiffie waiting for the bypass timer? Otherwise things
synchronously waiting will slow down more than usual. Maybe bypassing
should not be done for any case until boot up is done. I'm curious to
see if that improves boot time.

@@ -580,7 +585,11 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
        len = rcu_segcblist_n_cbs(&rdp->cblist);
        bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
        lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
-       if (was_alldone) {
+
+       // If we are in lazy-mode, we still need to do a wake up even if
+       // all CBs were previously done. Otherwise the GP thread will
+       // wait for the full lazy duration.
+       if (was_alldone || (READ_ONCE(rdp->nocb_defer_wakeup) == RCU_NOCB_WAKE_LAZY)) {
                rdp->qlen_last_fqs_check = len;
                // Only lazy CBs in bypass list
                if (lazy_len && bypass_len == lazy_len) {

---

Thanks,

 - Joel



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 16:30             ` Joel Fernandes
@ 2022-08-19 17:12               ` Paul E. McKenney
  2022-08-19 18:14                 ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-19 17:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Fri, Aug 19, 2022 at 12:30:49PM -0400, Joel Fernandes wrote:
> On 8/18/2022 10:45 PM, Joel Fernandes wrote:
> > On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
> >> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
> >>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>>
> >>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>>>>
> >>>>> [Sorry, adding back the CC list]
> >>>>>
> >>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> >>>>> <joel@joelfernandes.org> wrote:
> >>>>>>
> >>>>>> This is required to prevent callbacks from triggering the RCU machinery
> >>>>>> too quickly and too often, which consumes more system power.
> >>>>>>
> >>>>>> When testing, we found that these paths were invoked often when the
> >>>>>> system is not doing anything (screen is ON but otherwise idle).
> >>>>>
> >>>>> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
> >>>>> after applying this particular patch. It is the first time I could
> >>>>> test ChromeOS boot times with the series since it was hard to find a
> >>>>> ChromeOS device that runs the upstream kernel.
> >>>>>
> >>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> >>>>> patch? I wonder if the issue is with wake up interaction with the nocb
> >>>>> GP threads.
> >>>>>
> >>>>> We ought to disable lazy RCU during boot since it would have little
> >>>>> benefit anyway. But I am also concerned about some deeper problem I
> >>>>> did not catch before.
> >>>>>
> >>>>> I'll look into tracing the fs paths to see if I can narrow down what's
> >>>>> causing it. Will also try a newer kernel, I am currently testing on
> >>>>> 5.19-rc4.
> >>>>
> >>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
> >>>> instead of normal CBs is triggering expedited stalls during the boot
> >>>> process:
> >>>>
> >>>> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
> >>>> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
> >>>>
> >>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
> >>>> but maybe something hangs and causes that side-effect.
> >>>>
> >>>> initcall_debug did not help, as it seems initcalls all work fine, and
> >>>> then 8 seconds after the boot, it starts slowing down a lot, followed
> >>>> by the RCU stall messages. As a next step I'll enable ftrace during
> >>>> the boot to see if I can get more insight. But I believe it's not the
> >>>> FS layer, the FS layer just triggers lazy CBs, but there is something
> >>>> wrong with the core lazy-RCU work itself.
> >>>>
> >>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
> >>>> recent kernels and debug.
> >>>
> >>> More digging: thanks to the trace_event= boot option, I find that the
> >>> boot process does have some synchronous waits, and though these are
> >>> "non-lazy", for some reason the lazy CBs that were previously queued
> >>> are making them wait for the *full* lazy duration. Which points to a
> >>> likely bug in the lazy RCU logic. These synchronous CBs should never
> >>> be waiting like the lazy ones:
> >>>
> >>> [   17.715904]  => trace_dump_stack
> >>> [   17.715904]  => __wait_rcu_gp
> >>> [   17.715904]  => synchronize_rcu
> >>> [   17.715904]  => selinux_netcache_avc_callback
> >>> [   17.715904]  => avc_ss_reset
> >>> [   17.715904]  => sel_write_enforce
> >>> [   17.715904]  => vfs_write
> >>> [   17.715904]  => ksys_write
> >>> [   17.715904]  => do_syscall_64
> >>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
> >>>
> >>> I'm tired so I'll resume the debug later.
> >>
> >> At times like this, I often pull the suspect code into userspace and
> >> run it through its paces.  In this case, a bunch of call_rcu_lazy()
> >> invocations into an empty bypass list, followed by a call_rcu()
> >> invocation, then a check to make sure that the bypass list is no longer
> >> lazy.
> > 
> > Thanks a lot for this great debug idea, I will look into it.
> 
> It seems to be a subtle issue when a large number of callbacks are
> queued, triggering the lock-contention code, which happens at boot. It
> appears the non-lazy ones and lazy ones collide, so you have the lazy
> timer which wins, and then the regular bypass lock-contention timer is
> not allowed to do its thing. Due to this, the rcuog thread wakes up much
> later than a jiffie.

Good show, and glad you found it!

> Things are much better with the following change. However, this brings
> me to a question about lock-contention-based (or any other) deferral and boot time.
> 
> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
> skip the jiffie waiting for the bypass timer? Otherwise things
> synchronously waiting will slow down more than usual. Maybe bypassing
> should not be done for any case until boot up is done. I'm curious to
> see if that improves boot time.

Why not simply disable laziness at boot time and enable it only after
booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
boot parameter uses a similar scheme.
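
Something like the following, perhaps.  rcu_end_inkernel_boot() is where
rcu_normal_after_boot takes effect today; the lazy gate below is only a
sketch, not existing code:

	static bool rcu_boot_ended;

	void rcu_end_inkernel_boot(void)
	{
		/* existing rcu_normal_after_boot handling elided ... */
		WRITE_ONCE(rcu_boot_ended, true);
	}

	/* Hypothetical gate: behave non-lazily until boot has ended. */
	static bool rcu_lazy_enabled(void)
	{
		return IS_ENABLED(CONFIG_RCU_LAZY) && READ_ONCE(rcu_boot_ended);
	}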

> @@ -580,7 +585,11 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>         len = rcu_segcblist_n_cbs(&rdp->cblist);
>         bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>         lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> -       if (was_alldone) {
> +
> +       // If we are in lazy-mode, we still need to do a wake up even if
> +       // all CBs were previously done. Otherwise the GP thread will
> +       // wait for the full lazy duration.
> +       if (was_alldone || (READ_ONCE(rdp->nocb_defer_wakeup) == RCU_NOCB_WAKE_LAZY)) {
>                 rdp->qlen_last_fqs_check = len;
>                 // Only lazy CBs in bypass list
>                 if (lazy_len && bypass_len == lazy_len) {

And this change looks plausible, though as always, the system's opinion
carries much more weight than does mine.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 17:12               ` Paul E. McKenney
@ 2022-08-19 18:14                 ` Joel Fernandes
  2022-08-19 18:17                   ` Joel Fernandes
  2022-08-19 19:40                   ` Joel Fernandes
  0 siblings, 2 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 18:14 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu



On 8/19/2022 1:12 PM, Paul E. McKenney wrote:
> On Fri, Aug 19, 2022 at 12:30:49PM -0400, Joel Fernandes wrote:
>> On 8/18/2022 10:45 PM, Joel Fernandes wrote:
>>> On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
>>>> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
>>>>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>>
>>>>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>>>
>>>>>>> [Sorry, adding back the CC list]
>>>>>>>
>>>>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
>>>>>>> <joel@joelfernandes.org> wrote:
>>>>>>>>
>>>>>>>> This is required to prevent callbacks from triggering the RCU machinery
>>>>>>>> too quickly and too often, which consumes more system power.
>>>>>>>>
>>>>>>>> When testing, we found that these paths were invoked often when the
>>>>>>>> system is not doing anything (screen is ON but otherwise idle).
>>>>>>>
>>>>>>> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
>>>>>>> after applying this particular patch. It is the first time I could
>>>>>>> test ChromeOS boot times with the series since it was hard to find a
>>>>>>> ChromeOS device that runs the upstream kernel.
>>>>>>>
>>>>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
>>>>>>> patch? I wonder if the issue is with wake up interaction with the nocb
>>>>>>> GP threads.
>>>>>>>
>>>>>>> We ought to disable lazy RCU during boot since it would have little
>>>>>>> benefit anyway. But I am also concerned about some deeper problem I
>>>>>>> did not catch before.
>>>>>>>
>>>>>>> I'll look into tracing the fs paths to see if I can narrow down what's
>>>>>>> causing it. Will also try a newer kernel, I am currently testing on
>>>>>>> 5.19-rc4.
>>>>>>
>>>>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
>>>>>> instead of normal CBs is triggering expedited stalls during the boot
>>>>>> process:
>>>>>>
>>>>>> [   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
>>>>>> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>>>>>>
>>>>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
>>>>>> but maybe something hangs and causes that side-effect.
>>>>>>
>>>>>> initcall_debug did not help, as it seems initcalls all work fine, and
>>>>>> then 8 seconds after the boot, it starts slowing down a lot, followed
>>>>>> by the RCU stall messages. As a next step I'll enable ftrace during
>>>>>> the boot to see if I can get more insight. But I believe it's not the
>>>>>> FS layer, the FS layer just triggers lazy CBs, but there is something
>>>>>> wrong with the core lazy-RCU work itself.
>>>>>>
>>>>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
>>>>>> recent kernels and debug.
>>>>>
>>>>> More digging: thanks to the trace_event= boot option, I find that the
>>>>> boot process does have some synchronous waits, and though these are
>>>>> "non-lazy", for some reason the lazy CBs that were previously queued
>>>>> are making them wait for the *full* lazy duration. Which points to a
>>>>> likely bug in the lazy RCU logic. These synchronous CBs should never
>>>>> be waiting like the lazy ones:
>>>>>
>>>>> [   17.715904]  => trace_dump_stack
>>>>> [   17.715904]  => __wait_rcu_gp
>>>>> [   17.715904]  => synchronize_rcu
>>>>> [   17.715904]  => selinux_netcache_avc_callback
>>>>> [   17.715904]  => avc_ss_reset
>>>>> [   17.715904]  => sel_write_enforce
>>>>> [   17.715904]  => vfs_write
>>>>> [   17.715904]  => ksys_write
>>>>> [   17.715904]  => do_syscall_64
>>>>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>>>>>
>>>>> I'm tired so I'll resume the debug later.
>>>>
>>>> At times like this, I often pull the suspect code into userspace and
>>>> run it through its paces.  In this case, a bunch of call_rcu_lazy()
>>>> invocations into an empty bypass list, followed by a call_rcu()
>>>> invocation, then a check to make sure that the bypass list is no longer
>>>> lazy.
>>>
>>> Thanks a lot for this great debug idea, I will look into it.
>>
>> It seems to be a subtle issue when a large number of callbacks are
>> queued, triggering the lock-contention code, which happens at boot. It
>> appears the non-lazy ones and lazy ones collide, so you have the lazy
>> timer which wins, and then the regular bypass lock-contention timer is
>> not allowed to do its thing. Due to this, the rcuog thread wakes up much
>> later than a jiffie.
> 
> Good show, and glad you found it!

Thanks!

>> Things are much better with the following change. However, this brings
>> me to a question about lock-contention-based (or any other) deferral and boot time.
>>
>> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
>> skip the jiffie waiting for the bypass timer? Otherwise things
>> synchronously waiting will slow down more than usual. Maybe bypassing
>> should not be done for any case until boot up is done. I'm curious to
>> see if that improves boot time.
> 
> Why not simply disable laziness at boot time and enable it only after
> booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
> boot parameter uses a similar scheme.

That sounds like the right thing to do, but unfortunately it won't help
this problem. The boot time issue happens after init has started. So the
OS is still "booting" even though the kernel has.

The problem can happen after boot as well, e.g. if lazy and
non-lazy RCU callbacks come in back to back quickly.

But yes, nonetheless, I can see the value of disabling it until the
in-kernel boot completes.

>> @@ -580,7 +585,11 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>>         len = rcu_segcblist_n_cbs(&rdp->cblist);
>>         bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>>         lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>> -       if (was_alldone) {
>> +
>> +       // If we are in lazy-mode, we still need to do a wake up even if
>> +       // all CBs were previously done. Otherwise the GP thread will
>> +       // wait for the full lazy duration.
>> +       if (was_alldone || (READ_ONCE(rdp->nocb_defer_wakeup) == RCU_NOCB_WAKE_LAZY)) {
>>                 rdp->qlen_last_fqs_check = len;
>>                 // Only lazy CBs in bypass list
>>                 if (lazy_len && bypass_len == lazy_len) {
> 
> And this change looks plausible, though as always, the system's opinion
> carries much more weight than does mine.

Sounds good, thanks, I am testing it more. Will update it for v4.

Thanks,

 - Joel


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 18:14                 ` Joel Fernandes
@ 2022-08-19 18:17                   ` Joel Fernandes
  2022-08-19 18:26                     ` Paul E. McKenney
  2022-08-19 19:40                   ` Joel Fernandes
  1 sibling, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 18:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Fri, Aug 19, 2022 at 2:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
[..]
> >> Things are much better with the following change. However, this brings
> >> me to a question about lock-contention-based (or any other) deferral and boot time.
> >>
> >> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
> >> skip the jiffie waiting for the bypass timer? Otherwise things
> >> synchronously waiting will slow down more than usual. Maybe bypassing
> >> should not be done for any case until boot up is done. I'm curious to
> >> see if that improves boot time.
> >
> > Why not simply disable laziness at boot time and enable it only after
> > booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
> > boot parameter uses a similar scheme.
>
> That sounds like the right thing to do, but unfortunately it won't help
> this problem. The boot time issue happens after init has started. So the
> OS is still "booting" even though the kernel has.
>
> The problem can happen after boot as well, e.g. if lazy and
> non-lazy RCU callbacks come in back to back quickly.
>
> But yes, nonetheless, I can see the value of disabling it until the
> in-kernel boot completes.

My mail client is acting weird. I meant to add to this: I wonder if
there is a way other subsystems detect when userspace has booted, using
some heuristic?

Thanks,

 -  Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 18:17                   ` Joel Fernandes
@ 2022-08-19 18:26                     ` Paul E. McKenney
  2022-08-19 18:29                       ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-19 18:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Fri, Aug 19, 2022 at 02:17:32PM -0400, Joel Fernandes wrote:
> On Fri, Aug 19, 2022 at 2:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> [..]
> > >> Things are much better with the following change. However, this brings
> > >> me to a question about lock-contention-based (or any other) deferral and boot time.
> > >>
> > >> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
> > >> skip the jiffy of waiting for the bypass timer? Otherwise things
> > >> synchronously waiting will slow down more than usual. Maybe bypassing
> > >> should not be done at all until boot-up is done. I'm curious to
> > >> see if that improves boot time.
> > >
> > > Why not simply disable laziness at boot time and enable it only after
> > > booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
> > > boot parameter uses a similar scheme.
> >
> > That sounds like the right thing to do, but unfortunately it won't help
> > this problem. The boot-time issue happens after init has started, so the
> > OS is still "booting" even though the kernel has finished booting.
> >
> > The problem can also happen after boot, for example if RCU lazy and
> > non-lazy callbacks arrive back to back in quick succession.
> >
> > But yes, nonetheless, I can see the value of disabling it until the
> > in-kernel boot completes.
> 
> My mail client is acting weird. I meant to add to this: I wonder if
> there is a way other subsystems detect when userspace has booted, using
> some heuristic?

I don't know of one, but I bet that ChromeOS has ways.  If nothing else,
might you add a sysfs write to one of the boot-up phases?
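
[Editorial sketch: one way the suggestion above could look, as a sysctl
that userspace init flips once boot is finished so the kernel can keep
laziness disabled until then. The knob name, variable, and wiring are
hypothetical, not part of the posted series:

#include <linux/init.h>
#include <linux/sysctl.h>

static int rcu_lazy_boot_done;	/* 0 until userspace declares boot complete */

static struct ctl_table rcu_lazy_table[] = {
	{
		.procname	= "rcu_lazy_boot_done",
		.data		= &rcu_lazy_boot_done,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	{ }
};

static int __init rcu_lazy_sysctl_init(void)
{
	register_sysctl("kernel", rcu_lazy_table);
	return 0;
}
late_initcall(rcu_lazy_sysctl_init);

call_rcu_lazy() would then fall back to plain call_rcu() while
READ_ONCE(rcu_lazy_boot_done) == 0, and the final boot-up phase would do
"echo 1 > /proc/sys/kernel/rcu_lazy_boot_done".]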

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 18:26                     ` Paul E. McKenney
@ 2022-08-19 18:29                       ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 18:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Fri, Aug 19, 2022 at 2:26 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Aug 19, 2022 at 02:17:32PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 19, 2022 at 2:14 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > [..]
> > > >> Things are much better with the following change. However, this brings
> > > >> me to a question about lock-contention-based (or any other) deferral and boot time.
> > > >>
> > > >> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
> > > >> skip the jiffy of waiting for the bypass timer? Otherwise things
> > > >> synchronously waiting will slow down more than usual. Maybe bypassing
> > > >> should not be done at all until boot-up is done. I'm curious to
> > > >> see if that improves boot time.
> > > >
> > > > Why not simply disable laziness at boot time and enable it only after
> > > > booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
> > > > boot parameter uses a similar scheme.
> > >
> > > That sounds like the right thing to do, but unfortunately it won't help
> > > this problem. The boot-time issue happens after init has started, so the
> > > OS is still "booting" even though the kernel has finished booting.
> > >
> > > The problem can also happen after boot, for example if RCU lazy and
> > > non-lazy callbacks arrive back to back in quick succession.
> > >
> > > But yes, nonetheless, I can see the value of disabling it until the
> > > in-kernel boot completes.
> >
> > My mail client is acting weird. I meant to add to this: I wonder if
> > there is a way other subsystems detect when userspace has booted, using
> > some heuristic?
>
> I don't know of one, but I bet that ChromeOS has ways.  If nothing else,
> might you add a sysfs write to one of the boot-up phases?

Yes, that's possible :) Thanks, I will consider this idea.

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 18:14                 ` Joel Fernandes
  2022-08-19 18:17                   ` Joel Fernandes
@ 2022-08-19 19:40                   ` Joel Fernandes
  2022-08-19 19:58                     ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 19:40 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu



On 8/19/2022 2:14 PM, Joel Fernandes wrote:
> 
> 
> On 8/19/2022 1:12 PM, Paul E. McKenney wrote:
>> On Fri, Aug 19, 2022 at 12:30:49PM -0400, Joel Fernandes wrote:
>>> On 8/18/2022 10:45 PM, Joel Fernandes wrote:
>>>> On 8/18/2022 10:35 PM, Paul E. McKenney wrote:
>>>>> On Thu, Aug 18, 2022 at 09:21:56PM -0400, Joel Fernandes wrote:
>>>>>> On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>>>
>>>>>>> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>>>>>>>>
>>>>>>>> [Sorry, adding back the CC list]
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
>>>>>>>> <joel@joelfernandes.org> wrote:
>>>>>>>>>
>>>>>>>>> This is required to prevent callbacks triggering RCU machinery too
>>>>>>>>> quickly and too often, which consumes more system power.
>>>>>>>>>
>>>>>>>>> When testing, we found that these paths were invoked often when the
>>>>>>>>> system is not doing anything (screen is ON but otherwise idle).
>>>>>>>>
>>>>>>>> Unfortunately, I am seeing a slowdown in ChromeOS boot performance
>>>>>>>> after applying this particular patch. It is the first time I could
>>>>>>>> test ChromeOS boot times with the series since it was hard to find a
>>>>>>>> ChromeOS device that runs the upstream kernel.
>>>>>>>>
>>>>>>>> Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
>>>>>>>> patch? I wonder if the issue is with wake up interaction with the nocb
>>>>>>>> GP threads.
>>>>>>>>
>>>>>>>> We ought to disable lazy RCU during boot since it would have little
>>>>>>>> benefit anyway. But I am also concerned about some deeper problem I
>>>>>>>> did not catch before.
>>>>>>>>
>>>>>>>> I'll look into tracing the fs paths to see if I can narrow down what's
>>>>>>>> causing it. Will also try a newer kernel, I am currently testing on
>>>>>>>> 5.19-rc4.
>>>>>>>
>>>>>>> I got somewhere with this. It looks like queuing CBs as lazy CBs
>>>>>>> instead of normal CBs is triggering expedited stalls during the boot
>>>>>>> process:
>>>>>>>
>>>>>>>   39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
>>>>>>> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>>>>>>>
>>>>>>> No idea how/why lazy RCU CBs would be related to expedited GP issues,
>>>>>>> but maybe something hangs and causes that side-effect.
>>>>>>>
>>>>>>> initcall_debug did not help, as it seems initcalls all work fine, and
>>>>>>> then 8 seconds after the boot, it starts slowing down a lot, followed
>>>>>>> by the RCU stall messages. As a next step I'll enable ftrace during
>>>>>>> the boot to see if I can get more insight. But I believe it's not the
>>>>>>> FS layer; the FS layer just triggers lazy CBs, but there is something
>>>>>>> wrong with the core lazy-RCU work itself.
>>>>>>>
>>>>>>> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
>>>>>>> recent kernels and debug.
>>>>>>
>>>>>> More digging, thanks to the trace_event= boot option, I find that the
>>>>>> boot process does have some synchronous waits, and though these are
>>>>>> "non-lazy", for some reason the lazy CBs that were previously queued
>>>>>> are making them wait for the *full* lazy duration. Which points to a
>>>>>> likely bug in the lazy RCU logic. These synchronous CBs should never
>>>>>> be waiting like the lazy ones:
>>>>>>
>>>>>> [   17.715904]  => trace_dump_stack
>>>>>> [   17.715904]  => __wait_rcu_gp
>>>>>> [   17.715904]  => synchronize_rcu
>>>>>> [   17.715904]  => selinux_netcache_avc_callback
>>>>>> [   17.715904]  => avc_ss_reset
>>>>>> [   17.715904]  => sel_write_enforce
>>>>>> [   17.715904]  => vfs_write
>>>>>> [   17.715904]  => ksys_write
>>>>>> [   17.715904]  => do_syscall_64
>>>>>> [   17.715904]  => entry_SYSCALL_64_after_hwframe
>>>>>>
>>>>>> I'm tired so I'll resume the debug later.
>>>>>
>>>>> At times like this, I often pull the suspect code into userspace and
>>>>> run it through its paces.  In this case, a bunch of call_rcu_lazy()
>>>>> invocations into an empty bypass list, followed by a call_rcu()
>>>>> invocation, then a check to make sure that the bypass list is no longer
>>>>> lazy.
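
[Editorial sketch: a standalone userspace model of the test described
above, with the bypass list reduced to two counters and an is-lazy
predicate; the model_* names are hypothetical stand-ins for the lifted
kernel code:

#include <assert.h>
#include <stdbool.h>

struct bypass_model {
	int n_cbs;	/* all callbacks currently in the bypass list */
	int n_lazy;	/* lazy callbacks among them */
};

static void model_call_rcu_lazy(struct bypass_model *b) { b->n_cbs++; b->n_lazy++; }
static void model_call_rcu(struct bypass_model *b) { b->n_cbs++; }

/* The list may sleep for the full lazy duration only if every CB is lazy. */
static bool model_is_lazy(const struct bypass_model *b)
{
	return b->n_cbs > 0 && b->n_cbs == b->n_lazy;
}

int main(void)
{
	struct bypass_model b = { 0, 0 };
	int i;

	for (i = 0; i < 16; i++)	/* a bunch of call_rcu_lazy() invocations ... */
		model_call_rcu_lazy(&b);
	assert(model_is_lazy(&b));
	model_call_rcu(&b);		/* ... then a single call_rcu() ... */
	assert(!model_is_lazy(&b));	/* ... which must turn laziness off */
	return 0;
}]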
>>>>
>>>> Thanks a lot for this great debug idea, I will look into it.
>>>
>>> It seems to be a subtle issue when a large number of callbacks are
>>> queued, triggering the lock-contention code, which happens at boot. It
>>> appears the non-lazy ones and lazy ones collide, so the lazy
>>> timer wins and the regular bypass lock-contention timer is
>>> not allowed to do its thing. Due to this, the rcuog thread wakes up much
>>> later than a jiffy.
>>
>> Good show, and glad you found it!
> 
> Thanks!
> 
>>> Things are much better with the following change. However, this brings
>>> me to a question about lock-contention-based (or any other) deferral and boot time.
>>>
>>> If you have a path like selinux doing a synchronize_rcu(), shouldn't we
>>> skip the jiffy of waiting for the bypass timer? Otherwise things
>>> synchronously waiting will slow down more than usual. Maybe bypassing
>>> should not be done at all until boot-up is done. I'm curious to
>>> see if that improves boot time.
>>
>> Why not simply disable laziness at boot time and enable it only after
>> booting is complete?  The existing rcupdate.rcu_normal_after_boot kernel
>> boot parameter uses a similar scheme.
> 
> That sounds like the right thing to do, but unfortunately it won't help
> this problem. The boot-time issue happens after init has started, so the
> OS is still "booting" even though the kernel has finished booting.
> 
> The problem can also happen after boot, for example if RCU lazy and
> non-lazy callbacks arrive back to back in quick succession.
> 
> But yes, nonetheless, I can see the value of disabling it until the
> in-kernel boot completes.
> 
>>> @@ -580,7 +585,11 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>>>         len = rcu_segcblist_n_cbs(&rdp->cblist);
>>>         bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>>>         lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>>> -       if (was_alldone) {
>>> +
>>> +       // If we are in lazy-mode, we still need to do a wake up even if
>>> +       // all CBs were previously done. Otherwise the GP thread will
>>> +       // wait for the full lazy duration.
>>> +       if (was_alldone || (READ_ONCE(rdp->nocb_defer_wakeup) == RCU_NOCB_WAKE_LAZY)) {
>>>                 rdp->qlen_last_fqs_check = len;
>>>                 // Only lazy CBs in bypass list
>>>                 if (lazy_len && bypass_len == lazy_len) {
>>
>> And this change looks plausible, though as always, the system's opinion
>> carries much more weight than does mine.
> 
> Sounds good, thanks, I am testing it more. Will update it for v4.

We could also do the following; I tested it and it fixes the issue. It seems more
maintainable and less fragile, but it comes at a slightly higher (and likely negligible)
cost: if there are lazy CBs queued and a non-lazy one arrives, that first non-lazy one is
not considered for the bypass list, but hopefully that's OK with you. Later non-lazy
ones will be added to the bypass.

@@ -484,9 +490,17 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
        // since we are kick-starting RCU GP processing anyway for the non-lazy
        // one, we can just reuse that GP for the already queued-up lazy ones.
        if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
-           (lazy && n_lazy_cbs >= qhimark)) {
+           (!lazy && n_lazy_cbs) ||
+           (lazy  && n_lazy_cbs >= qhimark)) {
                rcu_nocb_lock(rdp);
-               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+
+               // This variable helps decide if a wakeup of the rcuog thread
+               // is needed. It is passed to __call_rcu_nocb_wake() by the
+               // caller.  If only lazy CBs were previously queued and this one
+               // is non-lazy, make sure the caller does a wake up.
+               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
+                               (!lazy && n_lazy_cbs);
+
                if (*was_alldone)
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                            lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
@@ -500,7 +514,8 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
        if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
                rcu_nocb_lock(rdp);
                if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
-                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
+                               (!lazy && n_lazy_cbs);
                        if (*was_alldone)
                                trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                                    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 19:40                   ` Joel Fernandes
@ 2022-08-19 19:58                     ` Paul E. McKenney
  2022-08-19 20:13                       ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-08-19 19:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu

On Fri, Aug 19, 2022 at 03:40:00PM -0400, Joel Fernandes wrote:
[..]
> We could also do the following; I tested it and it fixes the issue. It seems more
> maintainable and less fragile, but it comes at a slightly higher (and likely negligible)
> cost: if there are lazy CBs queued and a non-lazy one arrives, that first non-lazy one is
> not considered for the bypass list, but hopefully that's OK with you. Later non-lazy
> ones will be added to the bypass.

At first I was concerned that you intended to reorder the callbacks,
but fortunately that is not what the patch below does.  ;-)

But don't you also need to clear the "lazy" flag at some point in this
execution path?  After all, once a non-lazy callback arrives, all the
callbacks are treated as if they are non-lazy, correct?

							Thanx, Paul

> @@ -484,9 +490,17 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>         // since we are kick-starting RCU GP processing anyway for the non-lazy
>         // one, we can just reuse that GP for the already queued-up lazy ones.
>         if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> -           (lazy && n_lazy_cbs >= qhimark)) {
> +           (!lazy && n_lazy_cbs) ||
> +           (lazy  && n_lazy_cbs >= qhimark)) {
>                 rcu_nocb_lock(rdp);
> -               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> +
> +               // This variable helps decide if a wakeup of the rcuog thread
> +               // is needed. It is passed to __call_rcu_nocb_wake() by the
> +               // caller.  If only lazy CBs were previously queued and this one
> +               // is non-lazy, make sure the caller does a wake up.
> +               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
> +                               (!lazy && n_lazy_cbs);
> +
>                 if (*was_alldone)
>                         trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>                                             lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> @@ -500,7 +514,8 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>         if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
>                 rcu_nocb_lock(rdp);
>                 if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
> -                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> +                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
> +                               (!lazy && n_lazy_cbs);
>                         if (*was_alldone)
>                                 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>                                                     lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-08-19 19:58                     ` Paul E. McKenney
@ 2022-08-19 20:13                       ` Joel Fernandes
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Fernandes @ 2022-08-19 20:13 UTC (permalink / raw)
  To: paulmck
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, rcu



On 8/19/2022 3:58 PM, Paul E. McKenney wrote:
> [..]
> 
> At first I was concerned that you intended to reorder the callbacks,
> but fortunately that is not what the patch below does.  ;-)
> 
> But don't you also need to clear the "lazy" flag at some point in this
> execution path?  After all, once a non-lazy callback arrives, all the
> callbacks are treated as if they are non-lazy, correct?

Oh, I did not see a need to clear this flag. I did not want to modify it because it
is set by the caller and signals what the caller intended rather than how the CB is treated.

IMO it should not affect the treatment of the lazy CBs as non-lazy ones, because the
effect is the same as if something was queued onto the bypass list. The bypass list is
flushed before the "non-bypass" (possibly non-lazy) CB is requested to be
queued by the caller, so we're doing the same thing as the existing bypass logic.

Basically these comments:
        // If there hasn't yet been all that many ->cblist enqueues
        // this jiffy, tell the caller to enqueue onto ->cblist.  But flush
        // ->nocb_bypass first.

    WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, lazy, false));
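
[Editorial sketch of the ordering argument above: the bypass list (lazy
CBs included) is flushed onto ->cblist first, and only then does the
caller enqueue the incoming non-lazy CB, so nothing is reordered. A
hypothetical simplification of the path being discussed:

	// Splice everything in ->nocb_bypass onto ->cblist ...
	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, lazy, false));
	// ... then the caller queues the new CB behind the flushed ones.
	rcu_segcblist_enqueue(&rdp->cblist, rhp);]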

Also, the issue that the below diff fixed is more about a non-lazy CB being treated as
a lazy one than the other way around.

Let me know if I missed something. Thanks,

- Joel


> 							Thanx, Paul
> 
>> @@ -484,9 +490,17 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>>         // since we are kick-starting RCU GP processing anyway for the non-lazy
>>         // one, we can just reuse that GP for the already queued-up lazy ones.
>>         if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
>> -           (lazy && n_lazy_cbs >= qhimark)) {
>> +           (!lazy && n_lazy_cbs) ||
>> +           (lazy  && n_lazy_cbs >= qhimark)) {
>>                 rcu_nocb_lock(rdp);
>> -               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>> +
>> +               // This variable helps decide if a wakeup of the rcuog thread
>> +               // is needed. It is passed to __call_rcu_nocb_wake() by the
>> +               // caller.  If only lazy CBs were previously queued and this one
>> +               // is non-lazy, make sure the caller does a wake up.
>> +               *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
>> +                               (!lazy && n_lazy_cbs);
>> +
>>                 if (*was_alldone)
>>                         trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>>                                             lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
>> @@ -500,7 +514,8 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>>         if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
>>                 rcu_nocb_lock(rdp);
>>                 if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy, false)) {
>> -                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>> +                       *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist) ||
>> +                               (!lazy && n_lazy_cbs);
>>                         if (*was_alldone)
>>                                 trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>>                                                     lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-08-19 20:13 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-09  3:45 [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
2022-08-09  3:45 ` [PATCH v3 resend 1/6] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
2022-08-09  3:45 ` [PATCH v3 resend 2/6] rcu: shrinker for lazy rcu Joel Fernandes (Google)
2022-08-09  3:45 ` [PATCH v3 resend 3/6] rcuscale: Add laziness and kfree tests Joel Fernandes (Google)
2022-08-09  3:45 ` [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
2022-08-18 17:22   ` Joel Fernandes
2022-08-18 17:23   ` Joel Fernandes
2022-08-18 23:05     ` Joel Fernandes
2022-08-19  1:21       ` Joel Fernandes
2022-08-19  1:29         ` Joel Fernandes
2022-08-19  2:35         ` Paul E. McKenney
2022-08-19  2:45           ` Joel Fernandes
2022-08-19 16:30             ` Joel Fernandes
2022-08-19 17:12               ` Paul E. McKenney
2022-08-19 18:14                 ` Joel Fernandes
2022-08-19 18:17                   ` Joel Fernandes
2022-08-19 18:26                     ` Paul E. McKenney
2022-08-19 18:29                       ` Joel Fernandes
2022-08-19 19:40                   ` Joel Fernandes
2022-08-19 19:58                     ` Paul E. McKenney
2022-08-19 20:13                       ` Joel Fernandes
2022-08-09  3:45 ` [PATCH v3 resend 5/6] rcutorture: Add test code for call_rcu_lazy() Joel Fernandes (Google)
2022-08-09  3:45 ` [PATCH v3 resend 6/6] debug: Toggle lazy at runtime and change flush jiffies Joel Fernandes (Google)
2022-08-11  2:23 ` [PATCH v3 resend 0/6] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes
2022-08-11  2:31   ` Joel Fernandes
2022-08-11  2:51     ` Paul E. McKenney
2022-08-11  3:22       ` Joel Fernandes
2022-08-11  3:46         ` Paul E. McKenney
