* [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
@ 2022-06-22 22:50 Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN Joel Fernandes (Google)
                   ` (9 more replies)
  0 siblings, 10 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)


Hello!
Please find the next improved version of call_rcu_lazy() attached.  The main
difference from the previous version is that it now uses bypass lists, and
thus handles rcu_barrier() and hotplug situations, with some small changes
to those parts.

I also don't see the TREE07 RCU stall from v1 anymore.

Some numbers from v1 testing are included below (testing on v2 is in progress).
Rushikesh, feel free to pull these patches into your tree. Just to note, you
will also need to pull the call_rcu_lazy() user patches from v1. I have dropped
them from this series, just to keep the focus on the feature code first.

Following are the power savings we see on top of RCU_NOCB_CPU on an Intel
platform. The observation is that, due to a 'trickle down' effect of RCU
callbacks, the system is very lightly loaded yet constantly runs a few RCU
callbacks. This misleads the power management hardware into treating the
system as active, when it is in fact idle.

For example, when the ChromeOS screen is off and the user is not doing anything
on the system, we can see big power savings.
Before:
Pk%pc10 = 72.13
PkgWatt = 0.58
CorWatt = 0.04

After:
Pk%pc10 = 81.28
PkgWatt = 0.41
CorWatt = 0.03

Further, when the ChromeOS screen is ON but the system is idle or lightly
loaded, we can see that the display pipeline is constantly queuing RCU
callbacks due to open/close of file descriptors associated with graphics
buffers. This is attributed to the file_free_rcu() path, which this patch
series also touches.

This patch series adds a simple but effective, lockless implementation of
RCU callback batching. On memory pressure, timeout, or the queue growing too
big, we initiate a flush of one or more per-CPU lists.
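For callers, adopting this is a one-line change: use call_rcu_lazy() instead
of call_rcu() for callbacks whose invocation is not latency sensitive. Below
is a minimal sketch; the structure and functions are made up for illustration,
and only the call_rcu_lazy() API itself comes from this series:

  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  struct foo {
          struct rcu_head rcu;
          /* ... payload ... */
  };

  /* Invoked after a grace period, possibly batched for a long time. */
  static void foo_free_rcu(struct rcu_head *rhp)
  {
          kfree(container_of(rhp, struct foo, rcu));
  }

  static void foo_release(struct foo *fp)
  {
          /* Freeing is not urgent, so let RCU batch it lazily. */
          call_rcu_lazy(&fp->rcu, foo_free_rcu);
  }

With CONFIG_RCU_LAZY=n, call_rcu_lazy() maps back to call_rcu(), so callers do
not need any #ifdefs.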

Similar results can be achieved by increasing jiffies_till_first_fqs, but that
also slows down RCU as a whole. In particular, I saw a huge slowdown of the
function graph tracer when increasing it.
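For reference, that alternative amounts to a boot-time tweak along these lines
(shown only for comparison; the value is illustrative):

  # Delay the first force-quiescent-state scan, slowing down grace
  # periods for every RCU user on the system.
  rcutree.jiffies_till_first_fqs=1000

That knob is global, whereas call_rcu_lazy() defers only the callbacks that
are explicitly marked lazy.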

One drawback of this series is that if another frequent, non-lazy RCU callback
creeps up in the future, it will again hurt power. However, I believe
identifying and fixing those is a more reasonable approach than slowing down
RCU for the whole system.

Disclaimer: I have intentionally not CC'd other subsystem maintainers (like
net, fs) to keep noise low and will CC them in the future after 1 or 2 rounds
of review and agreements.

Joel Fernandes (Google) (7):
  rcu: Introduce call_rcu_lazy() API implementation
  fs: Move call_rcu() to call_rcu_lazy() in some paths
  rcu/nocb: Add option to force all call_rcu() to lazy
  rcu/nocb: Wake up gp thread when flushing
  rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  rcu/nocb: Rewrite deferred wake up logic to be more clean
  rcu/kfree: Fix kfree_rcu_shrink_count() return value

Vineeth Pillai (1):
  rcu: shrinker for lazy rcu

 fs/dcache.c                   |   4 +-
 fs/eventpoll.c                |   2 +-
 fs/file_table.c               |   2 +-
 fs/inode.c                    |   2 +-
 include/linux/rcu_segcblist.h |   1 +
 include/linux/rcupdate.h      |   6 +
 kernel/rcu/Kconfig            |   8 ++
 kernel/rcu/rcu.h              |   8 ++
 kernel/rcu/rcu_segcblist.c    |  19 +++
 kernel/rcu/rcu_segcblist.h    |  24 ++++
 kernel/rcu/rcuscale.c         |  64 +++++++++-
 kernel/rcu/tree.c             |  35 +++++-
 kernel/rcu/tree.h             |  10 +-
 kernel/rcu/tree_nocb.h        | 217 +++++++++++++++++++++++++++-------
 14 files changed, 345 insertions(+), 57 deletions(-)

-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-22 22:58   ` Joel Fernandes
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Marco Elver

From: "Paul E. McKenney" <paulmck@kernel.org>

Context tracking's __ct_state() function can be invoked from noinstr state
where RCU is not watching.  This means that its use of atomic_read()
causes KASAN to invoke the non-noinstr __kasan_check_read() function
from the noinstr function __ct_state().  This is problematic because
someone tracing the __kasan_check_read() function could get a nasty
surprise because of RCU not watching.

This commit therefore replaces the __ct_state() function's use of
atomic_read() with arch_atomic_read(), which KASAN does not attempt to
add instrumentation to.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
---
 include/linux/context_tracking_state.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index 0aecc07fb4f5..81c51e5f0314 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -49,7 +49,7 @@ DECLARE_PER_CPU(struct context_tracking, context_tracking);
 
 static __always_inline int __ct_state(void)
 {
-	return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
+	return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
 }
 #endif
 
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-22 23:18   ` Joel Fernandes
                     ` (3 more replies)
  2022-06-22 22:50 ` [PATCH v2 2/8] rcu: shrinker for lazy rcu Joel Fernandes (Google)
                   ` (7 subsequent siblings)
  9 siblings, 4 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

Implement timer-based RCU lazy callback batching. The batch is flushed
whenever a certain amount of time has passed, or the batch on a
particular CPU grows too big. Memory pressure will also flush it, in a
later patch.

To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists to handle lazy CBs. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.

Suggested-by: Paul McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcu_segcblist.h |   1 +
 include/linux/rcupdate.h      |   6 ++
 kernel/rcu/Kconfig            |   8 +++
 kernel/rcu/rcu_segcblist.c    |  19 ++++++
 kernel/rcu/rcu_segcblist.h    |  14 ++++
 kernel/rcu/tree.c             |  24 +++++--
 kernel/rcu/tree.h             |  10 +--
 kernel/rcu/tree_nocb.h        | 125 +++++++++++++++++++++++++---------
 8 files changed, 164 insertions(+), 43 deletions(-)

diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 659d13a7ddaa..9a992707917b 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -22,6 +22,7 @@ struct rcu_cblist {
 	struct rcu_head *head;
 	struct rcu_head **tail;
 	long len;
+	long lazy_len;
 };
 
 #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1a32036c918c..9191a3d88087 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_RCU_LAZY
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
+#else
+#define call_rcu_lazy(head, func) call_rcu(head, func)
+#endif
+
 /* Internal to kernel */
 void rcu_init(void);
 extern int rcu_scheduler_active;
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4c..0bffa992fdc4 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -293,4 +293,12 @@ config TASKS_TRACE_RCU_READ_MB
 	  Say N here if you hate read-side memory barriers.
 	  Take the default if you are unsure.
 
+config RCU_LAZY
+	bool "RCU callback lazy invocation functionality"
+	depends on RCU_NOCB_CPU
+	default n
+	help
+	  To save power, batch RCU callbacks and flush after delay, memory
+          pressure or callback list growing too big.
+
 endmenu # "RCU Subsystem"
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index c54ea2b6a36b..627a3218a372 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -20,6 +20,7 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
 	rclp->head = NULL;
 	rclp->tail = &rclp->head;
 	rclp->len = 0;
+	rclp->lazy_len = 0;
 }
 
 /*
@@ -32,6 +33,15 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
 	WRITE_ONCE(rclp->len, rclp->len + 1);
 }
 
+/*
+ * Enqueue an rcu_head structure onto the specified callback list.
+ */
+void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp)
+{
+	rcu_cblist_enqueue(rclp, rhp);
+	WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
+}
+
 /*
  * Flush the second rcu_cblist structure onto the first one, obliterating
  * any contents of the first.  If rhp is non-NULL, enqueue it as the sole
@@ -60,6 +70,15 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
 	}
 }
 
+void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
+			      struct rcu_cblist *srclp,
+			      struct rcu_head *rhp)
+{
+	rcu_cblist_flush_enqueue(drclp, srclp, rhp);
+	if (rhp)
+		WRITE_ONCE(srclp->lazy_len, 1);
+}
+
 /*
  * Dequeue the oldest rcu_head structure from the specified callback
  * list.
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 431cee212467..c3d7de65b689 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -15,14 +15,28 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
 	return READ_ONCE(rclp->len);
 }
 
+/* Return number of callbacks in the specified callback list. */
+static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
+{
+#ifdef CONFIG_RCU_LAZY
+	return READ_ONCE(rclp->lazy_len);
+#else
+	return 0;
+#endif
+}
+
 /* Return number of callbacks in segmented callback list by summing seglen. */
 long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
 
 void rcu_cblist_init(struct rcu_cblist *rclp);
 void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
+void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp);
 void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
 			      struct rcu_cblist *srclp,
 			      struct rcu_head *rhp);
+void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
+			      struct rcu_cblist *srclp,
+			      struct rcu_head *rhp);
 struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp);
 
 /*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a..d2e3d6e176d2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3098,7 +3098,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
  * Implementation of these memory-ordering guarantees is described here:
  * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
  */
-void call_rcu(struct rcu_head *head, rcu_callback_t func)
+static void
+__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
 {
 	static atomic_t doublefrees;
 	unsigned long flags;
@@ -3139,7 +3140,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
@@ -3161,8 +3162,21 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 		local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(call_rcu);
 
+#ifdef CONFIG_RCU_LAZY
+void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, true);
+}
+EXPORT_SYMBOL_GPL(call_rcu_lazy);
+#endif
+
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, false);
+
+}
+EXPORT_SYMBOL_GPL(call_rcu);
 
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
@@ -4056,7 +4070,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -4476,7 +4490,7 @@ void rcutree_migrate_callbacks(int cpu)
 	my_rdp = this_cpu_ptr(&rcu_data);
 	my_rnp = my_rdp->mynode;
 	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
 	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
 	/* Leverage recent GPs and set GP for new callbacks. */
 	needwake = rcu_advance_cbs(my_rnp, rdp) ||
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 2ccf5845957d..fec4fad6654b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -267,8 +267,9 @@ struct rcu_data {
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE_BYPASS	1
-#define RCU_NOCB_WAKE		2
-#define RCU_NOCB_WAKE_FORCE	3
+#define RCU_NOCB_WAKE_LAZY	2
+#define RCU_NOCB_WAKE		3
+#define RCU_NOCB_WAKE_FORCE	4
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
@@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j);
+				  unsigned long j, bool lazy);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags);
+				bool *was_alldone, unsigned long flags,
+				bool lazy);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index e369efe94fda..b9244f22e102 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
 }
 
+#define LAZY_FLUSH_JIFFIES (10 * HZ)
+
 /*
  * Arrange to wake the GP kthread for this NOCB group at some future
  * time when it is safe to do so.
@@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 * Bypass wakeup overrides previous deferments. In case
 	 * of callback storm, no need to wake up too early.
 	 */
-	if (waketype == RCU_NOCB_WAKE_BYPASS) {
+	if (waketype == RCU_NOCB_WAKE_LAZY) {
+		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else {
@@ -296,7 +301,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				     unsigned long j)
+				     unsigned long j, bool lazy)
 {
 	struct rcu_cblist rcl;
 
@@ -310,7 +315,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
 	if (rhp)
 		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
-	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
+
+	trace_printk("call_rcu_lazy callbacks = %ld\n", READ_ONCE(rdp->nocb_bypass.lazy_len));
+	/* The lazy CBs are being flushed, but a new one might be enqueued. */
+	if (lazy)
+		rcu_cblist_flush_enqueue_lazy(&rcl, &rdp->nocb_bypass, rhp);
+	else
+		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
 	rcu_nocb_bypass_unlock(rdp);
@@ -326,13 +337,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy)
 {
 	if (!rcu_rdp_is_offloaded(rdp))
 		return true;
 	rcu_lockdep_assert_cblist_protected(rdp);
 	rcu_nocb_bypass_lock(rdp);
-	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
+	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
 }
 
 /*
@@ -345,7 +356,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
 	if (!rcu_rdp_is_offloaded(rdp) ||
 	    !rcu_nocb_bypass_trylock(rdp))
 		return;
-	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
+	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
 }
 
 /*
@@ -367,12 +378,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
  * there is only one CPU in operation.
  */
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags,
+				bool lazy)
 {
 	unsigned long c;
 	unsigned long cur_gp_seq;
 	unsigned long j = jiffies;
 	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	long n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 
 	lockdep_assert_irqs_disabled();
 
@@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	}
 	WRITE_ONCE(rdp->nocb_nobypass_count, c);
 
-	// If there hasn't yet been all that many ->cblist enqueues
-	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
-	// ->nocb_bypass first.
-	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
+	// If caller passed a non-lazy CB and there hasn't yet been all that
+	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
+	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
+	// number of CBs (lazy + non-lazy) grows too much.
+	//
+	// Note that if the bypass list has lazy CBs, and the main list is
+	// empty, and rhp happens to be non-lazy, then we end up flushing all
+	// the lazy CBs to the main list as well. That's the right thing to do,
+	// since we are kick-starting RCU GP processing anyway for the non-lazy
+	// one, we can just reuse that GP for the already queued-up lazy ones.
+	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
+	    (lazy && n_lazy_cbs >= qhimark)) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		if (*was_alldone)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-					    TPS("FirstQ"));
-		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
+					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 		return false; // Caller must enqueue the callback.
 	}
 
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
-	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
-	    ncbs >= qhimark) {
+	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
-		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
+		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {
 			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 			if (*was_alldone)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-						    TPS("FirstQ"));
+						    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
 			WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 			return false; // Caller must enqueue the callback.
 		}
@@ -455,12 +475,20 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	rcu_nocb_wait_contended(rdp);
 	rcu_nocb_bypass_lock(rdp);
 	ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
-	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
+	if (lazy)
+		rcu_cblist_enqueue_lazy(&rdp->nocb_bypass, rhp);
+	else
+		rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
 	if (!ncbs) {
 		WRITE_ONCE(rdp->nocb_bypass_first, j);
-		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstBQ"));
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
+				    lazy ? TPS("FirstLazyBQ") : TPS("FirstBQ"));
+	} else if (!n_lazy_cbs && lazy) {
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstLazyBQ"));
 	}
+
 	rcu_nocb_bypass_unlock(rdp);
 	smp_mb(); /* Order enqueue before wake. */
 	if (ncbs) {
@@ -493,7 +521,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 {
 	unsigned long cur_gp_seq;
 	unsigned long j;
-	long len;
+	long len, lazy_len, bypass_len;
 	struct task_struct *t;
 
 	// If we are being polled or there is no kthread, just leave.
@@ -506,9 +534,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	}
 	// Need to actually to a wakeup.
 	len = rcu_segcblist_n_cbs(&rdp->cblist);
+	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
-		if (!irqs_disabled_flags(flags)) {
+		// Only lazy CBs in bypass list
+		if (lazy_len && bypass_len == lazy_len) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
+					   TPS("WakeLazy"));
+		} else if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp(rdp, false);
@@ -599,8 +634,8 @@ static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
  */
 static void nocb_gp_wait(struct rcu_data *my_rdp)
 {
-	bool bypass = false;
-	long bypass_ncbs;
+	bool bypass = false, lazy = false;
+	long bypass_ncbs, lazy_ncbs;
 	int __maybe_unused cpu = my_rdp->cpu;
 	unsigned long cur_gp_seq;
 	unsigned long flags;
@@ -648,12 +683,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			continue;
 		}
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-		if (bypass_ncbs &&
+		lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+		if (lazy_ncbs &&
+		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
+		     bypass_ncbs > qhimark)) {
+			// Bypass full or old, so flush it.
+			(void)rcu_nocb_try_flush_bypass(rdp, j);
+			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+		} else if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
 		     bypass_ncbs > 2 * qhimark)) {
 			// Bypass full or old, so flush it.
 			(void)rcu_nocb_try_flush_bypass(rdp, j);
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			if (needwake_state)
@@ -662,8 +706,11 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		}
 		if (bypass_ncbs) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-					    TPS("Bypass"));
-			bypass = true;
+				    bypass_ncbs == lazy_ncbs ? TPS("Lazy") : TPS("Bypass"));
+			if (bypass_ncbs == lazy_ncbs)
+				lazy = true;
+			else
+				bypass = true;
 		}
 		rnp = rdp->mynode;
 
@@ -713,12 +760,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
 
-	if (bypass && !rcu_nocb_poll) {
-		// At least one child with non-empty ->nocb_bypass, so set
-		// timer in order to avoid stranding its callbacks.
-		wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
-				   TPS("WakeBypassIsDeferred"));
+	// At least one child with non-empty ->nocb_bypass, so set
+	// timer in order to avoid stranding its callbacks.
+	if (!rcu_nocb_poll) {
+		// If bypass list only has lazy CBs. Add a deferred
+		// lazy wake up.
+		if (lazy && !bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
+					TPS("WakeLazyIsDeferred"));
+		// Otherwise add a deferred bypass wake up.
+		} else if (bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
+					TPS("WakeBypassIsDeferred"));
+		}
 	}
+
 	if (rcu_nocb_poll) {
 		/* Polling, so trace if first poll in the series. */
 		if (gotcbs)
@@ -999,7 +1055,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * return false, which means that future calls to rcu_nocb_try_bypass()
 	 * will refuse to put anything into the bypass.
 	 */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 	/*
 	 * Start with invoking rcu_core() early. This way if the current thread
 	 * happens to preempt an ongoing call to rcu_core() in the middle,
@@ -1500,13 +1556,14 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 }
 
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy)
 {
 	return true;
 }
 
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags,
+				bool lazy)
 {
 	return false;
 }
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 2/8] rcu: shrinker for lazy rcu
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 3/8] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes

From: Vineeth Pillai <vineeth@bitbyteword.org>

The shrinker is used to speed up the freeing of memory potentially held
by RCU lazy callbacks. RCU kernel module test cases show this to be
effective. The test is introduced in a later patch.

Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu_segcblist.h | 14 +++++++++--
 kernel/rcu/tree_nocb.h     | 48 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index c3d7de65b689..cf71425dbb5e 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -15,16 +15,26 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
 	return READ_ONCE(rclp->len);
 }
 
+#ifdef CONFIG_RCU_LAZY
 /* Return number of callbacks in the specified callback list. */
 static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
 {
-#ifdef CONFIG_RCU_LAZY
 	return READ_ONCE(rclp->lazy_len);
+}
+
+static inline void rcu_cblist_reset_lazy_len(struct rcu_cblist *rclp)
+{
+	WRITE_ONCE(rclp->lazy_len, 0);
+}
 #else
+static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
+{
 	return 0;
-#endif
 }
 
+static inline void rcu_cblist_reset_lazy_len(struct rcu_cblist *rclp) {}
+#endif
+
 /* Return number of callbacks in segmented callback list by summing seglen. */
 long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
 
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index b9244f22e102..2f5da12811a5 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1207,6 +1207,51 @@ int rcu_nocb_cpu_offload(int cpu)
 }
 EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
 
+static unsigned long
+lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) {
+	int cpu;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		count += rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+	}
+
+	return count ? count : SHRINK_EMPTY;
+}
+
+static unsigned long
+lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) {
+	int cpu;
+	unsigned long flags;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		int _count = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
+		if (_count == 0)
+			continue;
+		rcu_nocb_lock_irqsave(rdp, flags);
+		rcu_cblist_reset_lazy_len(&rdp->nocb_bypass);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		wake_nocb_gp(rdp, false);
+		sc->nr_to_scan -= _count;
+		count += _count;
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+	return count ? count : SHRINK_STOP;
+}
+
+static struct shrinker lazy_rcu_shrinker = {
+	.count_objects = lazy_rcu_shrink_count,
+	.scan_objects = lazy_rcu_shrink_scan,
+	.batch = 0,
+	.seeks = DEFAULT_SEEKS,
+};
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -1244,6 +1289,9 @@ void __init rcu_init_nohz(void)
 	if (!rcu_state.nocb_is_setup)
 		return;
 
+	if (register_shrinker(&lazy_rcu_shrinker))
+		pr_err("Failed to register lazy_rcu shrinker!\n");
+
 #if defined(CONFIG_NO_HZ_FULL)
 	if (tick_nohz_full_running)
 		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 3/8] fs: Move call_rcu() to call_rcu_lazy() in some paths
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (2 preceding siblings ...)
  2022-06-22 22:50 ` [PATCH v2 2/8] rcu: shrinker for lazy rcu Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 4/8] rcu/nocb: Add option to force all call_rcu() to lazy Joel Fernandes (Google)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

This is required to prevent callbacks from triggering the RCU machinery too
quickly and too often, which increases the system's power consumption.

During testing, we found that these paths were invoked often even when the
system was not doing anything (screen ON but otherwise idle).

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 fs/dcache.c     | 4 ++--
 fs/eventpoll.c  | 2 +-
 fs/file_table.c | 2 +-
 fs/inode.c      | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 93f4f5ee07bf..7f51bac390c8 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -366,7 +366,7 @@ static void dentry_free(struct dentry *dentry)
 	if (unlikely(dname_external(dentry))) {
 		struct external_name *p = external_name(dentry);
 		if (likely(atomic_dec_and_test(&p->u.count))) {
-			call_rcu(&dentry->d_u.d_rcu, __d_free_external);
+			call_rcu_lazy(&dentry->d_u.d_rcu, __d_free_external);
 			return;
 		}
 	}
@@ -374,7 +374,7 @@ static void dentry_free(struct dentry *dentry)
 	if (dentry->d_flags & DCACHE_NORCU)
 		__d_free(&dentry->d_u.d_rcu);
 	else
-		call_rcu(&dentry->d_u.d_rcu, __d_free);
+		call_rcu_lazy(&dentry->d_u.d_rcu, __d_free);
 }
 
 /*
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 971f98af48ff..57b3f781760c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -729,7 +729,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	 * ep->mtx. The rcu read side, reverse_path_check_proc(), does not make
 	 * use of the rbn field.
 	 */
-	call_rcu(&epi->rcu, epi_rcu_free);
+	call_rcu_lazy(&epi->rcu, epi_rcu_free);
 
 	percpu_counter_dec(&ep->user->epoll_watches);
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 5424e3a8df5f..417f57e9cb30 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
 	security_file_free(f);
 	if (!(f->f_mode & FMODE_NOACCOUNT))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+	call_rcu_lazy(&f->f_u.fu_rcuhead, file_free_rcu);
 }
 
 /*
diff --git a/fs/inode.c b/fs/inode.c
index bd4da9c5207e..38fe040ddbd6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -312,7 +312,7 @@ static void destroy_inode(struct inode *inode)
 			return;
 	}
 	inode->free_inode = ops->free_inode;
-	call_rcu(&inode->i_rcu, i_callback);
+	call_rcu_lazy(&inode->i_rcu, i_callback);
 }
 
 /**
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 4/8] rcu/nocb: Add option to force all call_rcu() to lazy
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (3 preceding siblings ...)
  2022-06-22 22:50 ` [PATCH v2 3/8] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-22 22:50 ` [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing Joel Fernandes (Google)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

This will be used in the rcu scale test, to ensure that fly-by call_rcu()s do
not cause call_rcu_lazy() CBs to be flushed to the rdp ->cblist.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h  |  2 ++
 kernel/rcu/tree.c | 11 ++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 4916077119f3..71c0f45e70c3 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -472,6 +472,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 			       unsigned long c_old,
 			       unsigned long c);
 void rcu_gp_set_torture_wait(int duration);
+void rcu_force_call_rcu_to_lazy(bool force);
 #else
 static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
 					  int *flags, unsigned long *gp_seq)
@@ -490,6 +491,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 	do { } while (0)
 #endif
 static inline void rcu_gp_set_torture_wait(int duration) { }
+static inline void rcu_force_call_rcu_to_lazy(bool force) { }
 #endif
 
 #if IS_ENABLED(CONFIG_RCU_TORTURE_TEST) || IS_MODULE(CONFIG_RCU_TORTURE_TEST)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d2e3d6e176d2..711679d10cbb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3171,9 +3171,18 @@ void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
 EXPORT_SYMBOL_GPL(call_rcu_lazy);
 #endif
 
+static bool force_call_rcu_to_lazy;
+
+void rcu_force_call_rcu_to_lazy(bool force)
+{
+	if (IS_ENABLED(CONFIG_RCU_SCALE_TEST))
+		WRITE_ONCE(force_call_rcu_to_lazy, force);
+}
+EXPORT_SYMBOL_GPL(rcu_force_call_rcu_to_lazy);
+
 void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	return __call_rcu_common(head, func, false);
+	return __call_rcu_common(head, func, force_call_rcu_to_lazy);
 
 }
 EXPORT_SYMBOL_GPL(call_rcu);
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (4 preceding siblings ...)
  2022-06-22 22:50 ` [PATCH v2 4/8] rcu/nocb: Add option to force all call_rcu() to lazy Joel Fernandes (Google)
@ 2022-06-22 22:50 ` Joel Fernandes (Google)
  2022-06-26  4:06   ` Paul E. McKenney
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:50 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

We noticed that rcu_barrier() can take a really long time. It appears that
this can happen when all CBs are lazy and the timer has not fired yet, so
that after flushing, nothing wakes up the GP thread. This patch forces the
GP thread to wake when bypass flushing happens, which fixes the
rcu_barrier() delays with lazy CBs.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_nocb.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 2f5da12811a5..b481f1ea57c0 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
 	rcu_nocb_bypass_unlock(rdp);
+
+	wake_nocb_gp(rdp, true);
 	return true;
 }
 
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (5 preceding siblings ...)
  2022-06-22 22:50 ` [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing Joel Fernandes (Google)
@ 2022-06-22 22:51 ` Joel Fernandes (Google)
  2022-06-23  2:09   ` kernel test robot
                     ` (3 more replies)
  2022-06-22 22:51 ` [PATCH v2 7/8] rcu/nocb: Rewrite deferred wake up logic to be more clean Joel Fernandes (Google)
                   ` (2 subsequent siblings)
  9 siblings, 4 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:51 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

Reuse the kfree_rcu() test in order to be able to compare the memory-reclaim
properties of call_rcu_lazy() with those of kfree_rcu().

With this test, we find that call_rcu_lazy() has a memory footprint and
freeing time similar to kfree_rcu(). We also confirm that call_rcu_lazy()
can survive OOM during extremely frequent calls.

If we really push it, i.e. boot the system with low memory and compare
kfree_rcu() with call_rcu_lazy(), I find that call_rcu_lazy() is more
resilient, and it is much harder to produce OOM with it than with kfree_rcu().

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcu.h       |  6 ++++
 kernel/rcu/rcuscale.c  | 64 +++++++++++++++++++++++++++++++++++++++++-
 kernel/rcu/tree_nocb.h | 17 ++++++++++-
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 71c0f45e70c3..436faf80a66b 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -473,6 +473,12 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
 			       unsigned long c);
 void rcu_gp_set_torture_wait(int duration);
 void rcu_force_call_rcu_to_lazy(bool force);
+
+#if IS_ENABLED(CONFIG_RCU_SCALE_TEST)
+unsigned long rcu_scale_get_jiffies_till_flush(void);
+void rcu_scale_set_jiffies_till_flush(unsigned long j);
+#endif
+
 #else
 static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
 					  int *flags, unsigned long *gp_seq)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 277a5bfb37d4..58ee5c2cb37b 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -95,6 +95,7 @@ torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
 torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
+torture_param(int, kfree_rcu_by_lazy, 0, "Use call_rcu_lazy() to emulate kfree_rcu()?");
 
 static char *scale_type = "rcu";
 module_param(scale_type, charp, 0444);
@@ -658,6 +659,13 @@ struct kfree_obj {
 	struct rcu_head rh;
 };
 
+/* Used if doing RCU-kfree'ing via call_rcu_lazy(). */
+void kfree_rcu_lazy(struct rcu_head *rh)
+{
+	struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);
+	kfree(obj);
+}
+
 static int
 kfree_scale_thread(void *arg)
 {
@@ -695,6 +703,11 @@ kfree_scale_thread(void *arg)
 			if (!alloc_ptr)
 				return -ENOMEM;
 
+			if (kfree_rcu_by_lazy) {
+				call_rcu_lazy(&(alloc_ptr->rh), kfree_rcu_lazy);
+				continue;
+			}
+
 			// By default kfree_rcu_test_single and kfree_rcu_test_double are
 			// initialized to false. If both have the same value (false or true)
 			// both are randomly tested, otherwise only the one with value true
@@ -737,6 +750,9 @@ kfree_scale_cleanup(void)
 {
 	int i;
 
+	if (kfree_rcu_by_lazy)
+		rcu_force_call_rcu_to_lazy(false);
+
 	if (torture_cleanup_begin())
 		return;
 
@@ -766,11 +782,55 @@ kfree_scale_shutdown(void *arg)
 	return -EINVAL;
 }
 
+// Used if doing RCU-kfree'ing via call_rcu_lazy().
+unsigned long jiffies_at_lazy_cb;
+struct rcu_head lazy_test1_rh;
+int rcu_lazy_test1_cb_called;
+void call_rcu_lazy_test1(struct rcu_head *rh)
+{
+	jiffies_at_lazy_cb = jiffies;
+	WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
+}
+
 static int __init
 kfree_scale_init(void)
 {
 	long i;
 	int firsterr = 0;
+	unsigned long orig_jif, jif_start;
+
+	// Force all call_rcu() to call_rcu_lazy() so that non-lazy CBs
+	// do not remove laziness of the lazy ones (since the test tries
+	// to stress call_rcu_lazy() for OOM).
+	//
+	// Also, do a quick self-test to ensure laziness is as much as
+	// expected.
+	if (kfree_rcu_by_lazy) {
+		/* do a test to check the timeout. */
+		orig_jif = rcu_scale_get_jiffies_till_flush();
+
+		rcu_force_call_rcu_to_lazy(true);
+		rcu_scale_set_jiffies_till_flush(2 * HZ);
+		rcu_barrier();
+
+		jif_start = jiffies;
+		jiffies_at_lazy_cb = 0;
+		call_rcu_lazy(&lazy_test1_rh, call_rcu_lazy_test1);
+
+		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
+
+		rcu_scale_set_jiffies_till_flush(orig_jif);
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
+			pr_alert("Lazy CBs are not being lazy as expected!\n");
+			return -1;
+		}
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
+			pr_alert("Lazy CBs are being too lazy!\n");
+			return -1;
+		}
+	}
 
 	kfree_nrealthreads = compute_real(kfree_nthreads);
 	/* Start up the kthreads. */
@@ -783,7 +843,9 @@ kfree_scale_init(void)
 		schedule_timeout_uninterruptible(1);
 	}
 
-	pr_alert("kfree object size=%zu\n", kfree_mult * sizeof(struct kfree_obj));
+	pr_alert("kfree object size=%zu, kfree_rcu_by_lazy=%d\n",
+			kfree_mult * sizeof(struct kfree_obj),
+			kfree_rcu_by_lazy);
 
 	kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
 			       GFP_KERNEL);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index b481f1ea57c0..255f2945b0fc 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -257,6 +257,21 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 }
 
 #define LAZY_FLUSH_JIFFIES (10 * HZ)
+unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+
+#ifdef CONFIG_RCU_SCALE_TEST
+void rcu_scale_set_jiffies_till_flush(unsigned long jif)
+{
+	jiffies_till_flush = jif;
+}
+EXPORT_SYMBOL(rcu_scale_set_jiffies_till_flush);
+
+unsigned long rcu_scale_get_jiffies_till_flush(void)
+{
+	return jiffies_till_flush;
+}
+EXPORT_SYMBOL(rcu_scale_get_jiffies_till_flush);
+#endif
 
 /*
  * Arrange to wake the GP kthread for this NOCB group at some future
@@ -275,7 +290,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 * of callback storm, no need to wake up too early.
 	 */
 	if (waketype == RCU_NOCB_WAKE_LAZY) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
+		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 7/8] rcu/nocb: Rewrite deferred wake up logic to be more clean
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (6 preceding siblings ...)
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
@ 2022-06-22 22:51 ` Joel Fernandes (Google)
  2022-06-22 22:51 ` [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value Joel Fernandes (Google)
  2022-06-26  3:12 ` [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Paul E. McKenney
  9 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:51 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

This function does two things:
1. Modify the gp wake timer.
2. Save the value of the strongest wakeup requested so far.

The strongest is "wake force" and the weakest is "lazy".

The existing logic already does the following:
1. If the existing deferred wake is stronger than the requested one
   (requested in waketype), modify the gp timer to expire further in
   the future. For example, if the existing one is WAKE and the new
   waketype requested is BYPASS, then the timer is made to expire later
   than it would have before.

2. Even though the timer is modified in #1, a weaker waketype does not
   end up changing rdp->nocb_gp_defer to be weaker. In other words,
   ->nocb_gp_defer records the strongest waketype requested so far,
   even though the timer may or may not be the soonest expiry possible.

For simplicity, rewrite this logic using switch statements and
consolidate some of the timer modification operations.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree_nocb.h | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 255f2945b0fc..67b0bd5d233a 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -282,6 +282,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 {
 	unsigned long flags;
 	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+	unsigned long mod_jif = 0;
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
@@ -289,19 +290,31 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 * Bypass wakeup overrides previous deferments. In case
 	 * of callback storm, no need to wake up too early.
 	 */
-	if (waketype == RCU_NOCB_WAKE_LAZY) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
-		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
-	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
-		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
-		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
-	} else {
-		if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
-			mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
-		if (rdp_gp->nocb_defer_wakeup < waketype)
-			WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+	switch (waketype) {
+		case RCU_NOCB_WAKE_LAZY:
+			mod_jif = jiffies_till_flush;
+			break;
+
+		case RCU_NOCB_WAKE_BYPASS:
+			mod_jif = 2;
+			break;
+
+		case RCU_NOCB_WAKE:
+		case RCU_NOCB_WAKE_FORCE:
+			// If the type of deferred wake is "stronger"
+			// than it was before, make it wake up the soonest.
+			if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
+				mod_jif = 1;
+			break;
 	}
 
+	if (mod_jif)
+		mod_timer(&rdp_gp->nocb_timer, jiffies + mod_jif);
+
+	// If new type of wake up is stronger than before, promote.
+	if (rdp_gp->nocb_defer_wakeup < waketype)
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+
 	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
-- 
2.37.0.rc0.104.g0611611a94-goog



* [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (7 preceding siblings ...)
  2022-06-22 22:51 ` [PATCH v2 7/8] rcu/nocb: Rewrite deferred wake up logic to be more clean Joel Fernandes (Google)
@ 2022-06-22 22:51 ` Joel Fernandes (Google)
  2022-06-26  4:17   ` Paul E. McKenney
  2022-06-27 18:56   ` Uladzislau Rezki
  2022-06-26  3:12 ` [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Paul E. McKenney
  9 siblings, 2 replies; 60+ messages in thread
From: Joel Fernandes (Google) @ 2022-06-22 22:51 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth, Joel Fernandes (Google)

As per the comments in include/linux/shrinker.h, the .count_objects callback
should return the number of freeable items, and if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 should be
returned is when we are unable to determine the number of objects, or
when the cache should be skipped for another reason.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 711679d10cbb..935788e8d2d7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 		atomic_set(&krcp->backoff_page_cache_fill, 1);
 	}
 
-	return count;
+	return count == 0 ? SHRINK_EMPTY : count;
 }
 
 static unsigned long
-- 
2.37.0.rc0.104.g0611611a94-goog



* Re: [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN
  2022-06-22 22:50 ` [PATCH v2 1/1] context_tracking: Use arch_atomic_read() in __ct_state for KASAN Joel Fernandes (Google)
@ 2022-06-22 22:58   ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2022-06-22 22:58 UTC (permalink / raw)
  To: rcu
  Cc: LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Paul E. McKenney,
	Steven Rostedt, vineeth, Marco Elver

Apologies, I accidentally picked this specific commit up in my recent
git format-patch which ended up sending it out. Totally unintended.

On Wed, Jun 22, 2022 at 6:51 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> Context tracking's __ct_state() function can be invoked from noinstr state
> where RCU is not watching.  This means that its use of atomic_read()
> causes KASAN to invoke the non-noinstr __kasan_check_read() function
> from the noinstr function __ct_state().  This is problematic because
> someone tracing the __kasan_check_read() function could get a nasty
> surprise because of RCU not watching.
>
> This commit therefore replaces the __ct_state() function's use of
> atomic_read() with arch_atomic_read(), which KASAN does not attempt to
> add instrumentation to.
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Marco Elver <elver@google.com>
> Reviewed-by: Marco Elver <elver@google.com>
> ---
>  include/linux/context_tracking_state.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
> index 0aecc07fb4f5..81c51e5f0314 100644
> --- a/include/linux/context_tracking_state.h
> +++ b/include/linux/context_tracking_state.h
> @@ -49,7 +49,7 @@ DECLARE_PER_CPU(struct context_tracking, context_tracking);
>
>  static __always_inline int __ct_state(void)
>  {
> -       return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
> +       return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
>  }
>  #endif
>
> --
> 2.37.0.rc0.104.g0611611a94-goog
>


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
@ 2022-06-22 23:18   ` Joel Fernandes
  2022-06-26  4:00     ` Paul E. McKenney
  2022-06-23  1:38   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-22 23:18 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10, frederic,
	paulmck, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
[..]
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 2ccf5845957d..fec4fad6654b 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -267,8 +267,9 @@ struct rcu_data {
>  /* Values for nocb_defer_wakeup field in struct rcu_data. */
>  #define RCU_NOCB_WAKE_NOT	0
>  #define RCU_NOCB_WAKE_BYPASS	1
> -#define RCU_NOCB_WAKE		2
> -#define RCU_NOCB_WAKE_FORCE	3
> +#define RCU_NOCB_WAKE_LAZY	2
> +#define RCU_NOCB_WAKE		3
> +#define RCU_NOCB_WAKE_FORCE	4
>  
>  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
>  					/* For jiffies_till_first_fqs and */
> @@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				  unsigned long j);
> +				  unsigned long j, bool lazy);
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				bool *was_alldone, unsigned long flags);
> +				bool *was_alldone, unsigned long flags,
> +				bool lazy);
>  static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
>  				 unsigned long flags);
>  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index e369efe94fda..b9244f22e102 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
>  	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
>  }
>  
> +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> +
>  /*
>   * Arrange to wake the GP kthread for this NOCB group at some future
>   * time when it is safe to do so.
> @@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>  	 * Bypass wakeup overrides previous deferments. In case
>  	 * of callback storm, no need to wake up too early.
>  	 */
> -	if (waketype == RCU_NOCB_WAKE_BYPASS) {
> +	if (waketype == RCU_NOCB_WAKE_LAZY) {
> +		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
> +		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> +	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
>  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
>  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>  	} else {
> @@ -296,7 +301,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>   * Note that this function always returns true if rhp is NULL.
>   */
>  static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				     unsigned long j)
> +				     unsigned long j, bool lazy)
>  {
>  	struct rcu_cblist rcl;
>  
> @@ -310,7 +315,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
>  	if (rhp)
>  		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
> -	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
> +
> +	trace_printk("call_rcu_lazy callbacks = %ld\n", READ_ONCE(rdp->nocb_bypass.lazy_len));

Before anyone yells at me, that trace_printk() has been politely asked to take
a walk :-). It got mad at me, but on the next iteration, it won't be there.

thanks,

 - Joel

 


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
  2022-06-22 23:18   ` Joel Fernandes
@ 2022-06-23  1:38   ` kernel test robot
  2022-06-26  4:00   ` Paul E. McKenney
  2022-06-29 11:53   ` Frederic Weisbecker
  3 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-06-23  1:38 UTC (permalink / raw)
  To: Joel Fernandes (Google), rcu
  Cc: kbuild-all, linux-kernel, rushikesh.s.kadam, urezki,
	neeraj.iitr10, frederic, paulmck, rostedt, vineeth,
	Joel Fernandes (Google)

Hi "Joel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.19-rc3 next-20220622]
[cannot apply to paulmck-rcu/dev]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git de5c208d533a46a074eb46ea17f672cc005a7269
config: riscv-nommu_k210_defconfig (https://download.01.org/0day-ci/archive/20220623/202206230916.2YtpR3sO-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/543ee31928d1cff057320ff64603283a34fe0052
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
        git checkout 543ee31928d1cff057320ff64603283a34fe0052
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash kernel/rcu/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/rcu/tree.c:3103: warning: Function parameter or member 'lazy' not described in '__call_rcu_common'
>> kernel/rcu/tree.c:3103: warning: expecting prototype for call_rcu(). Prototype was for __call_rcu_common() instead
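Both warnings stem from the kernel-doc block that used to document call_rcu() now sitting directly above the renamed helper. One way to quiet them, sketched here rather than taken from the series, is to give __call_rcu_common() a plain (non-kernel-doc) comment that also mentions the new @lazy argument, keeping the "/**" block attached to call_rcu() itself:

	/*
	 * Common code for call_rcu() and call_rcu_lazy(): @lazy marks the
	 * callback as a candidate for deferred (batched) invocation.  See
	 * call_rcu() for the memory-ordering guarantees.
	 */
	static void
	__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
	{
		/* ... body as in the patch ... */
	}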


vim +3103 kernel/rcu/tree.c

b2b00ddf193bf83 kernel/rcu/tree.c Paul E. McKenney        2019-10-30  3060  
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3061  /**
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3062   * call_rcu() - Queue an RCU callback for invocation after a grace period.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3063   * @head: structure to be used for queueing the RCU updates.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3064   * @func: actual callback function to be invoked after the grace period
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3065   *
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3066   * The callback function will be invoked some time after a full grace
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3067   * period elapses, in other words after all pre-existing RCU read-side
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3068   * critical sections have completed.  However, the callback function
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3069   * might well execute concurrently with RCU read-side critical sections
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3070   * that started after call_rcu() was invoked.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3071   *
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3072   * RCU read-side critical sections are delimited by rcu_read_lock()
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3073   * and rcu_read_unlock(), and may be nested.  In addition, but only in
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3074   * v5.0 and later, regions of code across which interrupts, preemption,
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3075   * or softirqs have been disabled also serve as RCU read-side critical
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3076   * sections.  This includes hardware interrupt handlers, softirq handlers,
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3077   * and NMI handlers.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3078   *
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3079   * Note that all CPUs must agree that the grace period extended beyond
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3080   * all pre-existing RCU read-side critical section.  On systems with more
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3081   * than one CPU, this means that when "func()" is invoked, each CPU is
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3082   * guaranteed to have executed a full memory barrier since the end of its
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3083   * last RCU read-side critical section whose beginning preceded the call
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3084   * to call_rcu().  It also means that each CPU executing an RCU read-side
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3085   * critical section that continues beyond the start of "func()" must have
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3086   * executed a memory barrier after the call_rcu() but before the beginning
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3087   * of that RCU read-side critical section.  Note that these guarantees
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3088   * include CPUs that are offline, idle, or executing in user mode, as
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3089   * well as CPUs that are executing in the kernel.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3090   *
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3091   * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3092   * resulting RCU callback function "func()", then both CPU A and CPU B are
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3093   * guaranteed to execute a full memory barrier during the time interval
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3094   * between the call to call_rcu() and the invocation of "func()" -- even
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3095   * if CPU A and CPU B are the same CPU (but again only if the system has
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3096   * more than one CPU).
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3097   *
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3098   * Implementation of these memory-ordering guarantees is described here:
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3099   * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3100   */
543ee31928d1cff kernel/rcu/tree.c Joel Fernandes (Google  2022-06-22  3101) static void
543ee31928d1cff kernel/rcu/tree.c Joel Fernandes (Google  2022-06-22  3102) __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18 @3103  {
b4b7914a6a73fc1 kernel/rcu/tree.c Paul E. McKenney        2020-12-08  3104  	static atomic_t doublefrees;
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3105  	unsigned long flags;
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3106  	struct rcu_data *rdp;
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3107  	bool was_alldone;
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3108  
b8f2ed538477d9a kernel/rcu/tree.c Paul E. McKenney        2016-08-23  3109  	/* Misaligned rcu_head! */
b8f2ed538477d9a kernel/rcu/tree.c Paul E. McKenney        2016-08-23  3110  	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
b8f2ed538477d9a kernel/rcu/tree.c Paul E. McKenney        2016-08-23  3111  
ae15018456c44b7 kernel/rcutree.c  Paul E. McKenney        2013-04-23  3112  	if (debug_rcu_head_queue(head)) {
fa3c66476975abf kernel/rcu/tree.c Paul E. McKenney        2017-05-03  3113  		/*
fa3c66476975abf kernel/rcu/tree.c Paul E. McKenney        2017-05-03  3114  		 * Probable double call_rcu(), so leak the callback.
fa3c66476975abf kernel/rcu/tree.c Paul E. McKenney        2017-05-03  3115  		 * Use rcu:rcu_callback trace event to find the previous
1fe09ebe7a9c990 kernel/rcu/tree.c Paul E. McKenney        2021-12-18  3116  		 * time callback was passed to call_rcu().
fa3c66476975abf kernel/rcu/tree.c Paul E. McKenney        2017-05-03  3117  		 */
b4b7914a6a73fc1 kernel/rcu/tree.c Paul E. McKenney        2020-12-08  3118  		if (atomic_inc_return(&doublefrees) < 4) {
b4b7914a6a73fc1 kernel/rcu/tree.c Paul E. McKenney        2020-12-08  3119  			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
b4b7914a6a73fc1 kernel/rcu/tree.c Paul E. McKenney        2020-12-08  3120  			mem_dump_obj(head);
b4b7914a6a73fc1 kernel/rcu/tree.c Paul E. McKenney        2020-12-08  3121  		}
7d0ae8086b82831 kernel/rcu/tree.c Paul E. McKenney        2015-03-03  3122  		WRITE_ONCE(head->func, rcu_leak_callback);
ae15018456c44b7 kernel/rcutree.c  Paul E. McKenney        2013-04-23  3123  		return;
ae15018456c44b7 kernel/rcutree.c  Paul E. McKenney        2013-04-23  3124  	}
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3125  	head->func = func;
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3126  	head->next = NULL;
300c0c5e721834f kernel/rcu/tree.c Jun Miao                2021-11-16  3127  	kasan_record_aux_stack_noalloc(head);
d818cc76e2b4d5f kernel/rcu/tree.c Zqiang                  2021-12-26  3128  	local_irq_save(flags);
da1df50d16171f4 kernel/rcu/tree.c Paul E. McKenney        2018-07-03  3129  	rdp = this_cpu_ptr(&rcu_data);
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3130  
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3131  	/* Add the callback to our list. */
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3132  	if (unlikely(!rcu_segcblist_is_enabled(&rdp->cblist))) {
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3133  		// This can trigger due to call_rcu() from offline CPU:
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3134  		WARN_ON_ONCE(rcu_scheduler_active != RCU_SCHEDULER_INACTIVE);
34404ca8fb252cc kernel/rcu/tree.c Paul E. McKenney        2015-01-19  3135  		WARN_ON_ONCE(!rcu_is_watching());
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3136  		// Very early boot, before rcu_init().  Initialize if needed
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3137  		// and then drop through to queue the callback.
15fecf89e46a962 kernel/rcu/tree.c Paul E. McKenney        2017-02-08  3138  		if (rcu_segcblist_empty(&rdp->cblist))
15fecf89e46a962 kernel/rcu/tree.c Paul E. McKenney        2017-02-08  3139  			rcu_segcblist_init(&rdp->cblist);
143da9c2fc030a5 kernel/rcu/tree.c Paul E. McKenney        2015-01-19  3140  	}
77a40f97030b27b kernel/rcu/tree.c Joel Fernandes (Google  2019-08-30  3141) 
b2b00ddf193bf83 kernel/rcu/tree.c Paul E. McKenney        2019-10-30  3142  	check_cb_ovld(rdp);
543ee31928d1cff kernel/rcu/tree.c Joel Fernandes (Google  2022-06-22  3143) 	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
d1b222c6be1f8bf kernel/rcu/tree.c Paul E. McKenney        2019-07-02  3144  		return; // Enqueued onto ->nocb_bypass, so just leave.
b692dc4adfcff54 kernel/rcu/tree.c Paul E. McKenney        2020-02-11  3145  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
77a40f97030b27b kernel/rcu/tree.c Joel Fernandes (Google  2019-08-30  3146) 	rcu_segcblist_enqueue(&rdp->cblist, head);
c408b215f58f715 kernel/rcu/tree.c Uladzislau Rezki (Sony  2020-05-25  3147) 	if (__is_kvfree_rcu_offset((unsigned long)func))
c408b215f58f715 kernel/rcu/tree.c Uladzislau Rezki (Sony  2020-05-25  3148) 		trace_rcu_kvfree_callback(rcu_state.name, head,
3c779dfef2c4524 kernel/rcu/tree.c Paul E. McKenney        2018-07-05  3149  					 (unsigned long)func,
15fecf89e46a962 kernel/rcu/tree.c Paul E. McKenney        2017-02-08  3150  					 rcu_segcblist_n_cbs(&rdp->cblist));
d4c08f2ac311a36 kernel/rcutree.c  Paul E. McKenney        2011-06-25  3151  	else
3c779dfef2c4524 kernel/rcu/tree.c Paul E. McKenney        2018-07-05  3152  		trace_rcu_callback(rcu_state.name, head,
15fecf89e46a962 kernel/rcu/tree.c Paul E. McKenney        2017-02-08  3153  				   rcu_segcblist_n_cbs(&rdp->cblist));
d4c08f2ac311a36 kernel/rcutree.c  Paul E. McKenney        2011-06-25  3154  
3afe7fa535491ec kernel/rcu/tree.c Joel Fernandes (Google  2020-11-14  3155) 	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
3afe7fa535491ec kernel/rcu/tree.c Joel Fernandes (Google  2020-11-14  3156) 
29154c57e35a191 kernel/rcutree.c  Paul E. McKenney        2012-05-30  3157  	/* Go handle any RCU core processing required. */
3820b513a2e33d6 kernel/rcu/tree.c Frederic Weisbecker     2020-11-12  3158  	if (unlikely(rcu_rdp_is_offloaded(rdp))) {
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3159  		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3160  	} else {
5c7d89676bc5196 kernel/rcu/tree.c Paul E. McKenney        2018-07-03  3161  		__call_rcu_core(rdp, head, flags);
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3162  		local_irq_restore(flags);
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3163  	}
5d6742b37727e11 kernel/rcu/tree.c Paul E. McKenney        2019-05-15  3164  }
64db4cfff99c04c kernel/rcutree.c  Paul E. McKenney        2008-12-18  3165  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
@ 2022-06-23  2:09   ` kernel test robot
  2022-06-23  3:00   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-06-23  2:09 UTC (permalink / raw)
  To: Joel Fernandes (Google), rcu
  Cc: llvm, kbuild-all, linux-kernel, rushikesh.s.kadam, urezki,
	neeraj.iitr10, frederic, paulmck, rostedt, vineeth,
	Joel Fernandes (Google)

Hi "Joel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19-rc3 next-20220622]
[cannot apply to paulmck-rcu/dev]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git de5c208d533a46a074eb46ea17f672cc005a7269
config: hexagon-randconfig-r045-20220622 (https://download.01.org/0day-ci/archive/20220623/202206230936.goRWmVwu-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 46be5faaf03466c3751f8a2882bef5a217e15926)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/6c59cb940f39b882c20e6858c41df7c1470b930a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
        git checkout 6c59cb940f39b882c20e6858c41df7c1470b930a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/rcu/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> kernel/rcu/rcuscale.c:663:6: warning: no previous prototype for function 'kfree_rcu_lazy' [-Wmissing-prototypes]
   void kfree_rcu_lazy(struct rcu_head *rh)
        ^
   kernel/rcu/rcuscale.c:663:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void kfree_rcu_lazy(struct rcu_head *rh)
   ^
   static 
>> kernel/rcu/rcuscale.c:789:6: warning: no previous prototype for function 'call_rcu_lazy_test1' [-Wmissing-prototypes]
   void call_rcu_lazy_test1(struct rcu_head *rh)
        ^
   kernel/rcu/rcuscale.c:789:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void call_rcu_lazy_test1(struct rcu_head *rh)
   ^
   static 
>> kernel/rcu/rcuscale.c:810:14: error: call to undeclared function 'rcu_scale_get_jiffies_till_flush'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   orig_jif = rcu_scale_get_jiffies_till_flush();
                              ^
>> kernel/rcu/rcuscale.c:813:3: error: call to undeclared function 'rcu_scale_set_jiffies_till_flush'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                   rcu_scale_set_jiffies_till_flush(2 * HZ);
                   ^
   2 warnings and 2 errors generated.
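The two halves of this report call for two separate fixes; a sketch of both follows (which header should carry the declarations, and whether no-op fallbacks are wanted at all, are assumptions here, not something taken from the series):

	/* rcuscale.c: the test-only helpers need not be visible elsewhere. */
	static void kfree_rcu_lazy(struct rcu_head *rh)
	{
		struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);

		kfree(obj);
	}

	/* rcu.h (for example): declarations plus stubs for kernels without the feature. */
	#ifdef CONFIG_RCU_LAZY
	unsigned long rcu_scale_get_jiffies_till_flush(void);
	void rcu_scale_set_jiffies_till_flush(unsigned long jif);
	#else
	static inline unsigned long rcu_scale_get_jiffies_till_flush(void) { return 0; }
	static inline void rcu_scale_set_jiffies_till_flush(unsigned long jif) { }
	#endif

call_rcu_lazy_test1() from the second warning would get the same "static" treatment.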


vim +/rcu_scale_get_jiffies_till_flush +810 kernel/rcu/rcuscale.c

   661	
   662	/* Used if doing RCU-kfree'ing via call_rcu_lazy(). */
 > 663	void kfree_rcu_lazy(struct rcu_head *rh)
   664	{
   665		struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);
   666		kfree(obj);
   667	}
   668	
   669	static int
   670	kfree_scale_thread(void *arg)
   671	{
   672		int i, loop = 0;
   673		long me = (long)arg;
   674		struct kfree_obj *alloc_ptr;
   675		u64 start_time, end_time;
   676		long long mem_begin, mem_during = 0;
   677		bool kfree_rcu_test_both;
   678		DEFINE_TORTURE_RANDOM(tr);
   679	
   680		VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started");
   681		set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
   682		set_user_nice(current, MAX_NICE);
   683		kfree_rcu_test_both = (kfree_rcu_test_single == kfree_rcu_test_double);
   684	
   685		start_time = ktime_get_mono_fast_ns();
   686	
   687		if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) {
   688			if (gp_exp)
   689				b_rcu_gp_test_started = cur_ops->exp_completed() / 2;
   690			else
   691				b_rcu_gp_test_started = cur_ops->get_gp_seq();
   692		}
   693	
   694		do {
   695			if (!mem_during) {
   696				mem_during = mem_begin = si_mem_available();
   697			} else if (loop % (kfree_loops / 4) == 0) {
   698				mem_during = (mem_during + si_mem_available()) / 2;
   699			}
   700	
   701			for (i = 0; i < kfree_alloc_num; i++) {
   702				alloc_ptr = kmalloc(kfree_mult * sizeof(struct kfree_obj), GFP_KERNEL);
   703				if (!alloc_ptr)
   704					return -ENOMEM;
   705	
   706				if (kfree_rcu_by_lazy) {
   707					call_rcu_lazy(&(alloc_ptr->rh), kfree_rcu_lazy);
   708					continue;
   709				}
   710	
   711				// By default kfree_rcu_test_single and kfree_rcu_test_double are
   712				// initialized to false. If both have the same value (false or true)
   713				// both are randomly tested, otherwise only the one with value true
   714				// is tested.
   715				if ((kfree_rcu_test_single && !kfree_rcu_test_double) ||
   716						(kfree_rcu_test_both && torture_random(&tr) & 0x800))
   717					kfree_rcu(alloc_ptr);
   718				else
   719					kfree_rcu(alloc_ptr, rh);
   720			}
   721	
   722			cond_resched();
   723		} while (!torture_must_stop() && ++loop < kfree_loops);
   724	
   725		if (atomic_inc_return(&n_kfree_scale_thread_ended) >= kfree_nrealthreads) {
   726			end_time = ktime_get_mono_fast_ns();
   727	
   728			if (gp_exp)
   729				b_rcu_gp_test_finished = cur_ops->exp_completed() / 2;
   730			else
   731				b_rcu_gp_test_finished = cur_ops->get_gp_seq();
   732	
   733			pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n",
   734			       (unsigned long long)(end_time - start_time), kfree_loops,
   735			       rcuscale_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started),
   736			       (mem_begin - mem_during) >> (20 - PAGE_SHIFT));
   737	
   738			if (shutdown) {
   739				smp_mb(); /* Assign before wake. */
   740				wake_up(&shutdown_wq);
   741			}
   742		}
   743	
   744		torture_kthread_stopping("kfree_scale_thread");
   745		return 0;
   746	}
   747	
   748	static void
   749	kfree_scale_cleanup(void)
   750	{
   751		int i;
   752	
   753		if (kfree_rcu_by_lazy)
   754			rcu_force_call_rcu_to_lazy(false);
   755	
   756		if (torture_cleanup_begin())
   757			return;
   758	
   759		if (kfree_reader_tasks) {
   760			for (i = 0; i < kfree_nrealthreads; i++)
   761				torture_stop_kthread(kfree_scale_thread,
   762						     kfree_reader_tasks[i]);
   763			kfree(kfree_reader_tasks);
   764		}
   765	
   766		torture_cleanup_end();
   767	}
   768	
   769	/*
   770	 * shutdown kthread.  Just waits to be awakened, then shuts down system.
   771	 */
   772	static int
   773	kfree_scale_shutdown(void *arg)
   774	{
   775		wait_event(shutdown_wq,
   776			   atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
   777	
   778		smp_mb(); /* Wake before output. */
   779	
   780		kfree_scale_cleanup();
   781		kernel_power_off();
   782		return -EINVAL;
   783	}
   784	
   785	// Used if doing RCU-kfree'ing via call_rcu_lazy().
   786	unsigned long jiffies_at_lazy_cb;
   787	struct rcu_head lazy_test1_rh;
   788	int rcu_lazy_test1_cb_called;
 > 789	void call_rcu_lazy_test1(struct rcu_head *rh)
   790	{
   791		jiffies_at_lazy_cb = jiffies;
   792		WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
   793	}
   794	
   795	static int __init
   796	kfree_scale_init(void)
   797	{
   798		long i;
   799		int firsterr = 0;
   800		unsigned long orig_jif, jif_start;
   801	
   802		// Force all call_rcu() to call_rcu_lazy() so that non-lazy CBs
   803		// do not remove laziness of the lazy ones (since the test tries
   804		// to stress call_rcu_lazy() for OOM).
   805		//
   806		// Also, do a quick self-test to ensure laziness is as much as
   807		// expected.
   808		if (kfree_rcu_by_lazy) {
   809			/* do a test to check the timeout. */
 > 810			orig_jif = rcu_scale_get_jiffies_till_flush();
   811	
   812			rcu_force_call_rcu_to_lazy(true);
 > 813			rcu_scale_set_jiffies_till_flush(2 * HZ);
   814			rcu_barrier();
   815	
   816			jif_start = jiffies;
   817			jiffies_at_lazy_cb = 0;
   818			call_rcu_lazy(&lazy_test1_rh, call_rcu_lazy_test1);
   819	
   820			smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
   821	
   822			rcu_scale_set_jiffies_till_flush(orig_jif);
   823	
   824			if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
   825				pr_alert("Lazy CBs are not being lazy as expected!\n");
   826				return -1;
   827			}
   828	
   829			if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
   830				pr_alert("Lazy CBs are being too lazy!\n");
   831				return -1;
   832			}
   833		}
   834	
   835		kfree_nrealthreads = compute_real(kfree_nthreads);
   836		/* Start up the kthreads. */
   837		if (shutdown) {
   838			init_waitqueue_head(&shutdown_wq);
   839			firsterr = torture_create_kthread(kfree_scale_shutdown, NULL,
   840							  shutdown_task);
   841			if (torture_init_error(firsterr))
   842				goto unwind;
   843			schedule_timeout_uninterruptible(1);
   844		}
   845	
   846		pr_alert("kfree object size=%zu, kfree_rcu_by_lazy=%d\n",
   847				kfree_mult * sizeof(struct kfree_obj),
   848				kfree_rcu_by_lazy);
   849	
   850		kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
   851				       GFP_KERNEL);
   852		if (kfree_reader_tasks == NULL) {
   853			firsterr = -ENOMEM;
   854			goto unwind;
   855		}
   856	
   857		for (i = 0; i < kfree_nrealthreads; i++) {
   858			firsterr = torture_create_kthread(kfree_scale_thread, (void *)i,
   859							  kfree_reader_tasks[i]);
   860			if (torture_init_error(firsterr))
   861				goto unwind;
   862		}
   863	
   864		while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads)
   865			schedule_timeout_uninterruptible(1);
   866	
   867		torture_init_end();
   868		return 0;
   869	
   870	unwind:
   871		torture_init_end();
   872		kfree_scale_cleanup();
   873		return firsterr;
   874	}
   875	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
  2022-06-23  2:09   ` kernel test robot
@ 2022-06-23  3:00   ` kernel test robot
  2022-06-23  8:10   ` kernel test robot
  2022-06-26  4:13   ` Paul E. McKenney
  3 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-06-23  3:00 UTC (permalink / raw)
  To: Joel Fernandes (Google), rcu
  Cc: llvm, kbuild-all, linux-kernel, rushikesh.s.kadam, urezki,
	neeraj.iitr10, frederic, paulmck, rostedt, vineeth,
	Joel Fernandes (Google)

Hi "Joel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.19-rc3 next-20220622]
[cannot apply to paulmck-rcu/dev]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git de5c208d533a46a074eb46ea17f672cc005a7269
config: s390-randconfig-r044-20220622 (https://download.01.org/0day-ci/archive/20220623/202206231028.gVHd7wR5-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 46be5faaf03466c3751f8a2882bef5a217e15926)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/6c59cb940f39b882c20e6858c41df7c1470b930a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
        git checkout 6c59cb940f39b882c20e6858c41df7c1470b930a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: kernel/rcu/rcuscale.o: in function `kfree_scale_init':
>> kernel/rcu/rcuscale.c:810: undefined reference to `rcu_scale_get_jiffies_till_flush'
>> /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: kernel/rcu/rcuscale.c:813: undefined reference to `rcu_scale_set_jiffies_till_flush'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: kernel/rcu/rcuscale.c:822: undefined reference to `rcu_scale_set_jiffies_till_flush'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: certs/system_keyring.o: in function `load_system_certificate_list':
   certs/system_keyring.c:207: undefined reference to `x509_load_certificate_list'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/dma/fsl-edma.o: in function `fsl_edma_probe':
   drivers/dma/fsl-edma.c:302: undefined reference to `devm_ioremap_resource'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/dma/fsl-edma.c:327: undefined reference to `devm_ioremap_resource'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/dma/idma64.o: in function `idma64_platform_probe':
   drivers/dma/idma64.c:644: undefined reference to `devm_ioremap_resource'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/clocksource/timer-of.o: in function `timer_of_init':
   drivers/clocksource/timer-of.c:151: undefined reference to `iounmap'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/clocksource/timer-of.o: in function `timer_of_base_init':
   drivers/clocksource/timer-of.c:159: undefined reference to `of_iomap'
   /opt/cross/gcc-11.3.0-nolibc/s390x-linux/bin/s390x-linux-ld: drivers/clocksource/timer-of.o: in function `timer_of_cleanup':
   drivers/clocksource/timer-of.c:151: undefined reference to `iounmap'


vim +810 kernel/rcu/rcuscale.c

   794	
   795	static int __init
   796	kfree_scale_init(void)
   797	{
   798		long i;
   799		int firsterr = 0;
   800		unsigned long orig_jif, jif_start;
   801	
   802		// Force all call_rcu() to call_rcu_lazy() so that non-lazy CBs
   803		// do not remove laziness of the lazy ones (since the test tries
   804		// to stress call_rcu_lazy() for OOM).
   805		//
   806		// Also, do a quick self-test to ensure laziness is as much as
   807		// expected.
   808		if (kfree_rcu_by_lazy) {
   809			/* do a test to check the timeout. */
 > 810			orig_jif = rcu_scale_get_jiffies_till_flush();
   811	
   812			rcu_force_call_rcu_to_lazy(true);
 > 813			rcu_scale_set_jiffies_till_flush(2 * HZ);
   814			rcu_barrier();
   815	
   816			jif_start = jiffies;
   817			jiffies_at_lazy_cb = 0;
   818			call_rcu_lazy(&lazy_test1_rh, call_rcu_lazy_test1);
   819	
   820			smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
   821	
   822			rcu_scale_set_jiffies_till_flush(orig_jif);
   823	
   824			if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
   825				pr_alert("Lazy CBs are not being lazy as expected!\n");
   826				return -1;
   827			}
   828	
   829			if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
   830				pr_alert("Lazy CBs are being too lazy!\n");
   831				return -1;
   832			}
   833		}
   834	
   835		kfree_nrealthreads = compute_real(kfree_nthreads);
   836		/* Start up the kthreads. */
   837		if (shutdown) {
   838			init_waitqueue_head(&shutdown_wq);
   839			firsterr = torture_create_kthread(kfree_scale_shutdown, NULL,
   840							  shutdown_task);
   841			if (torture_init_error(firsterr))
   842				goto unwind;
   843			schedule_timeout_uninterruptible(1);
   844		}
   845	
   846		pr_alert("kfree object size=%zu, kfree_rcu_by_lazy=%d\n",
   847				kfree_mult * sizeof(struct kfree_obj),
   848				kfree_rcu_by_lazy);
   849	
   850		kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
   851				       GFP_KERNEL);
   852		if (kfree_reader_tasks == NULL) {
   853			firsterr = -ENOMEM;
   854			goto unwind;
   855		}
   856	
   857		for (i = 0; i < kfree_nrealthreads; i++) {
   858			firsterr = torture_create_kthread(kfree_scale_thread, (void *)i,
   859							  kfree_reader_tasks[i]);
   860			if (torture_init_error(firsterr))
   861				goto unwind;
   862		}
   863	
   864		while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads)
   865			schedule_timeout_uninterruptible(1);
   866	
   867		torture_init_end();
   868		return 0;
   869	
   870	unwind:
   871		torture_init_end();
   872		kfree_scale_cleanup();
   873		return firsterr;
   874	}
   875	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
  2022-06-23  2:09   ` kernel test robot
  2022-06-23  3:00   ` kernel test robot
@ 2022-06-23  8:10   ` kernel test robot
  2022-06-26  4:13   ` Paul E. McKenney
  3 siblings, 0 replies; 60+ messages in thread
From: kernel test robot @ 2022-06-23  8:10 UTC (permalink / raw)
  To: Joel Fernandes (Google), rcu
  Cc: kbuild-all, linux-kernel, rushikesh.s.kadam, urezki,
	neeraj.iitr10, frederic, paulmck, rostedt, vineeth,
	Joel Fernandes (Google)

Hi "Joel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.19-rc3 next-20220622]
[cannot apply to paulmck-rcu/dev]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git de5c208d533a46a074eb46ea17f672cc005a7269
config: x86_64-randconfig-s022 (https://download.01.org/0day-ci/archive/20220623/202206231529.kLjzriV0-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-31-g4880bd19-dirty
        # https://github.com/intel-lab-lkp/linux/commit/6c59cb940f39b882c20e6858c41df7c1470b930a
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Joel-Fernandes-Google/Implement-call_rcu_lazy-and-miscellaneous-fixes/20220623-065447
        git checkout 6c59cb940f39b882c20e6858c41df7c1470b930a
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/rcu/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> kernel/rcu/rcuscale.c:663:6: sparse: sparse: symbol 'kfree_rcu_lazy' was not declared. Should it be static?
>> kernel/rcu/rcuscale.c:786:15: sparse: sparse: symbol 'jiffies_at_lazy_cb' was not declared. Should it be static?
>> kernel/rcu/rcuscale.c:787:17: sparse: sparse: symbol 'lazy_test1_rh' was not declared. Should it be static?
>> kernel/rcu/rcuscale.c:788:5: sparse: sparse: symbol 'rcu_lazy_test1_cb_called' was not declared. Should it be static?
>> kernel/rcu/rcuscale.c:789:6: sparse: sparse: symbol 'call_rcu_lazy_test1' was not declared. Should it be static?
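All five complaints are about file-local test symbols; assuming none of them needs to be visible outside rcuscale.c, the fix is mechanical. A sketch for the last four (kfree_rcu_lazy() from the first warning gets the same treatment):

	/* Used if doing RCU-kfree'ing via call_rcu_lazy(). */
	static unsigned long jiffies_at_lazy_cb;
	static struct rcu_head lazy_test1_rh;
	static int rcu_lazy_test1_cb_called;
	static void call_rcu_lazy_test1(struct rcu_head *rh)
	{
		jiffies_at_lazy_cb = jiffies;
		WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
	}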

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
  2022-06-22 22:50 [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
                   ` (8 preceding siblings ...)
  2022-06-22 22:51 ` [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value Joel Fernandes (Google)
@ 2022-06-26  3:12 ` Paul E. McKenney
  2022-07-08  4:17   ` Joel Fernandes
  9 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  3:12 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:50:53PM +0000, Joel Fernandes (Google) wrote:
> 
> Hello!
> Please find the next improved version of call_rcu_lazy() attached.  The main
> difference between the previous version is that it is now using bypass lists,
> and thus handling rcu_barrier() and hotplug situations, with some small changes
> to those parts.
> 
> I also don't see the TREE07 RCU stall from v1 anymore.
> 
> In the v1, we some numbers below (testing on v2 is in progress). Rushikesh,
> feel free to pull these patches into your tree. Just to note, you will also
> need to pull the call_rcu_lazy() user patches from v1. I have dropped in this
> series, just to make the series focus on the feature code first.
> 
> Following are power savings we see on top of RCU_NOCB_CPU on an Intel platform.
> The observation is that due to a 'trickle down' effect of RCU callbacks, the
> system is very lightly loaded but constantly running few RCU callbacks very
> often. This confuses the power management hardware that the system is active,
> when it is in fact idle.
> 
> For example, when ChromeOS screen is off and user is not doing anything on the
> system, we can see big power savings.
> Before:
> Pk%pc10 = 72.13
> PkgWatt = 0.58
> CorWatt = 0.04
> 
> After:
> Pk%pc10 = 81.28
> PkgWatt = 0.41
> CorWatt = 0.03

So not quite 30% savings in power at the package level?  Not bad at all!
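For reference, taking the quoted PkgWatt numbers at face value, the screen-off package-power reduction works out to (0.58 - 0.41) / 0.58, or roughly 29%, so "not quite 30%" is indeed the right reading.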

> Further, when ChromeOS screen is ON but system is idle or lightly loaded, we
> can see that the display pipeline is constantly doing RCU callback queuing due
> to open/close of file descriptors associated with graphics buffers. This is
> attributed to the file_free_rcu() path which this patch series also touches.
> 
> This patch series adds a simple but effective, and lockless implementation of
> RCU callback batching. On memory pressure, timeout or queue growing too big, we
> initiate a flush of one or more per-CPU lists.

It is no longer lockless, correct?  Or am I missing something subtle?

Full disclosure: I don't see a whole lot of benefit to its being lockless.
But truth in advertising!  ;-)

> Similar results can be achieved by increasing jiffies_till_first_fqs, however
> that also has the effect of slowing down RCU. Especially I saw huge slow down
> of function graph tracer when increasing that.
> 
> One drawback of this series is, if another frequent RCU callback creeps up in
> the future, that's not lazy, then that will again hurt the power. However, I
> believe identifying and fixing those is a more reasonable approach than slowing
> RCU down for the whole system.

Very good!  I have you down as the official call_rcu_lazy() whack-a-mole
developer.  ;-)

							Thanx, Paul

> Disclaimer: I have intentionally not CC'd other subsystem maintainers (like
> net, fs) to keep noise low and will CC them in the future after 1 or 2 rounds
> of review and agreements.
> 
> Joel Fernandes (Google) (7):
>   rcu: Introduce call_rcu_lazy() API implementation
>   fs: Move call_rcu() to call_rcu_lazy() in some paths
>   rcu/nocb: Add option to force all call_rcu() to lazy
>   rcu/nocb: Wake up gp thread when flushing
>   rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
>   rcu/nocb: Rewrite deferred wake up logic to be more clean
>   rcu/kfree: Fix kfree_rcu_shrink_count() return value
> 
> Vineeth Pillai (1):
>   rcu: shrinker for lazy rcu
> 
>  fs/dcache.c                   |   4 +-
>  fs/eventpoll.c                |   2 +-
>  fs/file_table.c               |   2 +-
>  fs/inode.c                    |   2 +-
>  include/linux/rcu_segcblist.h |   1 +
>  include/linux/rcupdate.h      |   6 +
>  kernel/rcu/Kconfig            |   8 ++
>  kernel/rcu/rcu.h              |   8 ++
>  kernel/rcu/rcu_segcblist.c    |  19 +++
>  kernel/rcu/rcu_segcblist.h    |  24 ++++
>  kernel/rcu/rcuscale.c         |  64 +++++++++-
>  kernel/rcu/tree.c             |  35 +++++-
>  kernel/rcu/tree.h             |  10 +-
>  kernel/rcu/tree_nocb.h        | 217 +++++++++++++++++++++++++++-------
>  14 files changed, 345 insertions(+), 57 deletions(-)
> 
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
  2022-06-22 23:18   ` Joel Fernandes
  2022-06-23  1:38   ` kernel test robot
@ 2022-06-26  4:00   ` Paul E. McKenney
  2022-07-08 18:43     ` Joel Fernandes
  2022-07-10  2:26     ` Joel Fernandes
  2022-06-29 11:53   ` Frederic Weisbecker
  3 siblings, 2 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  4:00 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> Implement timer-based RCU lazy callback batching. The batch is flushed
> whenever a certain amount of time has passed, or the batch on a
> particular CPU grows too big. Also memory pressure will flush it in a
> future patch.
> 
> To handle several corner cases automagically (such as rcu_barrier() and
> hotplug), we re-use bypass lists to handle lazy CBs. The bypass list
> length has the lazy CB length included in it. A separate lazy CB length
> counter is also introduced to keep track of the number of lazy CBs.
> 
> Suggested-by: Paul McKenney <paulmck@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Not bad, but some questions and comments below.

							Thanx, Paul

> ---
>  include/linux/rcu_segcblist.h |   1 +
>  include/linux/rcupdate.h      |   6 ++
>  kernel/rcu/Kconfig            |   8 +++
>  kernel/rcu/rcu_segcblist.c    |  19 ++++++
>  kernel/rcu/rcu_segcblist.h    |  14 ++++
>  kernel/rcu/tree.c             |  24 +++++--
>  kernel/rcu/tree.h             |  10 +--
>  kernel/rcu/tree_nocb.h        | 125 +++++++++++++++++++++++++---------
>  8 files changed, 164 insertions(+), 43 deletions(-)
> 
> diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> index 659d13a7ddaa..9a992707917b 100644
> --- a/include/linux/rcu_segcblist.h
> +++ b/include/linux/rcu_segcblist.h
> @@ -22,6 +22,7 @@ struct rcu_cblist {
>  	struct rcu_head *head;
>  	struct rcu_head **tail;
>  	long len;
> +	long lazy_len;
>  };
>  
>  #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 1a32036c918c..9191a3d88087 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
>  
>  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>  
> +#ifdef CONFIG_RCU_LAZY
> +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
> +#else
> +#define call_rcu_lazy(head, func) call_rcu(head, func)
> +#endif
> +
>  /* Internal to kernel */
>  void rcu_init(void);
>  extern int rcu_scheduler_active;
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 27aab870ae4c..0bffa992fdc4 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -293,4 +293,12 @@ config TASKS_TRACE_RCU_READ_MB
>  	  Say N here if you hate read-side memory barriers.
>  	  Take the default if you are unsure.
>  
> +config RCU_LAZY
> +	bool "RCU callback lazy invocation functionality"
> +	depends on RCU_NOCB_CPU
> +	default n
> +	help
> +	  To save power, batch RCU callbacks and flush after delay, memory
> +          pressure or callback list growing too big.

Spaces vs. tabs.

The checkpatch warning is unhelpful ("please write a help paragraph that
fully describes the config symbol"), but please fix the whitespace if
you have not already done so.

>  endmenu # "RCU Subsystem"
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index c54ea2b6a36b..627a3218a372 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -20,6 +20,7 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
>  	rclp->head = NULL;
>  	rclp->tail = &rclp->head;
>  	rclp->len = 0;
> +	rclp->lazy_len = 0;
>  }
>  
>  /*
> @@ -32,6 +33,15 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
>  	WRITE_ONCE(rclp->len, rclp->len + 1);
>  }
>  
> +/*
> + * Enqueue an rcu_head structure onto the specified callback list.

Please also note the fact that it is enqueuing lazily.

> + */
> +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp)
> +{
> +	rcu_cblist_enqueue(rclp, rhp);
> +	WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);

Except...  Why not just add a "lazy" parameter to rcu_cblist_enqueue()?
IS_ENABLED() can make it fast.
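A sketch of that shape (the body is meant to mirror the existing rcu_cblist_enqueue() with the lazy accounting folded in; the parameter name is an assumption, and all callers would need the extra argument):

	void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp, bool lazy)
	{
		*rclp->tail = rhp;
		rclp->tail = &rhp->next;
		WRITE_ONCE(rclp->len, rclp->len + 1);
		if (IS_ENABLED(CONFIG_RCU_LAZY) && lazy)
			WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
	}

With IS_ENABLED(), the lazy_len update is compiled out on !RCU_LAZY kernels, so the separate _lazy() variant would no longer be needed.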

> +}
> +
>  /*
>   * Flush the second rcu_cblist structure onto the first one, obliterating
>   * any contents of the first.  If rhp is non-NULL, enqueue it as the sole
> @@ -60,6 +70,15 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
>  	}
>  }

Header comment, please.  It can be short, referring to that of the
function rcu_cblist_flush_enqueue().

> +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> +			      struct rcu_cblist *srclp,
> +			      struct rcu_head *rhp)

Please line up the "struct" keywords.  (Picky, I know...)

> +{
> +	rcu_cblist_flush_enqueue(drclp, srclp, rhp);
> +	if (rhp)
> +		WRITE_ONCE(srclp->lazy_len, 1);

Shouldn't this instead be a lazy argument to rcu_cblist_flush_enqueue()?
Concerns about speed in the !RCU_LAZY case can be addressed using
IS_ENABLED(), for example:

	if (IS_ENABLED(CONFIG_RCU_LAZY) && rhp)
		WRITE_ONCE(srclp->lazy_len, 1);

> +}
> +
>  /*
>   * Dequeue the oldest rcu_head structure from the specified callback
>   * list.
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 431cee212467..c3d7de65b689 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -15,14 +15,28 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
>  	return READ_ONCE(rclp->len);
>  }
>  
> +/* Return number of callbacks in the specified callback list. */
> +static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
> +{
> +#ifdef CONFIG_RCU_LAZY
> +	return READ_ONCE(rclp->lazy_len);
> +#else
> +	return 0;
> +#endif

Please use IS_ENABLED().  This saves a line (and lots of characters)
but compiles just as efficiently.
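Roughly, the IS_ENABLED() form would be (a sketch; it relies on ->lazy_len existing in both configurations, which the include/linux/rcu_segcblist.h hunk earlier in the patch already arranges):

	/* Return number of lazy callbacks in the specified callback list. */
	static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
	{
		if (IS_ENABLED(CONFIG_RCU_LAZY))
			return READ_ONCE(rclp->lazy_len);
		return 0;
	}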

> +}
> +
>  /* Return number of callbacks in segmented callback list by summing seglen. */
>  long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
>  
>  void rcu_cblist_init(struct rcu_cblist *rclp);
>  void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
> +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp);
>  void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
>  			      struct rcu_cblist *srclp,
>  			      struct rcu_head *rhp);
> +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> +			      struct rcu_cblist *srclp,
> +			      struct rcu_head *rhp);

Please line up the "struct" keywords.  (Still picky, I know...)

> +{
> +	rcu_cblist_flush_enqueue(drclp, srclp, rhp);
> +	if (rhp)
> +		WRITE_ONCE(srclp->lazy_len, 1);
> +}
> +
>  /*
>   * Dequeue the oldest rcu_head structure from the specified callback
>   * list.
> diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> index 431cee212467..c3d7de65b689 100644
> --- a/kernel/rcu/rcu_segcblist.h
> +++ b/kernel/rcu/rcu_segcblist.h
> @@ -15,14 +15,28 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
>  	return READ_ONCE(rclp->len);
>  }
>  
> +/* Return number of callbacks in the specified callback list. */
> +static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
> +{
> +#ifdef CONFIG_RCU_LAZY
> +	return READ_ONCE(rclp->lazy_len);
> +#else
> +	return 0;
> +#endif

Please use IS_ENABLED().  This saves a line (and lots of characters)
but compiles just as efficiently.

> +}
> +
>  /* Return number of callbacks in segmented callback list by summing seglen. */
>  long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
>  
>  void rcu_cblist_init(struct rcu_cblist *rclp);
>  void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
> +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp);
>  void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
>  			      struct rcu_cblist *srclp,
>  			      struct rcu_head *rhp);
> +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> +			      struct rcu_cblist *srclp,
> +			      struct rcu_head *rhp);

Please line up the "struct" keywords.  (Even more pickiness, I know...)

>  struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp);
>  
>  /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c25ba442044a..d2e3d6e176d2 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3098,7 +3098,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
>   * Implementation of these memory-ordering guarantees is described here:
>   * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
>   */

The above docbook comment needs to move to call_rcu().

> -void call_rcu(struct rcu_head *head, rcu_callback_t func)
> +static void
> +__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
>  {
>  	static atomic_t doublefrees;
>  	unsigned long flags;
> @@ -3139,7 +3140,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	}
>  
>  	check_cb_ovld(rdp);
> -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
>  		return; // Enqueued onto ->nocb_bypass, so just leave.
>  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
>  	rcu_segcblist_enqueue(&rdp->cblist, head);
> @@ -3161,8 +3162,21 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  		local_irq_restore(flags);
>  	}
>  }
> -EXPORT_SYMBOL_GPL(call_rcu);

Please add a docbook comment for call_rcu_lazy().  It can be brief, for
example, by referring to call_rcu()'s docbook comment for memory-ordering
details.
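A brief kernel-doc along those lines might read (the wording here is only a sketch, not the author's):

	/**
	 * call_rcu_lazy() - Queue an RCU callback for lazy invocation after a grace period.
	 * @head: structure to be used for queueing the RCU updates.
	 * @func: actual callback function to be invoked after the grace period
	 *
	 * Like call_rcu(), except that the callback may be batched and its
	 * invocation deferred until a timeout elapses, memory pressure occurs,
	 * or the per-CPU list grows too long.  The memory-ordering guarantees
	 * documented for call_rcu() apply here as well.
	 */
	void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
	{
		return __call_rcu_common(head, func, true);
	}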

> +#ifdef CONFIG_RCU_LAZY
> +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
> +{
> +	return __call_rcu_common(head, func, true);
> +}
> +EXPORT_SYMBOL_GPL(call_rcu_lazy);
> +#endif
> +
> +void call_rcu(struct rcu_head *head, rcu_callback_t func)
> +{
> +	return __call_rcu_common(head, func, false);
> +
> +}
> +EXPORT_SYMBOL_GPL(call_rcu);
>  
>  /* Maximum number of jiffies to wait before draining a batch. */
>  #define KFREE_DRAIN_JIFFIES (HZ / 50)
> @@ -4056,7 +4070,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
>  	rdp->barrier_head.func = rcu_barrier_callback;
>  	debug_rcu_head_queue(&rdp->barrier_head);
>  	rcu_nocb_lock(rdp);
> -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
>  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>  		atomic_inc(&rcu_state.barrier_cpu_count);
>  	} else {
> @@ -4476,7 +4490,7 @@ void rcutree_migrate_callbacks(int cpu)
>  	my_rdp = this_cpu_ptr(&rcu_data);
>  	my_rnp = my_rdp->mynode;
>  	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
> -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
> +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
>  	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
>  	/* Leverage recent GPs and set GP for new callbacks. */
>  	needwake = rcu_advance_cbs(my_rnp, rdp) ||
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 2ccf5845957d..fec4fad6654b 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -267,8 +267,9 @@ struct rcu_data {
>  /* Values for nocb_defer_wakeup field in struct rcu_data. */
>  #define RCU_NOCB_WAKE_NOT	0
>  #define RCU_NOCB_WAKE_BYPASS	1
> -#define RCU_NOCB_WAKE		2
> -#define RCU_NOCB_WAKE_FORCE	3
> +#define RCU_NOCB_WAKE_LAZY	2
> +#define RCU_NOCB_WAKE		3
> +#define RCU_NOCB_WAKE_FORCE	4
>  
>  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
>  					/* For jiffies_till_first_fqs and */
> @@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				  unsigned long j);
> +				  unsigned long j, bool lazy);
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				bool *was_alldone, unsigned long flags);
> +				bool *was_alldone, unsigned long flags,
> +				bool lazy);
>  static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
>  				 unsigned long flags);
>  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index e369efe94fda..b9244f22e102 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
>  	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
>  }

Comment on LAZY_FLUSH_JIFFIES purpose in life, please!  (At some point
more flexibility may be required, but let's not unnecessarily rush
into that.)
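Something along these lines might do (a sketch of a possible comment, not the author's wording):

	/*
	 * LAZY_FLUSH_JIFFIES is the maximum time that lazy callbacks may sit
	 * in the bypass list before a flush is forced and the grace-period
	 * kthread is woken up.
	 */
	#define LAZY_FLUSH_JIFFIES (10 * HZ)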

> +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> +
>  /*
>   * Arrange to wake the GP kthread for this NOCB group at some future
>   * time when it is safe to do so.
> @@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>  	 * Bypass wakeup overrides previous deferments. In case
>  	 * of callback storm, no need to wake up too early.
>  	 */
> -	if (waketype == RCU_NOCB_WAKE_BYPASS) {
> +	if (waketype == RCU_NOCB_WAKE_LAZY) {

Presumably we get here only if all of this CPU's callbacks are lazy?

> +		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
> +		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> +	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
>  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
>  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>  	} else {
> @@ -296,7 +301,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>   * Note that this function always returns true if rhp is NULL.
>   */
>  static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				     unsigned long j)
> +				     unsigned long j, bool lazy)
>  {
>  	struct rcu_cblist rcl;
>  
> @@ -310,7 +315,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
>  	if (rhp)
>  		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
> -	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
> +
> +	trace_printk("call_rcu_lazy callbacks = %ld\n", READ_ONCE(rdp->nocb_bypass.lazy_len));

This debug code needs to go, of course.  If you would like, you could
replace it with a trace event.

> +	/* The lazy CBs are being flushed, but a new one might be enqueued. */
> +	if (lazy)
> +		rcu_cblist_flush_enqueue_lazy(&rcl, &rdp->nocb_bypass, rhp);
> +	else
> +		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);

Shouldn't these be a single function with a "lazy" argument, as noted
earlier?
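
For concreteness, a single helper taking a "lazy" argument might look roughly
like the sketch below.  This is hypothetical: the body mirrors what the
current flush-and-enqueue helper does (move the source list to the
destination, then restart the source with the new callback, if any), and the
->lazy_len update is illustrative only, since that field is introduced by
this series:

	void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
				      struct rcu_cblist *srclp,
				      struct rcu_head *rhp, bool lazy)
	{
		/* Move everything from srclp over to drclp. */
		drclp->head = srclp->head;
		if (drclp->head)
			drclp->tail = srclp->tail;
		else
			drclp->tail = &drclp->head;
		drclp->len = srclp->len;
		if (!rhp) {
			rcu_cblist_init(srclp);
		} else {
			/* Restart srclp with just the newly enqueued CB. */
			rhp->next = NULL;
			srclp->head = rhp;
			srclp->tail = &rhp->next;
			WRITE_ONCE(srclp->len, 1);
			/* Only a lazy CB counts toward the lazy length. */
			WRITE_ONCE(srclp->lazy_len, lazy ? 1 : 0);
		}
	}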

>  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
>  	WRITE_ONCE(rdp->nocb_bypass_first, j);
>  	rcu_nocb_bypass_unlock(rdp);
> @@ -326,13 +337,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>   * Note that this function always returns true if rhp is NULL.
>   */
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				  unsigned long j)
> +				  unsigned long j, bool lazy)
>  {
>  	if (!rcu_rdp_is_offloaded(rdp))
>  		return true;
>  	rcu_lockdep_assert_cblist_protected(rdp);
>  	rcu_nocb_bypass_lock(rdp);
> -	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
> +	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
>  }
>  
>  /*
> @@ -345,7 +356,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
>  	if (!rcu_rdp_is_offloaded(rdp) ||
>  	    !rcu_nocb_bypass_trylock(rdp))
>  		return;
> -	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
> +	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
>  }
>  
>  /*
> @@ -367,12 +378,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
>   * there is only one CPU in operation.
>   */
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				bool *was_alldone, unsigned long flags)
> +				bool *was_alldone, unsigned long flags,
> +				bool lazy)
>  {
>  	unsigned long c;
>  	unsigned long cur_gp_seq;
>  	unsigned long j = jiffies;
>  	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> +	long n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>  
>  	lockdep_assert_irqs_disabled();
>  
> @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	}
>  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
>  
> -	// If there hasn't yet been all that many ->cblist enqueues
> -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> -	// ->nocb_bypass first.
> -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> +	// If caller passed a non-lazy CB and there hasn't yet been all that
> +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> +	// number of CBs (lazy + non-lazy) grows too much.
> +	//
> +	// Note that if the bypass list has lazy CBs, and the main list is
> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> +	// the lazy CBs to the main list as well. That's the right thing to do,
> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> +	// one, we can just reuse that GP for the already queued-up lazy ones.
> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> +	    (lazy && n_lazy_cbs >= qhimark)) {
>  		rcu_nocb_lock(rdp);
>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  		if (*was_alldone)
>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> -					    TPS("FirstQ"));
> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));

The "false" here instead of "lazy" is because the caller is to do the
enqueuing, correct?

>  		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
>  		return false; // Caller must enqueue the callback.
>  	}
>  
>  	// If ->nocb_bypass has been used too long or is too full,
>  	// flush ->nocb_bypass to ->cblist.
> -	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
> -	    ncbs >= qhimark) {
> +	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
>  		rcu_nocb_lock(rdp);
> -		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
> +		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {

But shouldn't this "true" be "lazy"?  I don't see how we are guaranteed
that the callback is in fact lazy at this point in the code.  Also,
there is not yet a guarantee that the caller will do the enqueuing.
So what am I missing?

>  			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  			if (*was_alldone)
>  				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> -						    TPS("FirstQ"));
> +						    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
>  			WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
>  			return false; // Caller must enqueue the callback.
>  		}
> @@ -455,12 +475,20 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	rcu_nocb_wait_contended(rdp);
>  	rcu_nocb_bypass_lock(rdp);
>  	ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> +	n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>  	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
> -	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
> +	if (lazy)
> +		rcu_cblist_enqueue_lazy(&rdp->nocb_bypass, rhp);
> +	else
> +		rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);

And this is one reason to add the "lazy" parameter to rcu_cblist_enqueue().

>  	if (!ncbs) {
>  		WRITE_ONCE(rdp->nocb_bypass_first, j);
> -		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstBQ"));
> +		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> +				    lazy ? TPS("FirstLazyBQ") : TPS("FirstBQ"));
> +	} else if (!n_lazy_cbs && lazy) {
> +		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstLazyBQ"));

But in this case, there already are callbacks.  In that case, how does it
help to keep track of the laziness of this callback?

In fact, aren't the only meaningful states "empty", "all lazy", and
"at least one non-lazy callback"?  After all, if you have at least one
non-lazy callback, it needs to be business as usual, correct?

>  	}
> +
>  	rcu_nocb_bypass_unlock(rdp);
>  	smp_mb(); /* Order enqueue before wake. */
>  	if (ncbs) {
> @@ -493,7 +521,7 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  {
>  	unsigned long cur_gp_seq;
>  	unsigned long j;
> -	long len;
> +	long len, lazy_len, bypass_len;
>  	struct task_struct *t;
>  
>  	// If we are being polled or there is no kthread, just leave.
> @@ -506,9 +534,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
>  	}
>  	// Need to actually to a wakeup.
>  	len = rcu_segcblist_n_cbs(&rdp->cblist);
> +	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> +	lazy_len = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>  	if (was_alldone) {
>  		rdp->qlen_last_fqs_check = len;
> -		if (!irqs_disabled_flags(flags)) {
> +		// Only lazy CBs in bypass list
> +		if (lazy_len && bypass_len == lazy_len) {

And this is one piece of evidence -- you are only checking for all
callbacks being lazy.  So do you really need to count them, as
opposed to noting that all are lazy on the one hand or some are
non-lazy on the other?
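
For illustration only, with a hypothetical "all_lazy" flag on the bypass list
(cleared by any non-lazy enqueue), the check above would reduce to something
like:

	if (was_alldone) {
		rdp->qlen_last_fqs_check = len;
		// Everything queued in the bypass list is lazy, so the
		// wakeup can be deferred for the full lazy timeout.
		if (READ_ONCE(rdp->nocb_bypass.all_lazy)) {
			rcu_nocb_unlock_irqrestore(rdp, flags);
			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
					   TPS("WakeLazy"));
		} else if (!irqs_disabled_flags(flags)) {
			/* ... unchanged from the hunk quoted above ... */
		}
	}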

> +			rcu_nocb_unlock_irqrestore(rdp, flags);
> +			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
> +					   TPS("WakeLazy"));
> +		} else if (!irqs_disabled_flags(flags)) {
>  			/* ... if queue was empty ... */
>  			rcu_nocb_unlock_irqrestore(rdp, flags);
>  			wake_nocb_gp(rdp, false);
> @@ -599,8 +634,8 @@ static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
>   */
>  static void nocb_gp_wait(struct rcu_data *my_rdp)
>  {
> -	bool bypass = false;
> -	long bypass_ncbs;
> +	bool bypass = false, lazy = false;
> +	long bypass_ncbs, lazy_ncbs;
>  	int __maybe_unused cpu = my_rdp->cpu;
>  	unsigned long cur_gp_seq;
>  	unsigned long flags;
> @@ -648,12 +683,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  			continue;
>  		}
>  		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> -		if (bypass_ncbs &&
> +		lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> +		if (lazy_ncbs &&

This one works either way.  The current approach works because the
timeout is longer.  If you have exceeded the lazy timeout, you don't
care that there are non-lazy callbacks queued.

> +		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
> +		     bypass_ncbs > qhimark)) {
> +			// Bypass full or old, so flush it.
> +			(void)rcu_nocb_try_flush_bypass(rdp, j);
> +			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> +			lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> +		} else if (bypass_ncbs &&
>  		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
>  		     bypass_ncbs > 2 * qhimark)) {
>  			// Bypass full or old, so flush it.
>  			(void)rcu_nocb_try_flush_bypass(rdp, j);
>  			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> +			lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);

And these two "if" bodies are identical, correct?  Can they be merged?

>  		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
>  			rcu_nocb_unlock_irqrestore(rdp, flags);
>  			if (needwake_state)
> @@ -662,8 +706,11 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  		}
>  		if (bypass_ncbs) {
>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> -					    TPS("Bypass"));
> -			bypass = true;
> +				    bypass_ncbs == lazy_ncbs ? TPS("Lazy") : TPS("Bypass"));
> +			if (bypass_ncbs == lazy_ncbs)
> +				lazy = true;
> +			else
> +				bypass = true;

Another place where a lazy flag rather than count would suit.

>  		}
>  		rnp = rdp->mynode;
>  
> @@ -713,12 +760,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>  	my_rdp->nocb_gp_gp = needwait_gp;
>  	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
>  
> -	if (bypass && !rcu_nocb_poll) {
> -		// At least one child with non-empty ->nocb_bypass, so set
> -		// timer in order to avoid stranding its callbacks.
> -		wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
> -				   TPS("WakeBypassIsDeferred"));
> +	// At least one child with non-empty ->nocb_bypass, so set
> +	// timer in order to avoid stranding its callbacks.
> +	if (!rcu_nocb_poll) {
> +		// If bypass list only has lazy CBs. Add a deferred
> +		// lazy wake up.
> +		if (lazy && !bypass) {
> +			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
> +					TPS("WakeLazyIsDeferred"));
> +		// Otherwise add a deferred bypass wake up.
> +		} else if (bypass) {
> +			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
> +					TPS("WakeBypassIsDeferred"));
> +		}
>  	}
> +
>  	if (rcu_nocb_poll) {
>  		/* Polling, so trace if first poll in the series. */
>  		if (gotcbs)
> @@ -999,7 +1055,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
>  	 * return false, which means that future calls to rcu_nocb_try_bypass()
>  	 * will refuse to put anything into the bypass.
>  	 */
> -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
>  	/*
>  	 * Start with invoking rcu_core() early. This way if the current thread
>  	 * happens to preempt an ongoing call to rcu_core() in the middle,
> @@ -1500,13 +1556,14 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
>  }
>  
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				  unsigned long j)
> +				  unsigned long j, bool lazy)
>  {
>  	return true;
>  }
>  
>  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> -				bool *was_alldone, unsigned long flags)
> +				bool *was_alldone, unsigned long flags,
> +				bool lazy)
>  {
>  	return false;
>  }
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 23:18   ` Joel Fernandes
@ 2022-06-26  4:00     ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  4:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 11:18:02PM +0000, Joel Fernandes wrote:
> On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> [..]
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 2ccf5845957d..fec4fad6654b 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -267,8 +267,9 @@ struct rcu_data {
> >  /* Values for nocb_defer_wakeup field in struct rcu_data. */
> >  #define RCU_NOCB_WAKE_NOT	0
> >  #define RCU_NOCB_WAKE_BYPASS	1
> > -#define RCU_NOCB_WAKE		2
> > -#define RCU_NOCB_WAKE_FORCE	3
> > +#define RCU_NOCB_WAKE_LAZY	2
> > +#define RCU_NOCB_WAKE		3
> > +#define RCU_NOCB_WAKE_FORCE	4
> >  
> >  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> >  					/* For jiffies_till_first_fqs and */
> > @@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
> >  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
> >  static void rcu_init_one_nocb(struct rcu_node *rnp);
> >  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				  unsigned long j);
> > +				  unsigned long j, bool lazy);
> >  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				bool *was_alldone, unsigned long flags);
> > +				bool *was_alldone, unsigned long flags,
> > +				bool lazy);
> >  static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> >  				 unsigned long flags);
> >  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index e369efe94fda..b9244f22e102 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> >  	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
> >  }
> >  
> > +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> > +
> >  /*
> >   * Arrange to wake the GP kthread for this NOCB group at some future
> >   * time when it is safe to do so.
> > @@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >  	 * Bypass wakeup overrides previous deferments. In case
> >  	 * of callback storm, no need to wake up too early.
> >  	 */
> > -	if (waketype == RCU_NOCB_WAKE_BYPASS) {
> > +	if (waketype == RCU_NOCB_WAKE_LAZY) {
> > +		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
> > +		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> > +	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
> >  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> >  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> >  	} else {
> > @@ -296,7 +301,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >   * Note that this function always returns true if rhp is NULL.
> >   */
> >  static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				     unsigned long j)
> > +				     unsigned long j, bool lazy)
> >  {
> >  	struct rcu_cblist rcl;
> >  
> > @@ -310,7 +315,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >  	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
> >  	if (rhp)
> >  		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
> > -	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
> > +
> > +	trace_printk("call_rcu_lazy callbacks = %ld\n", READ_ONCE(rdp->nocb_bypass.lazy_len));
> 
> Before anyone yells at me, that trace_printk() has been politely asked to take
> a walk :-). It got mad at me, but on the next iteration, it won't be there.

;-) ;-) ;-)

							Thanx, Paul


* Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing
  2022-06-22 22:50 ` [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing Joel Fernandes (Google)
@ 2022-06-26  4:06   ` Paul E. McKenney
  2022-06-26 13:45     ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  4:06 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote:
> We notice that rcu_barrier() can take a really long time. It appears
> that this can happen when all CBs are lazy and the timer does not fire
> yet. So after flushing, nothing wakes up GP thread. This patch forces
> GP thread to wake when bypass flushing happens, this fixes the
> rcu_barrier() delays with lazy CBs.

I am wondering if there is a bug in non-rcu_barrier() lazy callback
processing hiding here as well?

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree_nocb.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 2f5da12811a5..b481f1ea57c0 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
>  	WRITE_ONCE(rdp->nocb_bypass_first, j);
>  	rcu_nocb_bypass_unlock(rdp);
> +
> +	wake_nocb_gp(rdp, true);
>  	return true;
>  }
>  
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-22 22:51 ` [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu() Joel Fernandes (Google)
                     ` (2 preceding siblings ...)
  2022-06-23  8:10   ` kernel test robot
@ 2022-06-26  4:13   ` Paul E. McKenney
  2022-07-08  4:25     ` Joel Fernandes
  3 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  4:13 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:51:00PM +0000, Joel Fernandes (Google) wrote:
> Reuse the kfree_rcu() test in order to be able to compare the memory reclaiming
> properties of call_rcu_lazy() with kfree_rcu().
> 
> With this test, we find similar memory footprint and time call_rcu_lazy()
> free'ing takes compared to kfree_rcu(). Also we confirm that call_rcu_lazy()
> can survive OOM during extremely frequent calls.
> 
> If we really push it, i.e. boot system with low memory and compare
> kfree_rcu() with call_rcu_lazy(), I find that call_rcu_lazy() is more
> resilient and is much harder to produce OOM as compared to kfree_rcu().

Another approach would be to make rcutorture's forward-progress testing
able to use call_rcu_lazy().  This would test lazy callback flooding.

Yet another approach would be to keep one CPU idle other than a
kthread doing call_rcu_lazy().  Of course "idle" includes redirecting
those pesky interrupts.

It is almost certainly necessary for rcutorture to exercise the
call_rcu_lazy() path regularly.

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/rcu.h       |  6 ++++
>  kernel/rcu/rcuscale.c  | 64 +++++++++++++++++++++++++++++++++++++++++-
>  kernel/rcu/tree_nocb.h | 17 ++++++++++-
>  3 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 71c0f45e70c3..436faf80a66b 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -473,6 +473,12 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
>  			       unsigned long c);
>  void rcu_gp_set_torture_wait(int duration);
>  void rcu_force_call_rcu_to_lazy(bool force);
> +
> +#if IS_ENABLED(CONFIG_RCU_SCALE_TEST)
> +unsigned long rcu_scale_get_jiffies_till_flush(void);
> +void rcu_scale_set_jiffies_till_flush(unsigned long j);
> +#endif
> +
>  #else
>  static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
>  					  int *flags, unsigned long *gp_seq)
> diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
> index 277a5bfb37d4..58ee5c2cb37b 100644
> --- a/kernel/rcu/rcuscale.c
> +++ b/kernel/rcu/rcuscale.c
> @@ -95,6 +95,7 @@ torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
>  torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
>  torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
>  torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
> +torture_param(int, kfree_rcu_by_lazy, 0, "Use call_rcu_lazy() to emulate kfree_rcu()?");
>  
>  static char *scale_type = "rcu";
>  module_param(scale_type, charp, 0444);
> @@ -658,6 +659,13 @@ struct kfree_obj {
>  	struct rcu_head rh;
>  };
>  
> +/* Used if doing RCU-kfree'ing via call_rcu_lazy(). */
> +void kfree_rcu_lazy(struct rcu_head *rh)
> +{
> +	struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);
> +	kfree(obj);
> +}
> +
>  static int
>  kfree_scale_thread(void *arg)
>  {
> @@ -695,6 +703,11 @@ kfree_scale_thread(void *arg)
>  			if (!alloc_ptr)
>  				return -ENOMEM;
>  
> +			if (kfree_rcu_by_lazy) {
> +				call_rcu_lazy(&(alloc_ptr->rh), kfree_rcu_lazy);
> +				continue;
> +			}
> +
>  			// By default kfree_rcu_test_single and kfree_rcu_test_double are
>  			// initialized to false. If both have the same value (false or true)
>  			// both are randomly tested, otherwise only the one with value true
> @@ -737,6 +750,9 @@ kfree_scale_cleanup(void)
>  {
>  	int i;
>  
> +	if (kfree_rcu_by_lazy)
> +		rcu_force_call_rcu_to_lazy(false);
> +
>  	if (torture_cleanup_begin())
>  		return;
>  
> @@ -766,11 +782,55 @@ kfree_scale_shutdown(void *arg)
>  	return -EINVAL;
>  }
>  
> +// Used if doing RCU-kfree'ing via call_rcu_lazy().
> +unsigned long jiffies_at_lazy_cb;
> +struct rcu_head lazy_test1_rh;
> +int rcu_lazy_test1_cb_called;
> +void call_rcu_lazy_test1(struct rcu_head *rh)
> +{
> +	jiffies_at_lazy_cb = jiffies;
> +	WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
> +}
> +
>  static int __init
>  kfree_scale_init(void)
>  {
>  	long i;
>  	int firsterr = 0;
> +	unsigned long orig_jif, jif_start;
> +
> +	// Force all call_rcu() to call_rcu_lazy() so that non-lazy CBs
> +	// do not remove laziness of the lazy ones (since the test tries
> +	// to stress call_rcu_lazy() for OOM).
> +	//
> +	// Also, do a quick self-test to ensure laziness is as much as
> +	// expected.
> +	if (kfree_rcu_by_lazy) {
> +		/* do a test to check the timeout. */
> +		orig_jif = rcu_scale_get_jiffies_till_flush();
> +
> +		rcu_force_call_rcu_to_lazy(true);
> +		rcu_scale_set_jiffies_till_flush(2 * HZ);
> +		rcu_barrier();
> +
> +		jif_start = jiffies;
> +		jiffies_at_lazy_cb = 0;
> +		call_rcu_lazy(&lazy_test1_rh, call_rcu_lazy_test1);
> +
> +		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
> +
> +		rcu_scale_set_jiffies_till_flush(orig_jif);
> +
> +		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
> +			pr_alert("Lazy CBs are not being lazy as expected!\n");
> +			return -1;
> +		}
> +
> +		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
> +			pr_alert("Lazy CBs are being too lazy!\n");
> +			return -1;
> +		}
> +	}
>  
>  	kfree_nrealthreads = compute_real(kfree_nthreads);
>  	/* Start up the kthreads. */
> @@ -783,7 +843,9 @@ kfree_scale_init(void)
>  		schedule_timeout_uninterruptible(1);
>  	}
>  
> -	pr_alert("kfree object size=%zu\n", kfree_mult * sizeof(struct kfree_obj));
> +	pr_alert("kfree object size=%zu, kfree_rcu_by_lazy=%d\n",
> +			kfree_mult * sizeof(struct kfree_obj),
> +			kfree_rcu_by_lazy);
>  
>  	kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
>  			       GFP_KERNEL);
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index b481f1ea57c0..255f2945b0fc 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -257,6 +257,21 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
>  }
>  
>  #define LAZY_FLUSH_JIFFIES (10 * HZ)
> +unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
> +
> +#ifdef CONFIG_RCU_SCALE_TEST
> +void rcu_scale_set_jiffies_till_flush(unsigned long jif)
> +{
> +	jiffies_till_flush = jif;
> +}
> +EXPORT_SYMBOL(rcu_scale_set_jiffies_till_flush);
> +
> +unsigned long rcu_scale_get_jiffies_till_flush(void)
> +{
> +	return jiffies_till_flush;
> +}
> +EXPORT_SYMBOL(rcu_scale_get_jiffies_till_flush);
> +#endif
>  
>  /*
>   * Arrange to wake the GP kthread for this NOCB group at some future
> @@ -275,7 +290,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>  	 * of callback storm, no need to wake up too early.
>  	 */
>  	if (waketype == RCU_NOCB_WAKE_LAZY) {
> -		mod_timer(&rdp_gp->nocb_timer, jiffies + LAZY_FLUSH_JIFFIES);
> +		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
>  		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
>  	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
>  		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 


* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-22 22:51 ` [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value Joel Fernandes (Google)
@ 2022-06-26  4:17   ` Paul E. McKenney
  2022-06-27 18:56   ` Uladzislau Rezki
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26  4:17 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:51:02PM +0000, Joel Fernandes (Google) wrote:
> As per the comments in include/linux/shrinker.h, .count_objects callback
> should return the number of freeable items, but if there are no objects
> to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> should be when we are unable to determine the number of objects, or the
> cache should be skipped for another reason.

Good catch!

							Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 711679d10cbb..935788e8d2d7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>  		atomic_set(&krcp->backoff_page_cache_fill, 1);
>  	}
>  
> -	return count;
> +	return count == 0 ? SHRINK_EMPTY : count;
>  }
>  
>  static unsigned long
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 


* Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing
  2022-06-26  4:06   ` Paul E. McKenney
@ 2022-06-26 13:45     ` Joel Fernandes
  2022-06-26 13:52       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-26 13:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote:
> > We notice that rcu_barrier() can take a really long time. It appears
> > that this can happen when all CBs are lazy and the timer does not fire
> > yet. So after flushing, nothing wakes up GP thread. This patch forces
> > GP thread to wake when bypass flushing happens, this fixes the
> > rcu_barrier() delays with lazy CBs.
> 
> I am wondering if there is a bug in non-rcu_barrier() lazy callback
> processing hiding here as well?

I don't think so because in both nocb_try_bypass and nocb_gp_wait, we are not
going to an indefinite sleep after the flush. However, with rcu_barrier(),
there is nothing to keep the RCU GP thread awake. That's my theory at least.
In practice, I have not been able to reproduce this issue with
non-rcu_barrier().

With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did.
That's an interesting story! As I apply call_rcu_lazy() to the file table
code, turns out that on boot, the initram unpacking code continuously triggers
call_rcu_lazy(). This happens apparently in a different thread than the one
that rcuscale is running in. In rcuscale, I did rcu_barrier() at init time
and this stalled for a long time to my surprise, and this patch fixed it.

thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree_nocb.h | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 2f5da12811a5..b481f1ea57c0 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
> >  	WRITE_ONCE(rdp->nocb_bypass_first, j);
> >  	rcu_nocb_bypass_unlock(rdp);
> > +
> > +	wake_nocb_gp(rdp, true);
> >  	return true;
> >  }
> >  
> > -- 
> > 2.37.0.rc0.104.g0611611a94-goog
> > 


* Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing
  2022-06-26 13:45     ` Joel Fernandes
@ 2022-06-26 13:52       ` Paul E. McKenney
  2022-06-26 14:37         ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-26 13:52 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sun, Jun 26, 2022 at 01:45:32PM +0000, Joel Fernandes wrote:
> On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote:
> > > We notice that rcu_barrier() can take a really long time. It appears
> > > that this can happen when all CBs are lazy and the timer does not fire
> > > yet. So after flushing, nothing wakes up GP thread. This patch forces
> > > GP thread to wake when bypass flushing happens, this fixes the
> > > rcu_barrier() delays with lazy CBs.
> > 
> > I am wondering if there is a bug in non-rcu_barrier() lazy callback
> > processing hiding here as well?
> 
> I don't think so because in both nocb_try_bypass and nocb_gp_wait, we are not
> going to an indefinite sleep after the flush. However, with rcu_barrier(),
> there is nothing to keep the RCU GP thread awake. That's my theory at least.
> In practice, I have not been able to reproduce this issue with
> non-rcu_barrier().
> 
> With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did.
> That's an interesting story! As I apply call_rcu_lazy() to the file table
> code, turns out that on boot, the initram unpacking code continuously triggers
> call_rcu_lazy(). This happens apparently in a different thread than the one
> that rcuscale is running in. In rcuscale, I did rcu_barrier() at init time
> and this stalled for a long time to my surprise, and this patch fixed it.

Cool!

Then should this wake_nocb_gp() instead go into the rcu_barrier()
code path?  As shown below, wouldn't we be doing some spurious wakeups?

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree_nocb.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 2f5da12811a5..b481f1ea57c0 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > >  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
> > >  	WRITE_ONCE(rdp->nocb_bypass_first, j);
> > >  	rcu_nocb_bypass_unlock(rdp);
> > > +
> > > +	wake_nocb_gp(rdp, true);
> > >  	return true;
> > >  }
> > >  
> > > -- 
> > > 2.37.0.rc0.104.g0611611a94-goog
> > > 


* Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing
  2022-06-26 13:52       ` Paul E. McKenney
@ 2022-06-26 14:37         ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2022-06-26 14:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sun, Jun 26, 2022 at 06:52:40AM -0700, Paul E. McKenney wrote:
> On Sun, Jun 26, 2022 at 01:45:32PM +0000, Joel Fernandes wrote:
> > On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote:
> > > > We notice that rcu_barrier() can take a really long time. It appears
> > > > that this can happen when all CBs are lazy and the timer does not fire
> > > > yet. So after flushing, nothing wakes up GP thread. This patch forces
> > > > GP thread to wake when bypass flushing happens, this fixes the
> > > > rcu_barrier() delays with lazy CBs.
> > > 
> > > I am wondering if there is a bug in non-rcu_barrier() lazy callback
> > > processing hiding here as well?
> > 
> > I don't think so because in both nocb_try_bypass and nocb_gp_wait, we are not
> > going to an indefinite sleep after the flush. However, with rcu_barrier(),
> > there is nothing to keep the RCU GP thread awake. That's my theory at least.
> > In practice, I have not been able to reproduce this issue with
> > non-rcu_barrier().
> > 
> > With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did.
> > That's an interesting story! As I apply call_rcu_lazy() to the file table
> > code, turns out that on boot, the initram unpacking code continuously triggers
> > call_rcu_lazy(). This happens apparently in a different thread than the one
> > that rcuscale is running in. In rcuscale, I did rcu_barrier() at init time
> > and this stalled for a long time to my surprise, and this patch fixed it.
> 
> Cool!
> 
> Then should this wake_nocb_gp() instead go into the rcu_barrier()
> code path?  As shown below, wouldn't we be doing some spurious wakeups?

You are right. In my testing, I don't see any issue with the extra wake up,
which is going to happen anyway, and my thought was: why not do it here, so
that we are still covered even if a future bypass flush from some other path
forgets to do the wake up.

I'll refine it to be rcu-barrier-only then.
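
Roughly, the idea would be to drop the unconditional wake from
rcu_nocb_do_flush_bypass() and instead wake the nocb GP kthread from
rcu_barrier()'s entrain path, right after the bypass flush and entrain shown
in the hunk earlier in this thread.  Untested sketch, placement illustrative
only (the else branch follows what the current tree does there):

	debug_rcu_head_queue(&rdp->barrier_head);
	rcu_nocb_lock(rdp);
	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head))
		atomic_inc(&rcu_state.barrier_cpu_count);
	else
		debug_rcu_head_unqueue(&rdp->barrier_head);
	rcu_nocb_unlock(rdp);
	// The flushed CBs may all have been lazy with no timer pending,
	// so make sure the GP kthread notices the entrained callback.
	wake_nocb_gp(rdp, false);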

thanks!

 - Joel



* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-22 22:51 ` [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value Joel Fernandes (Google)
  2022-06-26  4:17   ` Paul E. McKenney
@ 2022-06-27 18:56   ` Uladzislau Rezki
  2022-06-27 20:59     ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Uladzislau Rezki @ 2022-06-27 18:56 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, paulmck, rostedt, vineeth

> As per the comments in include/linux/shrinker.h, .count_objects callback
> should return the number of freeable items, but if there are no objects
> to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> should be when we are unable to determine the number of objects, or the
> cache should be skipped for another reason.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 711679d10cbb..935788e8d2d7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>  		atomic_set(&krcp->backoff_page_cache_fill, 1);
>  	}
>  
> -	return count;
> +	return count == 0 ? SHRINK_EMPTY : count;
>  }
>  
>  static unsigned long
> -- 
> 2.37.0.rc0.104.g0611611a94-goog
> 
Looks good to me!

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-27 18:56   ` Uladzislau Rezki
@ 2022-06-27 20:59     ` Paul E. McKenney
  2022-06-27 21:18       ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-27 20:59 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes (Google),
	rcu, linux-kernel, rushikesh.s.kadam, neeraj.iitr10, frederic,
	rostedt, vineeth

On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > As per the comments in include/linux/shrinker.h, .count_objects callback
> > should return the number of freeable items, but if there are no objects
> > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > should be when we are unable to determine the number of objects, or the
> > cache should be skipped for another reason.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 711679d10cbb..935788e8d2d7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> >  		atomic_set(&krcp->backoff_page_cache_fill, 1);
> >  	}
> >  
> > -	return count;
> > +	return count == 0 ? SHRINK_EMPTY : count;
> >  }
> >  
> >  static unsigned long
> > -- 
> > 2.37.0.rc0.104.g0611611a94-goog
> > 
> Looks good to me!
> 
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Now that you mention it, this does look independent of the rest of
the series.  I have pulled it in with Uladzislau's Reviewed-by.

							Thanx, Paul


* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-27 20:59     ` Paul E. McKenney
@ 2022-06-27 21:18       ` Joel Fernandes
  2022-06-27 21:43         ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-27 21:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, linux-kernel, rushikesh.s.kadam,
	neeraj.iitr10, frederic, rostedt, vineeth

On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > should return the number of freeable items, but if there are no objects
> > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > should be when we are unable to determine the number of objects, or the
> > > cache should be skipped for another reason.
> > > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 711679d10cbb..935788e8d2d7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > >  		atomic_set(&krcp->backoff_page_cache_fill, 1);
> > >  	}
> > >  
> > > -	return count;
> > > +	return count == 0 ? SHRINK_EMPTY : count;
> > >  }
> > >  
> > >  static unsigned long
> > > -- 
> > > 2.37.0.rc0.104.g0611611a94-goog
> > > 
> > Looks good to me!
> > 
> > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Now that you mention it, this does look independent of the rest of
> the series.  I have pulled it in with Uladzislau's Reviewed-by.

Thanks Paul and Vlad!

Paul, apologies for being quiet. I have been working on the series and the
review comments carefully. I appreciate your help with this work.

thanks,

 - Joel



* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-27 21:18       ` Joel Fernandes
@ 2022-06-27 21:43         ` Paul E. McKenney
  2022-06-28 16:56           ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-27 21:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, rcu, linux-kernel, rushikesh.s.kadam,
	neeraj.iitr10, frederic, rostedt, vineeth

On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > should return the number of freeable items, but if there are no objects
> > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > should be when we are unable to determine the number of objects, or the
> > > > cache should be skipped for another reason.
> > > > 
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 711679d10cbb..935788e8d2d7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > >  		atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > >  	}
> > > >  
> > > > -	return count;
> > > > +	return count == 0 ? SHRINK_EMPTY : count;
> > > >  }
> > > >  
> > > >  static unsigned long
> > > > -- 
> > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > 
> > > Looks good to me!
> > > 
> > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > Now that you mention it, this does look independent of the rest of
> > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> 
> Thanks Paul and Vlad!
> 
> Paul, apologies for being quiet. I have been working on the series and the
> review comments carefully. I appreciate your help with this work.

Not a problem.  After all, this stuff is changing some of the trickier
parts of RCU.  We must therefore assume that some significant time and
effort will be required to get it right.

							Thanx, Paul


* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-27 21:43         ` Paul E. McKenney
@ 2022-06-28 16:56           ` Joel Fernandes
  2022-06-28 21:13             ` Joel Fernandes
  2022-06-29 16:52             ` Paul E. McKenney
  0 siblings, 2 replies; 60+ messages in thread
From: Joel Fernandes @ 2022-06-28 16:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, linux-kernel, rushikesh.s.kadam,
	neeraj.iitr10, frederic, rostedt, vineeth

On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > should return the number of freeable items, but if there are no objects
> > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > should be when we are unable to determine the number of objects, or the
> > > > > cache should be skipped for another reason.
> > > > > 
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > >  		atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > >  	}
> > > > >  
> > > > > -	return count;
> > > > > +	return count == 0 ? SHRINK_EMPTY : count;
> > > > >  }
> > > > >  
> > > > >  static unsigned long
> > > > > -- 
> > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > 
> > > > Looks good to me!
> > > > 
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > 
> > > Now that you mention it, this does look independent of the rest of
> > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > 
> > Thanks Paul and Vlad!
> > 
> > Paul, apologies for being quiet. I have been working on the series and the
> > review comments carefully. I appreciate your help with this work.
> 
> Not a problem.  After all, this stuff is changing some of the trickier
> parts of RCU.  We must therefore assume that some significant time and
> effort will be required to get it right.

To your point about trickier parts of RCU, the v2 series, though I tested it
before submitting, is now giving me strange results with rcuscale. Sometimes
laziness does not seem to be in effect (as pointed out by rcuscale); other
times I am seeing stalls.

So I have to carefully look through all of this again. I am not sure why I
was not seeing these issues with the exact same code before (frustrated).

thanks,

 - Joel



* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-28 16:56           ` Joel Fernandes
@ 2022-06-28 21:13             ` Joel Fernandes
  2022-06-29 16:56               ` Paul E. McKenney
  2022-06-29 16:52             ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-28 21:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > should return the number of freeable items, but if there are no objects
> > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > cache should be skipped for another reason.
> > > > > >
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > >       }
> > > > > >
> > > > > > -     return count;
> > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > >  }
> > > > > >
> > > > > >  static unsigned long
> > > > > > --
> > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > >
> > > > > Looks good to me!
> > > > >
> > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > >
> > > > Now that you mention it, this does look independent of the rest of
> > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > >
> > > Thanks Paul and Vlad!
> > >
> > > Paul, apologies for being quiet. I have been working on the series and the
> > > review comments carefully. I appreciate your help with this work.
> >
> > Not a problem.  After all, this stuff is changing some of the trickier
> > parts of RCU.  We must therefore assume that some significant time and
> > effort will be required to get it right.
>
> To your point about trickier parts of RCU, the v2 series, though I tested it
> before submitting, is now giving me strange results with rcuscale. Sometimes
> laziness does not seem to be in effect (as pointed out by rcuscale); other
> times I am seeing stalls.
>
> So I have to carefully look through all of this again. I am not sure why I
> was not seeing these issues with the exact same code before (frustrated).

Looks like I found at least 3 bugs in my v2 series which testing
picked up just now. RCU-lazy was either being too lazy or not lazy enough. Now
the tests pass, so it's progress, but it does beg for more testing:

On top of v2 series:
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index c06a96b6a18a..7021ee05155d 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -292,7 +292,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
         */
        switch (waketype) {
                case RCU_NOCB_WAKE_LAZY:
-                       mod_jif = jiffies_till_flush;
+                       if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
+                               mod_jif = jiffies_till_flush;
                        break;

                case RCU_NOCB_WAKE_BYPASS:
@@ -714,13 +715,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
                bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
                lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
                if (lazy_ncbs &&
-                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
+                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
                     bypass_ncbs > qhimark)) {
                        // Bypass full or old, so flush it.
                        (void)rcu_nocb_try_flush_bypass(rdp, j);
                        bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
                        lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
-               } else if (bypass_ncbs &&
+               } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
                    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
                     bypass_ncbs > 2 * qhimark)) {
                        // Bypass full or old, so flush it.


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-22 22:50 ` [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
                     ` (2 preceding siblings ...)
  2022-06-26  4:00   ` Paul E. McKenney
@ 2022-06-29 11:53   ` Frederic Weisbecker
  2022-06-29 17:05     ` Paul E. McKenney
  2022-06-29 20:29     ` Joel Fernandes
  3 siblings, 2 replies; 60+ messages in thread
From: Frederic Weisbecker @ 2022-06-29 11:53 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	paulmck, rostedt, vineeth

On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  	}
>  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
>  
> -	// If there hasn't yet been all that many ->cblist enqueues
> -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> -	// ->nocb_bypass first.
> -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> +	// If caller passed a non-lazy CB and there hasn't yet been all that
> +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> +	// number of CBs (lazy + non-lazy) grows too much.
> +	//
> +	// Note that if the bypass list has lazy CBs, and the main list is
> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> +	// the lazy CBs to the main list as well. That's the right thing to do,
> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> +	// one, we can just reuse that GP for the already queued-up lazy ones.
> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> +	    (lazy && n_lazy_cbs >= qhimark)) {
>  		rcu_nocb_lock(rdp);
>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>  		if (*was_alldone)
>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> -					    TPS("FirstQ"));
> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));

That's outside the scope of this patchset, but this makes me realize we
unconditionally try to flush the bypass from the call_rcu() fastpath, and
therefore we unconditionally take the bypass lock from the call_rcu() fastpath.

It shouldn't be contended at this stage since we are holding the nocb_lock
already, and only the local CPU can hold the nocb_bypass_lock without holding
the nocb_lock. But still...

It looks safe to locklessly early check if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
before doing anything. Only the local CPU can enqueue to the bypass list.
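
Something like the following, perhaps (untested; the flush call is the one
from the hunk above):

	// Hypothetical early check: only the local CPU enqueues onto
	// ->nocb_bypass, so if it reads as empty here it stays empty and
	// the flush (and its bypass-lock acquisition) can be skipped.
	if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));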

Adding that to my TODO list...



* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-28 16:56           ` Joel Fernandes
  2022-06-28 21:13             ` Joel Fernandes
@ 2022-06-29 16:52             ` Paul E. McKenney
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-29 16:52 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, rcu, linux-kernel, rushikesh.s.kadam,
	neeraj.iitr10, frederic, rostedt, vineeth

On Tue, Jun 28, 2022 at 04:56:14PM +0000, Joel Fernandes wrote:
> On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > should return the number of freeable items, but if there are no objects
> > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > cache should be skipped for another reason.
> > > > > > 
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >  		atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > >  	}
> > > > > >  
> > > > > > -	return count;
> > > > > > +	return count == 0 ? SHRINK_EMPTY : count;
> > > > > >  }
> > > > > >  
> > > > > >  static unsigned long
> > > > > > -- 
> > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > 
> > > > > Looks good to me!
> > > > > 
> > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > 
> > > > Now that you mention it, this does look independent of the rest of
> > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > 
> > > Thanks Paul and Vlad!
> > > 
> > > Paul, apologies for being quiet. I have been working on the series and the
> > > review comments carefully. I appreciate your help with this work.
> > 
> > Not a problem.  After all, this stuff is changing some of the trickier
> > parts of RCU.  We must therefore assume that some significant time and
> > effort will be required to get it right.
> 
> To your point about trickier parts of RCU, the v2 series, though I tested it
> before submitting, is now giving me strange results with rcuscale. Sometimes
> laziness does not seem to be in effect (as pointed out by rcuscale), other
> times I am seeing stalls.
> 
> So I have to carefully look through all of this again. I am not sure why I
> was not seeing these issues with the exact same code before (frustrated).

This is one of the mechanisms behind that famous Brian Kernighan saying
about code being three times harder to debug than to write.  You see,
when you are writing the code, you only need to deal with that part of
the state space that you are aware of.  When you are debugging code,
the rest of the state space makes its presence known.

That is, if you are lucky.

If you are not so lucky, the rest of the state space waits to make
its presence known until your code is running some critical workload
in production.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-28 21:13             ` Joel Fernandes
@ 2022-06-29 16:56               ` Paul E. McKenney
  2022-06-29 19:47                 ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-29 16:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Tue, Jun 28, 2022 at 05:13:21PM -0400, Joel Fernandes wrote:
> On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > > should return the number of freeable items, but if there are no objects
> > > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > > cache should be skipped for another reason.
> > > > > > >
> > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > >       }
> > > > > > >
> > > > > > > -     return count;
> > > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > > >  }
> > > > > > >
> > > > > > >  static unsigned long
> > > > > > > --
> > > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > >
> > > > > > Looks good to me!
> > > > > >
> > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > >
> > > > > Now that you mention it, this does look independent of the rest of
> > > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > >
> > > > Thanks Paul and Vlad!
> > > >
> > > > Paul, apologies for being quiet. I have been working on the series and the
> > > > review comments carefully. I appreciate your help with this work.
> > >
> > > Not a problem.  After all, this stuff is changing some of the trickier
> > > parts of RCU.  We must therefore assume that some significant time and
> > > effort will be required to get it right.
> >
> > To your point about trickier parts of RCU, the v2 series though I tested it
> > before submitting is now giving me strange results with rcuscale. Sometimes
> > laziness does not seem to be in effect (as pointed out by rcuscale), other
> > times I am seeing stalls.
> >
> > So I have to carefully look through all of this again. I am not sure why I
> > was not seeing these issues with the exact same code before (frustrated).
> 
> Looks like I found at least 3 bugs in my v2 series which testing
> picked up now. RCU-lazy was being too lazy or not too lazy. Now tests
> pass, so it's progress, but it does beg for more testing:

It is entirely possible that call_rcu_lazy() needs its own special
purpose tests.  This might be a separate test parallel to the test for
kfree_rcu() in kernel/rcu/rcuscale.c, for example.

For but one example, you might need to do bunch of call_rcu_lazy()
invocations, then keep the kernel completely quiet for long enough to
let the timer fire, and without anything else happening.
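
As a purely illustrative (and untested) sketch of such a test, where the
lazy_quiet_cb()/lazy_quiet_test() names are made up and only
call_rcu_lazy() itself is from your series:

	static atomic_t lazy_quiet_done;

	static void lazy_quiet_cb(struct rcu_head *rhp)
	{
		atomic_inc(&lazy_quiet_done);
	}

	static void lazy_quiet_test(void)
	{
		static struct rcu_head rh[100];
		unsigned long start = jiffies;
		int i;

		// Queue a burst of lazy callbacks, then go quiet so that
		// only the lazy flush timeout can drive them to invocation.
		for (i = 0; i < ARRAY_SIZE(rh); i++)
			call_rcu_lazy(&rh[i], lazy_quiet_cb);
		while (atomic_read(&lazy_quiet_done) < ARRAY_SIZE(rh))
			schedule_timeout_idle(1);
		pr_info("lazy CBs invoked after %lu jiffies\n", jiffies - start);
	}

The elapsed time can then be checked against the configured lazy flush
delay.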

							Thanx, Paul

> On top of v2 series:
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index c06a96b6a18a..7021ee05155d 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -292,7 +292,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
>          */
>         switch (waketype) {
>                 case RCU_NOCB_WAKE_LAZY:
> -                       mod_jif = jiffies_till_flush;
> +                       if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
> +                               mod_jif = jiffies_till_flush;
>                         break;
> 
>                 case RCU_NOCB_WAKE_BYPASS:
> @@ -714,13 +715,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
>                 bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>                 lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
>                 if (lazy_ncbs &&
> -                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
> +                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
>                      bypass_ncbs > qhimark)) {
>                         // Bypass full or old, so flush it.
>                         (void)rcu_nocb_try_flush_bypass(rdp, j);
>                         bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
>                         lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> -               } else if (bypass_ncbs &&
> +               } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
>                     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
>                      bypass_ncbs > 2 * qhimark)) {
>                         // Bypass full or old, so flush it.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-29 11:53   ` Frederic Weisbecker
@ 2022-06-29 17:05     ` Paul E. McKenney
  2022-06-29 20:29     ` Joel Fernandes
  1 sibling, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-29 17:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Joel Fernandes (Google),
	rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	rostedt, vineeth

On Wed, Jun 29, 2022 at 01:53:49PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >  	}
> >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> >  
> > -	// If there hasn't yet been all that many ->cblist enqueues
> > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > -	// ->nocb_bypass first.
> > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > +	// number of CBs (lazy + non-lazy) grows too much.
> > +	//
> > +	// Note that if the bypass list has lazy CBs, and the main list is
> > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > +	    (lazy && n_lazy_cbs >= qhimark)) {
> >  		rcu_nocb_lock(rdp);
> >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> >  		if (*was_alldone)
> >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > -					    TPS("FirstQ"));
> > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> 
> That's outside the scope of this patchset but this makes me realize we
> unconditionally try to flush the bypass from call_rcu() fastpath, and
> therefore we unconditionally lock the bypass lock from call_rcu() fastpath.
> 
> It shouldn't be contended at this stage since we are holding the nocb_lock
> already, and only the local CPU can hold the nocb_bypass_lock without holding
> the nocb_lock. But still...
> 
> It looks safe to locklessly early check if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
> before doing anything. Only the local CPU can enqueue to the bypass list.
> 
> Adding that to my TODO list...

That does sound like a potentially very helpful approach!  As always,
please analyze and test carefully!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-29 16:56               ` Paul E. McKenney
@ 2022-06-29 19:47                 ` Joel Fernandes
  2022-06-29 21:07                   ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-29 19:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Wed, Jun 29, 2022 at 09:56:27AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 28, 2022 at 05:13:21PM -0400, Joel Fernandes wrote:
> > On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > > > should return the number of freeable items, but if there are no objects
> > > > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > > > cache should be skipped for another reason.
> > > > > > > >
> > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > ---
> > > > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > > >       }
> > > > > > > >
> > > > > > > > -     return count;
> > > > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static unsigned long
> > > > > > > > --
> > > > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > > >
> > > > > > > Looks good to me!
> > > > > > >
> > > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > >
> > > > > > Now that you mention it, this does look independent of the rest of
> > > > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > > >
> > > > > Thanks Paul and Vlad!
> > > > >
> > > > > Paul, apologies for being quiet. I have been working on the series and the
> > > > > review comments carefully. I appreciate your help with this work.
> > > >
> > > > Not a problem.  After all, this stuff is changing some of the trickier
> > > > parts of RCU.  We must therefore assume that some significant time and
> > > > effort will be required to get it right.
> > >
> > > To your point about trickier parts of RCU, the v2 series though I tested it
> > > before submitting is now giving me strange results with rcuscale. Sometimes
> > > laziness does not seem to be in effect (as pointed out by rcuscale), other
> > > times I am seeing stalls.
> > >
> > > So I have to carefully look through all of this again. I am not sure why I
> > > was not seeing these issues with the exact same code before (frustrated).
> > 
> > Looks like I found at least 3 bugs in my v2 series which testing
> > picked up now. RCU-lazy was being too lazy or not too lazy. Now tests
> > pass, so its progress but does beg for more testing:
> 
> It is entirely possible that call_rcu_lazy() needs its own special
> purpose tests.  This might be a separate test parallel to the test for
> kfree_rcu() in kernel/rcu/rcuscale.c, for example.

I see, perhaps I can add a 'lazy' flag to rcutorture as well, so it uses
call_rcu_lazy() for its async RCU invocations?

> For but one example, you might need to do bunch of call_rcu_lazy()
> invocations, then keep the kernel completely quiet for long enough to
> let the timer fire, and without anything else happening.

Yes, I sort of do that in rcuscale. There is a flood of call_rcu_lazy() due
to the FS code doing it, and the timer does fire at the right time. I then
measure the time to make sure the timing matches; that's how I found the bugs
I mentioned earlier.

You had mentioned something like this for testing earlier; I thought of trying
it out:

	It also helps to make rcutorture help you out if you have not
	already done so.  For example, providing some facility to allow
	rcu_torture_fwd_prog_cr() to flood with call_rcu_lazy() instead of and
	in addition to call_rcu().


thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> > On top of v2 series:
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index c06a96b6a18a..7021ee05155d 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -292,7 +292,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >          */
> >         switch (waketype) {
> >                 case RCU_NOCB_WAKE_LAZY:
> > -                       mod_jif = jiffies_till_flush;
> > +                       if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
> > +                               mod_jif = jiffies_till_flush;
> >                         break;
> > 
> >                 case RCU_NOCB_WAKE_BYPASS:
> > @@ -714,13 +715,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> >                 bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> >                 lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> >                 if (lazy_ncbs &&
> > -                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
> > +                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
> >                      bypass_ncbs > qhimark)) {
> >                         // Bypass full or old, so flush it.
> >                         (void)rcu_nocb_try_flush_bypass(rdp, j);
> >                         bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> >                         lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> > -               } else if (bypass_ncbs &&
> > +               } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
> >                     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> >                      bypass_ncbs > 2 * qhimark)) {
> >                         // Bypass full or old, so flush it.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-29 11:53   ` Frederic Weisbecker
  2022-06-29 17:05     ` Paul E. McKenney
@ 2022-06-29 20:29     ` Joel Fernandes
  2022-06-29 22:01       ` Frederic Weisbecker
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-29 20:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	paulmck, rostedt, vineeth

On Wed, Jun 29, 2022 at 01:53:49PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >  	}
> >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> >  
> > -	// If there hasn't yet been all that many ->cblist enqueues
> > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > -	// ->nocb_bypass first.
> > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > +	// number of CBs (lazy + non-lazy) grows too much.
> > +	//
> > +	// Note that if the bypass list has lazy CBs, and the main list is
> > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > +	    (lazy && n_lazy_cbs >= qhimark)) {
> >  		rcu_nocb_lock(rdp);
> >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> >  		if (*was_alldone)
> >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > -					    TPS("FirstQ"));
> > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> 
> That's outside the scope of this patchset but this makes me realize we
> unconditionally try to flush the bypass from call_rcu() fastpath, and
> therefore we unconditionally lock the bypass lock from call_rcu() fastpath.
> 
> It shouldn't be contended at this stage since we are holding the nocb_lock
> already, and only the local CPU can hold the nocb_bypass_lock without holding
> the nocb_lock. But still...
> 
> It looks safe to locklessly early check if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
> before doing anything. Only the local CPU can enqueue to the bypass list.
> 
> Adding that to my TODO list...
> 

I am afraid I did not understand your comment. The bypass list lock is held
once we have decided to use the bypass list to queue something on to it.

The bypass flushing is also conditional on either the bypass cblist growing
too big or a jiffie elapsing since the first bypass queue.

So in both cases, acquiring the lock is conditional. What do you mean it is
unconditionally acquiring the bypass lock? Where?

Thanks!

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-29 19:47                 ` Joel Fernandes
@ 2022-06-29 21:07                   ` Paul E. McKenney
  2022-06-30 14:25                     ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-29 21:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Wed, Jun 29, 2022 at 07:47:36PM +0000, Joel Fernandes wrote:
> On Wed, Jun 29, 2022 at 09:56:27AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 28, 2022 at 05:13:21PM -0400, Joel Fernandes wrote:
> > > On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > > > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > > > > should return the number of freeable items, but if there are no objects
> > > > > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > > > > cache should be skipped for another reason.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > ---
> > > > > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     return count;
> > > > > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  static unsigned long
> > > > > > > > > --
> > > > > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > > > >
> > > > > > > > Looks good to me!
> > > > > > > >
> > > > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > >
> > > > > > > Now that you mention it, this does look independent of the rest of
> > > > > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > > > >
> > > > > > Thanks Paul and Vlad!
> > > > > >
> > > > > > Paul, apologies for being quiet. I have been working on the series and the
> > > > > > review comments carefully. I appreciate your help with this work.
> > > > >
> > > > > Not a problem.  After all, this stuff is changing some of the trickier
> > > > > parts of RCU.  We must therefore assume that some significant time and
> > > > > effort will be required to get it right.
> > > >
> > > > To your point about trickier parts of RCU, the v2 series though I tested it
> > > > before submitting is now giving me strange results with rcuscale. Sometimes
> > > > laziness does not seem to be in effect (as pointed out by rcuscale), other
> > > > times I am seeing stalls.
> > > >
> > > > So I have to carefully look through all of this again. I am not sure why I
> > > > was not seeing these issues with the exact same code before (frustrated).
> > > 
> > > Looks like I found at least 3 bugs in my v2 series which testing
> > > picked up now. RCU-lazy was being too lazy or not too lazy. Now tests
> > > pass, so its progress but does beg for more testing:
> > 
> > It is entirely possible that call_rcu_lazy() needs its own special
> > purpose tests.  This might be a separate test parallel to the test for
> > kfree_rcu() in kernel/rcu/rcuscale.c, for example.
> 
> I see, perhaps I can add a 'lazy' flag to rcutorture as well, so it uses
> call_rcu_lazy() for its async RCU invocations?

That will be tricky because of rcutorture's timeliness expectations.

Maybe a self-invoking lazy callback initiated by rcu_torture_fakewriter()
that prints a line about its statistics at shutdown time?  At a minimum,
the number of times that it was invoked.  Better would be to print one
line summarizing stats for all of them.

The main thing that could be detected from this is a callback being
stranded.  Given that rcutorture enqueues non-lazy callbacks like a
drunken sailor, they won't end up being all that lazy.
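
Something like the following untested sketch, where the lazy_selfprop and
rcu_torture_lazy_selfprop_cb() names are invented for illustration and only
call_rcu_lazy() is from the series:

	static struct lazy_selfprop {
		struct rcu_head rh;
		unsigned long n_invoked;
		bool stop;
	} lazy_selfprop;

	static void rcu_torture_lazy_selfprop_cb(struct rcu_head *rhp)
	{
		struct lazy_selfprop *lsp =
			container_of(rhp, struct lazy_selfprop, rh);

		// Count invocations and lazily re-queue until shutdown; a
		// stranded callback shows up as a count that stops advancing.
		lsp->n_invoked++;
		if (!READ_ONCE(lsp->stop))
			call_rcu_lazy(&lsp->rh, rcu_torture_lazy_selfprop_cb);
	}

At shutdown, set lazy_selfprop.stop, wait with rcu_barrier(), and print
lazy_selfprop.n_invoked.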

> > For but one example, you might need to do bunch of call_rcu_lazy()
> > invocations, then keep the kernel completely quiet for long enough to
> > let the timer fire, and without anything else happening.
> 
> Yes, I sort of do that in rcuscale. There is a flood of call_rcu_lazy() due
> to the FS code doing it. And, the timer does fire at the right time. I then
> measure the time to make sure the timing matches, that's how I found the bugs
> I earlier mentioned.
> 
> You had mentioned something like for testing earlier, I thought of trying it
> out:
> 
> 	It also helps to make rcutorture help you out if you have not
> 	already done so.  For example, providing some facility to allow
> 	rcu_torture_fwd_prog_cr() to flood with call_rcu_lazy() instead of and
> 	in addition to call_rcu().

Sounds good!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > On top of v2 series:
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index c06a96b6a18a..7021ee05155d 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -292,7 +292,8 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> > >          */
> > >         switch (waketype) {
> > >                 case RCU_NOCB_WAKE_LAZY:
> > > -                       mod_jif = jiffies_till_flush;
> > > +                       if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
> > > +                               mod_jif = jiffies_till_flush;
> > >                         break;
> > > 
> > >                 case RCU_NOCB_WAKE_BYPASS:
> > > @@ -714,13 +715,13 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > >                 bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > >                 lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> > >                 if (lazy_ncbs &&
> > > -                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + LAZY_FLUSH_JIFFIES) ||
> > > +                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
> > >                      bypass_ncbs > qhimark)) {
> > >                         // Bypass full or old, so flush it.
> > >                         (void)rcu_nocb_try_flush_bypass(rdp, j);
> > >                         bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > >                         lazy_ncbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> > > -               } else if (bypass_ncbs &&
> > > +               } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
> > >                     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > >                      bypass_ncbs > 2 * qhimark)) {
> > >                         // Bypass full or old, so flush it.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-29 20:29     ` Joel Fernandes
@ 2022-06-29 22:01       ` Frederic Weisbecker
  2022-06-30 14:08         ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Frederic Weisbecker @ 2022-06-29 22:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	paulmck, rostedt, vineeth

On Wed, Jun 29, 2022 at 08:29:48PM +0000, Joel Fernandes wrote:
> On Wed, Jun 29, 2022 at 01:53:49PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > >  	}
> > >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > >  
> > > -	// If there hasn't yet been all that many ->cblist enqueues
> > > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > > -	// ->nocb_bypass first.
> > > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > > +	// number of CBs (lazy + non-lazy) grows too much.
> > > +	//
> > > +	// Note that if the bypass list has lazy CBs, and the main list is
> > > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > > +	    (lazy && n_lazy_cbs >= qhimark)) {
> > >  		rcu_nocb_lock(rdp);
> > >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> > >  		if (*was_alldone)
> > >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > > -					    TPS("FirstQ"));
> > > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> > 
> > That's outside the scope of this patchset but this makes me realize we
> > unconditionally try to flush the bypass from call_rcu() fastpath, and
> > therefore we unconditionally lock the bypass lock from call_rcu() fastpath.
> > 
> > It shouldn't be contended at this stage since we are holding the nocb_lock
> > already, and only the local CPU can hold the nocb_bypass_lock without holding
> > the nocb_lock. But still...
> > 
> > It looks safe to locklessly early check if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
> > before doing anything. Only the local CPU can enqueue to the bypass list.
> > 
> > Adding that to my TODO list...
> > 
> 
> I am afraid I did not understand your comment. The bypass list lock is held
> once we have decided to use the bypass list to queue something on to it.
> 
> The bypass flushing is also conditional on either the bypass cblist growing
> too big or a jiffy elapsing since the first bypass queue.
> 
> So in both cases, acquiring the lock is conditional. What do you mean it is
> unconditionally acquiring the bypass lock? Where?

Just to make sure we are talking about the same thing, I'm referring to this
path:

	// If there hasn't yet been all that many ->cblist enqueues
	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
	// ->nocb_bypass first.
	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
		rcu_nocb_lock(rdp);
		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
		if (*was_alldone)
			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
					    TPS("FirstQ"));
		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
		return false; // Caller must enqueue the callback.
	}

This is called whenever we decide not to queue to the bypass list because
there is no flooding detected (rdp->nocb_nobypass_count hasn't reached
nocb_nobypass_lim_per_jiffy for the current jiffy). I call this the fast path
because this is what I would expect under a normal load, as opposed to callbacks
flooding.

And in this fastpath, the above rcu_nocb_flush_bypass() is unconditional.

> 
> Thanks!
> 
>  - Joel
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-29 22:01       ` Frederic Weisbecker
@ 2022-06-30 14:08         ` Joel Fernandes
  0 siblings, 0 replies; 60+ messages in thread
From: Joel Fernandes @ 2022-06-30 14:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	paulmck, rostedt, vineeth

On Thu, Jun 30, 2022 at 12:01:14AM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 29, 2022 at 08:29:48PM +0000, Joel Fernandes wrote:
> > On Wed, Jun 29, 2022 at 01:53:49PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > > > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > > >  	}
> > > >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > > >  
> > > > -	// If there hasn't yet been all that many ->cblist enqueues
> > > > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > > > -	// ->nocb_bypass first.
> > > > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > > > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > > > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > > > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > > > +	// number of CBs (lazy + non-lazy) grows too much.
> > > > +	//
> > > > +	// Note that if the bypass list has lazy CBs, and the main list is
> > > > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > > > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > > > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > > > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > > > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > > > +	    (lazy && n_lazy_cbs >= qhimark)) {
> > > >  		rcu_nocb_lock(rdp);
> > > >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> > > >  		if (*was_alldone)
> > > >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > > > -					    TPS("FirstQ"));
> > > > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > > > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > > > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> > > 
> > > That's outside the scope of this patchset but this makes me realize we
> > > unconditionally try to flush the bypass from call_rcu() fastpath, and
> > > therefore we unconditionally lock the bypass lock from call_rcu() fastpath.
> > > 
> > > It shouldn't be contended at this stage since we are holding the nocb_lock
> > > already, and only the local CPU can hold the nocb_bypass_lock without holding
> > > the nocb_lock. But still...
> > > 
> > > It looks safe to locklessly early check if (rcu_cblist_n_cbs(&rdp->nocb_bypass))
> > > before doing anything. Only the local CPU can enqueue to the bypass list.
> > > 
> > > Adding that to my TODO list...
> > > 
> > 
> > I am afraid I did not understand your comment. The bypass list lock is held
> > once we have decided to use the bypass list to queue something on to it.
> > 
> > The bypass flushing is also conditional on either the bypass cblist growing
> > too big or a jiffie elapsing since the first bypass queue.
> > 
> > So in both cases, acquiring the lock is conditional. What do you mean it is
> > unconditionally acquiring the bypass lock? Where?
> 
> Just to make sure we are talking about the same thing, I'm referring to this
> path:
> 
> 	// If there hasn't yet been all that many ->cblist enqueues
> 	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> 	// ->nocb_bypass first.
> 	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> 		rcu_nocb_lock(rdp);
> 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> 		if (*was_alldone)
> 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> 					    TPS("FirstQ"));
> 		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> 		return false; // Caller must enqueue the callback.
> 	}
> 
> This is called whenever we decide not to queue to the bypass list because
> there is no flooding detected (rdp->nocb_nobypass_count hasn't reached
> nocb_nobypass_lim_per_jiffy for the current jiffy). I call this the fast path
> because this is what I would except in a normal load, as opposed to callbacks
> flooding.
> 
> And in this fastpath, the above rcu_nocb_flush_bypass() is unconditional.

Sorry you are right, I see that now.

Another reason why the contention is probably not a big deal (other than
the nocb lock being held) is that all other callers of the flush appear to
be in slow paths except for this one. Unless someone is offloading/deoffloading
rapidly or something :)

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-29 21:07                   ` Paul E. McKenney
@ 2022-06-30 14:25                     ` Joel Fernandes
  2022-06-30 15:29                       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-06-30 14:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Wed, Jun 29, 2022 at 02:07:20PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 29, 2022 at 07:47:36PM +0000, Joel Fernandes wrote:
> > On Wed, Jun 29, 2022 at 09:56:27AM -0700, Paul E. McKenney wrote:
> > > On Tue, Jun 28, 2022 at 05:13:21PM -0400, Joel Fernandes wrote:
> > > > On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > >
> > > > > On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > > > > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > > > > > should return the number of freeable items, but if there are no objects
> > > > > > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > > > > > cache should be skipped for another reason.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > > ---
> > > > > > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     return count;
> > > > > > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > >  static unsigned long
> > > > > > > > > > --
> > > > > > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > > > > >
> > > > > > > > > Looks good to me!
> > > > > > > > >
> > > > > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > >
> > > > > > > > Now that you mention it, this does look independent of the rest of
> > > > > > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > > > > >
> > > > > > > Thanks Paul and Vlad!
> > > > > > >
> > > > > > > Paul, apologies for being quiet. I have been working on the series and the
> > > > > > > review comments carefully. I appreciate your help with this work.
> > > > > >
> > > > > > Not a problem.  After all, this stuff is changing some of the trickier
> > > > > > parts of RCU.  We must therefore assume that some significant time and
> > > > > > effort will be required to get it right.
> > > > >
> > > > > To your point about trickier parts of RCU, the v2 series though I tested it
> > > > > before submitting is now giving me strange results with rcuscale. Sometimes
> > > > > laziness does not seem to be in effect (as pointed out by rcuscale), other
> > > > > times I am seeing stalls.
> > > > >
> > > > > So I have to carefully look through all of this again. I am not sure why I
> > > > > was not seeing these issues with the exact same code before (frustrated).
> > > > 
> > > > Looks like I found at least 3 bugs in my v2 series which testing
> > > > picked up now. RCU-lazy was being too lazy or not too lazy. Now tests
> > > > pass, so its progress but does beg for more testing:
> > > 
> > > It is entirely possible that call_rcu_lazy() needs its own special
> > > purpose tests.  This might be a separate test parallel to the test for
> > > kfree_rcu() in kernel/rcu/rcuscale.c, for example.
> > 
> > I see, perhaps I can add a 'lazy' flag to rcutorture as well, so it uses
> > call_rcu_lazy() for its async RCU invocations?
> 
> That will be tricky because of rcutorture's timeliness expectations.

I now have a facility to set the lazy timeout from test kernel modules. I was
thinking I could set the same from rcutorture. Maybe something like 100
jiffies? Then it can run through all the regular rcutorture tests and
still exercise the new code paths.

> Maybe a self-invoking lazy callback initiated by rcu_torture_fakewriter()
> that prints a line about its statistics at shutdown time?  At a minimum,
> the number of times that it was invoked.  Better would be to print one
> line summarizing stats for all of them.
> 
> The main thing that could be detected from this is a callback being
> stranded.  Given that rcutorture enqueues non-lazy callbacks like a
> drunken sailor, they won't end up being all that lazy.

Thanks for this idea as well. I'll think more about it. thanks,

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 8/8] rcu/kfree: Fix kfree_rcu_shrink_count() return value
  2022-06-30 14:25                     ` Joel Fernandes
@ 2022-06-30 15:29                       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-06-30 15:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, rcu, LKML, Rushikesh S Kadam, Neeraj upadhyay,
	Frederic Weisbecker, Steven Rostedt, vineeth

On Thu, Jun 30, 2022 at 02:25:16PM +0000, Joel Fernandes wrote:
> On Wed, Jun 29, 2022 at 02:07:20PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 29, 2022 at 07:47:36PM +0000, Joel Fernandes wrote:
> > > On Wed, Jun 29, 2022 at 09:56:27AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jun 28, 2022 at 05:13:21PM -0400, Joel Fernandes wrote:
> > > > > On Tue, Jun 28, 2022 at 12:56 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > >
> > > > > > On Mon, Jun 27, 2022 at 02:43:59PM -0700, Paul E. McKenney wrote:
> > > > > > > On Mon, Jun 27, 2022 at 09:18:13PM +0000, Joel Fernandes wrote:
> > > > > > > > On Mon, Jun 27, 2022 at 01:59:07PM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Mon, Jun 27, 2022 at 08:56:43PM +0200, Uladzislau Rezki wrote:
> > > > > > > > > > > As per the comments in include/linux/shrinker.h, .count_objects callback
> > > > > > > > > > > should return the number of freeable items, but if there are no objects
> > > > > > > > > > > to free, SHRINK_EMPTY should be returned. The only time 0 is returned
> > > > > > > > > > > should be when we are unable to determine the number of objects, or the
> > > > > > > > > > > cache should be skipped for another reason.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > > > > > > ---
> > > > > > > > > > >  kernel/rcu/tree.c | 2 +-
> > > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > > > index 711679d10cbb..935788e8d2d7 100644
> > > > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > > > @@ -3722,7 +3722,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > > > > >               atomic_set(&krcp->backoff_page_cache_fill, 1);
> > > > > > > > > > >       }
> > > > > > > > > > >
> > > > > > > > > > > -     return count;
> > > > > > > > > > > +     return count == 0 ? SHRINK_EMPTY : count;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > >  static unsigned long
> > > > > > > > > > > --
> > > > > > > > > > > 2.37.0.rc0.104.g0611611a94-goog
> > > > > > > > > > >
> > > > > > > > > > Looks good to me!
> > > > > > > > > >
> > > > > > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > >
> > > > > > > > > Now that you mention it, this does look independent of the rest of
> > > > > > > > > the series.  I have pulled it in with Uladzislau's Reviewed-by.
> > > > > > > >
> > > > > > > > Thanks Paul and Vlad!
> > > > > > > >
> > > > > > > > Paul, apologies for being quiet. I have been working on the series and the
> > > > > > > > review comments carefully. I appreciate your help with this work.
> > > > > > >
> > > > > > > Not a problem.  After all, this stuff is changing some of the trickier
> > > > > > > parts of RCU.  We must therefore assume that some significant time and
> > > > > > > effort will be required to get it right.
> > > > > >
> > > > > > To your point about trickier parts of RCU, the v2 series though I tested it
> > > > > > before submitting is now giving me strange results with rcuscale. Sometimes
> > > > > > laziness does not seem to be in effect (as pointed out by rcuscale), other
> > > > > > times I am seeing stalls.
> > > > > >
> > > > > > So I have to carefully look through all of this again. I am not sure why I
> > > > > > was not seeing these issues with the exact same code before (frustrated).
> > > > > 
> > > > > Looks like I found at least 3 bugs in my v2 series which testing
> > > > > picked up now. RCU-lazy was being too lazy or not too lazy. Now tests
> > > > > pass, so its progress but does beg for more testing:
> > > > 
> > > > It is entirely possible that call_rcu_lazy() needs its own special
> > > > purpose tests.  This might be a separate test parallel to the test for
> > > > kfree_rcu() in kernel/rcu/rcuscale.c, for example.
> > > 
> > > I see, perhaps I can add a 'lazy' flag to rcutorture as well, so it uses
> > > call_rcu_lazy() for its async RCU invocations?
> > 
> > That will be tricky because of rcutorture's timeliness expectations.
> 
> I now have a facility to set the lazy timeout from test kernel modules. I was
> thinking I could set the same from rcutorture. Maybe something like 100
> jiffies? Then it can run through all the regular rcutorture tests and
> still exercise the new code paths.

That might work, and of course feel free to try it.  Except that there
are a lot of forward-progress checks in rcutorture that will like as not
spew huge steaming piles of false positives if it is only lazy callbacks
that are driving the grace period forward.  You have been warned.  ;-)

> > Maybe a self-invoking lazy callback initiated by rcu_torture_fakewriter()
> > that prints a line about its statistics at shutdown time?  At a minimum,
> > the number of times that it was invoked.  Better would be to print one
> > line summarizing stats for all of them.
> > 
> > The main thing that could be detected from this is a callback being
> > stranded.  Given that rcutorture enqueues non-lazy callbacks like a
> > drunken sailor, they won't end up being all that lazy.
> 
> Thanks for this idea as well. I'll think more about it. thanks,

We probably need a special-purpose test (for example, in rcuscale), but
the self-enqueuing lazy callback should at least avoid false positives
from rcutorture's forward-progress checks.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
  2022-06-26  3:12 ` [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes Paul E. McKenney
@ 2022-07-08  4:17   ` Joel Fernandes
  2022-07-08 22:45     ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-08  4:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sat, Jun 25, 2022 at 08:12:06PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 10:50:53PM +0000, Joel Fernandes (Google) wrote:
> > 
> > Hello!
> > Please find the next improved version of call_rcu_lazy() attached.  The main
> > difference between the previous version is that it is now using bypass lists,
> > and thus handling rcu_barrier() and hotplug situations, with some small changes
> > to those parts.
> > 
> > I also don't see the TREE07 RCU stall from v1 anymore.
> > 
> > In the v1, we some numbers below (testing on v2 is in progress). Rushikesh,
> > feel free to pull these patches into your tree. Just to note, you will also
> > need to pull the call_rcu_lazy() user patches from v1. I have dropped in this
> > series, just to make the series focus on the feature code first.
> > 
> > Following are power savings we see on top of RCU_NOCB_CPU on an Intel platform.
> > The observation is that due to a 'trickle down' effect of RCU callbacks, the
> > system is very lightly loaded but constantly running few RCU callbacks very
> > often. This confuses the power management hardware that the system is active,
> > when it is in fact idle.
> > 
> > For example, when ChromeOS screen is off and user is not doing anything on the
> > system, we can see big power savings.
> > Before:
> > Pk%pc10 = 72.13
> > PkgWatt = 0.58
> > CorWatt = 0.04
> > 
> > After:
> > Pk%pc10 = 81.28
> > PkgWatt = 0.41
> > CorWatt = 0.03
> 
> So not quite 30% savings in power at the package level?  Not bad at all!

Yes this is the package residency amount, not the amount of power. This % is
not power.

> > Further, when ChromeOS screen is ON but system is idle or lightly loaded, we
> > can see that the display pipeline is constantly doing RCU callback queuing due
> > to open/close of file descriptors associated with graphics buffers. This is
> > attributed to the file_free_rcu() path which this patch series also touches.
> > 
> > This patch series adds a simple but effective, and lockless implementation of
> > RCU callback batching. On memory pressure, timeout or queue growing too big, we
> > initiate a flush of one or more per-CPU lists.
> 
> It is no longer lockless, correct?  Or am I missing something subtle?
> 
> Full disclosure: I don't see a whole lot of benefit to its being lockless.
> But truth in advertising!  ;-)

Yes, you are right. Maybe a better way I could put it is that it is "lock
contention less" :D

> > Similar results can be achieved by increasing jiffies_till_first_fqs, however
> > that also has the effect of slowing down RCU. Especially I saw huge slow down
> > of function graph tracer when increasing that.
> > 
> > One drawback of this series is, if another frequent RCU callback creeps up in
> > the future, that's not lazy, then that will again hurt the power. However, I
> > believe identifying and fixing those is a more reasonable approach than slowing
> > RCU down for the whole system.
> 
> Very good!  I have you down as the official call_rcu_lazy() whack-a-mole
> developer.  ;-)

:-D

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-06-26  4:13   ` Paul E. McKenney
@ 2022-07-08  4:25     ` Joel Fernandes
  2022-07-08 23:06       ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-08  4:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sat, Jun 25, 2022 at 09:13:27PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 10:51:00PM +0000, Joel Fernandes (Google) wrote:
> > Reuse the kfree_rcu() test in order to be able to compare the memory reclaiming
> > properties of call_rcu_lazy() with kfree_rcu().
> > 
> > With this test, we find similar memory footprint and time call_rcu_lazy()
> > free'ing takes compared to kfree_rcu(). Also we confirm that call_rcu_lazy()
> > can survive OOM during extremely frequent calls.
> > 
> > If we really push it, i.e. boot system with low memory and compare
> > kfree_rcu() with call_rcu_lazy(), I find that call_rcu_lazy() is more
> > resilient and is much harder to produce OOM as compared to kfree_rcu().
> 
> Another approach would be to make rcutorture's forward-progress testing
> able to use call_rcu_lazy().  This would test lazy callback flooding.
> 
> Yet another approach would be to keep one CPU idle other than a
> kthread doing call_rcu_lazy().  Of course "idle" includes redirecting
> those pesky interrupts.
> 
> It is almost certainly necessary for rcutorture to exercise the
> call_rcu_lazy() path regularly.

Currently I added a test like the following, which adds a new torture type; my
thought was to stress the new code to make sure nothing crashed or hung the
kernel. That is working well, except I don't exactly understand the total-gps
print showing 0, while the other print shows 1188 GPs. I'll go dig into that
tomorrow. Thanks!

The print shows
TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
TREE11 no success message, 7 successful version messages

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7120165a9342..cc6b7392d801 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -872,6 +872,64 @@ static struct rcu_torture_ops tasks_rude_ops = {
 
 #endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
 
+#ifdef CONFIG_RCU_LAZY
+
+/*
+ * Definitions for lazy RCU torture testing.
+ */
+unsigned long orig_jiffies_till_flush;
+
+static void rcu_sync_torture_init_lazy(void)
+{
+	rcu_sync_torture_init();
+
+	orig_jiffies_till_flush = rcu_lazy_get_jiffies_till_flush();
+	rcu_lazy_set_jiffies_till_flush(50);
+}
+
+static void rcu_lazy_cleanup(void)
+{
+	rcu_lazy_set_jiffies_till_flush(orig_jiffies_till_flush);
+}
+
+static struct rcu_torture_ops rcu_lazy_ops = {
+	.ttype			= RCU_LAZY_FLAVOR,
+	.init			= rcu_sync_torture_init_lazy,
+	.cleanup		= rcu_lazy_cleanup,
+	.readlock		= rcu_torture_read_lock,
+	.read_delay		= rcu_read_delay,
+	.readunlock		= rcu_torture_read_unlock,
+	.readlock_held		= torture_readlock_not_held,
+	.get_gp_seq		= rcu_get_gp_seq,
+	.gp_diff		= rcu_seq_diff,
+	.deferred_free		= rcu_torture_deferred_free,
+	.sync			= synchronize_rcu,
+	.exp_sync		= synchronize_rcu_expedited,
+	.get_gp_state		= get_state_synchronize_rcu,
+	.start_gp_poll		= start_poll_synchronize_rcu,
+	.poll_gp_state		= poll_state_synchronize_rcu,
+	.cond_sync		= cond_synchronize_rcu,
+	.call			= call_rcu_lazy,
+	.cb_barrier		= rcu_barrier,
+	.fqs			= rcu_force_quiescent_state,
+	.stats			= NULL,
+	.gp_kthread_dbg		= show_rcu_gp_kthreads,
+	.check_boost_failed	= rcu_check_boost_fail,
+	.stall_dur		= rcu_jiffies_till_stall_check,
+	.irq_capable		= 1,
+	.can_boost		= IS_ENABLED(CONFIG_RCU_BOOST),
+	.extendables		= RCUTORTURE_MAX_EXTEND,
+	.name			= "rcu_lazy"
+};
+
+#define LAZY_OPS &rcu_lazy_ops,
+
+#else // #ifdef CONFIG_RCU_LAZY
+
+#define LAZY_OPS
+
+#endif // #else #ifdef CONFIG_RCU_LAZY
+
 
 #ifdef CONFIG_TASKS_TRACE_RCU
 
@@ -3145,7 +3203,7 @@ rcu_torture_init(void)
 	unsigned long gp_seq = 0;
 	static struct rcu_torture_ops *torture_ops[] = {
 		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
-		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
+		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS LAZY_OPS
 		&trivial_ops,
 	};
 
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11 b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
new file mode 100644
index 000000000000..436013f3e015
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
@@ -0,0 +1,18 @@
+CONFIG_SMP=y
+CONFIG_PREEMPT_NONE=n
+CONFIG_PREEMPT_VOLUNTARY=n
+CONFIG_PREEMPT=y
+#CHECK#CONFIG_PREEMPT_RCU=y
+CONFIG_HZ_PERIODIC=n
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_FULL=n
+CONFIG_RCU_TRACE=y
+CONFIG_HOTPLUG_CPU=y
+CONFIG_MAXSMP=y
+CONFIG_CPUMASK_OFFSTACK=y
+CONFIG_RCU_NOCB_CPU=y
+CONFIG_DEBUG_LOCK_ALLOC=n
+CONFIG_RCU_BOOST=n
+CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_RCU_EXPERT=y
+CONFIG_RCU_LAZY=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
new file mode 100644
index 000000000000..9b6f720d4ccd
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
@@ -0,0 +1,8 @@
+maxcpus=8 nr_cpus=43
+rcutree.gp_preinit_delay=3
+rcutree.gp_init_delay=3
+rcutree.gp_cleanup_delay=3
+rcu_nocbs=0-7
+rcutorture.torture_type=rcu_lazy
+rcutorture.nocbs_nthreads=8
+rcutorture.fwd_progress=0
-- 
2.37.0.rc0.161.g10f37bed90-goog


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-26  4:00   ` Paul E. McKenney
@ 2022-07-08 18:43     ` Joel Fernandes
  2022-07-08 23:10       ` Paul E. McKenney
  2022-07-10  2:26     ` Joel Fernandes
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-08 18:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sat, Jun 25, 2022 at 09:00:19PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > Implement timer-based RCU lazy callback batching. The batch is flushed
> > whenever a certain amount of time has passed, or the batch on a
> > particular CPU grows too big. Also memory pressure will flush it in a
> > future patch.
> > 
> > To handle several corner cases automagically (such as rcu_barrier() and
> > hotplug), we re-use bypass lists to handle lazy CBs. The bypass list
> > length has the lazy CB length included in it. A separate lazy CB length
> > counter is also introduced to keep track of the number of lazy CBs.
> > 
> > Suggested-by: Paul McKenney <paulmck@kernel.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> Not bad, but some questions and comments below.

Thanks a lot for these, real helpful and I replied below:

> > diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> > index 659d13a7ddaa..9a992707917b 100644
> > --- a/include/linux/rcu_segcblist.h
> > +++ b/include/linux/rcu_segcblist.h
> > @@ -22,6 +22,7 @@ struct rcu_cblist {
> >  	struct rcu_head *head;
> >  	struct rcu_head **tail;
> >  	long len;
> > +	long lazy_len;
> >  };
> >  
> >  #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 1a32036c918c..9191a3d88087 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
> >  
> >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> >  
> > +#ifdef CONFIG_RCU_LAZY
> > +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
> > +#else
> > +#define call_rcu_lazy(head, func) call_rcu(head, func)
> > +#endif
> > +
> >  /* Internal to kernel */
> >  void rcu_init(void);
> >  extern int rcu_scheduler_active;
> > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > index 27aab870ae4c..0bffa992fdc4 100644
> > --- a/kernel/rcu/Kconfig
> > +++ b/kernel/rcu/Kconfig
> > @@ -293,4 +293,12 @@ config TASKS_TRACE_RCU_READ_MB
> >  	  Say N here if you hate read-side memory barriers.
> >  	  Take the default if you are unsure.
> >  
> > +config RCU_LAZY
> > +	bool "RCU callback lazy invocation functionality"
> > +	depends on RCU_NOCB_CPU
> > +	default n
> > +	help
> > +	  To save power, batch RCU callbacks and flush after delay, memory
> > +          pressure or callback list growing too big.
> 
> Spaces vs. tabs.

Fixed, thanks.

> The checkpatch warning is unhelpful ("please write a help paragraph that
> fully describes the config symbol")

Good old checkpatch :D

> >  endmenu # "RCU Subsystem"
> > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > index c54ea2b6a36b..627a3218a372 100644
> > --- a/kernel/rcu/rcu_segcblist.c
> > +++ b/kernel/rcu/rcu_segcblist.c
> > @@ -20,6 +20,7 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
> >  	rclp->head = NULL;
> >  	rclp->tail = &rclp->head;
> >  	rclp->len = 0;
> > +	rclp->lazy_len = 0;
> >  }
> >  
> >  /*
> > @@ -32,6 +33,15 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
> >  	WRITE_ONCE(rclp->len, rclp->len + 1);
> >  }
> >  
> > +/*
> > + * Enqueue an rcu_head structure onto the specified callback list.
> 
> Please also note the fact that it is enqueuing lazily.

Sorry, done.

> > + */
> > +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp)
> > +{
> > +	rcu_cblist_enqueue(rclp, rhp);
> > +	WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
> 
> Except...  Why not just add a "lazy" parameter to rcu_cblist_enqueue()?
> IS_ENABLED() can make it fast.

Yeah good idea, it simplifies the code too. Thank you!

So you mean I should add the following in this function so that the branch gets optimized out:
if (lazy && IS_ENABLED(CONFIG_RCU_LAZY)) {
  ...
}

That makes total sense considering the compiler may otherwise not be able to
optimize the function viewing just the individual translation unit. I fixed
it.
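
For reference, this is roughly what rcu_cblist_enqueue() ends up looking like
in my tree with the lazy flag folded in (just a sketch, not the final patch):

void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp, bool lazy)
{
        *rclp->tail = rhp;
        rclp->tail = &rhp->next;
        WRITE_ONCE(rclp->len, rclp->len + 1);
        /* The lazy accounting compiles away entirely when CONFIG_RCU_LAZY=n. */
        if (IS_ENABLED(CONFIG_RCU_LAZY) && lazy)
                WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
}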

The 6 month old baby and wife are calling my attention now. I will continue
to reply to the other parts of this and other emails this evening and thanks
for your help!

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
  2022-07-08  4:17   ` Joel Fernandes
@ 2022-07-08 22:45     ` Paul E. McKenney
  2022-07-10  1:38       ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-08 22:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Fri, Jul 08, 2022 at 04:17:30AM +0000, Joel Fernandes wrote:
> On Sat, Jun 25, 2022 at 08:12:06PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 10:50:53PM +0000, Joel Fernandes (Google) wrote:
> > > 
> > > Hello!
> > > Please find the next improved version of call_rcu_lazy() attached.  The main
> > > difference between the previous version is that it is now using bypass lists,
> > > and thus handling rcu_barrier() and hotplug situations, with some small changes
> > > to those parts.
> > > 
> > > I also don't see the TREE07 RCU stall from v1 anymore.
> > > 
> > > In the v1, we some numbers below (testing on v2 is in progress). Rushikesh,
> > > feel free to pull these patches into your tree. Just to note, you will also
> > > need to pull the call_rcu_lazy() user patches from v1. I have dropped in this
> > > series, just to make the series focus on the feature code first.
> > > 
> > > Following are power savings we see on top of RCU_NOCB_CPU on an Intel platform.
> > > The observation is that due to a 'trickle down' effect of RCU callbacks, the
> > > system is very lightly loaded but constantly running few RCU callbacks very
> > > often. This confuses the power management hardware that the system is active,
> > > when it is in fact idle.
> > > 
> > > For example, when ChromeOS screen is off and user is not doing anything on the
> > > system, we can see big power savings.
> > > Before:
> > > Pk%pc10 = 72.13
> > > PkgWatt = 0.58
> > > CorWatt = 0.04
> > > 
> > > After:
> > > Pk%pc10 = 81.28
> > > PkgWatt = 0.41
> > > CorWatt = 0.03
> > 
> > So not quite 30% savings in power at the package level?  Not bad at all!
> 
> Yes this is the package residency amount, not the amount of power. This % is
> not power.

So what exactly is PkgWatt, then?  If you can say.  That is where I was
getting the 30% from.

> > > Further, when ChromeOS screen is ON but system is idle or lightly loaded, we
> > > can see that the display pipeline is constantly doing RCU callback queuing due
> > > to open/close of file descriptors associated with graphics buffers. This is
> > > attributed to the file_free_rcu() path which this patch series also touches.
> > > 
> > > This patch series adds a simple but effective, and lockless implementation of
> > > RCU callback batching. On memory pressure, timeout or queue growing too big, we
> > > initiate a flush of one or more per-CPU lists.
> > 
> > It is no longer lockless, correct?  Or am I missing something subtle?
> > 
> > Full disclosure: I don't see a whole lot of benefit to its being lockless.
> > But truth in advertising!  ;-)
> 
> Yes, you are right. Maybe a better way I could put it is it is "lock
> contention less" :D

Yes, "reduced lock contention" would be a good phrase.  As long as you
carefully indicate exactly what scenario with greater lock contention
you are comparing to.

But aren't you acquiring the bypass lock at about the same rate as it
would be acquired without laziness?  What am I missing here?

							Thanx, Paul

> > > Similar results can be achieved by increasing jiffies_till_first_fqs, however
> > > that also has the effect of slowing down RCU. Especially I saw huge slow down
> > > of function graph tracer when increasing that.
> > > 
> > > One drawback of this series is, if another frequent RCU callback creeps up in
> > > the future, that's not lazy, then that will again hurt the power. However, I
> > > believe identifying and fixing those is a more reasonable approach than slowing
> > > RCU down for the whole system.
> > 
> > Very good!  I have you down as the official call_rcu_lazy() whack-a-mole
> > developer.  ;-)
> 
> :-D
> 
> thanks,
> 
>  - Joel
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-07-08  4:25     ` Joel Fernandes
@ 2022-07-08 23:06       ` Paul E. McKenney
  2022-07-12 20:27         ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-08 23:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Fri, Jul 08, 2022 at 04:25:09AM +0000, Joel Fernandes wrote:
> On Sat, Jun 25, 2022 at 09:13:27PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 10:51:00PM +0000, Joel Fernandes (Google) wrote:
> > > Reuse the kfree_rcu() test in order to be able to compare the memory reclaiming
> > > properties of call_rcu_lazy() with kfree_rcu().
> > > 
> > > With this test, we find similar memory footprint and time call_rcu_lazy()
> > > free'ing takes compared to kfree_rcu(). Also we confirm that call_rcu_lazy()
> > > can survive OOM during extremely frequent calls.
> > > 
> > > If we really push it, i.e. boot system with low memory and compare
> > > kfree_rcu() with call_rcu_lazy(), I find that call_rcu_lazy() is more
> > > resilient and is much harder to produce OOM as compared to kfree_rcu().
> > 
> > Another approach would be to make rcutorture's forward-progress testing
> > able to use call_rcu_lazy().  This would test lazy callback flooding.
> > 
> > Yet another approach would be to keep one CPU idle other than a
> > kthread doing call_rcu_lazy().  Of course "idle" includes redirecting
> > those pesky interrupts.
> > 
> > It is almost certainly necessary for rcutorture to exercise the
> > call_rcu_lazy() path regularly.
> 
> Currently I added a test like the following which adds a new torture type; my
> thought was to stress the new code to make sure nothing crashed or hung the
> kernel. That is working well except I don't exactly understand the total-gps
> print showing 0, while the other print shows 1188 GPs. I'll go dig into that
> tomorrow.. thanks!
> 
> The print shows
> TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
> TREE11 no success message, 7 successful version messages

Nice!!!  It is very good to see you using the rcu_torture_ops
facility correctly!

And this could be good for your own testing, and I am happy to pull it
in for that purpose (given it being fixed, having a good commit log,
and so on).  After all, TREE10 is quite similar -- not part of CFLIST,
but useful for certain types of focused testing.

However, it would be very good to get call_rcu_lazy() testing going
more generally, and in particular in TREE01 where offloading changes
dynamically.  A good way to do this is to add a .call_lazy() component
to the rcu_torture_ops structure, and check for it in a manner similar
to that done for the .deferred_free() component.  Including adding a
gp_normal_lazy module parameter.  This would allow habitual testing
on a few scenarios and focused lazy testing on all of them via the
--bootargs parameter.
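
Something like the following, perhaps (untested sketch; do_call_rcu() and the
exact parameter wording are made up, only .call_lazy and gp_normal_lazy are
the pieces I am suggesting above):

torture_param(bool, gp_normal_lazy, false,
	      "Use call_rcu_lazy() where the flavor under test provides it.");

static void do_call_rcu(struct rcu_head *rhp, rcu_callback_t func)
{
	if (gp_normal_lazy && cur_ops->call_lazy)
		cur_ops->call_lazy(rhp, func);
	else
		cur_ops->call(rhp, func);
}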

On the total-gps=0, the usual suspicion would be that the lazy callbacks
never got invoked.  It looks like you were doing about a two-minute run,
so maybe a longer run?  Though weren't they supposed to kick in at 15
seconds or so?  Or did this value of zero come about because this run
used exactly 300 grace periods?

							Thanx, Paul

> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 7120165a9342..cc6b7392d801 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -872,6 +872,64 @@ static struct rcu_torture_ops tasks_rude_ops = {
>  
>  #endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
>  
> +#ifdef CONFIG_RCU_LAZY
> +
> +/*
> + * Definitions for lazy RCU torture testing.
> + */
> +unsigned long orig_jiffies_till_flush;
> +
> +static void rcu_sync_torture_init_lazy(void)
> +{
> +	rcu_sync_torture_init();
> +
> +	orig_jiffies_till_flush = rcu_lazy_get_jiffies_till_flush();
> +	rcu_lazy_set_jiffies_till_flush(50);
> +}
> +
> +static void rcu_lazy_cleanup(void)
> +{
> +	rcu_lazy_set_jiffies_till_flush(orig_jiffies_till_flush);
> +}
> +
> +static struct rcu_torture_ops rcu_lazy_ops = {
> +	.ttype			= RCU_LAZY_FLAVOR,
> +	.init			= rcu_sync_torture_init_lazy,
> +	.cleanup		= rcu_lazy_cleanup,
> +	.readlock		= rcu_torture_read_lock,
> +	.read_delay		= rcu_read_delay,
> +	.readunlock		= rcu_torture_read_unlock,
> +	.readlock_held		= torture_readlock_not_held,
> +	.get_gp_seq		= rcu_get_gp_seq,
> +	.gp_diff		= rcu_seq_diff,
> +	.deferred_free		= rcu_torture_deferred_free,
> +	.sync			= synchronize_rcu,
> +	.exp_sync		= synchronize_rcu_expedited,
> +	.get_gp_state		= get_state_synchronize_rcu,
> +	.start_gp_poll		= start_poll_synchronize_rcu,
> +	.poll_gp_state		= poll_state_synchronize_rcu,
> +	.cond_sync		= cond_synchronize_rcu,
> +	.call			= call_rcu_lazy,
> +	.cb_barrier		= rcu_barrier,
> +	.fqs			= rcu_force_quiescent_state,
> +	.stats			= NULL,
> +	.gp_kthread_dbg		= show_rcu_gp_kthreads,
> +	.check_boost_failed	= rcu_check_boost_fail,
> +	.stall_dur		= rcu_jiffies_till_stall_check,
> +	.irq_capable		= 1,
> +	.can_boost		= IS_ENABLED(CONFIG_RCU_BOOST),
> +	.extendables		= RCUTORTURE_MAX_EXTEND,
> +	.name			= "rcu_lazy"
> +};
> +
> +#define LAZY_OPS &rcu_lazy_ops,
> +
> +#else // #ifdef CONFIG_RCU_LAZY
> +
> +#define LAZY_OPS
> +
> +#endif // #else #ifdef CONFIG_RCU_LAZY
> +
>  
>  #ifdef CONFIG_TASKS_TRACE_RCU
>  
> @@ -3145,7 +3203,7 @@ rcu_torture_init(void)
>  	unsigned long gp_seq = 0;
>  	static struct rcu_torture_ops *torture_ops[] = {
>  		&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
> -		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
> +		TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS LAZY_OPS
>  		&trivial_ops,
>  	};
>  
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11 b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> new file mode 100644
> index 000000000000..436013f3e015
> --- /dev/null
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> @@ -0,0 +1,18 @@
> +CONFIG_SMP=y
> +CONFIG_PREEMPT_NONE=n
> +CONFIG_PREEMPT_VOLUNTARY=n
> +CONFIG_PREEMPT=y
> +#CHECK#CONFIG_PREEMPT_RCU=y
> +CONFIG_HZ_PERIODIC=n
> +CONFIG_NO_HZ_IDLE=y
> +CONFIG_NO_HZ_FULL=n
> +CONFIG_RCU_TRACE=y
> +CONFIG_HOTPLUG_CPU=y
> +CONFIG_MAXSMP=y
> +CONFIG_CPUMASK_OFFSTACK=y
> +CONFIG_RCU_NOCB_CPU=y
> +CONFIG_DEBUG_LOCK_ALLOC=n
> +CONFIG_RCU_BOOST=n
> +CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> +CONFIG_RCU_EXPERT=y
> +CONFIG_RCU_LAZY=y
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> new file mode 100644
> index 000000000000..9b6f720d4ccd
> --- /dev/null
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> @@ -0,0 +1,8 @@
> +maxcpus=8 nr_cpus=43
> +rcutree.gp_preinit_delay=3
> +rcutree.gp_init_delay=3
> +rcutree.gp_cleanup_delay=3
> +rcu_nocbs=0-7
> +rcutorture.torture_type=rcu_lazy
> +rcutorture.nocbs_nthreads=8
> +rcutorture.fwd_progress=0
> -- 
> 2.37.0.rc0.161.g10f37bed90-goog
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-08 18:43     ` Joel Fernandes
@ 2022-07-08 23:10       ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-08 23:10 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Fri, Jul 08, 2022 at 06:43:21PM +0000, Joel Fernandes wrote:
> On Sat, Jun 25, 2022 at 09:00:19PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 10:50:55PM +0000, Joel Fernandes (Google) wrote:
> > > Implement timer-based RCU lazy callback batching. The batch is flushed
> > > whenever a certain amount of time has passed, or the batch on a
> > > particular CPU grows too big. Also memory pressure will flush it in a
> > > future patch.
> > > 
> > > To handle several corner cases automagically (such as rcu_barrier() and
> > > hotplug), we re-use bypass lists to handle lazy CBs. The bypass list
> > > length has the lazy CB length included in it. A separate lazy CB length
> > > counter is also introduced to keep track of the number of lazy CBs.
> > > 
> > > Suggested-by: Paul McKenney <paulmck@kernel.org>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> > Not bad, but some questions and comments below.
> 
> Thanks a lot for these, real helpful and I replied below:
> 
> > > diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
> > > index 659d13a7ddaa..9a992707917b 100644
> > > --- a/include/linux/rcu_segcblist.h
> > > +++ b/include/linux/rcu_segcblist.h
> > > @@ -22,6 +22,7 @@ struct rcu_cblist {
> > >  	struct rcu_head *head;
> > >  	struct rcu_head **tail;
> > >  	long len;
> > > +	long lazy_len;
> > >  };
> > >  
> > >  #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head }
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 1a32036c918c..9191a3d88087 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -82,6 +82,12 @@ static inline int rcu_preempt_depth(void)
> > >  
> > >  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> > >  
> > > +#ifdef CONFIG_RCU_LAZY
> > > +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
> > > +#else
> > > +#define call_rcu_lazy(head, func) call_rcu(head, func)
> > > +#endif
> > > +
> > >  /* Internal to kernel */
> > >  void rcu_init(void);
> > >  extern int rcu_scheduler_active;
> > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > index 27aab870ae4c..0bffa992fdc4 100644
> > > --- a/kernel/rcu/Kconfig
> > > +++ b/kernel/rcu/Kconfig
> > > @@ -293,4 +293,12 @@ config TASKS_TRACE_RCU_READ_MB
> > >  	  Say N here if you hate read-side memory barriers.
> > >  	  Take the default if you are unsure.
> > >  
> > > +config RCU_LAZY
> > > +	bool "RCU callback lazy invocation functionality"
> > > +	depends on RCU_NOCB_CPU
> > > +	default n
> > > +	help
> > > +	  To save power, batch RCU callbacks and flush after delay, memory
> > > +          pressure or callback list growing too big.
> > 
> > Spaces vs. tabs.
> 
> Fixed, thanks.
> 
> > The checkpatch warning is unhelpful ("please write a help paragraph that
> > fully describes the config symbol")
> 
> Good old checkpatch :D

;-) ;-) ;-)

> > >  endmenu # "RCU Subsystem"
> > > diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> > > index c54ea2b6a36b..627a3218a372 100644
> > > --- a/kernel/rcu/rcu_segcblist.c
> > > +++ b/kernel/rcu/rcu_segcblist.c
> > > @@ -20,6 +20,7 @@ void rcu_cblist_init(struct rcu_cblist *rclp)
> > >  	rclp->head = NULL;
> > >  	rclp->tail = &rclp->head;
> > >  	rclp->len = 0;
> > > +	rclp->lazy_len = 0;
> > >  }
> > >  
> > >  /*
> > > @@ -32,6 +33,15 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
> > >  	WRITE_ONCE(rclp->len, rclp->len + 1);
> > >  }
> > >  
> > > +/*
> > > + * Enqueue an rcu_head structure onto the specified callback list.
> > 
> > Please also note the fact that it is enqueuing lazily.
> 
> Sorry, done.
> 
> > > + */
> > > +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp)
> > > +{
> > > +	rcu_cblist_enqueue(rclp, rhp);
> > > +	WRITE_ONCE(rclp->lazy_len, rclp->lazy_len + 1);
> > 
> > Except...  Why not just add a "lazy" parameter to rcu_cblist_enqueue()?
> > IS_ENABLED() can make it fast.
> 
> Yeah good idea, it simplifies the code too. Thank you!
> 
> So you mean I should add the following in this function so that the branch gets optimized out:
> if (lazy && IS_ENABLED(CONFIG_RCU_LAZY)) {
>   ...
> }
> 
> That makes total sense considering the compiler may otherwise not be able to
> optimize the function viewing just the individual translation unit. I fixed
> it.

Or the other way around:

	if (IS_ENABLED(CONFIG_RCU_LAZY) && lazy) {

Just in case the compiler is stumbling over its boolean logic.  Or in
case the human reader is.  ;-)

> The 6 month old baby and wife are calling my attention now. I will continue
> to reply to the other parts of this and other emails this evening and thanks
> for your help!

Ah, for those who believe that SIGCHLD can be ignored in real life!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
  2022-07-08 22:45     ` Paul E. McKenney
@ 2022-07-10  1:38       ` Joel Fernandes
  2022-07-10 15:47         ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-10  1:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Fri, Jul 08, 2022 at 03:45:14PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 08, 2022 at 04:17:30AM +0000, Joel Fernandes wrote:
> > On Sat, Jun 25, 2022 at 08:12:06PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 22, 2022 at 10:50:53PM +0000, Joel Fernandes (Google) wrote:
> > > > 
> > > > Hello!
> > > > Please find the next improved version of call_rcu_lazy() attached.  The main
> > > > difference between the previous version is that it is now using bypass lists,
> > > > and thus handling rcu_barrier() and hotplug situations, with some small changes
> > > > to those parts.
> > > > 
> > > > I also don't see the TREE07 RCU stall from v1 anymore.
> > > > 
> > > > In the v1, we some numbers below (testing on v2 is in progress). Rushikesh,
> > > > feel free to pull these patches into your tree. Just to note, you will also
> > > > need to pull the call_rcu_lazy() user patches from v1. I have dropped in this
> > > > series, just to make the series focus on the feature code first.
> > > > 
> > > > Following are power savings we see on top of RCU_NOCB_CPU on an Intel platform.
> > > > The observation is that due to a 'trickle down' effect of RCU callbacks, the
> > > > system is very lightly loaded but constantly running few RCU callbacks very
> > > > often. This confuses the power management hardware that the system is active,
> > > > when it is in fact idle.
> > > > 
> > > > For example, when ChromeOS screen is off and user is not doing anything on the
> > > > system, we can see big power savings.
> > > > Before:
> > > > Pk%pc10 = 72.13
> > > > PkgWatt = 0.58
> > > > CorWatt = 0.04
> > > > 
> > > > After:
> > > > Pk%pc10 = 81.28
> > > > PkgWatt = 0.41
> > > > CorWatt = 0.03
> > > 
> > > So not quite 30% savings in power at the package level?  Not bad at all!
> > 
> > Yes this is the package residency amount, not the amount of power. This % is
> > not power.
> 
> So what exactly is PkgWatt, then?  If you can say.  That is where I was
> getting the 30% from.

It's the total package power (SoC power) - so not just the CPU but also
the interconnect, other controllers and other blocks in there.

This output is from the turbostat program and the number is mentioned in the
manpage:
"PkgWatt Watts consumed by the whole package."
https://manpages.debian.org/testing/linux-cpupower/turbostat.8.en.html


> > > > Further, when ChromeOS screen is ON but system is idle or lightly loaded, we
> > > > can see that the display pipeline is constantly doing RCU callback queuing due
> > > > to open/close of file descriptors associated with graphics buffers. This is
> > > > attributed to the file_free_rcu() path which this patch series also touches.
> > > > 
> > > > This patch series adds a simple but effective, and lockless implementation of
> > > > RCU callback batching. On memory pressure, timeout or queue growing too big, we
> > > > initiate a flush of one or more per-CPU lists.
> > > 
> > > It is no longer lockless, correct?  Or am I missing something subtle?
> > > 
> > > Full disclosure: I don't see a whole lot of benefit to its being lockless.
> > > But truth in advertising!  ;-)
> > 
> > Yes, you are right. Maybe a better way I could put it is it is "lock
> > contention less" :D
> 
> Yes, "reduced lock contention" would be a good phrase.  As long as you
> carefully indicate exactly what scenario with greater lock contention
> you are comparing to.
> 
> But aren't you acquiring the bypass lock at about the same rate as it
> would be acquired without laziness?  What am I missing here?

You are right, why don't I just drop the locking phrases from the summary.
Anyway the main win from this work is not related to locking.

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > > > Similar results can be achieved by increasing jiffies_till_first_fqs, however
> > > > that also has the effect of slowing down RCU. Especially I saw huge slow down
> > > > of function graph tracer when increasing that.
> > > > 
> > > > One drawback of this series is, if another frequent RCU callback creeps up in
> > > > the future, that's not lazy, then that will again hurt the power. However, I
> > > > believe identifying and fixing those is a more reasonable approach than slowing
> > > > RCU down for the whole system.
> > > 
> > > Very good!  I have you down as the official call_rcu_lazy() whack-a-mole
> > > developer.  ;-)
> > 
> > :-D
> > 
> > thanks,
> > 
> >  - Joel
> > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-06-26  4:00   ` Paul E. McKenney
  2022-07-08 18:43     ` Joel Fernandes
@ 2022-07-10  2:26     ` Joel Fernandes
  2022-07-10 16:03       ` Paul E. McKenney
  1 sibling, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-10  2:26 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

I replied some more and will reply more soon, it's a really long thread and
thank you for the detailed review :)

On Sat, Jun 25, 2022 at 09:00:19PM -0700, Paul E. McKenney wrote:
[..]
> > +}
> > +
> >  /*
> >   * Flush the second rcu_cblist structure onto the first one, obliterating
> >   * any contents of the first.  If rhp is non-NULL, enqueue it as the sole
> > @@ -60,6 +70,15 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
> >  	}
> >  }
> 
> Header comment, please.  It can be short, referring to that of the
> function rcu_cblist_flush_enqueue().

Done, what I ended up doing is nuking the new function and doing the same
IS_ENABLED() trick to the existing rcu_cblist_flush_enqueue(). diffstat is
also happy!

> > +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> > +			      struct rcu_cblist *srclp,
> > +			      struct rcu_head *rhp)
> 
> Please line up the "struct" keywords.  (Picky, I know...)
> 
> > +{
> > +	rcu_cblist_flush_enqueue(drclp, srclp, rhp);
> > +	if (rhp)
> > +		WRITE_ONCE(srclp->lazy_len, 1);
> 
> Shouldn't this instead be a lazy argument to rcu_cblist_flush_enqueue()?
> Concerns about speed in the !RCU_LAZY case can be addressed using
> IS_ENABLED(), for example:
> 
> 	if (IS_ENABLED(CONFIG_RCU_LAZY) && rhp)
> 		WRITE_ONCE(srclp->lazy_len, 1);

Ah indeed exactly what I ended up doing.
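
So the merged function in my tree now looks roughly like this (sketch of what
I have locally, not the final patch):

void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
			      struct rcu_cblist *srclp,
			      struct rcu_head *rhp, bool lazy)
{
	drclp->head = srclp->head;
	if (drclp->head)
		drclp->tail = srclp->tail;
	else
		drclp->tail = &drclp->head;
	drclp->len = srclp->len;
	if (!rhp) {
		rcu_cblist_init(srclp);
	} else {
		rhp->next = NULL;
		srclp->head = rhp;
		srclp->tail = &rhp->next;
		WRITE_ONCE(srclp->len, 1);
		/* Only the sole entrained callback can be lazy here. */
		if (IS_ENABLED(CONFIG_RCU_LAZY) && lazy)
			WRITE_ONCE(srclp->lazy_len, 1);
	}
}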

> > +}
> > +
> >  /*
> >   * Dequeue the oldest rcu_head structure from the specified callback
> >   * list.
> > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > index 431cee212467..c3d7de65b689 100644
> > --- a/kernel/rcu/rcu_segcblist.h
> > +++ b/kernel/rcu/rcu_segcblist.h
> > @@ -15,14 +15,28 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
> >  	return READ_ONCE(rclp->len);
> >  }
> >  
> > +/* Return number of callbacks in the specified callback list. */
> > +static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
> > +{
> > +#ifdef CONFIG_RCU_LAZY
> > +	return READ_ONCE(rclp->lazy_len);
> > +#else
> > +	return 0;
> > +#endif
> 
> Please use IS_ENABLED().  This saves a line (and lots of characters)
> but compiles just as efficienctly.

Sounds good, looks a lot better, thanks!

It ends up looking like:

static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
{
        if (IS_ENABLED(CONFIG_RCU_LAZY))
                return READ_ONCE(rclp->lazy_len);
        return 0;
}

static inline void rcu_cblist_reset_lazy_len(struct rcu_cblist *rclp)
{
        if (IS_ENABLED(CONFIG_RCU_LAZY))
                WRITE_ONCE(rclp->lazy_len, 0);
}

> > +}
> > +
> >  /* Return number of callbacks in segmented callback list by summing seglen. */
> >  long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
> >  
> >  void rcu_cblist_init(struct rcu_cblist *rclp);
> >  void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
> > +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp);
> >  void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
> >  			      struct rcu_cblist *srclp,
> >  			      struct rcu_head *rhp);
> > +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> > +			      struct rcu_cblist *srclp,
> > +			      struct rcu_head *rhp);
> 
> Please line up the "struct" keywords.  (Still picky, I know...)

Nuked it due to new lazy parameter so no issue now.
 
> >  struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp);
> >  
> >  /*
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c25ba442044a..d2e3d6e176d2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3098,7 +3098,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
> >   * Implementation of these memory-ordering guarantees is described here:
> >   * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
> >   */
> 
> The above docbook comment needs to move to call_rcu().

Ok sure.

> > -void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +static void
> > +__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
> >  {
> >  	static atomic_t doublefrees;
> >  	unsigned long flags;
> > @@ -3139,7 +3140,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	}
> >  
> >  	check_cb_ovld(rdp);
> > -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
> >  		return; // Enqueued onto ->nocb_bypass, so just leave.
> >  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
> >  	rcu_segcblist_enqueue(&rdp->cblist, head);
> > @@ -3161,8 +3162,21 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  		local_irq_restore(flags);
> >  	}
> >  }
> > -EXPORT_SYMBOL_GPL(call_rcu);
> 
> Please add a docbook comment for call_rcu_lazy().  It can be brief, for
> example, by referring to call_rcu()'s docbook comment for memory-ordering
> details.

I added something like the following, hope it looks OK:

#ifdef CONFIG_RCU_LAZY
/**
 * call_rcu_lazy() - Lazily queue RCU callback for invocation after grace period.
 * @head: structure to be used for queueing the RCU updates.
 * @func: actual callback function to be invoked after the grace period
 *
 * The callback function will be invoked some time after a full grace
 * period elapses, in other words after all pre-existing RCU read-side
 * critical sections have completed.
 *
 * Use this API instead of call_rcu() if you don't mind the callback being
 * invoked after very long periods of time on systems without memory pressure
 * and on systems which are lightly loaded or mostly idle.
 *
 * Other than the extra delay in callbacks being invoked, this function is
 * identical to, and reuses call_rcu()'s logic. Refer to call_rcu() for more
 * details about memory ordering and other functionality.
 */
void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
{
        return __call_rcu_common(head, func, true);
}
EXPORT_SYMBOL_GPL(call_rcu_lazy);
#endif
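
And a usage illustration (made-up names, not one of the actual conversions in
this series):

struct foo {
	struct rcu_head rcu;
	int payload;
};

static void foo_free_rcu(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rcu));
}

static void foo_put(struct foo *fp)
{
	/* Freeing is not latency-sensitive here, so let RCU batch it. */
	call_rcu_lazy(&fp->rcu, foo_free_rcu);
}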

> 
> > +#ifdef CONFIG_RCU_LAZY
> > +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	return __call_rcu_common(head, func, true);
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu_lazy);
> > +#endif
> > +
> > +void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	return __call_rcu_common(head, func, false);
> > +
> > +}
> > +EXPORT_SYMBOL_GPL(call_rcu);
> >  
> >  /* Maximum number of jiffies to wait before draining a batch. */
> >  #define KFREE_DRAIN_JIFFIES (HZ / 50)
> > @@ -4056,7 +4070,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> >  	rdp->barrier_head.func = rcu_barrier_callback;
> >  	debug_rcu_head_queue(&rdp->barrier_head);
> >  	rcu_nocb_lock(rdp);
> > -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> > +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
> >  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
> >  		atomic_inc(&rcu_state.barrier_cpu_count);
> >  	} else {
> > @@ -4476,7 +4490,7 @@ void rcutree_migrate_callbacks(int cpu)
> >  	my_rdp = this_cpu_ptr(&rcu_data);
> >  	my_rnp = my_rdp->mynode;
> >  	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
> > -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
> > +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
> >  	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
> >  	/* Leverage recent GPs and set GP for new callbacks. */
> >  	needwake = rcu_advance_cbs(my_rnp, rdp) ||
> > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > index 2ccf5845957d..fec4fad6654b 100644
> > --- a/kernel/rcu/tree.h
> > +++ b/kernel/rcu/tree.h
> > @@ -267,8 +267,9 @@ struct rcu_data {
> >  /* Values for nocb_defer_wakeup field in struct rcu_data. */
> >  #define RCU_NOCB_WAKE_NOT	0
> >  #define RCU_NOCB_WAKE_BYPASS	1
> > -#define RCU_NOCB_WAKE		2
> > -#define RCU_NOCB_WAKE_FORCE	3
> > +#define RCU_NOCB_WAKE_LAZY	2
> > +#define RCU_NOCB_WAKE		3
> > +#define RCU_NOCB_WAKE_FORCE	4
> >  
> >  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> >  					/* For jiffies_till_first_fqs and */
> > @@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
> >  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
> >  static void rcu_init_one_nocb(struct rcu_node *rnp);
> >  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				  unsigned long j);
> > +				  unsigned long j, bool lazy);
> >  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				bool *was_alldone, unsigned long flags);
> > +				bool *was_alldone, unsigned long flags,
> > +				bool lazy);
> >  static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> >  				 unsigned long flags);
> >  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index e369efe94fda..b9244f22e102 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> >  	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
> >  }
> 
> Comment on LAZY_FLUSH_JIFFIES purpose in life, please!  (At some point
> more flexibility may be required, but let's not unnecessarily rush
> into that.)

I added this:
/*
 * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
 * can elapse before lazy callbacks are flushed. Lazy callbacks
 * could be flushed much earlier for a number of other reasons;
 * however, LAZY_FLUSH_JIFFIES will ensure no lazy callbacks are
 * left unsubmitted to RCU after that many jiffies.
 */

> > +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> > +
> >  /*
> >   * Arrange to wake the GP kthread for this NOCB group at some future
> >   * time when it is safe to do so.
> > @@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> >  	 * Bypass wakeup overrides previous deferments. In case
> >  	 * of callback storm, no need to wake up too early.
> >  	 */
> > -	if (waketype == RCU_NOCB_WAKE_BYPASS) {
> > +	if (waketype == RCU_NOCB_WAKE_LAZY) {
> 
> Presumably we get here only if all of this CPU's callbacks are lazy?

Yes that's right.

> >  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
> >  	WRITE_ONCE(rdp->nocb_bypass_first, j);
> >  	rcu_nocb_bypass_unlock(rdp);
> > @@ -326,13 +337,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >   * Note that this function always returns true if rhp is NULL.
> >   */
> >  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				  unsigned long j)
> > +				  unsigned long j, bool lazy)
> >  {
> >  	if (!rcu_rdp_is_offloaded(rdp))
> >  		return true;
> >  	rcu_lockdep_assert_cblist_protected(rdp);
> >  	rcu_nocb_bypass_lock(rdp);
> > -	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
> > +	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
> >  }
> >  
> >  /*
> > @@ -345,7 +356,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
> >  	if (!rcu_rdp_is_offloaded(rdp) ||
> >  	    !rcu_nocb_bypass_trylock(rdp))
> >  		return;
> > -	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
> > +	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
> >  }
> >  
> >  /*
> > @@ -367,12 +378,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
> >   * there is only one CPU in operation.
> >   */
> >  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > -				bool *was_alldone, unsigned long flags)
> > +				bool *was_alldone, unsigned long flags,
> > +				bool lazy)
> >  {
> >  	unsigned long c;
> >  	unsigned long cur_gp_seq;
> >  	unsigned long j = jiffies;
> >  	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > +	long n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> >  
> >  	lockdep_assert_irqs_disabled();
> >  
> > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> >  	}
> >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> >  
> > -	// If there hasn't yet been all that many ->cblist enqueues
> > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > -	// ->nocb_bypass first.
> > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > +	// number of CBs (lazy + non-lazy) grows too much.
> > +	//
> > +	// Note that if the bypass list has lazy CBs, and the main list is
> > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > +	    (lazy && n_lazy_cbs >= qhimark)) {
> >  		rcu_nocb_lock(rdp);
> >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> >  		if (*was_alldone)
> >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > -					    TPS("FirstQ"));
> > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> 
> The "false" here instead of "lazy" is because the caller is to do the
> enqueuing, correct?

There is no difference between using false or lazy here, because the bypass
flush is not also enqueuing the lazy callback, right?

We can also pass lazy instead of false if that's less confusing.

Or maybe I missed the issue you're raising?

> >  		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> >  		return false; // Caller must enqueue the callback.
> >  	}
> >  
> >  	// If ->nocb_bypass has been used too long or is too full,
> >  	// flush ->nocb_bypass to ->cblist.
> > -	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
> > -	    ncbs >= qhimark) {
> > +	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
> >  		rcu_nocb_lock(rdp);
> > -		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
> > +		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {
> 
> But shouldn't this "true" be "lazy"?  I don't see how we are guaranteed
> that the callback is in fact lazy at this point in the code.  Also,
> there is not yet a guarantee that the caller will do the enqueuing.
> So what am I missing?

Sorry, I screwed this part up. I think I meant 'false' here: if the list grew
too big, then I think I would prefer that the new lazy CB instead be treated as
non-lazy. But if that's too confusing, I will just pass 'lazy' instead. What
do you think?

Will reply more to the rest of the comments soon, thanks!

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 0/8] Implement call_rcu_lazy() and miscellaneous fixes
  2022-07-10  1:38       ` Joel Fernandes
@ 2022-07-10 15:47         ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-10 15:47 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sun, Jul 10, 2022 at 01:38:01AM +0000, Joel Fernandes wrote:
> On Fri, Jul 08, 2022 at 03:45:14PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 08, 2022 at 04:17:30AM +0000, Joel Fernandes wrote:
> > > On Sat, Jun 25, 2022 at 08:12:06PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 22, 2022 at 10:50:53PM +0000, Joel Fernandes (Google) wrote:
> > > > > 
> > > > > Hello!
> > > > > Please find the next improved version of call_rcu_lazy() attached.  The main
> > > > > difference between the previous version is that it is now using bypass lists,
> > > > > and thus handling rcu_barrier() and hotplug situations, with some small changes
> > > > > to those parts.
> > > > > 
> > > > > I also don't see the TREE07 RCU stall from v1 anymore.
> > > > > 
> > > > > In the v1, we some numbers below (testing on v2 is in progress). Rushikesh,
> > > > > feel free to pull these patches into your tree. Just to note, you will also
> > > > > need to pull the call_rcu_lazy() user patches from v1. I have dropped in this
> > > > > series, just to make the series focus on the feature code first.
> > > > > 
> > > > > Following are power savings we see on top of RCU_NOCB_CPU on an Intel platform.
> > > > > The observation is that due to a 'trickle down' effect of RCU callbacks, the
> > > > > system is very lightly loaded but constantly running few RCU callbacks very
> > > > > often. This confuses the power management hardware that the system is active,
> > > > > when it is in fact idle.
> > > > > 
> > > > > For example, when ChromeOS screen is off and user is not doing anything on the
> > > > > system, we can see big power savings.
> > > > > Before:
> > > > > Pk%pc10 = 72.13
> > > > > PkgWatt = 0.58
> > > > > CorWatt = 0.04
> > > > > 
> > > > > After:
> > > > > Pk%pc10 = 81.28
> > > > > PkgWatt = 0.41
> > > > > CorWatt = 0.03
> > > > 
> > > > So not quite 30% savings in power at the package level?  Not bad at all!
> > > 
> > > Yes this is the package residency amount, not the amount of power. This % is
> > > not power.
> > 
> > So what exactly is PkgWatt, then?  If you can say.  That is where I was
> > getting the 30% from.
> 
> It's the total package power (SoC power) - so not just the CPU but also
> the interconnect, other controllers and other blocks in there.
> 
> This output is from the turbostat program and the number is mentioned in the
> manpage:
> "PkgWatt Watts consumed by the whole package."
> https://manpages.debian.org/testing/linux-cpupower/turbostat.8.en.html

Are we back to about a 30% savings in power at the package level?  ;-)

Either way, please quantify your "big power savings" by calculating and
stating a percentage decrease.

> > > > > Further, when ChromeOS screen is ON but system is idle or lightly loaded, we
> > > > > can see that the display pipeline is constantly doing RCU callback queuing due
> > > > > to open/close of file descriptors associated with graphics buffers. This is
> > > > > attributed to the file_free_rcu() path which this patch series also touches.
> > > > > 
> > > > > This patch series adds a simple but effective, and lockless implementation of
> > > > > RCU callback batching. On memory pressure, timeout or queue growing too big, we
> > > > > initiate a flush of one or more per-CPU lists.
> > > > 
> > > > It is no longer lockless, correct?  Or am I missing something subtle?
> > > > 
> > > > Full disclosure: I don't see a whole lot of benefit to its being lockless.
> > > > But truth in advertising!  ;-)
> > > 
> > > Yes, you are right. Maybe a better way I could put it is it is "lock
> > > contention less" :D
> > 
> > Yes, "reduced lock contention" would be a good phrase.  As long as you
> > carefully indicate exactly what scenario with greater lock contention
> > you are comparing to.
> > 
> > But aren't you acquiring the bypass lock at about the same rate as it
> > would be acquired without laziness?  What am I missing here?
> 
> You are right, why don't I just drop the locking phrases from the summary.
> Anyway the main win from this work is not related to locking.

Sounds good!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 
> > 							Thanx, Paul
> > 
> > > > > Similar results can be achieved by increasing jiffies_till_first_fqs, however
> > > > > that also has the effect of slowing down RCU. Especially I saw huge slow down
> > > > > of function graph tracer when increasing that.
> > > > > 
> > > > > One drawback of this series is, if another frequent RCU callback creeps up in
> > > > > the future, that's not lazy, then that will again hurt the power. However, I
> > > > > believe identifying and fixing those is a more reasonable approach than slowing
> > > > > RCU down for the whole system.
> > > > 
> > > > Very good!  I have you down as the official call_rcu_lazy() whack-a-mole
> > > > developer.  ;-)
> > > 
> > > :-D
> > > 
> > > thanks,
> > > 
> > >  - Joel
> > > 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-10  2:26     ` Joel Fernandes
@ 2022-07-10 16:03       ` Paul E. McKenney
  2022-07-12 20:53         ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-10 16:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Sun, Jul 10, 2022 at 02:26:51AM +0000, Joel Fernandes wrote:
> I replied some more and will reply more soon, it's a really long thread and
> thank you for the detailed review :)
> 
> On Sat, Jun 25, 2022 at 09:00:19PM -0700, Paul E. McKenney wrote:
> [..]
> > > +}
> > > +
> > >  /*
> > >   * Flush the second rcu_cblist structure onto the first one, obliterating
> > >   * any contents of the first.  If rhp is non-NULL, enqueue it as the sole
> > > @@ -60,6 +70,15 @@ void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
> > >  	}
> > >  }
> > 
> > Header comment, please.  It can be short, referring to that of the
> > function rcu_cblist_flush_enqueue().
> 
> Done, what I ended up doing is nuking the new function and doing the same
> IS_ENABLED() trick to the existing rcu_cblist_flush_enqueue(). diffstat is
> also happy!
> 
> > > +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> > > +			      struct rcu_cblist *srclp,
> > > +			      struct rcu_head *rhp)
> > 
> > Please line up the "struct" keywords.  (Picky, I know...)
> > 
> > > +{
> > > +	rcu_cblist_flush_enqueue(drclp, srclp, rhp);
> > > +	if (rhp)
> > > +		WRITE_ONCE(srclp->lazy_len, 1);
> > 
> > Shouldn't this instead be a lazy argument to rcu_cblist_flush_enqueue()?
> > Concerns about speed in the !RCU_LAZY case can be addressed using
> > IS_ENABLED(), for example:
> > 
> > 	if (IS_ENABLED(CONFIG_RCU_LAZY) && rhp)
> > 		WRITE_ONCE(srclp->lazy_len, 1);
> 
> Ah indeed exactly what I ended up doing.
> 
> > > +}
> > > +
> > >  /*
> > >   * Dequeue the oldest rcu_head structure from the specified callback
> > >   * list.
> > > diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
> > > index 431cee212467..c3d7de65b689 100644
> > > --- a/kernel/rcu/rcu_segcblist.h
> > > +++ b/kernel/rcu/rcu_segcblist.h
> > > @@ -15,14 +15,28 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp)
> > >  	return READ_ONCE(rclp->len);
> > >  }
> > >  
> > > +/* Return number of callbacks in the specified callback list. */
> > > +static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
> > > +{
> > > +#ifdef CONFIG_RCU_LAZY
> > > +	return READ_ONCE(rclp->lazy_len);
> > > +#else
> > > +	return 0;
> > > +#endif
> > 
> > Please use IS_ENABLED().  This saves a line (and lots of characters)
> > but compiles just as efficienctly.
> 
> Sounds good, looks a lot better, thanks!
> 
> It ends up looking like:
> 
> static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp)
> {
>         if (IS_ENABLED(CONFIG_RCU_LAZY))
>                 return READ_ONCE(rclp->lazy_len);
>         return 0;
> }
> 
> static inline void rcu_cblist_reset_lazy_len(struct rcu_cblist *rclp)
> {
>         if (IS_ENABLED(CONFIG_RCU_LAZY))
>                 WRITE_ONCE(rclp->lazy_len, 0);
> }

All much better, thank you!

> > > +}
> > > +
> > >  /* Return number of callbacks in segmented callback list by summing seglen. */
> > >  long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp);
> > >  
> > >  void rcu_cblist_init(struct rcu_cblist *rclp);
> > >  void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp);
> > > +void rcu_cblist_enqueue_lazy(struct rcu_cblist *rclp, struct rcu_head *rhp);
> > >  void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp,
> > >  			      struct rcu_cblist *srclp,
> > >  			      struct rcu_head *rhp);
> > > +void rcu_cblist_flush_enqueue_lazy(struct rcu_cblist *drclp,
> > > +			      struct rcu_cblist *srclp,
> > > +			      struct rcu_head *rhp);
> > 
> > Please line up the "struct" keywords.  (Still picky, I know...)
> 
> Nuked it due to new lazy parameter so no issue now.
>  
> > >  struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp);
> > >  
> > >  /*
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index c25ba442044a..d2e3d6e176d2 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3098,7 +3098,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
> > >   * Implementation of these memory-ordering guarantees is described here:
> > >   * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
> > >   */
> > 
> > The above docbook comment needs to move to call_rcu().
> 
> Ok sure.
> 
> > > -void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +static void
> > > +__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
> > >  {
> > >  	static atomic_t doublefrees;
> > >  	unsigned long flags;
> > > @@ -3139,7 +3140,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  	}
> > >  
> > >  	check_cb_ovld(rdp);
> > > -	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
> > > +	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
> > >  		return; // Enqueued onto ->nocb_bypass, so just leave.
> > >  	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
> > >  	rcu_segcblist_enqueue(&rdp->cblist, head);
> > > @@ -3161,8 +3162,21 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  		local_irq_restore(flags);
> > >  	}
> > >  }
> > > -EXPORT_SYMBOL_GPL(call_rcu);
> > 
> > Please add a docbook comment for call_rcu_lazy().  It can be brief, for
> > example, by referring to call_rcu()'s docbook comment for memory-ordering
> > details.
> 
> I added something like the following, hope it looks OK:
> 
> #ifdef CONFIG_RCU_LAZY
> /**
>  * call_rcu_lazy() - Lazily queue RCU callback for invocation after grace period.
>  * @head: structure to be used for queueing the RCU updates.
>  * @func: actual callback function to be invoked after the grace period
>  *
>  * The callback function will be invoked some time after a full grace
>  * period elapses, in other words after all pre-existing RCU read-side
>  * critical sections have completed.
>  *
>  * Use this API instead of call_rcu() if you don't mind the callback being
>  * invoked after very long periods of time on systems without memory pressure
>  * and on systems which are lightly loaded or mostly idle.
>  *
>  * Other than the extra delay in callbacks being invoked, this function is
>  * identical to, and reuses call_rcu()'s logic. Refer to call_rcu() for more
>  * details about memory ordering and other functionality.

Much better, thank you!

>  */
> void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
> {
>         return __call_rcu_common(head, func, true);
> }
> EXPORT_SYMBOL_GPL(call_rcu_lazy);
> #endif
> 
> > 
> > > +#ifdef CONFIG_RCU_LAZY
> > > +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func)
> > > +{
> > > +	return __call_rcu_common(head, func, true);
> > > +}
> > > +EXPORT_SYMBOL_GPL(call_rcu_lazy);
> > > +#endif
> > > +
> > > +void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +{
> > > +	return __call_rcu_common(head, func, false);
> > > +
> > > +}
> > > +EXPORT_SYMBOL_GPL(call_rcu);
> > >  
> > >  /* Maximum number of jiffies to wait before draining a batch. */
> > >  #define KFREE_DRAIN_JIFFIES (HZ / 50)
> > > @@ -4056,7 +4070,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> > >  	rdp->barrier_head.func = rcu_barrier_callback;
> > >  	debug_rcu_head_queue(&rdp->barrier_head);
> > >  	rcu_nocb_lock(rdp);
> > > -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> > > +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
> > >  	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
> > >  		atomic_inc(&rcu_state.barrier_cpu_count);
> > >  	} else {
> > > @@ -4476,7 +4490,7 @@ void rcutree_migrate_callbacks(int cpu)
> > >  	my_rdp = this_cpu_ptr(&rcu_data);
> > >  	my_rnp = my_rdp->mynode;
> > >  	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
> > > -	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
> > > +	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
> > >  	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
> > >  	/* Leverage recent GPs and set GP for new callbacks. */
> > >  	needwake = rcu_advance_cbs(my_rnp, rdp) ||
> > > diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> > > index 2ccf5845957d..fec4fad6654b 100644
> > > --- a/kernel/rcu/tree.h
> > > +++ b/kernel/rcu/tree.h
> > > @@ -267,8 +267,9 @@ struct rcu_data {
> > >  /* Values for nocb_defer_wakeup field in struct rcu_data. */
> > >  #define RCU_NOCB_WAKE_NOT	0
> > >  #define RCU_NOCB_WAKE_BYPASS	1
> > > -#define RCU_NOCB_WAKE		2
> > > -#define RCU_NOCB_WAKE_FORCE	3
> > > +#define RCU_NOCB_WAKE_LAZY	2
> > > +#define RCU_NOCB_WAKE		3
> > > +#define RCU_NOCB_WAKE_FORCE	4
> > >  
> > >  #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
> > >  					/* For jiffies_till_first_fqs and */
> > > @@ -436,9 +437,10 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
> > >  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
> > >  static void rcu_init_one_nocb(struct rcu_node *rnp);
> > >  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > > -				  unsigned long j);
> > > +				  unsigned long j, bool lazy);
> > >  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > > -				bool *was_alldone, unsigned long flags);
> > > +				bool *was_alldone, unsigned long flags,
> > > +				bool lazy);
> > >  static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
> > >  				 unsigned long flags);
> > >  static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index e369efe94fda..b9244f22e102 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -256,6 +256,8 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> > >  	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
> > >  }
> > 
> > Comment on LAZY_FLUSH_JIFFIES purpose in life, please!  (At some point
> > more flexibility may be required, but let's not unnecessarily rush
> > into that.)
> 
> I added this:
> /*
>  * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
>  * can elapse before lazy callbacks are flushed. Lazy callbacks
>  * could be flushed much earlier for a number of other reasons;
>  * however, LAZY_FLUSH_JIFFIES ensures that no lazy callbacks are
>  * left unsubmitted to RCU for longer than that many jiffies.
>  */

Much better, thank you!

> > > +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> > > +
> > >  /*
> > >   * Arrange to wake the GP kthread for this NOCB group at some future
> > >   * time when it is safe to do so.
> > > @@ -272,7 +274,10 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> > >  	 * Bypass wakeup overrides previous deferments. In case
> > >  	 * of callback storm, no need to wake up too early.
> > >  	 */
> > > -	if (waketype == RCU_NOCB_WAKE_BYPASS) {
> > > +	if (waketype == RCU_NOCB_WAKE_LAZY) {
> > 
> > Presumably we get here only if all of this CPU's callbacks are lazy?
> 
> Yes that's right.

OK, thank you for the clarification.

> > >  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
> > >  	WRITE_ONCE(rdp->nocb_bypass_first, j);
> > >  	rcu_nocb_bypass_unlock(rdp);
> > > @@ -326,13 +337,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > >   * Note that this function always returns true if rhp is NULL.
> > >   */
> > >  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > > -				  unsigned long j)
> > > +				  unsigned long j, bool lazy)
> > >  {
> > >  	if (!rcu_rdp_is_offloaded(rdp))
> > >  		return true;
> > >  	rcu_lockdep_assert_cblist_protected(rdp);
> > >  	rcu_nocb_bypass_lock(rdp);
> > > -	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
> > > +	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
> > >  }
> > >  
> > >  /*
> > > @@ -345,7 +356,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
> > >  	if (!rcu_rdp_is_offloaded(rdp) ||
> > >  	    !rcu_nocb_bypass_trylock(rdp))
> > >  		return;
> > > -	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
> > > +	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
> > >  }
> > >  
> > >  /*
> > > @@ -367,12 +378,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
> > >   * there is only one CPU in operation.
> > >   */
> > >  static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > > -				bool *was_alldone, unsigned long flags)
> > > +				bool *was_alldone, unsigned long flags,
> > > +				bool lazy)
> > >  {
> > >  	unsigned long c;
> > >  	unsigned long cur_gp_seq;
> > >  	unsigned long j = jiffies;
> > >  	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > +	long n_lazy_cbs = rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
> > >  
> > >  	lockdep_assert_irqs_disabled();
> > >  
> > > @@ -414,30 +427,37 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > >  	}
> > >  	WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > >  
> > > -	// If there hasn't yet been all that many ->cblist enqueues
> > > -	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
> > > -	// ->nocb_bypass first.
> > > -	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
> > > +	// If caller passed a non-lazy CB and there hasn't yet been all that
> > > +	// many ->cblist enqueues this jiffy, tell the caller to enqueue it
> > > +	// onto ->cblist.  But flush ->nocb_bypass first. Also do so, if total
> > > +	// number of CBs (lazy + non-lazy) grows too much.
> > > +	//
> > > +	// Note that if the bypass list has lazy CBs, and the main list is
> > > +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> > > +	// the lazy CBs to the main list as well. That's the right thing to do,
> > > +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> > > +	// one, we can just reuse that GP for the already queued-up lazy ones.
> > > +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> > > +	    (lazy && n_lazy_cbs >= qhimark)) {
> > >  		rcu_nocb_lock(rdp);
> > >  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> > >  		if (*was_alldone)
> > >  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> > > -					    TPS("FirstQ"));
> > > -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> > > +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> > > +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> > 
> > The "false" here instead of "lazy" is because the caller is to do the
> > enqueuing, correct?
> 
> There is no difference between using false or lazy here, because the bypass
> flush is not also enqueuing the lazy callback, right?
> 
> We can also pass lazy instead of false if that's less confusing.
> 
> Or maybe I missed the issue you're raising?

I am mostly checking up on your intended meaning of "lazy" in various
contexts.  It could mean only that the caller requested laziness, or in
some cases it could mean that the callback actually will be lazy.

I can rationalize the "false" above as a "don't care" in this case
because (as you say) there is no callback.  In which case this code
is OK as is, as long as the header comment for rcu_nocb_flush_bypass()
clearly states that this parameter has meaning only when there really
is a callback being queued.
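
Something along these lines, perhaps; this is only a sketch of the wording
being asked for, not text from the patch:

/*
 * Flush the ->nocb_bypass queue into ->cblist, enqueuing rhp if non-NULL.
 * Note that this function always returns true if rhp is NULL.
 *
 * The "lazy" argument describes rhp only: it is meaningful solely when a
 * callback is actually being queued (rhp != NULL) and is a don't-care
 * when rhp is NULL.
 */
static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
				  unsigned long j, bool lazy)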

> > >  		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> > >  		return false; // Caller must enqueue the callback.
> > >  	}
> > >  
> > >  	// If ->nocb_bypass has been used too long or is too full,
> > >  	// flush ->nocb_bypass to ->cblist.
> > > -	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
> > > -	    ncbs >= qhimark) {
> > > +	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
> > >  		rcu_nocb_lock(rdp);
> > > -		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
> > > +		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {
> > 
> > But shouldn't this "true" be "lazy"?  I don't see how we are guaranteed
> > that the callback is in fact lazy at this point in the code.  Also,
> > there is not yet a guarantee that the caller will do the enqueuing.
> > So what am I missing?
> 
> Sorry I screwed this part up. I think I meant 'false' here, if the list grew
> too big- then I think I would prefer if the new lazy CB instead is treated as
> non-lazy. But if that's too confusing, I will just pass 'lazy' instead. What
> do you think?

Good point, if we are choosing to override the laziness requested by the
caller, then it should say "true".  It would be good to have a comment
saying that is what we are doing, correct?

> Will reply more to the rest of the comments soon, thanks!

Sounds good!  (Hey, wouldn't want you to be bored!)

							Thanx, Paul


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-07-08 23:06       ` Paul E. McKenney
@ 2022-07-12 20:27         ` Joel Fernandes
  2022-07-12 20:58           ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-12 20:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, vineeth

Ah, with all the threads, I missed this one :(. Sorry about that.

On Fri, Jul 8, 2022 at 7:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:

> > Currently I added a test like the following which adds a new torture type, my
> > thought was to stress the new code to make sure nothing crashed or hung the
> > kernel. That is working well except I don't exactly understand the total-gps
> > print showing 0, whereas the other print shows 1188 GPs. I'll go dig into that
> > tomorrow.. thanks!
> >
> > The print shows
> > TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
> > TREE11 no success message, 7 successful version messages
>
> Nice!!!  It is very good to see you using the rcu_torture_ops
> facility correctly!
>
> And this could be good for your own testing, and I am happy to pull it
> in for that purpose (given it being fixed, having a good commit log,
> and so on).  After all, TREE10 is quite similar -- not part of CFLIST,
> but useful for certain types of focused testing.
>
> However, it would be very good to get call_rcu_lazy() testing going
> more generally, and in particular in TREE01 where offloading changes
> dynamically.  A good way to do this is to add a .call_lazy() component
> to the rcu_torture_ops structure, and check for it in a manner similar
> to that done for the .deferred_free() component.  Including adding a
> gp_normal_lazy module parameter.  This would allow habitual testing
> on a few scenarios and focused lazy testing on all of them via the
> --bootargs parameter.

Ok, if you don't mind I will make this particular enhancement to the
torture test in a future patchset, since I kind of decided on doing v3
with just fixes to what I have and more testing. Certainly happy to
enhance these tests in a future version.
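
For the record, the kind of hook being suggested might look roughly like the
sketch below; the member, parameter, and variable names are only guesses for
illustration, not an actual patch:

/* New rcu_torture_ops member, alongside .call: */
	void (*call_lazy)(struct rcu_head *head, rcu_callback_t func);

/* Per-flavor wiring, for example in rcu_ops: */
	.call_lazy	= call_rcu_lazy,

/* Module parameter, analogous to the existing gp_* parameters: */
torture_param(bool, gp_normal_lazy, false,
	      "Use call_rcu_lazy() for asynchronous grace periods");

/* At call sites that currently use cur_ops->call(): */
	if (gp_normal_lazy && cur_ops->call_lazy)
		cur_ops->call_lazy(rhp, cb);
	else
		cur_ops->call(rhp, cb);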

> On the total-gps=0, the usual suspicion would be that the lazy callbacks
> never got invoked.  It looks like you were doing about a two-minute run,
> so maybe a longer run?  Though weren't they supposed to kick in at 15
> seconds or so?  Or did this value of zero come about because this run
> used exactly 300 grace periods?

It was zero because it required the RCU_FLAVOR torture type, whereas
my torture type was lazy. Adding RCU_LAZY_FLAVOR to the list fixed it
:)

Thanks!

 - Joel


> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 7120165a9342..cc6b7392d801 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -872,6 +872,64 @@ static struct rcu_torture_ops tasks_rude_ops = {
> >
> >  #endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
> >
> > +#ifdef CONFIG_RCU_LAZY
> > +
> > +/*
> > + * Definitions for lazy RCU torture testing.
> > + */
> > +unsigned long orig_jiffies_till_flush;
> > +
> > +static void rcu_sync_torture_init_lazy(void)
> > +{
> > +     rcu_sync_torture_init();
> > +
> > +     orig_jiffies_till_flush = rcu_lazy_get_jiffies_till_flush();
> > +     rcu_lazy_set_jiffies_till_flush(50);
> > +}
> > +
> > +static void rcu_lazy_cleanup(void)
> > +{
> > +     rcu_lazy_set_jiffies_till_flush(orig_jiffies_till_flush);
> > +}
> > +
> > +static struct rcu_torture_ops rcu_lazy_ops = {
> > +     .ttype                  = RCU_LAZY_FLAVOR,
> > +     .init                   = rcu_sync_torture_init_lazy,
> > +     .cleanup                = rcu_lazy_cleanup,
> > +     .readlock               = rcu_torture_read_lock,
> > +     .read_delay             = rcu_read_delay,
> > +     .readunlock             = rcu_torture_read_unlock,
> > +     .readlock_held          = torture_readlock_not_held,
> > +     .get_gp_seq             = rcu_get_gp_seq,
> > +     .gp_diff                = rcu_seq_diff,
> > +     .deferred_free          = rcu_torture_deferred_free,
> > +     .sync                   = synchronize_rcu,
> > +     .exp_sync               = synchronize_rcu_expedited,
> > +     .get_gp_state           = get_state_synchronize_rcu,
> > +     .start_gp_poll          = start_poll_synchronize_rcu,
> > +     .poll_gp_state          = poll_state_synchronize_rcu,
> > +     .cond_sync              = cond_synchronize_rcu,
> > +     .call                   = call_rcu_lazy,
> > +     .cb_barrier             = rcu_barrier,
> > +     .fqs                    = rcu_force_quiescent_state,
> > +     .stats                  = NULL,
> > +     .gp_kthread_dbg         = show_rcu_gp_kthreads,
> > +     .check_boost_failed     = rcu_check_boost_fail,
> > +     .stall_dur              = rcu_jiffies_till_stall_check,
> > +     .irq_capable            = 1,
> > +     .can_boost              = IS_ENABLED(CONFIG_RCU_BOOST),
> > +     .extendables            = RCUTORTURE_MAX_EXTEND,
> > +     .name                   = "rcu_lazy"
> > +};
> > +
> > +#define LAZY_OPS &rcu_lazy_ops,
> > +
> > +#else // #ifdef CONFIG_RCU_LAZY
> > +
> > +#define LAZY_OPS
> > +
> > +#endif // #else #ifdef CONFIG_RCU_LAZY
> > +
> >
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> >
> > @@ -3145,7 +3203,7 @@ rcu_torture_init(void)
> >       unsigned long gp_seq = 0;
> >       static struct rcu_torture_ops *torture_ops[] = {
> >               &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
> > -             TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
> > +             TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS LAZY_OPS
> >               &trivial_ops,
> >       };
> >
> > diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11 b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> > new file mode 100644
> > index 000000000000..436013f3e015
> > --- /dev/null
> > +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> > @@ -0,0 +1,18 @@
> > +CONFIG_SMP=y
> > +CONFIG_PREEMPT_NONE=n
> > +CONFIG_PREEMPT_VOLUNTARY=n
> > +CONFIG_PREEMPT=y
> > +#CHECK#CONFIG_PREEMPT_RCU=y
> > +CONFIG_HZ_PERIODIC=n
> > +CONFIG_NO_HZ_IDLE=y
> > +CONFIG_NO_HZ_FULL=n
> > +CONFIG_RCU_TRACE=y
> > +CONFIG_HOTPLUG_CPU=y
> > +CONFIG_MAXSMP=y
> > +CONFIG_CPUMASK_OFFSTACK=y
> > +CONFIG_RCU_NOCB_CPU=y
> > +CONFIG_DEBUG_LOCK_ALLOC=n
> > +CONFIG_RCU_BOOST=n
> > +CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> > +CONFIG_RCU_EXPERT=y
> > +CONFIG_RCU_LAZY=y
> > diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> > new file mode 100644
> > index 000000000000..9b6f720d4ccd
> > --- /dev/null
> > +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> > @@ -0,0 +1,8 @@
> > +maxcpus=8 nr_cpus=43
> > +rcutree.gp_preinit_delay=3
> > +rcutree.gp_init_delay=3
> > +rcutree.gp_cleanup_delay=3
> > +rcu_nocbs=0-7
> > +rcutorture.torture_type=rcu_lazy
> > +rcutorture.nocbs_nthreads=8
> > +rcutorture.fwd_progress=0
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-10 16:03       ` Paul E. McKenney
@ 2022-07-12 20:53         ` Joel Fernandes
  2022-07-12 21:04           ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-12 20:53 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth


On 7/10/2022 12:03 PM, Paul E. McKenney wrote:
[..]
>>>> +	// Note that if the bypass list has lazy CBs, and the main list is
>>>> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
>>>> +	// the lazy CBs to the main list as well. That's the right thing to do,
>>>> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
>>>> +	// one, we can just reuse that GP for the already queued-up lazy ones.
>>>> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
>>>> +	    (lazy && n_lazy_cbs >= qhimark)) {
>>>>  		rcu_nocb_lock(rdp);
>>>>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>>>>  		if (*was_alldone)
>>>>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>>>> -					    TPS("FirstQ"));
>>>> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
>>>> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
>>>> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
>>>
>>> The "false" here instead of "lazy" is because the caller is to do the
>>> enqueuing, correct?
>>
>> There is no difference between using false or lazy here, because the bypass
>> flush is not also enqueuing the lazy callback, right?
>>
>> We can also pass lazy instead of false if that's less confusing.
>>
>> Or maybe I missed the issue you're raising?
> 
> I am mostly checking up on your intended meaning of "lazy" in various
> contexts.  It could mean only that the caller requested laziness, or in
> some cases it could mean that the callback actually will be lazy.
> 
> I can rationalize the "false" above as a "don't care" in this case
> because (as you say) there is no callback.  In which case this code
> is OK as is, as long as the header comment for rcu_nocb_flush_bypass()
> clearly states that this parameter has meaning only when there really
> is a callback being queued.

I decided to change this and the below to "lazy" variable instead of
true/false, as the code is cleaner and less confusing IMO. It makes
sense to me and in my testing it works fine. Hope that's Ok with you.

About changing the lazy length count to a flag, one drawback of doing
that is, say if there are some non-lazy CBs in the bypass list, then the
lazy shrinker will end up reporting an inaccurate count. Also
considering that it might be harder to add the count back later say if
we need it for tracing, I would say let's leave it as is. I will keep the
counter for v3 and we can discuss. Does that sound good to you?

I think some more testing, checkpatch running etc and I should be good
to send v3 :)

Thanks!

 - Joel


> 
>>>>  		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
>>>>  		return false; // Caller must enqueue the callback.
>>>>  	}
>>>>  
>>>>  	// If ->nocb_bypass has been used too long or is too full,
>>>>  	// flush ->nocb_bypass to ->cblist.
>>>> -	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
>>>> -	    ncbs >= qhimark) {
>>>> +	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
>>>>  		rcu_nocb_lock(rdp);
>>>> -		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
>>>> +		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {
>>>
>>> But shouldn't this "true" be "lazy"?  I don't see how we are guaranteed
>>> that the callback is in fact lazy at this point in the code.  Also,
>>> there is not yet a guarantee that the caller will do the enqueuing.
>>> So what am I missing?
>>
>> Sorry I screwed this part up. I think I meant 'false' here, if the list grew
>> too big- then I think I would prefer if the new lazy CB instead is treated as
>> non-lazy. But if that's too confusing, I will just pass 'lazy' instead. What
>> do you think?
> 
> Good point, if we are choosing to override the laziness requested by the
> caller, then it should say "true".  It would be good to have a comment
> saying that is what we are doing, correct?
> 
>> Will reply more to the rest of the comments soon, thanks!
> 
> Sounds good!  (Hey, wouldn't want you to be bored!)
> 
> 							Thanx, Paul


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-07-12 20:27         ` Joel Fernandes
@ 2022-07-12 20:58           ` Paul E. McKenney
  2022-07-12 21:15             ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-12 20:58 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, vineeth

On Tue, Jul 12, 2022 at 04:27:05PM -0400, Joel Fernandes wrote:
> Ah, with all the threads, I missed this one :(. Sorry about that.

I know that feeling...

> On Fri, Jul 8, 2022 at 7:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> > > Currently I added a test like the following which adds a new torture type, my
> > > thought was to stress the new code to make sure nothing crashed or hung the
> > > kernel. That is working well except I don't exactly understand the total-gps
> > > print showing 0, whereas the other print shows 1188 GPs. I'll go dig into that
> > > tomorrow.. thanks!
> > >
> > > The print shows
> > > TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
> > > TREE11 no success message, 7 successful version messages
> >
> > Nice!!!  It is very good to see you using the rcu_torture_ops
> > facility correctly!
> >
> > And this could be good for your own testing, and I am happy to pull it
> > in for that purpose (given it being fixed, having a good commit log,
> > and so on).  After all, TREE10 is quite similar -- not part of CFLIST,
> > but useful for certain types of focused testing.
> >
> > However, it would be very good to get call_rcu_lazy() testing going
> > more generally, and in particular in TREE01 where offloading changes
> > dynamically.  A good way to do this is to add a .call_lazy() component
> > to the rcu_torture_ops structure, and check for it in a manner similar
> > to that done for the .deferred_free() component.  Including adding a
> > gp_normal_lazy module parameter.  This would allow habitual testing
> > on a few scenarios and focused lazy testing on all of them via the
> > --bootargs parameter.
> 
> Ok, if you don't mind I will make this particular enhancement to the
> torture test in a future patchset, since I kind of decided on doing v3
> with just fixes to what I have and more testing. Certainly happy to
> enhance these tests in a future version.

No need to gate v3 on those tests.

> > On the total-gps=0, the usual suspicion would be that the lazy callbacks
> > never got invoked.  It looks like you were doing about a two-minute run,
> > so maybe a longer run?  Though weren't they supposed to kick in at 15
> > seconds or so?  Or did this value of zero come about because this run
> > used exactly 300 grace periods?
> 
> It was zero because it required the RCU_FLAVOR torture type, whereas
> my torture type was lazy. Adding RCU_LAZY_FLAVOR to the list fixed it
> :)

Heh!  Then it didn't actually do any testing.  Done that as well!

							Thanx, Paul

> Thanks!
> 
>  - Joel
> 
> 
> > > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > > index 7120165a9342..cc6b7392d801 100644
> > > --- a/kernel/rcu/rcutorture.c
> > > +++ b/kernel/rcu/rcutorture.c
> > > @@ -872,6 +872,64 @@ static struct rcu_torture_ops tasks_rude_ops = {
> > >
> > >  #endif // #else #ifdef CONFIG_TASKS_RUDE_RCU
> > >
> > > +#ifdef CONFIG_RCU_LAZY
> > > +
> > > +/*
> > > + * Definitions for lazy RCU torture testing.
> > > + */
> > > +unsigned long orig_jiffies_till_flush;
> > > +
> > > +static void rcu_sync_torture_init_lazy(void)
> > > +{
> > > +     rcu_sync_torture_init();
> > > +
> > > +     orig_jiffies_till_flush = rcu_lazy_get_jiffies_till_flush();
> > > +     rcu_lazy_set_jiffies_till_flush(50);
> > > +}
> > > +
> > > +static void rcu_lazy_cleanup(void)
> > > +{
> > > +     rcu_lazy_set_jiffies_till_flush(orig_jiffies_till_flush);
> > > +}
> > > +
> > > +static struct rcu_torture_ops rcu_lazy_ops = {
> > > +     .ttype                  = RCU_LAZY_FLAVOR,
> > > +     .init                   = rcu_sync_torture_init_lazy,
> > > +     .cleanup                = rcu_lazy_cleanup,
> > > +     .readlock               = rcu_torture_read_lock,
> > > +     .read_delay             = rcu_read_delay,
> > > +     .readunlock             = rcu_torture_read_unlock,
> > > +     .readlock_held          = torture_readlock_not_held,
> > > +     .get_gp_seq             = rcu_get_gp_seq,
> > > +     .gp_diff                = rcu_seq_diff,
> > > +     .deferred_free          = rcu_torture_deferred_free,
> > > +     .sync                   = synchronize_rcu,
> > > +     .exp_sync               = synchronize_rcu_expedited,
> > > +     .get_gp_state           = get_state_synchronize_rcu,
> > > +     .start_gp_poll          = start_poll_synchronize_rcu,
> > > +     .poll_gp_state          = poll_state_synchronize_rcu,
> > > +     .cond_sync              = cond_synchronize_rcu,
> > > +     .call                   = call_rcu_lazy,
> > > +     .cb_barrier             = rcu_barrier,
> > > +     .fqs                    = rcu_force_quiescent_state,
> > > +     .stats                  = NULL,
> > > +     .gp_kthread_dbg         = show_rcu_gp_kthreads,
> > > +     .check_boost_failed     = rcu_check_boost_fail,
> > > +     .stall_dur              = rcu_jiffies_till_stall_check,
> > > +     .irq_capable            = 1,
> > > +     .can_boost              = IS_ENABLED(CONFIG_RCU_BOOST),
> > > +     .extendables            = RCUTORTURE_MAX_EXTEND,
> > > +     .name                   = "rcu_lazy"
> > > +};
> > > +
> > > +#define LAZY_OPS &rcu_lazy_ops,
> > > +
> > > +#else // #ifdef CONFIG_RCU_LAZY
> > > +
> > > +#define LAZY_OPS
> > > +
> > > +#endif // #else #ifdef CONFIG_RCU_LAZY
> > > +
> > >
> > >  #ifdef CONFIG_TASKS_TRACE_RCU
> > >
> > > @@ -3145,7 +3203,7 @@ rcu_torture_init(void)
> > >       unsigned long gp_seq = 0;
> > >       static struct rcu_torture_ops *torture_ops[] = {
> > >               &rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops, &busted_srcud_ops,
> > > -             TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS
> > > +             TASKS_OPS TASKS_RUDE_OPS TASKS_TRACING_OPS LAZY_OPS
> > >               &trivial_ops,
> > >       };
> > >
> > > diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11 b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> > > new file mode 100644
> > > index 000000000000..436013f3e015
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11
> > > @@ -0,0 +1,18 @@
> > > +CONFIG_SMP=y
> > > +CONFIG_PREEMPT_NONE=n
> > > +CONFIG_PREEMPT_VOLUNTARY=n
> > > +CONFIG_PREEMPT=y
> > > +#CHECK#CONFIG_PREEMPT_RCU=y
> > > +CONFIG_HZ_PERIODIC=n
> > > +CONFIG_NO_HZ_IDLE=y
> > > +CONFIG_NO_HZ_FULL=n
> > > +CONFIG_RCU_TRACE=y
> > > +CONFIG_HOTPLUG_CPU=y
> > > +CONFIG_MAXSMP=y
> > > +CONFIG_CPUMASK_OFFSTACK=y
> > > +CONFIG_RCU_NOCB_CPU=y
> > > +CONFIG_DEBUG_LOCK_ALLOC=n
> > > +CONFIG_RCU_BOOST=n
> > > +CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> > > +CONFIG_RCU_EXPERT=y
> > > +CONFIG_RCU_LAZY=y
> > > diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> > > new file mode 100644
> > > index 000000000000..9b6f720d4ccd
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot
> > > @@ -0,0 +1,8 @@
> > > +maxcpus=8 nr_cpus=43
> > > +rcutree.gp_preinit_delay=3
> > > +rcutree.gp_init_delay=3
> > > +rcutree.gp_cleanup_delay=3
> > > +rcu_nocbs=0-7
> > > +rcutorture.torture_type=rcu_lazy
> > > +rcutorture.nocbs_nthreads=8
> > > +rcutorture.fwd_progress=0
> > > --
> > > 2.37.0.rc0.161.g10f37bed90-goog
> > >


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-12 20:53         ` Joel Fernandes
@ 2022-07-12 21:04           ` Paul E. McKenney
  2022-07-12 21:10             ` Joel Fernandes
  0 siblings, 1 reply; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-12 21:04 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Tue, Jul 12, 2022 at 04:53:48PM -0400, Joel Fernandes wrote:
> 
> On 7/10/2022 12:03 PM, Paul E. McKenney wrote:
> [..]
> >>>> +	// Note that if the bypass list has lazy CBs, and the main list is
> >>>> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> >>>> +	// the lazy CBs to the main list as well. That's the right thing to do,
> >>>> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> >>>> +	// one, we can just reuse that GP for the already queued-up lazy ones.
> >>>> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> >>>> +	    (lazy && n_lazy_cbs >= qhimark)) {
> >>>>  		rcu_nocb_lock(rdp);
> >>>>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> >>>>  		if (*was_alldone)
> >>>>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> >>>> -					    TPS("FirstQ"));
> >>>> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> >>>> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> >>>> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> >>>
> >>> The "false" here instead of "lazy" is because the caller is to do the
> >>> enqueuing, correct?
> >>
> >> There is no difference between using false or lazy here, because the bypass
> >> flush is not also enqueuing the lazy callback, right?
> >>
> >> We can also pass lazy instead of false if that's less confusing.
> >>
> >> Or maybe I missed the issue you're raising?
> > 
> > I am mostly checking up on your intended meaning of "lazy" in various
> > contexts.  It could mean only that the caller requested laziness, or in
> > some cases it could mean that the callback actually will be lazy.
> > 
> > I can rationalize the "false" above as a "don't care" in this case
> > because (as you say) there is no callback.  In which case this code
> > is OK as is, as long as the header comment for rcu_nocb_flush_bypass()
> > clearly states that this parameter has meaning only when there really
> > is a callback being queued.
> 
> I decided to change this and the below to "lazy" variable instead of
> true/false, as the code is cleaner and less confusing IMO. It makes
> sense to me and in my testing it works fine. Hope that's Ok with you.

Sounds plausible.

> About changing the lazy length count to a flag, one drawback of doing
> that is, say if there are some non-lazy CBs in the bypass list, then the
> lazy shrinker will end up reporting an inaccurate count. Also
> considering that it might be harder to add the count back later say if
> we need it for tracing, I would say let's leave it as is. I will keep the
> counter for v3 and we can discuss. Does that sound good to you?

You lost me on this one.  If there are any non-lazy callbacks, the whole
bypass list is already being treated as non-lazy, right?  If so, then
the lazy shrinker should report the full count if all callbacks are lazy,
and should report none otherwise.  Or am I missing something here?

> I think some more testing, checkpatch running etc and I should be good
> to send v3 :)

Sounds good!

							Thanx, Paul

> Thanks!
> 
>  - Joel
> 
> 
> > 
> >>>>  		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
> >>>>  		return false; // Caller must enqueue the callback.
> >>>>  	}
> >>>>  
> >>>>  	// If ->nocb_bypass has been used too long or is too full,
> >>>>  	// flush ->nocb_bypass to ->cblist.
> >>>> -	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
> >>>> -	    ncbs >= qhimark) {
> >>>> +	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) || ncbs >= qhimark) {
> >>>>  		rcu_nocb_lock(rdp);
> >>>> -		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
> >>>> +		if (!rcu_nocb_flush_bypass(rdp, rhp, j, true)) {
> >>>
> >>> But shouldn't this "true" be "lazy"?  I don't see how we are guaranteed
> >>> that the callback is in fact lazy at this point in the code.  Also,
> >>> there is not yet a guarantee that the caller will do the enqueuing.
> >>> So what am I missing?
> >>
> >> Sorry I screwed this part up. I think I meant 'false' here, if the list grew
> >> too big- then I think I would prefer if the new lazy CB instead is treated as
> >> non-lazy. But if that's too confusing, I will just pass 'lazy' instead. What
> >> do you think?
> > 
> > Good point, if we are choosing to override the laziness requested by the
> > caller, then it should say "true".  It would be good to have a comment
> > saying that is what we are doing, correct?
> > 
> >> Will reply more to the rest of the comments soon, thanks!
> > 
> > Sounds good!  (Hey, wouldn't want you to be bored!)
> > 
> > 							Thanx, Paul


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-12 21:04           ` Paul E. McKenney
@ 2022-07-12 21:10             ` Joel Fernandes
  2022-07-12 22:41               ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-12 21:10 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth



On 7/12/2022 5:04 PM, Paul E. McKenney wrote:
> On Tue, Jul 12, 2022 at 04:53:48PM -0400, Joel Fernandes wrote:
>>
>> On 7/10/2022 12:03 PM, Paul E. McKenney wrote:
>> [..]
>>>>>> +	// Note that if the bypass list has lazy CBs, and the main list is
>>>>>> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
>>>>>> +	// the lazy CBs to the main list as well. That's the right thing to do,
>>>>>> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
>>>>>> +	// one, we can just reuse that GP for the already queued-up lazy ones.
>>>>>> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
>>>>>> +	    (lazy && n_lazy_cbs >= qhimark)) {
>>>>>>  		rcu_nocb_lock(rdp);
>>>>>>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
>>>>>>  		if (*was_alldone)
>>>>>>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
>>>>>> -					    TPS("FirstQ"));
>>>>>> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
>>>>>> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
>>>>>> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
>>>>>
>>>>> The "false" here instead of "lazy" is because the caller is to do the
>>>>> enqueuing, correct?
>>>>
>>>> There is no difference between using false or lazy here, because the bypass
>>>> flush is not also enqueuing the lazy callback, right?
>>>>
>>>> We can also pass lazy instead of false if that's less confusing.
>>>>
>>>> Or maybe I missed the issue you're raising?
>>>
>>> I am mostly checking up on your intended meaning of "lazy" in various
>>> contexts.  It could mean only that the caller requested laziness, or in
>>> some cases it could mean that the callback actually will be lazy.
>>>
>>> I can rationalize the "false" above as a "don't care" in this case
>>> because (as you say) there is no callback.  In which case this code
>>> is OK as is, as long as the header comment for rcu_nocb_flush_bypass()
>>> clearly states that this parameter has meaning only when there really
>>> is a callback being queued.
>>
>> I decided to change this and the below to "lazy" variable instead of
>> true/false, as the code is cleaner and less confusing IMO. It makes
>> sense to me and in my testing it works fine. Hope that's Ok with you.
> 
> Sounds plausible.
> 
>> About changing the lazy length count to a flag, one drawback of doing
>> that is, say if there are some non-lazy CBs in the bypass list, then the
>> lazy shrinker will end up reporting an inaccurate count. Also
>> considering that it might be harder to add the count back later say if
>> we need it for tracing, I would say let's leave it as is. I will keep the
>> counter for v3 and we can discuss. Does that sound good to you?
> 
> You lost me on this one.  If there are any non-lazy callbacks, the whole
> bypass list is already being treated as non-lazy, right?  If so, then
> the lazy shrinker should report the full count if all callbacks are lazy,
> and should report none otherwise.  Or am I missing something here?
> 

That's one way to interpret it; another way is to say there were 1000
lazy CBs, and now one non-lazy CB came in. The shrinker could report the lazy
count as 0 per your interpretation. Yes, it's true they will get flushed
out in the next jiffie, but for that time instant, the number of lazy
CBs in the list is not zero! :) Yeah OK it's a weak argument, still an
argument! ;-)

In any case, we saw the need for the length of the segcb lists to figure
out things via tracing, so I suspect we may need this in the future as
well; it's a small cost so I would rather keep it if that's Ok with you! :)
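
To make the two interpretations concrete, the count-based reporting argued
for above would look roughly like the sketch below; the shrinker callback
name is invented here, while rcu_cblist_n_lazy_cbs() is the helper added
earlier in this series:

static unsigned long
lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
	unsigned long count = 0;
	int cpu;

	/*
	 * Report exactly how many lazy CBs currently sit in the bypass
	 * lists, even if a non-lazy CB has momentarily joined them; a
	 * flag-based alternative would report zero in that case.
	 */
	for_each_possible_cpu(cpu) {
		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

		count += rcu_cblist_n_lazy_cbs(&rdp->nocb_bypass);
	}
	return count;
}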

Thanks,

- Joel



* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-07-12 20:58           ` Paul E. McKenney
@ 2022-07-12 21:15             ` Joel Fernandes
  2022-07-12 22:41               ` Paul E. McKenney
  0 siblings, 1 reply; 60+ messages in thread
From: Joel Fernandes @ 2022-07-12 21:15 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, vineeth



On 7/12/2022 4:58 PM, Paul E. McKenney wrote:
> On Tue, Jul 12, 2022 at 04:27:05PM -0400, Joel Fernandes wrote:
>> Ah, with all the threads, I missed this one :(. Sorry about that.
> 
> I know that feeling...
> 
>> On Fri, Jul 8, 2022 at 7:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>>
>>>> Currently I added a test like the following which adds a new torture type, my
>>>> thought was to stress the new code to make sure nothing crashed or hung the
>>>> kernel. That is working well except I don't exactly understand the total-gps
>>>> print showing 0, whereas the other print shows 1188 GPs. I'll go dig into that
>>>> tomorrow.. thanks!
>>>>
>>>> The print shows
>>>> TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
>>>> TREE11 no success message, 7 successful version messages
>>>
>>> Nice!!!  It is very good to see you using the rcu_torture_ops
>>> facility correctly!
>>>
>>> And this could be good for your own testing, and I am happy to pull it
>>> in for that purpose (given it being fixed, having a good commit log,
>>> and so on).  After all, TREE10 is quite similar -- not part of CFLIST,
>>> but useful for certain types of focused testing.
>>>
>>> However, it would be very good to get call_rcu_lazy() testing going
>>> more generally, and in particular in TREE01 where offloading changes
>>> dynamically.  A good way to do this is to add a .call_lazy() component
>>> to the rcu_torture_ops structure, and check for it in a manner similar
>>> to that done for the .deferred_free() component.  Including adding a
>>> gp_normal_lazy module parameter.  This would allow habitual testing
>>> on a few scenarios and focused lazy testing on all of them via the
>>> --bootargs parameter.
>>
>> Ok, if you don't mind I will make this particular enhancement to the
>> torture test in a future patchset, since I kind of decided on doing v3
>> with just fixes to what I have and more testing. Certainly happy to
>> enhance these tests in a future version.
> 
> No need to gate v3 on those tests.
> 
>>> On the total-gps=0, the usual suspicion would be that the lazy callbacks
>>> never got invoked.  It looks like you were doing about a two-minute run,
>>> so maybe a longer run?  Though weren't they supposed to kick in at 15
>>> seconds or so?  Or did this value of zero come about because this run
>>> used exactly 300 grace periods?
>>
>> It was zero because it required the RCU_FLAVOR torture type, whereas
>> my torture type was lazy. Adding RCU_LAZY_FLAVOR to the list fixed it
>> :)
> 
> Heh!  Then it didn't actually do any testing.  Done that as well!

Sorry to not be clear, I meant the switch-case list below, not the
torture list in rcutorture.c! It was in rcutorture.c, so it was being
tested, just reporting zero gp_seq as I pointed out.

/*
 * Send along grace-period-related data for rcutorture diagnostics.
 */
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
                            unsigned long *gp_seq)
{
        switch (test_type) {
        case RCU_FLAVOR:
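        /*
         * The lazy flavor shares vanilla RCU's grace-period state, so
         * report the same flags and gp_seq for the rcu_lazy torture type.
         */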
        case RCU_LAZY_FLAVOR:
                *flags = READ_ONCE(rcu_state.gp_flags);
                *gp_seq = rcu_seq_current(&rcu_state.gp_seq);
                break;
        default:
                break;
        }
}
EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);


* Re: [PATCH v2 6/8] rcuscale: Add test for using call_rcu_lazy() to emulate kfree_rcu()
  2022-07-12 21:15             ` Joel Fernandes
@ 2022-07-12 22:41               ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-12 22:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, LKML, Rushikesh S Kadam, Uladzislau Rezki (Sony),
	Neeraj upadhyay, Frederic Weisbecker, Steven Rostedt, vineeth

On Tue, Jul 12, 2022 at 05:15:23PM -0400, Joel Fernandes wrote:
> 
> 
> On 7/12/2022 4:58 PM, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2022 at 04:27:05PM -0400, Joel Fernandes wrote:
> >> Ah, with all the threads, I missed this one :(. Sorry about that.
> > 
> > I know that feeling...
> > 
> >> On Fri, Jul 8, 2022 at 7:06 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>
> >>>> Currently I added a test like the following which adds a new torture type, my
> >>>> thought was to stress the new code to make sure nothing crashed or hung the
> >>>> kernel. That is working well except I don't exactly understand the total-gps
> >>>> print showing 0, whereas the other print shows 1188 GPs. I'll go dig into that
> >>>> tomorrow.. thanks!
> >>>>
> >>>> The print shows
> >>>> TREE11 ------- 1474 GPs (12.2833/s) [rcu_lazy: g0 f0x0 total-gps=0]
> >>>> TREE11 no success message, 7 successful version messages
> >>>
> >>> Nice!!!  It is very good to see you using the rcu_torture_ops
> >>> facility correctly!
> >>>
> >>> And this could be good for your own testing, and I am happy to pull it
> >>> in for that purpose (given it being fixed, having a good commit log,
> >>> and so on).  After all, TREE10 is quite similar -- not part of CFLIST,
> >>> but useful for certain types of focused testing.
> >>>
> >>> However, it would be very good to get call_rcu_lazy() testing going
> >>> more generally, and in particular in TREE01 where offloading changes
> >>> dynamically.  A good way to do this is to add a .call_lazy() component
> >>> to the rcu_torture_ops structure, and check for it in a manner similar
> >>> to that done for the .deferred_free() component.  Including adding a
> >>> gp_normal_lazy module parameter.  This would allow habitual testing
> >>> on a few scenarios and focused lazy testing on all of them via the
> >>> --bootargs parameter.
> >>
> >> Ok, if you don't mind I will make this particular enhancement to the
> >> torture test in a future patchset, since I kind of decided on doing v3
> >> with just fixes to what I have and more testing. Certainly happy to
> >> enhance these tests in a future version.
> > 
> > No need to gate v3 on those tests.
> > 
> >>> On the total-gps=0, the usual suspicion would be that the lazy callbacks
> >>> never got invoked.  It looks like you were doing about a two-minute run,
> >>> so maybe a longer run?  Though weren't they supposed to kick in at 15
> >>> seconds or so?  Or did this value of zero come about because this run
> >>> used exactly 300 grace periods?
> >>
> >> It was zero because it required the RCU_FLAVOR torture type, whereas
> >> my torture type was lazy. Adding RCU_LAZY_FLAVOR to the list fixed it
> >> :)
> > 
> > Heh!  Then it didn't actually do any testing.  Done that as well!
> 
> Sorry to not be clear, I meant the switch-case list below, not the
> torture list in rcutorture.c! It was in rcutorture.c, so it was being
> tested, just reporting zero gp_seq as I pointed out.
> 
> /*
>  * Send along grace-period-related data for rcutorture diagnostics.
>  */
> void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
>                             unsigned long *gp_seq)
> {
>         switch (test_type) {
>         case RCU_FLAVOR:
>         case RCU_LAZY_FLAVOR:
>                 *flags = READ_ONCE(rcu_state.gp_flags);
>                 *gp_seq = rcu_seq_current(&rcu_state.gp_seq);
>                 break;
>         default:
>                 break;
>         }
> }
> EXPORT_SYMBOL_GPL(rcutorture_get_gp_data);

Ah, that would do it!  Thank you for the clarification.

							Thanx, Paul


* Re: [PATCH v2 1/8] rcu: Introduce call_rcu_lazy() API implementation
  2022-07-12 21:10             ` Joel Fernandes
@ 2022-07-12 22:41               ` Paul E. McKenney
  0 siblings, 0 replies; 60+ messages in thread
From: Paul E. McKenney @ 2022-07-12 22:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, linux-kernel, rushikesh.s.kadam, urezki, neeraj.iitr10,
	frederic, rostedt, vineeth

On Tue, Jul 12, 2022 at 05:10:41PM -0400, Joel Fernandes wrote:
> 
> 
> On 7/12/2022 5:04 PM, Paul E. McKenney wrote:
> > On Tue, Jul 12, 2022 at 04:53:48PM -0400, Joel Fernandes wrote:
> >>
> >> On 7/10/2022 12:03 PM, Paul E. McKenney wrote:
> >> [..]
> >>>>>> +	// Note that if the bypass list has lazy CBs, and the main list is
> >>>>>> +	// empty, and rhp happens to be non-lazy, then we end up flushing all
> >>>>>> +	// the lazy CBs to the main list as well. That's the right thing to do,
> >>>>>> +	// since we are kick-starting RCU GP processing anyway for the non-lazy
> >>>>>> +	// one, we can just reuse that GP for the already queued-up lazy ones.
> >>>>>> +	if ((rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) ||
> >>>>>> +	    (lazy && n_lazy_cbs >= qhimark)) {
> >>>>>>  		rcu_nocb_lock(rdp);
> >>>>>>  		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
> >>>>>>  		if (*was_alldone)
> >>>>>>  			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
> >>>>>> -					    TPS("FirstQ"));
> >>>>>> -		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
> >>>>>> +					    lazy ? TPS("FirstLazyQ") : TPS("FirstQ"));
> >>>>>> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
> >>>>>
> >>>>> The "false" here instead of "lazy" is because the caller is to do the
> >>>>> enqueuing, correct?
> >>>>
> >>>> There is no difference between using false or lazy here, because the bypass
> >>>> flush is not also enqueuing the lazy callback, right?
> >>>>
> >>>> We can also pass lazy instead of false if that's less confusing.
> >>>>
> >>>> Or maybe I missed the issue you're raising?
> >>>
> >>> I am mostly checking up on your intended meaning of "lazy" in various
> >>> contexts.  It could mean only that the caller requested laziness, or in
> >>> some cases it could mean that the callback actually will be lazy.
> >>>
> >>> I can rationalize the "false" above as a "don't care" in this case
> >>> because (as you say) there is no callback.  In which case this code
> >>> is OK as is, as long as the header comment for rcu_nocb_flush_bypass()
> >>> clearly states that this parameter has meaning only when there really
> >>> is a callback being queued.
> >>
> >> I decided to change this and the below to "lazy" variable instead of
> >> true/false, as the code is cleaner and less confusing IMO. It makes
> >> sense to me and in my testing it works fine. Hope that's Ok with you.
> > 
> > Sounds plausible.
> > 
> >> About changing the lazy length count to a flag, one drawback of doing
> >> that is, say if there are some non-lazy CBs in the bypass list, then the
> >> lazy shrinker will end up reporting an inaccurate count. Also
> >> considering that it might be harder to add the count back later say if
> >> we need it for tracing, I would say let's leave it as is. I will keep the
> >> counter for v3 and we can discuss. Does that sound good to you?
> > 
> > You lost me on this one.  If there are any non-lazy callbacks, the whole
> > bypass list is already being treated as non-lazy, right?  If so, then
> > the lazy shrinker should report the full count if all callbacks are lazy,
> > and should report none otherwise.  Or am I missing something here?
> > 
> 
> That's one way to interpret it; another way is to say there were 1000
> lazy CBs, and now one non-lazy CB came in. The shrinker could report the lazy
> count as 0 per your interpretation. Yes, it's true they will get flushed
> out in the next jiffie, but for that time instant, the number of lazy
> CBs in the list is not zero! :) Yeah OK it's a weak argument, still an
> argument! ;-)
> 
> In any case, we saw the need for the length of the segcb lists to figure
> out things via tracing, so I suspect we may need this in the future as
> well; it's a small cost so I would rather keep it if that's Ok with you! :)

OK, being needed for tracing/diagnostics is a somewhat less weak argument...

Let's see what v3 looks like.

							Thanx, Paul


