* [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14
@ 2021-05-11 22:54 Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 1/7] kvfree_rcu: Release a page cache under memory pressure Paul E. McKenney
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:54 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series contains kvfree_rcu() updates.

1.	kvfree_rcu: Release a page cache under memory pressure, courtesy
	of Zhang Qiang.

2.	kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access
	nr_bkv_objs, courtesy of "Uladzislau Rezki (Sony)".

3.	kvfree_rcu: Add a bulk-list check when a scheduler is run,
	courtesy of "Uladzislau Rezki (Sony)".

4.	kvfree_rcu: Update "monitor_todo" once a batch is started,
	courtesy of "Uladzislau Rezki (Sony)".

5.	kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant,
	courtesy of "Uladzislau Rezki (Sony)".

6.	kvfree_rcu: Fix comments according to current code, courtesy of
	"Uladzislau Rezki (Sony)".

7.	kvfree_rcu: Refactor kfree_rcu_monitor(), courtesy of "Uladzislau
	Rezki (Sony)".

						Thanx, Paul

------------------------------------------------------------------------

 b/Documentation/admin-guide/kernel-parameters.txt |    5 
 b/kernel/rcu/tree.c                               |   82 ++++++++++++--
 kernel/rcu/tree.c                                 |  127 +++++++---------------
 3 files changed, 121 insertions(+), 93 deletions(-)


* [PATCH tip/core/rcu 1/7] kvfree_rcu: Release a page cache under memory pressure
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 2/7] kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access nr_bkv_objs Paul E. McKenney
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Zhang Qiang, Uladzislau Rezki,
	Paul E . McKenney

From: Zhang Qiang <qiang.zhang@windriver.com>

Add a drain_page_cache() function that drains the per-CPU page cache.
The motivation is that a system can run into a low-memory condition,
in which case the page shrinker asks its users to free their caches
in order to make extra memory available for other needs.

When the system hits such a condition, the page cache is drained on
all CPUs.  By default, refilling the page cache is then deferred by a
5-second interval until the memory pressure disappears; this delay can
be changed via the rcu_delay_page_cache_fill_msec module parameter.
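
For illustration, here is a condensed sketch of how the refill side
picks its delay from the backoff flag that the shrinker sets.  The
helper name request_page_cache_refill() is hypothetical; the real
logic lives in run_page_cache_worker() in the diff below:

	static void request_page_cache_refill(struct kfree_rcu_cpu *krcp)
	{
		/* Under memory pressure the shrinker has set the backoff
		 * flag, so refilling the cache is deferred. */
		if (atomic_read(&krcp->backoff_page_cache_fill))
			queue_delayed_work(system_wq, &krcp->page_cache_work,
				msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
		else	/* No memory pressure: refill right away. */
			queue_delayed_work(system_wq, &krcp->page_cache_work, 0);
	}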

Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 kernel/rcu/tree.c                             | 82 +++++++++++++++++--
 2 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..4405fd32e8ab 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4290,6 +4290,11 @@
 			whole algorithm to behave better in low memory
 			condition.
 
+	rcutree.rcu_delay_page_cache_fill_msec= [KNL]
+			Set the page-cache refill delay (in milliseconds)
+			in response to low-memory conditions.  Permitted
+			values are in the range 0:100000.
+
 	rcutree.jiffies_till_first_fqs= [KNL]
 			Set delay from grace-period initialization to
 			first attempt to force quiescent states.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8e78b2430c16..74d840aa877b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -186,6 +186,17 @@ module_param(rcu_unlock_delay, int, 0444);
 static int rcu_min_cached_objs = 5;
 module_param(rcu_min_cached_objs, int, 0444);
 
+// A page shrinker can ask for pages to be freed to make them
+// available for other parts of the system. This usually happens
+// under low memory conditions, and in that case we should also
+// defer page-cache filling for a short time period.
+//
+// The default value is 5 seconds, which is long enough to reduce
+// interference with the shrinker while it asks other systems to
+// drain their caches.
+static int rcu_delay_page_cache_fill_msec = 5000;
+module_param(rcu_delay_page_cache_fill_msec, int, 0444);
+
 /* Retrieve RCU kthreads priority for rcutorture */
 int rcu_get_gp_kthreads_prio(void)
 {
@@ -3171,6 +3182,7 @@ struct kfree_rcu_cpu_work {
  *	Even though it is lockless an access has to be protected by the
  *	per-cpu lock.
  * @page_cache_work: A work to refill the cache when it is empty
+ * @backoff_page_cache_fill: Delay cache refills
  * @work_in_progress: Indicates that page_cache_work is running
  * @hrtimer: A hrtimer for scheduling a page_cache_work
  * @nr_bkv_objs: number of allocated objects at @bkvcache.
@@ -3190,7 +3202,8 @@ struct kfree_rcu_cpu {
 	bool initialized;
 	int count;
 
-	struct work_struct page_cache_work;
+	struct delayed_work page_cache_work;
+	atomic_t backoff_page_cache_fill;
 	atomic_t work_in_progress;
 	struct hrtimer hrtimer;
 
@@ -3256,6 +3269,26 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 
 }
 
+static int
+drain_page_cache(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+	struct llist_node *page_list, *pos, *n;
+	int freed = 0;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	page_list = llist_del_all(&krcp->bkvcache);
+	krcp->nr_bkv_objs = 0;
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	llist_for_each_safe(pos, n, page_list) {
+		free_page((unsigned long)pos);
+		freed++;
+	}
+
+	return freed;
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -3446,7 +3479,7 @@ schedule_page_work_fn(struct hrtimer *t)
 	struct kfree_rcu_cpu *krcp =
 		container_of(t, struct kfree_rcu_cpu, hrtimer);
 
-	queue_work(system_highpri_wq, &krcp->page_cache_work);
+	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
 	return HRTIMER_NORESTART;
 }
 
@@ -3455,12 +3488,16 @@ static void fill_page_cache_func(struct work_struct *work)
 	struct kvfree_rcu_bulk_data *bnode;
 	struct kfree_rcu_cpu *krcp =
 		container_of(work, struct kfree_rcu_cpu,
-			page_cache_work);
+			page_cache_work.work);
 	unsigned long flags;
+	int nr_pages;
 	bool pushed;
 	int i;
 
-	for (i = 0; i < rcu_min_cached_objs; i++) {
+	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
+		1 : rcu_min_cached_objs;
+
+	for (i = 0; i < nr_pages; i++) {
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
@@ -3477,6 +3514,7 @@ static void fill_page_cache_func(struct work_struct *work)
 	}
 
 	atomic_set(&krcp->work_in_progress, 0);
+	atomic_set(&krcp->backoff_page_cache_fill, 0);
 }
 
 static void
@@ -3484,10 +3522,15 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 {
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 			!atomic_xchg(&krcp->work_in_progress, 1)) {
-		hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC,
-			HRTIMER_MODE_REL);
-		krcp->hrtimer.function = schedule_page_work_fn;
-		hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+		if (atomic_read(&krcp->backoff_page_cache_fill)) {
+			queue_delayed_work(system_wq,
+				&krcp->page_cache_work,
+					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
+		} else {
+			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+			krcp->hrtimer.function = schedule_page_work_fn;
+			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+		}
 	}
 }
 
@@ -3639,12 +3682,19 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
 	int cpu;
 	unsigned long count = 0;
+	unsigned long flags;
 
 	/* Snapshot count of all CPUs */
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count += READ_ONCE(krcp->count);
+
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		count += krcp->nr_bkv_objs;
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		atomic_set(&krcp->backoff_page_cache_fill, 1);
 	}
 
 	return count;
@@ -3661,6 +3711,8 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count = krcp->count;
+		count += drain_page_cache(krcp);
+
 		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (krcp->monitor_todo)
 			kfree_rcu_drain_unlock(krcp, flags);
@@ -4687,6 +4739,18 @@ static void __init kfree_rcu_batch_init(void)
 	int cpu;
 	int i;
 
+	/* Clamp it to [0:100] seconds interval. */
+	if (rcu_delay_page_cache_fill_msec < 0 ||
+		rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
+
+		rcu_delay_page_cache_fill_msec =
+			clamp(rcu_delay_page_cache_fill_msec, 0,
+				(int) (100 * MSEC_PER_SEC));
+
+		pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n",
+			rcu_delay_page_cache_fill_msec);
+	}
+
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
@@ -4696,7 +4760,7 @@ static void __init kfree_rcu_batch_init(void)
 		}
 
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
-		INIT_WORK(&krcp->page_cache_work, fill_page_cache_func);
+		INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
 		krcp->initialized = true;
 	}
 	if (register_shrinker(&kfree_rcu_shrinker))
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 2/7] kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access nr_bkv_objs
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 1/7] kvfree_rcu: Release a page cache under memory pressure Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 3/7] kvfree_rcu: Add a bulk-list check when a scheduler is run Paul E. McKenney
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

nr_bkv_objs is the count of objects in the kvfree_rcu page cache.
Accessing it normally requires holding ->lock.  Switch to the
READ_ONCE() and WRITE_ONCE() macros so that the shrinker can read
this counter without taking the lock.
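
Schematically (simplified from the diff below), writes remain under
->lock but now use WRITE_ONCE(), which pairs with the shrinker's
lockless READ_ONCE():

	/* Update side, still holding ->lock: */
	raw_spin_lock_irqsave(&krcp->lock, flags);
	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);
	raw_spin_unlock_irqrestore(&krcp->lock, flags);

	/* Shrinker side, no lock taken: */
	count += READ_ONCE(krcp->nr_bkv_objs);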

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 74d840aa877b..676a49ab5b2b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3250,7 +3250,7 @@ get_cached_bnode(struct kfree_rcu_cpu *krcp)
 	if (!krcp->nr_bkv_objs)
 		return NULL;
 
-	krcp->nr_bkv_objs--;
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1);
 	return (struct kvfree_rcu_bulk_data *)
 		llist_del_first(&krcp->bkvcache);
 }
@@ -3264,9 +3264,8 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 		return false;
 
 	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
-	krcp->nr_bkv_objs++;
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);
 	return true;
-
 }
 
 static int
@@ -3278,7 +3277,7 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
 
 	raw_spin_lock_irqsave(&krcp->lock, flags);
 	page_list = llist_del_all(&krcp->bkvcache);
-	krcp->nr_bkv_objs = 0;
+	WRITE_ONCE(krcp->nr_bkv_objs, 0);
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 	llist_for_each_safe(pos, n, page_list) {
@@ -3682,18 +3681,13 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
 	int cpu;
 	unsigned long count = 0;
-	unsigned long flags;
 
 	/* Snapshot count of all CPUs */
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count += READ_ONCE(krcp->count);
-
-		raw_spin_lock_irqsave(&krcp->lock, flags);
-		count += krcp->nr_bkv_objs;
-		raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
+		count += READ_ONCE(krcp->nr_bkv_objs);
 		atomic_set(&krcp->backoff_page_cache_fill, 1);
 	}
 
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 3/7] kvfree_rcu: Add a bulk-list check when a scheduler is run
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 1/7] kvfree_rcu: Release a page cache under memory pressure Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 2/7] kvfree_rcu: Use [READ/WRITE]_ONCE() macros to access nr_bkv_objs Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 4/7] kvfree_rcu: Update "monitor_todo" once a batch is started Paul E. McKenney
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

The rcu_scheduler_active flag is set to RCU_SCHEDULER_RUNNING once the
scheduler is up and running.  kvfree_rcu() uses that signal to determine
when work can safely be queued, at which point a "monitor work" is
queued to reclaim any objects that were passed to kvfree_rcu() during
early boot.

However, only "krcp->head" is checked for objects that need to be
released, and there are now two more, namely, "krcp->bkvhead[0]" and
"krcp->bkvhead[1]".  Therefore, check these two additional channels.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 676a49ab5b2b..e86f32d6b8f9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3739,7 +3739,8 @@ void __init kfree_rcu_scheduler_running(void)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
-		if (!krcp->head || krcp->monitor_todo) {
+		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
+				krcp->monitor_todo) {
 			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 			continue;
 		}
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 4/7] kvfree_rcu: Update "monitor_todo" once a batch is started
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2021-05-11 22:55 ` [PATCH tip/core/rcu 3/7] kvfree_rcu: Add a bulk-list check when a scheduler is run Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 5/7] kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant Paul E. McKenney
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

Before attempting to start a new batch, the "monitor_todo" variable is
set to "false", then set back to "true" if a previous RCU batch is still
in progress.  This is at best confusing.

Therefore, set this variable to "false" only when a new batch has been
successfully queued; otherwise, just leave it alone.
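
Schematically (lock handling elided; the diff below is authoritative),
the write moves from "toggle around the attempt" to "write only on
success":

	/* Before: */
	krcp->monitor_todo = false;
	if (queue_kfree_rcu_work(krcp))
		return;				/* success */
	krcp->monitor_todo = true;		/* failure: set it back */

	/* After: */
	if (queue_kfree_rcu_work(krcp)) {
		krcp->monitor_todo = false;	/* cleared only on success */
		return;
	}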

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e86f32d6b8f9..1ae5f88e475f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3442,15 +3442,14 @@ static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
 					  unsigned long flags)
 {
 	// Attempt to start a new batch.
-	krcp->monitor_todo = false;
 	if (queue_kfree_rcu_work(krcp)) {
 		// Success! Our job is done here.
+		krcp->monitor_todo = false;
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 		return;
 	}
 
 	// Previous RCU batch still in progress, try again later.
-	krcp->monitor_todo = true;
 	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 5/7] kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2021-05-11 22:55 ` [PATCH tip/core/rcu 4/7] kvfree_rcu: Update "monitor_todo" once a batch is started Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 6/7] kvfree_rcu: Fix comments according to current code Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 7/7] kvfree_rcu: Refactor kfree_rcu_monitor() Paul E. McKenney
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

Replace an open-coded version of the kfree_rcu_monitor() function body
with a call to that function.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1ae5f88e475f..d643fd8327b6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3697,7 +3697,6 @@ static unsigned long
 kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	int cpu, freed = 0;
-	unsigned long flags;
 
 	for_each_possible_cpu(cpu) {
 		int count;
@@ -3705,12 +3704,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 
 		count = krcp->count;
 		count += drain_page_cache(krcp);
-
-		raw_spin_lock_irqsave(&krcp->lock, flags);
-		if (krcp->monitor_todo)
-			kfree_rcu_drain_unlock(krcp, flags);
-		else
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+		kfree_rcu_monitor(&krcp->monitor_work.work);
 
 		sc->nr_to_scan -= count;
 		freed += count;
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 6/7] kvfree_rcu: Fix comments according to current code
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2021-05-11 22:55 ` [PATCH tip/core/rcu 5/7] kvfree_rcu: Use kfree_rcu_monitor() instead of open-coded variant Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  2021-05-11 22:55 ` [PATCH tip/core/rcu 7/7] kvfree_rcu: Refactor kfree_rcu_monitor() Paul E. McKenney
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

The kvfree_rcu() function now defers allocations in the common case
because there is no lockless access to the memory-allocator caches and
pools.  In addition, in CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_VOLUNTARY=y
kernels there is no reliable way to determine whether spinlocks are held.
As a result, allocation is deferred in the common case, and the
two-argument form of kvfree_rcu() thus uses the "channel 3" queue
through all the rcu_head structures.  This channel is referred to as
the emergency case in comments, and those comments are now obsolete.

This commit therefore updates these comments to reflect the new
common-case nature of such "emergencies".
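
For reference, "channel 3" is what the two-argument form of kvfree_rcu()
relies on: the caller's structure embeds an rcu_head through which
objects are linked when no cache page is available.  A minimal usage
sketch with a hypothetical structure:

	struct foo {
		int data;
		struct rcu_head rh;	/* links the object onto the
					 * "channel 3" list when the
					 * page cache is empty. */
	};

	struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

	kvfree_rcu(p, rh);		/* frees p after a grace period */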

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d643fd8327b6..b043af7b0212 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3355,9 +3355,11 @@ static void kfree_rcu_work(struct work_struct *work)
 	}
 
 	/*
-	 * Emergency case only. It can happen under low memory
-	 * condition when an allocation gets failed, so the "bulk"
-	 * path can not be temporary maintained.
+	 * This is used when the "bulk" path cannot be used for the
+	 * double-argument form of kvfree_rcu().  This happens when the
+	 * page-cache is empty, which means that objects are instead
+	 * queued on a linked list through their rcu_head structures.
+	 * This list is named "Channel 3".
 	 */
 	for (; head; head = next) {
 		unsigned long offset = (unsigned long)head->func;
@@ -3403,8 +3405,8 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
 			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
 				(krcp->head && !krwp->head_free)) {
-			// Channel 1 corresponds to SLAB ptrs.
-			// Channel 2 corresponds to vmalloc ptrs.
+			// Channel 1 corresponds to the SLAB-pointer bulk path.
+			// Channel 2 corresponds to the vmalloc-pointer bulk path.
 			for (j = 0; j < FREE_N_CHANNELS; j++) {
 				if (!krwp->bkvhead_free[j]) {
 					krwp->bkvhead_free[j] = krcp->bkvhead[j];
@@ -3412,7 +3414,8 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 				}
 			}
 
-			// Channel 3 corresponds to emergency path.
+			// Channel 3 corresponds to both SLAB and vmalloc
+			// objects queued on the linked list.
 			if (!krwp->head_free) {
 				krwp->head_free = krcp->head;
 				krcp->head = NULL;
-- 
2.31.1.189.g2e36527f23



* [PATCH tip/core/rcu 7/7] kvfree_rcu: Refactor kfree_rcu_monitor()
  2021-05-11 22:54 [PATCH tip/core/rcu 0/7] kvfree_rcu() updates for v5.14 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2021-05-11 22:55 ` [PATCH tip/core/rcu 6/7] kvfree_rcu: Fix comments according to current code Paul E. McKenney
@ 2021-05-11 22:55 ` Paul E. McKenney
  6 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2021-05-11 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Uladzislau Rezki (Sony),
	Paul E . McKenney

From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>

Currently we have three functions which depend on one another.  Two
of them are quite tiny, and the last one is where most of the work is
done.  All of them are related to queuing RCU batches that reclaim
objects after a grace period:

1. kfree_rcu_monitor(). It consists of a few lines.  It acquires a
   spinlock and calls kfree_rcu_drain_unlock().

2. kfree_rcu_drain_unlock(). It also consists of a few lines of code.
   It calls queue_kfree_rcu_work() to queue the batch.  If this fails,
   it rearms the monitor work to try again later.

3. queue_kfree_rcu_work(). This provides the bulk of the functionality,
   attempting to start a new batch to free objects after a grace period.

Since there are no external users of functions [2] and [3], both can
be eliminated by moving all of the logic directly into [1], which both
shrinks and simplifies the code.

Also convert comments that start with "/*" to the "//" format in order
to make it consistent across the file.
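
The resulting shape of the consolidated function, condensed (the full
version is in the diff below):

	static void kfree_rcu_monitor(struct work_struct *work)
	{
		struct kfree_rcu_cpu *krcp = container_of(work,
			struct kfree_rcu_cpu, monitor_work.work);
		unsigned long flags;

		raw_spin_lock_irqsave(&krcp->lock, flags);
		// ... attempt to detach each channel onto an available
		// kfree_rcu_cpu_work and queue_rcu_work() it ...
		if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
			krcp->monitor_todo = false;	// all detached: done
		else
			schedule_delayed_work(&krcp->monitor_work,
					      KFREE_DRAIN_JIFFIES);
		raw_spin_unlock_irqrestore(&krcp->lock, flags);
	}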

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 84 +++++++++++++++--------------------------------
 1 file changed, 26 insertions(+), 58 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b043af7b0212..618ec9152e5e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3379,29 +3379,26 @@ static void kfree_rcu_work(struct work_struct *work)
 }
 
 /*
- * Schedule the kfree batch RCU work to run in workqueue context after a GP.
- *
- * This function is invoked by kfree_rcu_monitor() when the KFREE_DRAIN_JIFFIES
- * timeout has been reached.
+ * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
-static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
+static void kfree_rcu_monitor(struct work_struct *work)
 {
-	struct kfree_rcu_cpu_work *krwp;
-	bool repeat = false;
+	struct kfree_rcu_cpu *krcp = container_of(work,
+		struct kfree_rcu_cpu, monitor_work.work);
+	unsigned long flags;
 	int i, j;
 
-	lockdep_assert_held(&krcp->lock);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
 
+	// Attempt to start a new batch.
 	for (i = 0; i < KFREE_N_BATCHES; i++) {
-		krwp = &(krcp->krw_arr[i]);
+		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
 
-		/*
-		 * Try to detach bkvhead or head and attach it over any
-		 * available corresponding free channel. It can be that
-		 * a previous RCU batch is in progress, it means that
-		 * immediately to queue another one is not possible so
-		 * return false to tell caller to retry.
-		 */
+		// Try to detach bkvhead or head and attach it over any
+		// available corresponding free channel. It can be that
+		// a previous RCU batch is still in progress, in which
+		// case queueing another one immediately is not possible,
+		// so the monitor work is rearmed instead.
 		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
 			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
 				(krcp->head && !krwp->head_free)) {
@@ -3423,57 +3420,28 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 
 			WRITE_ONCE(krcp->count, 0);
 
-			/*
-			 * One work is per one batch, so there are three
-			 * "free channels", the batch can handle. It can
-			 * be that the work is in the pending state when
-			 * channels have been detached following by each
-			 * other.
-			 */
+			// There is one work per one batch, so there are
+			// three "free channels" that the batch can handle.
+			// The work can already be in the pending state
+			// when the channels have been detached one after
+			// the other.
 			queue_rcu_work(system_wq, &krwp->rcu_work);
 		}
-
-		// Repeat if any "free" corresponding channel is still busy.
-		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
-			repeat = true;
 	}
 
-	return !repeat;
-}
-
-static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
-					  unsigned long flags)
-{
-	// Attempt to start a new batch.
-	if (queue_kfree_rcu_work(krcp)) {
-		// Success! Our job is done here.
+	// If there is nothing to detach, our job is successfully
+	// done here.  If at least one of the channels is still
+	// busy, rearm the work to repeat the attempt, because
+	// the previous batches are still in progress and occupy
+	// the corresponding free channels.
+	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
 		krcp->monitor_todo = false;
-		raw_spin_unlock_irqrestore(&krcp->lock, flags);
-		return;
-	}
+	else
+		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 
-	// Previous RCU batch still in progress, try again later.
-	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
-/*
- * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
- * It invokes kfree_rcu_drain_unlock() to attempt to start another batch.
- */
-static void kfree_rcu_monitor(struct work_struct *work)
-{
-	unsigned long flags;
-	struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
-						 monitor_work.work);
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	if (krcp->monitor_todo)
-		kfree_rcu_drain_unlock(krcp, flags);
-	else
-		raw_spin_unlock_irqrestore(&krcp->lock, flags);
-}
-
 static enum hrtimer_restart
 schedule_page_work_fn(struct hrtimer *t)
 {
-- 
2.31.1.189.g2e36527f23


