linux-mm.kvack.org archive mirror
* [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments)
@ 2020-05-25 21:47 Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 01/16] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
                   ` (15 more replies)
  0 siblings, 16 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

This is v2 of the https://lkml.org/lkml/2020/4/28/1626 series.
Please have a look at v1 to find out more about the motivation and details.
It is based on Paul's latest dev.2020.05.17a branch.

Short changelog (v1 -> v2):
    - Combine some patches, thus reducing the overall number;
    - Switch one-line comments from the "/* */" style to "//";
    - Improve the commit messages of some patches;
    - For tiny RCU, the single-argument kvfree_rcu() simply does
      synchronize_rcu() followed by kvfree();
    - Drop the dynamic rcu_head attaching technique and related patches;
    - Reduce code duplication in some functions;
    - Add more comments for better understanding of the code.

There is one patch related to "mm": "Rename kvfree_rcu() to local variant".
Please note that it does not change any functionality; only renaming is done.
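
For reference, below is a minimal usage sketch of the two forms this
series introduces. The structure "foo" and the field name "rhf" are
illustrative only; either form accepts pointers obtained from both
kmalloc() and vmalloc().

	struct foo {
		int data;
		struct rcu_head rhf;	/* needed only by the two-argument form */
	};

	static void foo_release(struct foo *a, struct foo *b)
	{
		/* Two arguments: the object and the name of its rcu_head field. */
		kvfree_rcu(a, rhf);

		/*
		 * One argument: headless, no rcu_head needed. Per the changelog
		 * above this form may block (e.g. synchronize_rcu() + kvfree()
		 * on tiny RCU), so it must only be used from sleepable context.
		 */
		kvfree_rcu(b);
	}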

Joel Fernandes (Google) (3):
  rcu/tree: Keep kfree_rcu() awake during lock contention
  rcu/tree: Skip entry into the page allocator for PREEMPT_RT
  rcu/tree: Make debug_objects logic independent of rcu_head

Sebastian Andrzej Siewior (1):
  rcu/tree: Use static initializer for krc.lock

Uladzislau Rezki (Sony) (12):
  rcu/tree: Repeat the monitor if any free channel is busy
  rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro
  rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions
  rcu/tree: cache specified number of objects
  rcu/tree: Maintain separate array for vmalloc ptrs
  rcu/tiny: support vmalloc in tiny-RCU
  rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_*
  mm/list_lru.c: Rename kvfree_rcu() to local variant
  rcu: Introduce 2 arg kvfree_rcu() interface
  rcu: Support reclaim for head-less object
  rcu: Introduce single argument kvfree_rcu() interface
  lib/test_vmalloc.c: Add test cases for kvfree_rcu()

 .../admin-guide/kernel-parameters.txt         |   8 +
 include/linux/rcupdate.h                      |  53 ++-
 include/linux/rcutiny.h                       |  20 +-
 include/linux/rcutree.h                       |   2 +-
 include/trace/events/rcu.h                    |   8 +-
 kernel/rcu/tiny.c                             |   7 +-
 kernel/rcu/tree.c                             | 381 ++++++++++++------
 lib/test_vmalloc.c                            | 103 ++++-
 mm/list_lru.c                                 |   6 +-
 9 files changed, 446 insertions(+), 142 deletions(-)

-- 
2.20.1




* [PATCH v2 01/16] rcu/tree: Keep kfree_rcu() awake during lock contention
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 02/16] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, bigeasy

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

On PREEMPT_RT kernels, the krcp spinlock gets converted to an rt-mutex
and causes kfree_rcu() callers to sleep. This makes it unusable for
callers in purely atomic sections such as non-threaded IRQ handlers and
raw spinlock sections. Fix it by converting the spinlock to a raw
spinlock.

All code paths have been vetted, and there is no reason to believe that
the raw spinlock will hurt RT latencies, as it is not held for a long time.
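
As an illustrative (hypothetical) example of the affected class of
callers, consider a non-threaded interrupt handler freeing an object via
kfree_rcu(); nothing on that path may sleep, hence the raw spinlock:

	struct my_obj {
		int data;
		struct rcu_head rh;
	};

	static irqreturn_t my_irq_handler(int irq, void *dev_id)
	{
		struct my_obj *obj = dev_id;

		/*
		 * Hard-IRQ context: sleeping is forbidden. With krcp->lock
		 * converted to an rt-mutex on PREEMPT_RT, kfree_rcu() could
		 * sleep here; with a raw spinlock it cannot.
		 */
		kfree_rcu(obj, rh);
		return IRQ_HANDLED;
	}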

Cc: bigeasy@linutronix.de
Cc: Uladzislau Rezki <urezki@gmail.com>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6e120be29332..6e967a9d6704 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2882,7 +2882,7 @@ struct kfree_rcu_cpu {
 	struct kfree_rcu_bulk_data *bhead;
 	struct kfree_rcu_bulk_data *bcached;
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
 	bool monitor_todo;
 	bool initialized;
@@ -2915,12 +2915,12 @@ static void kfree_rcu_work(struct work_struct *work)
 	krwp = container_of(to_rcu_work(work),
 			    struct kfree_rcu_cpu_work, rcu_work);
 	krcp = krwp->krcp;
-	spin_lock_irqsave(&krcp->lock, flags);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
 	head = krwp->head_free;
 	krwp->head_free = NULL;
 	bhead = krwp->bhead_free;
 	krwp->bhead_free = NULL;
-	spin_unlock_irqrestore(&krcp->lock, flags);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 	/* "bhead" is now private, so traverse locklessly. */
 	for (; bhead; bhead = bnext) {
@@ -3023,14 +3023,14 @@ static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
 	krcp->monitor_todo = false;
 	if (queue_kfree_rcu_work(krcp)) {
 		// Success! Our job is done here.
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 		return;
 	}
 
 	// Previous RCU batch still in progress, try again later.
 	krcp->monitor_todo = true;
 	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
-	spin_unlock_irqrestore(&krcp->lock, flags);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
 /*
@@ -3043,11 +3043,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
 						 monitor_work.work);
 
-	spin_lock_irqsave(&krcp->lock, flags);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
 	if (krcp->monitor_todo)
 		kfree_rcu_drain_unlock(krcp, flags);
 	else
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
 static inline bool
@@ -3118,7 +3118,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	local_irq_save(flags);	// For safely calling this_cpu_ptr().
 	krcp = this_cpu_ptr(&krc);
 	if (krcp->initialized)
-		spin_lock(&krcp->lock);
+		raw_spin_lock(&krcp->lock);
 
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(head)) {
@@ -3149,7 +3149,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 unlock_return:
 	if (krcp->initialized)
-		spin_unlock(&krcp->lock);
+		raw_spin_unlock(&krcp->lock);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
@@ -3181,11 +3181,11 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count = krcp->count;
-		spin_lock_irqsave(&krcp->lock, flags);
+		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (krcp->monitor_todo)
 			kfree_rcu_drain_unlock(krcp, flags);
 		else
-			spin_unlock_irqrestore(&krcp->lock, flags);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 		sc->nr_to_scan -= count;
 		freed += count;
@@ -3212,15 +3212,15 @@ void __init kfree_rcu_scheduler_running(void)
 	for_each_online_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		spin_lock_irqsave(&krcp->lock, flags);
+		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (!krcp->head || krcp->monitor_todo) {
-			spin_unlock_irqrestore(&krcp->lock, flags);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 			continue;
 		}
 		krcp->monitor_todo = true;
 		schedule_delayed_work_on(cpu, &krcp->monitor_work,
 					 KFREE_DRAIN_JIFFIES);
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
 
@@ -4113,7 +4113,7 @@ static void __init kfree_rcu_batch_init(void)
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		spin_lock_init(&krcp->lock);
+		raw_spin_lock_init(&krcp->lock);
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
-- 
2.20.1




* [PATCH v2 02/16] rcu/tree: Skip entry into the page allocator for PREEMPT_RT
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 01/16] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 03/16] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Sebastian Andrzej Siewior

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

To keep the kfree_rcu() code working in purely atomic sections on RT,
such as non-threaded IRQ handlers and raw spinlock sections, avoid
calling into the page allocator which uses sleeping locks on RT.

In fact, even if the caller is preemptible, the kfree_rcu() code is
not, as the krcp->lock is a raw spinlock.

Calling into the page allocator is optional, and avoiding it should be
OK, especially with the page pre-allocation support in future patches.
Such pre-allocation would further reduce the need for a dynamically
allocated page in the first place.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Co-developed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6e967a9d6704..d28010cf158b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3068,6 +3068,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 		if (!bnode) {
 			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
 
+			/*
+			 * To keep this path working on raw non-preemptible
+			 * sections, prevent the optional entry into the
+			 * allocator as it uses sleeping locks. In fact, even
+			 * if the caller of kfree_rcu() is preemptible, this
+			 * path still is not, as krcp->lock is a raw spinlock.
+			 * With additional page pre-allocation in the works,
+			 * hitting this return is going to be much less likely.
+			 */
+			if (IS_ENABLED(CONFIG_PREEMPT_RT))
+				return false;
+
 			bnode = (struct kfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
-- 
2.20.1




* [PATCH v2 03/16] rcu/tree: Repeat the monitor if any free channel is busy
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 01/16] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 02/16] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 04/16] rcu/tree: Make debug_objects logic independent of rcu_head Uladzislau Rezki (Sony)
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

It is possible that one of the channels cannot be detached
because its free channel is busy and previously queued data
has not been processed yet. On the other hand, another
channel can be successfully detached, causing the monitor
work to stop.

Prevent that by rescheduling the monitor work if there are
any channels in the pending state after a detach attempt.

Fixes: 34c881745549e ("rcu: Support kfree_bulk() interface in kfree_rcu()")
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d28010cf158b..ae21e1bddd3e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2971,7 +2971,7 @@ static void kfree_rcu_work(struct work_struct *work)
 static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 {
 	struct kfree_rcu_cpu_work *krwp;
-	bool queued = false;
+	bool repeat = false;
 	int i;
 
 	lockdep_assert_held(&krcp->lock);
@@ -3009,11 +3009,14 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 			 * been detached following each other, one by one.
 			 */
 			queue_rcu_work(system_wq, &krwp->rcu_work);
-			queued = true;
 		}
+
+		/* Repeat if any "free" corresponding channel is still busy. */
+		if (krcp->bhead || krcp->head)
+			repeat = true;
 	}
 
-	return queued;
+	return !repeat;
 }
 
 static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
-- 
2.20.1




* [PATCH v2 04/16] rcu/tree: Make debug_objects logic independent of rcu_head
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (2 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 03/16] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 05/16] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

kfree_rcu()'s debug_objects logic uses the address of the object's
embedded rcu_head to queue/unqueue. Instead, use the object's own
address, in preparation for future headless kfree_rcu() support.

Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ae21e1bddd3e..776ccf858154 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2836,13 +2836,11 @@ EXPORT_SYMBOL_GPL(call_rcu);
  * @nr_records: Number of active pointers in the array
  * @records: Array of the kfree_rcu() pointers
  * @next: Next bulk object in the block chain
- * @head_free_debug: For debug, when CONFIG_DEBUG_OBJECTS_RCU_HEAD is set
  */
 struct kfree_rcu_bulk_data {
 	unsigned long nr_records;
 	void *records[KFREE_BULK_MAX_ENTR];
 	struct kfree_rcu_bulk_data *next;
-	struct rcu_head *head_free_debug;
 };
 
 /**
@@ -2892,11 +2890,13 @@ struct kfree_rcu_cpu {
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
 
 static __always_inline void
-debug_rcu_head_unqueue_bulk(struct rcu_head *head)
+debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
 {
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-	for (; head; head = head->next)
-		debug_rcu_head_unqueue(head);
+	int i;
+
+	for (i = 0; i < bhead->nr_records; i++)
+		debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i]));
 #endif
 }
 
@@ -2926,7 +2926,7 @@ static void kfree_rcu_work(struct work_struct *work)
 	for (; bhead; bhead = bnext) {
 		bnext = bhead->next;
 
-		debug_rcu_head_unqueue_bulk(bhead->head_free_debug);
+		debug_rcu_bhead_unqueue(bhead);
 
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
@@ -2948,14 +2948,15 @@ static void kfree_rcu_work(struct work_struct *work)
 	 */
 	for (; head; head = next) {
 		unsigned long offset = (unsigned long)head->func;
+		void *ptr = (void *)head - offset;
 
 		next = head->next;
-		debug_rcu_head_unqueue(head);
+		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
-			kfree((void *)head - offset);
+			kfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
@@ -3094,18 +3095,11 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
 		bnode->next = krcp->bhead;
-		bnode->head_free_debug = NULL;
 
 		/* Attach it to the head. */
 		krcp->bhead = bnode;
 	}
 
-#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-	head->func = func;
-	head->next = krcp->bhead->head_free_debug;
-	krcp->bhead->head_free_debug = head;
-#endif
-
 	/* Finally insert. */
 	krcp->bhead->records[krcp->bhead->nr_records++] =
 		(void *) head - (unsigned long) func;
@@ -3129,14 +3123,17 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
+	void *ptr;
 
 	local_irq_save(flags);	// For safely calling this_cpu_ptr().
 	krcp = this_cpu_ptr(&krc);
 	if (krcp->initialized)
 		raw_spin_lock(&krcp->lock);
 
+	ptr = (void *)head - (unsigned long)func;
+
 	// Queue the object but don't yet schedule the batch.
-	if (debug_rcu_head_queue(head)) {
+	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
-- 
2.20.1




* [PATCH v2 05/16] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (3 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 04/16] rcu/tree: Make debug_objects logic independent of rcu_head Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 06/16] rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions Uladzislau Rezki (Sony)
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Boqun Feng

We can simplify the KFREE_BULK_MAX_ENTR macro and get rid of the
magic numbers that were used to make the structure exactly one
page in size.
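
As a rough sketch of the sizing relationship the new macro expresses
(the structure layout is taken from this patch; the numbers assume
4 KiB pages and 8-byte pointers):

	struct kfree_rcu_bulk_data {
		unsigned long nr_records;
		struct kfree_rcu_bulk_data *next;
		void *records[];	/* flexible array member */
	};

	#define KFREE_BULK_MAX_ENTR \
		((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))

	/*
	 * With PAGE_SIZE == 4096 and 8-byte pointers the header is 16 bytes,
	 * so KFREE_BULK_MAX_ENTR == (4096 - 16) / 8 == 510, and the block is
	 * exactly one page: 16 + 510 * 8 == 4096.
	 */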

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 776ccf858154..6501fbcae3c7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2824,13 +2824,6 @@ EXPORT_SYMBOL_GPL(call_rcu);
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
 #define KFREE_N_BATCHES 2
 
-/*
- * This macro defines how many entries the "records" array
- * will contain. It is based on the fact that the size of
- * kfree_rcu_bulk_data structure becomes exactly one page.
- */
-#define KFREE_BULK_MAX_ENTR ((PAGE_SIZE / sizeof(void *)) - 3)
-
 /**
  * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
  * @nr_records: Number of active pointers in the array
@@ -2839,10 +2832,18 @@ EXPORT_SYMBOL_GPL(call_rcu);
  */
 struct kfree_rcu_bulk_data {
 	unsigned long nr_records;
-	void *records[KFREE_BULK_MAX_ENTR];
 	struct kfree_rcu_bulk_data *next;
+	void *records[];
 };
 
+/*
+ * This macro defines how many entries the "records" array
+ * will contain. It is based on the fact that the size of
+ * kfree_rcu_bulk_data structure becomes exactly one page.
+ */
+#define KFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))
+
 /**
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
-- 
2.20.1




* [PATCH v2 06/16] rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (4 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 05/16] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 07/16] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce helpers to lock and unlock per-cpu "kfree_rcu_cpu"
structures. That will make kfree_call_rcu() more readable
and prevent programming errors.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6501fbcae3c7..c1f4b740cf14 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2901,6 +2901,27 @@ debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
 #endif
 }
 
+static inline struct kfree_rcu_cpu *
+krc_this_cpu_lock(unsigned long *flags)
+{
+	struct kfree_rcu_cpu *krcp;
+
+	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
+	krcp = this_cpu_ptr(&krc);
+	if (likely(krcp->initialized))
+		raw_spin_lock(&krcp->lock);
+
+	return krcp;
+}
+
+static inline void
+krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
+{
+	if (likely(krcp->initialized))
+		raw_spin_unlock(&krcp->lock);
+	local_irq_restore(flags);
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -3126,11 +3147,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	struct kfree_rcu_cpu *krcp;
 	void *ptr;
 
-	local_irq_save(flags);	// For safely calling this_cpu_ptr().
-	krcp = this_cpu_ptr(&krc);
-	if (krcp->initialized)
-		raw_spin_lock(&krcp->lock);
-
+	krcp = krc_this_cpu_lock(&flags);
 	ptr = (void *)head - (unsigned long)func;
 
 	// Queue the object but don't yet schedule the batch.
@@ -3161,9 +3178,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 unlock_return:
-	if (krcp->initialized)
-		raw_spin_unlock(&krcp->lock);
-	local_irq_restore(flags);
+	krc_this_cpu_unlock(krcp, flags);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
-- 
2.20.1




* [PATCH v2 07/16] rcu/tree: Use static initializer for krc.lock
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (5 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 06/16] rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 08/16] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The per-CPU variable is initialized at runtime in
kfree_rcu_batch_init(). This function is invoked before
'rcu_scheduler_active' is set to 'RCU_SCHEDULER_RUNNING'.
After the initialisation, '->initialized' is set to true.

The raw_spin_lock is only acquired if '->initialized' is
set to true. The workqueue item is only used if 'rcu_scheduler_active'
is set to RCU_SCHEDULER_RUNNING, which happens after initialisation.

Use a static initializer for krc.lock and remove the runtime
initialisation of the lock. Since the lock can now always be
acquired, remove the '->initialized' check.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1f4b740cf14..4b1710c1d8f6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2868,7 +2868,7 @@ struct kfree_rcu_cpu_work {
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
  * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
- * @initialized: The @lock and @rcu_work fields have been initialized
+ * @initialized: The @rcu_work fields have been initialized
  * @count: Number of objects for which GP not started
  *
  * This is a per-CPU structure.  The reason that it is not included in
@@ -2888,7 +2888,9 @@ struct kfree_rcu_cpu {
 	int count;
 };
 
-static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
+};
 
 static __always_inline void
 debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
@@ -2908,8 +2910,7 @@ krc_this_cpu_lock(unsigned long *flags)
 
 	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
 	krcp = this_cpu_ptr(&krc);
-	if (likely(krcp->initialized))
-		raw_spin_lock(&krcp->lock);
+	raw_spin_lock(&krcp->lock);
 
 	return krcp;
 }
@@ -2917,8 +2918,7 @@ krc_this_cpu_lock(unsigned long *flags)
 static inline void
 krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 {
-	if (likely(krcp->initialized))
-		raw_spin_unlock(&krcp->lock);
+	raw_spin_unlock(&krcp->lock);
 	local_irq_restore(flags);
 }
 
@@ -4141,7 +4141,6 @@ static void __init kfree_rcu_batch_init(void)
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		raw_spin_lock_init(&krcp->lock);
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
-- 
2.20.1




* [PATCH v2 08/16] rcu/tree: cache specified number of objects
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (6 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 07/16] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

In order to reduce the dynamic need for pages in kfree_rcu(),
pre-allocate a configurable number of pages per CPU and link
them in a list. When kfree_rcu() reclaims objects, the object's
container page is cached into a list instead of being released
to the low-level page allocator.

Such an approach provides O(1) access to free pages while also
reducing the number of requests to the page allocator. It also
ensures that the kfree_rcu() code has free pages available during
low-memory conditions.

A read-only sysfs parameter (rcu_min_cached_objs) reflects the
minimum number of allowed cached pages per CPU.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 .../admin-guide/kernel-parameters.txt         |  8 +++
 kernel/rcu/tree.c                             | 66 +++++++++++++++++--
 2 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ea4228779c28..b90af44ee81d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3977,6 +3977,14 @@
 			latencies, which will choose a value aligned
 			with the appropriate hardware boundaries.
 
+	rcutree.rcu_min_cached_objs= [KNL]
+			Minimum number of objects which are cached and
+			maintained per one CPU. Object size is equal
+			to PAGE_SIZE. The cache allows to reduce the
+			pressure to page allocator, also it makes the
+			whole algorithm to behave better in low memory
+			condition.
+
 	rcutree.jiffies_till_first_fqs= [KNL]
 			Set delay from grace-period initialization to
 			first attempt to force quiescent states.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4b1710c1d8f6..e2267e92de5d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -167,6 +167,15 @@ module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
 
+/*
+ * This rcu parameter is runtime-read-only. It reflects
+ * a minimum allowed number of objects which can be cached
+ * per-CPU. Object size is equal to one page. This value
+ * can be changed at boot time.
+ */
+static int rcu_min_cached_objs = 2;
+module_param(rcu_min_cached_objs, int, 0444);
+
 /* Retrieve RCU kthreads priority for rcutorture */
 int rcu_get_gp_kthreads_prio(void)
 {
@@ -2863,7 +2872,6 @@ struct kfree_rcu_cpu_work {
  * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
  * @head: List of kfree_rcu() objects not yet waiting for a grace period
  * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
- * @bcached: Keeps at most one object for later reuse when build chain blocks
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
@@ -2879,13 +2887,22 @@ struct kfree_rcu_cpu_work {
 struct kfree_rcu_cpu {
 	struct rcu_head *head;
 	struct kfree_rcu_bulk_data *bhead;
-	struct kfree_rcu_bulk_data *bcached;
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
 	bool monitor_todo;
 	bool initialized;
 	int count;
+
+	/*
+	 * A simple cache list that contains objects for
+	 * reuse purpose. In order to save some per-cpu
+	 * space the list is singular. Even though it is
+	 * lockless an access has to be protected by the
+	 * per-cpu lock.
+	 */
+	struct llist_head bkvcache;
+	int nr_bkv_objs;
 };
 
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
@@ -2922,6 +2939,31 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 	local_irq_restore(flags);
 }
 
+static inline struct kfree_rcu_bulk_data *
+get_cached_bnode(struct kfree_rcu_cpu *krcp)
+{
+	if (!krcp->nr_bkv_objs)
+		return NULL;
+
+	krcp->nr_bkv_objs--;
+	return (struct kfree_rcu_bulk_data *)
+		llist_del_first(&krcp->bkvcache);
+}
+
+static inline bool
+put_cached_bnode(struct kfree_rcu_cpu *krcp,
+	struct kfree_rcu_bulk_data *bnode)
+{
+	// Check the limit.
+	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
+		return false;
+
+	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
+	krcp->nr_bkv_objs++;
+	return true;
+
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -2957,7 +2999,12 @@ static void kfree_rcu_work(struct work_struct *work)
 		kfree_bulk(bhead->nr_records, bhead->records);
 		rcu_lock_release(&rcu_callback_map);
 
-		if (cmpxchg(&krcp->bcached, NULL, bhead))
+		krcp = krc_this_cpu_lock(&flags);
+		if (put_cached_bnode(krcp, bhead))
+			bhead = NULL;
+		krc_this_cpu_unlock(krcp, flags);
+
+		if (bhead)
 			free_page((unsigned long) bhead);
 
 		cond_resched_tasks_rcu_qs();
@@ -3090,7 +3137,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 	/* Check if a new block is required. */
 	if (!krcp->bhead ||
 			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
-		bnode = xchg(&krcp->bcached, NULL);
+		bnode = get_cached_bnode(krcp);
 		if (!bnode) {
 			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
 
@@ -4140,12 +4187,23 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+		struct kfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
+		for (i = 0; i < rcu_min_cached_objs; i++) {
+			bnode = (struct kfree_rcu_bulk_data *)
+				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+			if (bnode)
+				put_cached_bnode(krcp, bnode);
+			else
+				pr_err("Failed to preallocate for %d CPU!\n", cpu);
+		}
+
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}
-- 
2.20.1




* [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (7 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 08/16] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-06-17 23:46   ` Paul E. McKenney
  2020-05-25 21:47 ` [PATCH v2 10/16] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

To do so, we use an array of kvfree_rcu_bulk_data structures.
It consists of two elements:
 - index number 0 corresponds to slab pointers.
 - index number 1 corresponds to vmalloc pointers.

Keeping vmalloc pointers separated from slab pointers makes
it possible to invoke the right freeing API for the right
kind of pointer.

It also prepares us for future headless support for vmalloc
and SLAB objects. Such objects cannot be queued on a linked
list and are instead placed directly into an array.
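
Condensed from the diff below, the core of the new routing looks
roughly like this (simplified fragment; locking, cache refill and
error handling omitted):

	/* Queueing side: slab pointers use channel 0, vmalloc pointers channel 1. */
	idx = !!is_vmalloc_addr(ptr);
	krcp->bkvhead[idx]->records[krcp->bkvhead[idx]->nr_records++] = ptr;

	/* Reclaim side: each channel is drained with the matching free routine. */
	if (idx == 0)
		kfree_bulk(bnode->nr_records, bnode->records);
	else
		for (j = 0; j < bnode->nr_records; j++)
			vfree(bnode->records[j]);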

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 173 +++++++++++++++++++++++++++-------------------
 1 file changed, 100 insertions(+), 73 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e2267e92de5d..9f84ff80bc25 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -57,6 +57,8 @@
 #include <linux/slab.h>
 #include <linux/sched/isolation.h>
 #include <linux/sched/clock.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
 #include "../time/tick-internal.h"
 
 #include "tree.h"
@@ -2832,46 +2834,47 @@ EXPORT_SYMBOL_GPL(call_rcu);
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
 #define KFREE_N_BATCHES 2
+#define FREE_N_CHANNELS 2
 
 /**
- * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
+ * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
  * @nr_records: Number of active pointers in the array
- * @records: Array of the kfree_rcu() pointers
  * @next: Next bulk object in the block chain
+ * @records: Array of the kvfree_rcu() pointers
  */
-struct kfree_rcu_bulk_data {
+struct kvfree_rcu_bulk_data {
 	unsigned long nr_records;
-	struct kfree_rcu_bulk_data *next;
+	struct kvfree_rcu_bulk_data *next;
 	void *records[];
 };
 
 /*
  * This macro defines how many entries the "records" array
  * will contain. It is based on the fact that the size of
- * kfree_rcu_bulk_data structure becomes exactly one page.
+ * kvfree_rcu_bulk_data structure becomes exactly one page.
  */
-#define KFREE_BULK_MAX_ENTR \
-	((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))
+#define KVFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
 
 /**
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
  * @head_free: List of kfree_rcu() objects waiting for a grace period
- * @bhead_free: Bulk-List of kfree_rcu() objects waiting for a grace period
+ * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
  * @krcp: Pointer to @kfree_rcu_cpu structure
  */
 
 struct kfree_rcu_cpu_work {
 	struct rcu_work rcu_work;
 	struct rcu_head *head_free;
-	struct kfree_rcu_bulk_data *bhead_free;
+	struct kvfree_rcu_bulk_data *bkvhead_free[FREE_N_CHANNELS];
 	struct kfree_rcu_cpu *krcp;
 };
 
 /**
  * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
  * @head: List of kfree_rcu() objects not yet waiting for a grace period
- * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
+ * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
@@ -2886,7 +2889,7 @@ struct kfree_rcu_cpu_work {
  */
 struct kfree_rcu_cpu {
 	struct rcu_head *head;
-	struct kfree_rcu_bulk_data *bhead;
+	struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS];
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
@@ -2910,7 +2913,7 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
 };
 
 static __always_inline void
-debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
+debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
 {
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
 	int i;
@@ -2939,20 +2942,20 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 	local_irq_restore(flags);
 }
 
-static inline struct kfree_rcu_bulk_data *
+static inline struct kvfree_rcu_bulk_data *
 get_cached_bnode(struct kfree_rcu_cpu *krcp)
 {
 	if (!krcp->nr_bkv_objs)
 		return NULL;
 
 	krcp->nr_bkv_objs--;
-	return (struct kfree_rcu_bulk_data *)
+	return (struct kvfree_rcu_bulk_data *)
 		llist_del_first(&krcp->bkvcache);
 }
 
 static inline bool
 put_cached_bnode(struct kfree_rcu_cpu *krcp,
-	struct kfree_rcu_bulk_data *bnode)
+	struct kvfree_rcu_bulk_data *bnode)
 {
 	// Check the limit.
 	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
@@ -2971,43 +2974,63 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 static void kfree_rcu_work(struct work_struct *work)
 {
 	unsigned long flags;
+	struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS], *bnext;
 	struct rcu_head *head, *next;
-	struct kfree_rcu_bulk_data *bhead, *bnext;
 	struct kfree_rcu_cpu *krcp;
 	struct kfree_rcu_cpu_work *krwp;
+	int i, j;
 
 	krwp = container_of(to_rcu_work(work),
 			    struct kfree_rcu_cpu_work, rcu_work);
 	krcp = krwp->krcp;
+
 	raw_spin_lock_irqsave(&krcp->lock, flags);
+	// Channels 1 and 2.
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		bkvhead[i] = krwp->bkvhead_free[i];
+		krwp->bkvhead_free[i] = NULL;
+	}
+
+	// Channel 3.
 	head = krwp->head_free;
 	krwp->head_free = NULL;
-	bhead = krwp->bhead_free;
-	krwp->bhead_free = NULL;
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
-	/* "bhead" is now private, so traverse locklessly. */
-	for (; bhead; bhead = bnext) {
-		bnext = bhead->next;
-
-		debug_rcu_bhead_unqueue(bhead);
-
-		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
-			bhead->nr_records, bhead->records);
-
-		kfree_bulk(bhead->nr_records, bhead->records);
-		rcu_lock_release(&rcu_callback_map);
+	// Handle two first channels.
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		for (; bkvhead[i]; bkvhead[i] = bnext) {
+			bnext = bkvhead[i]->next;
+			debug_rcu_bhead_unqueue(bkvhead[i]);
+
+			rcu_lock_acquire(&rcu_callback_map);
+			if (i == 0) { // kmalloc() / kfree().
+				trace_rcu_invoke_kfree_bulk_callback(
+					rcu_state.name, bkvhead[i]->nr_records,
+					bkvhead[i]->records);
+
+				kfree_bulk(bkvhead[i]->nr_records,
+					bkvhead[i]->records);
+			} else { // vmalloc() / vfree().
+				for (j = 0; j < bkvhead[i]->nr_records; j++) {
+					trace_rcu_invoke_kfree_callback(
+						rcu_state.name,
+						bkvhead[i]->records[j], 0);
+
+					vfree(bkvhead[i]->records[j]);
+				}
+			}
+			rcu_lock_release(&rcu_callback_map);
 
-		krcp = krc_this_cpu_lock(&flags);
-		if (put_cached_bnode(krcp, bhead))
-			bhead = NULL;
-		krc_this_cpu_unlock(krcp, flags);
+			krcp = krc_this_cpu_lock(&flags);
+			if (put_cached_bnode(krcp, bkvhead[i]))
+				bkvhead[i] = NULL;
+			krc_this_cpu_unlock(krcp, flags);
 
-		if (bhead)
-			free_page((unsigned long) bhead);
+			if (bkvhead[i])
+				free_page((unsigned long) bkvhead[i]);
 
-		cond_resched_tasks_rcu_qs();
+			cond_resched_tasks_rcu_qs();
+		}
 	}
 
 	/*
@@ -3025,7 +3048,7 @@ static void kfree_rcu_work(struct work_struct *work)
 		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
-			kfree(ptr);
+			kvfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
@@ -3042,7 +3065,7 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 {
 	struct kfree_rcu_cpu_work *krwp;
 	bool repeat = false;
-	int i;
+	int i, j;
 
 	lockdep_assert_held(&krcp->lock);
 
@@ -3050,21 +3073,25 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 		krwp = &(krcp->krw_arr[i]);
 
 		/*
-		 * Try to detach bhead or head and attach it over any
+		 * Try to detach bkvhead or head and attach it over any
 		 * available corresponding free channel. It can be that
 		 * a previous RCU batch is in progress, it means that
 		 * immediately to queue another one is not possible so
 		 * return false to tell caller to retry.
 		 */
-		if ((krcp->bhead && !krwp->bhead_free) ||
+		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
+			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
 				(krcp->head && !krwp->head_free)) {
-			/* Channel 1. */
-			if (!krwp->bhead_free) {
-				krwp->bhead_free = krcp->bhead;
-				krcp->bhead = NULL;
+			// Channel 1 corresponds to SLAB ptrs.
+			// Channel 2 corresponds to vmalloc ptrs.
+			for (j = 0; j < FREE_N_CHANNELS; j++) {
+				if (!krwp->bkvhead_free[j]) {
+					krwp->bkvhead_free[j] = krcp->bkvhead[j];
+					krcp->bkvhead[j] = NULL;
+				}
 			}
 
-			/* Channel 2. */
+			// Channel 3 corresponds to emergency path.
 			if (!krwp->head_free) {
 				krwp->head_free = krcp->head;
 				krcp->head = NULL;
@@ -3073,16 +3100,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 			WRITE_ONCE(krcp->count, 0);
 
 			/*
-			 * One work is per one batch, so there are two "free channels",
-			 * "bhead_free" and "head_free" the batch can handle. It can be
-			 * that the work is in the pending state when two channels have
-			 * been detached following each other, one by one.
+			 * One work is per one batch, so there are three
+			 * "free channels", the batch can handle. It can
+			 * be that the work is in the pending state when
+			 * channels have been detached following by each
+			 * other.
 			 */
 			queue_rcu_work(system_wq, &krwp->rcu_work);
 		}
 
-		/* Repeat if any "free" corresponding channel is still busy. */
-		if (krcp->bhead || krcp->head)
+		// Repeat if any "free" corresponding channel is still busy.
+		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
 			repeat = true;
 	}
 
@@ -3124,23 +3152,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 
 static inline bool
-kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
-	struct rcu_head *head, rcu_callback_t func)
+kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 {
-	struct kfree_rcu_bulk_data *bnode;
+	struct kvfree_rcu_bulk_data *bnode;
+	int idx;
 
 	if (unlikely(!krcp->initialized))
 		return false;
 
 	lockdep_assert_held(&krcp->lock);
+	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bhead ||
-			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
+	if (!krcp->bkvhead[idx] ||
+			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
 		bnode = get_cached_bnode(krcp);
 		if (!bnode) {
-			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
-
 			/*
 			 * To keep this path working on raw non-preemptible
 			 * sections, prevent the optional entry into the
@@ -3153,7 +3180,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 			if (IS_ENABLED(CONFIG_PREEMPT_RT))
 				return false;
 
-			bnode = (struct kfree_rcu_bulk_data *)
+			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
 
@@ -3163,30 +3190,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bhead;
+		bnode->next = krcp->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bhead = bnode;
+		krcp->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bhead->records[krcp->bhead->nr_records++] =
-		(void *) head - (unsigned long) func;
+	krcp->bkvhead[idx]->records
+		[krcp->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
 
 /*
- * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
- * period. Please note there are two paths are maintained, one is the main one
- * that uses kfree_bulk() interface and second one is emergency one, that is
- * used only when the main path can not be maintained temporary, due to memory
- * pressure.
+ * Queue a request for lazy invocation of appropriate free routine after a
+ * grace period. Please note there are three paths are maintained, two are the
+ * main ones that use array of pointers interface and third one is emergency
+ * one, that is used only when the main path can not be maintained temporary,
+ * due to memory pressure.
  *
  * Each kfree_call_rcu() request is added to a batch. The batch will be drained
  * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
  * be free'd in workqueue context. This allows us to: batch requests together to
- * reduce the number of grace periods during heavy kfree_rcu() load.
+ * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
  */
 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
@@ -3209,7 +3236,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
+	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
 		head->func = func;
 		head->next = krcp->head;
 		krcp->head = head;
@@ -4187,7 +4214,7 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kfree_rcu_bulk_data *bnode;
+		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
@@ -4195,7 +4222,7 @@ static void __init kfree_rcu_batch_init(void)
 		}
 
 		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kfree_rcu_bulk_data *)
+			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 
 			if (bnode)
-- 
2.20.1




* [PATCH v2 10/16] rcu/tiny: support vmalloc in tiny-RCU
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (8 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 11/16] rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_* Uladzislau Rezki (Sony)
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Replace kfree() with kvfree() in rcu_reclaim_tiny().
This makes it possible to release either SLAB or vmalloc
objects after a GP.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tiny.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index dd572ce7c747..4b99f7b88bee 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -23,6 +23,7 @@
 #include <linux/cpu.h>
 #include <linux/prefetch.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 #include "rcu.h"
 
@@ -86,7 +87,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 	rcu_lock_acquire(&rcu_callback_map);
 	if (__is_kfree_rcu_offset(offset)) {
 		trace_rcu_invoke_kfree_callback("", head, offset);
-		kfree((void *)head - offset);
+		kvfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
 		return true;
 	}
-- 
2.20.1




* [PATCH v2 11/16] rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_*
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (9 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 10/16] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 12/16] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

The following changes are introduced:

1. Rename rcu_invoke_kfree_callback() to rcu_invoke_kvfree_callback(),
as well as the associated trace events, so rcu_kfree_callback()
becomes rcu_kvfree_callback(). The reason is to align with the
kvfree() notation.

2. Rename __is_kfree_rcu_offset() to __is_kvfree_rcu_offset(). All RCU
paths now use kvfree() instead of kfree(), so rename it accordingly.

3. Rename kfree_call_rcu() to kvfree_call_rcu(), since it is now
capable of freeing vmalloc() memory as well. Do the same with the
__kfree_rcu() macro, which becomes __kvfree_rcu(), for the same
reason.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcupdate.h   | 14 +++++++-------
 include/linux/rcutiny.h    |  2 +-
 include/linux/rcutree.h    |  2 +-
 include/trace/events/rcu.h |  8 ++++----
 kernel/rcu/tiny.c          |  4 ++--
 kernel/rcu/tree.c          | 16 ++++++++--------
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 659cbfa7581a..b344fc800a9b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -828,17 +828,17 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 
 /*
  * Does the specified offset indicate that the corresponding rcu_head
- * structure can be handled by kfree_rcu()?
+ * structure can be handled by kvfree_rcu()?
  */
-#define __is_kfree_rcu_offset(offset) ((offset) < 4096)
+#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
 
 /*
  * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
  */
-#define __kfree_rcu(head, offset) \
+#define __kvfree_rcu(head, offset) \
 	do { \
-		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
+		BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
+		kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
 	} while (0)
 
 /**
@@ -857,7 +857,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * Because the functions are not allowed in the low-order 4096 bytes of
  * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
  * If the offset is larger than 4095 bytes, a compile-time error will
- * be generated in __kfree_rcu().  If this error is triggered, you can
+ * be generated in __kvfree_rcu(). If this error is triggered, you can
  * either fall back to use of call_rcu() or rearrange the structure to
  * position the rcu_head structure into the first 4096 bytes.
  *
@@ -872,7 +872,7 @@ do {									\
 	typeof (ptr) ___p = (ptr);					\
 									\
 	if (___p)							\
-		__kfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
+		__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
 } while (0)
 
 /*
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 8b851904efed..00bbd0e328c8 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -31,7 +31,7 @@ static inline void synchronize_rcu_expedited(void)
 	synchronize_rcu();
 }
 
-static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	call_rcu(head, func);
 }
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 9366fa4d0717..da07f8dc05c6 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(int cpu)
 }
 
 void synchronize_rcu_expedited(void);
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 
 void rcu_barrier(void);
 bool rcu_eqs_special_set(int cpu);
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 02dcd119f326..9d34ee810894 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -509,13 +509,13 @@ TRACE_EVENT_RCU(rcu_callback,
 
 /*
  * Tracepoint for the registration of a single RCU callback of the special
- * kfree() form.  The first argument is the RCU type, the second argument
+ * kvfree() form.  The first argument is the RCU type, the second argument
  * is a pointer to the RCU callback, the third argument is the offset
  * of the callback within the enclosing RCU-protected data structure,
  * the fourth argument is the number of lazy callbacks queued, and the
  * fifth argument is the total number of callbacks queued.
  */
-TRACE_EVENT_RCU(rcu_kfree_callback,
+TRACE_EVENT_RCU(rcu_kvfree_callback,
 
 	TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
 		 long qlen),
@@ -599,12 +599,12 @@ TRACE_EVENT_RCU(rcu_invoke_callback,
 
 /*
  * Tracepoint for the invocation of a single RCU callback of the special
- * kfree() form.  The first argument is the RCU flavor, the second
+ * kvfree() form.  The first argument is the RCU flavor, the second
  * argument is a pointer to the RCU callback, and the third argument
  * is the offset of the callback within the enclosing RCU-protected
  * data structure.
  */
-TRACE_EVENT_RCU(rcu_invoke_kfree_callback,
+TRACE_EVENT_RCU(rcu_invoke_kvfree_callback,
 
 	TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset),
 
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 4b99f7b88bee..aa897c3f2e92 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -85,8 +85,8 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
-	if (__is_kfree_rcu_offset(offset)) {
-		trace_rcu_invoke_kfree_callback("", head, offset);
+	if (__is_kvfree_rcu_offset(offset)) {
+		trace_rcu_invoke_kvfree_callback("", head, offset);
 		kvfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
 		return true;
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9f84ff80bc25..da29e6078392 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2771,8 +2771,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
-	if (__is_kfree_rcu_offset((unsigned long)func))
-		trace_rcu_kfree_callback(rcu_state.name, head,
+	if (__is_kvfree_rcu_offset((unsigned long)func))
+		trace_rcu_kvfree_callback(rcu_state.name, head,
 					 (unsigned long)func,
 					 rcu_segcblist_n_cbs(&rdp->cblist));
 	else
@@ -3012,7 +3012,7 @@ static void kfree_rcu_work(struct work_struct *work)
 					bkvhead[i]->records);
 			} else { // vmalloc() / vfree().
 				for (j = 0; j < bkvhead[i]->nr_records; j++) {
-					trace_rcu_invoke_kfree_callback(
+					trace_rcu_invoke_kvfree_callback(
 						rcu_state.name,
 						bkvhead[i]->records[j], 0);
 
@@ -3045,9 +3045,9 @@ static void kfree_rcu_work(struct work_struct *work)
 		next = head->next;
 		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
+		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
 
-		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
+		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
 			kvfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
@@ -3210,12 +3210,12 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
  * one, that is used only when the main path can not be maintained temporary,
  * due to memory pressure.
  *
- * Each kfree_call_rcu() request is added to a batch. The batch will be drained
+ * Each kvfree_call_rcu() request is added to a batch. The batch will be drained
  * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
  * be free'd in workqueue context. This allows us to: batch requests together to
  * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
  */
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
@@ -3254,7 +3254,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
 }
-EXPORT_SYMBOL_GPL(kfree_call_rcu);
+EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 12/16] mm/list_lru.c: Rename kvfree_rcu() to local variant
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (10 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 11/16] rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_* Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 13/16] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Rename the kvfree_rcu() function to kvfree_rcu_local(). The
purpose is to prevent a conflict between two declarations with
the same name: kvfree_rcu() will become globally visible, which
would lead to a build error. No functional change.

Cc: linux-mm@kvack.org
Cc: rcu@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/list_lru.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 4d5294c39bba..42c95bcb53ca 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -373,14 +373,14 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
 	struct list_lru_memcg *memcg_lrus;
 	/*
 	 * This is called when shrinker has already been unregistered,
-	 * and nobody can use it. So, there is no need to use kvfree_rcu().
+	 * and nobody can use it. So, there is no need to use kvfree_rcu_local().
 	 */
 	memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
 	__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
 	kvfree(memcg_lrus);
 }
 
-static void kvfree_rcu(struct rcu_head *head)
+static void kvfree_rcu_local(struct rcu_head *head)
 {
 	struct list_lru_memcg *mlru;
 
@@ -419,7 +419,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
 	rcu_assign_pointer(nlru->memcg_lrus, new);
 	spin_unlock_irq(&nlru->lock);
 
-	call_rcu(&old->rcu, kvfree_rcu);
+	call_rcu(&old->rcu, kvfree_rcu_local);
 	return 0;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 13/16] rcu: Introduce 2 arg kvfree_rcu() interface
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (11 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 12/16] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 14/16] rcu: Support reclaim for head-less object Uladzislau Rezki (Sony)
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

kvmalloc() can allocate two types of objects: SLAB-backed and
vmalloc-backed. How it behaves depends on the requested object's
size and on memory pressure.

Add a kvfree_rcu() interface that can free memory allocated
via kvmalloc(). It is a simple alias to kfree_rcu() which
can now handle either type of object.

<snip>
    struct test_kvfree_rcu {
        struct rcu_head rcu;
        unsigned char array[100];
    };

    struct test_kvfree_rcu *p;

    p = kvmalloc(10 * PAGE_SIZE, GFP_KERNEL);
    if (p)
        kvfree_rcu(p, rcu);
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcupdate.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index b344fc800a9b..51b26ab02878 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -875,6 +875,15 @@ do {									\
 		__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
 } while (0)
 
+/**
+ * kvfree_rcu() - kvfree an object after a grace period.
+ * @ptr:	pointer to kvfree
+ * @rhf:	the name of the struct rcu_head within the type of @ptr.
+ *
+ * Same as kfree_rcu(), just simple alias.
+ */
+#define kvfree_rcu(ptr, rhf) kfree_rcu(ptr, rhf)
+
 /*
  * Place this after a lock-acquisition primitive to guarantee that
  * an UNLOCK+LOCK pair acts as a full barrier.  This guarantee applies
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 14/16] rcu: Support reclaim for head-less object
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (12 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 13/16] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:47 ` [PATCH v2 15/16] rcu: Introduce single argument kvfree_rcu() interface Uladzislau Rezki (Sony)
  2020-05-25 21:48 ` [PATCH v2 16/16] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Update the kvfree_call_rcu() function with head-less support.
This allows RCU to reclaim objects without an embedded rcu_head.

tree-RCU:
We introduce two chains of arrays, one for SLAB-backed pointers and
one for vmalloc pointers. Storage in either of these arrays does not
require embedding an rcu_head within the object.

Maintaining the arrays may become impossible due to high memory
pressure. For such cases there is an emergency path. Objects with
an rcu_head inside are simply queued on a backup rcu_head list;
that list is drained later. For the head-less variant, since the
current context can sleep, the following emergency measures are
applied:
   a) Synchronously wait until a grace period has elapsed.
   b) Call kvfree().

tiny-RCU:
For double-argument calls, behavior is unchanged. For single-argument
calls, kvfree() is invoked inline on the current stack after a
synchronize_rcu() call. Note that for tiny-RCU any call to
synchronize_rcu() is actually a quiescent state, therefore it does
nothing.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcutiny.h | 18 ++++++++++++++++-
 kernel/rcu/tree.c       | 45 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 00bbd0e328c8..e7e4ec5938af 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -31,9 +31,25 @@ static inline void synchronize_rcu_expedited(void)
 	synchronize_rcu();
 }
 
+/*
+ * Add one more declaration of kvfree() here. It is
+ * not so straight forward to just include <linux/mm.h>
+ * where it is defined due to getting many compile
+ * errors caused by that include.
+ */
+extern void kvfree(const void *addr);
+
 static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	call_rcu(head, func);
+	if (head) {
+		call_rcu(head, func);
+		return;
+	}
+
+	// kvfree_rcu(one_arg) call.
+	might_sleep();
+	synchronize_rcu();
+	kvfree((void *) func);
 }
 
 void rcu_qs(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da29e6078392..2459830a3851 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3180,6 +3180,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 			if (IS_ENABLED(CONFIG_PREEMPT_RT))
 				return false;
 
+			/*
+			 * NOTE: For one argument of kvfree_rcu() we can
+			 * drop the lock and get the page in sleepable
+			 * context. That would allow to maintain an array
+			 * for the CONFIG_PREEMPT_RT as well if no cached
+			 * pages are available.
+			 */
 			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
@@ -3219,16 +3226,33 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
+	bool success;
 	void *ptr;
 
+	if (head) {
+		ptr = (void *) head - (unsigned long) func;
+	} else {
+		/*
+		 * Please note there is a limitation for the head-less
+		 * variant, that is why there is a clear rule for such
+		 * objects: it can be used from might_sleep() context
+		 * only. For other places please embed an rcu_head to
+		 * your data.
+		 */
+		might_sleep();
+		ptr = (unsigned long *) func;
+	}
+
 	krcp = krc_this_cpu_lock(&flags);
-	ptr = (void *)head - (unsigned long)func;
 
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
+
+		// Mark as success and leave.
+		success = true;
 		goto unlock_return;
 	}
 
@@ -3236,10 +3260,16 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
+	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	if (!success) {
+		if (head == NULL)
+			// Inline if kvfree_rcu(one_arg) call.
+			goto unlock_return;
+
 		head->func = func;
 		head->next = krcp->head;
 		krcp->head = head;
+		success = true;
 	}
 
 	WRITE_ONCE(krcp->count, krcp->count + 1);
@@ -3253,6 +3283,17 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
+
+	/*
+	 * Inline kvfree() after synchronize_rcu(). We can do
+	 * it from might_sleep() context only, so the current
+	 * CPU can pass the QS state.
+	 */
+	if (!success) {
+		debug_rcu_head_unqueue((struct rcu_head *) ptr);
+		synchronize_rcu();
+		kvfree(ptr);
+	}
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 15/16] rcu: Introduce single argument kvfree_rcu() interface
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (13 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 14/16] rcu: Support reclaim for head-less object Uladzislau Rezki (Sony)
@ 2020-05-25 21:47 ` Uladzislau Rezki (Sony)
  2020-05-25 21:48 ` [PATCH v2 16/16] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:47 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Make kvfree_rcu() capable of freeing objects that do not embed an
rcu_head within them. This saves storage overhead in such objects.
Reclaiming headless objects this way requires only a single argument
(a pointer to the object).

After this patch, there are two ways to use kvfree_rcu():

a) kvfree_rcu(ptr, rhf);
    struct X {
        struct rcu_head rhf;
        unsigned char data[100];
    };

    void *ptr = kvmalloc(sizeof(struct X), GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr, rhf);

b) kvfree_rcu(ptr);
    void *ptr = kvmalloc(some_bytes, GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr);

Note that the headless usage (example b) can only be used from a
context that can sleep. This is enforced by the
CONFIG_DEBUG_ATOMIC_SLEEP option.
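
For reference, the argument-count selection used by the macro in the
diff below expands roughly as follows (illustrative only, not part of
the patch):

<snip>
    kvfree_rcu(ptr)
        -> KVFREE_GET_MACRO(ptr, kvfree_rcu_arg_2, kvfree_rcu_arg_1)(ptr)
        -> kvfree_rcu_arg_1(ptr)

    kvfree_rcu(ptr, rhf)
        -> KVFREE_GET_MACRO(ptr, rhf, kvfree_rcu_arg_2, kvfree_rcu_arg_1)(ptr, rhf)
        -> kvfree_rcu_arg_2(ptr, rhf)
<snip>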

Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcupdate.h | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 51b26ab02878..d15d46db61f7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -877,12 +877,42 @@ do {									\
 
 /**
  * kvfree_rcu() - kvfree an object after a grace period.
- * @ptr:	pointer to kvfree
- * @rhf:	the name of the struct rcu_head within the type of @ptr.
  *
- * Same as kfree_rcu(), just simple alias.
+ * This macro consists of one or two arguments and it is
+ * based on whether an object is head-less or not. If it
+ * has a head then a semantic stays the same as it used
+ * to be before:
+ *
+ *     kvfree_rcu(ptr, rhf);
+ *
+ * where @ptr is a pointer to kvfree(), @rhf is the name
+ * of the rcu_head structure within the type of @ptr.
+ *
+ * When it comes to head-less variant, only one argument
+ * is passed and that is just a pointer which has to be
+ * freed after a grace period. Therefore the semantic is
+ *
+ *     kvfree_rcu(ptr);
+ *
+ * where @ptr is a pointer to kvfree().
+ *
+ * Please note, head-less way of freeing is permitted to
+ * use from a context that has to follow might_sleep()
+ * annotation. Otherwise, please switch and embed the
+ * rcu_head structure within the type of @ptr.
  */
-#define kvfree_rcu(ptr, rhf) kfree_rcu(ptr, rhf)
+#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__,		\
+	kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
+
+#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
+#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf)
+#define kvfree_rcu_arg_1(ptr)					\
+do {								\
+	typeof(ptr) ___p = (ptr);				\
+								\
+	if (___p)						\
+		kvfree_call_rcu(NULL, (rcu_callback_t) (___p));	\
+} while (0)
 
 /*
  * Place this after a lock-acquisition primitive to guarantee that
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v2 16/16] lib/test_vmalloc.c: Add test cases for kvfree_rcu()
  2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (14 preceding siblings ...)
  2020-05-25 21:47 ` [PATCH v2 15/16] rcu: Introduce single argument kvfree_rcu() interface Uladzislau Rezki (Sony)
@ 2020-05-25 21:48 ` Uladzislau Rezki (Sony)
  15 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-05-25 21:48 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce four new test cases for the kvfree_rcu() interface. Two
of them exercise the single-argument functionality and the other
two the two-argument functionality.

The aim is to stress kvfree_rcu(), check how it behaves under
different load and memory conditions, and analyze its performance
throughput.
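
For example, assuming the test_vmalloc module is built, the two new
vmalloc cases can be run in isolation with something like the
following (the mask value is the sum of the test ids shown in the
diff below):

<snip>
    modprobe test_vmalloc run_test_mask=768
<snip>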

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 lib/test_vmalloc.c | 103 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 95 insertions(+), 8 deletions(-)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 8bbefcaddfe8..ec73561cda2e 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -15,6 +15,8 @@
 #include <linux/delay.h>
 #include <linux/rwsem.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
 
 #define __param(type, name, init, msg)		\
 	static type name = init;				\
@@ -35,14 +37,18 @@ __param(int, test_loop_count, 1000000,
 
 __param(int, run_test_mask, INT_MAX,
 	"Set tests specified in the mask.\n\n"
-		"\t\tid: 1,   name: fix_size_alloc_test\n"
-		"\t\tid: 2,   name: full_fit_alloc_test\n"
-		"\t\tid: 4,   name: long_busy_list_alloc_test\n"
-		"\t\tid: 8,   name: random_size_alloc_test\n"
-		"\t\tid: 16,  name: fix_align_alloc_test\n"
-		"\t\tid: 32,  name: random_size_align_alloc_test\n"
-		"\t\tid: 64,  name: align_shift_alloc_test\n"
-		"\t\tid: 128, name: pcpu_alloc_test\n"
+		"\t\tid: 1,    name: fix_size_alloc_test\n"
+		"\t\tid: 2,    name: full_fit_alloc_test\n"
+		"\t\tid: 4,    name: long_busy_list_alloc_test\n"
+		"\t\tid: 8,    name: random_size_alloc_test\n"
+		"\t\tid: 16,   name: fix_align_alloc_test\n"
+		"\t\tid: 32,   name: random_size_align_alloc_test\n"
+		"\t\tid: 64,   name: align_shift_alloc_test\n"
+		"\t\tid: 128,  name: pcpu_alloc_test\n"
+		"\t\tid: 256,  name: kvfree_rcu_1_arg_vmalloc_test\n"
+		"\t\tid: 512,  name: kvfree_rcu_2_arg_vmalloc_test\n"
+		"\t\tid: 1024, name: kvfree_rcu_1_arg_slab_test\n"
+		"\t\tid: 2048, name: kvfree_rcu_2_arg_slab_test\n"
 		/* Add a new test case description here. */
 );
 
@@ -328,6 +334,83 @@ pcpu_alloc_test(void)
 	return rv;
 }
 
+struct test_kvfree_rcu {
+	struct rcu_head rcu;
+	unsigned char array[20];
+};
+
+static int
+kvfree_rcu_1_arg_vmalloc_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = vmalloc(1 * PAGE_SIZE);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_2_arg_vmalloc_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = vmalloc(1 * PAGE_SIZE);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p, rcu);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_1_arg_slab_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = kmalloc(sizeof(*p), GFP_KERNEL);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_2_arg_slab_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = kmalloc(sizeof(*p), GFP_KERNEL);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p, rcu);
+	}
+
+	return 0;
+}
+
 struct test_case_desc {
 	const char *test_name;
 	int (*test_func)(void);
@@ -342,6 +425,10 @@ static struct test_case_desc test_case_array[] = {
 	{ "random_size_align_alloc_test", random_size_align_alloc_test },
 	{ "align_shift_alloc_test", align_shift_alloc_test },
 	{ "pcpu_alloc_test", pcpu_alloc_test },
+	{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
+	{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
+	{ "kvfree_rcu_1_arg_slab_test", kvfree_rcu_1_arg_slab_test },
+	{ "kvfree_rcu_2_arg_slab_test", kvfree_rcu_2_arg_slab_test },
 	/* Add a new test case here. */
 };
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-25 21:47 ` [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
@ 2020-06-17 23:46   ` Paul E. McKenney
  2020-06-18  0:52     ` Matthew Wilcox
  2020-06-18 17:25     ` Uladzislau Rezki
  0 siblings, 2 replies; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-17 23:46 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Mon, May 25, 2020 at 11:47:53PM +0200, Uladzislau Rezki (Sony) wrote:
> To do so, we use an array of kvfree_rcu_bulk_data structures.
> It consists of two elements:
>  - index number 0 corresponds to slab pointers.
>  - index number 1 corresponds to vmalloc pointers.
> 
> Keeping vmalloc pointers separated from slab pointers makes
> it possible to invoke the right freeing API for the right
> kind of pointer.
> 
> It also prepares us for future headless support for vmalloc
> and SLAB objects. Such objects cannot be queued on a linked
> list and are instead stored directly in an array.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---

[ . . . ]

> +	// Handle two first channels.
> +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> +			bnext = bkvhead[i]->next;
> +			debug_rcu_bhead_unqueue(bkvhead[i]);
> +
> +			rcu_lock_acquire(&rcu_callback_map);
> +			if (i == 0) { // kmalloc() / kfree().
> +				trace_rcu_invoke_kfree_bulk_callback(
> +					rcu_state.name, bkvhead[i]->nr_records,
> +					bkvhead[i]->records);
> +
> +				kfree_bulk(bkvhead[i]->nr_records,
> +					bkvhead[i]->records);
> +			} else { // vmalloc() / vfree().
> +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> +					trace_rcu_invoke_kfree_callback(
> +						rcu_state.name,
> +						bkvhead[i]->records[j], 0);
> +
> +					vfree(bkvhead[i]->records[j]);
> +				}
> +			}
> +			rcu_lock_release(&rcu_callback_map);

Not an emergency, but did you look into replacing this "if" statement
with an array of pointers to functions implementing the legs of the
"if" statement?  If nothing else, this would greatly reduce indentation.

I am taking this as is, but if you have not already done so, could you
please look into this for a follow-up patch?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-17 23:46   ` Paul E. McKenney
@ 2020-06-18  0:52     ` Matthew Wilcox
  2020-06-18  3:18       ` Paul E. McKenney
  2020-06-18 17:30       ` Uladzislau Rezki
  2020-06-18 17:25     ` Uladzislau Rezki
  1 sibling, 2 replies; 43+ messages in thread
From: Matthew Wilcox @ 2020-06-18  0:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Joel Fernandes, RCU, Oleksiy Avramchenko

On Wed, Jun 17, 2020 at 04:46:09PM -0700, Paul E. McKenney wrote:
> > +	// Handle two first channels.
> > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > +			bnext = bkvhead[i]->next;
> > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > +
> > +			rcu_lock_acquire(&rcu_callback_map);
> > +			if (i == 0) { // kmalloc() / kfree().
> > +				trace_rcu_invoke_kfree_bulk_callback(
> > +					rcu_state.name, bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +
> > +				kfree_bulk(bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +			} else { // vmalloc() / vfree().
> > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > +					trace_rcu_invoke_kfree_callback(
> > +						rcu_state.name,
> > +						bkvhead[i]->records[j], 0);
> > +
> > +					vfree(bkvhead[i]->records[j]);
> > +				}
> > +			}
> > +			rcu_lock_release(&rcu_callback_map);
> 
> Not an emergency, but did you look into replacing this "if" statement
> with an array of pointers to functions implementing the legs of the
> "if" statement?  If nothing else, this would greatly reduced indentation.

I don't think that replacing direct function calls with indirect function
calls is a great suggestion with the current state of play around branch
prediction.

I'd suggest:

 			rcu_lock_acquire(&rcu_callback_map);
			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
				bkvhead[i]->nr_records, bkvhead[i]->records);
 			if (i == 0) {
 				kfree_bulk(bkvhead[i]->nr_records,
 					bkvhead[i]->records);
 			} else {
 				for (j = 0; j < bkvhead[i]->nr_records; j++) {
 					vfree(bkvhead[i]->records[j]);
 				}
 			}
 			rcu_lock_release(&rcu_callback_map);

But I'd also suggest a vfree_bulk be added.  There are a few things
which would be better done in bulk as part of the vfree process
(we batch them up already, but i'm sure we could do better).



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18  0:52     ` Matthew Wilcox
@ 2020-06-18  3:18       ` Paul E. McKenney
  2020-06-18 17:35         ` Uladzislau Rezki
  2020-06-18 17:30       ` Uladzislau Rezki
  1 sibling, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-18  3:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Joel Fernandes, RCU, Oleksiy Avramchenko

On Wed, Jun 17, 2020 at 05:52:14PM -0700, Matthew Wilcox wrote:
> On Wed, Jun 17, 2020 at 04:46:09PM -0700, Paul E. McKenney wrote:
> > > +	// Handle two first channels.
> > > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > > +			bnext = bkvhead[i]->next;
> > > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > > +
> > > +			rcu_lock_acquire(&rcu_callback_map);
> > > +			if (i == 0) { // kmalloc() / kfree().
> > > +				trace_rcu_invoke_kfree_bulk_callback(
> > > +					rcu_state.name, bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +
> > > +				kfree_bulk(bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +			} else { // vmalloc() / vfree().
> > > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > > +					trace_rcu_invoke_kfree_callback(
> > > +						rcu_state.name,
> > > +						bkvhead[i]->records[j], 0);
> > > +
> > > +					vfree(bkvhead[i]->records[j]);
> > > +				}
> > > +			}
> > > +			rcu_lock_release(&rcu_callback_map);
> > 
> > Not an emergency, but did you look into replacing this "if" statement
> > with an array of pointers to functions implementing the legs of the
> > "if" statement?  If nothing else, this would greatly reduced indentation.
> 
> I don't think that replacing direct function calls with indirect function
> calls is a great suggestion with the current state of play around branch
> prediction.
> 
> I'd suggest:
> 
>  			rcu_lock_acquire(&rcu_callback_map);
> 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> 				bkvhead[i]->nr_records, bkvhead[i]->records);
>  			if (i == 0) {
>  				kfree_bulk(bkvhead[i]->nr_records,
>  					bkvhead[i]->records);
>  			} else {
>  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
>  					vfree(bkvhead[i]->records[j]);
>  				}
>  			}
>  			rcu_lock_release(&rcu_callback_map);
> 
> But I'd also suggest a vfree_bulk be added.  There are a few things
> which would be better done in bulk as part of the vfree process
> (we batch them up already, but i'm sure we could do better).

I suspect that he would like to keep the tracing.

It might be worth trying the branches, given that they would be constant
and indexed by "i".  The compiler might well remove the indirection.

The compiler guys brag about doing so, which of course might or might
not have any correlation to a given compiler actually doing so.  :-/

Having a vfree_bulk() might well be useful, but I would feel more
confidence in that if there were other callers of kfree_bulk().

But again, either way, future work as far as this series is concerned.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-17 23:46   ` Paul E. McKenney
  2020-06-18  0:52     ` Matthew Wilcox
@ 2020-06-18 17:25     ` Uladzislau Rezki
  2020-06-18 17:32       ` Paul E. McKenney
  1 sibling, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 17:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

> > +	// Handle two first channels.
> > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > +			bnext = bkvhead[i]->next;
> > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > +
> > +			rcu_lock_acquire(&rcu_callback_map);
> > +			if (i == 0) { // kmalloc() / kfree().
> > +				trace_rcu_invoke_kfree_bulk_callback(
> > +					rcu_state.name, bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +
> > +				kfree_bulk(bkvhead[i]->nr_records,
> > +					bkvhead[i]->records);
> > +			} else { // vmalloc() / vfree().
> > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > +					trace_rcu_invoke_kfree_callback(
> > +						rcu_state.name,
> > +						bkvhead[i]->records[j], 0);
> > +
> > +					vfree(bkvhead[i]->records[j]);
> > +				}
> > +			}
> > +			rcu_lock_release(&rcu_callback_map);
> 
> Not an emergency, but did you look into replacing this "if" statement
> with an array of pointers to functions implementing the legs of the
> "if" statement?  If nothing else, this would greatly reduced indentation.
> 
>
> I am taking this as is, but if you have not already done so, could you
> please look into this for a follow-up patch?
> 
I do not think it makes sense, because it would require checking each
pointer in the array, which can lead to a lot of branching, i.e. "if-else"
instructions.

Paul, thank you to take it in!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18  0:52     ` Matthew Wilcox
  2020-06-18  3:18       ` Paul E. McKenney
@ 2020-06-18 17:30       ` Uladzislau Rezki
  2020-06-18 17:35         ` Matthew Wilcox
  1 sibling, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 17:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Paul E. McKenney, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Joel Fernandes, RCU, Oleksiy Avramchenko

> > 
> > Not an emergency, but did you look into replacing this "if" statement
> > with an array of pointers to functions implementing the legs of the
> > "if" statement?  If nothing else, this would greatly reduced indentation.
> 
> I don't think that replacing direct function calls with indirect function
> calls is a great suggestion with the current state of play around branch
> prediction.
> 
> I'd suggest:
> 
>  			rcu_lock_acquire(&rcu_callback_map);
> 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> 				bkvhead[i]->nr_records, bkvhead[i]->records);
>  			if (i == 0) {
>  				kfree_bulk(bkvhead[i]->nr_records,
>  					bkvhead[i]->records);
>  			} else {
>  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
>  					vfree(bkvhead[i]->records[j]);
>  				}
>  			}
>  			rcu_lock_release(&rcu_callback_map);
>
There are two different trace functions: one for "bulk" tracing
messages, and another one per individual kfree() call, which we also
use to indicate a vfree() call.

Probably we can rename it to: trace_rcu_invoke_kvfree_callback();

What do you think?

> 
> But I'd also suggest a vfree_bulk be added.  There are a few things
> which would be better done in bulk as part of the vfree process
> (we batch them up already, but i'm sure we could do better).
> 
I was thinking of implementing a vfree_bulk() API, but I guess it can
be done as future work.

Does that sound good?

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:25     ` Uladzislau Rezki
@ 2020-06-18 17:32       ` Paul E. McKenney
  2020-06-18 17:56         ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-18 17:32 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 07:25:04PM +0200, Uladzislau Rezki wrote:
> > > +	// Handle two first channels.
> > > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > > +			bnext = bkvhead[i]->next;
> > > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > > +
> > > +			rcu_lock_acquire(&rcu_callback_map);
> > > +			if (i == 0) { // kmalloc() / kfree().
> > > +				trace_rcu_invoke_kfree_bulk_callback(
> > > +					rcu_state.name, bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +
> > > +				kfree_bulk(bkvhead[i]->nr_records,
> > > +					bkvhead[i]->records);
> > > +			} else { // vmalloc() / vfree().
> > > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > > +					trace_rcu_invoke_kfree_callback(
> > > +						rcu_state.name,
> > > +						bkvhead[i]->records[j], 0);
> > > +
> > > +					vfree(bkvhead[i]->records[j]);
> > > +				}
> > > +			}
> > > +			rcu_lock_release(&rcu_callback_map);
> > 
> > Not an emergency, but did you look into replacing this "if" statement
> > with an array of pointers to functions implementing the legs of the
> > "if" statement?  If nothing else, this would greatly reduced indentation.
> > 
> >
> > I am taking this as is, but if you have not already done so, could you
> > please look into this for a follow-up patch?
> > 
> I do not think it makes sense, because it would require to check each
> pointer in the array, what can lead to many branching, i.e. "if-else"
> instructions.

Mightn't the compiler simply unroll the outer loop?  Then the first
unrolled iteration of that loop would contain the then-clause and
the second unrolled iteration would contain the else-clause.  At that
point, there would be no checking, just direct calls.

Or am I missing something?

> Paul, thank you to take it in!

Thank you for persisting!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18  3:18       ` Paul E. McKenney
@ 2020-06-18 17:35         ` Uladzislau Rezki
  2020-06-18 17:57           ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 17:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Matthew Wilcox, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Joel Fernandes, RCU, Oleksiy Avramchenko

> > 
> > I don't think that replacing direct function calls with indirect function
> > calls is a great suggestion with the current state of play around branch
> > prediction.
> > 
> > I'd suggest:
> > 
> >  			rcu_lock_acquire(&rcu_callback_map);
> > 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > 				bkvhead[i]->nr_records, bkvhead[i]->records);
> >  			if (i == 0) {
> >  				kfree_bulk(bkvhead[i]->nr_records,
> >  					bkvhead[i]->records);
> >  			} else {
> >  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> >  					vfree(bkvhead[i]->records[j]);
> >  				}
> >  			}
> >  			rcu_lock_release(&rcu_callback_map);
> > 
> > But I'd also suggest a vfree_bulk be added.  There are a few things
> > which would be better done in bulk as part of the vfree process
> > (we batch them up already, but i'm sure we could do better).
> 
> I suspect that he would like to keep the tracing.
> 
> It might be worth trying the branches, given that they would be constant
> and indexed by "i".  The compiler might well remove the indirection.
> 
> The compiler guys brag about doing so, which of course might or might
> not have any correlation to a given compiler actually doing so.  :-/
> 
> Having a vfree_bulk() might well be useful, but I would feel more
> confidence in that if there were other callers of kfree_bulk().
>
Hmm... I think replacing that with vfree_bulk() is a good idea though.

> 
> But again, either way, future work as far as this series is concerned.
> 
What do you mean: is concerned?

We are planning to implement kfree_rcu() so that it is integrated
directly into the slab allocators: SLAB, SLUB, SLOB. So, there is
plenty of future work :)

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:30       ` Uladzislau Rezki
@ 2020-06-18 17:35         ` Matthew Wilcox
  2020-06-18 20:03           ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2020-06-18 17:35 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 07:30:49PM +0200, Uladzislau Rezki wrote:
> > I'd suggest:
> > 
> >  			rcu_lock_acquire(&rcu_callback_map);
> > 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > 				bkvhead[i]->nr_records, bkvhead[i]->records);
> >  			if (i == 0) {
> >  				kfree_bulk(bkvhead[i]->nr_records,
> >  					bkvhead[i]->records);
> >  			} else {
> >  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> >  					vfree(bkvhead[i]->records[j]);
> >  				}
> >  			}
> >  			rcu_lock_release(&rcu_callback_map);
> >
> There are two different trace functions, one for "bulk" tracing
> messages, and another one is per one call of kfree(), though we use 
> to indicate vfree() call.
> 
> Probably we can rename it to: trace_rcu_invoke_kvfree_callback();
> 
> What do you think?

Works for me!

> > But I'd also suggest a vfree_bulk be added.  There are a few things
> > which would be better done in bulk as part of the vfree process
> > (we batch them up already, but i'm sure we could do better).
> 
> I was thinking to implement of vfree_bulk() API, but i guess it can
> be done as future work.
> 
> Does that sound good?

Yes, definitely a future piece of work.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:32       ` Paul E. McKenney
@ 2020-06-18 17:56         ` Uladzislau Rezki
  2020-06-18 18:15           ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 17:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, Joel Fernandes, RCU,
	Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 10:32:06AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 07:25:04PM +0200, Uladzislau Rezki wrote:
> > > > +	// Handle two first channels.
> > > > +	for (i = 0; i < FREE_N_CHANNELS; i++) {
> > > > +		for (; bkvhead[i]; bkvhead[i] = bnext) {
> > > > +			bnext = bkvhead[i]->next;
> > > > +			debug_rcu_bhead_unqueue(bkvhead[i]);
> > > > +
> > > > +			rcu_lock_acquire(&rcu_callback_map);
> > > > +			if (i == 0) { // kmalloc() / kfree().
> > > > +				trace_rcu_invoke_kfree_bulk_callback(
> > > > +					rcu_state.name, bkvhead[i]->nr_records,
> > > > +					bkvhead[i]->records);
> > > > +
> > > > +				kfree_bulk(bkvhead[i]->nr_records,
> > > > +					bkvhead[i]->records);
> > > > +			} else { // vmalloc() / vfree().
> > > > +				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > > > +					trace_rcu_invoke_kfree_callback(
> > > > +						rcu_state.name,
> > > > +						bkvhead[i]->records[j], 0);
> > > > +
> > > > +					vfree(bkvhead[i]->records[j]);
> > > > +				}
> > > > +			}
> > > > +			rcu_lock_release(&rcu_callback_map);
> > > 
> > > Not an emergency, but did you look into replacing this "if" statement
> > > with an array of pointers to functions implementing the legs of the
> > > "if" statement?  If nothing else, this would greatly reduced indentation.
> > > 
> > >
> > > I am taking this as is, but if you have not already done so, could you
> > > please look into this for a follow-up patch?
> > > 
> > I do not think it makes sense, because it would require to check each
> > pointer in the array, what can lead to many branching, i.e. "if-else"
> > instructions.
> 
> Mightn't the compiler simply unroll the outer loop?  Then the first
> unrolled iteration of that loop would contain the then-clause and
> the second unrolled iteration would contain the else-clause.  At that
> point, there would be no checking, just direct calls.
> 
> Or am I missing something?
> 
If we mix pointers, then we can only free them per pointer. I mean in that
case we will not be able to use the kfree_bulk() interface for freeing SLAB
memory, and the code would be converted to something like:

<snip>
for (i = 0; i < nr_objects_in_array; i++) {
    if (is_vmalloc_addr(array[i]))
        vfree(array[i]);
    else
        kfree(array[i]);
}
<snip>

> > Paul, thank you to take it in!
> 
> Thank you for persisting!
> 
Welcome :)

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:35         ` Uladzislau Rezki
@ 2020-06-18 17:57           ` Paul E. McKenney
  2020-06-18 18:34             ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-18 17:57 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 07:35:20PM +0200, Uladzislau Rezki wrote:
> > > 
> > > I don't think that replacing direct function calls with indirect function
> > > calls is a great suggestion with the current state of play around branch
> > > prediction.
> > > 
> > > I'd suggest:
> > > 
> > >  			rcu_lock_acquire(&rcu_callback_map);
> > > 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > > 				bkvhead[i]->nr_records, bkvhead[i]->records);
> > >  			if (i == 0) {
> > >  				kfree_bulk(bkvhead[i]->nr_records,
> > >  					bkvhead[i]->records);
> > >  			} else {
> > >  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > >  					vfree(bkvhead[i]->records[j]);
> > >  				}
> > >  			}
> > >  			rcu_lock_release(&rcu_callback_map);
> > > 
> > > But I'd also suggest a vfree_bulk be added.  There are a few things
> > > which would be better done in bulk as part of the vfree process
> > > (we batch them up already, but i'm sure we could do better).
> > 
> > I suspect that he would like to keep the tracing.
> > 
> > It might be worth trying the branches, given that they would be constant
> > and indexed by "i".  The compiler might well remove the indirection.
> > 
> > The compiler guys brag about doing so, which of course might or might
> > not have any correlation to a given compiler actually doing so.  :-/
> > 
> > Having a vfree_bulk() might well be useful, but I would feel more
> > confidence in that if there were other callers of kfree_bulk().
> >
> Hmm... I think replacing that with vfree_bulk() is a good idea though.

In other words, get rid of kfree_bulk() in favor of vfree_bulk()?

> > But again, either way, future work as far as this series is concerned.
> > 
> What do you mean: is concerned?

Apologies for the strange English.  How about this?

"This series is OK as is.  Any comments above did not prevent me from
taking these patches, but instead discuss possible future work."

> We are planning to implement kfree_rcu() to be integrated directly into
> SLAB: SLAB, SLUB, SLOB. So, there are plenty of future work :)

And I am glad that this is still the goal.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:56         ` Uladzislau Rezki
@ 2020-06-18 18:15           ` Matthew Wilcox
  2020-06-18 18:23             ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2020-06-18 18:15 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 07:56:23PM +0200, Uladzislau Rezki wrote:
> If we mix pointers, then we can do free per pointer only. I mean in that
> case we will not be able to use kfree_bulk() interface for freeing SLAB
> memory and the code would converted to something like:
> 
> <snip>
> while (nr_objects_in_array > 0) {
>     if (is_vmalloc_addr(array[X]))
>        vfree(array[X]);
>     else
>        kfree(array[X]);
> }
> <snip>

[PATCH] Add vfree_bulk interface

This is a useful interface to have for the RCU kvfree code.  There is
scope for more performance gains later, but introducing the interface
now allows us to simplify the RCU code today.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 48bb681e6c2a..dc2bbb61af61 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -119,6 +119,7 @@ void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
+extern void vfree_bulk(size_t count, void **addrs);
 
 extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index abe37f09ac42..6042f9b4394a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2366,6 +2366,22 @@ void vfree(const void *addr)
 }
 EXPORT_SYMBOL(vfree);
 
+void vfree_bulk(size_t count, void **addrs)
+{
+	unsigned int i;
+
+	BUG_ON(in_nmi());
+	might_sleep_if(!in_interrupt());
+
+	for (i = 0; i < count; i++) {
+		void *addr = addrs[i];
+		kmemleak_free(addr);
+		if (addr)
+			__vfree(addr);
+	}
+}
+EXPORT_SYMBOL(vfree_bulk);
+
 /**
  * vunmap - release virtual mapping obtained by vmap()
  * @addr:   memory base address


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 18:15           ` Matthew Wilcox
@ 2020-06-18 18:23             ` Uladzislau Rezki
  2020-06-18 18:37               ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 18:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Paul E. McKenney, LKML, linux-mm,
	Andrew Morton, Theodore Y . Ts'o, Joel Fernandes, RCU,
	Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 11:15:41AM -0700, Matthew Wilcox wrote:
> On Thu, Jun 18, 2020 at 07:56:23PM +0200, Uladzislau Rezki wrote:
> > If we mix pointers, then we can do free per pointer only. I mean in that
> > case we will not be able to use kfree_bulk() interface for freeing SLAB
> > memory and the code would converted to something like:
> > 
> > <snip>
> > while (nr_objects_in_array > 0) {
> >     if (is_vmalloc_addr(array[X]))
> >        vfree(array[X]);
> >     else
> >        kfree(array[X]);
> > }
> > <snip>
> 
> [PATCH] Add vfree_bulk interface
> 
> This is a useful interface to have for the RCU kvfree code.  There is
> scope for more performance gains later, but introducing the interface
> now allows us to simplify the RCU code today.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 48bb681e6c2a..dc2bbb61af61 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -119,6 +119,7 @@ void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
>  
>  extern void vfree(const void *addr);
>  extern void vfree_atomic(const void *addr);
> +extern void vfree_bulk(size_t count, void **addrs);
>  
>  extern void *vmap(struct page **pages, unsigned int count,
>  			unsigned long flags, pgprot_t prot);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index abe37f09ac42..6042f9b4394a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2366,6 +2366,22 @@ void vfree(const void *addr)
>  }
>  EXPORT_SYMBOL(vfree);
>  
> +void vfree_bulk(size_t count, void **addrs)
> +{
> +	unsigned int i;
> +
> +	BUG_ON(in_nmi());
> +	might_sleep_if(!in_interrupt());
> +
> +	for (i = 0; i < count; i++) {
> +		void *addr = addrs[i];
> +		kmemleak_free(addr);
> +		if (addr)
> +			__vfree(addr);
> +	}
> +}
> +EXPORT_SYMBOL(vfree_bulk);
> +
>
Can we just use addrs[i] throughout the loop?

Also, we can just call vfree() instead, which has all the checking we
need: NMI, kmemleak, might_sleep.

Thanks!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:57           ` Paul E. McKenney
@ 2020-06-18 18:34             ` Uladzislau Rezki
  2020-06-18 19:03               ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 18:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

> > > 
> > > I suspect that he would like to keep the tracing.
> > > 
> > > It might be worth trying the branches, given that they would be constant
> > > and indexed by "i".  The compiler might well remove the indirection.
> > > 
> > > The compiler guys brag about doing so, which of course might or might
> > > not have any correlation to a given compiler actually doing so.  :-/
> > > 
> > > Having a vfree_bulk() might well be useful, but I would feel more
> > > confidence in that if there were other callers of kfree_bulk().
> > >
> > Hmm... I think replacing that with vfree_bulk() is a good idea though.
> 
> In other words, get rid of kfree_bulk() in favor of vfree_bulk()?
> 
kfree_bulk() does not understand vmalloc memory. vfree_bulk() should
be implemented to release vmalloc pointers. At a high level it will
be used the same way as kfree_bulk(), but for vmalloc ptrs only.

> > > But again, either way, future work as far as this series is concerned.
> > > 
> > What do you mean: is concerned?
> 
> Apologies for the strange English.  How about this?
> 
> "This series is OK as is.  Any comments above did not prevent me from
> taking these patches, but instead discuss possible future work."
> 
That is perfectly clear to me :)

> > We are planning to implement kfree_rcu() to be integrated directly into
> > SLAB: SLAB, SLUB, SLOB. So, there are plenty of future work :)
>
> And I am glad that this is still the goal.  ;-)
>
:)

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 18:23             ` Uladzislau Rezki
@ 2020-06-18 18:37               ` Matthew Wilcox
  2020-06-18 18:48                 ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2020-06-18 18:37 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 08:23:33PM +0200, Uladzislau Rezki wrote:
> > +void vfree_bulk(size_t count, void **addrs)
> > +{
> > +	unsigned int i;
> > +
> > +	BUG_ON(in_nmi());
> > +	might_sleep_if(!in_interrupt());
> > +
> > +	for (i = 0; i < count; i++) {
> > +		void *addr = addrs[i];
> > +		kmemleak_free(addr);
> > +		if (addr)
> > +			__vfree(addr);
> > +	}
> > +}
> > +EXPORT_SYMBOL(vfree_bulk);
> > +
> >
> Can we just do addrs[i] all over the loop?
> 
> Also, we can just call vfree() instead that has all checking we
> need: NMI, kmemleak, might_sleep.

Of course we _can_.  But would we want to?  This way, we only do these
checks once instead of once per pointer, which is rather the point
of batching.

I might actually go further and hoist the in_interrupt() check into
this function ... I suspect the RCU code always runs in_interrupt()
and so we always call vfree_deferred().


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 18:37               ` Matthew Wilcox
@ 2020-06-18 18:48                 ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 18:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Paul E. McKenney, LKML, linux-mm,
	Andrew Morton, Theodore Y . Ts'o, Joel Fernandes, RCU,
	Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 11:37:51AM -0700, Matthew Wilcox wrote:
> On Thu, Jun 18, 2020 at 08:23:33PM +0200, Uladzislau Rezki wrote:
> > > +void vfree_bulk(size_t count, void **addrs)
> > > +{
> > > +	unsigned int i;
> > > +
> > > +	BUG_ON(in_nmi());
> > > +	might_sleep_if(!in_interrupt());
> > > +
> > > +	for (i = 0; i < count; i++) {
> > > +		void *addr = addrs[i];
> > > +		kmemleak_free(addr);
> > > +		if (addr)
> > > +			__vfree(addr);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL(vfree_bulk);
> > > +
> > >
> > Can we just do addrs[i] all over the loop?
> > 
> > Also, we can just call vfree() instead that has all checking we
> > need: NMI, kmemleak, might_sleep.
> 
> Of course we _can_.  But would we want to?  This way, we only do these
> checks once instead of once per pointer, which is rather the point
> of batching.
>
Ahh, right. I briefly looked at it and missed that point. Right you
are, we do not want the vfree() here!

> 
> I might actually go further and hoist the in_interrupt() check into
> this function ...
>
Why do you need it? Just to inline the code below:

<snip>
 if (unlikely(in_interrupt()))
  __vfree_deferred(addr);
 else
  __vunmap(addr, 1);
<snip>

and bypass the __vfree() call (which is not marked as inline)?
I mean to inline the above into vfree_bulk().

>
> I suspect the RCU code always runs in_interrupt()
> and so we always call vfree_deferred().
>
No. We release the memory from workqueue context.

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 18:34             ` Uladzislau Rezki
@ 2020-06-18 19:03               ` Paul E. McKenney
  2020-06-18 20:35                 ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-18 19:03 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 08:34:48PM +0200, Uladzislau Rezki wrote:
> > > > 
> > > > I suspect that he would like to keep the tracing.
> > > > 
> > > > It might be worth trying the branches, given that they would be constant
> > > > and indexed by "i".  The compiler might well remove the indirection.
> > > > 
> > > > The compiler guys brag about doing so, which of course might or might
> > > > not have any correlation to a given compiler actually doing so.  :-/
> > > > 
> > > > Having a vfree_bulk() might well be useful, but I would feel more
> > > > confidence in that if there were other callers of kfree_bulk().
> > > >
> > > Hmm... I think replacing that with vfree_bulk() is a good idea though.
> > 
> > In other words, get rid of kfree_bulk() in favor of vfree_bulk()?
> > 
> kfree_bulk() does not understand vmalloc memory. vfree_bulk() should
> be implemented to release vmalloc's pointers. On a high level it will
> be used the same way as kfree_bulk(), but for vmalloc pointers only.

Ah, I thought that you guys were proposing something that did bulk
free of both kmalloc and vmalloc memory.

							Thanx, Paul

> > > > But again, either way, future work as far as this series is concerned.
> > > > 
> > > What do you mean: is concerned?
> > 
> > Apologies for the strange English.  How about this?
> > 
> > "This series is OK as is.  Any comments above did not prevent me from
> > taking these patches, but instead discuss possible future work."
> > 
> That is perfectly clear to me :)
> 
> > > We are planning to implement kfree_rcu() to be integrated directly into
> > > SLAB: SLAB, SLUB, SLOB. So, there are plenty of future work :)
> >
> > And I am glad that this is still the goal.  ;-)
> >
> :)
> 
> --
> Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 17:35         ` Matthew Wilcox
@ 2020-06-18 20:03           ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 20:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Paul E. McKenney, LKML, linux-mm,
	Andrew Morton, Theodore Y . Ts'o, Joel Fernandes, RCU,
	Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 10:35:27AM -0700, Matthew Wilcox wrote:
> On Thu, Jun 18, 2020 at 07:30:49PM +0200, Uladzislau Rezki wrote:
> > > I'd suggest:
> > > 
> > >  			rcu_lock_acquire(&rcu_callback_map);
> > > 			trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > > 				bkvhead[i]->nr_records, bkvhead[i]->records);
> > >  			if (i == 0) {
> > >  				kfree_bulk(bkvhead[i]->nr_records,
> > >  					bkvhead[i]->records);
> > >  			} else {
> > >  				for (j = 0; j < bkvhead[i]->nr_records; j++) {
> > >  					vfree(bkvhead[i]->records[j]);
> > >  				}
> > >  			}
> > >  			rcu_lock_release(&rcu_callback_map);
> > >
> > There are two different trace functions, one for "bulk" tracing
> > messages, and another one for a single kfree() call, though we use
> > it to indicate a vfree() call as well.
> > 
> > Probably we can rename it to: trace_rcu_invoke_kvfree_callback();
> > 
> > What do you think?
> 
> Works for me!
> 
OK. I will send out a patch that renames that trace function, making
it clear that the freed pointer can belong to SLAB or to vmalloc.
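
For context, a rough sketch of what the per-pointer path could look like after
such a rename (illustrative only; it assumes the tracepoint keeps its current
arguments and that the offset helper is renamed the same way, so the actual
patch may differ in details):

<snip>
	rcu_lock_acquire(&rcu_callback_map);
	/* The freed pointer may be backed by SLAB or by vmalloc, hence "kvfree". */
	trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);

	if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
		kvfree((void *)head - offset);

	rcu_lock_release(&rcu_callback_map);
<snip>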

> > > But I'd also suggest a vfree_bulk be added.  There are a few things
> > > which would be better done in bulk as part of the vfree process
> > > (we batch them up already, but i'm sure we could do better).
> > 
> > I was thinking to implement of vfree_bulk() API, but i guess it can
> > be done as future work.
> > 
> > Does that sound good?
> 
> Yes, definitely a future piece of work.
>
You have already been doing it.

Thank you, Matthew :)

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 19:03               ` Paul E. McKenney
@ 2020-06-18 20:35                 ` Uladzislau Rezki
  2020-06-18 20:38                   ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 20:35 UTC (permalink / raw)
  To: Paul E. McKenney, Matthew Wilcox
  Cc: Uladzislau Rezki, Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 12:03:59PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 08:34:48PM +0200, Uladzislau Rezki wrote:
> > > > > 
> > > > > I suspect that he would like to keep the tracing.
> > > > > 
> > > > > It might be worth trying the branches, given that they would be constant
> > > > > and indexed by "i".  The compiler might well remove the indirection.
> > > > > 
> > > > > The compiler guys brag about doing so, which of course might or might
> > > > > not have any correlation to a given compiler actually doing so.  :-/
> > > > > 
> > > > > Having a vfree_bulk() might well be useful, but I would feel more
> > > > > confidence in that if there were other callers of kfree_bulk().
> > > > >
> > > > Hmm... I think replacing that with vfree_bulk() is a good idea though.
> > > 
> > > In other words, get rid of kfree_bulk() in favor of vfree_bulk()?
> > > 
> > kfree_bulk() does not understand vmalloc memory. vfree_bulk() should
> > be implemented to release vmalloc's pointers. On a high level it will
> > be used the same way as kfree_bulk(), but for vmalloc pointers only.
> 
> Ah, I thought that you guys were proposing something that did bulk
> free of both kmalloc and vmalloc memory.
> 
I see your point. We could introduce something like:

	kvfree_bulk(slab_array, vmalloc_array);

but I do not have a strong opinion here, even though I tend to
think that it would be odd. Having just vfree_bulk() would, I think,
be enough; as a result the code will look like:

<snip>
    trace_rcu_invoke_kfree_bulk_callback(
        rcu_state.name, bkvhead[i]->nr_records,
            bkvhead[i]->records);
    if (i == 0)
        kfree_bulk(bkvhead[i]->nr_records,
            bkvhead[i]->records);
    else
        vfree_bulk(bkvhead[i]->nr_records,
            bkvhead[i]->records);
<snip>

Matthew, what is your thought?

Thanks!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 20:35                 ` Uladzislau Rezki
@ 2020-06-18 20:38                   ` Matthew Wilcox
  2020-06-18 21:17                     ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2020-06-18 20:38 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 10:35:57PM +0200, Uladzislau Rezki wrote:
> On Thu, Jun 18, 2020 at 12:03:59PM -0700, Paul E. McKenney wrote:
> but i do not have a strong opinion here, even though i tend to
> say that it would be odd. Having just vfree_bulk(), i think
> would be enough, as a result the code will look like:
> 
> <snip>
>     trace_rcu_invoke_kfree_bulk_callback(
>         rcu_state.name, bkvhead[i]->nr_records,
>             bkvhead[i]->records);
>     if (i == 0)
>         kfree_bulk(bkvhead[i]->nr_records,
>             bkvhead[i]->records);
>     else
>         vfree_bulk(bkvhead[i]->nr_records,
>             bkvhead[i]->records);
> <snip>
> 
> Matthew, what is your thought?

That was my thinking too.  If we had a kvfree_bulk(), I would expect it to
handle a mixture of vfree and kfree, but you've segregated them already.
So I think this is better.
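
(For comparison, a mixed-pointer kvfree_bulk() would essentially have to
re-discover per pointer what the caller already knows; a purely hypothetical
sketch, not something being proposed here:)

<snip>
/* Hypothetical only: dispatch each pointer by address range, as kvfree() does. */
void kvfree_bulk(size_t count, void **ptrs)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (is_vmalloc_addr(ptrs[i]))
			vfree(ptrs[i]);
		else
			kfree(ptrs[i]);
	}
}
<snip>

With the pointers already segregated into separate arrays, that extra
per-pointer check buys nothing, which is why the two dedicated bulk calls
read better.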


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 20:38                   ` Matthew Wilcox
@ 2020-06-18 21:17                     ` Uladzislau Rezki
  2020-06-18 21:34                       ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-18 21:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Paul E. McKenney, LKML, linux-mm,
	Andrew Morton, Theodore Y . Ts'o, Joel Fernandes, RCU,
	Oleksiy Avramchenko

> > <snip>
> >     trace_rcu_invoke_kfree_bulk_callback(
> >         rcu_state.name, bkvhead[i]->nr_records,
> >             bkvhead[i]->records);
> >     if (i == 0)
> >         kfree_bulk(bkvhead[i]->nr_records,
> >             bkvhead[i]->records);
> >     else
> >         vfree_bulk(bkvhead[i]->nr_records,
> >             bkvhead[i]->records);
> > <snip>
> > 
> > Matthew, what is your thought?
> 
> That was my thinking too.  If we had a kvfree_bulk(), I would expect it to
> handle a mixture of vfree and kfree, but you've segregated them already.
> So I think this is better.
>
Yes, the segregation is done. Having vfree_bulk() is enough then.
We are on the same page :)

Thanks!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 21:17                     ` Uladzislau Rezki
@ 2020-06-18 21:34                       ` Paul E. McKenney
  2020-06-19 15:46                         ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-18 21:34 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 11:17:09PM +0200, Uladzislau Rezki wrote:
> > > <snip>
> > >     trace_rcu_invoke_kfree_bulk_callback(
> > >         rcu_state.name, bkvhead[i]->nr_records,
> > >             bkvhead[i]->records);
> > >     if (i == 0)
> > >         kfree_bulk(bkvhead[i]->nr_records,
> > >             bkvhead[i]->records);
> > >     else
> > >         vfree_bulk(bkvhead[i]->nr_records,
> > >             bkvhead[i]->records);
> > > <snip>
> > > 
> > > Matthew, what is your thought?
> > 
> > That was my thinking too.  If we had a kvfree_bulk(), I would expect it to
> > handle a mixture of vfree and kfree, but you've segregated them already.
> > So I think this is better.
> >
> Yes, the segregation is done. Having vfree_bulk() is enough then.
> We are on the same page :)

Very good.  When does kfree_rcu() and friends move out of kernel/rcu?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-18 21:34                       ` Paul E. McKenney
@ 2020-06-19 15:46                         ` Uladzislau Rezki
  2020-06-19 16:25                           ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-19 15:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Thu, Jun 18, 2020 at 02:34:27PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 11:17:09PM +0200, Uladzislau Rezki wrote:
> > > > <snip>
> > > >     trace_rcu_invoke_kfree_bulk_callback(
> > > >         rcu_state.name, bkvhead[i]->nr_records,
> > > >             bkvhead[i]->records);
> > > >     if (i == 0)
> > > >         kfree_bulk(bkvhead[i]->nr_records,
> > > >             bkvhead[i]->records);
> > > >     else
> > > >         vfree_bulk(bkvhead[i]->nr_records,
> > > >             bkvhead[i]->records);
> > > > <snip>
> > > > 
> > > > Matthew, what is your thought?
> > > 
> > > That was my thinking too.  If we had a kvfree_bulk(), I would expect it to
> > > handle a mixture of vfree and kfree, but you've segregated them already.
> > > So I think this is better.
> > >
> > Yes, the segregation is done. Having vfree_bulk() is enough then.
> > We are on the same page :)
> 
> Very good.  When does kfree_rcu() and friends move out of kernel/rcu?
> 
Do you mean to move the whole logic of kfree_rcu(), top to bottom, down to mm/?

Thanks!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-19 15:46                         ` Uladzislau Rezki
@ 2020-06-19 16:25                           ` Paul E. McKenney
  2020-06-22 19:04                             ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-19 16:25 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, Jun 19, 2020 at 05:46:52PM +0200, Uladzislau Rezki wrote:
> On Thu, Jun 18, 2020 at 02:34:27PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 11:17:09PM +0200, Uladzislau Rezki wrote:
> > > > > <snip>
> > > > >     trace_rcu_invoke_kfree_bulk_callback(
> > > > >         rcu_state.name, bkvhead[i]->nr_records,
> > > > >             bkvhead[i]->records);
> > > > >     if (i == 0)
> > > > >         kfree_bulk(bkvhead[i]->nr_records,
> > > > >             bkvhead[i]->records);
> > > > >     else
> > > > >         vfree_bulk(bkvhead[i]->nr_records,
> > > > >             bkvhead[i]->records);
> > > > > <snip>
> > > > > 
> > > > > Matthew, what is your thought?
> > > > 
> > > > That was my thinking too.  If we had a kvfree_bulk(), I would expect it to
> > > > handle a mixture of vfree and kfree, but you've segregated them already.
> > > > So I think this is better.
> > > >
> > > Yes, the segregation is done. Having vfree_bulk() is enough then.
> > > We are on the same page :)
> > 
> > Very good.  When does kfree_rcu() and friends move out of kernel/rcu?
> > 
> Do you mean to move the whole logic of kfree_rcu() from top to down to mm/?

I do mean exactly that.

That was my goal some years back when Rao Shoaib was making the first
attempt along these lines, and it remains my goal.  After all, if this
effort is at all successful, the coupling between kfree_rcu() with
slab/slob/slub will become much tighter than that between kfree_rcu()
and RCU.

There will need to be some additional RCU APIs used by kfree_rcu(),
for example, something to tell RCU how many blocks are awaiting a
grace period.  But these are narrow and easily defined APIs.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-19 16:25                           ` Paul E. McKenney
@ 2020-06-22 19:04                             ` Uladzislau Rezki
  2020-06-22 19:53                               ` Paul E. McKenney
  0 siblings, 1 reply; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-22 19:04 UTC (permalink / raw)
  To: Paul E. McKenney, linux-mm
  Cc: Uladzislau Rezki, Matthew Wilcox, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

> > > 
> > > Very good.  When does kfree_rcu() and friends move out of kernel/rcu?
> > > 
> > Do you mean to move the whole logic of kfree_rcu() from top to down to mm/?
> 
> I do mean exactly that.
> 
> That was my goal some years back when Rao Shoaib was making the first
> attempt along these lines, and it remains my goal.  After all, if this
> effort is at all successful, the coupling between kfree_rcu() with
> slab/slob/slub will become much tighter than that between kfree_rcu()
> and RCU.
> 
> There will need to be some additional RCU APIs used by kfree_rcu(),
> for example, something to tell RCU how many blocks are awaiting a
> grace period.  But these are narrow and easily defined APIs.
>
I also think that k[v]free_rcu() should reside somewhere under "mm/".
Currently they are defined as macros under "linux/rcupdate.h", so I
am not sure whether the definitions should stay there or be moved as well.

The implementation of k[v]free_rcu() is under rcu/tree.c, and the tiny
variant is under rcutiny.h. It could be moved to mm/slab_common.c,
or independent files could be created. I think the mm people should
advise on the best way to go :)

Any thoughts on it?

Thank you!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-22 19:04                             ` Uladzislau Rezki
@ 2020-06-22 19:53                               ` Paul E. McKenney
  2020-06-30 17:46                                 ` Uladzislau Rezki
  0 siblings, 1 reply; 43+ messages in thread
From: Paul E. McKenney @ 2020-06-22 19:53 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Matthew Wilcox, LKML, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Mon, Jun 22, 2020 at 09:04:06PM +0200, Uladzislau Rezki wrote:
> > > > 
> > > > Very good.  When does kfree_rcu() and friends move out of kernel/rcu?
> > > > 
> > > Do you mean to move the whole logic of kfree_rcu() from top to down to mm/?
> > 
> > I do mean exactly that.
> > 
> > That was my goal some years back when Rao Shoaib was making the first
> > attempt along these lines, and it remains my goal.  After all, if this
> > effort is at all successful, the coupling between kfree_rcu() with
> > slab/slob/slub will become much tighter than that between kfree_rcu()
> > and RCU.
> > 
> > There will need to be some additional RCU APIs used by kfree_rcu(),
> > for example, something to tell RCU how many blocks are awaiting a
> > grace period.  But these are narrow and easily defined APIs.
> >
> I also think that k[v]free_rcu() should reside somewhere under "mm/".
> Currently they are defined as macros under "linux/rcupdate.h". So i
> am not sure if definitions should stay there or moved also.

I am not as worried about the high-level macros as I am about the code
that does the bulk of the work, but they should still move as well.
Otherwise, changes involving both the macros and the underlying
implementation are harder than needed.

> Implementation of the k[v]free_rcu() is under rcu/tree.c and for tiny
> variant is under rcutiny.h. It can be moved to the mm/slab_common.c
> or independent files can be created. I think, mm people should consult
> what is the best way to go :)
> 
> Any thoughts on it?

I don't have any opinion on exactly where in mm the underlying
implementation code should go.  Your suggestion of mm/slab_common.c
seems fine to me.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-06-22 19:53                               ` Paul E. McKenney
@ 2020-06-30 17:46                                 ` Uladzislau Rezki
  0 siblings, 0 replies; 43+ messages in thread
From: Uladzislau Rezki @ 2020-06-30 17:46 UTC (permalink / raw)
  To: linux-mm, Matthew Wilcox, vbabka, mhocko
  Cc: Uladzislau Rezki, linux-mm, Matthew Wilcox, LKML, Andrew Morton,
	Theodore Y . Ts'o, Joel Fernandes, RCU, Oleksiy Avramchenko

On Mon, Jun 22, 2020 at 12:53:29PM -0700, Paul E. McKenney wrote:
> On Mon, Jun 22, 2020 at 09:04:06PM +0200, Uladzislau Rezki wrote:
> > > > > 
> > > > > Very good.  When does kfree_rcu() and friends move out of kernel/rcu?
> > > > > 
> > > > Do you mean to move the whole logic of kfree_rcu() from top to down to mm/?
> > > 
> > > I do mean exactly that.
> > > 
> > > That was my goal some years back when Rao Shoaib was making the first
> > > attempt along these lines, and it remains my goal.  After all, if this
> > > effort is at all successful, the coupling between kfree_rcu() with
> > > slab/slob/slub will become much tighter than that between kfree_rcu()
> > > and RCU.
> > > 
> > > There will need to be some additional RCU APIs used by kfree_rcu(),
> > > for example, something to tell RCU how many blocks are awaiting a
> > > grace period.  But these are narrow and easily defined APIs.
> > >
> > I also think that k[v]free_rcu() should reside somewhere under "mm/".
> > Currently they are defined as macros under "linux/rcupdate.h". So i
> > am not sure if definitions should stay there or moved also.
> 
> I am not as worried about the high-level macros as I am about the code
> that does the bulk of the work, but they should still move as well.
> Otherwise, changes involving both the macros and the underlying
> implementation are harder than needed.
> 
> > Implementation of the k[v]free_rcu() is under rcu/tree.c and for tiny
> > variant is under rcutiny.h. It can be moved to the mm/slab_common.c
> > or independent files can be created. I think, mm people should consult
> > what is the best way to go :)
> > 
> > Any thoughts on it?
> 
> I don't have any opinion on exactly where in mm the underlying
> implementation code should go.  You suggestion of mm/slab_common.c
> seems fine to me.  ;-)
> 
OK :)

Then I would like to hear an opinion from the "mm" people on where
kfree_rcu() and friends could potentially be moved.

Matthew, Michal, Vlastimil, could you please share your views?

Thanks!

--
Vlad Rezki


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-06-30 17:46 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 21:47 [PATCH v2 00/16] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 01/16] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 02/16] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 03/16] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 04/16] rcu/tree: Make debug_objects logic independent of rcu_head Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 05/16] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 06/16] rcu/tree: Move kfree_rcu_cpu locking/unlocking to separate functions Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 07/16] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 08/16] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 09/16] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
2020-06-17 23:46   ` Paul E. McKenney
2020-06-18  0:52     ` Matthew Wilcox
2020-06-18  3:18       ` Paul E. McKenney
2020-06-18 17:35         ` Uladzislau Rezki
2020-06-18 17:57           ` Paul E. McKenney
2020-06-18 18:34             ` Uladzislau Rezki
2020-06-18 19:03               ` Paul E. McKenney
2020-06-18 20:35                 ` Uladzislau Rezki
2020-06-18 20:38                   ` Matthew Wilcox
2020-06-18 21:17                     ` Uladzislau Rezki
2020-06-18 21:34                       ` Paul E. McKenney
2020-06-19 15:46                         ` Uladzislau Rezki
2020-06-19 16:25                           ` Paul E. McKenney
2020-06-22 19:04                             ` Uladzislau Rezki
2020-06-22 19:53                               ` Paul E. McKenney
2020-06-30 17:46                                 ` Uladzislau Rezki
2020-06-18 17:30       ` Uladzislau Rezki
2020-06-18 17:35         ` Matthew Wilcox
2020-06-18 20:03           ` Uladzislau Rezki
2020-06-18 17:25     ` Uladzislau Rezki
2020-06-18 17:32       ` Paul E. McKenney
2020-06-18 17:56         ` Uladzislau Rezki
2020-06-18 18:15           ` Matthew Wilcox
2020-06-18 18:23             ` Uladzislau Rezki
2020-06-18 18:37               ` Matthew Wilcox
2020-06-18 18:48                 ` Uladzislau Rezki
2020-05-25 21:47 ` [PATCH v2 10/16] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 11/16] rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_* Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 12/16] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 13/16] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 14/16] rcu: Support reclaim for head-less object Uladzislau Rezki (Sony)
2020-05-25 21:47 ` [PATCH v2 15/16] rcu: Introduce single argument kvfree_rcu() interface Uladzislau Rezki (Sony)
2020-05-25 21:48 ` [PATCH v2 16/16] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).