* [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments)
@ 2020-04-28 20:58 Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 01/24] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
                   ` (23 more replies)
  0 siblings, 24 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Motivation
----------
There have been some discussions about, and demand for, a
kvfree_rcu() interface for different purposes. Basically, the goal
is to have a simple interface like:

<snip>
    void *ptr = kvmalloc(some_bytes, GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr);
<snip>

For example, please have a look at the ext4 discussion here:
    https://lkml.org/lkml/2020/2/19/1372

Due to the lack of the interface in question, an ext4-specific
workaround has been introduced to kvfree() memory after a grace period:

<snip>
void ext4_kvfree_array_rcu(void *to_free)
{
	struct ext4_rcu_ptr *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);

	if (ptr) {
		ptr->ptr = to_free;
		call_rcu(&ptr->rcu, ext4_rcu_ptr_callback);
		return;
	}
	synchronize_rcu();
	kvfree(to_free);
}
<snip>
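
With the interface in question available, the above could collapse
into a single call. Below is only a sketch of the converted function,
using the one-argument (head-less) form that is described later in
this letter:

<snip>
void ext4_kvfree_array_rcu(void *to_free)
{
	/* Frees either SLAB or vmalloc memory after a grace period. */
	kvfree_rcu(to_free);
}
<snip>

Since the original function already allocates with GFP_KERNEL, the
sleepable-context requirement of the one-argument form would be met
here.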

There are also other similar places that could be converted to the new
interface, which is much more efficient than just calling
synchronize_rcu() and then releasing the memory.

Please also have a look at other places in the kernel where people do
not embed an rcu_head into their structures for some reason and do:

<snip>
    synchronize_rcu();
    kfree(p);
<snip>

<snip>
urezki@pc638:~/data/coding/linux-rcu.git$ find ./ -name "*.c" | xargs grep -C 1 -rn "synchronize_rcu" | grep kfree
./fs/nfs/sysfs.c-113-           kfree(old);
./fs/ext4/super.c-1708- kfree(old_qname);
./kernel/trace/ftrace.c-5079-                   kfree(direct);
./kernel/trace/ftrace.c-5156-                   kfree(direct);
./kernel/trace/trace_probe.c-1087-      kfree(link);
./kernel/module.c-3910- kfree(mod->args);
./net/core/sysctl_net_core.c-143-                               kfree(cur);
./arch/x86/mm/mmio-mod.c-314-           kfree(found_trace);
./drivers/mfd/dln2.c-183-               kfree(i);
./drivers/block/drbd/drbd_state.c-2074-         kfree(old_conf);
./drivers/block/drbd/drbd_nl.c-1689-    kfree(old_disk_conf);
./drivers/block/drbd/drbd_nl.c-2522-    kfree(old_net_conf);
./drivers/block/drbd/drbd_nl.c-2935-            kfree(old_disk_conf);
./drivers/block/drbd/drbd_receiver.c-3805-      kfree(old_net_conf);
./drivers/block/drbd/drbd_receiver.c-4177-                      kfree(old_disk_conf);
./drivers/ipack/carriers/tpci200.c-189- kfree(slot_irq);
./drivers/crypto/nx/nx-842-pseries.c-1010-      kfree(old_devdata);
./drivers/net/ethernet/myricom/myri10ge/myri10ge.c-3583-        kfree(mgp->ss);
./drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:286:       synchronize_rcu(); /* before kfree(flow) */
./drivers/net/ethernet/mellanox/mlxsw/core.c-1574-      kfree(rxl_item);
./drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c-6642- kfree(adapter->mbox_log);
./drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c-6644- kfree(adapter);
./drivers/infiniband/hw/hfi1/sdma.c-1337-       kfree(dd->per_sdma);
./drivers/infiniband/core/device.c:2164:                         * synchronize_rcu before the netdev is kfreed, so we
./drivers/misc/vmw_vmci/vmci_context.c-692-             kfree(notifier);
./drivers/misc/vmw_vmci/vmci_event.c-213-       kfree(s);
./drivers/staging/fwserial/fwserial.c-2122-     kfree(peer);
urezki@pc638:~/data/coding/linux-rcu.git$
<snip>

All of these call sites can be replaced by the introduced interface,
and that is the actual aim and motivation: converting all of them to
the single kvfree_rcu() logic.
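
For instance, each of the call sites listed above could be reduced,
roughly, to the following (again only a sketch, assuming the
one-argument form):

<snip>
    kvfree_rcu(p);
<snip>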

As for the double-argument form of kvfree_rcu(), we have only one
user so far, "mm/list_lru.c", but it costs nothing to add it.

Description
-----------
This small series introduces a kvfree_rcu() macro that is used to
free memory after a grace period. It can be called with either one
or two arguments. As its name suggests, kvfree_rcu() can handle two
types of pointers, SLAB and vmalloc ones.

As a result there are two ways to use the kvfree_rcu() macro, shown
in the two examples below.

a) kvfree_rcu(ptr, rhf);
    struct X {
        struct rcu_head rhf;
        unsigned char data[100];
    };

    struct X *ptr = kvmalloc(sizeof(struct X), GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr, rhf);

b) kvfree_rcu(ptr);
    void *ptr = kvmalloc(some_bytes, GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr);

The latter one, which we name the headless variant, needs only one
argument, meaning that it does not require any rcu_head to be present
within the type of ptr. The restriction is that the (b) call site has
to be in a context where sleeping is allowed, i.e. it has to satisfy
the might_sleep() annotation. To check that, please activate the
CONFIG_DEBUG_ATOMIC_SLEEP option in your kernel.
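
For reference, below is a minimal sketch of how a single macro can
dispatch on one versus two arguments. The helper names
(KVFREE_GET_MACRO, kvfree_rcu_arg_1/_2) are illustrative only and do
not necessarily match the macros introduced by this series:

<snip>
/* Illustrative sketch; the helper names are hypothetical. */
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME

#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__,			\
	kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)

/* Two arguments: object with an embedded rcu_head named "rhf". */
#define kvfree_rcu_arg_2(ptr, rhf)					\
	kvfree_call_rcu(&((ptr)->rhf),					\
		(rcu_callback_t)(unsigned long)offsetof(typeof(*(ptr)), rhf))

/* One argument: head-less object, the caller must be able to sleep. */
#define kvfree_rcu_arg_1(ptr)						\
	kvfree_call_rcu(NULL, (rcu_callback_t)(ptr))
<snip>

In the two-argument case the rcu_head offset is encoded in the
callback pointer; in the one-argument case the raw pointer is passed
with a NULL head, which is presumably why the head-less path has to
be allowed to sleep (it may need to allocate, or fall back to a
blocking synchronize_rcu(), under memory pressure).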

This series is based on the "origin/rcu/dev" branch of
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git
which is the same as Paul's almost-latest dev.2020.04.13c branch.

Any comments and feedback are appreciated.

Joel Fernandes (Google) (5):
  rcu/tree: Keep kfree_rcu() awake during lock contention
  rcu/tree: Skip entry into the page allocator for PREEMPT_RT
  rcu/tree: Use consistent style for comments
  rcu/tree: Simplify debug_objects handling
  rcu/tree: Make kvfree_rcu() tolerate any alignment

Sebastian Andrzej Siewior (1):
  rcu/tree: Use static initializer for krc.lock

Uladzislau Rezki (Sony) (18):
  rcu/tree: Repeat the monitor if any free channel is busy
  rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro
  rcu/tree: move locking/unlocking to separate functions
  rcu/tree: cache specified number of objects
  rcu/tree: add rcutree.rcu_min_cached_objs description
  rcu/tree: Maintain separate array for vmalloc ptrs
  rcu/tiny: support vmalloc in tiny-RCU
  rcu: Rename rcu_invoke_kfree_callback/rcu_kfree_callback
  rcu: Rename __is_kfree_rcu_offset() macro
  rcu: Rename kfree_call_rcu() to the kvfree_call_rcu().
  mm/list_lru.c: Rename kvfree_rcu() to local variant
  rcu: Introduce 2 arg kvfree_rcu() interface
  mm/list_lru.c: Remove kvfree_rcu_local() function
  rcu/tree: Support reclaim for head-less object
  rcu/tiny: move kvfree_call_rcu() out of header
  rcu/tiny: support reclaim for head-less object
  rcu: Introduce 1 arg kvfree_rcu() interface
  lib/test_vmalloc.c: Add test cases for kvfree_rcu()

 .../admin-guide/kernel-parameters.txt         |   8 +
 include/linux/rcupdate.h                      |  53 +-
 include/linux/rcutiny.h                       |   6 +-
 include/linux/rcutree.h                       |   2 +-
 include/trace/events/rcu.h                    |   8 +-
 kernel/rcu/tiny.c                             | 168 ++++++-
 kernel/rcu/tree.c                             | 454 +++++++++++++-----
 lib/test_vmalloc.c                            | 103 +++-
 mm/list_lru.c                                 |  11 +-
 9 files changed, 662 insertions(+), 151 deletions(-)

-- 
2.20.1


* [PATCH 01/24] rcu/tree: Keep kfree_rcu() awake during lock contention
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 02/24] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, bigeasy

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

On PREEMPT_RT kernels, contending on the krcp spinlock can cause
sleeping because, on these kernels, the spinlock is converted to an
rt-mutex. To prevent breakage of possible usage of kfree_rcu() now or
in the future, make use of raw spinlocks, which are not subject to
such conversions.

Having vetted all code paths, there is no reason to believe that the
raw spinlock will be held for a long time, so PREEMPT_RT should not
suffer from lengthy acquisitions of the lock.

Cc: bigeasy@linutronix.de
Cc: Uladzislau Rezki <urezki@gmail.com>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f288477ee1c2..cf68d3d9f5b8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2905,7 +2905,7 @@ struct kfree_rcu_cpu {
 	struct kfree_rcu_bulk_data *bhead;
 	struct kfree_rcu_bulk_data *bcached;
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
 	bool monitor_todo;
 	bool initialized;
@@ -2939,12 +2939,12 @@ static void kfree_rcu_work(struct work_struct *work)
 	krwp = container_of(to_rcu_work(work),
 			    struct kfree_rcu_cpu_work, rcu_work);
 	krcp = krwp->krcp;
-	spin_lock_irqsave(&krcp->lock, flags);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
 	head = krwp->head_free;
 	krwp->head_free = NULL;
 	bhead = krwp->bhead_free;
 	krwp->bhead_free = NULL;
-	spin_unlock_irqrestore(&krcp->lock, flags);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 	/* "bhead" is now private, so traverse locklessly. */
 	for (; bhead; bhead = bnext) {
@@ -3047,14 +3047,14 @@ static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
 	krcp->monitor_todo = false;
 	if (queue_kfree_rcu_work(krcp)) {
 		// Success! Our job is done here.
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 		return;
 	}
 
 	// Previous RCU batch still in progress, try again later.
 	krcp->monitor_todo = true;
 	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
-	spin_unlock_irqrestore(&krcp->lock, flags);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
 /*
@@ -3067,11 +3067,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
 						 monitor_work.work);
 
-	spin_lock_irqsave(&krcp->lock, flags);
+	raw_spin_lock_irqsave(&krcp->lock, flags);
 	if (krcp->monitor_todo)
 		kfree_rcu_drain_unlock(krcp, flags);
 	else
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
 
 static inline bool
@@ -3142,7 +3142,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	local_irq_save(flags);	// For safely calling this_cpu_ptr().
 	krcp = this_cpu_ptr(&krc);
 	if (krcp->initialized)
-		spin_lock(&krcp->lock);
+		raw_spin_lock(&krcp->lock);
 
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(head)) {
@@ -3173,7 +3173,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 unlock_return:
 	if (krcp->initialized)
-		spin_unlock(&krcp->lock);
+		raw_spin_unlock(&krcp->lock);
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
@@ -3205,11 +3205,11 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count = krcp->count;
-		spin_lock_irqsave(&krcp->lock, flags);
+		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (krcp->monitor_todo)
 			kfree_rcu_drain_unlock(krcp, flags);
 		else
-			spin_unlock_irqrestore(&krcp->lock, flags);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
 		sc->nr_to_scan -= count;
 		freed += count;
@@ -3236,15 +3236,15 @@ void __init kfree_rcu_scheduler_running(void)
 	for_each_online_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		spin_lock_irqsave(&krcp->lock, flags);
+		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (!krcp->head || krcp->monitor_todo) {
-			spin_unlock_irqrestore(&krcp->lock, flags);
+			raw_spin_unlock_irqrestore(&krcp->lock, flags);
 			continue;
 		}
 		krcp->monitor_todo = true;
 		schedule_delayed_work_on(cpu, &krcp->monitor_work,
 					 KFREE_DRAIN_JIFFIES);
-		spin_unlock_irqrestore(&krcp->lock, flags);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
 
@@ -4140,7 +4140,7 @@ static void __init kfree_rcu_batch_init(void)
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		spin_lock_init(&krcp->lock);
+		raw_spin_lock_init(&krcp->lock);
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
-- 
2.20.1



* [PATCH 02/24] rcu/tree: Skip entry into the page allocator for PREEMPT_RT
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 01/24] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 03/24] rcu/tree: Use consistent style for comments Uladzislau Rezki (Sony)
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Sebastian Andrzej Siewior

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

To keep the kfree_rcu() path working in raw non-preemptible sections,
prevent the optional entry into the allocator, as it uses sleeping
locks. In fact, even if the caller of kfree_rcu() is preemptible, this
path still is not, since krcp->lock was made a raw spinlock in the
previous patches. With additional page pre-allocation in the works,
hitting this return is going to be much less likely soon, so just
prevent it for now so that PREEMPT_RT does not break. Note that page
allocation here is an optimization and skipping it still keeps
kfree_rcu() working.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Co-developed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cf68d3d9f5b8..cd61649e1b00 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 		if (!bnode) {
 			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
 
+			/*
+			 * To keep this path working on raw non-preemptible
+			 * sections, prevent the optional entry into the
+			 * allocator as it uses sleeping locks. In fact, even
+			 * if the caller of kfree_rcu() is preemptible, this
+			 * path still is not, as krcp->lock is a raw spinlock.
+			 * With additional page pre-allocation in the works,
+			 * hitting this return is going to be much less likely.
+			 */
+			if (IS_ENABLED(CONFIG_PREEMPT_RT))
+				return false;
+
 			bnode = (struct kfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
-- 
2.20.1



* [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 01/24] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 02/24] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 19:05   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 04/24] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Simple cleanup of the comments in the kfree_rcu() code to keep them
consistent with the majority of commenting styles.

Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cd61649e1b00..1487af8e11e8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3043,15 +3043,15 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
 					  unsigned long flags)
 {
-	// Attempt to start a new batch.
+	/* Attempt to start a new batch. */
 	krcp->monitor_todo = false;
 	if (queue_kfree_rcu_work(krcp)) {
-		// Success! Our job is done here.
+		/* Success! Our job is done here. */
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 		return;
 	}
 
-	// Previous RCU batch still in progress, try again later.
+	/* Previous RCU batch still in progress, try again later. */
 	krcp->monitor_todo = true;
 	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
@@ -3151,14 +3151,14 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
 
-	local_irq_save(flags);	// For safely calling this_cpu_ptr().
+	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
 	krcp = this_cpu_ptr(&krc);
 	if (krcp->initialized)
 		raw_spin_lock(&krcp->lock);
 
-	// Queue the object but don't yet schedule the batch.
+	/* Queue the object but don't yet schedule the batch. */
 	if (debug_rcu_head_queue(head)) {
-		// Probable double kfree_rcu(), just leak.
+		/* Probable double kfree_rcu(), just leak. */
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
 		goto unlock_return;
@@ -3176,7 +3176,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 	WRITE_ONCE(krcp->count, krcp->count + 1);
 
-	// Set timer to drain after KFREE_DRAIN_JIFFIES.
+	/* Set timer to drain after KFREE_DRAIN_JIFFIES. */
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 	    !krcp->monitor_todo) {
 		krcp->monitor_todo = true;
@@ -3722,7 +3722,7 @@ int rcutree_offline_cpu(unsigned int cpu)
 
 	rcutree_affinity_setting(cpu, cpu);
 
-	// nohz_full CPUs need the tick for stop-machine to work quickly
+	/* nohz_full CPUs need the tick for stop-machine to work quickly */
 	tick_dep_set(TICK_DEP_BIT_RCU);
 	return 0;
 }
-- 
2.20.1



* [PATCH 04/24] rcu/tree: Repeat the monitor if any free channel is busy
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (2 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 03/24] rcu/tree: Use consistent style for comments Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 05/24] rcu/tree: Simplify debug_objects handling Uladzislau Rezki (Sony)
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

It can happen that one of the channels cannot be detached
because its corresponding free channel still holds previous
data that has not been processed yet. On the other hand,
another channel can be successfully detached, causing the
monitor work to stop.

To prevent that, if any channel is still in a pending state
after a detach attempt, just reschedule the monitor work.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1487af8e11e8..0762ac06f0b7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2995,7 +2995,7 @@ static void kfree_rcu_work(struct work_struct *work)
 static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 {
 	struct kfree_rcu_cpu_work *krwp;
-	bool queued = false;
+	bool repeat = false;
 	int i;
 
 	lockdep_assert_held(&krcp->lock);
@@ -3033,11 +3033,14 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 			 * been detached following each other, one by one.
 			 */
 			queue_rcu_work(system_wq, &krwp->rcu_work);
-			queued = true;
 		}
+
+		/* Repeat if any "free" corresponding channel is still busy. */
+		if (krcp->bhead || krcp->head)
+			repeat = true;
 	}
 
-	return queued;
+	return !repeat;
 }
 
 static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
-- 
2.20.1



* [PATCH 05/24] rcu/tree: Simplify debug_objects handling
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (3 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 04/24] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 06/24] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In order to prepare for future headless RCU support, make the
debug_objects handling in kfree_rcu use the final 'pointer' value of
the object, instead of depending on the rcu_head.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0762ac06f0b7..767aed49d7fd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2860,13 +2860,11 @@ EXPORT_SYMBOL_GPL(call_rcu);
  * @nr_records: Number of active pointers in the array
  * @records: Array of the kfree_rcu() pointers
  * @next: Next bulk object in the block chain
- * @head_free_debug: For debug, when CONFIG_DEBUG_OBJECTS_RCU_HEAD is set
  */
 struct kfree_rcu_bulk_data {
 	unsigned long nr_records;
 	void *records[KFREE_BULK_MAX_ENTR];
 	struct kfree_rcu_bulk_data *next;
-	struct rcu_head *head_free_debug;
 };
 
 /**
@@ -2916,11 +2914,13 @@ struct kfree_rcu_cpu {
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
 
 static __always_inline void
-debug_rcu_head_unqueue_bulk(struct rcu_head *head)
+debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
 {
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-	for (; head; head = head->next)
-		debug_rcu_head_unqueue(head);
+	int i;
+
+	for (i = 0; i < bhead->nr_records; i++)
+		debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i]));
 #endif
 }
 
@@ -2950,7 +2950,7 @@ static void kfree_rcu_work(struct work_struct *work)
 	for (; bhead; bhead = bnext) {
 		bnext = bhead->next;
 
-		debug_rcu_head_unqueue_bulk(bhead->head_free_debug);
+		debug_rcu_bhead_unqueue(bhead);
 
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
@@ -2972,14 +2972,15 @@ static void kfree_rcu_work(struct work_struct *work)
 	 */
 	for (; head; head = next) {
 		unsigned long offset = (unsigned long)head->func;
+		void *ptr = (void *)head - offset;
 
 		next = head->next;
-		debug_rcu_head_unqueue(head);
+		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
-			kfree((void *)head - offset);
+			kfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
@@ -3118,18 +3119,11 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
 		bnode->next = krcp->bhead;
-		bnode->head_free_debug = NULL;
 
 		/* Attach it to the head. */
 		krcp->bhead = bnode;
 	}
 
-#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-	head->func = func;
-	head->next = krcp->bhead->head_free_debug;
-	krcp->bhead->head_free_debug = head;
-#endif
-
 	/* Finally insert. */
 	krcp->bhead->records[krcp->bhead->nr_records++] =
 		(void *) head - (unsigned long) func;
@@ -3153,14 +3147,17 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
+	void *ptr;
 
 	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
 	krcp = this_cpu_ptr(&krc);
 	if (krcp->initialized)
 		raw_spin_lock(&krcp->lock);
 
+	ptr = (void *)head - (unsigned long)func;
+
 	/* Queue the object but don't yet schedule the batch. */
-	if (debug_rcu_head_queue(head)) {
+	if (debug_rcu_head_queue(ptr)) {
 		/* Probable double kfree_rcu(), just leak. */
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
-- 
2.20.1



* [PATCH 06/24] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (4 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 05/24] rcu/tree: Simplify debug_objects handling Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 07/24] rcu/tree: move locking/unlocking to separate functions Uladzislau Rezki (Sony)
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Boqun Feng

We can simplify the KFREE_BULK_MAX_ENTR macro and get rid of
the magic numbers which were used to make the structure exactly
one page in size.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 767aed49d7fd..eebd7f627794 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2848,13 +2848,6 @@ EXPORT_SYMBOL_GPL(call_rcu);
 #define KFREE_DRAIN_JIFFIES (HZ / 50)
 #define KFREE_N_BATCHES 2
 
-/*
- * This macro defines how many entries the "records" array
- * will contain. It is based on the fact that the size of
- * kfree_rcu_bulk_data structure becomes exactly one page.
- */
-#define KFREE_BULK_MAX_ENTR ((PAGE_SIZE / sizeof(void *)) - 3)
-
 /**
  * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
  * @nr_records: Number of active pointers in the array
@@ -2863,10 +2856,18 @@ EXPORT_SYMBOL_GPL(call_rcu);
  */
 struct kfree_rcu_bulk_data {
 	unsigned long nr_records;
-	void *records[KFREE_BULK_MAX_ENTR];
 	struct kfree_rcu_bulk_data *next;
+	void *records[];
 };
 
+/*
+ * This macro defines how many entries the "records" array
+ * will contain. It is based on the fact that the size of
+ * kfree_rcu_bulk_data structure becomes exactly one page.
+ */
+#define KFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))
+
 /**
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
-- 
2.20.1



* [PATCH 07/24] rcu/tree: move locking/unlocking to separate functions
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (5 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 06/24] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 08/24] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce two helpers to lock and unlock access to the
per-CPU "kfree_rcu_cpu" structure. The reason is to make
the kfree_call_rcu() function more readable.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index eebd7f627794..bc6c2bc8fa32 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2925,6 +2925,27 @@ debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
 #endif
 }
 
+static inline struct kfree_rcu_cpu *
+krc_this_cpu_lock(unsigned long *flags)
+{
+	struct kfree_rcu_cpu *krcp;
+
+	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
+	krcp = this_cpu_ptr(&krc);
+	if (likely(krcp->initialized))
+		raw_spin_lock(&krcp->lock);
+
+	return krcp;
+}
+
+static inline void
+krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
+{
+	if (likely(krcp->initialized))
+		raw_spin_unlock(&krcp->lock);
+	local_irq_restore(flags);
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -3150,11 +3171,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	struct kfree_rcu_cpu *krcp;
 	void *ptr;
 
-	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
-	krcp = this_cpu_ptr(&krc);
-	if (krcp->initialized)
-		raw_spin_lock(&krcp->lock);
-
+	krcp = krc_this_cpu_lock(&flags);
 	ptr = (void *)head - (unsigned long)func;
 
 	/* Queue the object but don't yet schedule the batch. */
@@ -3185,9 +3202,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 unlock_return:
-	if (krcp->initialized)
-		raw_spin_unlock(&krcp->lock);
-	local_irq_restore(flags);
+	krc_this_cpu_unlock(krcp, flags);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
-- 
2.20.1



* [PATCH 08/24] rcu/tree: Use static initializer for krc.lock
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (6 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 07/24] rcu/tree: move locking/unlocking to separate functions Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 21:17   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 09/24] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The per-CPU variable is initialized at runtime in
kfree_rcu_batch_init(). This function is invoked before
'rcu_scheduler_active' is set to 'RCU_SCHEDULER_RUNNING'.
After the initialisation, '->initialized' is set to true.

The raw_spin_lock is only acquired if '->initialized' is set to true.
The workqueue item is only used if 'rcu_scheduler_active' is set to
RCU_SCHEDULER_RUNNING, which happens after the initialisation.

Use a static initializer for krc.lock and remove the runtime
initialisation of the lock. Since the lock can now always be
acquired, remove the '->initialized' check.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bc6c2bc8fa32..89e9ca3f4e3e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2892,7 +2892,7 @@ struct kfree_rcu_cpu_work {
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
  * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
- * @initialized: The @lock and @rcu_work fields have been initialized
+ * @initialized: The @rcu_work fields have been initialized
  *
  * This is a per-CPU structure.  The reason that it is not included in
  * the rcu_data structure is to permit this code to be extracted from
@@ -2912,7 +2912,9 @@ struct kfree_rcu_cpu {
 	int count;
 };
 
-static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
+};
 
 static __always_inline void
 debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
@@ -2930,10 +2932,9 @@ krc_this_cpu_lock(unsigned long *flags)
 {
 	struct kfree_rcu_cpu *krcp;
 
-	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
+	local_irq_save(*flags);	/* For safely calling this_cpu_ptr(). */
 	krcp = this_cpu_ptr(&krc);
-	if (likely(krcp->initialized))
-		raw_spin_lock(&krcp->lock);
+	raw_spin_lock(&krcp->lock);
 
 	return krcp;
 }
@@ -2941,8 +2942,7 @@ krc_this_cpu_lock(unsigned long *flags)
 static inline void
 krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 {
-	if (likely(krcp->initialized))
-		raw_spin_unlock(&krcp->lock);
+	raw_spin_unlock(&krcp->lock);
 	local_irq_restore(flags);
 }
 
@@ -4168,7 +4168,6 @@ static void __init kfree_rcu_batch_init(void)
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
-		raw_spin_lock_init(&krcp->lock);
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
-- 
2.20.1



* [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (7 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 08/24] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 21:27   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description Uladzislau Rezki (Sony)
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Cache some extra objects per CPU. During the reclaim process
some pages are cached instead of being released, by linking
them into a list. Such an approach provides O(1) access time
to the cache.

That reduces the number of requests to the page allocator and
also makes the whole thing more helpful if a low-memory
condition occurs.

A parameter reflecting the minimum number of pages allowed to
be cached per CPU is exposed via sysfs. It is read-only and
its name is "rcu_min_cached_objs".

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 89e9ca3f4e3e..d8975819b1c9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
 
+/*
+ * This rcu parameter is read-only, though it could also be
+ * made writable. It reflects the minimum allowed number of
+ * objects which can be cached per-CPU. Object size is one page.
+ */
+int rcu_min_cached_objs = 2;
+module_param(rcu_min_cached_objs, int, 0444);
+
 /* Retrieve RCU kthreads priority for rcutorture */
 int rcu_get_gp_kthreads_prio(void)
 {
@@ -2887,7 +2895,6 @@ struct kfree_rcu_cpu_work {
  * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
  * @head: List of kfree_rcu() objects not yet waiting for a grace period
  * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
- * @bcached: Keeps at most one object for later reuse when build chain blocks
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
@@ -2902,7 +2909,6 @@ struct kfree_rcu_cpu_work {
 struct kfree_rcu_cpu {
 	struct rcu_head *head;
 	struct kfree_rcu_bulk_data *bhead;
-	struct kfree_rcu_bulk_data *bcached;
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
@@ -2910,6 +2916,15 @@ struct kfree_rcu_cpu {
 	bool initialized;
 	// Number of objects for which GP not started
 	int count;
+
+	/*
+	 * Number of cached objects which are queued into
+	 * the lock-less list. This cache is used by the
+	 * kvfree_call_rcu() function and as of now its
+	 * size is static.
+	 */
+	struct llist_head bkvcache;
+	int nr_bkv_objs;
 };
 
 static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
@@ -2946,6 +2961,31 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 	local_irq_restore(flags);
 }
 
+static inline struct kfree_rcu_bulk_data *
+get_cached_bnode(struct kfree_rcu_cpu *krcp)
+{
+	if (!krcp->nr_bkv_objs)
+		return NULL;
+
+	krcp->nr_bkv_objs--;
+	return (struct kfree_rcu_bulk_data *)
+		llist_del_first(&krcp->bkvcache);
+}
+
+static inline bool
+put_cached_bnode(struct kfree_rcu_cpu *krcp,
+	struct kfree_rcu_bulk_data *bnode)
+{
+	/* Check the limit. */
+	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
+		return false;
+
+	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
+	krcp->nr_bkv_objs++;
+	return true;
+
+}
+
 /*
  * This function is invoked in workqueue context after a grace period.
  * It frees all the objects queued on ->bhead_free or ->head_free.
@@ -2981,7 +3021,12 @@ static void kfree_rcu_work(struct work_struct *work)
 		kfree_bulk(bhead->nr_records, bhead->records);
 		rcu_lock_release(&rcu_callback_map);
 
-		if (cmpxchg(&krcp->bcached, NULL, bhead))
+		krcp = krc_this_cpu_lock(&flags);
+		if (put_cached_bnode(krcp, bhead))
+			bhead = NULL;
+		krc_this_cpu_unlock(krcp, flags);
+
+		if (bhead)
 			free_page((unsigned long) bhead);
 
 		cond_resched_tasks_rcu_qs();
@@ -3114,7 +3159,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 	/* Check if a new block is required. */
 	if (!krcp->bhead ||
 			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
-		bnode = xchg(&krcp->bcached, NULL);
+		bnode = get_cached_bnode(krcp);
 		if (!bnode) {
 			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
 
@@ -4167,12 +4212,23 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+		struct kfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
+		for (i = 0; i < rcu_min_cached_objs; i++) {
+			bnode = (struct kfree_rcu_bulk_data *)
+				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+			if (bnode)
+				put_cached_bnode(krcp, bnode);
+			else
+				pr_err("Failed to preallocate for %d CPU!\n", cpu);
+		}
+
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}
-- 
2.20.1



* [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (8 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 09/24] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 22:25   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Document the rcutree.rcu_min_cached_objs sysfs kernel parameter.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 828ff975fbc6..b2b7022374af 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3977,6 +3977,14 @@
 			latencies, which will choose a value aligned
 			with the appropriate hardware boundaries.
 
+	rcutree.rcu_min_cached_objs= [KNL]
+			Minimum number of objects which are cached and
+			maintained per CPU. Object size is equal to
+			PAGE_SIZE. The cache allows reducing the
+			pressure on the page allocator; it also makes
+			the whole algorithm behave better under
+			low-memory conditions.
+
 	rcutree.jiffies_till_first_fqs= [KNL]
 			Set delay from grace-period initialization to
 			first attempt to force quiescent states.
-- 
2.20.1



* [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (9 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 21:37   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 12/24] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

To do so, we use an array of the common kvfree_rcu_bulk_data
structure. It consists of two elements: index number 0 corresponds
to SLAB pointers, whereas vmalloc pointers can be accessed by using
index number 1.

The reason for not mixing the pointers is to have an easy way to
distinguish them.

This is also a preparation patch for head-less object support. When
an object is head-less we cannot queue it into any list; instead,
the pointer is placed directly into an array.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 172 +++++++++++++++++++++++++++++-----------------
 1 file changed, 109 insertions(+), 63 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d8975819b1c9..7983926af95b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -57,6 +57,7 @@
 #include <linux/slab.h>
 #include <linux/sched/isolation.h>
 #include <linux/sched/clock.h>
+#include <linux/mm.h>
 #include "../time/tick-internal.h"
 
 #include "tree.h"
@@ -2857,44 +2858,44 @@ EXPORT_SYMBOL_GPL(call_rcu);
 #define KFREE_N_BATCHES 2
 
 /**
- * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
+ * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
  * @nr_records: Number of active pointers in the array
- * @records: Array of the kfree_rcu() pointers
  * @next: Next bulk object in the block chain
+ * @records: Array of the kvfree_rcu() pointers
  */
-struct kfree_rcu_bulk_data {
+struct kvfree_rcu_bulk_data {
 	unsigned long nr_records;
-	struct kfree_rcu_bulk_data *next;
+	struct kvfree_rcu_bulk_data *next;
 	void *records[];
 };
 
 /*
  * This macro defines how many entries the "records" array
  * will contain. It is based on the fact that the size of
- * kfree_rcu_bulk_data structure becomes exactly one page.
+ * kvfree_rcu_bulk_data structure becomes exactly one page.
  */
-#define KFREE_BULK_MAX_ENTR \
-	((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))
+#define KVFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
 
 /**
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
  * @head_free: List of kfree_rcu() objects waiting for a grace period
- * @bhead_free: Bulk-List of kfree_rcu() objects waiting for a grace period
+ * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
  * @krcp: Pointer to @kfree_rcu_cpu structure
  */
 
 struct kfree_rcu_cpu_work {
 	struct rcu_work rcu_work;
 	struct rcu_head *head_free;
-	struct kfree_rcu_bulk_data *bhead_free;
+	struct kvfree_rcu_bulk_data *bkvhead_free[2];
 	struct kfree_rcu_cpu *krcp;
 };
 
 /**
  * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
  * @head: List of kfree_rcu() objects not yet waiting for a grace period
- * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
+ * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
@@ -2908,7 +2909,7 @@ struct kfree_rcu_cpu_work {
  */
 struct kfree_rcu_cpu {
 	struct rcu_head *head;
-	struct kfree_rcu_bulk_data *bhead;
+	struct kvfree_rcu_bulk_data *bkvhead[2];
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
@@ -2932,7 +2933,7 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
 };
 
 static __always_inline void
-debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
+debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
 {
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
 	int i;
@@ -2961,20 +2962,20 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
 	local_irq_restore(flags);
 }
 
-static inline struct kfree_rcu_bulk_data *
+static inline struct kvfree_rcu_bulk_data *
 get_cached_bnode(struct kfree_rcu_cpu *krcp)
 {
 	if (!krcp->nr_bkv_objs)
 		return NULL;
 
 	krcp->nr_bkv_objs--;
-	return (struct kfree_rcu_bulk_data *)
+	return (struct kvfree_rcu_bulk_data *)
 		llist_del_first(&krcp->bkvcache);
 }
 
 static inline bool
 put_cached_bnode(struct kfree_rcu_cpu *krcp,
-	struct kfree_rcu_bulk_data *bnode)
+	struct kvfree_rcu_bulk_data *bnode)
 {
 	/* Check the limit. */
 	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
@@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 static void kfree_rcu_work(struct work_struct *work)
 {
 	unsigned long flags;
+	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
 	struct rcu_head *head, *next;
-	struct kfree_rcu_bulk_data *bhead, *bnext;
 	struct kfree_rcu_cpu *krcp;
 	struct kfree_rcu_cpu_work *krwp;
+	int i;
 
 	krwp = container_of(to_rcu_work(work),
 			    struct kfree_rcu_cpu_work, rcu_work);
 	krcp = krwp->krcp;
+
 	raw_spin_lock_irqsave(&krcp->lock, flags);
+	/* Channel 1. */
+	bkhead = krwp->bkvhead_free[0];
+	krwp->bkvhead_free[0] = NULL;
+
+	/* Channel 2. */
+	bvhead = krwp->bkvhead_free[1];
+	krwp->bkvhead_free[1] = NULL;
+
+	/* Channel 3. */
 	head = krwp->head_free;
 	krwp->head_free = NULL;
-	bhead = krwp->bhead_free;
-	krwp->bhead_free = NULL;
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 
-	/* "bhead" is now private, so traverse locklessly. */
-	for (; bhead; bhead = bnext) {
-		bnext = bhead->next;
-
-		debug_rcu_bhead_unqueue(bhead);
+	/* kmalloc()/kfree() channel. */
+	for (; bkhead; bkhead = bnext) {
+		bnext = bkhead->next;
+		debug_rcu_bhead_unqueue(bkhead);
 
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
-			bhead->nr_records, bhead->records);
+			bkhead->nr_records, bkhead->records);
+
+		kfree_bulk(bkhead->nr_records, bkhead->records);
+		rcu_lock_release(&rcu_callback_map);
+
+		krcp = krc_this_cpu_lock(&flags);
+		if (put_cached_bnode(krcp, bkhead))
+			bkhead = NULL;
+		krc_this_cpu_unlock(krcp, flags);
+
+		if (bkhead)
+			free_page((unsigned long) bkhead);
+
+		cond_resched_tasks_rcu_qs();
+	}
+
+	/* vmalloc()/vfree() channel. */
+	for (; bvhead; bvhead = bnext) {
+		bnext = bvhead->next;
+		debug_rcu_bhead_unqueue(bvhead);
 
-		kfree_bulk(bhead->nr_records, bhead->records);
+		rcu_lock_acquire(&rcu_callback_map);
+		for (i = 0; i < bvhead->nr_records; i++) {
+			trace_rcu_invoke_kfree_callback(rcu_state.name,
+				(struct rcu_head *) bvhead->records[i], 0);
+			vfree(bvhead->records[i]);
+		}
 		rcu_lock_release(&rcu_callback_map);
 
 		krcp = krc_this_cpu_lock(&flags);
-		if (put_cached_bnode(krcp, bhead))
-			bhead = NULL;
+		if (put_cached_bnode(krcp, bvhead))
+			bvhead = NULL;
 		krc_this_cpu_unlock(krcp, flags);
 
-		if (bhead)
-			free_page((unsigned long) bhead);
+		if (bvhead)
+			free_page((unsigned long) bvhead);
 
 		cond_resched_tasks_rcu_qs();
 	}
@@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
 		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
-			kfree(ptr);
+			kvfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
@@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 		krwp = &(krcp->krw_arr[i]);
 
 		/*
-		 * Try to detach bhead or head and attach it over any
+		 * Try to detach bkvhead or head and attach it over any
 		 * available corresponding free channel. It can be that
 		 * a previous RCU batch is in progress, it means that
 		 * immediately to queue another one is not possible so
 		 * return false to tell caller to retry.
 		 */
-		if ((krcp->bhead && !krwp->bhead_free) ||
+		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
+			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
 				(krcp->head && !krwp->head_free)) {
-			/* Channel 1. */
-			if (!krwp->bhead_free) {
-				krwp->bhead_free = krcp->bhead;
-				krcp->bhead = NULL;
+			/*
+			 * Channel 1 corresponds to SLAB ptrs.
+			 */
+			if (!krwp->bkvhead_free[0]) {
+				krwp->bkvhead_free[0] = krcp->bkvhead[0];
+				krcp->bkvhead[0] = NULL;
 			}
 
-			/* Channel 2. */
+			/*
+			 * Channel 2 corresponds to vmalloc ptrs.
+			 */
+			if (!krwp->bkvhead_free[1]) {
+				krwp->bkvhead_free[1] = krcp->bkvhead[1];
+				krcp->bkvhead[1] = NULL;
+			}
+
+			/*
+			 * Channel 3 corresponds to emergency path.
+			 */
 			if (!krwp->head_free) {
 				krwp->head_free = krcp->head;
 				krcp->head = NULL;
@@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
 			WRITE_ONCE(krcp->count, 0);
 
 			/*
-			 * One work is per one batch, so there are two "free channels",
-			 * "bhead_free" and "head_free" the batch can handle. It can be
-			 * that the work is in the pending state when two channels have
-			 * been detached following each other, one by one.
+			 * One work item is per one batch, so there are
+			 * three "free channels" the batch can handle.
+			 * It can be that the work is in the pending
+			 * state when the channels have been detached
+			 * one after the other.
 			 */
 			queue_rcu_work(system_wq, &krwp->rcu_work);
 		}
 
 		/* Repeat if any "free" corresponding channel is still busy. */
-		if (krcp->bhead || krcp->head)
+		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
 			repeat = true;
 	}
 
@@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 
 static inline bool
-kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
-	struct rcu_head *head, rcu_callback_t func)
+kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 {
-	struct kfree_rcu_bulk_data *bnode;
+	struct kvfree_rcu_bulk_data *bnode;
+	int idx;
 
 	if (unlikely(!krcp->initialized))
 		return false;
 
 	lockdep_assert_held(&krcp->lock);
+	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bhead ||
-			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
+	if (!krcp->bkvhead[idx] ||
+			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
 		bnode = get_cached_bnode(krcp);
 		if (!bnode) {
-			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
-
 			/*
 			 * To keep this path working on raw non-preemptible
 			 * sections, prevent the optional entry into the
@@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 			if (IS_ENABLED(CONFIG_PREEMPT_RT))
 				return false;
 
-			bnode = (struct kfree_rcu_bulk_data *)
+			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
 
@@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bhead;
+		bnode->next = krcp->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bhead = bnode;
+		krcp->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bhead->records[krcp->bhead->nr_records++] =
-		(void *) head - (unsigned long) func;
+	krcp->bkvhead[idx]->records
+		[krcp->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
 
 /*
- * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
- * period. Please note there are two paths are maintained, one is the main one
- * that uses kfree_bulk() interface and second one is emergency one, that is
- * used only when the main path can not be maintained temporary, due to memory
- * pressure.
+ * Queue a request for lazy invocation of the appropriate free routine after
+ * a grace period. Please note that three paths are maintained: two main ones
+ * that use the array-of-pointers interface, and a third, emergency one that
+ * is used only when the main path cannot be maintained temporarily, due to
+ * memory pressure.
  *
  * Each kfree_call_rcu() request is added to a batch. The batch will be drained
  * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
  * be free'd in workqueue context. This allows us to: batch requests together to
- * reduce the number of grace periods during heavy kfree_rcu() load.
+ * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
  */
 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
@@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
+	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
 		head->func = func;
 		head->next = krcp->head;
 		krcp->head = head;
@@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
 
 	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-		struct kfree_rcu_bulk_data *bnode;
+		struct kvfree_rcu_bulk_data *bnode;
 
 		for (i = 0; i < KFREE_N_BATCHES; i++) {
 			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
@@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
 		}
 
 		for (i = 0; i < rcu_min_cached_objs; i++) {
-			bnode = (struct kfree_rcu_bulk_data *)
+			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 
 			if (bnode)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 12/24] rcu/tiny: support vmalloc in tiny-RCU
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (10 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 13/24] rcu: Rename rcu_invoke_kfree_callback/rcu_kfree_callback Uladzislau Rezki (Sony)
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Replace kfree() with kvfree() in rcu_reclaim_tiny(),
so it becomes possible to release either SLAB memory
or vmalloc()ed memory after a grace period.
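
For illustration, kvfree() itself dispatches on the pointer type, so the
reclaim path does not need to know how the object was originally
allocated (a minimal sketch, not taken from this patch):

<snip>
    void *slab_obj = kmalloc(64, GFP_KERNEL);
    void *vmal_obj = vmalloc(PAGE_SIZE);

    kvfree(slab_obj);   /* routed to kfree() */
    kvfree(vmal_obj);   /* is_vmalloc_addr() is true, routed to vfree() */
<snip>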

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tiny.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index dd572ce7c747..4b99f7b88bee 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -23,6 +23,7 @@
 #include <linux/cpu.h>
 #include <linux/prefetch.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 #include "rcu.h"
 
@@ -86,7 +87,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 	rcu_lock_acquire(&rcu_callback_map);
 	if (__is_kfree_rcu_offset(offset)) {
 		trace_rcu_invoke_kfree_callback("", head, offset);
-		kfree((void *)head - offset);
+		kvfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
 		return true;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 13/24] rcu: Rename rcu_invoke_kfree_callback/rcu_kfree_callback
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (11 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 12/24] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 14/24] rcu: Rename __is_kfree_rcu_offset() macro Uladzislau Rezki (Sony)
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Rename rcu_invoke_kfree_callback to rcu_invoke_kvfree_callback.
Do the same with the second trace event: rcu_kfree_callback
becomes rcu_kvfree_callback. The reason is to be aligned with
the kvfree() notation.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/trace/events/rcu.h | 8 ++++----
 kernel/rcu/tiny.c          | 2 +-
 kernel/rcu/tree.c          | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index f9a7811148e2..0ee93d0b1daa 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -506,13 +506,13 @@ TRACE_EVENT_RCU(rcu_callback,
 
 /*
  * Tracepoint for the registration of a single RCU callback of the special
- * kfree() form.  The first argument is the RCU type, the second argument
+ * kvfree() form.  The first argument is the RCU type, the second argument
  * is a pointer to the RCU callback, the third argument is the offset
  * of the callback within the enclosing RCU-protected data structure,
  * the fourth argument is the number of lazy callbacks queued, and the
  * fifth argument is the total number of callbacks queued.
  */
-TRACE_EVENT_RCU(rcu_kfree_callback,
+TRACE_EVENT_RCU(rcu_kvfree_callback,
 
 	TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
 		 long qlen),
@@ -596,12 +596,12 @@ TRACE_EVENT_RCU(rcu_invoke_callback,
 
 /*
  * Tracepoint for the invocation of a single RCU callback of the special
- * kfree() form.  The first argument is the RCU flavor, the second
+ * kvfree() form.  The first argument is the RCU flavor, the second
  * argument is a pointer to the RCU callback, and the third argument
  * is the offset of the callback within the enclosing RCU-protected
  * data structure.
  */
-TRACE_EVENT_RCU(rcu_invoke_kfree_callback,
+TRACE_EVENT_RCU(rcu_invoke_kvfree_callback,
 
 	TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset),
 
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 4b99f7b88bee..3dd8e6e207b0 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -86,7 +86,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 
 	rcu_lock_acquire(&rcu_callback_map);
 	if (__is_kfree_rcu_offset(offset)) {
-		trace_rcu_invoke_kfree_callback("", head, offset);
+		trace_rcu_invoke_kvfree_callback("", head, offset);
 		kvfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
 		return true;
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7983926af95b..821de8149928 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2794,7 +2794,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
 	if (__is_kfree_rcu_offset((unsigned long)func))
-		trace_rcu_kfree_callback(rcu_state.name, head,
+		trace_rcu_kvfree_callback(rcu_state.name, head,
 					 (unsigned long)func,
 					 rcu_segcblist_n_cbs(&rdp->cblist));
 	else
@@ -3048,7 +3048,7 @@ static void kfree_rcu_work(struct work_struct *work)
 
 		rcu_lock_acquire(&rcu_callback_map);
 		for (i = 0; i < bvhead->nr_records; i++) {
-			trace_rcu_invoke_kfree_callback(rcu_state.name,
+			trace_rcu_invoke_kvfree_callback(rcu_state.name,
 				(struct rcu_head *) bvhead->records[i], 0);
 			vfree(bvhead->records[i]);
 		}
@@ -3077,7 +3077,7 @@ static void kfree_rcu_work(struct work_struct *work)
 		next = head->next;
 		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
+		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
 			kvfree(ptr);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 14/24] rcu: Rename __is_kfree_rcu_offset() macro
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (12 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 13/24] rcu: Rename rcu_invoke_kfree_callback/rcu_kfree_callback Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 15/24] rcu: Rename kfree_call_rcu() to the kvfree_call_rcu() Uladzislau Rezki (Sony)
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Rename __is_kfree_rcu_offset to __is_kvfree_rcu_offset.
All RCU paths now use kvfree() instead of kfree(), so
rename the macro accordingly.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcupdate.h | 6 +++---
 kernel/rcu/tiny.c        | 2 +-
 kernel/rcu/tree.c        | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 659cbfa7581a..1d25e6c23ebd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -828,16 +828,16 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 
 /*
  * Does the specified offset indicate that the corresponding rcu_head
- * structure can be handled by kfree_rcu()?
+ * structure can be handled by kvfree_rcu()?
  */
-#define __is_kfree_rcu_offset(offset) ((offset) < 4096)
+#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
 
 /*
  * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
  */
 #define __kfree_rcu(head, offset) \
 	do { \
-		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
+		BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
 		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
 	} while (0)
 
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 3dd8e6e207b0..aa897c3f2e92 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -85,7 +85,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 	unsigned long offset = (unsigned long)head->func;
 
 	rcu_lock_acquire(&rcu_callback_map);
-	if (__is_kfree_rcu_offset(offset)) {
+	if (__is_kvfree_rcu_offset(offset)) {
 		trace_rcu_invoke_kvfree_callback("", head, offset);
 		kvfree((void *)head - offset);
 		rcu_lock_release(&rcu_callback_map);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 821de8149928..5f53368f2554 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2793,7 +2793,7 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
-	if (__is_kfree_rcu_offset((unsigned long)func))
+	if (__is_kvfree_rcu_offset((unsigned long)func))
 		trace_rcu_kvfree_callback(rcu_state.name, head,
 					 (unsigned long)func,
 					 rcu_segcblist_n_cbs(&rdp->cblist));
@@ -3079,7 +3079,7 @@ static void kfree_rcu_work(struct work_struct *work)
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
 
-		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
+		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
 			kvfree(ptr);
 
 		rcu_lock_release(&rcu_callback_map);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 15/24] rcu: Rename kfree_call_rcu() to the kvfree_call_rcu().
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (13 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 14/24] rcu: Rename __is_kfree_rcu_offset() macro Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 16/24] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

The reason is that it is now capable of freeing
vmalloc() memory as well.

Do the same with the __kfree_rcu() macro: it becomes
__kvfree_rcu(), for the same reason as pointed out
above.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcupdate.h | 8 ++++----
 include/linux/rcutiny.h  | 2 +-
 include/linux/rcutree.h  | 2 +-
 kernel/rcu/tree.c        | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1d25e6c23ebd..b344fc800a9b 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -835,10 +835,10 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 /*
  * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
  */
-#define __kfree_rcu(head, offset) \
+#define __kvfree_rcu(head, offset) \
 	do { \
 		BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
-		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
+		kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
 	} while (0)
 
 /**
@@ -857,7 +857,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * Because the functions are not allowed in the low-order 4096 bytes of
  * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
  * If the offset is larger than 4095 bytes, a compile-time error will
- * be generated in __kfree_rcu().  If this error is triggered, you can
+ * be generated in __kvfree_rcu(). If this error is triggered, you can
  * either fall back to use of call_rcu() or rearrange the structure to
  * position the rcu_head structure into the first 4096 bytes.
  *
@@ -872,7 +872,7 @@ do {									\
 	typeof (ptr) ___p = (ptr);					\
 									\
 	if (___p)							\
-		__kfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
+		__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
 } while (0)
 
 /*
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 3465ba704a11..0c6315c4a0fe 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -34,7 +34,7 @@ static inline void synchronize_rcu_expedited(void)
 	synchronize_rcu();
 }
 
-static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	call_rcu(head, func);
 }
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fbc26274af4d..4d2732442013 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(int cpu)
 }
 
 void synchronize_rcu_expedited(void);
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 
 void rcu_barrier(void);
 bool rcu_eqs_special_set(int cpu);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5f53368f2554..51726e4c3b4d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3251,12 +3251,12 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
  * one, that is used only when the main path can not be maintained temporary,
  * due to memory pressure.
  *
- * Each kfree_call_rcu() request is added to a batch. The batch will be drained
+ * Each kvfree_call_rcu() request is added to a batch. The batch will be drained
  * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
  * be free'd in workqueue context. This allows us to: batch requests together to
  * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
  */
-void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
@@ -3295,7 +3295,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
 }
-EXPORT_SYMBOL_GPL(kfree_call_rcu);
+EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 16/24] mm/list_lru.c: Rename kvfree_rcu() to local variant
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (14 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 15/24] rcu: Rename kfree_call_rcu() to the kvfree_call_rcu() Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 17/24] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Rename the local kvfree_rcu() function to kvfree_rcu_local(). The aim
is to introduce a public kvfree_rcu() API that would otherwise conflict
with this one, so temporarily rename it here and remove it in a later
commit.

Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: rcu@vger.kernel.org
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/list_lru.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 4d5294c39bba..42c95bcb53ca 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -373,14 +373,14 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
 	struct list_lru_memcg *memcg_lrus;
 	/*
 	 * This is called when shrinker has already been unregistered,
-	 * and nobody can use it. So, there is no need to use kvfree_rcu().
+	 * and nobody can use it. So, there is no need to use kvfree_rcu_local().
 	 */
 	memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
 	__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
 	kvfree(memcg_lrus);
 }
 
-static void kvfree_rcu(struct rcu_head *head)
+static void kvfree_rcu_local(struct rcu_head *head)
 {
 	struct list_lru_memcg *mlru;
 
@@ -419,7 +419,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
 	rcu_assign_pointer(nlru->memcg_lrus, new);
 	spin_unlock_irq(&nlru->lock);
 
-	call_rcu(&old->rcu, kvfree_rcu);
+	call_rcu(&old->rcu, kvfree_rcu_local);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 17/24] rcu: Introduce 2 arg kvfree_rcu() interface
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (15 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 16/24] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 18/24] mm/list_lru.c: Remove kvfree_rcu_local() function Uladzislau Rezki (Sony)
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

kvfree_rcu() can deal with memory that was obtained via
kvmalloc(). kvmalloc() can return two types of pointers:
one that belongs to the regular SLAB allocator and one
that is vmalloc()ed, depending on the requested size and
the memory pressure.

<snip>
    struct test_kvfree_rcu {
        struct rcu_head rcu;
        unsigned char array[100];
    };

    struct test_kvfree_rcu *p;

    p = kvmalloc(10 * PAGE_SIZE, GFP_KERNEL);
    if (p)
        kvfree_rcu(p, rcu);
<snip>

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/rcupdate.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index b344fc800a9b..51b26ab02878 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -875,6 +875,15 @@ do {									\
 		__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
 } while (0)
 
+/**
+ * kvfree_rcu() - kvfree an object after a grace period.
+ * @ptr:	pointer to kvfree
+ * @rhf:	the name of the struct rcu_head within the type of @ptr.
+ *
+ * Same as kfree_rcu(), just simple alias.
+ */
+#define kvfree_rcu(ptr, rhf) kfree_rcu(ptr, rhf)
+
 /*
  * Place this after a lock-acquisition primitive to guarantee that
  * an UNLOCK+LOCK pair acts as a full barrier.  This guarantee applies
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 18/24] mm/list_lru.c: Remove kvfree_rcu_local() function
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (16 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 17/24] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-04-28 20:58 ` [PATCH 19/24] rcu/tree: Support reclaim for head-less object Uladzislau Rezki (Sony)
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Since there is newly introduced kvfree_rcu() API, there is no need in
queuing and using call_rcu() to kvfree() an object after the GP.

Remove kvfree_rcu_local() function and replace call_rcu() by new
kvfree_rcu() API that does the same but in more efficient way.

Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: rcu@vger.kernel.org
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/list_lru.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 42c95bcb53ca..a0b08b27a9b9 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/mutex.h>
 #include <linux/memcontrol.h>
+#include <linux/rcupdate.h>
 #include "slab.h"
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -373,21 +374,13 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
 	struct list_lru_memcg *memcg_lrus;
 	/*
 	 * This is called when shrinker has already been unregistered,
-	 * and nobody can use it. So, there is no need to use kvfree_rcu_local().
+	 * and nobody can use it. So, there is no need to use kvfree_rcu().
 	 */
 	memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
 	__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
 	kvfree(memcg_lrus);
 }
 
-static void kvfree_rcu_local(struct rcu_head *head)
-{
-	struct list_lru_memcg *mlru;
-
-	mlru = container_of(head, struct list_lru_memcg, rcu);
-	kvfree(mlru);
-}
-
 static int memcg_update_list_lru_node(struct list_lru_node *nlru,
 				      int old_size, int new_size)
 {
@@ -419,7 +412,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
 	rcu_assign_pointer(nlru->memcg_lrus, new);
 	spin_unlock_irq(&nlru->lock);
 
-	call_rcu(&old->rcu, kvfree_rcu_local);
+	kvfree_rcu(old, rcu);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (17 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 18/24] mm/list_lru.c: Remove kvfree_rcu_local() function Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 22:39   ` Paul E. McKenney
  2020-04-28 20:58 ` [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment Uladzislau Rezki (Sony)
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Update kvfree_call_rcu() with head-less support, which
means an object without any rcu_head structure can be
reclaimed after a GP.

To store pointers, two chain-arrays are maintained, one
for SLAB and another one for vmalloc. Both types of
objects (the head-less variant and the regular one) are
placed there based on their type.

It can happen that maintaining the arrays becomes
impossible due to high memory pressure. For that reason
there is an emergency path. In that case objects with an
rcu_head inside are simply queued onto a singly linked
list that is drained later on.

As for the head-less variant: such objects do not have
any rcu_head helper inside, thus one is attached
dynamically. As a result an object consists of a
back-pointer and a regular rcu_head. This implies that
the emergency path must be able to detect such an object
type, therefore it is tagged, so that both the memory the
back-pointer refers to and the dynamically attached
wrapper can be freed.

Even though such an approach requires dynamic memory, it
needs only sizeof(unsigned long *) + sizeof(struct rcu_head)
bytes, thus SLAB is used to obtain it. Finally, if attaching
the rcu_head and queuing it fail, the current context has to
follow the might_sleep() annotation, thus the steps below
are applied:
   a) wait until a grace period has elapsed;
   b) directly inline the kvfree() call.
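
The wrapper built by the new attach_rcu_head_to_object() helper in the
diff below is open-coded there; a rough sketch of its layout, using an
illustrative structure name that does not exist in the patch:

<snip>
    /* Illustrative only: layout of the dynamically attached wrapper. */
    struct headless_wrapper {
        unsigned long back_ptr;   /* ptr[0]: the user's kvmalloc()ed object */
        struct rcu_head rhead;    /* what the caller gets back and queues   */
    };
<snip>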

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 98 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 51726e4c3b4d..501cac02146d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
 	 */
 	for (; head; head = next) {
 		unsigned long offset = (unsigned long)head->func;
-		void *ptr = (void *)head - offset;
+		bool headless;
+		void *ptr;
 
 		next = head->next;
+
+		/* We tag the headless object, if so adjust offset. */
+		headless = (((unsigned long) head - offset) & BIT(0));
+		if (headless)
+			offset -= 1;
+
+		ptr = (void *) head - offset;
+
 		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
 		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
 
-		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
+		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
+			/*
+			 * If headless free the back-pointer first.
+			 */
+			if (headless)
+				kvfree((void *) *((unsigned long *) ptr));
+
 			kvfree(ptr);
+		}
 
 		rcu_lock_release(&rcu_callback_map);
 		cond_resched_tasks_rcu_qs();
@@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 			if (IS_ENABLED(CONFIG_PREEMPT_RT))
 				return false;
 
+			/*
+			 * TODO: For one argument of kvfree_rcu() we can
+			 * drop the lock and get the page in sleepable
+			 * context. That would allow to maintain an array
+			 * for the CONFIG_PREEMPT_RT as well. Thus we could
+			 * get rid of dynamic rcu_head attaching code.
+			 */
 			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		}
@@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 	return true;
 }
 
+static inline struct rcu_head *
+attach_rcu_head_to_object(void *obj)
+{
+	unsigned long *ptr;
+
+	ptr = kmalloc(sizeof(unsigned long *) +
+			sizeof(struct rcu_head), GFP_NOWAIT |
+				__GFP_RECLAIM |	/* can do direct reclaim. */
+				__GFP_NORETRY |	/* only lightweight one.  */
+				__GFP_NOWARN);	/* no failure reports. */
+	if (!ptr)
+		return NULL;
+
+	ptr[0] = (unsigned long) obj;
+	return ((struct rcu_head *) ++ptr);
+}
+
 /*
  * Queue a request for lazy invocation of appropriate free routine after a
  * grace period. Please note there are three paths are maintained, two are the
@@ -3260,16 +3300,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
 	unsigned long flags;
 	struct kfree_rcu_cpu *krcp;
+	bool success;
 	void *ptr;
 
+	if (head) {
+		ptr = (void *) head - (unsigned long) func;
+	} else {
+		/*
+		 * Please note there is a limitation for the head-less
+		 * variant, that is why there is a clear rule for such
+		 * objects:
+		 *
+		 * it can be used from might_sleep() context only. For
+		 * other places please embed an rcu_head to your data.
+		 */
+		might_sleep();
+		ptr = (unsigned long *) func;
+	}
+
 	krcp = krc_this_cpu_lock(&flags);
-	ptr = (void *)head - (unsigned long)func;
 
 	/* Queue the object but don't yet schedule the batch. */
 	if (debug_rcu_head_queue(ptr)) {
 		/* Probable double kfree_rcu(), just leak. */
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
+
+		/* Mark as success and leave. */
+		success = true;
 		goto unlock_return;
 	}
 
@@ -3277,10 +3335,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
 	 */
-	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
+	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	if (!success) {
+		if (head == NULL) {
+			/*
+			 * Headless(one argument kvfree_rcu()) can sleep.
+			 * Drop the lock and tack it back. So it can do
+			 * direct lightweight reclaim.
+			 */
+			krc_this_cpu_unlock(krcp, flags);
+			head = attach_rcu_head_to_object(ptr);
+			krcp = krc_this_cpu_lock(&flags);
+
+			if (head == NULL)
+				goto unlock_return;
+
+			/*
+			 * Tag the headless object. Such objects have a
+			 * back-pointer to the original allocated memory,
+			 * that has to be freed as well as dynamically
+			 * attached wrapper/head.
+			 */
+			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
+		}
+
 		head->func = func;
 		head->next = krcp->head;
 		krcp->head = head;
+		success = true;
 	}
 
 	WRITE_ONCE(krcp->count, krcp->count + 1);
@@ -3294,6 +3376,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
+
+	/*
+	 * High memory pressure, so inline kvfree() after
+	 * synchronize_rcu(). We can do it from might_sleep()
+	 * context only, so the current CPU can pass the QS
+	 * state.
+	 */
+	if (!success) {
+		debug_rcu_head_unqueue(ptr);
+		synchronize_rcu();
+		kvfree(ptr);
+	}
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (18 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 19/24] rcu/tree: Support reclaim for head-less object Uladzislau Rezki (Sony)
@ 2020-04-28 20:58 ` Uladzislau Rezki (Sony)
  2020-05-01 23:00   ` Paul E. McKenney
  2020-04-28 20:59 ` [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header Uladzislau Rezki (Sony)
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:58 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Handle cases where the object being kvfree_rcu()'d is not aligned to a
2-byte boundary.
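
For reference, the tagging of a headless object now lives entirely in
the offset value rather than in bit 0 of the computed address; a small
sketch using the RCU_HEADLESS_KFREE bit defined in this patch
(illustrative, not part of the diff):

<snip>
    unsigned long offset = sizeof(unsigned long *) | RCU_HEADLESS_KFREE;
    bool headless = !!(offset & RCU_HEADLESS_KFREE);  /* true */

    if (headless)
        offset &= ~RCU_HEADLESS_KFREE;  /* plain sizeof(unsigned long *) again */
<snip>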

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 501cac02146d..649bad7ad0f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
 #define KVFREE_BULK_MAX_ENTR \
 	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
 
+/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
+#define RCU_HEADLESS_KFREE BIT(31)
+
 /**
  * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
  * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
@@ -3078,9 +3081,9 @@ static void kfree_rcu_work(struct work_struct *work)
 		next = head->next;
 
 		/* We tag the headless object, if so adjust offset. */
-		headless = (((unsigned long) head - offset) & BIT(0));
+		headless = !!(offset & RCU_HEADLESS_KFREE);
 		if (headless)
-			offset -= 1;
+			offset &= ~(RCU_HEADLESS_KFREE);
 
 		ptr = (void *) head - offset;
 
@@ -3356,7 +3359,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 			 * that has to be freed as well as dynamically
 			 * attached wrapper/head.
 			 */
-			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
+			func = (rcu_callback_t)(sizeof(unsigned long *) | RCU_HEADLESS_KFREE);
 		}
 
 		head->func = func;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (19 preceding siblings ...)
  2020-04-28 20:58 ` [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment Uladzislau Rezki (Sony)
@ 2020-04-28 20:59 ` Uladzislau Rezki (Sony)
  2020-05-01 23:03   ` Paul E. McKenney
  2020-04-28 20:59 ` [PATCH 22/24] rcu/tiny: support reclaim for head-less object Uladzislau Rezki (Sony)
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:59 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Move the inlined kvfree_call_rcu() function out of the
header file. This step is a preparation for head-less
support.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcutiny.h | 6 +-----
 kernel/rcu/tiny.c       | 6 ++++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0c6315c4a0fe..7eb66909ae1b 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -34,11 +34,7 @@ static inline void synchronize_rcu_expedited(void)
 	synchronize_rcu();
 }
 
-static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
-{
-	call_rcu(head, func);
-}
-
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
 void rcu_qs(void);
 
 static inline void rcu_softirq_qs(void)
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index aa897c3f2e92..508c82faa45c 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -177,6 +177,12 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
+void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+EXPORT_SYMBOL_GPL(kvfree_call_rcu);
+
 void __init rcu_init(void)
 {
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 22/24] rcu/tiny: support reclaim for head-less object
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (20 preceding siblings ...)
  2020-04-28 20:59 ` [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header Uladzislau Rezki (Sony)
@ 2020-04-28 20:59 ` Uladzislau Rezki (Sony)
  2020-05-01 23:06   ` Paul E. McKenney
  2020-04-28 20:59 ` [PATCH 23/24] rcu: Introduce 1 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
  2020-04-28 20:59 ` [PATCH 24/24] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)
  23 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:59 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Make the kvfree_call_rcu() function support head-less
freeing. Same as for tree-RCU, pointers are stored in an
array for that purpose; SLAB and vmalloc pointers are
mixed and coexist together.

Under high memory pressure it can happen that maintaining
the array becomes impossible. In that case objects with an
rcu_head are released via call_rcu(). When it comes to the
head-less variant, the kvfree() call is directly inlined,
i.e. we do the same as for tree-RCU:
    a) wait until a grace period has elapsed;
    b) directly inline the kvfree() call.

Thus the current context has to follow the might_sleep()
annotation. Also please note that for tiny-RCU any call
of synchronize_rcu() is itself a quiescent state, so step
(a) effectively does nothing.
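
From a caller's point of view the head-less path on tiny-RCU looks
roughly as follows (a minimal sketch; the one-argument kvfree_rcu()
wrapper that hides this call is introduced in a later patch):

<snip>
    /* Must be called from a context that is allowed to sleep. */
    void *p = kvmalloc(64, GFP_KERNEL);

    if (p)
        /* NULL head: the pointer itself is passed via the func slot. */
        kvfree_call_rcu(NULL, (rcu_callback_t) p);
<snip>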

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tiny.c | 157 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 156 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 508c82faa45c..b1c31a935db9 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -40,6 +40,29 @@ static struct rcu_ctrlblk rcu_ctrlblk = {
 	.curtail	= &rcu_ctrlblk.rcucblist,
 };
 
+/* Can be common with tree-RCU. */
+#define KVFREE_DRAIN_JIFFIES (HZ / 50)
+
+/* Can be common with tree-RCU. */
+struct kvfree_rcu_bulk_data {
+	unsigned long nr_records;
+	struct kvfree_rcu_bulk_data *next;
+	void *records[];
+};
+
+/* Can be common with tree-RCU. */
+#define KVFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
+
+static struct kvfree_rcu_bulk_data *kvhead;
+static struct kvfree_rcu_bulk_data *kvhead_free;
+static struct kvfree_rcu_bulk_data *kvcache;
+
+static DEFINE_STATIC_KEY_FALSE(rcu_init_done);
+static struct delayed_work monitor_work;
+static struct rcu_work rcu_work;
+static bool monitor_todo;
+
 void rcu_barrier(void)
 {
 	wait_rcu_gp(call_rcu);
@@ -177,9 +200,137 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
+static inline bool
+kvfree_call_rcu_add_ptr_to_bulk(void *ptr)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+
+	if (!kvhead || kvhead->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = xchg(&kvcache, NULL);
+		if (!bnode)
+			bnode = (struct kvfree_rcu_bulk_data *)
+				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+		if (unlikely(!bnode))
+			return false;
+
+		/* Initialize the new block. */
+		bnode->nr_records = 0;
+		bnode->next = kvhead;
+
+		/* Attach it to the bvhead. */
+		kvhead = bnode;
+	}
+
+	/* Done. */
+	kvhead->records[kvhead->nr_records++] = ptr;
+	return true;
+}
+
+static void
+kvfree_rcu_work(struct work_struct *work)
+{
+	struct kvfree_rcu_bulk_data *kvhead_tofree, *next;
+	unsigned long flags;
+	int i;
+
+	local_irq_save(flags);
+	kvhead_tofree = kvhead_free;
+	kvhead_free = NULL;
+	local_irq_restore(flags);
+
+	/* Reclaim process. */
+	for (; kvhead_tofree; kvhead_tofree = next) {
+		next = kvhead_tofree->next;
+
+		for (i = 0; i < kvhead_tofree->nr_records; i++) {
+			debug_rcu_head_unqueue((struct rcu_head *)
+				kvhead_tofree->records[i]);
+			kvfree(kvhead_tofree->records[i]);
+		}
+
+		if (cmpxchg(&kvcache, NULL, kvhead_tofree))
+			free_page((unsigned long) kvhead_tofree);
+	}
+}
+
+static inline bool
+queue_kvfree_rcu_work(void)
+{
+	/* Check if the free channel is available. */
+	if (kvhead_free)
+		return false;
+
+	kvhead_free = kvhead;
+	kvhead = NULL;
+
+	/*
+	 * Queue the job for memory reclaim after GP.
+	 */
+	queue_rcu_work(system_wq, &rcu_work);
+	return true;
+}
+
+static void kvfree_rcu_monitor(struct work_struct *work)
+{
+	unsigned long flags;
+	bool queued;
+
+	local_irq_save(flags);
+	queued = queue_kvfree_rcu_work();
+	if (queued)
+		/* Success. */
+		monitor_todo = false;
+	local_irq_restore(flags);
+
+	/*
+	 * If previous RCU reclaim process is still in progress,
+	 * schedule the work one more time to try again later.
+	 */
+	if (monitor_todo)
+		schedule_delayed_work(&monitor_work,
+			KVFREE_DRAIN_JIFFIES);
+}
+
 void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	call_rcu(head, func);
+	unsigned long flags;
+	bool success;
+	void *ptr;
+
+	if (head) {
+		ptr = (void *) head - (unsigned long) func;
+	} else {
+		might_sleep();
+		ptr = (void *) func;
+	}
+
+	if (debug_rcu_head_queue(ptr)) {
+		/* Probable double free, just leak. */
+		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
+			  __func__, head);
+		return;
+	}
+
+	local_irq_save(flags);
+	success = kvfree_call_rcu_add_ptr_to_bulk(ptr);
+	if (static_branch_likely(&rcu_init_done)) {
+		if (success && !monitor_todo) {
+			monitor_todo = true;
+			schedule_delayed_work(&monitor_work,
+				KVFREE_DRAIN_JIFFIES);
+		}
+	}
+	local_irq_restore(flags);
+
+	if (!success) {
+		if (!head) {
+			synchronize_rcu();
+			kvfree(ptr);
+		} else {
+			call_rcu(head, func);
+		}
+	}
 }
 EXPORT_SYMBOL_GPL(kvfree_call_rcu);
 
@@ -188,4 +339,8 @@ void __init rcu_init(void)
 	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 	rcu_early_boot_tests();
 	srcu_init();
+
+	INIT_DELAYED_WORK(&monitor_work, kvfree_rcu_monitor);
+	INIT_RCU_WORK(&rcu_work, kvfree_rcu_work);
+	static_branch_enable(&rcu_init_done);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 23/24] rcu: Introduce 1 arg kvfree_rcu() interface
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (21 preceding siblings ...)
  2020-04-28 20:59 ` [PATCH 22/24] rcu/tiny: support reclaim for head-less object Uladzislau Rezki (Sony)
@ 2020-04-28 20:59 ` Uladzislau Rezki (Sony)
  2020-04-28 20:59 ` [PATCH 24/24] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:59 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Make it possible to pass either one or two arguments to
the kvfree_rcu() macro, corresponding to the head-less
case or not, so it becomes a bit more versatile.

As a result we obtain two ways of using that macro,
below are two examples:

a) kvfree_rcu(ptr, rhf);
    struct X {
        struct rcu_head rhf;
        unsigned char data[100];
    };

    void *ptr = kvmalloc(sizeof(struct X), GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr, rhf);

b) kvfree_rcu(ptr);
    void *ptr = kvmalloc(some_bytes, GFP_KERNEL);
    if (ptr)
        kvfree_rcu(ptr);

The last one, which we name the head-less variant, needs
only one argument, meaning it does not require any rcu_head
to be present within the type of ptr.

There is a restriction: the (b) context has to follow the
might_sleep() annotation. To check that, please activate
the CONFIG_DEBUG_ATOMIC_SLEEP option in your kernel.
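
For reference, the argument-count selection in the diff below expands
roughly as follows (a preprocessor sketch only, no new code is added
here):

<snip>
    /* Two arguments: _1 = ptr, _2 = rhf, NAME = kvfree_rcu_arg_2. */
    kvfree_rcu(ptr, rhf);  /* -> kvfree_rcu_arg_2(ptr, rhf) -> kfree_rcu(ptr, rhf) */

    /* One argument: _1 = ptr, _2 = kvfree_rcu_arg_2, NAME = kvfree_rcu_arg_1. */
    kvfree_rcu(ptr);       /* -> kvfree_rcu_arg_1(ptr) -> kvfree_call_rcu(NULL, ptr) */
<snip>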

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcupdate.h | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 51b26ab02878..d15d46db61f7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -877,12 +877,42 @@ do {									\
 
 /**
  * kvfree_rcu() - kvfree an object after a grace period.
- * @ptr:	pointer to kvfree
- * @rhf:	the name of the struct rcu_head within the type of @ptr.
  *
- * Same as kfree_rcu(), just simple alias.
+ * This macro consists of one or two arguments and it is
+ * based on whether an object is head-less or not. If it
+ * has a head then a semantic stays the same as it used
+ * to be before:
+ *
+ *     kvfree_rcu(ptr, rhf);
+ *
+ * where @ptr is a pointer to kvfree(), @rhf is the name
+ * of the rcu_head structure within the type of @ptr.
+ *
+ * When it comes to head-less variant, only one argument
+ * is passed and that is just a pointer which has to be
+ * freed after a grace period. Therefore the semantic is
+ *
+ *     kvfree_rcu(ptr);
+ *
+ * where @ptr is a pointer to kvfree().
+ *
+ * Please note, head-less way of freeing is permitted to
+ * use from a context that has to follow might_sleep()
+ * annotation. Otherwise, please switch and embed the
+ * rcu_head structure within the type of @ptr.
  */
-#define kvfree_rcu(ptr, rhf) kfree_rcu(ptr, rhf)
+#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__,		\
+	kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
+
+#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
+#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf)
+#define kvfree_rcu_arg_1(ptr)					\
+do {								\
+	typeof(ptr) ___p = (ptr);				\
+								\
+	if (___p)						\
+		kvfree_call_rcu(NULL, (rcu_callback_t) (___p));	\
+} while (0)
 
 /*
  * Place this after a lock-acquisition primitive to guarantee that
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 24/24] lib/test_vmalloc.c: Add test cases for kvfree_rcu()
  2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
                   ` (22 preceding siblings ...)
  2020-04-28 20:59 ` [PATCH 23/24] rcu: Introduce 1 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
@ 2020-04-28 20:59 ` Uladzislau Rezki (Sony)
  23 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki (Sony) @ 2020-04-28 20:59 UTC (permalink / raw)
  To: LKML, linux-mm
  Cc: Andrew Morton, Paul E . McKenney, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce four new test cases that cover and test the
kvfree_rcu() interface. Two of them exercise the
one-argument functionality and the other two the
two-argument functionality.

The aim is to stress the interface, to check how it behaves
under different load and memory conditions, and to analyse
its performance throughput and impact.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 lib/test_vmalloc.c | 103 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 95 insertions(+), 8 deletions(-)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 8bbefcaddfe8..ec73561cda2e 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -15,6 +15,8 @@
 #include <linux/delay.h>
 #include <linux/rwsem.h>
 #include <linux/mm.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
 
 #define __param(type, name, init, msg)		\
 	static type name = init;				\
@@ -35,14 +37,18 @@ __param(int, test_loop_count, 1000000,
 
 __param(int, run_test_mask, INT_MAX,
 	"Set tests specified in the mask.\n\n"
-		"\t\tid: 1,   name: fix_size_alloc_test\n"
-		"\t\tid: 2,   name: full_fit_alloc_test\n"
-		"\t\tid: 4,   name: long_busy_list_alloc_test\n"
-		"\t\tid: 8,   name: random_size_alloc_test\n"
-		"\t\tid: 16,  name: fix_align_alloc_test\n"
-		"\t\tid: 32,  name: random_size_align_alloc_test\n"
-		"\t\tid: 64,  name: align_shift_alloc_test\n"
-		"\t\tid: 128, name: pcpu_alloc_test\n"
+		"\t\tid: 1,    name: fix_size_alloc_test\n"
+		"\t\tid: 2,    name: full_fit_alloc_test\n"
+		"\t\tid: 4,    name: long_busy_list_alloc_test\n"
+		"\t\tid: 8,    name: random_size_alloc_test\n"
+		"\t\tid: 16,   name: fix_align_alloc_test\n"
+		"\t\tid: 32,   name: random_size_align_alloc_test\n"
+		"\t\tid: 64,   name: align_shift_alloc_test\n"
+		"\t\tid: 128,  name: pcpu_alloc_test\n"
+		"\t\tid: 256,  name: kvfree_rcu_1_arg_vmalloc_test\n"
+		"\t\tid: 512,  name: kvfree_rcu_2_arg_vmalloc_test\n"
+		"\t\tid: 1024, name: kvfree_rcu_1_arg_slab_test\n"
+		"\t\tid: 2048, name: kvfree_rcu_2_arg_slab_test\n"
 		/* Add a new test case description here. */
 );
 
@@ -328,6 +334,83 @@ pcpu_alloc_test(void)
 	return rv;
 }
 
+struct test_kvfree_rcu {
+	struct rcu_head rcu;
+	unsigned char array[20];
+};
+
+static int
+kvfree_rcu_1_arg_vmalloc_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = vmalloc(1 * PAGE_SIZE);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_2_arg_vmalloc_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = vmalloc(1 * PAGE_SIZE);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p, rcu);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_1_arg_slab_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = kmalloc(sizeof(*p), GFP_KERNEL);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p);
+	}
+
+	return 0;
+}
+
+static int
+kvfree_rcu_2_arg_slab_test(void)
+{
+	struct test_kvfree_rcu *p;
+	int i;
+
+	for (i = 0; i < test_loop_count; i++) {
+		p = kmalloc(sizeof(*p), GFP_KERNEL);
+		if (!p)
+			return -1;
+
+		p->array[0] = 'a';
+		kvfree_rcu(p, rcu);
+	}
+
+	return 0;
+}
+
 struct test_case_desc {
 	const char *test_name;
 	int (*test_func)(void);
@@ -342,6 +425,10 @@ static struct test_case_desc test_case_array[] = {
 	{ "random_size_align_alloc_test", random_size_align_alloc_test },
 	{ "align_shift_alloc_test", align_shift_alloc_test },
 	{ "pcpu_alloc_test", pcpu_alloc_test },
+	{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
+	{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
+	{ "kvfree_rcu_1_arg_slab_test", kvfree_rcu_1_arg_slab_test },
+	{ "kvfree_rcu_2_arg_slab_test", kvfree_rcu_2_arg_slab_test },
 	/* Add a new test case here. */
 };
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-04-28 20:58 ` [PATCH 03/24] rcu/tree: Use consistent style for comments Uladzislau Rezki (Sony)
@ 2020-05-01 19:05   ` Paul E. McKenney
  2020-05-01 20:52     ` Joe Perches
  2020-05-03 23:52     ` Joel Fernandes
  0 siblings, 2 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 19:05 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> Simple clean up of comments in kfree_rcu() code to keep it consistent
> with majority of commenting styles.
> 
> Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Hmmm...

Exactly why is three additional characters per line preferable?  Or in
the case of block comments, either one or two additional lines, depending
on /* */ style?

I am (slowly) moving RCU to "//" for those reasons.  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cd61649e1b00..1487af8e11e8 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3043,15 +3043,15 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
>  static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
>  					  unsigned long flags)
>  {
> -	// Attempt to start a new batch.
> +	/* Attempt to start a new batch. */
>  	krcp->monitor_todo = false;
>  	if (queue_kfree_rcu_work(krcp)) {
> -		// Success! Our job is done here.
> +		/* Success! Our job is done here. */
>  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  		return;
>  	}
>  
> -	// Previous RCU batch still in progress, try again later.
> +	/* Previous RCU batch still in progress, try again later. */
>  	krcp->monitor_todo = true;
>  	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> @@ -3151,14 +3151,14 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	unsigned long flags;
>  	struct kfree_rcu_cpu *krcp;
>  
> -	local_irq_save(flags);	// For safely calling this_cpu_ptr().
> +	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
>  	krcp = this_cpu_ptr(&krc);
>  	if (krcp->initialized)
>  		raw_spin_lock(&krcp->lock);
>  
> -	// Queue the object but don't yet schedule the batch.
> +	/* Queue the object but don't yet schedule the batch. */
>  	if (debug_rcu_head_queue(head)) {
> -		// Probable double kfree_rcu(), just leak.
> +		/* Probable double kfree_rcu(), just leak. */
>  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
>  			  __func__, head);
>  		goto unlock_return;
> @@ -3176,7 +3176,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  
>  	WRITE_ONCE(krcp->count, krcp->count + 1);
>  
> -	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> +	/* Set timer to drain after KFREE_DRAIN_JIFFIES. */
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>  	    !krcp->monitor_todo) {
>  		krcp->monitor_todo = true;
> @@ -3722,7 +3722,7 @@ int rcutree_offline_cpu(unsigned int cpu)
>  
>  	rcutree_affinity_setting(cpu, cpu);
>  
> -	// nohz_full CPUs need the tick for stop-machine to work quickly
> +	/* nohz_full CPUs need the tick for stop-machine to work quickly */
>  	tick_dep_set(TICK_DEP_BIT_RCU);
>  	return 0;
>  }
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-01 19:05   ` Paul E. McKenney
@ 2020-05-01 20:52     ` Joe Perches
  2020-05-03 23:44       ` Joel Fernandes
  2020-05-03 23:52     ` Joel Fernandes
  1 sibling, 1 reply; 78+ messages in thread
From: Joe Perches @ 2020-05-01 20:52 UTC (permalink / raw)
  To: paulmck, Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, 2020-05-01 at 12:05 -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > with majority of commenting styles.
[]
> on /* */ style?
> 
> I am (slowly) moving RCU to "//" for those reasons.  ;-)

I hope c99 comment styles are more commonly used soon too.
checkpatch doesn't care.

Perhaps a change to coding-style.rst
---
 Documentation/process/coding-style.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index acb2f1b..fee647 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -565,6 +565,11 @@ comments is a little different.
 	 * but there is no initial almost-blank line.
 	 */
 
+.. code-block:: c
+
+	// Single line and inline comments may also use the c99 // style
+	// Block comments as well
+
 It's also important to comment data, whether they are basic types or derived
 types.  To this end, use just one data declaration per line (no commas for
 multiple data declarations).  This leaves you room for a small comment on each



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/24] rcu/tree: Use static initializer for krc.lock
  2020-04-28 20:58 ` [PATCH 08/24] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
@ 2020-05-01 21:17   ` Paul E. McKenney
  2020-05-04 12:10     ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 21:17 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko,
	Sebastian Andrzej Siewior

On Tue, Apr 28, 2020 at 10:58:47PM +0200, Uladzislau Rezki (Sony) wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The per-CPU variable is initialized at runtime in
> kfree_rcu_batch_init(). This function is invoked before
> 'rcu_scheduler_active' is set to 'RCU_SCHEDULER_RUNNING'.
> After the initialisation, '->initialized' is set to true.
> 
> The raw_spin_lock is only acquired if '->initialized' is
> set to true. The workqueue item is only used if 'rcu_scheduler_active'
> is set to RCU_SCHEDULER_RUNNING, which happens after initialisation.
> 
> Use a static initializer for krc.lock and remove the runtime
> initialisation of the lock. Since the lock can now be always
> acquired, remove the '->initialized' check.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/tree.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index bc6c2bc8fa32..89e9ca3f4e3e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2892,7 +2892,7 @@ struct kfree_rcu_cpu_work {
>   * @lock: Synchronize access to this structure
>   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
>   * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> - * @initialized: The @lock and @rcu_work fields have been initialized
> + * @initialized: The @rcu_work fields have been initialized
>   *
>   * This is a per-CPU structure.  The reason that it is not included in
>   * the rcu_data structure is to permit this code to be extracted from
> @@ -2912,7 +2912,9 @@ struct kfree_rcu_cpu {
>  	int count;
>  };
>  
> -static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
> +static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> +	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
> +};
>  
>  static __always_inline void
>  debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
> @@ -2930,10 +2932,9 @@ krc_this_cpu_lock(unsigned long *flags)
>  {
>  	struct kfree_rcu_cpu *krcp;
>  
> -	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
> +	local_irq_save(*flags);	/* For safely calling this_cpu_ptr(). */

And here as well.  ;-)

							Thanx, Paul

>  	krcp = this_cpu_ptr(&krc);
> -	if (likely(krcp->initialized))
> -		raw_spin_lock(&krcp->lock);
> +	raw_spin_lock(&krcp->lock);
>  
>  	return krcp;
>  }
> @@ -2941,8 +2942,7 @@ krc_this_cpu_lock(unsigned long *flags)
>  static inline void
>  krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
>  {
> -	if (likely(krcp->initialized))
> -		raw_spin_unlock(&krcp->lock);
> +	raw_spin_unlock(&krcp->lock);
>  	local_irq_restore(flags);
>  }
>  
> @@ -4168,7 +4168,6 @@ static void __init kfree_rcu_batch_init(void)
>  	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
> -		raw_spin_lock_init(&krcp->lock);
>  		for (i = 0; i < KFREE_N_BATCHES; i++) {
>  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
>  			krcp->krw_arr[i].krcp = krcp;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-04-28 20:58 ` [PATCH 09/24] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
@ 2020-05-01 21:27   ` Paul E. McKenney
  2020-05-04 12:43     ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 21:27 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:48PM +0200, Uladzislau Rezki (Sony) wrote:
> Cache some extra objects per-CPU. During the reclaim process
> some pages are cached instead of being released, by linking
> them into a list. Such an approach provides O(1) access time
> to the cache.
> 
> That reduces the number of requests to the page allocator and
> also helps if a low-memory condition occurs.
> 
> A parameter reflecting the minimum number of pages allowed to
> be cached per CPU is exposed via sysfs; it is read-only and
> named "rcu_min_cached_objs".
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 89e9ca3f4e3e..d8975819b1c9 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
>  static int gp_cleanup_delay;
>  module_param(gp_cleanup_delay, int, 0444);
>  
> +/*
> + * This rcu parameter is read-only, but can be write also.

You mean that although the parameter is read-only, you see no reason
why it could not be converted to writeable?

If it were writeable, and a given CPU had the maximum number of cached
objects, the rcu_min_cached_objs value was decreased, but that CPU never
saw another kfree_rcu(), would the number of cached objects change?

(Just curious, not asking for a change in functionality.)
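
If it ever did become writable, one way to make the answer "yes" would be a
trim-on-shrink hook in the parameter's .set handler.  The sketch below is
untested and purely illustrative (the handler is an assumption, not part of
this series); it only reuses get_cached_bnode() and the krc fields from the
patch above:

<snip>
static int param_set_rcu_min_cached_objs(const char *val,
					 const struct kernel_param *kp)
{
	int cpu, ret = param_set_int(val, kp);

	if (ret)
		return ret;

	/* Trim any per-CPU cache that now exceeds the new minimum. */
	for_each_possible_cpu(cpu) {
		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
		struct kfree_rcu_bulk_data *bnode;
		unsigned long flags;

		raw_spin_lock_irqsave(&krcp->lock, flags);
		while (krcp->nr_bkv_objs > rcu_min_cached_objs) {
			bnode = get_cached_bnode(krcp);
			/* Drop the lock while giving the page back. */
			raw_spin_unlock_irqrestore(&krcp->lock, flags);
			free_page((unsigned long) bnode);
			raw_spin_lock_irqsave(&krcp->lock, flags);
		}
		raw_spin_unlock_irqrestore(&krcp->lock, flags);
	}

	return 0;
}

/* Would replace the read-only module_param() above. */
static const struct kernel_param_ops rcu_min_cached_objs_ops = {
	.set = param_set_rcu_min_cached_objs,
	.get = param_get_int,
};
module_param_cb(rcu_min_cached_objs, &rcu_min_cached_objs_ops,
		&rcu_min_cached_objs, 0644);
<snip>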

> + * It reflects the minimum allowed number of objects which
> + * can be cached per-CPU. Object size is equal to one page.
> + */
> +int rcu_min_cached_objs = 2;
> +module_param(rcu_min_cached_objs, int, 0444);
> +
>  /* Retrieve RCU kthreads priority for rcutorture */
>  int rcu_get_gp_kthreads_prio(void)
>  {
> @@ -2887,7 +2895,6 @@ struct kfree_rcu_cpu_work {
>   * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
>   * @head: List of kfree_rcu() objects not yet waiting for a grace period
>   * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
> - * @bcached: Keeps at most one object for later reuse when build chain blocks
>   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
>   * @lock: Synchronize access to this structure
>   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> @@ -2902,7 +2909,6 @@ struct kfree_rcu_cpu_work {
>  struct kfree_rcu_cpu {
>  	struct rcu_head *head;
>  	struct kfree_rcu_bulk_data *bhead;
> -	struct kfree_rcu_bulk_data *bcached;
>  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
>  	raw_spinlock_t lock;
>  	struct delayed_work monitor_work;
> @@ -2910,6 +2916,15 @@ struct kfree_rcu_cpu {
>  	bool initialized;
>  	// Number of objects for which GP not started
>  	int count;
> +
> +	/*
> +	 * Number of cached objects which are queued into
> +	 * the lock-less list. This cache is used by the
> +	 * kvfree_call_rcu() function and as of now its
> +	 * size is static.
> +	 */
> +	struct llist_head bkvcache;
> +	int nr_bkv_objs;
>  };
>  
>  static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> @@ -2946,6 +2961,31 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
>  	local_irq_restore(flags);
>  }
>  
> +static inline struct kfree_rcu_bulk_data *
> +get_cached_bnode(struct kfree_rcu_cpu *krcp)
> +{
> +	if (!krcp->nr_bkv_objs)
> +		return NULL;
> +
> +	krcp->nr_bkv_objs--;
> +	return (struct kfree_rcu_bulk_data *)
> +		llist_del_first(&krcp->bkvcache);
> +}
> +
> +static inline bool
> +put_cached_bnode(struct kfree_rcu_cpu *krcp,
> +	struct kfree_rcu_bulk_data *bnode)
> +{
> +	/* Check the limit. */
> +	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> +		return false;
> +
> +	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
> +	krcp->nr_bkv_objs++;
> +	return true;
> +
> +}
> +
>  /*
>   * This function is invoked in workqueue context after a grace period.
>   * It frees all the objects queued on ->bhead_free or ->head_free.
> @@ -2981,7 +3021,12 @@ static void kfree_rcu_work(struct work_struct *work)
>  		kfree_bulk(bhead->nr_records, bhead->records);
>  		rcu_lock_release(&rcu_callback_map);
>  
> -		if (cmpxchg(&krcp->bcached, NULL, bhead))
> +		krcp = krc_this_cpu_lock(&flags);

Presumably the list can also be accessed without holding this lock,
because otherwise we shouldn't need llist...
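
For what it is worth, if every access really is under ->lock, a plain singly
linked list would make that explicit.  Sketch only, just to illustrate the
point; it reuses the existing ->next field instead of an llist_node:

<snip>
/*
 * Hypothetical alternative: in struct kfree_rcu_cpu, protected by ->lock:
 *
 *	struct kfree_rcu_bulk_data *bkvcache_head;
 *	int nr_bkv_objs;
 */
static inline struct kfree_rcu_bulk_data *
get_cached_bnode(struct kfree_rcu_cpu *krcp)
{
	struct kfree_rcu_bulk_data *bnode = krcp->bkvcache_head;

	lockdep_assert_held(&krcp->lock);
	if (bnode) {
		krcp->bkvcache_head = bnode->next;
		krcp->nr_bkv_objs--;
	}
	return bnode;
}

static inline bool
put_cached_bnode(struct kfree_rcu_cpu *krcp,
	struct kfree_rcu_bulk_data *bnode)
{
	lockdep_assert_held(&krcp->lock);

	/* Check the limit. */
	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
		return false;

	bnode->next = krcp->bkvcache_head;
	krcp->bkvcache_head = bnode;
	krcp->nr_bkv_objs++;
	return true;
}
<snip>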

							Thanx, Paul

> +		if (put_cached_bnode(krcp, bhead))
> +			bhead = NULL;
> +		krc_this_cpu_unlock(krcp, flags);
> +
> +		if (bhead)
>  			free_page((unsigned long) bhead);
>  
>  		cond_resched_tasks_rcu_qs();
> @@ -3114,7 +3159,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
>  	/* Check if a new block is required. */
>  	if (!krcp->bhead ||
>  			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> -		bnode = xchg(&krcp->bcached, NULL);
> +		bnode = get_cached_bnode(krcp);
>  		if (!bnode) {
>  			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
>  
> @@ -4167,12 +4212,23 @@ static void __init kfree_rcu_batch_init(void)
>  
>  	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> +		struct kfree_rcu_bulk_data *bnode;
>  
>  		for (i = 0; i < KFREE_N_BATCHES; i++) {
>  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
>  			krcp->krw_arr[i].krcp = krcp;
>  		}
>  
> +		for (i = 0; i < rcu_min_cached_objs; i++) {
> +			bnode = (struct kfree_rcu_bulk_data *)
> +				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> +
> +			if (bnode)
> +				put_cached_bnode(krcp, bnode);
> +			else
> +				pr_err("Failed to preallocate for %d CPU!\n", cpu);
> +		}
> +
>  		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
>  		krcp->initialized = true;
>  	}
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-04-28 20:58 ` [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
@ 2020-05-01 21:37   ` Paul E. McKenney
  2020-05-03 23:42     ` Joel Fernandes
  2020-05-04 14:25     ` Uladzislau Rezki
  0 siblings, 2 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 21:37 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:50PM +0200, Uladzislau Rezki (Sony) wrote:
> To do so we use an array of common kvfree_rcu_bulk_data
> structures. It consists of two elements: index number 0
> corresponds to SLAB pointers, whereas vmalloc pointers are
> accessed by using index number 1.
> 
> The reason for not mixing pointers is to have an easy way
> to distinguish them.
> 
> It is also a preparation patch for head-less object
> support. When an object is head-less we cannot queue
> it into any list; instead a pointer is placed directly
> into an array.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 172 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 109 insertions(+), 63 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d8975819b1c9..7983926af95b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -57,6 +57,7 @@
>  #include <linux/slab.h>
>  #include <linux/sched/isolation.h>
>  #include <linux/sched/clock.h>
> +#include <linux/mm.h>
>  #include "../time/tick-internal.h"
>  
>  #include "tree.h"
> @@ -2857,44 +2858,44 @@ EXPORT_SYMBOL_GPL(call_rcu);
>  #define KFREE_N_BATCHES 2
>  
>  /**
> - * struct kfree_rcu_bulk_data - single block to store kfree_rcu() pointers
> + * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
>   * @nr_records: Number of active pointers in the array
> - * @records: Array of the kfree_rcu() pointers
>   * @next: Next bulk object in the block chain
> + * @records: Array of the kvfree_rcu() pointers
>   */
> -struct kfree_rcu_bulk_data {
> +struct kvfree_rcu_bulk_data {
>  	unsigned long nr_records;
> -	struct kfree_rcu_bulk_data *next;
> +	struct kvfree_rcu_bulk_data *next;
>  	void *records[];
>  };
>  
>  /*
>   * This macro defines how many entries the "records" array
>   * will contain. It is based on the fact that the size of
> - * kfree_rcu_bulk_data structure becomes exactly one page.
> + * kvfree_rcu_bulk_data structure becomes exactly one page.
>   */
> -#define KFREE_BULK_MAX_ENTR \
> -	((PAGE_SIZE - sizeof(struct kfree_rcu_bulk_data)) / sizeof(void *))
> +#define KVFREE_BULK_MAX_ENTR \
> +	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
>  
>  /**
>   * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
>   * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
>   * @head_free: List of kfree_rcu() objects waiting for a grace period
> - * @bhead_free: Bulk-List of kfree_rcu() objects waiting for a grace period
> + * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
>   * @krcp: Pointer to @kfree_rcu_cpu structure
>   */
>  
>  struct kfree_rcu_cpu_work {
>  	struct rcu_work rcu_work;
>  	struct rcu_head *head_free;
> -	struct kfree_rcu_bulk_data *bhead_free;
> +	struct kvfree_rcu_bulk_data *bkvhead_free[2];
>  	struct kfree_rcu_cpu *krcp;
>  };
>  
>  /**
>   * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
>   * @head: List of kfree_rcu() objects not yet waiting for a grace period
> - * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
> + * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
>   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
>   * @lock: Synchronize access to this structure
>   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> @@ -2908,7 +2909,7 @@ struct kfree_rcu_cpu_work {
>   */
>  struct kfree_rcu_cpu {
>  	struct rcu_head *head;
> -	struct kfree_rcu_bulk_data *bhead;
> +	struct kvfree_rcu_bulk_data *bkvhead[2];
>  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
>  	raw_spinlock_t lock;
>  	struct delayed_work monitor_work;
> @@ -2932,7 +2933,7 @@ static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
>  };
>  
>  static __always_inline void
> -debug_rcu_bhead_unqueue(struct kfree_rcu_bulk_data *bhead)
> +debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
>  {
>  #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
>  	int i;
> @@ -2961,20 +2962,20 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
>  	local_irq_restore(flags);
>  }
>  
> -static inline struct kfree_rcu_bulk_data *
> +static inline struct kvfree_rcu_bulk_data *
>  get_cached_bnode(struct kfree_rcu_cpu *krcp)
>  {
>  	if (!krcp->nr_bkv_objs)
>  		return NULL;
>  
>  	krcp->nr_bkv_objs--;
> -	return (struct kfree_rcu_bulk_data *)
> +	return (struct kvfree_rcu_bulk_data *)
>  		llist_del_first(&krcp->bkvcache);
>  }
>  
>  static inline bool
>  put_cached_bnode(struct kfree_rcu_cpu *krcp,
> -	struct kfree_rcu_bulk_data *bnode)
> +	struct kvfree_rcu_bulk_data *bnode)
>  {
>  	/* Check the limit. */
>  	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> @@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  static void kfree_rcu_work(struct work_struct *work)
>  {
>  	unsigned long flags;
> +	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
>  	struct rcu_head *head, *next;
> -	struct kfree_rcu_bulk_data *bhead, *bnext;
>  	struct kfree_rcu_cpu *krcp;
>  	struct kfree_rcu_cpu_work *krwp;
> +	int i;
>  
>  	krwp = container_of(to_rcu_work(work),
>  			    struct kfree_rcu_cpu_work, rcu_work);
>  	krcp = krwp->krcp;
> +
>  	raw_spin_lock_irqsave(&krcp->lock, flags);
> +	/* Channel 1. */
> +	bkhead = krwp->bkvhead_free[0];
> +	krwp->bkvhead_free[0] = NULL;
> +
> +	/* Channel 2. */
> +	bvhead = krwp->bkvhead_free[1];
> +	krwp->bkvhead_free[1] = NULL;
> +
> +	/* Channel 3. */
>  	head = krwp->head_free;
>  	krwp->head_free = NULL;
> -	bhead = krwp->bhead_free;
> -	krwp->bhead_free = NULL;
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  
> -	/* "bhead" is now private, so traverse locklessly. */
> -	for (; bhead; bhead = bnext) {
> -		bnext = bhead->next;
> -
> -		debug_rcu_bhead_unqueue(bhead);
> +	/* kmalloc()/kfree() channel. */
> +	for (; bkhead; bkhead = bnext) {
> +		bnext = bkhead->next;
> +		debug_rcu_bhead_unqueue(bkhead);
>  
>  		rcu_lock_acquire(&rcu_callback_map);

Given that rcu_lock_acquire() only affects lockdep, I have to ask exactly
what concurrency design you are using here...

>  		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> -			bhead->nr_records, bhead->records);
> +			bkhead->nr_records, bkhead->records);
> +
> +		kfree_bulk(bkhead->nr_records, bkhead->records);
> +		rcu_lock_release(&rcu_callback_map);
> +
> +		krcp = krc_this_cpu_lock(&flags);
> +		if (put_cached_bnode(krcp, bkhead))
> +			bkhead = NULL;
> +		krc_this_cpu_unlock(krcp, flags);
> +
> +		if (bkhead)
> +			free_page((unsigned long) bkhead);
> +
> +		cond_resched_tasks_rcu_qs();
> +	}
> +
> +	/* vmalloc()/vfree() channel. */
> +	for (; bvhead; bvhead = bnext) {
> +		bnext = bvhead->next;
> +		debug_rcu_bhead_unqueue(bvhead);
>  
> -		kfree_bulk(bhead->nr_records, bhead->records);
> +		rcu_lock_acquire(&rcu_callback_map);

And the same here.

> +		for (i = 0; i < bvhead->nr_records; i++) {
> +			trace_rcu_invoke_kfree_callback(rcu_state.name,
> +				(struct rcu_head *) bvhead->records[i], 0);
> +			vfree(bvhead->records[i]);
> +		}
>  		rcu_lock_release(&rcu_callback_map);
>  
>  		krcp = krc_this_cpu_lock(&flags);
> -		if (put_cached_bnode(krcp, bhead))
> -			bhead = NULL;
> +		if (put_cached_bnode(krcp, bvhead))
> +			bvhead = NULL;
>  		krc_this_cpu_unlock(krcp, flags);
>  
> -		if (bhead)
> -			free_page((unsigned long) bhead);
> +		if (bvhead)
> +			free_page((unsigned long) bvhead);
>  
>  		cond_resched_tasks_rcu_qs();
>  	}
> @@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
>  		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
>  
>  		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
> -			kfree(ptr);
> +			kvfree(ptr);
>  
>  		rcu_lock_release(&rcu_callback_map);
>  		cond_resched_tasks_rcu_qs();
> @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
>  		krwp = &(krcp->krw_arr[i]);
>  
>  		/*
> -		 * Try to detach bhead or head and attach it over any
> +		 * Try to detach bkvhead or head and attach it over any
>  		 * available corresponding free channel. It can be that
>  		 * a previous RCU batch is in progress, it means that
>  		 * immediately to queue another one is not possible so
>  		 * return false to tell caller to retry.
>  		 */
> -		if ((krcp->bhead && !krwp->bhead_free) ||
> +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
>  				(krcp->head && !krwp->head_free)) {
> -			/* Channel 1. */
> -			if (!krwp->bhead_free) {
> -				krwp->bhead_free = krcp->bhead;
> -				krcp->bhead = NULL;
> +			/*
> +			 * Channel 1 corresponds to SLAB ptrs.
> +			 */
> +			if (!krwp->bkvhead_free[0]) {
> +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> +				krcp->bkvhead[0] = NULL;
>  			}
>  
> -			/* Channel 2. */
> +			/*
> +			 * Channel 2 corresponds to vmalloc ptrs.
> +			 */
> +			if (!krwp->bkvhead_free[1]) {
> +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> +				krcp->bkvhead[1] = NULL;
> +			}

Why not a "for" loop here?  Duplicate code is most certainly not what
we want, as it can cause all sorts of trouble down the road.
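
That is, something along these lines (untested sketch, reusing the names
from the patch):

<snip>
	int j;

	/* Channels 1 and 2: SLAB and vmalloc pointer blocks. */
	for (j = 0; j < ARRAY_SIZE(krcp->bkvhead); j++) {
		if (!krwp->bkvhead_free[j]) {
			krwp->bkvhead_free[j] = krcp->bkvhead[j];
			krcp->bkvhead[j] = NULL;
		}
	}
<snip>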

							Thanx, Paul

> +			/*
> +			 * Channel 3 corresponds to emergency path.
> +			 */
>  			if (!krwp->head_free) {
>  				krwp->head_free = krcp->head;
>  				krcp->head = NULL;
> @@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
>  			WRITE_ONCE(krcp->count, 0);
>  
>  			/*
> -			 * One work is per one batch, so there are two "free channels",
> -			 * "bhead_free" and "head_free" the batch can handle. It can be
> -			 * that the work is in the pending state when two channels have
> -			 * been detached following each other, one by one.
> +			 * One work is per one batch, so there are three
> +			 * "free channels", the batch can handle. It can
> +			 * be that the work is in the pending state when
> +			 * channels have been detached following by each
> +			 * other.
>  			 */
>  			queue_rcu_work(system_wq, &krwp->rcu_work);
>  		}
>  
>  		/* Repeat if any "free" corresponding channel is still busy. */
> -		if (krcp->bhead || krcp->head)
> +		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
>  			repeat = true;
>  	}
>  
> @@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  }
>  
>  static inline bool
> -kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> -	struct rcu_head *head, rcu_callback_t func)
> +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
>  {
> -	struct kfree_rcu_bulk_data *bnode;
> +	struct kvfree_rcu_bulk_data *bnode;
> +	int idx;
>  
>  	if (unlikely(!krcp->initialized))
>  		return false;
>  
>  	lockdep_assert_held(&krcp->lock);
> +	idx = !!is_vmalloc_addr(ptr);
>  
>  	/* Check if a new block is required. */
> -	if (!krcp->bhead ||
> -			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> +	if (!krcp->bkvhead[idx] ||
> +			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
>  		bnode = get_cached_bnode(krcp);
>  		if (!bnode) {
> -			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> -
>  			/*
>  			 * To keep this path working on raw non-preemptible
>  			 * sections, prevent the optional entry into the
> @@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
>  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
>  				return false;
>  
> -			bnode = (struct kfree_rcu_bulk_data *)
> +			bnode = (struct kvfree_rcu_bulk_data *)
>  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>  		}
>  
> @@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
>  
>  		/* Initialize the new block. */
>  		bnode->nr_records = 0;
> -		bnode->next = krcp->bhead;
> +		bnode->next = krcp->bkvhead[idx];
>  
>  		/* Attach it to the head. */
> -		krcp->bhead = bnode;
> +		krcp->bkvhead[idx] = bnode;
>  	}
>  
>  	/* Finally insert. */
> -	krcp->bhead->records[krcp->bhead->nr_records++] =
> -		(void *) head - (unsigned long) func;
> +	krcp->bkvhead[idx]->records
> +		[krcp->bkvhead[idx]->nr_records++] = ptr;
>  
>  	return true;
>  }
>  
>  /*
> - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
> - * period. Please note there are two paths are maintained, one is the main one
> - * that uses kfree_bulk() interface and second one is emergency one, that is
> - * used only when the main path can not be maintained temporary, due to memory
> - * pressure.
> + * Queue a request for lazy invocation of appropriate free routine after a
> + * grace period. Please note there are three paths are maintained, two are the
> + * main ones that use array of pointers interface and third one is emergency
> + * one, that is used only when the main path can not be maintained temporary,
> + * due to memory pressure.
>   *
>   * Each kfree_call_rcu() request is added to a batch. The batch will be drained
>   * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
>   * be free'd in workqueue context. This allows us to: batch requests together to
> - * reduce the number of grace periods during heavy kfree_rcu() load.
> + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
>   */
>  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
> @@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	 * Under high memory pressure GFP_NOWAIT can fail,
>  	 * in that case the emergency path is maintained.
>  	 */
> -	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
> +	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
>  		head->func = func;
>  		head->next = krcp->head;
>  		krcp->head = head;
> @@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
>  
>  	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> -		struct kfree_rcu_bulk_data *bnode;
> +		struct kvfree_rcu_bulk_data *bnode;
>  
>  		for (i = 0; i < KFREE_N_BATCHES; i++) {
>  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> @@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
>  		}
>  
>  		for (i = 0; i < rcu_min_cached_objs; i++) {
> -			bnode = (struct kfree_rcu_bulk_data *)
> +			bnode = (struct kvfree_rcu_bulk_data *)
>  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>  
>  			if (bnode)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description
  2020-04-28 20:58 ` [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description Uladzislau Rezki (Sony)
@ 2020-05-01 22:25   ` Paul E. McKenney
  2020-05-04 12:44     ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 22:25 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:49PM +0200, Uladzislau Rezki (Sony) wrote:
> Document the rcutree.rcu_min_cached_objs sysfs kernel parameter.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Could you please combine this with the patch that created this sysfs
parameter?

							Thanx, Paul

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 828ff975fbc6..b2b7022374af 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3977,6 +3977,14 @@
>  			latencies, which will choose a value aligned
>  			with the appropriate hardware boundaries.
>  
> +	rcutree.rcu_min_cached_objs= [KNL]
> +			Minimum number of objects which are cached and
> +			maintained per one CPU. Object size is equal
> +			to PAGE_SIZE. The cache allows to reduce the
> +			pressure to page allocator, also it makes the
> +			whole algorithm to behave better in low memory
> +			condition.
> +
>  	rcutree.jiffies_till_first_fqs= [KNL]
>  			Set delay from grace-period initialization to
>  			first attempt to force quiescent states.
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-04-28 20:58 ` [PATCH 19/24] rcu/tree: Support reclaim for head-less object Uladzislau Rezki (Sony)
@ 2020-05-01 22:39   ` Paul E. McKenney
  2020-05-04  0:12     ` Joel Fernandes
  2020-05-04 12:57     ` Uladzislau Rezki
  0 siblings, 2 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 22:39 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:58PM +0200, Uladzislau Rezki (Sony) wrote:
> Update kvfree_call_rcu() with head-less support, meaning an
> object without any rcu_head structure can be reclaimed after
> a GP.
> 
> To store the pointers, two chain-arrays are maintained: one
> for SLAB and another one for vmalloc. Both types of objects
> (head-less variant and regular one) are placed there based
> on the type.
> 
> It can be that maintaining the arrays becomes impossible due
> to high memory pressure. For that reason there is an emergency
> path. In that case objects with an rcu_head inside are just
> queued, building a one-way list. Later on that list is drained.
> 
> As for the head-less variant: such objects do not have any
> rcu_head helper inside, so one is dynamically attached. As a
> result an object consists of a back-pointer and a regular
> rcu_head. The emergency path has to be able to detect such an
> object type, therefore these objects are tagged, so that the
> back-pointer can be freed as well as the dynamically attached
> wrapper.
> 
> Even though such an approach requires dynamic memory, it needs
> only sizeof(unsigned long *) + sizeof(struct rcu_head) bytes,
> thus SLAB is used to obtain it. Finally, if attaching the
> rcu_head and queuing fail, the current context has to follow
> the might_sleep() annotation, thus the steps below are applied:
>    a) wait until a grace period has elapsed;
>    b) direct inlining of the kvfree() call.
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 98 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 51726e4c3b4d..501cac02146d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
>  	 */
>  	for (; head; head = next) {
>  		unsigned long offset = (unsigned long)head->func;
> -		void *ptr = (void *)head - offset;
> +		bool headless;
> +		void *ptr;
>  
>  		next = head->next;
> +
> +		/* We tag the headless object, if so adjust offset. */
> +		headless = (((unsigned long) head - offset) & BIT(0));
> +		if (headless)
> +			offset -= 1;
> +
> +		ptr = (void *) head - offset;
> +
>  		debug_rcu_head_unqueue((struct rcu_head *)ptr);
>  		rcu_lock_acquire(&rcu_callback_map);
>  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
>  
> -		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> +		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> +			/*
> +			 * If headless free the back-pointer first.
> +			 */
> +			if (headless)
> +				kvfree((void *) *((unsigned long *) ptr));
> +
>  			kvfree(ptr);
> +		}
>  
>  		rcu_lock_release(&rcu_callback_map);
>  		cond_resched_tasks_rcu_qs();
> @@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
>  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
>  				return false;
>  
> +			/*
> +			 * TODO: For one argument of kvfree_rcu() we can
> +			 * drop the lock and get the page in sleepable
> +			 * context. That would allow to maintain an array
> +			 * for the CONFIG_PREEMPT_RT as well. Thus we could
> +			 * get rid of dynamic rcu_head attaching code.
> +			 */
>  			bnode = (struct kvfree_rcu_bulk_data *)
>  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>  		}
> @@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
>  	return true;
>  }
>  
> +static inline struct rcu_head *
> +attach_rcu_head_to_object(void *obj)
> +{
> +	unsigned long *ptr;
> +
> +	ptr = kmalloc(sizeof(unsigned long *) +
> +			sizeof(struct rcu_head), GFP_NOWAIT |
> +				__GFP_RECLAIM |	/* can do direct reclaim. */
> +				__GFP_NORETRY |	/* only lightweight one.  */
> +				__GFP_NOWARN);	/* no failure reports. */

Again, let's please not do this single-pointer-sized allocation.  If
a full page is not available and this is a single-argument kfree_rcu(),
just call synchronize_rcu() and then free the object directly.

It should not be -that- hard to adjust locking for CONFIG_PREEMPT_RT!
For example, have some kind of reservation protocol so that a task
that drops the lock can retry the page allocation and be sure of having
a place to put it.  This might entail making CONFIG_PREEMPT_RT reserve
more pages per CPU.  Or maybe that would not be necessary.
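
In other words, for the single-argument case the fallback could be as simple
as the following (sketch only, names taken from the patch above, not tested):

<snip>
	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
	krc_this_cpu_unlock(krcp, flags);

	if (!success && !head) {
		/*
		 * No page was available and there is no rcu_head to fall
		 * back on.  The single-argument caller is in sleepable
		 * context, so wait for a grace period and free inline.
		 */
		synchronize_rcu();
		kvfree(ptr);
		return;
	}
<snip>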

							Thanx, Paul

> +	if (!ptr)
> +		return NULL;
> +
> +	ptr[0] = (unsigned long) obj;
> +	return ((struct rcu_head *) ++ptr);
> +}
> +
>  /*
>   * Queue a request for lazy invocation of appropriate free routine after a
>   * grace period. Please note there are three paths are maintained, two are the
> @@ -3260,16 +3300,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
>  	unsigned long flags;
>  	struct kfree_rcu_cpu *krcp;
> +	bool success;
>  	void *ptr;
>  
> +	if (head) {
> +		ptr = (void *) head - (unsigned long) func;
> +	} else {
> +		/*
> +		 * Please note there is a limitation for the head-less
> +		 * variant, that is why there is a clear rule for such
> +		 * objects:
> +		 *
> +		 * it can be used from might_sleep() context only. For
> +		 * other places please embed an rcu_head to your data.
> +		 */
> +		might_sleep();
> +		ptr = (unsigned long *) func;
> +	}
> +
>  	krcp = krc_this_cpu_lock(&flags);
> -	ptr = (void *)head - (unsigned long)func;
>  
>  	/* Queue the object but don't yet schedule the batch. */
>  	if (debug_rcu_head_queue(ptr)) {
>  		/* Probable double kfree_rcu(), just leak. */
>  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
>  			  __func__, head);
> +
> +		/* Mark as success and leave. */
> +		success = true;
>  		goto unlock_return;
>  	}
>  
> @@ -3277,10 +3335,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	 * Under high memory pressure GFP_NOWAIT can fail,
>  	 * in that case the emergency path is maintained.
>  	 */
> -	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> +	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> +	if (!success) {
> +		if (head == NULL) {
> +			/*
> +			 * Headless(one argument kvfree_rcu()) can sleep.
> +			 * Drop the lock and tack it back. So it can do
> +			 * direct lightweight reclaim.
> +			 */
> +			krc_this_cpu_unlock(krcp, flags);
> +			head = attach_rcu_head_to_object(ptr);
> +			krcp = krc_this_cpu_lock(&flags);
> +
> +			if (head == NULL)
> +				goto unlock_return;
> +
> +			/*
> +			 * Tag the headless object. Such objects have a
> +			 * back-pointer to the original allocated memory,
> +			 * that has to be freed as well as dynamically
> +			 * attached wrapper/head.
> +			 */
> +			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
> +		}
> +
>  		head->func = func;
>  		head->next = krcp->head;
>  		krcp->head = head;
> +		success = true;
>  	}
>  
>  	WRITE_ONCE(krcp->count, krcp->count + 1);
> @@ -3294,6 +3376,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
> +
> +	/*
> +	 * High memory pressure, so inline kvfree() after
> +	 * synchronize_rcu(). We can do it from might_sleep()
> +	 * context only, so the current CPU can pass the QS
> +	 * state.
> +	 */
> +	if (!success) {
> +		debug_rcu_head_unqueue(ptr);
> +		synchronize_rcu();
> +		kvfree(ptr);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-04-28 20:58 ` [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment Uladzislau Rezki (Sony)
@ 2020-05-01 23:00   ` Paul E. McKenney
  2020-05-04  0:24     ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 23:00 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:58:59PM +0200, Uladzislau Rezki (Sony) wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> Handle cases where the object being kvfree_rcu()'d is not aligned on
> 2-byte boundaries.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 501cac02146d..649bad7ad0f0 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
>  #define KVFREE_BULK_MAX_ENTR \
>  	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
>  
> +/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
> +#define RCU_HEADLESS_KFREE BIT(31)

Did I miss the check for freeing something larger than 2GB?  Or is this
impossible, even on systems with many terabytes of physical memory?
Even if it is currently impossible, what prevents it from suddenly
becoming all too possible at some random point in the future?  If you
think that this will never happen, please keep in mind that the first
time I heard "640K ought to be enough for anybody", it sounded eminently
reasonable to me.

Besides...

Isn't the offset in question the offset of an rcu_head struct within
the enclosing structure?  If so, why not keep the current requirement
that this be at least 16-bit aligned, especially given that some work
is required to make that alignment less than pointer sized?  Then you
can continue using bit 0.

This alignment requirement is included in the RCU requirements
documentation and is enforced within the __call_rcu() function.

So let's leave this at bit 0.
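
For reference, with at least 2-byte alignment of the offset, bit 0 is enough
to carry the head-less tag.  A small stand-alone illustration (plain
userspace C, not kernel code):

<snip>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * The offset of an rcu_head within its enclosing object is at least
 * 2-byte aligned, so bit 0 is free to mark head-less objects.
 */
#define HEADLESS_TAG	0x1UL

static uintptr_t encode_offset(uintptr_t offset, int headless)
{
	assert(!(offset & HEADLESS_TAG));	/* alignment requirement */
	return headless ? (offset | HEADLESS_TAG) : offset;
}

static uintptr_t decode_offset(uintptr_t encoded, int *headless)
{
	*headless = encoded & HEADLESS_TAG;
	return encoded & ~HEADLESS_TAG;
}

int main(void)
{
	int headless;
	uintptr_t encoded = encode_offset(sizeof(unsigned long *), 1);
	uintptr_t offset = decode_offset(encoded, &headless);

	printf("offset=%zu headless=%d\n", (size_t) offset, headless);
	return 0;
}
<snip>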

							Thanx, Paul

>  /**
>   * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
>   * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
> @@ -3078,9 +3081,9 @@ static void kfree_rcu_work(struct work_struct *work)
>  		next = head->next;
>  
>  		/* We tag the headless object, if so adjust offset. */
> -		headless = (((unsigned long) head - offset) & BIT(0));
> +		headless = !!(offset & RCU_HEADLESS_KFREE);
>  		if (headless)
> -			offset -= 1;
> +			offset &= ~(RCU_HEADLESS_KFREE);
>  
>  		ptr = (void *) head - offset;
>  
> @@ -3356,7 +3359,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  			 * that has to be freed as well as dynamically
>  			 * attached wrapper/head.
>  			 */
> -			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
> +			func = (rcu_callback_t)(sizeof(unsigned long *) | RCU_HEADLESS_KFREE);
>  		}
>  
>  		head->func = func;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-04-28 20:59 ` [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header Uladzislau Rezki (Sony)
@ 2020-05-01 23:03   ` Paul E. McKenney
  2020-05-04 12:45     ` Uladzislau Rezki
  2020-05-06 18:29     ` Uladzislau Rezki
  0 siblings, 2 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 23:03 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:59:00PM +0200, Uladzislau Rezki (Sony) wrote:
> Move the inlined kvfree_call_rcu() function out of the
> header file. This step is a preparation for head-less
> support.
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  include/linux/rcutiny.h | 6 +-----
>  kernel/rcu/tiny.c       | 6 ++++++
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 0c6315c4a0fe..7eb66909ae1b 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -34,11 +34,7 @@ static inline void synchronize_rcu_expedited(void)
>  	synchronize_rcu();
>  }
>  
> -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> -{
> -	call_rcu(head, func);
> -}
> -
> +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
>  void rcu_qs(void);
>  
>  static inline void rcu_softirq_qs(void)
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index aa897c3f2e92..508c82faa45c 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -177,6 +177,12 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  }
>  EXPORT_SYMBOL_GPL(call_rcu);
>  
> +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> +{
> +	call_rcu(head, func);
> +}
> +EXPORT_SYMBOL_GPL(kvfree_call_rcu);

This increases the size of Tiny RCU.  Plus in Tiny RCU, the overhead of
synchronize_rcu() is exactly zero.  So why not make the single-argument
kvfree_call_rcu() just unconditionally do synchronize_rcu() followed by
kvfree() or whatever?  That should go just fine into the header file.
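
Roughly along these lines in rcutiny.h (untested sketch of the suggestion,
assuming kvfree() is visible to the header):

<snip>
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (head) {
		call_rcu(head, func);
		return;
	}

	/*
	 * Single-argument kvfree_rcu(): the caller may sleep, and on
	 * Tiny RCU synchronize_rcu() is essentially free, so wait for
	 * a grace period and free the object directly.
	 */
	might_sleep();
	synchronize_rcu();
	kvfree((void *) func);
}
<snip>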

							Thanx, Paul

>  void __init rcu_init(void)
>  {
>  	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 22/24] rcu/tiny: support reclaim for head-less object
  2020-04-28 20:59 ` [PATCH 22/24] rcu/tiny: support reclaim for head-less object Uladzislau Rezki (Sony)
@ 2020-05-01 23:06   ` Paul E. McKenney
  2020-05-04  0:27     ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-01 23:06 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Tue, Apr 28, 2020 at 10:59:01PM +0200, Uladzislau Rezki (Sony) wrote:
> Make the kvfree_call_rcu() function support head-less
> freeing. Same as for tree-RCU, for that purpose we store
> the pointers in an array. SLAB and vmalloc pointers are
> mixed and coexist together.
> 
> Under high memory pressure it can be that maintaining the
> array becomes impossible. Objects with an rcu_head are
> released via call_rcu(). When it comes to the head-less
> variant, the kvfree() call is directly inlined, i.e. we
> do the same as for tree-RCU:
>     a) wait until a grace period has elapsed;
>     b) direct inlining of the kvfree() call.
> 
> Thus the current context has to follow the might_sleep()
> annotation. Also please note that for tiny-RCU any call
> to synchronize_rcu() is actually a quiescent state,
> therefore (a) does nothing.

Please, please, please just do synchronize_rcu() followed by kvfree()
for single-argument kfree_rcu() and friends in Tiny RCU.

Way simpler and probably way faster as well.  And given that Tiny RCU
runs only on uniprocessor systems, the complexity probably is buying
you very little, if anything.

							Thanx, Paul

> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/tiny.c | 157 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 156 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 508c82faa45c..b1c31a935db9 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -40,6 +40,29 @@ static struct rcu_ctrlblk rcu_ctrlblk = {
>  	.curtail	= &rcu_ctrlblk.rcucblist,
>  };
>  
> +/* Can be common with tree-RCU. */
> +#define KVFREE_DRAIN_JIFFIES (HZ / 50)
> +
> +/* Can be common with tree-RCU. */
> +struct kvfree_rcu_bulk_data {
> +	unsigned long nr_records;
> +	struct kvfree_rcu_bulk_data *next;
> +	void *records[];
> +};
> +
> +/* Can be common with tree-RCU. */
> +#define KVFREE_BULK_MAX_ENTR \
> +	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> +
> +static struct kvfree_rcu_bulk_data *kvhead;
> +static struct kvfree_rcu_bulk_data *kvhead_free;
> +static struct kvfree_rcu_bulk_data *kvcache;
> +
> +static DEFINE_STATIC_KEY_FALSE(rcu_init_done);
> +static struct delayed_work monitor_work;
> +static struct rcu_work rcu_work;
> +static bool monitor_todo;
> +
>  void rcu_barrier(void)
>  {
>  	wait_rcu_gp(call_rcu);
> @@ -177,9 +200,137 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  }
>  EXPORT_SYMBOL_GPL(call_rcu);
>  
> +static inline bool
> +kvfree_call_rcu_add_ptr_to_bulk(void *ptr)
> +{
> +	struct kvfree_rcu_bulk_data *bnode;
> +
> +	if (!kvhead || kvhead->nr_records == KVFREE_BULK_MAX_ENTR) {
> +		bnode = xchg(&kvcache, NULL);
> +		if (!bnode)
> +			bnode = (struct kvfree_rcu_bulk_data *)
> +				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> +
> +		if (unlikely(!bnode))
> +			return false;
> +
> +		/* Initialize the new block. */
> +		bnode->nr_records = 0;
> +		bnode->next = kvhead;
> +
> +		/* Attach it to the bvhead. */
> +		kvhead = bnode;
> +	}
> +
> +	/* Done. */
> +	kvhead->records[kvhead->nr_records++] = ptr;
> +	return true;
> +}
> +
> +static void
> +kvfree_rcu_work(struct work_struct *work)
> +{
> +	struct kvfree_rcu_bulk_data *kvhead_tofree, *next;
> +	unsigned long flags;
> +	int i;
> +
> +	local_irq_save(flags);
> +	kvhead_tofree = kvhead_free;
> +	kvhead_free = NULL;
> +	local_irq_restore(flags);
> +
> +	/* Reclaim process. */
> +	for (; kvhead_tofree; kvhead_tofree = next) {
> +		next = kvhead_tofree->next;
> +
> +		for (i = 0; i < kvhead_tofree->nr_records; i++) {
> +			debug_rcu_head_unqueue((struct rcu_head *)
> +				kvhead_tofree->records[i]);
> +			kvfree(kvhead_tofree->records[i]);
> +		}
> +
> +		if (cmpxchg(&kvcache, NULL, kvhead_tofree))
> +			free_page((unsigned long) kvhead_tofree);
> +	}
> +}
> +
> +static inline bool
> +queue_kvfree_rcu_work(void)
> +{
> +	/* Check if the free channel is available. */
> +	if (kvhead_free)
> +		return false;
> +
> +	kvhead_free = kvhead;
> +	kvhead = NULL;
> +
> +	/*
> +	 * Queue the job for memory reclaim after GP.
> +	 */
> +	queue_rcu_work(system_wq, &rcu_work);
> +	return true;
> +}
> +
> +static void kvfree_rcu_monitor(struct work_struct *work)
> +{
> +	unsigned long flags;
> +	bool queued;
> +
> +	local_irq_save(flags);
> +	queued = queue_kvfree_rcu_work();
> +	if (queued)
> +		/* Success. */
> +		monitor_todo = false;
> +	local_irq_restore(flags);
> +
> +	/*
> +	 * If previous RCU reclaim process is still in progress,
> +	 * schedule the work one more time to try again later.
> +	 */
> +	if (monitor_todo)
> +		schedule_delayed_work(&monitor_work,
> +			KVFREE_DRAIN_JIFFIES);
> +}
> +
>  void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
> -	call_rcu(head, func);
> +	unsigned long flags;
> +	bool success;
> +	void *ptr;
> +
> +	if (head) {
> +		ptr = (void *) head - (unsigned long) func;
> +	} else {
> +		might_sleep();
> +		ptr = (void *) func;
> +	}
> +
> +	if (debug_rcu_head_queue(ptr)) {
> +		/* Probable double free, just leak. */
> +		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> +			  __func__, head);
> +		return;
> +	}
> +
> +	local_irq_save(flags);
> +	success = kvfree_call_rcu_add_ptr_to_bulk(ptr);
> +	if (static_branch_likely(&rcu_init_done)) {
> +		if (success && !monitor_todo) {
> +			monitor_todo = true;
> +			schedule_delayed_work(&monitor_work,
> +				KVFREE_DRAIN_JIFFIES);
> +		}
> +	}
> +	local_irq_restore(flags);
> +
> +	if (!success) {
> +		if (!head) {
> +			synchronize_rcu();
> +			kvfree(ptr);
> +		} else {
> +			call_rcu(head, func);
> +		}
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
>  
> @@ -188,4 +339,8 @@ void __init rcu_init(void)
>  	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>  	rcu_early_boot_tests();
>  	srcu_init();
> +
> +	INIT_DELAYED_WORK(&monitor_work, kvfree_rcu_monitor);
> +	INIT_RCU_WORK(&rcu_work, kvfree_rcu_work);
> +	static_branch_enable(&rcu_init_done);
>  }
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-01 21:37   ` Paul E. McKenney
@ 2020-05-03 23:42     ` Joel Fernandes
  2020-05-04  0:20       ` Paul E. McKenney
  2020-05-04 14:25     ` Uladzislau Rezki
  1 sibling, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-03 23:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 02:37:53PM -0700, Paul E. McKenney wrote:
[...]
> > @@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> >  static void kfree_rcu_work(struct work_struct *work)
> >  {
> >  	unsigned long flags;
> > +	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
> >  	struct rcu_head *head, *next;
> > -	struct kfree_rcu_bulk_data *bhead, *bnext;
> >  	struct kfree_rcu_cpu *krcp;
> >  	struct kfree_rcu_cpu_work *krwp;
> > +	int i;
> >  
> >  	krwp = container_of(to_rcu_work(work),
> >  			    struct kfree_rcu_cpu_work, rcu_work);
> >  	krcp = krwp->krcp;
> > +
> >  	raw_spin_lock_irqsave(&krcp->lock, flags);
> > +	/* Channel 1. */
> > +	bkhead = krwp->bkvhead_free[0];
> > +	krwp->bkvhead_free[0] = NULL;
> > +
> > +	/* Channel 2. */
> > +	bvhead = krwp->bkvhead_free[1];
> > +	krwp->bkvhead_free[1] = NULL;
> > +
> > +	/* Channel 3. */
> >  	head = krwp->head_free;
> >  	krwp->head_free = NULL;
> > -	bhead = krwp->bhead_free;
> > -	krwp->bhead_free = NULL;
> >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> >  
> > -	/* "bhead" is now private, so traverse locklessly. */
> > -	for (; bhead; bhead = bnext) {
> > -		bnext = bhead->next;
> > -
> > -		debug_rcu_bhead_unqueue(bhead);
> > +	/* kmalloc()/kfree() channel. */
> > +	for (; bkhead; bkhead = bnext) {
> > +		bnext = bkhead->next;
> > +		debug_rcu_bhead_unqueue(bkhead);
> >  
> >  		rcu_lock_acquire(&rcu_callback_map);
> 
> Given that rcu_lock_acquire() only affects lockdep, I have to ask exactly
> what concurrency design you are using here...

I believe the rcu_callback_map usage above follows a similar pattern from old
code where the rcu_callback_map is acquired before doing the kfree.

static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{
        rcu_callback_t f;
        unsigned long offset = (unsigned long)head->func;

        rcu_lock_acquire(&rcu_callback_map);
        if (__is_kfree_rcu_offset(offset)) {
                trace_rcu_invoke_kfree_callback(rn, head, offset);
                kfree((void *)head - offset);
                rcu_lock_release(&rcu_callback_map);

So when kfree_rcu() was rewritten, the rcu_lock_acquire() of rcu_callback_map
got carried over.

I believe it is for detecting recursion where we possibly try to free
RCU-held memory while already freeing memory. Or was there another purpose of
the rcu_callback_map?

thanks,

 - Joel


> >  		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > -			bhead->nr_records, bhead->records);
> > +			bkhead->nr_records, bkhead->records);
> > +
> > +		kfree_bulk(bkhead->nr_records, bkhead->records);
> > +		rcu_lock_release(&rcu_callback_map);
> > +
> > +		krcp = krc_this_cpu_lock(&flags);
> > +		if (put_cached_bnode(krcp, bkhead))
> > +			bkhead = NULL;
> > +		krc_this_cpu_unlock(krcp, flags);
> > +
> > +		if (bkhead)
> > +			free_page((unsigned long) bkhead);
> > +
> > +		cond_resched_tasks_rcu_qs();
> > +	}
> > +
> > +	/* vmalloc()/vfree() channel. */
> > +	for (; bvhead; bvhead = bnext) {
> > +		bnext = bvhead->next;
> > +		debug_rcu_bhead_unqueue(bvhead);
> >  
> > -		kfree_bulk(bhead->nr_records, bhead->records);
> > +		rcu_lock_acquire(&rcu_callback_map);
> 
> And the same here.
> 
> > +		for (i = 0; i < bvhead->nr_records; i++) {
> > +			trace_rcu_invoke_kfree_callback(rcu_state.name,
> > +				(struct rcu_head *) bvhead->records[i], 0);
> > +			vfree(bvhead->records[i]);
> > +		}
> >  		rcu_lock_release(&rcu_callback_map);
> >  
> >  		krcp = krc_this_cpu_lock(&flags);
> > -		if (put_cached_bnode(krcp, bhead))
> > -			bhead = NULL;
> > +		if (put_cached_bnode(krcp, bvhead))
> > +			bvhead = NULL;
> >  		krc_this_cpu_unlock(krcp, flags);
> >  
> > -		if (bhead)
> > -			free_page((unsigned long) bhead);
> > +		if (bvhead)
> > +			free_page((unsigned long) bvhead);
> >  
> >  		cond_resched_tasks_rcu_qs();
> >  	}
> > @@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
> >  		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
> >  
> >  		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
> > -			kfree(ptr);
> > +			kvfree(ptr);
> >  
> >  		rcu_lock_release(&rcu_callback_map);
> >  		cond_resched_tasks_rcu_qs();
> > @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> >  		krwp = &(krcp->krw_arr[i]);
> >  
> >  		/*
> > -		 * Try to detach bhead or head and attach it over any
> > +		 * Try to detach bkvhead or head and attach it over any
> >  		 * available corresponding free channel. It can be that
> >  		 * a previous RCU batch is in progress, it means that
> >  		 * immediately to queue another one is not possible so
> >  		 * return false to tell caller to retry.
> >  		 */
> > -		if ((krcp->bhead && !krwp->bhead_free) ||
> > +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> > +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
> >  				(krcp->head && !krwp->head_free)) {
> > -			/* Channel 1. */
> > -			if (!krwp->bhead_free) {
> > -				krwp->bhead_free = krcp->bhead;
> > -				krcp->bhead = NULL;
> > +			/*
> > +			 * Channel 1 corresponds to SLAB ptrs.
> > +			 */
> > +			if (!krwp->bkvhead_free[0]) {
> > +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> > +				krcp->bkvhead[0] = NULL;
> >  			}
> >  
> > -			/* Channel 2. */
> > +			/*
> > +			 * Channel 2 corresponds to vmalloc ptrs.
> > +			 */
> > +			if (!krwp->bkvhead_free[1]) {
> > +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> > +				krcp->bkvhead[1] = NULL;
> > +			}
> 
> Why not a "for" loop here?  Duplicate code is most certainly not what
> we want, as it can cause all sorts of trouble down the road.
> 
> 							Thanx, Paul
> 
> > +			/*
> > +			 * Channel 3 corresponds to emergency path.
> > +			 */
> >  			if (!krwp->head_free) {
> >  				krwp->head_free = krcp->head;
> >  				krcp->head = NULL;
> > @@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> >  			WRITE_ONCE(krcp->count, 0);
> >  
> >  			/*
> > -			 * One work is per one batch, so there are two "free channels",
> > -			 * "bhead_free" and "head_free" the batch can handle. It can be
> > -			 * that the work is in the pending state when two channels have
> > -			 * been detached following each other, one by one.
> > +			 * One work is per one batch, so there are three
> > +			 * "free channels", the batch can handle. It can
> > +			 * be that the work is in the pending state when
> > +			 * channels have been detached following by each
> > +			 * other.
> >  			 */
> >  			queue_rcu_work(system_wq, &krwp->rcu_work);
> >  		}
> >  
> >  		/* Repeat if any "free" corresponding channel is still busy. */
> > -		if (krcp->bhead || krcp->head)
> > +		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
> >  			repeat = true;
> >  	}
> >  
> > @@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
> >  }
> >  
> >  static inline bool
> > -kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > -	struct rcu_head *head, rcu_callback_t func)
> > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> >  {
> > -	struct kfree_rcu_bulk_data *bnode;
> > +	struct kvfree_rcu_bulk_data *bnode;
> > +	int idx;
> >  
> >  	if (unlikely(!krcp->initialized))
> >  		return false;
> >  
> >  	lockdep_assert_held(&krcp->lock);
> > +	idx = !!is_vmalloc_addr(ptr);
> >  
> >  	/* Check if a new block is required. */
> > -	if (!krcp->bhead ||
> > -			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> > +	if (!krcp->bkvhead[idx] ||
> > +			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> >  		bnode = get_cached_bnode(krcp);
> >  		if (!bnode) {
> > -			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> > -
> >  			/*
> >  			 * To keep this path working on raw non-preemptible
> >  			 * sections, prevent the optional entry into the
> > @@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >  				return false;
> >  
> > -			bnode = (struct kfree_rcu_bulk_data *)
> > +			bnode = (struct kvfree_rcu_bulk_data *)
> >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >  		}
> >  
> > @@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> >  
> >  		/* Initialize the new block. */
> >  		bnode->nr_records = 0;
> > -		bnode->next = krcp->bhead;
> > +		bnode->next = krcp->bkvhead[idx];
> >  
> >  		/* Attach it to the head. */
> > -		krcp->bhead = bnode;
> > +		krcp->bkvhead[idx] = bnode;
> >  	}
> >  
> >  	/* Finally insert. */
> > -	krcp->bhead->records[krcp->bhead->nr_records++] =
> > -		(void *) head - (unsigned long) func;
> > +	krcp->bkvhead[idx]->records
> > +		[krcp->bkvhead[idx]->nr_records++] = ptr;
> >  
> >  	return true;
> >  }
> >  
> >  /*
> > - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
> > - * period. Please note there are two paths are maintained, one is the main one
> > - * that uses kfree_bulk() interface and second one is emergency one, that is
> > - * used only when the main path can not be maintained temporary, due to memory
> > - * pressure.
> > + * Queue a request for lazy invocation of appropriate free routine after a
> > + * grace period. Please note there are three paths are maintained, two are the
> > + * main ones that use array of pointers interface and third one is emergency
> > + * one, that is used only when the main path can not be maintained temporary,
> > + * due to memory pressure.
> >   *
> >   * Each kfree_call_rcu() request is added to a batch. The batch will be drained
> >   * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
> >   * be free'd in workqueue context. This allows us to: batch requests together to
> > - * reduce the number of grace periods during heavy kfree_rcu() load.
> > + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
> >   */
> >  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  {
> > @@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	 * Under high memory pressure GFP_NOWAIT can fail,
> >  	 * in that case the emergency path is maintained.
> >  	 */
> > -	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
> > +	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> >  		head->func = func;
> >  		head->next = krcp->head;
> >  		krcp->head = head;
> > @@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
> >  
> >  	for_each_possible_cpu(cpu) {
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > -		struct kfree_rcu_bulk_data *bnode;
> > +		struct kvfree_rcu_bulk_data *bnode;
> >  
> >  		for (i = 0; i < KFREE_N_BATCHES; i++) {
> >  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> > @@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
> >  		}
> >  
> >  		for (i = 0; i < rcu_min_cached_objs; i++) {
> > -			bnode = (struct kfree_rcu_bulk_data *)
> > +			bnode = (struct kvfree_rcu_bulk_data *)
> >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >  
> >  			if (bnode)
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-01 20:52     ` Joe Perches
@ 2020-05-03 23:44       ` Joel Fernandes
  2020-05-04  0:23         ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-03 23:44 UTC (permalink / raw)
  To: Joe Perches
  Cc: paulmck, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 01:52:46PM -0700, Joe Perches wrote:
> On Fri, 2020-05-01 at 12:05 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > > with majority of commenting styles.
> []
> > on /* */ style?
> > 
> > I am (slowly) moving RCU to "//" for those reasons.  ;-)
> 
> I hope c99 comment styles are more commonly used soon too.
> checkpatch doesn't care.
> 
> Perhaps a change to coding-style.rst
> ---
>  Documentation/process/coding-style.rst | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> index acb2f1b..fee647 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -565,6 +565,11 @@ comments is a little different.
>  	 * but there is no initial almost-blank line.
>  	 */
>  
> +.. code-block:: c
> +
> +	// Single line and inline comments may also use the c99 // style
> +	// Block comments as well
> +
>  It's also important to comment data, whether they are basic types or derived
>  types.  To this end, use just one data declaration per line (no commas for
>  multiple data declarations).  This leaves you room for a small comment on each
> 
> 

Yeah that's fine with me. This patch just tries to keep it consistent. I am
Ok with either style.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-01 19:05   ` Paul E. McKenney
  2020-05-01 20:52     ` Joe Perches
@ 2020-05-03 23:52     ` Joel Fernandes
  2020-05-04  0:26       ` Paul E. McKenney
  1 sibling, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-03 23:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 12:05:55PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > with majority of commenting styles.
> > 
> > Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Hmmm...
> 
> Exactly why is three additional characters per line preferable?  Or in
> the case of block comments, either one or two additional lines, depending
> on /* */ style?

I prefer to keep the code consistent and then bulk convert it later. It's a
bit ugly to read when it's mixed up with "//" and "/* */" right now. We can
convert it to "//" all at once later, but until then it'll be good to keep it
consistent in this file IMO. When I checked the kfree_rcu() code, it had more
"/* */" than not, so this small change is less churn for now.

thanks,

 - Joel

> 
> I am (slowly) moving RCU to "//" for those reasons.  ;-)
> 
> 							Thanx, Paul
> 
> > ---
> >  kernel/rcu/tree.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index cd61649e1b00..1487af8e11e8 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3043,15 +3043,15 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> >  static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
> >  					  unsigned long flags)
> >  {
> > -	// Attempt to start a new batch.
> > +	/* Attempt to start a new batch. */
> >  	krcp->monitor_todo = false;
> >  	if (queue_kfree_rcu_work(krcp)) {
> > -		// Success! Our job is done here.
> > +		/* Success! Our job is done here. */
> >  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
> >  		return;
> >  	}
> >  
> > -	// Previous RCU batch still in progress, try again later.
> > +	/* Previous RCU batch still in progress, try again later. */
> >  	krcp->monitor_todo = true;
> >  	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > @@ -3151,14 +3151,14 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	unsigned long flags;
> >  	struct kfree_rcu_cpu *krcp;
> >  
> > -	local_irq_save(flags);	// For safely calling this_cpu_ptr().
> > +	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
> >  	krcp = this_cpu_ptr(&krc);
> >  	if (krcp->initialized)
> >  		raw_spin_lock(&krcp->lock);
> >  
> > -	// Queue the object but don't yet schedule the batch.
> > +	/* Queue the object but don't yet schedule the batch. */
> >  	if (debug_rcu_head_queue(head)) {
> > -		// Probable double kfree_rcu(), just leak.
> > +		/* Probable double kfree_rcu(), just leak. */
> >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> >  			  __func__, head);
> >  		goto unlock_return;
> > @@ -3176,7 +3176,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  
> >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> >  
> > -	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> > +	/* Set timer to drain after KFREE_DRAIN_JIFFIES. */
> >  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> >  	    !krcp->monitor_todo) {
> >  		krcp->monitor_todo = true;
> > @@ -3722,7 +3722,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> >  
> >  	rcutree_affinity_setting(cpu, cpu);
> >  
> > -	// nohz_full CPUs need the tick for stop-machine to work quickly
> > +	/* nohz_full CPUs need the tick for stop-machine to work quickly */
> >  	tick_dep_set(TICK_DEP_BIT_RCU);
> >  	return 0;
> >  }
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-01 22:39   ` Paul E. McKenney
@ 2020-05-04  0:12     ` Joel Fernandes
  2020-05-04  0:28       ` Paul E. McKenney
  2020-05-04 12:57     ` Uladzislau Rezki
  1 sibling, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 03:39:09PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:58PM +0200, Uladzislau Rezki (Sony) wrote:
> > Update the kvfree_call_rcu() with head-less support, it
> > means an object without any rcu_head structure can be
> > reclaimed after GP.
> > 
> > To store pointers there are two chain-arrays maintained
> > one for SLAB and another one is for vmalloc. Both types
> > of objects(head-less variant and regular one) are placed
> > there based on the type.
> > 
> > It can be that maintaining of arrays becomes impossible
> > due to high memory pressure. For such reason there is an
> > emergency path. In that case objects with rcu_head inside
> > are just queued building one way list. Later on that list
> > is drained.
> > 
> > As for head-less variant. Such objects do not have any
> > rcu_head helper inside. Thus it is dynamically attached.
> > As a result an object consists of back-pointer and regular
> > rcu_head. It implies that emergency path can detect such
> > object type, therefore they are tagged. So a back-pointer
> > could be freed as well as dynamically attached wrapper.
> > 
> > Even though such approach requires dynamic memory it needs
> > only sizeof(unsigned long *) + sizeof(struct rcu_head) bytes,
> > thus SLAB is used to obtain it. Finally if attaching of the
> > rcu_head and queuing get failed, the current context has
> > to follow might_sleep() annotation, thus below steps could
> > be applied:
> >    a) wait until a grace period has elapsed;
> >    b) direct inlining of the kvfree() call.
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 98 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51726e4c3b4d..501cac02146d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
> >  	 */
> >  	for (; head; head = next) {
> >  		unsigned long offset = (unsigned long)head->func;
> > -		void *ptr = (void *)head - offset;
> > +		bool headless;
> > +		void *ptr;
> >  
> >  		next = head->next;
> > +
> > +		/* We tag the headless object, if so adjust offset. */
> > +		headless = (((unsigned long) head - offset) & BIT(0));
> > +		if (headless)
> > +			offset -= 1;
> > +
> > +		ptr = (void *) head - offset;
> > +
> >  		debug_rcu_head_unqueue((struct rcu_head *)ptr);
> >  		rcu_lock_acquire(&rcu_callback_map);
> >  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
> >  
> > -		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> > +		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> > +			/*
> > +			 * If headless free the back-pointer first.
> > +			 */
> > +			if (headless)
> > +				kvfree((void *) *((unsigned long *) ptr));
> > +
> >  			kvfree(ptr);
> > +		}
> >  
> >  		rcu_lock_release(&rcu_callback_map);
> >  		cond_resched_tasks_rcu_qs();
> > @@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >  				return false;
> >  
> > +			/*
> > +			 * TODO: For one argument of kvfree_rcu() we can
> > +			 * drop the lock and get the page in sleepable
> > +			 * context. That would allow to maintain an array
> > +			 * for the CONFIG_PREEMPT_RT as well. Thus we could
> > +			 * get rid of dynamic rcu_head attaching code.
> > +			 */
> >  			bnode = (struct kvfree_rcu_bulk_data *)
> >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >  		}
> > @@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> >  	return true;
> >  }
> >  
> > +static inline struct rcu_head *
> > +attach_rcu_head_to_object(void *obj)
> > +{
> > +	unsigned long *ptr;
> > +
> > +	ptr = kmalloc(sizeof(unsigned long *) +
> > +			sizeof(struct rcu_head), GFP_NOWAIT |
> > +				__GFP_RECLAIM |	/* can do direct reclaim. */
> > +				__GFP_NORETRY |	/* only lightweight one.  */
> > +				__GFP_NOWARN);	/* no failure reports. */
> 
> Again, let's please not do this single-pointer-sized allocation.  If
> a full page is not available and this is a single-argument kfree_rcu(),
> just call synchronize_rcu() and then free the object directly.

With the additional caching, lack of a full page should not be very likely. I
agree we can avoid doing any allocation and go straight to
synchronize_rcu().
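
A minimal (untested) sketch of that fallback, with a hypothetical helper name,
could look like this:

<snip>
/*
 * Untested sketch, not part of the series: single-argument fallback
 * when no page can be obtained. Caller must be in sleepable context.
 */
static void kvfree_rcu_inline_fallback(void *ptr)
{
	synchronize_rcu();	/* Wait for a full grace period... */
	kvfree(ptr);		/* ...then free the object directly. */
}
<snip>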

> It should not be -that- hard to adjust locking for CONFIG_PREEMPT_RT!
> For example, have some kind of reservation protocol so that a task
> that drops the lock can retry the page allocation and be sure of having
> a place to put it.  This might entail making CONFIG_PREEMPT_RT reserve
> more pages per CPU.  Or maybe that would not be necessary.

If we are not doing single-pointer allocation, then that would also eliminate
entering the low-level page allocator for single-pointer allocations.

Or did you mean entry into the allocator for the full-page allocations
related to the pointer array for PREEMPT_RT? Even if we skip entry into the
allocator for those, we will still have the additional caching, which further
reduces the chances of failing to get a full page. In the event of such a
failure, we can simply queue the rcu_head.
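
For the sleepable single-argument path, that could look roughly like the
sketch below (untested; krc_this_cpu_lock()/krc_this_cpu_unlock() and
put_cached_bnode() are the helpers from the series, the rest is made up):

<snip>
	/* Untested sketch: refill the per-CPU cache from sleepable context. */
	krc_this_cpu_unlock(krcp, flags);
	bnode = (struct kvfree_rcu_bulk_data *)
		__get_free_page(GFP_KERNEL | __GFP_NOWARN);
	krcp = krc_this_cpu_lock(&flags);

	/* Hand the page to the per-CPU cache; drop it if the cache is full. */
	if (bnode && !put_cached_bnode(krcp, bnode))
		free_page((unsigned long) bnode);
<snip>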

Thoughts?

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > +	if (!ptr)
> > +		return NULL;
> > +
> > +	ptr[0] = (unsigned long) obj;
> > +	return ((struct rcu_head *) ++ptr);
> > +}
> > +
> >  /*
> >   * Queue a request for lazy invocation of appropriate free routine after a
> >   * grace period. Please note there are three paths are maintained, two are the
> > @@ -3260,16 +3300,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  {
> >  	unsigned long flags;
> >  	struct kfree_rcu_cpu *krcp;
> > +	bool success;
> >  	void *ptr;
> >  
> > +	if (head) {
> > +		ptr = (void *) head - (unsigned long) func;
> > +	} else {
> > +		/*
> > +		 * Please note there is a limitation for the head-less
> > +		 * variant, that is why there is a clear rule for such
> > +		 * objects:
> > +		 *
> > +		 * it can be used from might_sleep() context only. For
> > +		 * other places please embed an rcu_head to your data.
> > +		 */
> > +		might_sleep();
> > +		ptr = (unsigned long *) func;
> > +	}
> > +
> >  	krcp = krc_this_cpu_lock(&flags);
> > -	ptr = (void *)head - (unsigned long)func;
> >  
> >  	/* Queue the object but don't yet schedule the batch. */
> >  	if (debug_rcu_head_queue(ptr)) {
> >  		/* Probable double kfree_rcu(), just leak. */
> >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> >  			  __func__, head);
> > +
> > +		/* Mark as success and leave. */
> > +		success = true;
> >  		goto unlock_return;
> >  	}
> >  
> > @@ -3277,10 +3335,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	 * Under high memory pressure GFP_NOWAIT can fail,
> >  	 * in that case the emergency path is maintained.
> >  	 */
> > -	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > +	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> > +	if (!success) {
> > +		if (head == NULL) {
> > +			/*
> > +			 * Headless(one argument kvfree_rcu()) can sleep.
> > +			 * Drop the lock and tack it back. So it can do
> > +			 * direct lightweight reclaim.
> > +			 */
> > +			krc_this_cpu_unlock(krcp, flags);
> > +			head = attach_rcu_head_to_object(ptr);
> > +			krcp = krc_this_cpu_lock(&flags);
> > +
> > +			if (head == NULL)
> > +				goto unlock_return;
> > +
> > +			/*
> > +			 * Tag the headless object. Such objects have a
> > +			 * back-pointer to the original allocated memory,
> > +			 * that has to be freed as well as dynamically
> > +			 * attached wrapper/head.
> > +			 */
> > +			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
> > +		}
> > +
> >  		head->func = func;
> >  		head->next = krcp->head;
> >  		krcp->head = head;
> > +		success = true;
> >  	}
> >  
> >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> > @@ -3294,6 +3376,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  
> >  unlock_return:
> >  	krc_this_cpu_unlock(krcp, flags);
> > +
> > +	/*
> > +	 * High memory pressure, so inline kvfree() after
> > +	 * synchronize_rcu(). We can do it from might_sleep()
> > +	 * context only, so the current CPU can pass the QS
> > +	 * state.
> > +	 */
> > +	if (!success) {
> > +		debug_rcu_head_unqueue(ptr);
> > +		synchronize_rcu();
> > +		kvfree(ptr);
> > +	}
> >  }
> >  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> >  
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-03 23:42     ` Joel Fernandes
@ 2020-05-04  0:20       ` Paul E. McKenney
  2020-05-04  0:58         ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  0:20 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 07:42:50PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 02:37:53PM -0700, Paul E. McKenney wrote:
> [...]
> > > @@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > >  static void kfree_rcu_work(struct work_struct *work)
> > >  {
> > >  	unsigned long flags;
> > > +	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
> > >  	struct rcu_head *head, *next;
> > > -	struct kfree_rcu_bulk_data *bhead, *bnext;
> > >  	struct kfree_rcu_cpu *krcp;
> > >  	struct kfree_rcu_cpu_work *krwp;
> > > +	int i;
> > >  
> > >  	krwp = container_of(to_rcu_work(work),
> > >  			    struct kfree_rcu_cpu_work, rcu_work);
> > >  	krcp = krwp->krcp;
> > > +
> > >  	raw_spin_lock_irqsave(&krcp->lock, flags);
> > > +	/* Channel 1. */
> > > +	bkhead = krwp->bkvhead_free[0];
> > > +	krwp->bkvhead_free[0] = NULL;
> > > +
> > > +	/* Channel 2. */
> > > +	bvhead = krwp->bkvhead_free[1];
> > > +	krwp->bkvhead_free[1] = NULL;
> > > +
> > > +	/* Channel 3. */
> > >  	head = krwp->head_free;
> > >  	krwp->head_free = NULL;
> > > -	bhead = krwp->bhead_free;
> > > -	krwp->bhead_free = NULL;
> > >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > >  
> > > -	/* "bhead" is now private, so traverse locklessly. */
> > > -	for (; bhead; bhead = bnext) {
> > > -		bnext = bhead->next;
> > > -
> > > -		debug_rcu_bhead_unqueue(bhead);
> > > +	/* kmalloc()/kfree() channel. */
> > > +	for (; bkhead; bkhead = bnext) {
> > > +		bnext = bkhead->next;
> > > +		debug_rcu_bhead_unqueue(bkhead);
> > >  
> > >  		rcu_lock_acquire(&rcu_callback_map);
> > 
> > Given that rcu_lock_acquire() only affects lockdep, I have to ask exactly
> > what concurrency design you are using here...
> 
> I believe the rcu_callback_map usage above follows a similar pattern from old
> code where the rcu_callback_map is acquired before doing the kfree.
> 
> static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
> {
>         rcu_callback_t f;
>         unsigned long offset = (unsigned long)head->func;
> 
>         rcu_lock_acquire(&rcu_callback_map);
>         if (__is_kfree_rcu_offset(offset)) {
>                 trace_rcu_invoke_kfree_callback(rn, head, offset);
>                 kfree((void *)head - offset);
>                 rcu_lock_release(&rcu_callback_map);
> 
> So when kfree_rcu() was rewritten, the rcu_lock_acquire() of rcu_callback_map
> got carried over.
> 
> I believe it is for detecting recursion where we possibly try to free
> RCU-held memory while already freeing memory. Or was there another purpose of
> the rcu_callback_map?

It looks like rcu_callback_map was added by 77a40f97030 ("rcu:
Remove kfree_rcu() special casing and lazy-callback handling"), which
was less than a year ago.  ;-)

Hmmm...  This would be a good way to allow lockdep to tell you that you
are running within an RCU callback on the one hand or are reclaiming
due to kfree_rcu() on the other.  Was that the intent?  If so, a comment
seems necessary.
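
For example (purely illustrative, assuming CONFIG_PROVE_RCU/lockdep), other
code could then ask lockdep whether it is running from that context:

<snip>
	/*
	 * Illustration only: rcu_callback_map is "held" while RCU invokes
	 * callbacks and while kfree_rcu_work() runs its bulk reclaim, so a
	 * lockdep-enabled kernel can detect reclaim from those contexts.
	 */
	WARN_ON_ONCE(lock_is_held(&rcu_callback_map));
<snip>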

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > >  		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > > -			bhead->nr_records, bhead->records);
> > > +			bkhead->nr_records, bkhead->records);
> > > +
> > > +		kfree_bulk(bkhead->nr_records, bkhead->records);
> > > +		rcu_lock_release(&rcu_callback_map);
> > > +
> > > +		krcp = krc_this_cpu_lock(&flags);
> > > +		if (put_cached_bnode(krcp, bkhead))
> > > +			bkhead = NULL;
> > > +		krc_this_cpu_unlock(krcp, flags);
> > > +
> > > +		if (bkhead)
> > > +			free_page((unsigned long) bkhead);
> > > +
> > > +		cond_resched_tasks_rcu_qs();
> > > +	}
> > > +
> > > +	/* vmalloc()/vfree() channel. */
> > > +	for (; bvhead; bvhead = bnext) {
> > > +		bnext = bvhead->next;
> > > +		debug_rcu_bhead_unqueue(bvhead);
> > >  
> > > -		kfree_bulk(bhead->nr_records, bhead->records);
> > > +		rcu_lock_acquire(&rcu_callback_map);
> > 
> > And the same here.
> > 
> > > +		for (i = 0; i < bvhead->nr_records; i++) {
> > > +			trace_rcu_invoke_kfree_callback(rcu_state.name,
> > > +				(struct rcu_head *) bvhead->records[i], 0);
> > > +			vfree(bvhead->records[i]);
> > > +		}
> > >  		rcu_lock_release(&rcu_callback_map);
> > >  
> > >  		krcp = krc_this_cpu_lock(&flags);
> > > -		if (put_cached_bnode(krcp, bhead))
> > > -			bhead = NULL;
> > > +		if (put_cached_bnode(krcp, bvhead))
> > > +			bvhead = NULL;
> > >  		krc_this_cpu_unlock(krcp, flags);
> > >  
> > > -		if (bhead)
> > > -			free_page((unsigned long) bhead);
> > > +		if (bvhead)
> > > +			free_page((unsigned long) bvhead);
> > >  
> > >  		cond_resched_tasks_rcu_qs();
> > >  	}
> > > @@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
> > >  		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
> > >  
> > >  		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
> > > -			kfree(ptr);
> > > +			kvfree(ptr);
> > >  
> > >  		rcu_lock_release(&rcu_callback_map);
> > >  		cond_resched_tasks_rcu_qs();
> > > @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > >  		krwp = &(krcp->krw_arr[i]);
> > >  
> > >  		/*
> > > -		 * Try to detach bhead or head and attach it over any
> > > +		 * Try to detach bkvhead or head and attach it over any
> > >  		 * available corresponding free channel. It can be that
> > >  		 * a previous RCU batch is in progress, it means that
> > >  		 * immediately to queue another one is not possible so
> > >  		 * return false to tell caller to retry.
> > >  		 */
> > > -		if ((krcp->bhead && !krwp->bhead_free) ||
> > > +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> > > +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
> > >  				(krcp->head && !krwp->head_free)) {
> > > -			/* Channel 1. */
> > > -			if (!krwp->bhead_free) {
> > > -				krwp->bhead_free = krcp->bhead;
> > > -				krcp->bhead = NULL;
> > > +			/*
> > > +			 * Channel 1 corresponds to SLAB ptrs.
> > > +			 */
> > > +			if (!krwp->bkvhead_free[0]) {
> > > +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> > > +				krcp->bkvhead[0] = NULL;
> > >  			}
> > >  
> > > -			/* Channel 2. */
> > > +			/*
> > > +			 * Channel 2 corresponds to vmalloc ptrs.
> > > +			 */
> > > +			if (!krwp->bkvhead_free[1]) {
> > > +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> > > +				krcp->bkvhead[1] = NULL;
> > > +			}
> > 
> > Why not a "for" loop here?  Duplicate code is most certainly not what
> > we want, as it can cause all sorts of trouble down the road.
> > 
> > 							Thanx, Paul
> > 
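
I.e., something along these lines (untested, reusing the names from the hunk
above):

<snip>
	/* Untested sketch: detach both bulk channels with one loop. */
	for (j = 0; j < ARRAY_SIZE(krcp->bkvhead); j++) {
		if (!krwp->bkvhead_free[j]) {
			krwp->bkvhead_free[j] = krcp->bkvhead[j];
			krcp->bkvhead[j] = NULL;
		}
	}
<snip>
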
> > > +			/*
> > > +			 * Channel 3 corresponds to emergency path.
> > > +			 */
> > >  			if (!krwp->head_free) {
> > >  				krwp->head_free = krcp->head;
> > >  				krcp->head = NULL;
> > > @@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > >  			WRITE_ONCE(krcp->count, 0);
> > >  
> > >  			/*
> > > -			 * One work is per one batch, so there are two "free channels",
> > > -			 * "bhead_free" and "head_free" the batch can handle. It can be
> > > -			 * that the work is in the pending state when two channels have
> > > -			 * been detached following each other, one by one.
> > > +			 * One work is per one batch, so there are three
> > > +			 * "free channels", the batch can handle. It can
> > > +			 * be that the work is in the pending state when
> > > +			 * channels have been detached following by each
> > > +			 * other.
> > >  			 */
> > >  			queue_rcu_work(system_wq, &krwp->rcu_work);
> > >  		}
> > >  
> > >  		/* Repeat if any "free" corresponding channel is still busy. */
> > > -		if (krcp->bhead || krcp->head)
> > > +		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
> > >  			repeat = true;
> > >  	}
> > >  
> > > @@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
> > >  }
> > >  
> > >  static inline bool
> > > -kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > -	struct rcu_head *head, rcu_callback_t func)
> > > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > >  {
> > > -	struct kfree_rcu_bulk_data *bnode;
> > > +	struct kvfree_rcu_bulk_data *bnode;
> > > +	int idx;
> > >  
> > >  	if (unlikely(!krcp->initialized))
> > >  		return false;
> > >  
> > >  	lockdep_assert_held(&krcp->lock);
> > > +	idx = !!is_vmalloc_addr(ptr);
> > >  
> > >  	/* Check if a new block is required. */
> > > -	if (!krcp->bhead ||
> > > -			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> > > +	if (!krcp->bkvhead[idx] ||
> > > +			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > >  		bnode = get_cached_bnode(krcp);
> > >  		if (!bnode) {
> > > -			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> > > -
> > >  			/*
> > >  			 * To keep this path working on raw non-preemptible
> > >  			 * sections, prevent the optional entry into the
> > > @@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > >  				return false;
> > >  
> > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > >  		}
> > >  
> > > @@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > >  
> > >  		/* Initialize the new block. */
> > >  		bnode->nr_records = 0;
> > > -		bnode->next = krcp->bhead;
> > > +		bnode->next = krcp->bkvhead[idx];
> > >  
> > >  		/* Attach it to the head. */
> > > -		krcp->bhead = bnode;
> > > +		krcp->bkvhead[idx] = bnode;
> > >  	}
> > >  
> > >  	/* Finally insert. */
> > > -	krcp->bhead->records[krcp->bhead->nr_records++] =
> > > -		(void *) head - (unsigned long) func;
> > > +	krcp->bkvhead[idx]->records
> > > +		[krcp->bkvhead[idx]->nr_records++] = ptr;
> > >  
> > >  	return true;
> > >  }
> > >  
> > >  /*
> > > - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
> > > - * period. Please note there are two paths are maintained, one is the main one
> > > - * that uses kfree_bulk() interface and second one is emergency one, that is
> > > - * used only when the main path can not be maintained temporary, due to memory
> > > - * pressure.
> > > + * Queue a request for lazy invocation of appropriate free routine after a
> > > + * grace period. Please note there are three paths are maintained, two are the
> > > + * main ones that use array of pointers interface and third one is emergency
> > > + * one, that is used only when the main path can not be maintained temporary,
> > > + * due to memory pressure.
> > >   *
> > >   * Each kfree_call_rcu() request is added to a batch. The batch will be drained
> > >   * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
> > >   * be free'd in workqueue context. This allows us to: batch requests together to
> > > - * reduce the number of grace periods during heavy kfree_rcu() load.
> > > + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
> > >   */
> > >  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  {
> > > @@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  	 * Under high memory pressure GFP_NOWAIT can fail,
> > >  	 * in that case the emergency path is maintained.
> > >  	 */
> > > -	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
> > > +	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > >  		head->func = func;
> > >  		head->next = krcp->head;
> > >  		krcp->head = head;
> > > @@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
> > >  
> > >  	for_each_possible_cpu(cpu) {
> > >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > -		struct kfree_rcu_bulk_data *bnode;
> > > +		struct kvfree_rcu_bulk_data *bnode;
> > >  
> > >  		for (i = 0; i < KFREE_N_BATCHES; i++) {
> > >  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> > > @@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
> > >  		}
> > >  
> > >  		for (i = 0; i < rcu_min_cached_objs; i++) {
> > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > >  
> > >  			if (bnode)
> > > -- 
> > > 2.20.1
> > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-03 23:44       ` Joel Fernandes
@ 2020-05-04  0:23         ` Paul E. McKenney
  2020-05-04  0:34           ` Joe Perches
  2020-05-04  0:41           ` Joel Fernandes
  0 siblings, 2 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  0:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Joe Perches, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 07:44:00PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 01:52:46PM -0700, Joe Perches wrote:
> > On Fri, 2020-05-01 at 12:05 -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > > > with majority of commenting styles.
> > []
> > > on /* */ style?
> > > 
> > > I am (slowly) moving RCU to "//" for those reasons.  ;-)
> > 
> > I hope c99 comment styles are more commonly used soon too.
> > checkpatch doesn't care.
> > 
> > Perhaps a change to coding-style.rst
> > ---
> >  Documentation/process/coding-style.rst | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > index acb2f1b..fee647 100644
> > --- a/Documentation/process/coding-style.rst
> > +++ b/Documentation/process/coding-style.rst
> > @@ -565,6 +565,11 @@ comments is a little different.
> >  	 * but there is no initial almost-blank line.
> >  	 */
> >  
> > +.. code-block:: c
> > +
> > +	// Single line and inline comments may also use the c99 // style
> > +	// Block comments as well
> > +
> >  It's also important to comment data, whether they are basic types or derived
> >  types.  To this end, use just one data declaration per line (no commas for
> >  multiple data declarations).  This leaves you room for a small comment on each
> 
> Yeah that's fine with me. This patch just tries to keep it consistent. I am
> Ok with either style.

My approach has been gradual change.  Big-bang changes of this sort
cause quite a bit of trouble.  So I use "//" in new code and (sometimes)
convert nearby ones when making a change.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-05-01 23:00   ` Paul E. McKenney
@ 2020-05-04  0:24     ` Joel Fernandes
  2020-05-04  0:29       ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 04:00:52PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:59PM +0200, Uladzislau Rezki (Sony) wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > Handle cases where the object being kvfree_rcu()'d is not aligned on
> > 2-byte boundaries.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 501cac02146d..649bad7ad0f0 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
> >  #define KVFREE_BULK_MAX_ENTR \
> >  	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> >  
> > +/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
> > +#define RCU_HEADLESS_KFREE BIT(31)
> 
> Did I miss the check for freeing something larger than 2GB?  Or is this
> impossible, even on systems with many terabytes of physical memory?
> Even if it is currently impossible, what prevents it from suddenly
> becoming all too possible at some random point in the future?  If you
> think that this will never happen, please keep in mind that the first
> time I heard "640K ought to be enough for anybody", it sounded eminently
> reasonable to me.
> 
> Besides...
> 
> Isn't the offset in question the offset of an rcu_head struct within
> the enclosing structure? If so, why not keep the current requirement
> that this be at least 16-bit aligned, especially given that some work
> is required to make that alignment less than pointer sized?  Then you
> can continue using bit 0.
> 
> This alignment requirement is included in the RCU requirements
> documentation and is enforced within the __call_rcu() function.
> 
> So let's leave this at bit 0.

This patch is needed only if we are growing the fake rcu_head. Since you
mentioned in a previous patch in this series that you don't want to do that
and would rather rely on the availability of the array of pointers or
synchronize_rcu(), we can drop this patch. If we are not dropping that
earlier patch, let us discuss more.
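
For reference, bit 0 is available exactly because the rcu_head offset inside
the enclosing structure is at least 2-byte aligned; a self-contained
illustration (hypothetical helpers, not from the patch):

<snip>
/* Illustration only: tag/untag an rcu_head offset using bit 0. */
#define HEADLESS_TAG 0x1UL

static unsigned long encode_offset(unsigned long offset, int headless)
{
	/* Offsets are at least 2-byte aligned, so bit 0 is always clear. */
	return headless ? (offset | HEADLESS_TAG) : offset;
}

static unsigned long decode_offset(unsigned long encoded, int *headless)
{
	*headless = encoded & HEADLESS_TAG;
	return encoded & ~HEADLESS_TAG;
}
<snip>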

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-03 23:52     ` Joel Fernandes
@ 2020-05-04  0:26       ` Paul E. McKenney
  2020-05-04  0:39         ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  0:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 07:52:13PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 12:05:55PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > > with majority of commenting styles.
> > > 
> > > Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > Hmmm...
> > 
> > Exactly why is three additional characters per line preferable?  Or in
> > the case of block comments, either one or two additional lines, depending
> > on /* */ style?
> 
> I prefer to keep the code consistent and then bulk convert it later. It's a
> bit ugly to read when it's mixed up with "//" and "/* */" right now. We can
> convert it to "//" all at once later, but until then it'll be good to keep it
> consistent in this file IMO. When I checked the kfree_rcu() code, it had more
> "/* */" than not, so this small change is less churn for now.

Please just drop this patch along with the other "//"-to-"/* */"
regressions.

If you want to convert more comments to "//" within the confines of the
kfree_rcu() code, I am probably OK with that.  But again, a big-bang
change of this sort often causes problems due to lots of potential
rebase/merge conflicts.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 
> > I am (slowly) moving RCU to "//" for those reasons.  ;-)
> > 
> > 							Thanx, Paul
> > 
> > > ---
> > >  kernel/rcu/tree.c | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index cd61649e1b00..1487af8e11e8 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3043,15 +3043,15 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > >  static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
> > >  					  unsigned long flags)
> > >  {
> > > -	// Attempt to start a new batch.
> > > +	/* Attempt to start a new batch. */
> > >  	krcp->monitor_todo = false;
> > >  	if (queue_kfree_rcu_work(krcp)) {
> > > -		// Success! Our job is done here.
> > > +		/* Success! Our job is done here. */
> > >  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > >  		return;
> > >  	}
> > >  
> > > -	// Previous RCU batch still in progress, try again later.
> > > +	/* Previous RCU batch still in progress, try again later. */
> > >  	krcp->monitor_todo = true;
> > >  	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> > >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > @@ -3151,14 +3151,14 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  	unsigned long flags;
> > >  	struct kfree_rcu_cpu *krcp;
> > >  
> > > -	local_irq_save(flags);	// For safely calling this_cpu_ptr().
> > > +	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
> > >  	krcp = this_cpu_ptr(&krc);
> > >  	if (krcp->initialized)
> > >  		raw_spin_lock(&krcp->lock);
> > >  
> > > -	// Queue the object but don't yet schedule the batch.
> > > +	/* Queue the object but don't yet schedule the batch. */
> > >  	if (debug_rcu_head_queue(head)) {
> > > -		// Probable double kfree_rcu(), just leak.
> > > +		/* Probable double kfree_rcu(), just leak. */
> > >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> > >  			  __func__, head);
> > >  		goto unlock_return;
> > > @@ -3176,7 +3176,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  
> > >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> > >  
> > > -	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> > > +	/* Set timer to drain after KFREE_DRAIN_JIFFIES. */
> > >  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> > >  	    !krcp->monitor_todo) {
> > >  		krcp->monitor_todo = true;
> > > @@ -3722,7 +3722,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  
> > >  	rcutree_affinity_setting(cpu, cpu);
> > >  
> > > -	// nohz_full CPUs need the tick for stop-machine to work quickly
> > > +	/* nohz_full CPUs need the tick for stop-machine to work quickly */
> > >  	tick_dep_set(TICK_DEP_BIT_RCU);
> > >  	return 0;
> > >  }
> > > -- 
> > > 2.20.1
> > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 22/24] rcu/tiny: support reclaim for head-less object
  2020-05-01 23:06   ` Paul E. McKenney
@ 2020-05-04  0:27     ` Joel Fernandes
  2020-05-04 12:45       ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 04:06:38PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:59:01PM +0200, Uladzislau Rezki (Sony) wrote:
> > Make a kvfree_call_rcu() function to support head-less
> > freeing. Same as for tree-RCU, for such purpose we store
> > pointers in array. SLAB and vmalloc ptrs. are mixed and
> > coexist together.
> > 
> > Under high memory pressure it can be that maintaining of
> > arrays becomes impossible. Objects with an rcu_head are
> > released via call_rcu(). When it comes to the head-less
> > variant, the kvfree() call is directly inlined, i.e. we
> > do the same as for tree-RCU:
> >     a) wait until a grace period has elapsed;
> >     b) direct inlining of the kvfree() call.
> > 
> > Thus the current context has to follow might_sleep()
> > annotation. Also please note that for tiny-RCU any
> > call of synchronize_rcu() is actually a quiescent
> > state, therefore (a) does nothing.
> 
> Please, please, please just do synchronize_rcu() followed by kvfree()
> for single-argument kfree_rcu() and friends in Tiny RCU.
> 
> Way simpler and probably way faster as well.  And given that Tiny RCU
> runs only on uniprocessor systems, the complexity probably is buying
> you very little, if anything.

Agreed.
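
Concretely, something like this (untested) sketch for tiny RCU, keeping the
kvfree_call_rcu() prototype from the patch:

<snip>
/* Untested sketch: tiny-RCU kvfree_call_rcu() without any batching. */
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (head) {
		/* Two-argument form: an rcu_head is embedded, use call_rcu(). */
		call_rcu(head, func);
		return;
	}

	/* Single-argument form: "func" is really the pointer to free. */
	might_sleep();
	synchronize_rcu();
	kvfree((void *) func);
}
<snip>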

thanks,

 - Joel

> 
> 							Thanx, Paul
> 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  kernel/rcu/tiny.c | 157 +++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 156 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index 508c82faa45c..b1c31a935db9 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -40,6 +40,29 @@ static struct rcu_ctrlblk rcu_ctrlblk = {
> >  	.curtail	= &rcu_ctrlblk.rcucblist,
> >  };
> >  
> > +/* Can be common with tree-RCU. */
> > +#define KVFREE_DRAIN_JIFFIES (HZ / 50)
> > +
> > +/* Can be common with tree-RCU. */
> > +struct kvfree_rcu_bulk_data {
> > +	unsigned long nr_records;
> > +	struct kvfree_rcu_bulk_data *next;
> > +	void *records[];
> > +};
> > +
> > +/* Can be common with tree-RCU. */
> > +#define KVFREE_BULK_MAX_ENTR \
> > +	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> > +
> > +static struct kvfree_rcu_bulk_data *kvhead;
> > +static struct kvfree_rcu_bulk_data *kvhead_free;
> > +static struct kvfree_rcu_bulk_data *kvcache;
> > +
> > +static DEFINE_STATIC_KEY_FALSE(rcu_init_done);
> > +static struct delayed_work monitor_work;
> > +static struct rcu_work rcu_work;
> > +static bool monitor_todo;
> > +
> >  void rcu_barrier(void)
> >  {
> >  	wait_rcu_gp(call_rcu);
> > @@ -177,9 +200,137 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu);
> >  
> > +static inline bool
> > +kvfree_call_rcu_add_ptr_to_bulk(void *ptr)
> > +{
> > +	struct kvfree_rcu_bulk_data *bnode;
> > +
> > +	if (!kvhead || kvhead->nr_records == KVFREE_BULK_MAX_ENTR) {
> > +		bnode = xchg(&kvcache, NULL);
> > +		if (!bnode)
> > +			bnode = (struct kvfree_rcu_bulk_data *)
> > +				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > +
> > +		if (unlikely(!bnode))
> > +			return false;
> > +
> > +		/* Initialize the new block. */
> > +		bnode->nr_records = 0;
> > +		bnode->next = kvhead;
> > +
> > +		/* Attach it to the bvhead. */
> > +		kvhead = bnode;
> > +	}
> > +
> > +	/* Done. */
> > +	kvhead->records[kvhead->nr_records++] = ptr;
> > +	return true;
> > +}
> > +
> > +static void
> > +kvfree_rcu_work(struct work_struct *work)
> > +{
> > +	struct kvfree_rcu_bulk_data *kvhead_tofree, *next;
> > +	unsigned long flags;
> > +	int i;
> > +
> > +	local_irq_save(flags);
> > +	kvhead_tofree = kvhead_free;
> > +	kvhead_free = NULL;
> > +	local_irq_restore(flags);
> > +
> > +	/* Reclaim process. */
> > +	for (; kvhead_tofree; kvhead_tofree = next) {
> > +		next = kvhead_tofree->next;
> > +
> > +		for (i = 0; i < kvhead_tofree->nr_records; i++) {
> > +			debug_rcu_head_unqueue((struct rcu_head *)
> > +				kvhead_tofree->records[i]);
> > +			kvfree(kvhead_tofree->records[i]);
> > +		}
> > +
> > +		if (cmpxchg(&kvcache, NULL, kvhead_tofree))
> > +			free_page((unsigned long) kvhead_tofree);
> > +	}
> > +}
> > +
> > +static inline bool
> > +queue_kvfree_rcu_work(void)
> > +{
> > +	/* Check if the free channel is available. */
> > +	if (kvhead_free)
> > +		return false;
> > +
> > +	kvhead_free = kvhead;
> > +	kvhead = NULL;
> > +
> > +	/*
> > +	 * Queue the job for memory reclaim after GP.
> > +	 */
> > +	queue_rcu_work(system_wq, &rcu_work);
> > +	return true;
> > +}
> > +
> > +static void kvfree_rcu_monitor(struct work_struct *work)
> > +{
> > +	unsigned long flags;
> > +	bool queued;
> > +
> > +	local_irq_save(flags);
> > +	queued = queue_kvfree_rcu_work();
> > +	if (queued)
> > +		/* Success. */
> > +		monitor_todo = false;
> > +	local_irq_restore(flags);
> > +
> > +	/*
> > +	 * If previous RCU reclaim process is still in progress,
> > +	 * schedule the work one more time to try again later.
> > +	 */
> > +	if (monitor_todo)
> > +		schedule_delayed_work(&monitor_work,
> > +			KVFREE_DRAIN_JIFFIES);
> > +}
> > +
> >  void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  {
> > -	call_rcu(head, func);
> > +	unsigned long flags;
> > +	bool success;
> > +	void *ptr;
> > +
> > +	if (head) {
> > +		ptr = (void *) head - (unsigned long) func;
> > +	} else {
> > +		might_sleep();
> > +		ptr = (void *) func;
> > +	}
> > +
> > +	if (debug_rcu_head_queue(ptr)) {
> > +		/* Probable double free, just leak. */
> > +		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> > +			  __func__, head);
> > +		return;
> > +	}
> > +
> > +	local_irq_save(flags);
> > +	success = kvfree_call_rcu_add_ptr_to_bulk(ptr);
> > +	if (static_branch_likely(&rcu_init_done)) {
> > +		if (success && !monitor_todo) {
> > +			monitor_todo = true;
> > +			schedule_delayed_work(&monitor_work,
> > +				KVFREE_DRAIN_JIFFIES);
> > +		}
> > +	}
> > +	local_irq_restore(flags);
> > +
> > +	if (!success) {
> > +		if (!head) {
> > +			synchronize_rcu();
> > +			kvfree(ptr);
> > +		} else {
> > +			call_rcu(head, func);
> > +		}
> > +	}
> >  }
> >  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> >  
> > @@ -188,4 +339,8 @@ void __init rcu_init(void)
> >  	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
> >  	rcu_early_boot_tests();
> >  	srcu_init();
> > +
> > +	INIT_DELAYED_WORK(&monitor_work, kvfree_rcu_monitor);
> > +	INIT_RCU_WORK(&rcu_work, kvfree_rcu_work);
> > +	static_branch_enable(&rcu_init_done);
> >  }
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04  0:12     ` Joel Fernandes
@ 2020-05-04  0:28       ` Paul E. McKenney
  2020-05-04  0:32         ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  0:28 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 08:12:58PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 03:39:09PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:58:58PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Update the kvfree_call_rcu() with head-less support, it
> > > means an object without any rcu_head structure can be
> > > reclaimed after GP.
> > > 
> > > To store pointers there are two chain-arrays maintained
> > > one for SLAB and another one is for vmalloc. Both types
> > > of objects(head-less variant and regular one) are placed
> > > there based on the type.
> > > 
> > > It can be that maintaining of arrays becomes impossible
> > > due to high memory pressure. For such reason there is an
> > > emergency path. In that case objects with rcu_head inside
> > > are just queued building one way list. Later on that list
> > > is drained.
> > > 
> > > As for head-less variant. Such objects do not have any
> > > rcu_head helper inside. Thus it is dynamically attached.
> > > As a result an object consists of back-pointer and regular
> > > rcu_head. It implies that emergency path can detect such
> > > object type, therefore they are tagged. So a back-pointer
> > > could be freed as well as dynamically attached wrapper.
> > > 
> > > Even though such approach requires dynamic memory it needs
> > > only sizeof(unsigned long *) + sizeof(struct rcu_head) bytes,
> > > thus SLAB is used to obtain it. Finally if attaching of the
> > > rcu_head and queuing get failed, the current context has
> > > to follow might_sleep() annotation, thus below steps could
> > > be applied:
> > >    a) wait until a grace period has elapsed;
> > >    b) direct inlining of the kvfree() call.
> > > 
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 98 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 51726e4c3b4d..501cac02146d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
> > >  	 */
> > >  	for (; head; head = next) {
> > >  		unsigned long offset = (unsigned long)head->func;
> > > -		void *ptr = (void *)head - offset;
> > > +		bool headless;
> > > +		void *ptr;
> > >  
> > >  		next = head->next;
> > > +
> > > +		/* We tag the headless object, if so adjust offset. */
> > > +		headless = (((unsigned long) head - offset) & BIT(0));
> > > +		if (headless)
> > > +			offset -= 1;
> > > +
> > > +		ptr = (void *) head - offset;
> > > +
> > >  		debug_rcu_head_unqueue((struct rcu_head *)ptr);
> > >  		rcu_lock_acquire(&rcu_callback_map);
> > >  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
> > >  
> > > -		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> > > +		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> > > +			/*
> > > +			 * If headless free the back-pointer first.
> > > +			 */
> > > +			if (headless)
> > > +				kvfree((void *) *((unsigned long *) ptr));
> > > +
> > >  			kvfree(ptr);
> > > +		}
> > >  
> > >  		rcu_lock_release(&rcu_callback_map);
> > >  		cond_resched_tasks_rcu_qs();
> > > @@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > >  				return false;
> > >  
> > > +			/*
> > > +			 * TODO: For one argument of kvfree_rcu() we can
> > > +			 * drop the lock and get the page in sleepable
> > > +			 * context. That would allow to maintain an array
> > > +			 * for the CONFIG_PREEMPT_RT as well. Thus we could
> > > +			 * get rid of dynamic rcu_head attaching code.
> > > +			 */
> > >  			bnode = (struct kvfree_rcu_bulk_data *)
> > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > >  		}
> > > @@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > >  	return true;
> > >  }
> > >  
> > > +static inline struct rcu_head *
> > > +attach_rcu_head_to_object(void *obj)
> > > +{
> > > +	unsigned long *ptr;
> > > +
> > > +	ptr = kmalloc(sizeof(unsigned long *) +
> > > +			sizeof(struct rcu_head), GFP_NOWAIT |
> > > +				__GFP_RECLAIM |	/* can do direct reclaim. */
> > > +				__GFP_NORETRY |	/* only lightweight one.  */
> > > +				__GFP_NOWARN);	/* no failure reports. */
> > 
> > Again, let's please not do this single-pointer-sized allocation.  If
> > a full page is not available and this is a single-argument kfree_rcu(),
> > just call synchronize_rcu() and then free the object directly.
> 
> With the additional caching, lack of a full page should not be very likely. I
> agree we can avoid doing any allocation and go straight to
> synchronize_rcu().

That sounds good to me!

> > It should not be -that- hard to adjust locking for CONFIG_PREEMPT_RT!
> > For example, have some kind of reservation protocol so that a task
> > that drops the lock can retry the page allocation and be sure of having
> > a place to put it.  This might entail making CONFIG_PREEMPT_RT reserve
> > more pages per CPU.  Or maybe that would not be necessary.
> 
> If we are not doing single-pointer allocation, then that would also eliminate
> entering the low-level page allocator for single-pointer allocations.
> 
> Or did you mean entry into the allocator for the full-page allocations
> related to the pointer array for PREEMPT_RT? Even if we skip entry into the
> allocator for those, we will still have the additional caching, which further
> reduces the chances of failing to get a full page. In the event of such a
> failure, we can simply queue the rcu_head.
> 
> Thoughts?

I was just trying to guess why you kept the single-pointer allocation.
It looks like I guessed wrong.  ;-)

If, as you say above, you make it go straight to synchronize_rcu()
upon full-page allocation failure, that would be good!

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> > 
> > 							Thanx, Paul
> > 
> > > +	if (!ptr)
> > > +		return NULL;
> > > +
> > > +	ptr[0] = (unsigned long) obj;
> > > +	return ((struct rcu_head *) ++ptr);
> > > +}
> > > +
> > >  /*
> > >   * Queue a request for lazy invocation of appropriate free routine after a
> > >   * grace period. Please note there are three paths are maintained, two are the
> > > @@ -3260,16 +3300,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  {
> > >  	unsigned long flags;
> > >  	struct kfree_rcu_cpu *krcp;
> > > +	bool success;
> > >  	void *ptr;
> > >  
> > > +	if (head) {
> > > +		ptr = (void *) head - (unsigned long) func;
> > > +	} else {
> > > +		/*
> > > +		 * Please note there is a limitation for the head-less
> > > +		 * variant, that is why there is a clear rule for such
> > > +		 * objects:
> > > +		 *
> > > +		 * it can be used from might_sleep() context only. For
> > > +		 * other places please embed an rcu_head to your data.
> > > +		 */
> > > +		might_sleep();
> > > +		ptr = (unsigned long *) func;
> > > +	}
> > > +
> > >  	krcp = krc_this_cpu_lock(&flags);
> > > -	ptr = (void *)head - (unsigned long)func;
> > >  
> > >  	/* Queue the object but don't yet schedule the batch. */
> > >  	if (debug_rcu_head_queue(ptr)) {
> > >  		/* Probable double kfree_rcu(), just leak. */
> > >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> > >  			  __func__, head);
> > > +
> > > +		/* Mark as success and leave. */
> > > +		success = true;
> > >  		goto unlock_return;
> > >  	}
> > >  
> > > @@ -3277,10 +3335,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  	 * Under high memory pressure GFP_NOWAIT can fail,
> > >  	 * in that case the emergency path is maintained.
> > >  	 */
> > > -	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > > +	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> > > +	if (!success) {
> > > +		if (head == NULL) {
> > > +			/*
> > > +			 * Headless(one argument kvfree_rcu()) can sleep.
> > > +			 * Drop the lock and tack it back. So it can do
> > > +			 * direct lightweight reclaim.
> > > +			 */
> > > +			krc_this_cpu_unlock(krcp, flags);
> > > +			head = attach_rcu_head_to_object(ptr);
> > > +			krcp = krc_this_cpu_lock(&flags);
> > > +
> > > +			if (head == NULL)
> > > +				goto unlock_return;
> > > +
> > > +			/*
> > > +			 * Tag the headless object. Such objects have a
> > > +			 * back-pointer to the original allocated memory,
> > > +			 * that has to be freed as well as dynamically
> > > +			 * attached wrapper/head.
> > > +			 */
> > > +			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
> > > +		}
> > > +
> > >  		head->func = func;
> > >  		head->next = krcp->head;
> > >  		krcp->head = head;
> > > +		success = true;
> > >  	}
> > >  
> > >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> > > @@ -3294,6 +3376,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  
> > >  unlock_return:
> > >  	krc_this_cpu_unlock(krcp, flags);
> > > +
> > > +	/*
> > > +	 * High memory pressure, so inline kvfree() after
> > > +	 * synchronize_rcu(). We can do it from might_sleep()
> > > +	 * context only, so the current CPU can pass the QS
> > > +	 * state.
> > > +	 */
> > > +	if (!success) {
> > > +		debug_rcu_head_unqueue(ptr);
> > > +		synchronize_rcu();
> > > +		kvfree(ptr);
> > > +	}
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> > >  
> > > -- 
> > > 2.20.1
> > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-05-04  0:24     ` Joel Fernandes
@ 2020-05-04  0:29       ` Paul E. McKenney
  2020-05-04  0:31         ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  0:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 04:00:52PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:58:59PM +0200, Uladzislau Rezki (Sony) wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > Handle cases where the object being kvfree_rcu()'d is not aligned on
> > > a 2-byte boundary.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 501cac02146d..649bad7ad0f0 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
> > >  #define KVFREE_BULK_MAX_ENTR \
> > >  	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> > >  
> > > +/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
> > > +#define RCU_HEADLESS_KFREE BIT(31)
> > 
> > Did I miss the check for freeing something larger than 2GB?  Or is this
> > impossible, even on systems with many terabytes of physical memory?
> > Even if it is currently impossible, what prevents it from suddenly
> > becoming all too possible at some random point in the future?  If you
> > think that this will never happen, please keep in mind that the first
> > time I heard "640K ought to be enough for anybody", it sounded eminently
> > reasonable to me.
> > 
> > Besides...
> > 
> > Isn't the offset in question the offset of an rcu_head struct within
> > the enclosing structure? If so, why not keep the current requirement
> > that this be at least 16-bit aligned, especially given that some work
> > is required to make that alignment less than pointer sized?  Then you
> > can continue using bit 0.
> > 
> > This alignment requirement is included in the RCU requirements
> > documentation and is enforced within the __call_rcu() function.
> > 
> > So let's leave this at bit 0.
> 
> This patch is needed only if we are growing the fake rcu_head. Since you
> mentioned in a previous patch in this series that you don't want to do that,
> and just rely on availability of the array of pointers or synchronize_rcu(),
> we can drop this patch. If we are not dropping that earlier patch, let us
> discuss more.

Dropping it sounds very good to me!

							Thanx, Paul
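
For reference, the bit-0 scheme relies on the rcu_head offset within the
enclosing structure being at least 2-byte aligned, which leaves the low
bit permanently free to carry a flag. A stand-alone illustration of that
encoding (helper names are made up for the example, this is not the
patch's code):

<snip>
/* The offset is always even, so bit 0 can serve as a "head-less" flag. */
static inline unsigned long tag_headless(unsigned long offset)
{
	return offset | 1UL;
}

static inline bool is_headless(unsigned long offset)
{
	return offset & 1UL;
}

static inline unsigned long real_offset(unsigned long offset)
{
	return offset & ~1UL;
}
<snip>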

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-05-04  0:29       ` Paul E. McKenney
@ 2020-05-04  0:31         ` Joel Fernandes
  2020-05-04 12:56           ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:31 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 05:29:47PM -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 04:00:52PM -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 28, 2020 at 10:58:59PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > 
> > > > Handle cases where the object being kvfree_rcu()'d is not aligned on
> > > > a 2-byte boundary.
> > > > 
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 9 ++++++---
> > > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 501cac02146d..649bad7ad0f0 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
> > > >  #define KVFREE_BULK_MAX_ENTR \
> > > >  	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> > > >  
> > > > +/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
> > > > +#define RCU_HEADLESS_KFREE BIT(31)
> > > 
> > > Did I miss the check for freeing something larger than 2GB?  Or is this
> > > impossible, even on systems with many terabytes of physical memory?
> > > Even if it is currently impossible, what prevents it from suddenly
> > > becoming all too possible at some random point in the future?  If you
> > > think that this will never happen, please keep in mind that the first
> > > time I heard "640K ought to be enough for anybody", it sounded eminently
> > > reasonable to me.
> > > 
> > > Besides...
> > > 
> > > Isn't the offset in question the offset of an rcu_head struct within
> > > the enclosing structure? If so, why not keep the current requirement
> > > that this be at least 16-bit aligned, especially given that some work
> > > is required to make that alignment less than pointer sized?  Then you
> > > can continue using bit 0.
> > > 
> > > This alignment requirement is included in the RCU requirements
> > > documentation and is enforced within the __call_rcu() function.
> > > 
> > > So let's leave this at bit 0.
> > 
> > This patch is needed only if we are growing the fake rcu_head. Since you
> > mentioned in a previous patch in this series that you don't want to do that,
> > and just rely on availability of the array of pointers or synchronize_rcu(),
> > we can drop this patch. If we are not dropping that earlier patch, let us
> > discuss more.
> 
> Dropping it sounds very good to me!

Cool ;-) Thanks,

 - Joel


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04  0:28       ` Paul E. McKenney
@ 2020-05-04  0:32         ` Joel Fernandes
  2020-05-04 14:21           ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 05:28:55PM -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 08:12:58PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 03:39:09PM -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 28, 2020 at 10:58:58PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Update the kvfree_call_rcu() with head-less support, it
> > > > means an object without any rcu_head structure can be
> > > > reclaimed after GP.
> > > > 
> > > > To store pointers there are two chain-arrays maintained
> > > > one for SLAB and another one is for vmalloc. Both types
> > > > of objects(head-less variant and regular one) are placed
> > > > there based on the type.
> > > > 
> > > > It can be that maintaining of arrays becomes impossible
> > > > due to high memory pressure. For such reason there is an
> > > > emergency path. In that case objects with rcu_head inside
> > > > are just queued building one way list. Later on that list
> > > > is drained.
> > > > 
> > > > As for head-less variant. Such objects do not have any
> > > > rcu_head helper inside. Thus it is dynamically attached.
> > > > As a result an object consists of back-pointer and regular
> > > > rcu_head. It implies that emergency path can detect such
> > > > object type, therefore they are tagged. So a back-pointer
> > > > could be freed as well as dynamically attached wrapper.
> > > > 
> > > > Even though such approach requires dynamic memory it needs
> > > > only sizeof(unsigned long *) + sizeof(struct rcu_head) bytes,
> > > > thus SLAB is used to obtain it. Finally if attaching of the
> > > > rcu_head and queuing get failed, the current context has
> > > > to follow might_sleep() annotation, thus below steps could
> > > > be applied:
> > > >    a) wait until a grace period has elapsed;
> > > >    b) direct inlining of the kvfree() call.
> > > > 
> > > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 98 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 51726e4c3b4d..501cac02146d 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
> > > >  	 */
> > > >  	for (; head; head = next) {
> > > >  		unsigned long offset = (unsigned long)head->func;
> > > > -		void *ptr = (void *)head - offset;
> > > > +		bool headless;
> > > > +		void *ptr;
> > > >  
> > > >  		next = head->next;
> > > > +
> > > > +		/* We tag the headless object, if so adjust offset. */
> > > > +		headless = (((unsigned long) head - offset) & BIT(0));
> > > > +		if (headless)
> > > > +			offset -= 1;
> > > > +
> > > > +		ptr = (void *) head - offset;
> > > > +
> > > >  		debug_rcu_head_unqueue((struct rcu_head *)ptr);
> > > >  		rcu_lock_acquire(&rcu_callback_map);
> > > >  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
> > > >  
> > > > -		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> > > > +		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> > > > +			/*
> > > > +			 * If headless free the back-pointer first.
> > > > +			 */
> > > > +			if (headless)
> > > > +				kvfree((void *) *((unsigned long *) ptr));
> > > > +
> > > >  			kvfree(ptr);
> > > > +		}
> > > >  
> > > >  		rcu_lock_release(&rcu_callback_map);
> > > >  		cond_resched_tasks_rcu_qs();
> > > > @@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > >  				return false;
> > > >  
> > > > +			/*
> > > > +			 * TODO: For one argument of kvfree_rcu() we can
> > > > +			 * drop the lock and get the page in sleepable
> > > > +			 * context. That would allow to maintain an array
> > > > +			 * for the CONFIG_PREEMPT_RT as well. Thus we could
> > > > +			 * get rid of dynamic rcu_head attaching code.
> > > > +			 */
> > > >  			bnode = (struct kvfree_rcu_bulk_data *)
> > > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > >  		}
> > > > @@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > >  	return true;
> > > >  }
> > > >  
> > > > +static inline struct rcu_head *
> > > > +attach_rcu_head_to_object(void *obj)
> > > > +{
> > > > +	unsigned long *ptr;
> > > > +
> > > > +	ptr = kmalloc(sizeof(unsigned long *) +
> > > > +			sizeof(struct rcu_head), GFP_NOWAIT |
> > > > +				__GFP_RECLAIM |	/* can do direct reclaim. */
> > > > +				__GFP_NORETRY |	/* only lightweight one.  */
> > > > +				__GFP_NOWARN);	/* no failure reports. */
> > > 
> > > Again, let's please not do this single-pointer-sized allocation.  If
> > > a full page is not available and this is a single-argument kfree_rcu(),
> > > just call synchronize_rcu() and then free the object directly.
> > 
> > With the additional caching, lack of full page should not be very likely. I
> > agree we can avoid doing any allocation and just go straight to
> > synchronize_rcu().
> 
> That sounds good to me!
> 
> > > It should not be -that- hard to adjust locking for CONFIG_PREEMPT_RT!
> > > For example, have some kind of reservation protocol so that a task
> > > that drops the lock can retry the page allocation and be sure of having
> > > a place to put it.  This might entail making CONFIG_PREEMPT_RT reserve
> > > more pages per CPU.  Or maybe that would not be necessary.
> > 
> > If we are not doing single-pointer allocation, then that would also eliminate
> > entering the low-level page allocator for single-pointer allocations.
> > 
> > Or did you mean entry into the allocator for the full-page allocations
> > related to the pointer array for PREEMPT_RT? Even if we skip entry into the
> > allocator for those, we will still have additional caching which further
> > reduces chances of getting a full page. In the event of such failure, we can
> > simply queue the rcu_head.
> > 
> > Thoughts?
> 
> I was just trying to guess why you kept the single-pointer allocation.
> It looks like I guessed wrong.  ;-)
> 
> If, as you say above, you make it go straight to synchronize_rcu()
> upon full-page allocation failure, that would be good!

Paul, sounds good. Vlad, are you also Ok with that?

thanks,

 - Joel
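
For readers following the quoted patch, the dynamically attached
"wrapper" it introduces (and which the plan above would drop in favor of
going straight to synchronize_rcu()) is conceptually just the pair below,
obtained with a single kmalloc(); the struct name is purely illustrative,
the real code open-codes this layout:

<snip>
struct headless_wrapper {		/* illustrative name only         */
	unsigned long backptr;		/* back-pointer to the original   */
					/* kvmalloc()'ed object           */
	struct rcu_head rh;		/* what is actually queued/waited */
};
<snip>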


> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > > 
> > > 							Thanx, Paul
> > > 
> > > > +	if (!ptr)
> > > > +		return NULL;
> > > > +
> > > > +	ptr[0] = (unsigned long) obj;
> > > > +	return ((struct rcu_head *) ++ptr);
> > > > +}
> > > > +
> > > >  /*
> > > >   * Queue a request for lazy invocation of appropriate free routine after a
> > > >   * grace period. Please note there are three paths maintained, two are the
> > > > @@ -3260,16 +3300,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  {
> > > >  	unsigned long flags;
> > > >  	struct kfree_rcu_cpu *krcp;
> > > > +	bool success;
> > > >  	void *ptr;
> > > >  
> > > > +	if (head) {
> > > > +		ptr = (void *) head - (unsigned long) func;
> > > > +	} else {
> > > > +		/*
> > > > +		 * Please note there is a limitation for the head-less
> > > > +		 * variant, that is why there is a clear rule for such
> > > > +		 * objects:
> > > > +		 *
> > > > +		 * it can be used from might_sleep() context only. For
> > > > +		 * other places please embed an rcu_head to your data.
> > > > +		 */
> > > > +		might_sleep();
> > > > +		ptr = (unsigned long *) func;
> > > > +	}
> > > > +
> > > >  	krcp = krc_this_cpu_lock(&flags);
> > > > -	ptr = (void *)head - (unsigned long)func;
> > > >  
> > > >  	/* Queue the object but don't yet schedule the batch. */
> > > >  	if (debug_rcu_head_queue(ptr)) {
> > > >  		/* Probable double kfree_rcu(), just leak. */
> > > >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> > > >  			  __func__, head);
> > > > +
> > > > +		/* Mark as success and leave. */
> > > > +		success = true;
> > > >  		goto unlock_return;
> > > >  	}
> > > >  
> > > > @@ -3277,10 +3335,34 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  	 * Under high memory pressure GFP_NOWAIT can fail,
> > > >  	 * in that case the emergency path is maintained.
> > > >  	 */
> > > > -	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > > > +	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
> > > > +	if (!success) {
> > > > +		if (head == NULL) {
> > > > +			/*
> > > > +			 * Headless (one-argument kvfree_rcu()) can sleep.
> > > > +			 * Drop the lock and take it back, so it can do
> > > > +			 * direct lightweight reclaim.
> > > > +			 */
> > > > +			krc_this_cpu_unlock(krcp, flags);
> > > > +			head = attach_rcu_head_to_object(ptr);
> > > > +			krcp = krc_this_cpu_lock(&flags);
> > > > +
> > > > +			if (head == NULL)
> > > > +				goto unlock_return;
> > > > +
> > > > +			/*
> > > > +			 * Tag the headless object. Such objects have a
> > > > +			 * back-pointer to the original allocated memory,
> > > > +			 * that has to be freed as well as dynamically
> > > > +			 * attached wrapper/head.
> > > > +			 */
> > > > +			func = (rcu_callback_t) (sizeof(unsigned long *) + 1);
> > > > +		}
> > > > +
> > > >  		head->func = func;
> > > >  		head->next = krcp->head;
> > > >  		krcp->head = head;
> > > > +		success = true;
> > > >  	}
> > > >  
> > > >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> > > > @@ -3294,6 +3376,18 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  
> > > >  unlock_return:
> > > >  	krc_this_cpu_unlock(krcp, flags);
> > > > +
> > > > +	/*
> > > > +	 * High memory pressure, so inline kvfree() after
> > > > +	 * synchronize_rcu(). We can do it from might_sleep()
> > > > +	 * context only, so the current CPU can pass the QS
> > > > +	 * state.
> > > > +	 */
> > > > +	if (!success) {
> > > > +		debug_rcu_head_unqueue(ptr);
> > > > +		synchronize_rcu();
> > > > +		kvfree(ptr);
> > > > +	}
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> > > >  
> > > > -- 
> > > > 2.20.1
> > > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-04  0:23         ` Paul E. McKenney
@ 2020-05-04  0:34           ` Joe Perches
  2020-05-04  0:41           ` Joel Fernandes
  1 sibling, 0 replies; 78+ messages in thread
From: Joe Perches @ 2020-05-04  0:34 UTC (permalink / raw)
  To: paulmck, Joel Fernandes, Jonathan Corbet
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, 2020-05-03 at 17:23 -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 07:44:00PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 01:52:46PM -0700, Joe Perches wrote:
[]
> > > Perhaps a change to coding-style.rst
> > > ---
> > > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
[]
> > > @@ -565,6 +565,11 @@ comments is a little different.
> > >  	 * but there is no initial almost-blank line.
> > >  	 */
> > >  
> > > +.. code-block:: c
> > > +
> > > +	// Single line and inline comments may also use the c99 // style
> > > +	// Block comments as well
> > > +
> > >  It's also important to comment data, whether they are basic types or derived
> > >  types.  To this end, use just one data declaration per line (no commas for
> > >  multiple data declarations).  This leaves you room for a small comment on each
> > 
> > Yeah that's fine with me. This patch just tries to keep it consistent. I am
> > Ok with either style.
> 
> My approach has been gradual change.  Big-bang changes of this sort
> cause quite a bit of trouble.  So I use "//" in new code and (sometimes)
> convert nearby ones when making a change.

I think that's good too.

Mixing styles in the same compilation unit is not
generally the right thing to do.

But right now, C99 comments are not specified as
allowed in coding-style.rst, so it's likely appropriate
to add something like this there.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-04  0:26       ` Paul E. McKenney
@ 2020-05-04  0:39         ` Joel Fernandes
  0 siblings, 0 replies; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 05:26:56PM -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 07:52:13PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 12:05:55PM -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > 
> > > > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > > > with majority of commenting styles.
> > > > 
> > > > Reviewed-by: Uladzislau Rezki <urezki@gmail.com>
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > 
> > > Hmmm...
> > > 
> > > Exactly why is three additional characters per line preferable?  Or in
> > > the case of block comments, either one or two additional lines, depending
> > > on /* */ style?
> > 
> > I prefer to keep the code consistent and then bulk convert it later. It's a
> > bit ugly to read when it's mixed up with "//" and "/* */" right now. We can
> > convert it to // all at once later but until then it'll be good to keep it
> > consistent in this file IMO. When I checked the kfree_rcu() code, it had more
> > "/* */" than not, so this small change is less churn for now.
> 
> Please just drop this patch along with the other "//"-to-"/* */"
> regressions.

Right now in your rcu/dev branch (without applying this series), in
kfree_rcu_drain_unlock() and the functions before and after it, it is
inconsistent.

Also in kfree_call_rcu(), it is:

	// Queue the object but don't yet schedule the batch.
	if (debug_rcu_head_queue(head)) {
		// Probable double kfree_rcu(), just leak.
		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
			  __func__, head);
		goto unlock_return;
	}

	/*
	 * Under high memory pressure GFP_NOWAIT can fail,
	 * in that case the emergency path is maintained.
	 */

> If you want to convert more comments to "//" within the confines of the
> kfree_rcu() code, I am probably OK with that.  But again, a big-bang
> change of this sort often causes problems due to lots of potential
> rebase/merge conflicts.

Ok. Since this series touched kfree-related RCU code, converting all of the
kfree-related RCU code to "//" is Ok with me. Just wanted to keep it
consistent :)

thanks,

 - Joel


> 
> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > > 
> > > I am (slowly) moving RCU to "//" for those reasons.  ;-)
> > > 
> > > 							Thanx, Paul
> > > 
> > > > ---
> > > >  kernel/rcu/tree.c | 16 ++++++++--------
> > > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index cd61649e1b00..1487af8e11e8 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3043,15 +3043,15 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > > >  static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
> > > >  					  unsigned long flags)
> > > >  {
> > > > -	// Attempt to start a new batch.
> > > > +	/* Attempt to start a new batch. */
> > > >  	krcp->monitor_todo = false;
> > > >  	if (queue_kfree_rcu_work(krcp)) {
> > > > -		// Success! Our job is done here.
> > > > +		/* Success! Our job is done here. */
> > > >  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > >  		return;
> > > >  	}
> > > >  
> > > > -	// Previous RCU batch still in progress, try again later.
> > > > +	/* Previous RCU batch still in progress, try again later. */
> > > >  	krcp->monitor_todo = true;
> > > >  	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> > > >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > @@ -3151,14 +3151,14 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  	unsigned long flags;
> > > >  	struct kfree_rcu_cpu *krcp;
> > > >  
> > > > -	local_irq_save(flags);	// For safely calling this_cpu_ptr().
> > > > +	local_irq_save(flags);	/* For safely calling this_cpu_ptr(). */
> > > >  	krcp = this_cpu_ptr(&krc);
> > > >  	if (krcp->initialized)
> > > >  		raw_spin_lock(&krcp->lock);
> > > >  
> > > > -	// Queue the object but don't yet schedule the batch.
> > > > +	/* Queue the object but don't yet schedule the batch. */
> > > >  	if (debug_rcu_head_queue(head)) {
> > > > -		// Probable double kfree_rcu(), just leak.
> > > > +		/* Probable double kfree_rcu(), just leak. */
> > > >  		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> > > >  			  __func__, head);
> > > >  		goto unlock_return;
> > > > @@ -3176,7 +3176,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  
> > > >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> > > >  
> > > > -	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> > > > +	/* Set timer to drain after KFREE_DRAIN_JIFFIES. */
> > > >  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> > > >  	    !krcp->monitor_todo) {
> > > >  		krcp->monitor_todo = true;
> > > > @@ -3722,7 +3722,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >  
> > > >  	rcutree_affinity_setting(cpu, cpu);
> > > >  
> > > > -	// nohz_full CPUs need the tick for stop-machine to work quickly
> > > > +	/* nohz_full CPUs need the tick for stop-machine to work quickly */
> > > >  	tick_dep_set(TICK_DEP_BIT_RCU);
> > > >  	return 0;
> > > >  }
> > > > -- 
> > > > 2.20.1
> > > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/24] rcu/tree: Use consistent style for comments
  2020-05-04  0:23         ` Paul E. McKenney
  2020-05-04  0:34           ` Joe Perches
@ 2020-05-04  0:41           ` Joel Fernandes
  1 sibling, 0 replies; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joe Perches, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 05:23:09PM -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 07:44:00PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 01:52:46PM -0700, Joe Perches wrote:
> > > On Fri, 2020-05-01 at 12:05 -0700, Paul E. McKenney wrote:
> > > > On Tue, Apr 28, 2020 at 10:58:42PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > Simple clean up of comments in kfree_rcu() code to keep it consistent
> > > > > with majority of commenting styles.
> > > []
> > > > on /* */ style?
> > > > 
> > > > I am (slowly) moving RCU to "//" for those reasons.  ;-)
> > > 
> > > I hope c99 comment styles are more commonly used soon too.
> > > checkpatch doesn't care.
> > > 
> > > Perhaps a change to coding-style.rst
> > > ---
> > >  Documentation/process/coding-style.rst | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > > index acb2f1b..fee647 100644
> > > --- a/Documentation/process/coding-style.rst
> > > +++ b/Documentation/process/coding-style.rst
> > > @@ -565,6 +565,11 @@ comments is a little different.
> > >  	 * but there is no initial almost-blank line.
> > >  	 */
> > >  
> > > +.. code-block:: c
> > > +
> > > +	// Single line and inline comments may also use the c99 // style
> > > +	// Block comments as well
> > > +
> > >  It's also important to comment data, whether they are basic types or derived
> > >  types.  To this end, use just one data declaration per line (no commas for
> > >  multiple data declarations).  This leaves you room for a small comment on each
> > 
> > Yeah that's fine with me. This patch just tries to keep it consistent. I am
> > Ok with either style.
> 
> My approach has been gradual change.  Big-bang changes of this sort
> cause quite a bit of trouble.  So I use "//" in new code and (sometimes)
> convert nearby ones when making a change.

Ok thanks for the guidance on that, will follow similar conversion strategy
as well.

thanks,

 - Joel
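
A tiny illustration of that gradual strategy, using lines from this very
file (nothing authoritative, just the convention): comments on new or
touched lines use the C99 style, untouched block comments stay as-is:

<snip>
	// New or touched lines get C99-style comments.
	krcp->monitor_todo = true;
	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);

	/* Untouched pre-existing block comments are simply left alone. */
	raw_spin_unlock_irqrestore(&krcp->lock, flags);
<snip>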


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-04  0:20       ` Paul E. McKenney
@ 2020-05-04  0:58         ` Joel Fernandes
  2020-05-04  2:20           ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04  0:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 05:20:32PM -0700, Paul E. McKenney wrote:
> On Sun, May 03, 2020 at 07:42:50PM -0400, Joel Fernandes wrote:
> > On Fri, May 01, 2020 at 02:37:53PM -0700, Paul E. McKenney wrote:
> > [...]
> > > > @@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > > >  static void kfree_rcu_work(struct work_struct *work)
> > > >  {
> > > >  	unsigned long flags;
> > > > +	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
> > > >  	struct rcu_head *head, *next;
> > > > -	struct kfree_rcu_bulk_data *bhead, *bnext;
> > > >  	struct kfree_rcu_cpu *krcp;
> > > >  	struct kfree_rcu_cpu_work *krwp;
> > > > +	int i;
> > > >  
> > > >  	krwp = container_of(to_rcu_work(work),
> > > >  			    struct kfree_rcu_cpu_work, rcu_work);
> > > >  	krcp = krwp->krcp;
> > > > +
> > > >  	raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > +	/* Channel 1. */
> > > > +	bkhead = krwp->bkvhead_free[0];
> > > > +	krwp->bkvhead_free[0] = NULL;
> > > > +
> > > > +	/* Channel 2. */
> > > > +	bvhead = krwp->bkvhead_free[1];
> > > > +	krwp->bkvhead_free[1] = NULL;
> > > > +
> > > > +	/* Channel 3. */
> > > >  	head = krwp->head_free;
> > > >  	krwp->head_free = NULL;
> > > > -	bhead = krwp->bhead_free;
> > > > -	krwp->bhead_free = NULL;
> > > >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > >  
> > > > -	/* "bhead" is now private, so traverse locklessly. */
> > > > -	for (; bhead; bhead = bnext) {
> > > > -		bnext = bhead->next;
> > > > -
> > > > -		debug_rcu_bhead_unqueue(bhead);
> > > > +	/* kmalloc()/kfree() channel. */
> > > > +	for (; bkhead; bkhead = bnext) {
> > > > +		bnext = bkhead->next;
> > > > +		debug_rcu_bhead_unqueue(bkhead);
> > > >  
> > > >  		rcu_lock_acquire(&rcu_callback_map);
> > > 
> > > Given that rcu_lock_acquire() only affects lockdep, I have to ask exactly
> > > what concurrency design you are using here...
> > 
> > I believe the rcu_callback_map usage above follows a similar pattern from old
> > code where the rcu_callback_map is acquired before doing the kfree.
> > 
> > static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
> > {
> >         rcu_callback_t f;
> >         unsigned long offset = (unsigned long)head->func;
> > 
> >         rcu_lock_acquire(&rcu_callback_map);
> >         if (__is_kfree_rcu_offset(offset)) {
> >                 trace_rcu_invoke_kfree_callback(rn, head, offset);
> >                 kfree((void *)head - offset);
> >                 rcu_lock_release(&rcu_callback_map);
> > 
> > So when kfree_rcu() was rewritten, the rcu_lock_acquire() of rcu_callback_map
> > got carried over.
> > 
> > I believe it is for detecting recursion where we possibly try to free
> > RCU-held memory while already freeing memory. Or was there another purpose of
> > the rcu_callback_map?
> 
> It looks like rcu_callback_map was added by 77a40f97030 ("rcu:
> Remove kfree_rcu() special casing and lazy-callback handling").  Which
> was less than a year ago.  ;-)

I think that's just git blame falsely looking at moved code instead of the
original code.

It was actually the following commit. I think you were trying to detect
blocking and context-switching within an RCU callback. Since kfree_rcu() does
not have RCU callback functions, maybe we can just remove it?

commit 24ef659a857c3cba40b64ea51ea4fce8d2fb7bbc
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Oct 28 09:22:24 2013 -0700

    rcu: Provide better diagnostics for blocking in RCU callback functions

    Currently blocking in an RCU callback function will result in
    "scheduling while atomic", which could be triggered for any number
    of reasons.  To aid debugging, this patch introduces a rcu_callback_map
    that is used to tie the inappropriate voluntary context switch back
    to the fact that the function is being invoked from within a callback.

thanks,

 - Joel
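
As a side note, regarding the "why not a for loop" remark in the review
quoted further below: the duplicated channel-1/channel-2 detach could
indeed collapse into something like this (an untested sketch against the
patch as posted):

<snip>
		/* Channels 1 and 2: SLAB and vmalloc pointer-array heads. */
		for (j = 0; j < ARRAY_SIZE(krcp->bkvhead); j++) {
			if (!krwp->bkvhead_free[j]) {
				krwp->bkvhead_free[j] = krcp->bkvhead[j];
				krcp->bkvhead[j] = NULL;
			}
		}
<snip>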


> 
> Hmmm...  This would be a good way to allow lockdep to tell you that you
> are running within an RCU callback on the one hand or are reclaiming
> due to kfree_rcu() on the other.  Was that the intent?  If so, a comment
> seems necessary.
> 
> 							Thanx, Paul
> 
> > thanks,
> > 
> >  - Joel
> > 
> > 
> > > >  		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > > > -			bhead->nr_records, bhead->records);
> > > > +			bkhead->nr_records, bkhead->records);
> > > > +
> > > > +		kfree_bulk(bkhead->nr_records, bkhead->records);
> > > > +		rcu_lock_release(&rcu_callback_map);
> > > > +
> > > > +		krcp = krc_this_cpu_lock(&flags);
> > > > +		if (put_cached_bnode(krcp, bkhead))
> > > > +			bkhead = NULL;
> > > > +		krc_this_cpu_unlock(krcp, flags);
> > > > +
> > > > +		if (bkhead)
> > > > +			free_page((unsigned long) bkhead);
> > > > +
> > > > +		cond_resched_tasks_rcu_qs();
> > > > +	}
> > > > +
> > > > +	/* vmalloc()/vfree() channel. */
> > > > +	for (; bvhead; bvhead = bnext) {
> > > > +		bnext = bvhead->next;
> > > > +		debug_rcu_bhead_unqueue(bvhead);
> > > >  
> > > > -		kfree_bulk(bhead->nr_records, bhead->records);
> > > > +		rcu_lock_acquire(&rcu_callback_map);
> > > 
> > > And the same here.
> > > 
> > > > +		for (i = 0; i < bvhead->nr_records; i++) {
> > > > +			trace_rcu_invoke_kfree_callback(rcu_state.name,
> > > > +				(struct rcu_head *) bvhead->records[i], 0);
> > > > +			vfree(bvhead->records[i]);
> > > > +		}
> > > >  		rcu_lock_release(&rcu_callback_map);
> > > >  
> > > >  		krcp = krc_this_cpu_lock(&flags);
> > > > -		if (put_cached_bnode(krcp, bhead))
> > > > -			bhead = NULL;
> > > > +		if (put_cached_bnode(krcp, bvhead))
> > > > +			bvhead = NULL;
> > > >  		krc_this_cpu_unlock(krcp, flags);
> > > >  
> > > > -		if (bhead)
> > > > -			free_page((unsigned long) bhead);
> > > > +		if (bvhead)
> > > > +			free_page((unsigned long) bvhead);
> > > >  
> > > >  		cond_resched_tasks_rcu_qs();
> > > >  	}
> > > > @@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
> > > >  		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
> > > >  
> > > >  		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
> > > > -			kfree(ptr);
> > > > +			kvfree(ptr);
> > > >  
> > > >  		rcu_lock_release(&rcu_callback_map);
> > > >  		cond_resched_tasks_rcu_qs();
> > > > @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > > >  		krwp = &(krcp->krw_arr[i]);
> > > >  
> > > >  		/*
> > > > -		 * Try to detach bhead or head and attach it over any
> > > > +		 * Try to detach bkvhead or head and attach it over any
> > > >  		 * available corresponding free channel. It can be that
> > > >  		 * a previous RCU batch is in progress, it means that
> > > >  		 * immediately to queue another one is not possible so
> > > >  		 * return false to tell caller to retry.
> > > >  		 */
> > > > -		if ((krcp->bhead && !krwp->bhead_free) ||
> > > > +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> > > > +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
> > > >  				(krcp->head && !krwp->head_free)) {
> > > > -			/* Channel 1. */
> > > > -			if (!krwp->bhead_free) {
> > > > -				krwp->bhead_free = krcp->bhead;
> > > > -				krcp->bhead = NULL;
> > > > +			/*
> > > > +			 * Channel 1 corresponds to SLAB ptrs.
> > > > +			 */
> > > > +			if (!krwp->bkvhead_free[0]) {
> > > > +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> > > > +				krcp->bkvhead[0] = NULL;
> > > >  			}
> > > >  
> > > > -			/* Channel 2. */
> > > > +			/*
> > > > +			 * Channel 2 corresponds to vmalloc ptrs.
> > > > +			 */
> > > > +			if (!krwp->bkvhead_free[1]) {
> > > > +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> > > > +				krcp->bkvhead[1] = NULL;
> > > > +			}
> > > 
> > > Why not a "for" loop here?  Duplicate code is most certainly not what
> > > we want, as it can cause all sorts of trouble down the road.
> > > 
> > > 							Thanx, Paul
> > > 
> > > > +			/*
> > > > +			 * Channel 3 corresponds to emergency path.
> > > > +			 */
> > > >  			if (!krwp->head_free) {
> > > >  				krwp->head_free = krcp->head;
> > > >  				krcp->head = NULL;
> > > > @@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > > >  			WRITE_ONCE(krcp->count, 0);
> > > >  
> > > >  			/*
> > > > -			 * One work is per one batch, so there are two "free channels",
> > > > -			 * "bhead_free" and "head_free" the batch can handle. It can be
> > > > -			 * that the work is in the pending state when two channels have
> > > > -			 * been detached following each other, one by one.
> > > > +			 * One work is per one batch, so there are three
> > > > +			 * "free channels", the batch can handle. It can
> > > > +			 * be that the work is in the pending state when
> > > > +			 * channels have been detached following by each
> > > > +			 * other.
> > > >  			 */
> > > >  			queue_rcu_work(system_wq, &krwp->rcu_work);
> > > >  		}
> > > >  
> > > >  		/* Repeat if any "free" corresponding channel is still busy. */
> > > > -		if (krcp->bhead || krcp->head)
> > > > +		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
> > > >  			repeat = true;
> > > >  	}
> > > >  
> > > > @@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
> > > >  }
> > > >  
> > > >  static inline bool
> > > > -kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > > -	struct rcu_head *head, rcu_callback_t func)
> > > > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > >  {
> > > > -	struct kfree_rcu_bulk_data *bnode;
> > > > +	struct kvfree_rcu_bulk_data *bnode;
> > > > +	int idx;
> > > >  
> > > >  	if (unlikely(!krcp->initialized))
> > > >  		return false;
> > > >  
> > > >  	lockdep_assert_held(&krcp->lock);
> > > > +	idx = !!is_vmalloc_addr(ptr);
> > > >  
> > > >  	/* Check if a new block is required. */
> > > > -	if (!krcp->bhead ||
> > > > -			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> > > > +	if (!krcp->bkvhead[idx] ||
> > > > +			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > > >  		bnode = get_cached_bnode(krcp);
> > > >  		if (!bnode) {
> > > > -			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> > > > -
> > > >  			/*
> > > >  			 * To keep this path working on raw non-preemptible
> > > >  			 * sections, prevent the optional entry into the
> > > > @@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > >  				return false;
> > > >  
> > > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > >  		}
> > > >  
> > > > @@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > >  
> > > >  		/* Initialize the new block. */
> > > >  		bnode->nr_records = 0;
> > > > -		bnode->next = krcp->bhead;
> > > > +		bnode->next = krcp->bkvhead[idx];
> > > >  
> > > >  		/* Attach it to the head. */
> > > > -		krcp->bhead = bnode;
> > > > +		krcp->bkvhead[idx] = bnode;
> > > >  	}
> > > >  
> > > >  	/* Finally insert. */
> > > > -	krcp->bhead->records[krcp->bhead->nr_records++] =
> > > > -		(void *) head - (unsigned long) func;
> > > > +	krcp->bkvhead[idx]->records
> > > > +		[krcp->bkvhead[idx]->nr_records++] = ptr;
> > > >  
> > > >  	return true;
> > > >  }
> > > >  
> > > >  /*
> > > > - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
> > > > - * period. Please note there are two paths are maintained, one is the main one
> > > > - * that uses kfree_bulk() interface and second one is emergency one, that is
> > > > - * used only when the main path can not be maintained temporary, due to memory
> > > > - * pressure.
> > > > + * Queue a request for lazy invocation of appropriate free routine after a
> > > > + * grace period. Please note there are three paths maintained, two are the
> > > > + * main ones that use array of pointers interface and third one is emergency
> > > > + * one, that is used only when the main path can not be maintained temporary,
> > > > + * due to memory pressure.
> > > >   *
> > > >   * Each kfree_call_rcu() request is added to a batch. The batch will be drained
> > > >   * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
> > > >   * be free'd in workqueue context. This allows us to: batch requests together to
> > > > - * reduce the number of grace periods during heavy kfree_rcu() load.
> > > > + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
> > > >   */
> > > >  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  {
> > > > @@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > >  	 * Under high memory pressure GFP_NOWAIT can fail,
> > > >  	 * in that case the emergency path is maintained.
> > > >  	 */
> > > > -	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
> > > > +	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > > >  		head->func = func;
> > > >  		head->next = krcp->head;
> > > >  		krcp->head = head;
> > > > @@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
> > > >  
> > > >  	for_each_possible_cpu(cpu) {
> > > >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > -		struct kfree_rcu_bulk_data *bnode;
> > > > +		struct kvfree_rcu_bulk_data *bnode;
> > > >  
> > > >  		for (i = 0; i < KFREE_N_BATCHES; i++) {
> > > >  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> > > > @@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
> > > >  		}
> > > >  
> > > >  		for (i = 0; i < rcu_min_cached_objs; i++) {
> > > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > >  
> > > >  			if (bnode)
> > > > -- 
> > > > 2.20.1
> > > > 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-04  0:58         ` Joel Fernandes
@ 2020-05-04  2:20           ` Paul E. McKenney
  0 siblings, 0 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04  2:20 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 08:58:58PM -0400, Joel Fernandes wrote:
> On Sun, May 03, 2020 at 05:20:32PM -0700, Paul E. McKenney wrote:
> > On Sun, May 03, 2020 at 07:42:50PM -0400, Joel Fernandes wrote:
> > > On Fri, May 01, 2020 at 02:37:53PM -0700, Paul E. McKenney wrote:
> > > [...]
> > > > > @@ -2993,41 +2994,73 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > > > >  static void kfree_rcu_work(struct work_struct *work)
> > > > >  {
> > > > >  	unsigned long flags;
> > > > > +	struct kvfree_rcu_bulk_data *bkhead, *bvhead, *bnext;
> > > > >  	struct rcu_head *head, *next;
> > > > > -	struct kfree_rcu_bulk_data *bhead, *bnext;
> > > > >  	struct kfree_rcu_cpu *krcp;
> > > > >  	struct kfree_rcu_cpu_work *krwp;
> > > > > +	int i;
> > > > >  
> > > > >  	krwp = container_of(to_rcu_work(work),
> > > > >  			    struct kfree_rcu_cpu_work, rcu_work);
> > > > >  	krcp = krwp->krcp;
> > > > > +
> > > > >  	raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > +	/* Channel 1. */
> > > > > +	bkhead = krwp->bkvhead_free[0];
> > > > > +	krwp->bkvhead_free[0] = NULL;
> > > > > +
> > > > > +	/* Channel 2. */
> > > > > +	bvhead = krwp->bkvhead_free[1];
> > > > > +	krwp->bkvhead_free[1] = NULL;
> > > > > +
> > > > > +	/* Channel 3. */
> > > > >  	head = krwp->head_free;
> > > > >  	krwp->head_free = NULL;
> > > > > -	bhead = krwp->bhead_free;
> > > > > -	krwp->bhead_free = NULL;
> > > > >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > >  
> > > > > -	/* "bhead" is now private, so traverse locklessly. */
> > > > > -	for (; bhead; bhead = bnext) {
> > > > > -		bnext = bhead->next;
> > > > > -
> > > > > -		debug_rcu_bhead_unqueue(bhead);
> > > > > +	/* kmalloc()/kfree() channel. */
> > > > > +	for (; bkhead; bkhead = bnext) {
> > > > > +		bnext = bkhead->next;
> > > > > +		debug_rcu_bhead_unqueue(bkhead);
> > > > >  
> > > > >  		rcu_lock_acquire(&rcu_callback_map);
> > > > 
> > > > Given that rcu_lock_acquire() only affects lockdep, I have to ask exactly
> > > > what concurrency design you are using here...
> > > 
> > > I believe the rcu_callback_map usage above follows a similar pattern from old
> > > code where the rcu_callback_map is acquired before doing the kfree.
> > > 
> > > static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
> > > {
> > >         rcu_callback_t f;
> > >         unsigned long offset = (unsigned long)head->func;
> > > 
> > >         rcu_lock_acquire(&rcu_callback_map);
> > >         if (__is_kfree_rcu_offset(offset)) {
> > >                 trace_rcu_invoke_kfree_callback(rn, head, offset);
> > >                 kfree((void *)head - offset);
> > >                 rcu_lock_release(&rcu_callback_map);
> > > 
> > > So when kfree_rcu() was rewritten, the rcu_lock_acquire() of rcu_callback_map
> > > got carried over.
> > > 
> > > I believe it is for detecting recursion where we possibly try to free
> > > RCU-held memory while already freeing memory. Or was there another purpose of
> > > the rcu_callback_map?
> > 
> > It looks like rcu_callback_map was added by 77a40f97030 ("rcu:
> > Remove kfree_rcu() special casing and lazy-callback handling").  Which
> > was less than a year ago.  ;-)
> 
> I think that's just git blame falsely looking at moved code instead of the
> original code.
> 
> It was actually the following commit. I think you were trying to detect
> blocking and context-switching within an RCU callback. Since kfree_rcu() does
> not have RCU callback functions, maybe we can just remove it?
> 
> commit 24ef659a857c3cba40b64ea51ea4fce8d2fb7bbc
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Oct 28 09:22:24 2013 -0700
> 
>     rcu: Provide better diagnostics for blocking in RCU callback functions
> 
>     Currently blocking in an RCU callback function will result in
>     "scheduling while atomic", which could be triggered for any number
>     of reasons.  To aid debugging, this patch introduces a rcu_callback_map
>     that is used to tie the inappropriate voluntary context switch back
>     to the fact that the function is being invoked from within a callback.

Right you are!

I was fooled as you say by the code movement.  I was searching for
rcu_callback_map in kernel/rcu/tree.c rather than using "git grep"
or similar.

So I took my own advice and added a comment.  ;-)

							Thanx, Paul
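
Something along these lines, for example (not necessarily the exact
wording that went in, just the shape of the annotation plus comment being
discussed):

<snip>
		/*
		 * Not a lock: rcu_callback_map only tells lockdep that we
		 * are reclaiming on behalf of kfree_rcu(), so blocking in
		 * this section is diagnosed the same way as blocking in an
		 * RCU callback would be.
		 */
		rcu_lock_acquire(&rcu_callback_map);
		kfree_bulk(bkhead->nr_records, bkhead->records);
		rcu_lock_release(&rcu_callback_map);
<snip>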

> thanks,
> 
>  - Joel
> 
> 
> > 
> > Hmmm...  This would be a good way to allow lockdep to tell you that you
> > are running within an RCU callback on the one hand or are reclaiming
> > due to kfree_rcu() on the other.  Was that the intent?  If so, a comment
> > seems necessary.
> > 
> > 							Thanx, Paul
> > 
> > > thanks,
> > > 
> > >  - Joel
> > > 
> > > 
> > > > >  		trace_rcu_invoke_kfree_bulk_callback(rcu_state.name,
> > > > > -			bhead->nr_records, bhead->records);
> > > > > +			bkhead->nr_records, bkhead->records);
> > > > > +
> > > > > +		kfree_bulk(bkhead->nr_records, bkhead->records);
> > > > > +		rcu_lock_release(&rcu_callback_map);
> > > > > +
> > > > > +		krcp = krc_this_cpu_lock(&flags);
> > > > > +		if (put_cached_bnode(krcp, bkhead))
> > > > > +			bkhead = NULL;
> > > > > +		krc_this_cpu_unlock(krcp, flags);
> > > > > +
> > > > > +		if (bkhead)
> > > > > +			free_page((unsigned long) bkhead);
> > > > > +
> > > > > +		cond_resched_tasks_rcu_qs();
> > > > > +	}
> > > > > +
> > > > > +	/* vmalloc()/vfree() channel. */
> > > > > +	for (; bvhead; bvhead = bnext) {
> > > > > +		bnext = bvhead->next;
> > > > > +		debug_rcu_bhead_unqueue(bvhead);
> > > > >  
> > > > > -		kfree_bulk(bhead->nr_records, bhead->records);
> > > > > +		rcu_lock_acquire(&rcu_callback_map);
> > > > 
> > > > And the same here.
> > > > 
> > > > > +		for (i = 0; i < bvhead->nr_records; i++) {
> > > > > +			trace_rcu_invoke_kfree_callback(rcu_state.name,
> > > > > +				(struct rcu_head *) bvhead->records[i], 0);
> > > > > +			vfree(bvhead->records[i]);
> > > > > +		}
> > > > >  		rcu_lock_release(&rcu_callback_map);
> > > > >  
> > > > >  		krcp = krc_this_cpu_lock(&flags);
> > > > > -		if (put_cached_bnode(krcp, bhead))
> > > > > -			bhead = NULL;
> > > > > +		if (put_cached_bnode(krcp, bvhead))
> > > > > +			bvhead = NULL;
> > > > >  		krc_this_cpu_unlock(krcp, flags);
> > > > >  
> > > > > -		if (bhead)
> > > > > -			free_page((unsigned long) bhead);
> > > > > +		if (bvhead)
> > > > > +			free_page((unsigned long) bvhead);
> > > > >  
> > > > >  		cond_resched_tasks_rcu_qs();
> > > > >  	}
> > > > > @@ -3047,7 +3080,7 @@ static void kfree_rcu_work(struct work_struct *work)
> > > > >  		trace_rcu_invoke_kfree_callback(rcu_state.name, head, offset);
> > > > >  
> > > > >  		if (!WARN_ON_ONCE(!__is_kfree_rcu_offset(offset)))
> > > > > -			kfree(ptr);
> > > > > +			kvfree(ptr);
> > > > >  
> > > > >  		rcu_lock_release(&rcu_callback_map);
> > > > >  		cond_resched_tasks_rcu_qs();
> > > > > @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > > > >  		krwp = &(krcp->krw_arr[i]);
> > > > >  
> > > > >  		/*
> > > > > -		 * Try to detach bhead or head and attach it over any
> > > > > +		 * Try to detach bkvhead or head and attach it over any
> > > > >  		 * available corresponding free channel. It can be that
> > > > >  		 * a previous RCU batch is in progress, it means that
> > > > >  		 * immediately to queue another one is not possible so
> > > > >  		 * return false to tell caller to retry.
> > > > >  		 */
> > > > > -		if ((krcp->bhead && !krwp->bhead_free) ||
> > > > > +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> > > > > +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
> > > > >  				(krcp->head && !krwp->head_free)) {
> > > > > -			/* Channel 1. */
> > > > > -			if (!krwp->bhead_free) {
> > > > > -				krwp->bhead_free = krcp->bhead;
> > > > > -				krcp->bhead = NULL;
> > > > > +			/*
> > > > > +			 * Channel 1 corresponds to SLAB ptrs.
> > > > > +			 */
> > > > > +			if (!krwp->bkvhead_free[0]) {
> > > > > +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> > > > > +				krcp->bkvhead[0] = NULL;
> > > > >  			}
> > > > >  
> > > > > -			/* Channel 2. */
> > > > > +			/*
> > > > > +			 * Channel 2 corresponds to vmalloc ptrs.
> > > > > +			 */
> > > > > +			if (!krwp->bkvhead_free[1]) {
> > > > > +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> > > > > +				krcp->bkvhead[1] = NULL;
> > > > > +			}
> > > > 
> > > > Why not a "for" loop here?  Duplicate code is most certainly not what
> > > > we want, as it can cause all sorts of trouble down the road.
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > +			/*
> > > > > +			 * Channel 3 corresponds to emergency path.
> > > > > +			 */
> > > > >  			if (!krwp->head_free) {
> > > > >  				krwp->head_free = krcp->head;
> > > > >  				krcp->head = NULL;
> > > > > @@ -3095,16 +3141,17 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> > > > >  			WRITE_ONCE(krcp->count, 0);
> > > > >  
> > > > >  			/*
> > > > > -			 * One work is per one batch, so there are two "free channels",
> > > > > -			 * "bhead_free" and "head_free" the batch can handle. It can be
> > > > > -			 * that the work is in the pending state when two channels have
> > > > > -			 * been detached following each other, one by one.
> > > > > +			 * One work is per one batch, so there are three
> > > > > +			 * "free channels", the batch can handle. It can
> > > > > +			 * be that the work is in the pending state when
> > > > > +			 * channels have been detached following by each
> > > > > +			 * other.
> > > > >  			 */
> > > > >  			queue_rcu_work(system_wq, &krwp->rcu_work);
> > > > >  		}
> > > > >  
> > > > >  		/* Repeat if any "free" corresponding channel is still busy. */
> > > > > -		if (krcp->bhead || krcp->head)
> > > > > +		if (krcp->bkvhead[0] || krcp->bkvhead[1] || krcp->head)
> > > > >  			repeat = true;
> > > > >  	}
> > > > >  
> > > > > @@ -3146,23 +3193,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
> > > > >  }
> > > > >  
> > > > >  static inline bool
> > > > > -kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > > > -	struct rcu_head *head, rcu_callback_t func)
> > > > > +kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > >  {
> > > > > -	struct kfree_rcu_bulk_data *bnode;
> > > > > +	struct kvfree_rcu_bulk_data *bnode;
> > > > > +	int idx;
> > > > >  
> > > > >  	if (unlikely(!krcp->initialized))
> > > > >  		return false;
> > > > >  
> > > > >  	lockdep_assert_held(&krcp->lock);
> > > > > +	idx = !!is_vmalloc_addr(ptr);
> > > > >  
> > > > >  	/* Check if a new block is required. */
> > > > > -	if (!krcp->bhead ||
> > > > > -			krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> > > > > +	if (!krcp->bkvhead[idx] ||
> > > > > +			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
> > > > >  		bnode = get_cached_bnode(krcp);
> > > > >  		if (!bnode) {
> > > > > -			WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> > > > > -
> > > > >  			/*
> > > > >  			 * To keep this path working on raw non-preemptible
> > > > >  			 * sections, prevent the optional entry into the
> > > > > @@ -3175,7 +3221,7 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > > >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > >  				return false;
> > > > >  
> > > > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > > > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > > >  		}
> > > > >  
> > > > > @@ -3185,30 +3231,30 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > > > >  
> > > > >  		/* Initialize the new block. */
> > > > >  		bnode->nr_records = 0;
> > > > > -		bnode->next = krcp->bhead;
> > > > > +		bnode->next = krcp->bkvhead[idx];
> > > > >  
> > > > >  		/* Attach it to the head. */
> > > > > -		krcp->bhead = bnode;
> > > > > +		krcp->bkvhead[idx] = bnode;
> > > > >  	}
> > > > >  
> > > > >  	/* Finally insert. */
> > > > > -	krcp->bhead->records[krcp->bhead->nr_records++] =
> > > > > -		(void *) head - (unsigned long) func;
> > > > > +	krcp->bkvhead[idx]->records
> > > > > +		[krcp->bkvhead[idx]->nr_records++] = ptr;
> > > > >  
> > > > >  	return true;
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > - * Queue a request for lazy invocation of kfree_bulk()/kfree() after a grace
> > > > > - * period. Please note there are two paths are maintained, one is the main one
> > > > > - * that uses kfree_bulk() interface and second one is emergency one, that is
> > > > > - * used only when the main path can not be maintained temporary, due to memory
> > > > > - * pressure.
> > > > > + * Queue a request for lazy invocation of appropriate free routine after a
> > > > > + * grace period. Please note there are three paths maintained, two are the
> > > > > + * main ones that use array of pointers interface and third one is emergency
> > > > > + * one, that is used only when the main path can not be maintained temporary,
> > > > > + * due to memory pressure.
> > > > >   *
> > > > >   * Each kfree_call_rcu() request is added to a batch. The batch will be drained
> > > > >   * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
> > > > >   * be free'd in workqueue context. This allows us to: batch requests together to
> > > > > - * reduce the number of grace periods during heavy kfree_rcu() load.
> > > > > + * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
> > > > >   */
> > > > >  void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > > >  {
> > > > > @@ -3231,7 +3277,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > > >  	 * Under high memory pressure GFP_NOWAIT can fail,
> > > > >  	 * in that case the emergency path is maintained.
> > > > >  	 */
> > > > > -	if (unlikely(!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func))) {
> > > > > +	if (unlikely(!kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr))) {
> > > > >  		head->func = func;
> > > > >  		head->next = krcp->head;
> > > > >  		krcp->head = head;
> > > > > @@ -4212,7 +4258,7 @@ static void __init kfree_rcu_batch_init(void)
> > > > >  
> > > > >  	for_each_possible_cpu(cpu) {
> > > > >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > -		struct kfree_rcu_bulk_data *bnode;
> > > > > +		struct kvfree_rcu_bulk_data *bnode;
> > > > >  
> > > > >  		for (i = 0; i < KFREE_N_BATCHES; i++) {
> > > > >  			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> > > > > @@ -4220,7 +4266,7 @@ static void __init kfree_rcu_batch_init(void)
> > > > >  		}
> > > > >  
> > > > >  		for (i = 0; i < rcu_min_cached_objs; i++) {
> > > > > -			bnode = (struct kfree_rcu_bulk_data *)
> > > > > +			bnode = (struct kvfree_rcu_bulk_data *)
> > > > >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > > > >  
> > > > >  			if (bnode)
> > > > > -- 
> > > > > 2.20.1
> > > > > 


* Re: [PATCH 08/24] rcu/tree: Use static initializer for krc.lock
  2020-05-01 21:17   ` Paul E. McKenney
@ 2020-05-04 12:10     ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko,
	Sebastian Andrzej Siewior

> >  
> > -	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
> > +	local_irq_save(*flags);	/* For safely calling this_cpu_ptr(). */
> 
> And here as well.  ;-)
> 
OK. For me it works either way. I can stick to "//" :)

--
Vlad Rezki


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-01 21:27   ` Paul E. McKenney
@ 2020-05-04 12:43     ` Uladzislau Rezki
  2020-05-04 15:24       ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 02:27:49PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:48PM +0200, Uladzislau Rezki (Sony) wrote:
> > Cache some extra objects per-CPU. During reclaim process
> > some pages are cached instead of releasing by linking them
> > into the list. Such approach provides O(1) access time to
> > the cache.
> > 
> > That reduces number of requests to the page allocator, also
> > that makes it more helpful if a low memory condition occurs.
> > 
> > A parameter reflecting the minimum allowed pages to be
> > cached per one CPU is propagated via sysfs, it is read
> > only, the name is "rcu_min_cached_objs".
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 60 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 89e9ca3f4e3e..d8975819b1c9 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
> >  static int gp_cleanup_delay;
> >  module_param(gp_cleanup_delay, int, 0444);
> >  
> > +/*
> > + * This rcu parameter is read-only, but can be write also.
> 
> You mean that although the parameter is read-only, you see no reason
> why it could not be converted to writeable?
> 
I just added a note. If it were writable, then we could change the size of the
per-CPU cache dynamically, i.e. "echo 5 > /sys/.../rcu_min_cached_objs"
would cache 5 pages. But I do not have a strong opinion on whether it should be
writable.
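
Just for illustration, making it runtime-tunable would mainly be a matter of
relaxing the permissions (a sketch, not something in this series):

<snip>
/* Sketch only: 0444 -> 0644 exposes the parameter read-write in sysfs. */
int rcu_min_cached_objs = 2;
module_param(rcu_min_cached_objs, int, 0644);
<snip>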

>
> If it was writeable, and a given CPU had the maximum numbr of cached
> objects, the rcu_min_cached_objs value was decreased, but that CPU never
> saw another kfree_rcu(), would the number of cached objects change?
> 
No. It works this way: a page is unqueued from the cache in kfree_rcu(),
whereas the "rcu work" will put it back if the number of objects is
< rcu_min_cached_objs; if it is >=, the page is freed.
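
In code terms, the put-back side of kfree_rcu_work() is then roughly the
following (a sketch built around the helpers quoted further down; the exact
shape in the patch may differ):

<snip>
	/* Sketch only: after kfree_bulk(), try to recycle the backing page. */
	krcp = krc_this_cpu_lock(&flags);
	if (put_cached_bnode(krcp, bhead))
		bhead = NULL;	/* Kept in the per-CPU cache for reuse. */
	krc_this_cpu_unlock(krcp, flags);

	if (bhead)
		free_page((unsigned long) bhead);	/* Cache is already full. */
<snip>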

>
> (Just curious, not asking for a change in functionality.)
> 
> > + * It reflects the minimum allowed number of objects which
> > + * can be cached per-CPU. Object size is equal to one page.
> > + */
> > +int rcu_min_cached_objs = 2;
> > +module_param(rcu_min_cached_objs, int, 0444);
> > +
> >  /* Retrieve RCU kthreads priority for rcutorture */
> >  int rcu_get_gp_kthreads_prio(void)
> >  {
> > @@ -2887,7 +2895,6 @@ struct kfree_rcu_cpu_work {
> >   * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
> >   * @head: List of kfree_rcu() objects not yet waiting for a grace period
> >   * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
> > - * @bcached: Keeps at most one object for later reuse when build chain blocks
> >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> >   * @lock: Synchronize access to this structure
> >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > @@ -2902,7 +2909,6 @@ struct kfree_rcu_cpu_work {
> >  struct kfree_rcu_cpu {
> >  	struct rcu_head *head;
> >  	struct kfree_rcu_bulk_data *bhead;
> > -	struct kfree_rcu_bulk_data *bcached;
> >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> >  	raw_spinlock_t lock;
> >  	struct delayed_work monitor_work;
> > @@ -2910,6 +2916,15 @@ struct kfree_rcu_cpu {
> >  	bool initialized;
> >  	// Number of objects for which GP not started
> >  	int count;
> > +
> > +	/*
> > +	 * Number of cached objects which are queued into
> > +	 * the lock-less list. This cache is used by the
> > +	 * kvfree_call_rcu() function and as of now its
> > +	 * size is static.
> > +	 */
> > +	struct llist_head bkvcache;
> > +	int nr_bkv_objs;
> >  };
> >  
> >  static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > @@ -2946,6 +2961,31 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
> >  	local_irq_restore(flags);
> >  }
> >  
> > +static inline struct kfree_rcu_bulk_data *
> > +get_cached_bnode(struct kfree_rcu_cpu *krcp)
> > +{
> > +	if (!krcp->nr_bkv_objs)
> > +		return NULL;
> > +
> > +	krcp->nr_bkv_objs--;
> > +	return (struct kfree_rcu_bulk_data *)
> > +		llist_del_first(&krcp->bkvcache);
> > +}
> > +
> > +static inline bool
> > +put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > +	struct kfree_rcu_bulk_data *bnode)
> > +{
> > +	/* Check the limit. */
> > +	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> > +		return false;
> > +
> > +	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
> > +	krcp->nr_bkv_objs++;
> > +	return true;
> > +
> > +}
> > +
> >  /*
> >   * This function is invoked in workqueue context after a grace period.
> >   * It frees all the objects queued on ->bhead_free or ->head_free.
> > @@ -2981,7 +3021,12 @@ static void kfree_rcu_work(struct work_struct *work)
> >  		kfree_bulk(bhead->nr_records, bhead->records);
> >  		rcu_lock_release(&rcu_callback_map);
> >  
> > -		if (cmpxchg(&krcp->bcached, NULL, bhead))
> > +		krcp = krc_this_cpu_lock(&flags);
> 
> Presumably the list can also be accessed without holding this lock,
> because otherwise we shouldn't need llist...
> 
Hm... We increase the number of elements in the cache, therefore it is not
lockless. On the other hand, I used llist_head to maintain the cache
because it is a singly linked list: we do not need a "*prev" link, and
we do not need to init the list.

But I can change it to list_head. Please let me know if I should :)
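
For context, the difference being referred to boils down to this (standard
kernel definitions, shown here only for illustration):

<snip>
struct llist_head {
	struct llist_node *first;	/* one pointer, zero-init is enough */
};

struct list_head {
	struct list_head *next, *prev;	/* two pointers, needs INIT_LIST_HEAD() */
};
<snip>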

--
Vlad Rezki


* Re: [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description
  2020-05-01 22:25   ` Paul E. McKenney
@ 2020-05-04 12:44     ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 03:25:24PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:49PM +0200, Uladzislau Rezki (Sony) wrote:
> > Document the rcutree.rcu_min_cached_objs sysfs kernel parameter.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Could you please combine this wtih the patch that created this sysfs
> parameter?
> 
Will combine them.

--
Vlad Rezki


* Re: [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-05-01 23:03   ` Paul E. McKenney
@ 2020-05-04 12:45     ` Uladzislau Rezki
  2020-05-06 18:29     ` Uladzislau Rezki
  1 sibling, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 04:03:59PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:59:00PM +0200, Uladzislau Rezki (Sony) wrote:
> > Move inlined kvfree_call_rcu() function out of the
> > header file. This step is a preparation for head-less
> > support.
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  include/linux/rcutiny.h | 6 +-----
> >  kernel/rcu/tiny.c       | 6 ++++++
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index 0c6315c4a0fe..7eb66909ae1b 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -34,11 +34,7 @@ static inline void synchronize_rcu_expedited(void)
> >  	synchronize_rcu();
> >  }
> >  
> > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > -{
> > -	call_rcu(head, func);
> > -}
> > -
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> >  void rcu_qs(void);
> >  
> >  static inline void rcu_softirq_qs(void)
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index aa897c3f2e92..508c82faa45c 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -177,6 +177,12 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu);
> >  
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	call_rcu(head, func);
> > +}
> > +EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> 
> This increases the size of Tiny RCU.  Plus in Tiny RCU, the overhead of
> synchronize_rcu() is exactly zero.  So why not make the single-argument
> kvfree_call_rcu() just unconditionally do synchronize_rcu() followed by
> kvfree() or whatever?  That should go just fine into the header file.
> 
I was thinking about it. That makes sense. Let me rework it then :)
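
Just to capture the idea, a rough sketch of what such a header-only variant
could look like. It assumes the single-argument caller ends up passing a NULL
rcu_head with the raw pointer in the second parameter, which is only an
assumption made here for illustration:

<snip>
/* Sketch only: include/linux/rcutiny.h */
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (head) {
		/* Double-argument case: a real rcu_head is embedded. */
		call_rcu(head, func);
		return;
	}

	/*
	 * Single-argument (head-less) case, assumption: @head is NULL
	 * and the object pointer arrives via @func. Tiny RCU runs on
	 * UP, so synchronize_rcu() is nearly free here.
	 */
	might_sleep();
	synchronize_rcu();
	kvfree((void *) func);
}
<snip>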

--
Vlad Rezki


* Re: [PATCH 22/24] rcu/tiny: support reclaim for head-less object
  2020-05-04  0:27     ` Joel Fernandes
@ 2020-05-04 12:45       ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:45 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 08:27:00PM -0400, Joel Fernandes wrote:
> On Fri, May 01, 2020 at 04:06:38PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:59:01PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Make a kvfree_call_rcu() function to support head-less
> > > freeing. Same as for tree-RCU, for such purpose we store
> > > pointers in array. SLAB and vmalloc ptrs. are mixed and
> > > coexist together.
> > > 
> > > Under high memory pressure it can be that maintaining of
> > > arrays becomes impossible. Objects with an rcu_head are
> > > released via call_rcu(). When it comes to the head-less
> > > variant, the kvfree() call is directly inlined, i.e. we
> > > do the same as for tree-RCU:
> > >     a) wait until a grace period has elapsed;
> > >     b) direct inlining of the kvfree() call.
> > > 
> > > Thus the current context has to follow might_sleep()
> > > annotation. Also please note that for tiny-RCU any
> > > call of synchronize_rcu() is actually a quiescent
> > > state, therefore (a) does nothing.
> > 
> > Please, please, please just do synchronize_rcu() followed by kvfree()
> > for single-argument kfree_rcu() and friends in Tiny RCU.
> > 
> > Way simpler and probably way faster as well.  And given that Tiny RCU
> > runs only on uniprocessor systems, the complexity probably is buying
> > you very little, if anything.
> 
> Agreed.
> 
Cool. Agree also :)

--
Vlad Rezki


* Re: [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment
  2020-05-04  0:31         ` Joel Fernandes
@ 2020-05-04 12:56           ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

On Sun, May 03, 2020 at 08:31:06PM -0400, Joel Fernandes wrote:
> On Sun, May 03, 2020 at 05:29:47PM -0700, Paul E. McKenney wrote:
> > On Sun, May 03, 2020 at 08:24:37PM -0400, Joel Fernandes wrote:
> > > On Fri, May 01, 2020 at 04:00:52PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Apr 28, 2020 at 10:58:59PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > > > 
> > > > > Handle cases where the the object being kvfree_rcu()'d is not aligned by
> > > > > 2-byte boundaries.
> > > > > 
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 9 ++++++---
> > > > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 501cac02146d..649bad7ad0f0 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2877,6 +2877,9 @@ struct kvfree_rcu_bulk_data {
> > > > >  #define KVFREE_BULK_MAX_ENTR \
> > > > >  	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> > > > >  
> > > > > +/* Encoding the offset of a fake rcu_head to indicate the head is a wrapper. */
> > > > > +#define RCU_HEADLESS_KFREE BIT(31)
> > > > 
> > > > Did I miss the check for freeing something larger than 2GB?  Or is this
> > > > impossible, even on systems with many terabytes of physical memory?
> > > > Even if it is currently impossible, what prevents it from suddenly
> > > > becoming all too possible at some random point in the future?  If you
> > > > think that this will never happen, please keep in mind that the first
> > > > time I heard "640K ought to be enough for anybody", it sounded eminently
> > > > reasonable to me.
> > > > 
> > > > Besides...
> > > > 
> > > > Isn't the offset in question the offset of an rcu_head struct within
> > > > the enclosing structure? If so, why not keep the current requirement
> > > > that this be at least 16-bit aligned, especially given that some work
> > > > is required to make that alignment less than pointer sized?  Then you
> > > > can continue using bit 0.
> > > > 
> > > > This alignment requirement is included in the RCU requirements
> > > > documentation and is enforced within the __call_rcu() function.
> > > > 
> > > > So let's leave this at bit 0.
> > > 
> > > This patch is needed only if we are growing the fake rcu_head. Since you
> > > mentioned in a previous patch in this series that you don't want to do that,
> > > and just rely on availability of the array of pointers or synchronize_rcu(),
> > > we can drop this patch. If we are not dropping that earlier patch, let us
> > > discuss more.
> > 
> > Dropping it sounds very good to me!
> 
> Cool ;-) Thanks,
> 
OK. Then we drop this patch and all of the dynamic rcu_head attaching logic,
which will make the code size smaller.

Thanks!

--
Vlad Rezki


* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-01 22:39   ` Paul E. McKenney
  2020-05-04  0:12     ` Joel Fernandes
@ 2020-05-04 12:57     ` Uladzislau Rezki
  1 sibling, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 12:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Fri, May 01, 2020 at 03:39:09PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 28, 2020 at 10:58:58PM +0200, Uladzislau Rezki (Sony) wrote:
> > Update the kvfree_call_rcu() with head-less support, it
> > means an object without any rcu_head structure can be
> > reclaimed after GP.
> > 
> > To store pointers there are two chain-arrays maintained
> > one for SLAB and another one is for vmalloc. Both types
> > of objects(head-less variant and regular one) are placed
> > there based on the type.
> > 
> > It can be that maintaining of arrays becomes impossible
> > due to high memory pressure. For such reason there is an
> > emergency path. In that case objects with rcu_head inside
> > are just queued building one way list. Later on that list
> > is drained.
> > 
> > As for head-less variant. Such objects do not have any
> > rcu_head helper inside. Thus it is dynamically attached.
> > As a result an object consists of back-pointer and regular
> > rcu_head. It implies that emergency path can detect such
> > object type, therefore they are tagged. So a back-pointer
> > could be freed as well as dynamically attached wrapper.
> > 
> > Even though such approach requires dynamic memory it needs
> > only sizeof(unsigned long *) + sizeof(struct rcu_head) bytes,
> > thus SLAB is used to obtain it. Finally if attaching of the
> > rcu_head and queuing get failed, the current context has
> > to follow might_sleep() annotation, thus below steps could
> > be applied:
> >    a) wait until a grace period has elapsed;
> >    b) direct inlining of the kvfree() call.
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 102 ++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 98 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 51726e4c3b4d..501cac02146d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3072,15 +3072,31 @@ static void kfree_rcu_work(struct work_struct *work)
> >  	 */
> >  	for (; head; head = next) {
> >  		unsigned long offset = (unsigned long)head->func;
> > -		void *ptr = (void *)head - offset;
> > +		bool headless;
> > +		void *ptr;
> >  
> >  		next = head->next;
> > +
> > +		/* We tag the headless object, if so adjust offset. */
> > +		headless = (((unsigned long) head - offset) & BIT(0));
> > +		if (headless)
> > +			offset -= 1;
> > +
> > +		ptr = (void *) head - offset;
> > +
> >  		debug_rcu_head_unqueue((struct rcu_head *)ptr);
> >  		rcu_lock_acquire(&rcu_callback_map);
> >  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
> >  
> > -		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> > +		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> > +			/*
> > +			 * If headless free the back-pointer first.
> > +			 */
> > +			if (headless)
> > +				kvfree((void *) *((unsigned long *) ptr));
> > +
> >  			kvfree(ptr);
> > +		}
> >  
> >  		rcu_lock_release(&rcu_callback_map);
> >  		cond_resched_tasks_rcu_qs();
> > @@ -3221,6 +3237,13 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> >  			if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >  				return false;
> >  
> > +			/*
> > +			 * TODO: For one argument of kvfree_rcu() we can
> > +			 * drop the lock and get the page in sleepable
> > +			 * context. That would allow to maintain an array
> > +			 * for the CONFIG_PREEMPT_RT as well. Thus we could
> > +			 * get rid of dynamic rcu_head attaching code.
> > +			 */
> >  			bnode = (struct kvfree_rcu_bulk_data *)
> >  				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >  		}
> > @@ -3244,6 +3267,23 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> >  	return true;
> >  }
> >  
> > +static inline struct rcu_head *
> > +attach_rcu_head_to_object(void *obj)
> > +{
> > +	unsigned long *ptr;
> > +
> > +	ptr = kmalloc(sizeof(unsigned long *) +
> > +			sizeof(struct rcu_head), GFP_NOWAIT |
> > +				__GFP_RECLAIM |	/* can do direct reclaim. */
> > +				__GFP_NORETRY |	/* only lightweight one.  */
> > +				__GFP_NOWARN);	/* no failure reports. */
> 
> Again, let's please not do this single-pointer-sized allocation.  If
> a full page is not available and this is a single-argument kfree_rcu(),
> just call synchronize_rcu() and then free the object directly.
> 
> It should not be -that- hard to adjust locking for CONFIG_PREEMPT_RT!
> For example, have some kind of reservation protocol so that a task
> that drops the lock can retry the page allocation and be sure of having
> a place to put it.  This might entail making CONFIG_PREEMPT_RT reserve
> more pages per CPU.  Or maybe that would not be necessary.
> 
Agreed. Will drop it!

--
Vlad Rezki


* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04  0:32         ` Joel Fernandes
@ 2020-05-04 14:21           ` Uladzislau Rezki
  2020-05-04 15:31             ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 14:21 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney
  Cc: Paul E. McKenney, Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko

> > > 
> > > If we are not doing single-pointer allocation, then that would also eliminate
> > > entering the low-level page allocator for single-pointer allocations.
> > > 
> > > Or did you mean entry into the allocator for the full-page allocations
> > > related to the pointer array for PREEMPT_RT? Even if we skip entry into the
> > > allocator for those, we will still have additional caching which further
> > > reduces chances of getting a full page. In the event of such failure, we can
> > > simply queue the rcu_head.
> > > 
> > > Thoughts?
> > 
> > I was just trying to guess why you kept the single-pointer allocation.
> > It looks like I guessed wrong.  ;-)
> > 
> > If, as you say above, you make it go straight to synchronize_rcu()
> > upon full-page allocation failure, that would be good!
> 
> Paul, sounds good. Vlad, are you also Ok with that?
> 
OK, let's drop it and keep it simple :)

BTW, for PREEMPT_RT we can still do a page allocation for the single-argument
kvfree_rcu(). In the double-argument case we just revert everything to the
rcu_head if there is no cache.

For the single argument we can drop the lock before entering the page
allocator. Because that path follows the might_sleep() annotation, we avoid
a situation where the spinlock (an rt-mutex) is taken from atomic context.

Since the lock is dropped, the current context can be interrupted by an IRQ
which in its turn can also call kvfree_rcu() on the current CPU. In that case
it must be a double-argument kvfree_rcu() call (single is not allowed). For
PREEMPT_RT, if there is no cache, everything is reverted to rcu_head usage,
i.e. the entry to the page allocator is bypassed.
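
For illustration, the lock-dropping dance could look roughly like the sketch
below. Only krc_this_cpu_lock()/krc_this_cpu_unlock() come from the series;
the helper name and the rest are made up:

<snip>
/*
 * Sketch only: used on the single-argument path, which follows the
 * might_sleep() annotation. Drop the per-CPU lock around the page
 * allocation so no sleeping allocation happens under krcp->lock,
 * then retake it; the CPU may have changed, so krcp is re-read.
 */
static struct kvfree_rcu_bulk_data *
get_free_page_might_sleep(struct kfree_rcu_cpu **krcpp, unsigned long *flags)
{
	struct kvfree_rcu_bulk_data *bnode;

	krc_this_cpu_unlock(*krcpp, *flags);

	bnode = (struct kvfree_rcu_bulk_data *)
		__get_free_page(GFP_KERNEL | __GFP_NOWARN);

	*krcpp = krc_this_cpu_lock(flags);
	return bnode;
}
<snip>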

It can be addressed as a separate patch and sent out later on if we
are on the same page.

Paul, Joel what are your opinions?

--
Vlad Rezki


* Re: [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs
  2020-05-01 21:37   ` Paul E. McKenney
  2020-05-03 23:42     ` Joel Fernandes
@ 2020-05-04 14:25     ` Uladzislau Rezki
  1 sibling, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 14:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

> > @@ -3072,21 +3105,34 @@ static inline bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krcp)
> >  		krwp = &(krcp->krw_arr[i]);
> >  
> >  		/*
> > -		 * Try to detach bhead or head and attach it over any
> > +		 * Try to detach bkvhead or head and attach it over any
> >  		 * available corresponding free channel. It can be that
> >  		 * a previous RCU batch is in progress, it means that
> >  		 * immediately to queue another one is not possible so
> >  		 * return false to tell caller to retry.
> >  		 */
> > -		if ((krcp->bhead && !krwp->bhead_free) ||
> > +		if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
> > +			(krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
> >  				(krcp->head && !krwp->head_free)) {
> > -			/* Channel 1. */
> > -			if (!krwp->bhead_free) {
> > -				krwp->bhead_free = krcp->bhead;
> > -				krcp->bhead = NULL;
> > +			/*
> > +			 * Channel 1 corresponds to SLAB ptrs.
> > +			 */
> > +			if (!krwp->bkvhead_free[0]) {
> > +				krwp->bkvhead_free[0] = krcp->bkvhead[0];
> > +				krcp->bkvhead[0] = NULL;
> >  			}
> >  
> > -			/* Channel 2. */
> > +			/*
> > +			 * Channel 2 corresponds to vmalloc ptrs.
> > +			 */
> > +			if (!krwp->bkvhead_free[1]) {
> > +				krwp->bkvhead_free[1] = krcp->bkvhead[1];
> > +				krcp->bkvhead[1] = NULL;
> > +			}
> 
> Why not a "for" loop here?  Duplicate code is most certainly not what
> we want, as it can cause all sorts of trouble down the road.
> 
Agree. Can be done. Thanks :)
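
For reference, the folded loop could look like this minimal sketch (the index
name is illustrative):

<snip>
		/* Sketch only: channel 0 is SLAB ptrs., channel 1 is vmalloc ptrs. */
		for (j = 0; j < ARRAY_SIZE(krcp->bkvhead); j++) {
			if (!krwp->bkvhead_free[j]) {
				krwp->bkvhead_free[j] = krcp->bkvhead[j];
				krcp->bkvhead[j] = NULL;
			}
		}
<snip>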

--
Vlad Rezki




* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 12:43     ` Uladzislau Rezki
@ 2020-05-04 15:24       ` Paul E. McKenney
  2020-05-04 17:48         ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 15:24 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 02:43:23PM +0200, Uladzislau Rezki wrote:
> On Fri, May 01, 2020 at 02:27:49PM -0700, Paul E. McKenney wrote:
> > On Tue, Apr 28, 2020 at 10:58:48PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Cache some extra objects per-CPU. During reclaim process
> > > some pages are cached instead of releasing by linking them
> > > into the list. Such approach provides O(1) access time to
> > > the cache.
> > > 
> > > That reduces number of requests to the page allocator, also
> > > that makes it more helpful if a low memory condition occurs.
> > > 
> > > A parameter reflecting the minimum allowed pages to be
> > > cached per one CPU is propagated via sysfs, it is read
> > > only, the name is "rcu_min_cached_objs".
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > >  kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 60 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 89e9ca3f4e3e..d8975819b1c9 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
> > >  static int gp_cleanup_delay;
> > >  module_param(gp_cleanup_delay, int, 0444);
> > >  
> > > +/*
> > > + * This rcu parameter is read-only, but can be write also.
> > 
> > You mean that although the parameter is read-only, you see no reason
> > why it could not be converted to writeable?
> > 
> I added just a note. If it is writable, then we can change the size of the
> per-CPU cache dynamically, i.e. "echo 5 > /sys/.../rcu_min_cached_objs"
> would cache 5 pages. But i do not have a strong opinion if it should be
> writable.
> 
> > If it was writeable, and a given CPU had the maximum numbr of cached
> > objects, the rcu_min_cached_objs value was decreased, but that CPU never
> > saw another kfree_rcu(), would the number of cached objects change?
> > 
> No. It works the way: unqueue the page from cache in the kfree_rcu(),
> whereas "rcu work" will put it back if number of objects < rcu_min_cached_objs,
> if >= will free the page.

Just to make sure I understand...  If someone writes a smaller number to
the sysfs variable, the per-CPU caches will be decreased at that point,
immediately during that sysfs write?  Or are you saying something else?

> > (Just curious, not asking for a change in functionality.)
> > 
> > > + * It reflects the minimum allowed number of objects which
> > > + * can be cached per-CPU. Object size is equal to one page.
> > > + */
> > > +int rcu_min_cached_objs = 2;
> > > +module_param(rcu_min_cached_objs, int, 0444);
> > > +
> > >  /* Retrieve RCU kthreads priority for rcutorture */
> > >  int rcu_get_gp_kthreads_prio(void)
> > >  {
> > > @@ -2887,7 +2895,6 @@ struct kfree_rcu_cpu_work {
> > >   * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
> > >   * @head: List of kfree_rcu() objects not yet waiting for a grace period
> > >   * @bhead: Bulk-List of kfree_rcu() objects not yet waiting for a grace period
> > > - * @bcached: Keeps at most one object for later reuse when build chain blocks
> > >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> > >   * @lock: Synchronize access to this structure
> > >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > > @@ -2902,7 +2909,6 @@ struct kfree_rcu_cpu_work {
> > >  struct kfree_rcu_cpu {
> > >  	struct rcu_head *head;
> > >  	struct kfree_rcu_bulk_data *bhead;
> > > -	struct kfree_rcu_bulk_data *bcached;
> > >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> > >  	raw_spinlock_t lock;
> > >  	struct delayed_work monitor_work;
> > > @@ -2910,6 +2916,15 @@ struct kfree_rcu_cpu {
> > >  	bool initialized;
> > >  	// Number of objects for which GP not started
> > >  	int count;
> > > +
> > > +	/*
> > > +	 * Number of cached objects which are queued into
> > > +	 * the lock-less list. This cache is used by the
> > > +	 * kvfree_call_rcu() function and as of now its
> > > +	 * size is static.
> > > +	 */
> > > +	struct llist_head bkvcache;
> > > +	int nr_bkv_objs;
> > >  };
> > >  
> > >  static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
> > > @@ -2946,6 +2961,31 @@ krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
> > >  	local_irq_restore(flags);
> > >  }
> > >  
> > > +static inline struct kfree_rcu_bulk_data *
> > > +get_cached_bnode(struct kfree_rcu_cpu *krcp)
> > > +{
> > > +	if (!krcp->nr_bkv_objs)
> > > +		return NULL;
> > > +
> > > +	krcp->nr_bkv_objs--;
> > > +	return (struct kfree_rcu_bulk_data *)
> > > +		llist_del_first(&krcp->bkvcache);
> > > +}
> > > +
> > > +static inline bool
> > > +put_cached_bnode(struct kfree_rcu_cpu *krcp,
> > > +	struct kfree_rcu_bulk_data *bnode)
> > > +{
> > > +	/* Check the limit. */
> > > +	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> > > +		return false;
> > > +
> > > +	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
> > > +	krcp->nr_bkv_objs++;
> > > +	return true;
> > > +
> > > +}
> > > +
> > >  /*
> > >   * This function is invoked in workqueue context after a grace period.
> > >   * It frees all the objects queued on ->bhead_free or ->head_free.
> > > @@ -2981,7 +3021,12 @@ static void kfree_rcu_work(struct work_struct *work)
> > >  		kfree_bulk(bhead->nr_records, bhead->records);
> > >  		rcu_lock_release(&rcu_callback_map);
> > >  
> > > -		if (cmpxchg(&krcp->bcached, NULL, bhead))
> > > +		krcp = krc_this_cpu_lock(&flags);
> > 
> > Presumably the list can also be accessed without holding this lock,
> > because otherwise we shouldn't need llist...
> > 
> Hm... We increase the number of elements in cache, therefore it is not
> lockless. From the other hand i used llist_head to maintain the cache
> because it is single linked list, we do not need "*prev" link. Also
> we do not need to init the list.
> 
> But i can change it to list_head. Please let me know if i need :)

Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
callback processing, the operations were open-coded, but they have been
pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.

Maybe some non-atomic/protected/whatever macros in the llist.h file?
Or maybe just open-code the singly linked list?  (Probably not the
best choice, though.)  Add comments stating that the atomic properties
of the llist functions aren't needed?  Something else?

The comments would be a good start.  Just to take pity on people seeing
the potential for concurrency and wondering how the concurrent accesses
actually happen.  ;-)

							Thanx, Paul


* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04 14:21           ` Uladzislau Rezki
@ 2020-05-04 15:31             ` Paul E. McKenney
  2020-05-04 16:56               ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 15:31 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 04:21:53PM +0200, Uladzislau Rezki wrote:
> > > > 
> > > > If we are not doing single-pointer allocation, then that would also eliminate
> > > > entering the low-level page allocator for single-pointer allocations.
> > > > 
> > > > Or did you mean entry into the allocator for the full-page allocations
> > > > related to the pointer array for PREEMPT_RT? Even if we skip entry into the
> > > > allocator for those, we will still have additional caching which further
> > > > reduces chances of getting a full page. In the event of such failure, we can
> > > > simply queue the rcu_head.
> > > > 
> > > > Thoughts?
> > > 
> > > I was just trying to guess why you kept the single-pointer allocation.
> > > It looks like I guessed wrong.  ;-)
> > > 
> > > If, as you say above, you make it go straight to synchronize_rcu()
> > > upon full-page allocation failure, that would be good!
> > 
> > Paul, sounds good. Vlad, are you also Ok with that?
> > 
> OK, let's drop it and keep it simple :)
> 
> BTW, for PREEMPT_RT we still can do a page allocation for single
> argument of kvfree_rcu(). In case of double we just revert everything
> to the rcu_head if no cache.
> 
> For single argument we can drop the lock before the entry to the page
> allocator. Because it follows might_sleep() anotation we avoid of having
> a situation when spinlock(rt mutex) is taken from any atomic context.
> 
> Since the lock is dropped the current context can be interrupted by
> an IRQ which in its turn can also call kvfree_rcu() on current CPU.
> In that case it must be double argument(single is not allowed) kvfree_rcu()
> call. For PREEMPT_RT if no cache everything is reverted to rcu_head usage,
> i.e. the entry to page allocator is bypassed.
> 
> It can be addressed as a separate patch and send out later on if we
> are on the same page.
> 
> Paul, Joel what are your opinions?

I strongly prefer that it be removed from the series.  I do understand
that this is a bit more hassle right now, but this does help avoid
confusion in the future, plus perhaps also avoiding issues with future
bisections.

							Thanx, Paul


* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04 15:31             ` Paul E. McKenney
@ 2020-05-04 16:56               ` Uladzislau Rezki
  2020-05-04 17:08                 ` Paul E. McKenney
  0 siblings, 1 reply; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 16:56 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

> > 
> > For single argument we can drop the lock before the entry to the page
> > allocator. Because it follows might_sleep() anotation we avoid of having
> > a situation when spinlock(rt mutex) is taken from any atomic context.
> > 
> > Since the lock is dropped the current context can be interrupted by
> > an IRQ which in its turn can also call kvfree_rcu() on current CPU.
> > In that case it must be double argument(single is not allowed) kvfree_rcu()
> > call. For PREEMPT_RT if no cache everything is reverted to rcu_head usage,
> > i.e. the entry to page allocator is bypassed.
> > 
> > It can be addressed as a separate patch and send out later on if we
> > are on the same page.
> > 
> > Paul, Joel what are your opinions?
> 
> I strongly prefer that it be removed from the series.  I do understand
> that this is a bit more hassle right now, but this does help avoid
> confusion in the future, plus perhaps also avoiding issues with future
> bisections.
> 
We have already decided to get rid of it, I mean the small allocations (dynamic
rcu_head attaching). I will exclude it from the next patch-set version.

--
Vlad Rezki


* Re: [PATCH 19/24] rcu/tree: Support reclaim for head-less object
  2020-05-04 16:56               ` Uladzislau Rezki
@ 2020-05-04 17:08                 ` Paul E. McKenney
  0 siblings, 0 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 17:08 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 06:56:29PM +0200, Uladzislau Rezki wrote:
> > > 
> > > For single argument we can drop the lock before the entry to the page
> > > allocator. Because it follows might_sleep() anotation we avoid of having
> > > a situation when spinlock(rt mutex) is taken from any atomic context.
> > > 
> > > Since the lock is dropped the current context can be interrupted by
> > > an IRQ which in its turn can also call kvfree_rcu() on current CPU.
> > > In that case it must be double argument(single is not allowed) kvfree_rcu()
> > > call. For PREEMPT_RT if no cache everything is reverted to rcu_head usage,
> > > i.e. the entry to page allocator is bypassed.
> > > 
> > > It can be addressed as a separate patch and send out later on if we
> > > are on the same page.
> > > 
> > > Paul, Joel what are your opinions?
> > 
> > I strongly prefer that it be removed from the series.  I do understand
> > that this is a bit more hassle right now, but this does help avoid
> > confusion in the future, plus perhaps also avoiding issues with future
> > bisections.
> > 
> We have already decided to get rid of it, i mean small allocations(dynamic
> rcu_head attaching). I will exclude it from next patch-set version. 

Very good, and thank you!!!

							Thanx, Paul


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 15:24       ` Paul E. McKenney
@ 2020-05-04 17:48         ` Uladzislau Rezki
  2020-05-04 18:07           ` Paul E. McKenney
  2020-05-04 18:08           ` Joel Fernandes
  0 siblings, 2 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 17:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, Joel Fernandes, RCU,
	Oleksiy Avramchenko

On Mon, May 04, 2020 at 08:24:37AM -0700, Paul E. McKenney wrote:
> On Mon, May 04, 2020 at 02:43:23PM +0200, Uladzislau Rezki wrote:
> > On Fri, May 01, 2020 at 02:27:49PM -0700, Paul E. McKenney wrote:
> > > On Tue, Apr 28, 2020 at 10:58:48PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Cache some extra objects per-CPU. During reclaim process
> > > > some pages are cached instead of releasing by linking them
> > > > into the list. Such approach provides O(1) access time to
> > > > the cache.
> > > > 
> > > > That reduces number of requests to the page allocator, also
> > > > that makes it more helpful if a low memory condition occurs.
> > > > 
> > > > A parameter reflecting the minimum allowed pages to be
> > > > cached per one CPU is propagated via sysfs, it is read
> > > > only, the name is "rcu_min_cached_objs".
> > > > 
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > ---
> > > >  kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 60 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 89e9ca3f4e3e..d8975819b1c9 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
> > > >  static int gp_cleanup_delay;
> > > >  module_param(gp_cleanup_delay, int, 0444);
> > > >  
> > > > +/*
> > > > + * This rcu parameter is read-only, but can be write also.
> > > 
> > > You mean that although the parameter is read-only, you see no reason
> > > why it could not be converted to writeable?
> > > 
> > I added just a note. If it is writable, then we can change the size of the
> > per-CPU cache dynamically, i.e. "echo 5 > /sys/.../rcu_min_cached_objs"
> > would cache 5 pages. But i do not have a strong opinion if it should be
> > writable.
> > 
> > > If it was writeable, and a given CPU had the maximum numbr of cached
> > > objects, the rcu_min_cached_objs value was decreased, but that CPU never
> > > saw another kfree_rcu(), would the number of cached objects change?
> > > 
> > No. It works the way: unqueue the page from cache in the kfree_rcu(),
> > whereas "rcu work" will put it back if number of objects < rcu_min_cached_objs,
> > if >= will free the page.
> 
> Just to make sure I understand...  If someone writes a smaller number to
> the sysfs variable, the per-CPU caches will be decreased at that point,
> immediately during that sysfs write?  Or are you saying something else?
> 
This patch defines it as read-only. It sets the minimum threshold that
controls the number of elements in the per-CPU cache. If we decide to make it
writable as well, then we will have full freedom in how to define its behavior;
as of now that behavior is not defined, because the parameter is read-only.


> > > Presumably the list can also be accessed without holding this lock,
> > > because otherwise we shouldn't need llist...
> > > 
> > Hm... We increase the number of elements in cache, therefore it is not
> > lockless. From the other hand i used llist_head to maintain the cache
> > because it is single linked list, we do not need "*prev" link. Also
> > we do not need to init the list.
> > 
> > But i can change it to list_head. Please let me know if i need :)
> 
> Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
> callback processing, the operations were open-coded, but they have been
> pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.
> 
> Maybe some non-atomic/protected/whatever macros in the llist.h file?
> Or maybe just open-code the singly linked list?  (Probably not the
> best choice, though.)  Add comments stating that the atomic properties
> of the llist functions aren't neded?  Something else?
>
In order to keep it simple, I can replace llist_head with list_head?

> 
> The comments would be a good start.  Just to take pity on people seeing
> the potential for concurrency and wondering how the concurrent accesses
> actually happen.  ;-)
> 
Sounds like you are kidding me :) 

--
Vlad Rezki


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 17:48         ` Uladzislau Rezki
@ 2020-05-04 18:07           ` Paul E. McKenney
  2020-05-04 18:08           ` Joel Fernandes
  1 sibling, 0 replies; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 18:07 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 07:48:22PM +0200, Uladzislau Rezki wrote:
> On Mon, May 04, 2020 at 08:24:37AM -0700, Paul E. McKenney wrote:
> > On Mon, May 04, 2020 at 02:43:23PM +0200, Uladzislau Rezki wrote:
> > > On Fri, May 01, 2020 at 02:27:49PM -0700, Paul E. McKenney wrote:
> > > > On Tue, Apr 28, 2020 at 10:58:48PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > Cache some extra objects per-CPU. During reclaim process
> > > > > some pages are cached instead of releasing by linking them
> > > > > into the list. Such approach provides O(1) access time to
> > > > > the cache.
> > > > > 
> > > > > That reduces number of requests to the page allocator, also
> > > > > that makes it more helpful if a low memory condition occurs.
> > > > > 
> > > > > A parameter reflecting the minimum allowed pages to be
> > > > > cached per one CPU is propagated via sysfs, it is read
> > > > > only, the name is "rcu_min_cached_objs".
> > > > > 
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 64 ++++++++++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 60 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 89e9ca3f4e3e..d8975819b1c9 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -178,6 +178,14 @@ module_param(gp_init_delay, int, 0444);
> > > > >  static int gp_cleanup_delay;
> > > > >  module_param(gp_cleanup_delay, int, 0444);
> > > > >  
> > > > > +/*
> > > > > + * This rcu parameter is read-only, but can be write also.
> > > > 
> > > > You mean that although the parameter is read-only, you see no reason
> > > > why it could not be converted to writeable?
> > > > 
> > > I added just a note. If it is writable, then we can change the size of the
> > > per-CPU cache dynamically, i.e. "echo 5 > /sys/.../rcu_min_cached_objs"
> > > would cache 5 pages. But i do not have a strong opinion if it should be
> > > writable.
> > > 
> > > > If it was writeable, and a given CPU had the maximum numbr of cached
> > > > objects, the rcu_min_cached_objs value was decreased, but that CPU never
> > > > saw another kfree_rcu(), would the number of cached objects change?
> > > > 
> > > No. It works the way: unqueue the page from cache in the kfree_rcu(),
> > > whereas "rcu work" will put it back if number of objects < rcu_min_cached_objs,
> > > if >= will free the page.
> > 
> > Just to make sure I understand...  If someone writes a smaller number to
> > the sysfs variable, the per-CPU caches will be decreased at that point,
> > immediately during that sysfs write?  Or are you saying something else?
> > 
> This patch defines it as read-only. It defines the minimum threshold that
> controls number of elements in the per-CPU cache. If we decide to make it
> write also, then we will have full of freedom how to define its behavior,
> i.e. it is not defined because it is read only.

And runtime-read-only sounds like an excellent state for it.

> > > > Presumably the list can also be accessed without holding this lock,
> > > > because otherwise we shouldn't need llist...
> > > > 
> > > Hm... We increase the number of elements in cache, therefore it is not
> > > lockless. From the other hand i used llist_head to maintain the cache
> > > because it is single linked list, we do not need "*prev" link. Also
> > > we do not need to init the list.
> > > 
> > > But i can change it to list_head. Please let me know if i need :)
> > 
> > Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
> > callback processing, the operations were open-coded, but they have been
> > pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.
> > 
> > Maybe some non-atomic/protected/whatever macros in the llist.h file?
> > Or maybe just open-code the singly linked list?  (Probably not the
> > best choice, though.)  Add comments stating that the atomic properties
> > of the llist functions aren't neded?  Something else?
> >
> In order to keep it simple i can replace llist_head by the list_head?

Fine by me!

> > The comments would be a good start.  Just to take pity on people seeing
> > the potential for concurrency and wondering how the concurrent accesses
> > actually happen.  ;-)
> > 
> Sounds like you are kidding me :) 

"Only those who have gone too far can possibly tell you how far you
can go!"  ;-)

							Thanx, Paul


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 17:48         ` Uladzislau Rezki
  2020-05-04 18:07           ` Paul E. McKenney
@ 2020-05-04 18:08           ` Joel Fernandes
  2020-05-04 19:01             ` Paul E. McKenney
  1 sibling, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04 18:08 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 07:48:22PM +0200, Uladzislau Rezki wrote:
> On Mon, May 04, 2020 at 08:24:37AM -0700, Paul E. McKenney wrote:
[..] 
> > > > Presumably the list can also be accessed without holding this lock,
> > > > because otherwise we shouldn't need llist...
> > > > 
> > > Hm... We increase the number of elements in cache, therefore it is not
> > > lockless. From the other hand i used llist_head to maintain the cache
> > > because it is single linked list, we do not need "*prev" link. Also
> > > we do not need to init the list.
> > > 
> > > But i can change it to list_head. Please let me know if i need :)
> > 
> > Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
> > callback processing, the operations were open-coded, but they have been
> > pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.
> > 
> > Maybe some non-atomic/protected/whatever macros in the llist.h file?
> > Or maybe just open-code the singly linked list?  (Probably not the
> > best choice, though.)  Add comments stating that the atomic properties
> > of the llist functions aren't neded?  Something else?
> >
> In order to keep it simple i can replace llist_head by the list_head?

Just to clarify for me, what is the disadvantage of using llist here?

Since we don't care about traversing backwards, isn't it better to use llist
for this use case?

I think Vlad is using locking because we're also tracking the size of the
llist to know when to free pages. This tracking could suffer from the
lost-update problem without any locking, if two lockless llist_add() calls
happened simultaneously.
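
A minimal illustration of that lost update on the size counter (not code from
the series, just the interleaving being worried about):

<snip>
/*
 * Without krcp->lock, two contexts could interleave like this:
 *
 *   CPU 0 / task                     other context
 *   ------------                     -------------
 *   llist_add(a, &bkvcache);         llist_add(b, &bkvcache);   // fine, llist_add() is atomic
 *   n = krcp->nr_bkv_objs;    // 4   n = krcp->nr_bkv_objs;     // 4
 *   krcp->nr_bkv_objs = n + 1;       krcp->nr_bkv_objs = n + 1; // both store 5
 *
 * The cache now holds six pages but nr_bkv_objs says five, so the
 * "free the page once the cache is full" check drifts over time.
 */
<snip>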

Also if list_head is used, it will take more space and still use locking.

Thoughts?

thanks,

 - Joel

> > 
> > The comments would be a good start.  Just to take pity on people seeing
> > the potential for concurrency and wondering how the concurrent accesses
> > actually happen.  ;-)
> > 
> Sounds like you are kidding me :) 
> 
> --
> Vlad Rezki


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 18:08           ` Joel Fernandes
@ 2020-05-04 19:01             ` Paul E. McKenney
  2020-05-04 19:37               ` Joel Fernandes
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 19:01 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 02:08:05PM -0400, Joel Fernandes wrote:
> On Mon, May 04, 2020 at 07:48:22PM +0200, Uladzislau Rezki wrote:
> > On Mon, May 04, 2020 at 08:24:37AM -0700, Paul E. McKenney wrote:
> [..] 
> > > > > Presumably the list can also be accessed without holding this lock,
> > > > > because otherwise we shouldn't need llist...
> > > > > 
> > > > Hm... We increase the number of elements in cache, therefore it is not
> > > > lockless. From the other hand i used llist_head to maintain the cache
> > > > because it is single linked list, we do not need "*prev" link. Also
> > > > we do not need to init the list.
> > > > 
> > > > But i can change it to list_head. Please let me know if i need :)
> > > 
> > > Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
> > > callback processing, the operations were open-coded, but they have been
> > > pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.
> > > 
> > > Maybe some non-atomic/protected/whatever macros in the llist.h file?
> > > Or maybe just open-code the singly linked list?  (Probably not the
> > > best choice, though.)  Add comments stating that the atomic properties
> > > of the llist functions aren't neded?  Something else?
> > >
> > In order to keep it simple i can replace llist_head by the list_head?
> 
> Just to clarify for me, what is the disadvantage of using llist here?

Are there some llist APIs that are not set up for concurrency?  I am
not seeing any.

The overhead isn't that much of a concern, given that these are not on the
hotpath, but people reading the code and seeing the cmpxchg operations
might be forgiven for believing that there is some concurrency involved
somewhere.

Or am I confused and there are now single-threaded add/delete operations
for llist?

> Since we don't care about traversing backwards, isn't it better to use llist
> for this usecase?
> 
> I think Vlad is using locking as we're also tracking the size of the llist to
> know when to free pages. This tracking could suffer from the lost-update
> problem without any locking, 2 lockless llist_add happened simulatenously.
> 
> Also if list_head is used, it will take more space and still use locking.

Indeed, it would be best to use a non-concurrent singly linked list.

							Thanx, Paul

> Thoughts?
> 
> thanks,
> 
>  - Joel
> 
> > > 
> > > The comments would be a good start.  Just to take pity on people seeing
> > > the potential for concurrency and wondering how the concurrent accesses
> > > actually happen.  ;-)
> > > 
> > Sounds like you are kidding me :) 
> > 
> > --
> > Vlad Rezki


* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 19:01             ` Paul E. McKenney
@ 2020-05-04 19:37               ` Joel Fernandes
  2020-05-04 19:51                 ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Joel Fernandes @ 2020-05-04 19:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

Hi Paul,

On Mon, May 4, 2020 at 3:01 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, May 04, 2020 at 02:08:05PM -0400, Joel Fernandes wrote:
> > On Mon, May 04, 2020 at 07:48:22PM +0200, Uladzislau Rezki wrote:
> > > On Mon, May 04, 2020 at 08:24:37AM -0700, Paul E. McKenney wrote:
> > [..]
> > > > > > Presumably the list can also be accessed without holding this lock,
> > > > > > because otherwise we shouldn't need llist...
> > > > > >
> > > > > Hm... We increase the number of elements in cache, therefore it is not
> > > > > lockless. From the other hand i used llist_head to maintain the cache
> > > > > because it is single linked list, we do not need "*prev" link. Also
> > > > > we do not need to init the list.
> > > > >
> > > > > But i can change it to list_head. Please let me know if i need :)
> > > >
> > > > Hmmm...  Maybe it is time for a non-atomic singly linked list?  In the RCU
> > > > callback processing, the operations were open-coded, but they have been
> > > > pushed into include/linux/rcu_segcblist.h and kernel/rcu/rcu_segcblist.*.
> > > >
> > > > Maybe some non-atomic/protected/whatever macros in the llist.h file?
> > > > Or maybe just open-code the singly linked list?  (Probably not the
> > > > best choice, though.)  Add comments stating that the atomic properties
> > > > of the llist functions aren't neded?  Something else?
> > > >
> > > In order to keep it simple i can replace llist_head by the list_head?
> >
> > Just to clarify for me, what is the disadvantage of using llist here?
>
> Are there some llist APIs that are not set up for concurrency?  I am
> not seeing any.

An llist deletion racing with another llist deletion will need locking.
So strictly speaking, some locking is possible with llist usage?

The locklessness, as I understand it, comes when adding and deleting at the
same time; for that no lock is needed. But the current patch takes the lock
anyway to avoid the lost update of the size of the list.

> The overhead isn't that much of a concern, given that these are not on the
> hotpath, but people reading the code and seeing the cmpxchg operations
> might be forgiven for believing that there is some concurrency involved
> somewhere.
>
> Or am I confused and there are now single-threaded add/delete operations
> for llist?

I do see some examples of llist usage with locking in the kernel code.
One case is: do_init_module() calling llist_add to add to the
init_free_list under module_mutex.
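
Roughly that pattern looks like the sketch below (paraphrased from memory,
so it may not match the exact kernel/module.c code):

<snip>
	mutex_lock(&module_mutex);
	/* ... */
	/*
	 * llist_add() is called with module_mutex already held, so the
	 * atomicity of the llist API is not what provides exclusion here.
	 */
	if (llist_add(&freeinit->node, &init_free_list))
		schedule_work(&init_free_wq);
	mutex_unlock(&module_mutex);
<snip>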

> > Since we don't care about traversing backwards, isn't it better to use llist
> > for this usecase?
> >
> > I think Vlad is using locking as we're also tracking the size of the llist to
> > know when to free pages. This tracking could suffer from the lost-update
> > problem without any locking, if 2 lockless llist_add calls happened simultaneously.
> >
> > Also if list_head is used, it will take more space and still use locking.
>
> Indeed, it would be best to use a non-concurrent singly linked list.

Ok cool :-)

Is it safe to say something like the following is ruled out? ;-) :-D
#define kfree_rcu_list_add llist_add

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 19:37               ` Joel Fernandes
@ 2020-05-04 19:51                 ` Uladzislau Rezki
  2020-05-04 20:15                   ` joel
  2020-05-04 20:16                   ` Paul E. McKenney
  0 siblings, 2 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-04 19:51 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney
  Cc: Paul E. McKenney, Uladzislau Rezki, LKML, linux-mm,
	Andrew Morton, Theodore Y . Ts'o, Matthew Wilcox, RCU,
	Oleksiy Avramchenko

> > > Since we don't care about traversing backwards, isn't it better to use llist
> > > for this usecase?
> > >
> > > I think Vlad is using locking as we're also tracking the size of the llist to
> > > know when to free pages. This tracking could suffer from the lost-update
> > > problem without any locking, if 2 lockless llist_add calls happened simultaneously.
> > >
> > > Also if list_head is used, it will take more space and still use locking.
> >
> > Indeed, it would be best to use a non-concurrent singly linked list.
> 
> Ok cool :-)
> 
> Is it safe to say something like the following is ruled out? ;-) :-D
> #define kfree_rcu_list_add llist_add
> 
In that case i think it is better just to add a comment about using
llist_head. To state that it is used as a singly linked list to save space
and that access is synchronized by the lock :)

IMHO.
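
Something along these lines, just as a sketch (the struct and field names
below are illustrative, not the ones from the patch):

<snip>
#include <linux/llist.h>
#include <linux/spinlock.h>

struct page_cache_sketch {		/* illustrative name */
	raw_spinlock_t lock;
	/*
	 * Cache of objects kept for reuse.  llist_head is used here only
	 * because it is a singly linked list and saves the space of a
	 * "prev" pointer; it is NOT accessed locklessly.  All additions
	 * and removals happen under ->lock, so the atomic guarantees of
	 * the llist API are not relied upon.
	 */
	struct llist_head cache;
	int nr_cached;
};
<snip>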

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 19:51                 ` Uladzislau Rezki
@ 2020-05-04 20:15                   ` joel
  2020-05-04 20:16                   ` Paul E. McKenney
  1 sibling, 0 replies; 78+ messages in thread
From: joel @ 2020-05-04 20:15 UTC (permalink / raw)
  To: Uladzislau Rezki, Paul E. McKenney
  Cc: LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, RCU, Oleksiy Avramchenko



On May 4, 2020 3:51:28 PM EDT, Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > Since we don't care about traversing backwards, isn't it better to use llist
> > > > for this usecase?
> > > >
> > > > I think Vlad is using locking as we're also tracking the size of the llist to
> > > > know when to free pages. This tracking could suffer from the lost-update
> > > > problem without any locking, if 2 lockless llist_add calls happened simultaneously.
> > > >
> > > > Also if list_head is used, it will take more space and still use locking.
> > >
> > > Indeed, it would be best to use a non-concurrent singly linked list.
> >
> > Ok cool :-)
> >
> > Is it safe to say something like the following is ruled out? ;-) :-D
> > #define kfree_rcu_list_add llist_add
> >
> In that case i think it is better just to add a comment about using
> llist_head. To state that it is used as a singly linked list to save space
> and that access is synchronized by the lock :)
>
> IMHO.

Sounds good to me. thanks,

 - Joel

>
>--
>Vlad Rezki

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 19:51                 ` Uladzislau Rezki
  2020-05-04 20:15                   ` joel
@ 2020-05-04 20:16                   ` Paul E. McKenney
  2020-05-05 11:03                     ` Uladzislau Rezki
  1 sibling, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-04 20:16 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 09:51:28PM +0200, Uladzislau Rezki wrote:
> > > > Since we don't care about traversing backwards, isn't it better to use llist
> > > > for this usecase?
> > > >
> > > > I think Vlad is using locking as we're also tracking the size of the llist to
> > > > know when to free pages. This tracking could suffer from the lost-update
> > > > problem without any locking, if 2 lockless llist_add calls happened simultaneously.
> > > >
> > > > Also if list_head is used, it will take more space and still use locking.
> > >
> > > Indeed, it would be best to use a non-concurrent singly linked list.
> > 
> > Ok cool :-)
> > 
> > Is it safe to say something like the following is ruled out? ;-) :-D
> > #define kfree_rcu_list_add llist_add
> > 
> In that case i think it is better just to add a comment about using
> llist_head. To state that it is used as a singly linked list to save space
> and that access is synchronized by the lock :)
> 
> IMHO.

But adding such a comment would be fine as well.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/24] rcu/tree: cache specified number of objects
  2020-05-04 20:16                   ` Paul E. McKenney
@ 2020-05-05 11:03                     ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-05 11:03 UTC (permalink / raw)
  To: Paul E. McKenney, Joel Fernandes
  Cc: Uladzislau Rezki, Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Mon, May 04, 2020 at 01:16:41PM -0700, Paul E. McKenney wrote:
> On Mon, May 04, 2020 at 09:51:28PM +0200, Uladzislau Rezki wrote:
> > > > > Since we don't care about traversing backwards, isn't it better to use llist
> > > > > for this usecase?
> > > > >
> > > > > I think Vlad is using locking as we're also tracking the size of the llist to
> > > > > know when to free pages. This tracking could suffer from the lost-update
> > > > > problem without any locking, if 2 lockless llist_add calls happened simultaneously.
> > > > >
> > > > > Also if list_head is used, it will take more space and still use locking.
> > > >
> > > > Indeed, it would be best to use a non-concurrent singly linked list.
> > > 
> > > Ok cool :-)
> > > 
> > > Is it safe to say something like the following is ruled out? ;-) :-D
> > > #define kfree_rcu_list_add llist_add
> > > 
> > In that case i think it is better just to add a comment about using
> > llist_head. To state that it is used as a singly linked list to save space
> > and that access is synchronized by the lock :)
> > 
> > IMHO.
> 
> But adding such a comment would be fine as well.
> 
Thank you Paul and Joel!

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-05-01 23:03   ` Paul E. McKenney
  2020-05-04 12:45     ` Uladzislau Rezki
@ 2020-05-06 18:29     ` Uladzislau Rezki
  2020-05-06 18:45       ` Paul E. McKenney
  1 sibling, 1 reply; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-06 18:29 UTC (permalink / raw)
  To: Paul E. McKenney, Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, linux-mm, Andrew Morton, Theodore Y . Ts'o,
	Matthew Wilcox, Joel Fernandes, RCU, Oleksiy Avramchenko

Hello, Paul, Joel.

> > Move inlined kvfree_call_rcu() function out of the
> > header file. This step is a preparation for head-less
> > support.
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> >  include/linux/rcutiny.h | 6 +-----
> >  kernel/rcu/tiny.c       | 6 ++++++
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > index 0c6315c4a0fe..7eb66909ae1b 100644
> > --- a/include/linux/rcutiny.h
> > +++ b/include/linux/rcutiny.h
> > @@ -34,11 +34,7 @@ static inline void synchronize_rcu_expedited(void)
> >  	synchronize_rcu();
> >  }
> >  
> > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > -{
> > -	call_rcu(head, func);
> > -}
> > -
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> >  void rcu_qs(void);
> >  
> >  static inline void rcu_softirq_qs(void)
> > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > index aa897c3f2e92..508c82faa45c 100644
> > --- a/kernel/rcu/tiny.c
> > +++ b/kernel/rcu/tiny.c
> > @@ -177,6 +177,12 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  }
> >  EXPORT_SYMBOL_GPL(call_rcu);
> >  
> > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > +{
> > +	call_rcu(head, func);
> > +}
> > +EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> 
> This increases the size of Tiny RCU.  Plus in Tiny RCU, the overhead of
> synchronize_rcu() is exactly zero.  So why not make the single-argument
> kvfree_call_rcu() just unconditionally do synchronize_rcu() followed by
> kvfree() or whatever?  That should go just fine into the header file.
> 
Seems it does not go well if i do it in the header file:

<snip>
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 0c6315c4a0fe..76b7ad053218 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -13,6 +13,7 @@
 #define __LINUX_TINY_H
 
 #include <asm/param.h> /* for HZ */
+#include <linux/mm.h>
 
 /* Never flag non-existent other CPUs! */
 static inline bool rcu_eqs_special_set(int cpu) { return false; }
@@ -36,7 +37,15 @@ static inline void synchronize_rcu_expedited(void)
 
 static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-       call_rcu(head, func);
+       if (head) {
+               call_rcu(head, func);
+               return;
+       }
+
+       // kvfree_rcu(one_arg) call.
+       might_sleep();
+       synchronize_rcu();
+       kvfree((void *) func);
 }
<snip> 

kvfree() is declared in <linux/mm.h> as extern void kvfree(const void *addr);
If i just include <linux/mm.h> i get many errors related to "implicit declaration
of function" like:

<snip>
rcu_read_lock()
compound_mapcount_ptr()
rcu_assign_pointer()
...
<snip>

and many other messages like:

<snip>
warning: returning ‘int’ from a function with return type
error: unknown type name ‘vm_fault_t’; did you mean ‘pmdval_t’?
error: implicit declaration of function ‘RB_EMPTY_ROOT’
...
<snip>

Please see full log here: ftp://vps418301.ovh.net/incoming/include_mm_h_output.txt

I can fix it by adding the kvfree() declaration to rcutiny.h as well:
extern void kvfree(const void *addr);

which seems a bit weird to me. Alternatively, it can be fixed by moving it to tiny.c
so it is aligned with the way it is done for tree-RCU.

Any valuable proposals?

--
Vlad Rezki

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-05-06 18:29     ` Uladzislau Rezki
@ 2020-05-06 18:45       ` Paul E. McKenney
  2020-05-07 17:34         ` Uladzislau Rezki
  0 siblings, 1 reply; 78+ messages in thread
From: Paul E. McKenney @ 2020-05-06 18:45 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

On Wed, May 06, 2020 at 08:29:02PM +0200, Uladzislau Rezki wrote:
> Hello, Paul, Joel.
> 
> > > Move inlined kvfree_call_rcu() function out of the
> > > header file. This step is a preparation for head-less
> > > support.
> > > 
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > >  include/linux/rcutiny.h | 6 +-----
> > >  kernel/rcu/tiny.c       | 6 ++++++
> > >  2 files changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> > > index 0c6315c4a0fe..7eb66909ae1b 100644
> > > --- a/include/linux/rcutiny.h
> > > +++ b/include/linux/rcutiny.h
> > > @@ -34,11 +34,7 @@ static inline void synchronize_rcu_expedited(void)
> > >  	synchronize_rcu();
> > >  }
> > >  
> > > -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > -{
> > > -	call_rcu(head, func);
> > > -}
> > > -
> > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
> > >  void rcu_qs(void);
> > >  
> > >  static inline void rcu_softirq_qs(void)
> > > diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> > > index aa897c3f2e92..508c82faa45c 100644
> > > --- a/kernel/rcu/tiny.c
> > > +++ b/kernel/rcu/tiny.c
> > > @@ -177,6 +177,12 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> > >  }
> > >  EXPORT_SYMBOL_GPL(call_rcu);
> > >  
> > > +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> > > +{
> > > +	call_rcu(head, func);
> > > +}
> > > +EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> > 
> > This increases the size of Tiny RCU.  Plus in Tiny RCU, the overhead of
> > synchronize_rcu() is exactly zero.  So why not make the single-argument
> > kvfree_call_rcu() just unconditionally do synchronize_rcu() followed by
> > kvfree() or whatever?  That should go just fine into the header file.
> > 
> Seems it does not go well if i do it in the header file:
> 
> <snip>
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index 0c6315c4a0fe..76b7ad053218 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -13,6 +13,7 @@
>  #define __LINUX_TINY_H
>  
>  #include <asm/param.h> /* for HZ */
> +#include <linux/mm.h>
>  
>  /* Never flag non-existent other CPUs! */
>  static inline bool rcu_eqs_special_set(int cpu) { return false; }
> @@ -36,7 +37,15 @@ static inline void synchronize_rcu_expedited(void)
>  
>  static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
> -       call_rcu(head, func);
> +       if (head) {
> +               call_rcu(head, func);
> +               return;
> +       }
> +
> +       // kvfree_rcu(one_arg) call.
> +       might_sleep();
> +       synchronize_rcu();
> +       kvfree((void *) func);
>  }
> <snip> 
> 
> kvfree() is declared in <linux/mm.h> as extern void kvfree(const void *addr);
> If i just include <linux/mm.h> i get many errors related to "implicit declaration
> of function" like:
> 
> <snip>
> rcu_read_lock()
> compound_mapcount_ptr()
> rcu_assign_pointer()
> ...
> <snip>
> 
> and many other messages like:
> 
> <snip>
> warning: returning ‘int’ from a function with return type
> error: unknown type name ‘vm_fault_t’; did you mean ‘pmdval_t’?
> error: implicit declaration of function ‘RB_EMPTY_ROOT’
> ...
> <snip>
> 
> Please see full log here: ftp://vps418301.ovh.net/incoming/include_mm_h_output.txt
> 
> I can fix it by adding the kvfree() declaration to rcutiny.h as well:
> extern void kvfree(const void *addr);
> 
> which seems a bit weird to me. Alternatively, it can be fixed by moving it to tiny.c
> so it is aligned with the way it is done for tree-RCU.

If the mm guys are OK with the kvfree() declaration, that is the way
to go.  With the addition of a comment saying something like "Avoid
#include hell".

The compiler will complain if the definition changes given that there
has to be somewhere that sees both the above and the real declaration,
so this should not cause too much trouble.
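
Roughly, with such a comment the header-only variant could look like the
sketch below (exact wording and placement left to the final patch):

<snip>
/* include/linux/rcutiny.h -- sketch only, not the final patch. */

/* Avoid #include hell: only the prototype of kvfree() is needed here. */
extern void kvfree(const void *addr);

static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	if (head) {
		call_rcu(head, func);
		return;
	}

	/*
	 * kvfree_rcu(one_arg) case: a grace period is cheap in Tiny RCU,
	 * so simply wait for it and free the object directly.  Note that
	 * in the head-less case "func" carries the pointer to be freed.
	 */
	might_sleep();
	synchronize_rcu();
	kvfree((void *) func);
}
<snip>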

> Any valuable proposals?

Otherwise, yes, the function would need to move to tiny.c and thus add
bloat.  :-(

							Thanx, Paul

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header
  2020-05-06 18:45       ` Paul E. McKenney
@ 2020-05-07 17:34         ` Uladzislau Rezki
  0 siblings, 0 replies; 78+ messages in thread
From: Uladzislau Rezki @ 2020-05-07 17:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Joel Fernandes, LKML, linux-mm, Andrew Morton,
	Theodore Y . Ts'o, Matthew Wilcox, RCU, Oleksiy Avramchenko

> > 
> > Please see full log here: ftp://vps418301.ovh.net/incoming/include_mm_h_output.txt
> > 
> > I can fix it by adding the kvfree() declaration to rcutiny.h as well:
> > extern void kvfree(const void *addr);
> > 
> > which seems a bit weird to me. Alternatively, it can be fixed by moving it to tiny.c
> > so it is aligned with the way it is done for tree-RCU.
> 
> If the mm guys are OK with the kvfree() declaration, that is the way
> to go.  With the addition of a comment saying something like "Avoid
> #include hell".
> 
> The compiler will complain if the definition changes given that there
> has to be somewhere that sees both the above and the real declaration,
> so this should not cause too much trouble.
> 
> > Any valuable proposals?
> 
> Otherwise, yes, the function would need to move to tiny.c and thus add
> bloat.  :-(
> 

OK. I will declare it one more time. Indeed, if the declaration is changed,
the compiler will emit errors. Also, i will add a comment explaining why
it is done this way.

Thanks!

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2020-05-07 17:34 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28 20:58 [PATCH 00/24] Introduce kvfree_rcu(1 or 2 arguments) Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 01/24] rcu/tree: Keep kfree_rcu() awake during lock contention Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 02/24] rcu/tree: Skip entry into the page allocator for PREEMPT_RT Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 03/24] rcu/tree: Use consistent style for comments Uladzislau Rezki (Sony)
2020-05-01 19:05   ` Paul E. McKenney
2020-05-01 20:52     ` Joe Perches
2020-05-03 23:44       ` Joel Fernandes
2020-05-04  0:23         ` Paul E. McKenney
2020-05-04  0:34           ` Joe Perches
2020-05-04  0:41           ` Joel Fernandes
2020-05-03 23:52     ` Joel Fernandes
2020-05-04  0:26       ` Paul E. McKenney
2020-05-04  0:39         ` Joel Fernandes
2020-04-28 20:58 ` [PATCH 04/24] rcu/tree: Repeat the monitor if any free channel is busy Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 05/24] rcu/tree: Simplify debug_objects handling Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 06/24] rcu/tree: Simplify KFREE_BULK_MAX_ENTR macro Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 07/24] rcu/tree: move locking/unlocking to separate functions Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 08/24] rcu/tree: Use static initializer for krc.lock Uladzislau Rezki (Sony)
2020-05-01 21:17   ` Paul E. McKenney
2020-05-04 12:10     ` Uladzislau Rezki
2020-04-28 20:58 ` [PATCH 09/24] rcu/tree: cache specified number of objects Uladzislau Rezki (Sony)
2020-05-01 21:27   ` Paul E. McKenney
2020-05-04 12:43     ` Uladzislau Rezki
2020-05-04 15:24       ` Paul E. McKenney
2020-05-04 17:48         ` Uladzislau Rezki
2020-05-04 18:07           ` Paul E. McKenney
2020-05-04 18:08           ` Joel Fernandes
2020-05-04 19:01             ` Paul E. McKenney
2020-05-04 19:37               ` Joel Fernandes
2020-05-04 19:51                 ` Uladzislau Rezki
2020-05-04 20:15                   ` joel
2020-05-04 20:16                   ` Paul E. McKenney
2020-05-05 11:03                     ` Uladzislau Rezki
2020-04-28 20:58 ` [PATCH 10/24] rcu/tree: add rcutree.rcu_min_cached_objs description Uladzislau Rezki (Sony)
2020-05-01 22:25   ` Paul E. McKenney
2020-05-04 12:44     ` Uladzislau Rezki
2020-04-28 20:58 ` [PATCH 11/24] rcu/tree: Maintain separate array for vmalloc ptrs Uladzislau Rezki (Sony)
2020-05-01 21:37   ` Paul E. McKenney
2020-05-03 23:42     ` Joel Fernandes
2020-05-04  0:20       ` Paul E. McKenney
2020-05-04  0:58         ` Joel Fernandes
2020-05-04  2:20           ` Paul E. McKenney
2020-05-04 14:25     ` Uladzislau Rezki
2020-04-28 20:58 ` [PATCH 12/24] rcu/tiny: support vmalloc in tiny-RCU Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 13/24] rcu: Rename rcu_invoke_kfree_callback/rcu_kfree_callback Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 14/24] rcu: Rename __is_kfree_rcu_offset() macro Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 15/24] rcu: Rename kfree_call_rcu() to the kvfree_call_rcu() Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 16/24] mm/list_lru.c: Rename kvfree_rcu() to local variant Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 17/24] rcu: Introduce 2 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 18/24] mm/list_lru.c: Remove kvfree_rcu_local() function Uladzislau Rezki (Sony)
2020-04-28 20:58 ` [PATCH 19/24] rcu/tree: Support reclaim for head-less object Uladzislau Rezki (Sony)
2020-05-01 22:39   ` Paul E. McKenney
2020-05-04  0:12     ` Joel Fernandes
2020-05-04  0:28       ` Paul E. McKenney
2020-05-04  0:32         ` Joel Fernandes
2020-05-04 14:21           ` Uladzislau Rezki
2020-05-04 15:31             ` Paul E. McKenney
2020-05-04 16:56               ` Uladzislau Rezki
2020-05-04 17:08                 ` Paul E. McKenney
2020-05-04 12:57     ` Uladzislau Rezki
2020-04-28 20:58 ` [PATCH 20/24] rcu/tree: Make kvfree_rcu() tolerate any alignment Uladzislau Rezki (Sony)
2020-05-01 23:00   ` Paul E. McKenney
2020-05-04  0:24     ` Joel Fernandes
2020-05-04  0:29       ` Paul E. McKenney
2020-05-04  0:31         ` Joel Fernandes
2020-05-04 12:56           ` Uladzislau Rezki
2020-04-28 20:59 ` [PATCH 21/24] rcu/tiny: move kvfree_call_rcu() out of header Uladzislau Rezki (Sony)
2020-05-01 23:03   ` Paul E. McKenney
2020-05-04 12:45     ` Uladzislau Rezki
2020-05-06 18:29     ` Uladzislau Rezki
2020-05-06 18:45       ` Paul E. McKenney
2020-05-07 17:34         ` Uladzislau Rezki
2020-04-28 20:59 ` [PATCH 22/24] rcu/tiny: support reclaim for head-less object Uladzislau Rezki (Sony)
2020-05-01 23:06   ` Paul E. McKenney
2020-05-04  0:27     ` Joel Fernandes
2020-05-04 12:45       ` Uladzislau Rezki
2020-04-28 20:59 ` [PATCH 23/24] rcu: Introduce 1 arg kvfree_rcu() interface Uladzislau Rezki (Sony)
2020-04-28 20:59 ` [PATCH 24/24] lib/test_vmalloc.c: Add test cases for kvfree_rcu() Uladzislau Rezki (Sony)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).