* [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
@ 2022-06-02  8:06 Uladzislau Rezki (Sony)
  2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
  2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
  0 siblings, 2 replies; 20+ messages in thread
From: Uladzislau Rezki (Sony) @ 2022-06-02  8:06 UTC (permalink / raw)
  To: LKML, RCU, Paul E . McKenney
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, Oleksiy Avramchenko

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The monitor_todo flag is not needed, as the delayed_work structure
already tracks whether work is pending: the schedule_delayed_work()
helper is a no-op when the work is already queued. Rely on that
instead of a separate flag.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 222d59299a2a..fd16c0b46d9e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
  * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
  * @lock: Synchronize access to this structure
  * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
- * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
  * @initialized: The @rcu_work fields have been initialized
  * @count: Number of objects for which GP not started
  * @bkvcache:
@@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
 	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
 	raw_spinlock_t lock;
 	struct delayed_work monitor_work;
-	bool monitor_todo;
 	bool initialized;
 	int count;
 
@@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
 	}
 }
 
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (krcp->bkvhead[i])
+			return true;
+
+	return !!krcp->head;
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3556,9 +3566,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// of the channels that is still busy we should rearm the
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
-	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
-		krcp->monitor_todo = false;
-	else
+	if (need_offload_krc(krcp))
 		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
@@ -3746,11 +3754,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 	WRITE_ONCE(krcp->count, krcp->count + 1);
 
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
-	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
-	    !krcp->monitor_todo) {
-		krcp->monitor_todo = true;
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
 		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
-	}
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3825,14 +3830,8 @@ void __init kfree_rcu_scheduler_running(void)
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
-		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
-				krcp->monitor_todo) {
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
-			continue;
-		}
-		krcp->monitor_todo = true;
-		schedule_delayed_work_on(cpu, &krcp->monitor_work,
-					 KFREE_DRAIN_JIFFIES);
+		if (need_offload_krc(krcp))
+			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-02  8:06 [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Uladzislau Rezki (Sony)
@ 2022-06-02  8:06 ` Uladzislau Rezki (Sony)
  2022-06-02 23:32   ` Joel Fernandes
  2022-06-04 15:51   ` Paul E. McKenney
  2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
  1 sibling, 2 replies; 20+ messages in thread
From: Uladzislau Rezki (Sony) @ 2022-06-02  8:06 UTC (permalink / raw)
  To: LKML, RCU, Paul E . McKenney
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, Oleksiy Avramchenko

Currently the monitor work is scheduled with a fixed interval of
HZ/50, that is, every 20 milliseconds. The drawback of such an
approach is low utilization of page slots in some scenarios: a page
can store up to 512 records. For example, on an Android system it can
look like this:

<snip>
  kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
  kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
  kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
  kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
  kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
  kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
  kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
  kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
  kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
  kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
  kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
  kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
  kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
  kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
  kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
<snip>

where a page carries only a few records to reclaim. In order to
improve batching and make utilization more efficient, this patch
introduces a drain interval that can be set to either
KFREE_DRAIN_JIFFIES_MAX or KFREE_DRAIN_JIFFIES_MIN. The interval is
adjusted when a flood is detected: in that case memory reclaim occurs
more often, whereas in mostly idle cases the interval is set to its
maximum timeout, which improves the utilization of page slots.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fd16c0b46d9e..c02a64995b85 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3249,7 +3249,8 @@ EXPORT_SYMBOL_GPL(call_rcu);
 
 
 /* Maximum number of jiffies to wait before draining a batch. */
-#define KFREE_DRAIN_JIFFIES (HZ / 50)
+#define KFREE_DRAIN_JIFFIES_MAX (HZ)
+#define KFREE_DRAIN_JIFFIES_MIN (HZ / 50)
 #define KFREE_N_BATCHES 2
 #define FREE_N_CHANNELS 2
 
@@ -3510,6 +3511,26 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
 	return !!krcp->head;
 }
 
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ?
+		KFREE_DRAIN_JIFFIES_MIN:KFREE_DRAIN_JIFFIES_MAX;
+
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+
+		if (delay < delay_left)
+			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
+
+		return;
+	}
+
+	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
+}
+
 /*
  * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
  */
@@ -3567,7 +3588,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
 	// work to repeat an attempt. Because previous batches are
 	// still in progress.
 	if (need_offload_krc(krcp))
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 	raw_spin_unlock_irqrestore(&krcp->lock, flags);
 }
@@ -3755,7 +3776,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+		schedule_delayed_monitor_work(krcp);
 
 unlock_return:
 	krc_this_cpu_unlock(krcp, flags);
@@ -3831,7 +3852,7 @@ void __init kfree_rcu_scheduler_running(void)
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
 		if (need_offload_krc(krcp))
-			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
+			schedule_delayed_monitor_work(krcp);
 		raw_spin_unlock_irqrestore(&krcp->lock, flags);
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
@ 2022-06-02 23:32   ` Joel Fernandes
  2022-06-03  9:55     ` Uladzislau Rezki
  2022-06-04 15:51   ` Paul E. McKenney
  1 sibling, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2022-06-02 23:32 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> Currently the monitor work is scheduled with a fixed interval of
> HZ/50, that is, every 20 milliseconds. The drawback of such an
> approach is low utilization of page slots in some scenarios: a page
> can store up to 512 records. For example, on an Android system it can
> look like this:
> 
> <snip>
>   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
>   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
>   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
>   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
>   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
>   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
>   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
>   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
>   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
>   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
>   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
>   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
>   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
>   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
>   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> <snip>
> 
> where a page carries only a few records to reclaim. In order to
> improve batching and make utilization more efficient, this patch
> introduces a drain interval that can be set to either
> KFREE_DRAIN_JIFFIES_MAX or KFREE_DRAIN_JIFFIES_MIN. The interval is
> adjusted when a flood is detected: in that case memory reclaim occurs
> more often, whereas in mostly idle cases the interval is set to its
> maximum timeout, which improves the utilization of page slots.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel



> ---
>  kernel/rcu/tree.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index fd16c0b46d9e..c02a64995b85 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3249,7 +3249,8 @@ EXPORT_SYMBOL_GPL(call_rcu);
>  
>  
>  /* Maximum number of jiffies to wait before draining a batch. */
> -#define KFREE_DRAIN_JIFFIES (HZ / 50)
> +#define KFREE_DRAIN_JIFFIES_MAX (HZ)
> +#define KFREE_DRAIN_JIFFIES_MIN (HZ / 50)
>  #define KFREE_N_BATCHES 2
>  #define FREE_N_CHANNELS 2
>  
> @@ -3510,6 +3511,26 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
>  	return !!krcp->head;
>  }
>  
> +static void
> +schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
> +{
> +	long delay, delay_left;
> +
> +	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ?
> +		KFREE_DRAIN_JIFFIES_MIN:KFREE_DRAIN_JIFFIES_MAX;
> +
> +	if (delayed_work_pending(&krcp->monitor_work)) {
> +		delay_left = krcp->monitor_work.timer.expires - jiffies;
> +
> +		if (delay < delay_left)
> +			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +
> +		return;
> +	}
> +
> +	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +}
> +
>  /*
>   * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
>   */
> @@ -3567,7 +3588,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  	// work to repeat an attempt. Because previous batches are
>  	// still in progress.
>  	if (need_offload_krc(krcp))
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>  
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  }
> @@ -3755,7 +3776,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>  
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
> @@ -3831,7 +3852,7 @@ void __init kfree_rcu_scheduler_running(void)
>  
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
>  		if (need_offload_krc(krcp))
> -			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +			schedule_delayed_monitor_work(krcp);
>  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  	}
>  }
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-02  8:06 [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Uladzislau Rezki (Sony)
  2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
@ 2022-06-02 23:43 ` Joel Fernandes
  2022-06-03  9:51   ` Uladzislau Rezki
  2022-06-04 16:03   ` Paul E. McKenney
  1 sibling, 2 replies; 20+ messages in thread
From: Joel Fernandes @ 2022-06-02 23:43 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Thu, Jun 02, 2022 at 10:06:43AM +0200, Uladzislau Rezki (Sony) wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> The monitor_todo flag is not needed, as the delayed_work structure
> already tracks whether work is pending: the schedule_delayed_work()
> helper is a no-op when the work is already queued. Rely on that
> instead of a separate flag.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 222d59299a2a..fd16c0b46d9e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
>   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
>   * @lock: Synchronize access to this structure
>   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
>   * @initialized: The @rcu_work fields have been initialized
>   * @count: Number of objects for which GP not started
>   * @bkvcache:
> @@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
>  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
>  	raw_spinlock_t lock;
>  	struct delayed_work monitor_work;
> -	bool monitor_todo;
>  	bool initialized;
>  	int count;
>  
> @@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
>  	}
>  }
>  
> +static bool
> +need_offload_krc(struct kfree_rcu_cpu *krcp)
> +{
> +	int i;
> +
> +	for (i = 0; i < FREE_N_CHANNELS; i++)
> +		if (krcp->bkvhead[i])
> +			return true;
> +
> +	return !!krcp->head;
> +}

Thanks for modifying my original patch to do this, and thanks for giving me
the attribution for the patch. This function is a nice addition.

For the patch in its entirety:
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel



> +
>  /*
>   * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
>   */
> @@ -3556,9 +3566,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  	// of the channels that is still busy we should rearm the
>  	// work to repeat an attempt. Because previous batches are
>  	// still in progress.
> -	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
> -		krcp->monitor_todo = false;
> -	else
> +	if (need_offload_krc(krcp))
>  		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
>  
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> @@ -3746,11 +3754,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	WRITE_ONCE(krcp->count, krcp->count + 1);
>  
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> -	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> -	    !krcp->monitor_todo) {
> -		krcp->monitor_todo = true;
> +	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
>  		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> -	}
>  
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
> @@ -3825,14 +3830,8 @@ void __init kfree_rcu_scheduler_running(void)
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
> -		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
> -				krcp->monitor_todo) {
> -			raw_spin_unlock_irqrestore(&krcp->lock, flags);
> -			continue;
> -		}
> -		krcp->monitor_todo = true;
> -		schedule_delayed_work_on(cpu, &krcp->monitor_work,
> -					 KFREE_DRAIN_JIFFIES);
> +		if (need_offload_krc(krcp))
> +			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
>  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  	}
>  }
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
@ 2022-06-03  9:51   ` Uladzislau Rezki
  2022-06-04  3:07     ` Joel Fernandes
  2022-06-04 16:03   ` Paul E. McKenney
  1 sibling, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-03  9:51 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> On Thu, Jun 02, 2022 at 10:06:43AM +0200, Uladzislau Rezki (Sony) wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > The monitor_todo flag is not needed, as the delayed_work structure
> > already tracks whether work is pending: the schedule_delayed_work()
> > helper is a no-op when the work is already queued. Rely on that
> > instead of a separate flag.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
> >  1 file changed, 16 insertions(+), 17 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 222d59299a2a..fd16c0b46d9e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
> >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> >   * @lock: Synchronize access to this structure
> >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> >   * @initialized: The @rcu_work fields have been initialized
> >   * @count: Number of objects for which GP not started
> >   * @bkvcache:
> > @@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
> >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> >  	raw_spinlock_t lock;
> >  	struct delayed_work monitor_work;
> > -	bool monitor_todo;
> >  	bool initialized;
> >  	int count;
> >  
> > @@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
> >  	}
> >  }
> >  
> > +static bool
> > +need_offload_krc(struct kfree_rcu_cpu *krcp)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < FREE_N_CHANNELS; i++)
> > +		if (krcp->bkvhead[i])
> > +			return true;
> > +
> > +	return !!krcp->head;
> > +}
> 
> Thanks for modifying my original patch to do this, and thanks for giving me
> the attribution for the patch. This function is a nice addition.
> 
It was you who did it :) Actually, the second patch depends on it, so I
decided to upload it on your behalf, with a slight modification, hoping
that you would not mind.

>
> For the patch in its entirety:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
Thanks for the review!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-02 23:32   ` Joel Fernandes
@ 2022-06-03  9:55     ` Uladzislau Rezki
  2022-06-04  3:03       ` Joel Fernandes
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-03  9:55 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Thu, Jun 02, 2022 at 11:32:23PM +0000, Joel Fernandes wrote:
> On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > Currently the monitor work is scheduled with a fixed interval of
> > HZ/50, that is, every 20 milliseconds. The drawback of such an
> > approach is low utilization of page slots in some scenarios: a page
> > can store up to 512 records. For example, on an Android system it can
> > look like this:
> > 
> > <snip>
> >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > <snip>
> > 
> > where a page carries only a few records to reclaim. In order to
> > improve batching and make utilization more efficient, this patch
> > introduces a drain interval that can be set to either
> > KFREE_DRAIN_JIFFIES_MAX or KFREE_DRAIN_JIFFIES_MIN. The interval is
> > adjusted when a flood is detected: in that case memory reclaim occurs
> > more often, whereas in mostly idle cases the interval is set to its
> > maximum timeout, which improves the utilization of page slots.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
Thanks!

This patch makes the intervals hard-coded in some sense, so you cannot
change them at runtime, only by recompilation. If there is a need or a
request, we can make both of them module_param()s.

If we are to do that we can just add one extra patch on top of it.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 20+ messages in thread
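[Editorial note] The module_param() suggestion above could look roughly
like the following sketch. This is hypothetical follow-up code, not part
of the posted series: the parameter names are invented for illustration,
and the values would need to be read where the delay is computed.

```c
/* Hypothetical sketch only: expose the drain intervals as module
 * parameters, settable on the kernel command line (for built-in RCU,
 * via the "rcutree." prefix). Names are illustrative, not a real API. */
static int kfree_drain_jiffies_max = HZ;
module_param(kfree_drain_jiffies_max, int, 0444);

static int kfree_drain_jiffies_min = HZ / 50;
module_param(kfree_drain_jiffies_min, int, 0444);
```

With 0444 permissions the values are boot-time only; making them
writable at runtime would additionally require READ_ONCE()/WRITE_ONCE()
style access in schedule_delayed_monitor_work().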

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-03  9:55     ` Uladzislau Rezki
@ 2022-06-04  3:03       ` Joel Fernandes
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Fernandes @ 2022-06-04  3:03 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Fri, Jun 03, 2022 at 11:55:10AM +0200, Uladzislau Rezki wrote:
> On Thu, Jun 02, 2022 at 11:32:23PM +0000, Joel Fernandes wrote:
> > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > Currently the monitor work is scheduled with a fixed interval of
> > > HZ/50, that is, every 20 milliseconds. The drawback of such an
> > > approach is low utilization of page slots in some scenarios: a page
> > > can store up to 512 records. For example, on an Android system it can
> > > look like this:
> > > 
> > > <snip>
> > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > <snip>
> > > 
> > > where a page carries only a few records to reclaim. In order to
> > > improve batching and make utilization more efficient, this patch
> > > introduces a drain interval that can be set to either
> > > KFREE_DRAIN_JIFFIES_MAX or KFREE_DRAIN_JIFFIES_MIN. The interval is
> > > adjusted when a flood is detected: in that case memory reclaim occurs
> > > more often, whereas in mostly idle cases the interval is set to its
> > > maximum timeout, which improves the utilization of page slots.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> Thanks!
> 
> This patch makes the interval hard-coded in some sense so you can not change
> it in runtime, only recompilation. If there is a need or request we can make
> both as module_param().

Yes, this seems a good first step.

> If we are to do that we can just add one extra patch on top of it.

Yes.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-03  9:51   ` Uladzislau Rezki
@ 2022-06-04  3:07     ` Joel Fernandes
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Fernandes @ 2022-06-04  3:07 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, RCU, Paul E . McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Fri, Jun 03, 2022 at 11:51:34AM +0200, Uladzislau Rezki wrote:
> > On Thu, Jun 02, 2022 at 10:06:43AM +0200, Uladzislau Rezki (Sony) wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > The monitor_todo flag is not needed, as the delayed_work structure
> > > already tracks whether work is pending: the schedule_delayed_work()
> > > helper is a no-op when the work is already queued. Rely on that
> > > instead of a separate flag.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
> > >  1 file changed, 16 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 222d59299a2a..fd16c0b46d9e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
> > >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> > >   * @lock: Synchronize access to this structure
> > >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > > - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> > >   * @initialized: The @rcu_work fields have been initialized
> > >   * @count: Number of objects for which GP not started
> > >   * @bkvcache:
> > > @@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
> > >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> > >  	raw_spinlock_t lock;
> > >  	struct delayed_work monitor_work;
> > > -	bool monitor_todo;
> > >  	bool initialized;
> > >  	int count;
> > >  
> > > @@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
> > >  	}
> > >  }
> > >  
> > > +static bool
> > > +need_offload_krc(struct kfree_rcu_cpu *krcp)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < FREE_N_CHANNELS; i++)
> > > +		if (krcp->bkvhead[i])
> > > +			return true;
> > > +
> > > +	return !!krcp->head;
> > > +}
> > 
> > Thanks for modifying my original patch to do this, and thanks for giving me
> > the attribution for the patch. This function is a nice addition.
> > 
> It was you who did it :) Actually, the second patch depends on it, so I
> decided to post it on your behalf with a slight modification, hoping that
> you would not mind.

Yes I don't mind at all :) Thank you again for seeing it through!

> > For the patch in its entirety:
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > 
> Thanks for the review!

Sure, any time. By the way, I am out in the Caribbean next week. I will catch
you all the week after.

Quick update on my side on the lazy CB stuff: I made some progress on using
the bypass lists for lazy CBs; it is looking good and builds now. I fixed a
bug in my code where the idle loop was flushing lazy CBs on its way to idle.
I think it is probably O(workingdays) away from a v2 posting, assuming all
goes well.

thanks,

 - Joel



* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
  2022-06-02 23:32   ` Joel Fernandes
@ 2022-06-04 15:51   ` Paul E. McKenney
  2022-06-05  9:10     ` Uladzislau Rezki
  1 sibling, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2022-06-04 15:51 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> Currently the monitor work is scheduled at a fixed interval of HZ/50,
> i.e. every 20 milliseconds. The drawback of such an approach is low
> utilization of the page slots in some scenarios. A page can store up
> to 512 records. For example, on an Android system it can look like:
> 
> <snip>
>   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
>   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
>   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
>   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
>   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
>   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
>   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
>   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
>   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
>   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
>   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
>   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
>   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
>   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
>   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> <snip>
> 
> where a page carries only a few records to reclaim. In order to improve
> batching and make utilization more efficient, the patch introduces a
> drain interval that can be set to either KFREE_DRAIN_JIFFIES_MAX or
> KFREE_DRAIN_JIFFIES_MIN. The interval is adjusted when a flood is
> detected: in that case memory reclaim occurs more often, whereas in
> mostly idle cases the interval is set to its maximum timeout, which
> improves the utilization of the page slots.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

That does look like a problem well worth solving!

But I am missing one thing.  If we are having a callback flood, why do we
need a shorter timeout?  Wouldn't a check on the number of blocks queued
be simpler, more direct, and provide faster response to the start of a
callback flood?

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index fd16c0b46d9e..c02a64995b85 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3249,7 +3249,8 @@ EXPORT_SYMBOL_GPL(call_rcu);
>  
>  
>  /* Maximum number of jiffies to wait before draining a batch. */
> -#define KFREE_DRAIN_JIFFIES (HZ / 50)
> +#define KFREE_DRAIN_JIFFIES_MAX (HZ)
> +#define KFREE_DRAIN_JIFFIES_MIN (HZ / 50)
>  #define KFREE_N_BATCHES 2
>  #define FREE_N_CHANNELS 2
>  
> @@ -3510,6 +3511,26 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
>  	return !!krcp->head;
>  }
>  
> +static void
> +schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
> +{
> +	long delay, delay_left;
> +
> +	delay = READ_ONCE(krcp->count) >= KVFREE_BULK_MAX_ENTR ?
> +		KFREE_DRAIN_JIFFIES_MIN:KFREE_DRAIN_JIFFIES_MAX;
> +
> +	if (delayed_work_pending(&krcp->monitor_work)) {
> +		delay_left = krcp->monitor_work.timer.expires - jiffies;
> +
> +		if (delay < delay_left)
> +			mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> +
> +		return;
> +	}
> +
> +	queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> +}
> +
>  /*
>   * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
>   */
> @@ -3567,7 +3588,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
>  	// work to repeat an attempt. Because previous batches are
>  	// still in progress.
>  	if (need_offload_krc(krcp))
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>  
>  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  }
> @@ -3755,7 +3776,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  
>  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
>  	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> -		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +		schedule_delayed_monitor_work(krcp);
>  
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
> @@ -3831,7 +3852,7 @@ void __init kfree_rcu_scheduler_running(void)
>  
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
>  		if (need_offload_krc(krcp))
> -			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> +			schedule_delayed_monitor_work(krcp);
>  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
>  	}
>  }
> -- 
> 2.30.2
> 


* Re: [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
  2022-06-03  9:51   ` Uladzislau Rezki
@ 2022-06-04 16:03   ` Paul E. McKenney
  2022-06-05  8:52     ` Uladzislau Rezki
  1 sibling, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2022-06-04 16:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay,
	Oleksiy Avramchenko

On Thu, Jun 02, 2022 at 11:43:39PM +0000, Joel Fernandes wrote:
> On Thu, Jun 02, 2022 at 10:06:43AM +0200, Uladzislau Rezki (Sony) wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > monitor_todo is not needed, as the work struct already tracks
> > whether work is pending. Just use that to know if work is
> > pending, via the schedule_delayed_work() helper.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
> >  1 file changed, 16 insertions(+), 17 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 222d59299a2a..fd16c0b46d9e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
> >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> >   * @lock: Synchronize access to this structure
> >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> >   * @initialized: The @rcu_work fields have been initialized
> >   * @count: Number of objects for which GP not started
> >   * @bkvcache:
> > @@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
> >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> >  	raw_spinlock_t lock;
> >  	struct delayed_work monitor_work;
> > -	bool monitor_todo;
> >  	bool initialized;
> >  	int count;
> >  
> > @@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
> >  	}
> >  }
> >  
> > +static bool
> > +need_offload_krc(struct kfree_rcu_cpu *krcp)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < FREE_N_CHANNELS; i++)
> > +		if (krcp->bkvhead[i])
> > +			return true;
> > +
> > +	return !!krcp->head;
> > +}
> 
> Thanks for modifying my original patch to do this, and thanks for giving me
> the attribution for the patch. This function is a nice addition.
> 
> For the patch in its entirety:
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

I pulled this one in for testing and further review, thank you both!

Given the description, I reversed the order of the Signed-off-by
tags to indicate that the patch came through Uladzislau from Joel.

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> 
> > +
> >  /*
> >   * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
> >   */
> > @@ -3556,9 +3566,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
> >  	// of the channels that is still busy we should rearm the
> >  	// work to repeat an attempt. Because previous batches are
> >  	// still in progress.
> > -	if (!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head)
> > -		krcp->monitor_todo = false;
> > -	else
> > +	if (need_offload_krc(krcp))
> >  		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> >  
> >  	raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > @@ -3746,11 +3754,8 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> >  	WRITE_ONCE(krcp->count, krcp->count + 1);
> >  
> >  	// Set timer to drain after KFREE_DRAIN_JIFFIES.
> > -	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> > -	    !krcp->monitor_todo) {
> > -		krcp->monitor_todo = true;
> > +	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
> >  		schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> > -	}
> >  
> >  unlock_return:
> >  	krc_this_cpu_unlock(krcp, flags);
> > @@ -3825,14 +3830,8 @@ void __init kfree_rcu_scheduler_running(void)
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> >  		raw_spin_lock_irqsave(&krcp->lock, flags);
> > -		if ((!krcp->bkvhead[0] && !krcp->bkvhead[1] && !krcp->head) ||
> > -				krcp->monitor_todo) {
> > -			raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > -			continue;
> > -		}
> > -		krcp->monitor_todo = true;
> > -		schedule_delayed_work_on(cpu, &krcp->monitor_work,
> > -					 KFREE_DRAIN_JIFFIES);
> > +		if (need_offload_krc(krcp))
> > +			schedule_delayed_work_on(cpu, &krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> >  		raw_spin_unlock_irqrestore(&krcp->lock, flags);
> >  	}
> >  }
> > -- 
> > 2.30.2
> > 


* Re: [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag
  2022-06-04 16:03   ` Paul E. McKenney
@ 2022-06-05  8:52     ` Uladzislau Rezki
  0 siblings, 0 replies; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-05  8:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay,
	Oleksiy Avramchenko

> On Thu, Jun 02, 2022 at 11:43:39PM +0000, Joel Fernandes wrote:
> > On Thu, Jun 02, 2022 at 10:06:43AM +0200, Uladzislau Rezki (Sony) wrote:
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > > 
> > > monitor_todo is not needed, as the work struct already tracks
> > > whether work is pending. Just use that to know if work is
> > > pending, via the schedule_delayed_work() helper.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  kernel/rcu/tree.c | 33 ++++++++++++++++-----------------
> > >  1 file changed, 16 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 222d59299a2a..fd16c0b46d9e 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3295,7 +3295,6 @@ struct kfree_rcu_cpu_work {
> > >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> > >   * @lock: Synchronize access to this structure
> > >   * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
> > > - * @monitor_todo: Tracks whether a @monitor_work delayed work is pending
> > >   * @initialized: The @rcu_work fields have been initialized
> > >   * @count: Number of objects for which GP not started
> > >   * @bkvcache:
> > > @@ -3320,7 +3319,6 @@ struct kfree_rcu_cpu {
> > >  	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> > >  	raw_spinlock_t lock;
> > >  	struct delayed_work monitor_work;
> > > -	bool monitor_todo;
> > >  	bool initialized;
> > >  	int count;
> > >  
> > > @@ -3500,6 +3498,18 @@ static void kfree_rcu_work(struct work_struct *work)
> > >  	}
> > >  }
> > >  
> > > +static bool
> > > +need_offload_krc(struct kfree_rcu_cpu *krcp)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < FREE_N_CHANNELS; i++)
> > > +		if (krcp->bkvhead[i])
> > > +			return true;
> > > +
> > > +	return !!krcp->head;
> > > +}
> > 
> > Thanks for modifying my original patch to do this, and thanks for giving me
> > the attribution for the patch. This function is a nice addition.
> > 
> > For the patch in its entirety:
> > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> 
> I pulled this one in for testing and further review, thank you both!
> 
> Given the description, I reversed the order of the Signed-off-by
> tags to indicate that the patch came through Uladzislau from Joel.
> 
Thanks! The author is Joel; I might have messed up the "Signed-off-by" tags :)

--
Uladzislau Rezki


* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-04 15:51   ` Paul E. McKenney
@ 2022-06-05  9:10     ` Uladzislau Rezki
  2022-06-07  3:47       ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-05  9:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

> On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > Currently the monitor work is scheduled at a fixed interval of HZ/50,
> > i.e. every 20 milliseconds. The drawback of such an approach is low
> > utilization of the page slots in some scenarios. A page can store up
> > to 512 records. For example, on an Android system it can look like:
> > 
> > <snip>
> >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > <snip>
> > 
> > where a page carries only a few records to reclaim. In order to improve
> > batching and make utilization more efficient, the patch introduces a
> > drain interval that can be set to either KFREE_DRAIN_JIFFIES_MAX or
> > KFREE_DRAIN_JIFFIES_MIN. The interval is adjusted when a flood is
> > detected: in that case memory reclaim occurs more often, whereas in
> > mostly idle cases the interval is set to its maximum timeout, which
> > improves the utilization of the page slots.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> That does look like a problem well worth solving!
>
Agreed, better ideas make for a better final solution :)

> 
> But I am missing one thing. If we are having a callback flood, why do we
> need a shorter timeout?
>
To offload faster, because otherwise we run into the classic issue: a
low-memory condition resulting in OOM.

>
> Wouldn't a check on the number of blocks queued be simpler, more direct,
> and provide faster response to the start of a callback flood?
>
I rely on krcp->count because we cannot always store the pointer in the page
slots. We cannot allocate a page in the caller context, so we use a page-cache
worker that fills the cache in normal context. While it populates the cache,
pointers are temporarily queued to the linked list.

Any thoughts?

--
Uladzislau Rezki


* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-05  9:10     ` Uladzislau Rezki
@ 2022-06-07  3:47       ` Paul E. McKenney
  2022-06-09 13:10         ` Uladzislau Rezki
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2022-06-07  3:47 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > Currently the monitor work is scheduled at a fixed interval of HZ/50,
> > > i.e. every 20 milliseconds. The drawback of such an approach is low
> > > utilization of the page slots in some scenarios. A page can store up
> > > to 512 records. For example, on an Android system it can look like:
> > > 
> > > <snip>
> > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > <snip>
> > > 
> > > where a page carries only a few records to reclaim. In order to improve
> > > batching and make utilization more efficient, the patch introduces a
> > > drain interval that can be set to either KFREE_DRAIN_JIFFIES_MAX or
> > > KFREE_DRAIN_JIFFIES_MIN. The interval is adjusted when a flood is
> > > detected: in that case memory reclaim occurs more often, whereas in
> > > mostly idle cases the interval is set to its maximum timeout, which
> > > improves the utilization of the page slots.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > That does look like a problem well worth solving!
> >
> Agree, better ideas make better final solution :)
> 
> > 
> > But I am missing one thing. If we are having a callback flood, why do we
> > need a shorter timeout?
> >
> To offload faster, because otherwise we run into the classic issue: a
> low-memory condition resulting in OOM.

But doesn't each callback queued during the flood give us an opportunity
to react to the flood?  That will be way more fine-grained than any
reasonable timer, right?  Or am I missing something?

I do agree that the action would often need to be indirect to avoid the
memory-allocation-state hassles, but we already can do that, either via
an extremely short-term hrtimer or something like irq-work.

> > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > and provide faster response to the start of a callback flood?
> >
> I rely on krcp->count because we cannot always store the pointer in the page
> slots. We cannot allocate a page in the caller context, so we use a page-cache
> worker that fills the cache in normal context. While it populates the cache,
> pointers are temporarily queued to the linked list.
> 
> Any thoughts?

There are a great many ways to approach this.  One of them is to maintain
a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
counter each jiffy.

Or am I missing a trick here?

							Thanx, Paul


* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-07  3:47       ` Paul E. McKenney
@ 2022-06-09 13:10         ` Uladzislau Rezki
  2022-06-10 16:45           ` Joel Fernandes
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-09 13:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Currently the monitor work is scheduled at a fixed interval of HZ/50,
> > > > i.e. every 20 milliseconds. The drawback of such an approach is low
> > > > utilization of the page slots in some scenarios. A page can store up
> > > > to 512 records. For example, on an Android system it can look like:
> > > >
> > > > <snip>
> > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > <snip>
> > > >
> > > > where a page carries only a few records to reclaim. In order to improve
> > > > batching and make utilization more efficient, the patch introduces a
> > > > drain interval that can be set to either KFREE_DRAIN_JIFFIES_MAX or
> > > > KFREE_DRAIN_JIFFIES_MIN. The interval is adjusted when a flood is
> > > > detected: in that case memory reclaim occurs more often, whereas in
> > > > mostly idle cases the interval is set to its maximum timeout, which
> > > > improves the utilization of the page slots.
> > > >
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > >
> > > That does look like a problem well worth solving!
> > >
> > Agree, better ideas make better final solution :)
> >
> > >
> > > But I am missing one thing. If we are having a callback flood, why do we
> > > need a shorter timeout?
> > >
> > To offload faster, because otherwise we run into the classic issue: a
> > low-memory condition resulting in OOM.
>
> But doesn't each callback queued during the flood give us an opportunity
> to react to the flood?  That will be way more fine-grained than any
> reasonable timer, right?  Or am I missing something?
>
We can set the timer to zero, or to the current "jiffies", to initiate the
offloading when the page is full. In that sense, it probably makes sense to
propagate those two attributes to user space, so the user can configure the
min/max drain interval.

Or we can deal only with a fixed interval, exposed via sysfs so the user can
control it. In that case we can get rid of the MIN one and just trigger the
timer when the page is full. I think this approach is better.

>
> I do agree that the action would often need to be indirect to avoid the
> memory-allocation-state hassles, but we already can do that, either via
> an extremely short-term hrtimer or something like irq-work.
>
> > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > and provide faster response to the start of a callback flood?
> > >
> > I rely on krcp->count because we cannot always store the pointer in the page
> > slots. We cannot allocate a page in the caller context, so we use a page-cache
> > worker that fills the cache in normal context. While it populates the cache,
> > pointers are temporarily queued to the linked list.
> >
> > Any thoughts?
>
> There are a great many ways to approach this.  One of them is to maintain
> a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> counter each jiffy.
>
> Or am I missing a trick here?
>
Do you mean to have a per-CPU timer that checks the per-CPU freed counter
and schedules the work if needed? Or have I missed your point?

-- 
Uladzislau Rezki


* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-09 13:10         ` Uladzislau Rezki
@ 2022-06-10 16:45           ` Joel Fernandes
  2022-06-13  9:47             ` Uladzislau Rezki
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2022-06-10 16:45 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

Hi Vlad, Paul,

On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > Currently the monitor work is scheduled at a fixed interval of HZ/50,
> > > > > i.e. every 20 milliseconds. The drawback of such an approach is low
> > > > > utilization of the page slots in some scenarios. A page can store up
> > > > > to 512 records. For example, on an Android system it can look like:
> > > > >
> > > > > <snip>
> > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > <snip>
> > > > >
> > > > > where a page only carries few records to reclaim a memory. In order to
> > > > > improve batching and make utilization more efficient the patch introduces
> > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected, in this
> > > > > case a memory reclaim occurs more often whereas in mostly idle cases the
> > > > > interval is set to its maximum timeout that improves the utilization of
> > > > > page slots.
> > > > >
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > >
> > > > That does look like a problem well worth solving!
> > > >
> > > Agree, better ideas make better final solution :)
> > >
> > > >
> > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > need a shorter timeout?
> > > >
> > > To offload faster, because otherwise we run into classical issue, it is a low
> > > memory condition state resulting in OOM.
> >
> > But doesn't each callback queued during the flood give us an opportunity
> > to react to the flood?  That will be way more fine-grained than any
> > reasonable timer, right?  Or am I missing something?
> >
> We can set the timer to zero or to current "jiffies" to initiate the
> offloading if the
> page is full. In that sense probably it make sense to propagate those two attr.
> to user space, so the user can configure min/max drain interval.
> 
> Or we can only deal with fixed interval exposed via sysfs to control it by user.
> In that case we can get rid of MIN one and just trigger a timer if the page is
> full. I think this approach is better.

Yes, I also think triggering the timer with a zero timeout is better. Can you (Vlad)
accomplish that by just calling the timer callback inline, instead of queuing
a timer? I imagine you would simply do queue_work() instead of
queue_delayed_work() in this scenario.
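
As a rough userspace sketch of that decision (the names and interval value here
are hypothetical; in the kernel this would select between queue_work() and
queue_delayed_work() on the monitor work):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical drain interval: HZ/20 jiffies at HZ=1000, i.e. 50 ms. */
#define KFREE_DRAIN_JIFFIES (1000 / 20)

/*
 * Pick the monitor-work delay: drain immediately (zero timeout) when the
 * current page is full, otherwise wait out the full batching interval.
 */
static unsigned long kfree_drain_delay(bool page_full)
{
	return page_full ? 0 : KFREE_DRAIN_JIFFIES;
}
```

A zero delay here corresponds to queuing the work for immediate execution
rather than arming a timer at all.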

> > I do agree that the action would often need to be indirect to avoid the
> > memory-allocation-state hassles, but we already can do that, either via
> > an extremely short-term hrtimer or something like irq-work.
> >
> > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > and provide faster response to the start of a callback flood?
> > > >
> > > I rely on krcp->count because not always we can store the pointer in the page
> > > slots. We can not allocate a page in the caller context thus we use page-cache
> > > worker that fills the cache in normal context. While it populates the cache,
> > > pointers temporary are queued to the linked-list.
> > >
> > > Any thoughts?
> >
> > There are a great many ways to approach this.  One of them is to maintain
> > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > counter each jiffy.
> >
> > Or am I missing a trick here?
> >
> Do you mean to have a per-cpu timer that checks the per-cpu-freed counter
> and schedule the work when if it is needed? Or i have missed your point?

I think he (Paul) is describing a 'flood detection' scheme similar to how the
bypass-list code is implemented. There he maintains a count, and only when it
exceeds a limit are callbacks queued onto the bypass list.

This code:

        // If we have advanced to a new jiffy, reset counts to allow
        // moving back from ->nocb_bypass to ->cblist.
        if (j == rdp->nocb_nobypass_last) {
                c = rdp->nocb_nobypass_count + 1;
        } else {
                WRITE_ONCE(rdp->nocb_nobypass_last, j);
                c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
                if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
                                 nocb_nobypass_lim_per_jiffy))
                        c = 0;
                else if (c > nocb_nobypass_lim_per_jiffy)
                        c = nocb_nobypass_lim_per_jiffy;
        }
        WRITE_ONCE(rdp->nocb_nobypass_count, c);


Your (Vlad's) approach OTOH is also fine by me: you check whether the page is
full and use that as a 'flood is happening' detector.
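
For illustration, the per-jiffy counting above can be mirrored in a standalone
userspace sketch (struct and threshold names are hypothetical stand-ins for the
rdp->nocb_nobypass_* fields and nocb_nobypass_lim_per_jiffy):

```c
#include <assert.h>

#define LIM_PER_JIFFY 16	/* assumed per-jiffy limit */

/* Hypothetical per-CPU state, mirroring the nocb_nobypass bookkeeping. */
struct flood_state {
	unsigned long last_jiffy;	/* jiffy of the last update */
	unsigned long count;		/* calls seen, clamped per jiffy */
};

/* Returns nonzero while the per-jiffy call rate is at or above the limit. */
static int flood_detect(struct flood_state *s, unsigned long j)
{
	unsigned long c;

	if (j == s->last_jiffy) {
		c = s->count + 1;
	} else {
		/* New jiffy: decay the counter so a burst must be sustained. */
		s->last_jiffy = j;
		c = (s->count < LIM_PER_JIFFY) ? 0 : s->count - LIM_PER_JIFFY;
		if (c > LIM_PER_JIFFY)
			c = LIM_PER_JIFFY;
	}
	s->count = c;
	return c >= LIM_PER_JIFFY;
}
```

The decay on a new jiffy is what distinguishes a sustained flood from a
one-off burst of kvfree_rcu() calls.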

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-10 16:45           ` Joel Fernandes
@ 2022-06-13  9:47             ` Uladzislau Rezki
  2022-06-14  6:42               ` Uladzislau Rezki
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-13  9:47 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney
  Cc: Uladzislau Rezki, Paul E. McKenney, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Oleksiy Avramchenko

Hello, Joel, Paul.

> Hi Vlad, Paul,
> 
> On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > is HZ/20 or each 50 milliseconds. The drawback of such approach is
> > > > > > a low utilization of page slot in some scenarios. The page can store
> > > > > > up to 512 records. For example on Android system it can look like:
> > > > > >
> > > > > > <snip>
> > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > <snip>
> > > > > >
> > > > > > where a page only carries few records to reclaim a memory. In order to
> > > > > > improve batching and make utilization more efficient the patch introduces
> > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected, in this
> > > > > > case a memory reclaim occurs more often whereas in mostly idle cases the
> > > > > > interval is set to its maximum timeout that improves the utilization of
> > > > > > page slots.
> > > > > >
> > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > >
> > > > > That does look like a problem well worth solving!
> > > > >
> > > > Agree, better ideas make better final solution :)
> > > >
> > > > >
> > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > need a shorter timeout?
> > > > >
> > > > To offload faster, because otherwise we run into classical issue, it is a low
> > > > memory condition state resulting in OOM.
> > >
> > > But doesn't each callback queued during the flood give us an opportunity
> > > to react to the flood?  That will be way more fine-grained than any
> > > reasonable timer, right?  Or am I missing something?
> > >
> > We can set the timer to zero or to current "jiffies" to initiate the
> > offloading if the
> > page is full. In that sense probably it make sense to propagate those two attr.
> > to user space, so the user can configure min/max drain interval.
> > 
> > Or we can only deal with fixed interval exposed via sysfs to control it by user.
> > In that case we can get rid of MIN one and just trigger a timer if the page is
> > full. I think this approach is better.
> 
> Yes I also think triggering timer with zero-timeout is better. Can you (Vlad)
> accomplish that by just calling the timer callback inline, instead of queuing
> a timer? I imagine you would just do queue_work() instead of
> queue_delayed_work() in this scenario.
> 
> > > I do agree that the action would often need to be indirect to avoid the
> > > memory-allocation-state hassles, but we already can do that, either via
> > > an extremely short-term hrtimer or something like irq-work.
> > >
> > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > and provide faster response to the start of a callback flood?
> > > > >
> > > > I rely on krcp->count because not always we can store the pointer in the page
> > > > slots. We can not allocate a page in the caller context thus we use page-cache
> > > > worker that fills the cache in normal context. While it populates the cache,
> > > > pointers temporary are queued to the linked-list.
> > > >
> > > > Any thoughts?
> > >
> > > There are a great many ways to approach this.  One of them is to maintain
> > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > counter each jiffy.
> > >
> > > Or am I missing a trick here?
> > >
> > Do you mean to have a per-cpu timer that checks the per-cpu-freed counter
> > and schedule the work when if it is needed? Or i have missed your point?
> 
> I think he (Paul) is describing the way 'flood detection' can work similar to how the
> bypass list code is implemented. There he maintains a count which only if
> exceeds a limit, will queue on to the bypass list.
> 
OK, I see that. We do a similar thing: we declare a flood when a page becomes
full, so that is the threshold that we pass.
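
As a rough illustration of that threshold (the struct here is a hypothetical
miniature; the real per-CPU bulk pages store up to 512 records, per the
commit log):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KVFREE_BULK_MAX_ENTR 512	/* slots per page, from the commit log */

/* Hypothetical miniature of a per-CPU bulk page of queued pointers. */
struct bulk_page {
	size_t nr;			/* pointers currently stored */
	void *records[KVFREE_BULK_MAX_ENTR];
};

/*
 * The flood signal: a page filling up means kvfree_rcu() calls are arriving
 * faster than the drain interval retires them, so the drain should be
 * expedited instead of waiting out the timeout.
 */
static bool page_full_flood(const struct bulk_page *p)
{
	return p->nr >= KVFREE_BULK_MAX_ENTR;
}
```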

> This code:
> 
>         // If we have advanced to a new jiffy, reset counts to allow
>         // moving back from ->nocb_bypass to ->cblist.
>         if (j == rdp->nocb_nobypass_last) {
>                 c = rdp->nocb_nobypass_count + 1;
>         } else {
>                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
>                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
>                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
>                                  nocb_nobypass_lim_per_jiffy))
>                         c = 0;
>                 else if (c > nocb_nobypass_lim_per_jiffy)
>                         c = nocb_nobypass_lim_per_jiffy;
>         }
>         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> 
> 
> Your (Vlad's) approach OTOH is also fine to me, you check if page is full and
> make that as a 'flood is happening' detector.
> 
OK, thank you, Joel. I also think that this way we improve batching and the
utilization of the page, which is actually the intention of the patch in question.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-13  9:47             ` Uladzislau Rezki
@ 2022-06-14  6:42               ` Uladzislau Rezki
  2022-06-15  5:12                 ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-14  6:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Paul E. McKenney, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> Hello, Joel, Paul.
> 
> > Hi Vlad, Paul,
> > 
> > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > > is HZ/20 or each 50 milliseconds. The drawback of such approach is
> > > > > > > a low utilization of page slot in some scenarios. The page can store
> > > > > > > up to 512 records. For example on Android system it can look like:
> > > > > > >
> > > > > > > <snip>
> > > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > <snip>
> > > > > > >
> > > > > > > where a page only carries few records to reclaim a memory. In order to
> > > > > > > improve batching and make utilization more efficient the patch introduces
> > > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected, in this
> > > > > > > case a memory reclaim occurs more often whereas in mostly idle cases the
> > > > > > > interval is set to its maximum timeout that improves the utilization of
> > > > > > > page slots.
> > > > > > >
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > >
> > > > > > That does look like a problem well worth solving!
> > > > > >
> > > > > Agree, better ideas make better final solution :)
> > > > >
> > > > > >
> > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > need a shorter timeout?
> > > > > >
> > > > > To offload faster, because otherwise we run into classical issue, it is a low
> > > > > memory condition state resulting in OOM.
> > > >
> > > > But doesn't each callback queued during the flood give us an opportunity
> > > > to react to the flood?  That will be way more fine-grained than any
> > > > reasonable timer, right?  Or am I missing something?
> > > >
> > > We can set the timer to zero or to current "jiffies" to initiate the
> > > offloading if the
> > > page is full. In that sense probably it make sense to propagate those two attr.
> > > to user space, so the user can configure min/max drain interval.
> > > 
> > > Or we can only deal with fixed interval exposed via sysfs to control it by user.
> > > In that case we can get rid of MIN one and just trigger a timer if the page is
> > > full. I think this approach is better.
> > 
> > Yes I also think triggering timer with zero-timeout is better. Can you (Vlad)
> > accomplish that by just calling the timer callback inline, instead of queuing
> > a timer? I imagine you would just do queue_work() instead of
> > queue_delayed_work() in this scenario.
> > 
> > > > I do agree that the action would often need to be indirect to avoid the
> > > > memory-allocation-state hassles, but we already can do that, either via
> > > > an extremely short-term hrtimer or something like irq-work.
> > > >
> > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > and provide faster response to the start of a callback flood?
> > > > > >
> > > > > I rely on krcp->count because not always we can store the pointer in the page
> > > > > slots. We can not allocate a page in the caller context thus we use page-cache
> > > > > worker that fills the cache in normal context. While it populates the cache,
> > > > > pointers temporary are queued to the linked-list.
> > > > >
> > > > > Any thoughts?
> > > >
> > > > There are a great many ways to approach this.  One of them is to maintain
> > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > counter each jiffy.
> > > >
> > > > Or am I missing a trick here?
> > > >
> > > Do you mean to have a per-cpu timer that checks the per-cpu-freed counter
> > > and schedule the work when if it is needed? Or i have missed your point?
> > 
> > I think he (Paul) is describing the way 'flood detection' can work similar to how the
> > bypass list code is implemented. There he maintains a count which only if
> > exceeds a limit, will queue on to the bypass list.
> > 
> OK, i see that. We also do similar thing. We say it is a flood - when a
> page becomes full, so it is kind of threshold that we pass.
> 
> > This code:
> > 
> >         // If we have advanced to a new jiffy, reset counts to allow
> >         // moving back from ->nocb_bypass to ->cblist.
> >         if (j == rdp->nocb_nobypass_last) {
> >                 c = rdp->nocb_nobypass_count + 1;
> >         } else {
> >                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
> >                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> >                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> >                                  nocb_nobypass_lim_per_jiffy))
> >                         c = 0;
> >                 else if (c > nocb_nobypass_lim_per_jiffy)
> >                         c = nocb_nobypass_lim_per_jiffy;
> >         }
> >         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > 
> > 
> > Your (Vlad's) approach OTOH is also fine to me, you check if page is full and
> > make that as a 'flood is happening' detector.
> > 
> OK, thank you Joel. I also think, that way we improve batching and utilization
> of the page what is actually an intention of the patch in question.
> 
Paul, will you pick up this patch?

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-14  6:42               ` Uladzislau Rezki
@ 2022-06-15  5:12                 ` Paul E. McKenney
  2022-06-15  7:32                   ` Uladzislau Rezki
  0 siblings, 1 reply; 20+ messages in thread
From: Paul E. McKenney @ 2022-06-15  5:12 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay,
	Oleksiy Avramchenko

On Tue, Jun 14, 2022 at 08:42:00AM +0200, Uladzislau Rezki wrote:
> > Hello, Joel, Paul.
> > 
> > > Hi Vlad, Paul,
> > > 
> > > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > > > is HZ/20 or each 50 milliseconds. The drawback of such approach is
> > > > > > > > a low utilization of page slot in some scenarios. The page can store
> > > > > > > > up to 512 records. For example on Android system it can look like:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > where a page only carries few records to reclaim a memory. In order to
> > > > > > > > improve batching and make utilization more efficient the patch introduces
> > > > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected, in this
> > > > > > > > case a memory reclaim occurs more often whereas in mostly idle cases the
> > > > > > > > interval is set to its maximum timeout that improves the utilization of
> > > > > > > > page slots.
> > > > > > > >
> > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > >
> > > > > > > That does look like a problem well worth solving!
> > > > > > >
> > > > > > Agree, better ideas make better final solution :)
> > > > > >
> > > > > > >
> > > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > > need a shorter timeout?
> > > > > > >
> > > > > > To offload faster, because otherwise we run into classical issue, it is a low
> > > > > > memory condition state resulting in OOM.
> > > > >
> > > > > But doesn't each callback queued during the flood give us an opportunity
> > > > > to react to the flood?  That will be way more fine-grained than any
> > > > > reasonable timer, right?  Or am I missing something?
> > > > >
> > > > We can set the timer to zero or to current "jiffies" to initiate the
> > > > offloading if the
> > > > page is full. In that sense probably it make sense to propagate those two attr.
> > > > to user space, so the user can configure min/max drain interval.
> > > > 
> > > > Or we can only deal with fixed interval exposed via sysfs to control it by user.
> > > > In that case we can get rid of MIN one and just trigger a timer if the page is
> > > > full. I think this approach is better.
> > > 
> > > Yes I also think triggering timer with zero-timeout is better. Can you (Vlad)
> > > accomplish that by just calling the timer callback inline, instead of queuing
> > > a timer? I imagine you would just do queue_work() instead of
> > > queue_delayed_work() in this scenario.
> > > 
> > > > > I do agree that the action would often need to be indirect to avoid the
> > > > > memory-allocation-state hassles, but we already can do that, either via
> > > > > an extremely short-term hrtimer or something like irq-work.
> > > > >
> > > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > > and provide faster response to the start of a callback flood?
> > > > > > >
> > > > > > I rely on krcp->count because not always we can store the pointer in the page
> > > > > > slots. We can not allocate a page in the caller context thus we use page-cache
> > > > > > worker that fills the cache in normal context. While it populates the cache,
> > > > > > pointers temporary are queued to the linked-list.
> > > > > >
> > > > > > Any thoughts?
> > > > >
> > > > > There are a great many ways to approach this.  One of them is to maintain
> > > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > > counter each jiffy.
> > > > >
> > > > > Or am I missing a trick here?
> > > > >
> > > > Do you mean to have a per-cpu timer that checks the per-cpu-freed counter
> > > > and schedule the work when if it is needed? Or i have missed your point?
> > > 
> > > I think he (Paul) is describing the way 'flood detection' can work similar to how the
> > > bypass list code is implemented. There he maintains a count which only if
> > > exceeds a limit, will queue on to the bypass list.
> > > 
> > OK, i see that. We also do similar thing. We say it is a flood - when a
> > page becomes full, so it is kind of threshold that we pass.
> > 
> > > This code:
> > > 
> > >         // If we have advanced to a new jiffy, reset counts to allow
> > >         // moving back from ->nocb_bypass to ->cblist.
> > >         if (j == rdp->nocb_nobypass_last) {
> > >                 c = rdp->nocb_nobypass_count + 1;
> > >         } else {
> > >                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
> > >                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> > >                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> > >                                  nocb_nobypass_lim_per_jiffy))
> > >                         c = 0;
> > >                 else if (c > nocb_nobypass_lim_per_jiffy)
> > >                         c = nocb_nobypass_lim_per_jiffy;
> > >         }
> > >         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > > 
> > > 
> > > Your (Vlad's) approach OTOH is also fine to me, you check if page is full and
> > > make that as a 'flood is happening' detector.
> > > 
> > OK, thank you Joel. I also think, that way we improve batching and utilization
> > of the page what is actually an intention of the patch in question.
> > 
> Paul, will you pick this patch?

I did pick up the first one:

16224f4cdf03 ("rcu/kvfree: Remove useless monitor_todo flag")

On the second one, if you use page-fill as your flood detector, can't
you simplify things by just using the one longer timeout, as discussed
in this thread?

Or did I miss a turn somewhere?

							Thanx, Paul

> Thanks!
> 
> --
> Uladzislau Rezki

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-15  5:12                 ` Paul E. McKenney
@ 2022-06-15  7:32                   ` Uladzislau Rezki
  2022-06-15 14:02                     ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Uladzislau Rezki @ 2022-06-15  7:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Joel Fernandes, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> On Tue, Jun 14, 2022 at 08:42:00AM +0200, Uladzislau Rezki wrote:
> > > Hello, Joel, Paul.
> > > 
> > > > Hi Vlad, Paul,
> > > > 
> > > > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > >
> > > > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > > > > is HZ/20, i.e. every 50 milliseconds. The drawback of such an approach is
> > > > > > > > > low utilization of page slots in some scenarios. A page can store
> > > > > > > > > up to 512 records. For example, on an Android system it can look like:
> > > > > > > > >
> > > > > > > > > <snip>
> > > > > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > > > <snip>
> > > > > > > > >
> > > > > > > > > where a page carries only a few records to reclaim. In order to
> > > > > > > > > improve batching and make utilization more efficient, the patch introduces
> > > > > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected; in this
> > > > > > > > > case memory reclaim occurs more often, whereas in mostly idle cases the
> > > > > > > > > interval is set to its maximum timeout, which improves the utilization of
> > > > > > > > > page slots.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > >
> > > > > > > > That does look like a problem well worth solving!
> > > > > > > >
> > > > > > > Agree, better ideas make better final solution :)
> > > > > > >
> > > > > > > >
> > > > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > > > need a shorter timeout?
> > > > > > > >
> > > > > > > To offload faster, because otherwise we run into the classical issue: a
> > > > > > > low-memory condition resulting in OOM.
> > > > > >
> > > > > > But doesn't each callback queued during the flood give us an opportunity
> > > > > > to react to the flood?  That will be way more fine-grained than any
> > > > > > reasonable timer, right?  Or am I missing something?
> > > > > >
> > > > > We can set the timer to zero or to the current "jiffies" to initiate the
> > > > > offloading if the page is full. In that sense it probably makes sense to
> > > > > propagate those two attributes to user space, so the user can configure the
> > > > > min/max drain interval.
> > > > > 
> > > > > Or we can deal with only a fixed interval exposed via sysfs so the user can
> > > > > control it. In that case we can get rid of the MIN one and just trigger a
> > > > > timer if the page is full. I think this approach is better.
> > > > 
> > > > Yes, I also think triggering the timer with a zero timeout is better. Can you
> > > > (Vlad) accomplish that by just calling the timer callback inline, instead of
> > > > queuing a timer? I imagine you would just do queue_work() instead of
> > > > queue_delayed_work() in this scenario.
> > > > 
> > > > > > I do agree that the action would often need to be indirect to avoid the
> > > > > > memory-allocation-state hassles, but we already can do that, either via
> > > > > > an extremely short-term hrtimer or something like irq-work.
> > > > > >
> > > > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > > > and provide faster response to the start of a callback flood?
> > > > > > > >
> > > > > > > I rely on krcp->count because we cannot always store the pointer in the page
> > > > > > > slots. We cannot allocate a page in the caller context, so we use a page-cache
> > > > > > > worker that fills the cache in normal context. While it populates the cache,
> > > > > > > pointers are temporarily queued to the linked list.
> > > > > > >
> > > > > > > Any thoughts?
> > > > > >
> > > > > > There are a great many ways to approach this.  One of them is to maintain
> > > > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > > > counter each jiffy.
> > > > > >
> > > > > > Or am I missing a trick here?
> > > > > >
> > > > > Do you mean to have a per-cpu timer that checks the per-cpu freed counter
> > > > > and schedules the work if it is needed? Or have I missed your point?
> > > > 
> > > > I think he (Paul) is describing a way 'flood detection' can work, similar to how
> > > > the bypass list code is implemented. There he maintains a count which, only if
> > > > it exceeds a limit, will queue on to the bypass list.
> > > > 
> > > OK, I see that. We also do a similar thing. We say it is a flood when a
> > > page becomes full, so it is a kind of threshold that we pass.
> > > 
> > > > This code:
> > > > 
> > > >         // If we have advanced to a new jiffy, reset counts to allow
> > > >         // moving back from ->nocb_bypass to ->cblist.
> > > >         if (j == rdp->nocb_nobypass_last) {
> > > >                 c = rdp->nocb_nobypass_count + 1;
> > > >         } else {
> > > >                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
> > > >                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> > > >                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> > > >                                  nocb_nobypass_lim_per_jiffy))
> > > >                         c = 0;
> > > >                 else if (c > nocb_nobypass_lim_per_jiffy)
> > > >                         c = nocb_nobypass_lim_per_jiffy;
> > > >         }
> > > >         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > > > 
> > > > 
> > > > Your (Vlad's) approach OTOH is also fine to me, you check if page is full and
> > > > make that as a 'flood is happening' detector.
> > > > 
> > > OK, thank you Joel. I also think that this way we improve batching and utilization
> > > of the page, which is actually the intention of the patch in question.
> > > 
> > Paul, will you pick this patch?
> 
> I did pick up the first one:
> 
> 16224f4cdf03 ("rcu/kvfree: Remove useless monitor_todo flag")
> 
> On the second one, if you use page-fill as your flood detector, can't
> you simplify things by just using the one longer timeout, as discussed
> in this thread?
> 
> Or did I miss a turn somewhere?
> 
No, you did not :) Agreed, I will simplify it with one interval corresponding
to 1 HZ. The flood is detected when a page is full. When that occurs the work
will be rearmed to run ASAP. Will resend it.

One thing that we have discussed: in case of a flood we can minimize the
memory footprint by releasing the page directly from our monitor work if
the grace period has passed for all the page slots. In that case we do not
need to move the page forward to the RCU core for later reclaim.

But those are separate patches. I will examine it when I return from
Norway.

--
Uladzislau Rezki


* Re: [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval
  2022-06-15  7:32                   ` Uladzislau Rezki
@ 2022-06-15 14:02                     ` Paul E. McKenney
  0 siblings, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2022-06-15 14:02 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay,
	Oleksiy Avramchenko

On Wed, Jun 15, 2022 at 09:32:32AM +0200, Uladzislau Rezki wrote:
> > On Tue, Jun 14, 2022 at 08:42:00AM +0200, Uladzislau Rezki wrote:
> > > > Hello, Joel, Paul.
> > > > 
> > > > > Hi Vlad, Paul,
> > > > > 
> > > > > On Thu, Jun 09, 2022 at 03:10:57PM +0200, Uladzislau Rezki wrote:
> > > > > > On Tue, Jun 7, 2022 at 5:47 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > > > >
> > > > > > > On Sun, Jun 05, 2022 at 11:10:31AM +0200, Uladzislau Rezki wrote:
> > > > > > > > > On Thu, Jun 02, 2022 at 10:06:44AM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > Currently the monitor work is scheduled with a fixed interval that
> > > > > > > > > > is HZ/20, i.e. every 50 milliseconds. The drawback of such an approach is
> > > > > > > > > > low utilization of page slots in some scenarios. A page can store
> > > > > > > > > > up to 512 records. For example, on an Android system it can look like:
> > > > > > > > > >
> > > > > > > > > > <snip>
> > > > > > > > > >   kworker/3:0-13872   [003] .... 11286.007048: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=1
> > > > > > > > > >   kworker/3:0-13872   [003] .... 11286.015638: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.051230: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.059322: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=2
> > > > > > > > > >   kworker/0:1-20052   [000] .... 11286.095295: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=2
> > > > > > > > > >   kworker/0:1-20052   [000] .... 11286.103418: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=1
> > > > > > > > > >   kworker/2:3-14372   [002] .... 11286.135155: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=2
> > > > > > > > > >   kworker/2:3-14372   [002] .... 11286.135198: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044872ffd nr_records=1
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.155377: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=5
> > > > > > > > > >   kworker/2:3-14372   [002] .... 11286.167181: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000026522604 nr_records=5
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.179202: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000008ef95e14 nr_records=1
> > > > > > > > > >   kworker/2:3-14372   [002] .... 11286.187398: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000c597d297 nr_records=6
> > > > > > > > > >   kworker/3:0-13872   [003] .... 11286.187445: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000050bf92e2 nr_records=3
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.198975: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000cbcf05db nr_records=4
> > > > > > > > > >   kworker/1:2-20434   [001] .... 11286.207203: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000095ed6fca nr_records=4
> > > > > > > > > > <snip>
> > > > > > > > > >
> > > > > > > > > > where a page carries only a few records to reclaim. In order to
> > > > > > > > > > improve batching and make utilization more efficient, the patch introduces
> > > > > > > > > > a drain interval that can be set either to KFREE_DRAIN_JIFFIES_MAX or
> > > > > > > > > > KFREE_DRAIN_JIFFIES_MIN. It is adjusted if a flood is detected; in this
> > > > > > > > > > case memory reclaim occurs more often, whereas in mostly idle cases the
> > > > > > > > > > interval is set to its maximum timeout, which improves the utilization of
> > > > > > > > > > page slots.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > >
> > > > > > > > > That does look like a problem well worth solving!
> > > > > > > > >
> > > > > > > > Agree, better ideas make better final solution :)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > But I am missing one thing. If we are having a callback flood, why do we
> > > > > > > > > need a shorter timeout?
> > > > > > > > >
> > > > > > > > To offload faster, because otherwise we run into the classical issue: a
> > > > > > > > low-memory condition resulting in OOM.
> > > > > > >
> > > > > > > But doesn't each callback queued during the flood give us an opportunity
> > > > > > > to react to the flood?  That will be way more fine-grained than any
> > > > > > > reasonable timer, right?  Or am I missing something?
> > > > > > >
> > > > > > We can set the timer to zero or to the current "jiffies" to initiate the
> > > > > > offloading if the page is full. In that sense it probably makes sense to
> > > > > > propagate those two attributes to user space, so the user can configure the
> > > > > > min/max drain interval.
> > > > > > 
> > > > > > Or we can deal with only a fixed interval exposed via sysfs so the user can
> > > > > > control it. In that case we can get rid of the MIN one and just trigger a
> > > > > > timer if the page is full. I think this approach is better.
> > > > > 
> > > > > Yes, I also think triggering the timer with a zero timeout is better. Can you
> > > > > (Vlad) accomplish that by just calling the timer callback inline, instead of
> > > > > queuing a timer? I imagine you would just do queue_work() instead of
> > > > > queue_delayed_work() in this scenario.
> > > > > 
> > > > > > > I do agree that the action would often need to be indirect to avoid the
> > > > > > > memory-allocation-state hassles, but we already can do that, either via
> > > > > > > an extremely short-term hrtimer or something like irq-work.
> > > > > > >
> > > > > > > > > Wouldn't a check on the number of blocks queued be simpler, more direct,
> > > > > > > > > and provide faster response to the start of a callback flood?
> > > > > > > > >
> > > > > > > > I rely on krcp->count because we cannot always store the pointer in the page
> > > > > > > > slots. We cannot allocate a page in the caller context, so we use a page-cache
> > > > > > > > worker that fills the cache in normal context. While it populates the cache,
> > > > > > > > pointers are temporarily queued to the linked list.
> > > > > > > >
> > > > > > > > Any thoughts?
> > > > > > >
> > > > > > > There are a great many ways to approach this.  One of them is to maintain
> > > > > > > a per-CPU free-running counter of kvfree_rcu() calls, and to reset this
> > > > > > > counter each jiffy.
> > > > > > >
> > > > > > > Or am I missing a trick here?
> > > > > > >
> > > > > > Do you mean to have a per-cpu timer that checks the per-cpu freed counter
> > > > > > and schedules the work if it is needed? Or have I missed your point?
> > > > > 
> > > > > I think he (Paul) is describing a way 'flood detection' can work, similar to how
> > > > > the bypass list code is implemented. There he maintains a count which, only if
> > > > > it exceeds a limit, will queue on to the bypass list.
> > > > > 
> > > > OK, I see that. We also do a similar thing. We say it is a flood when a
> > > > page becomes full, so it is a kind of threshold that we pass.
> > > > 
> > > > > This code:
> > > > > 
> > > > >         // If we have advanced to a new jiffy, reset counts to allow
> > > > >         // moving back from ->nocb_bypass to ->cblist.
> > > > >         if (j == rdp->nocb_nobypass_last) {
> > > > >                 c = rdp->nocb_nobypass_count + 1;
> > > > >         } else {
> > > > >                 WRITE_ONCE(rdp->nocb_nobypass_last, j);
> > > > >                 c = rdp->nocb_nobypass_count - nocb_nobypass_lim_per_jiffy;
> > > > >                 if (ULONG_CMP_LT(rdp->nocb_nobypass_count,
> > > > >                                  nocb_nobypass_lim_per_jiffy))
> > > > >                         c = 0;
> > > > >                 else if (c > nocb_nobypass_lim_per_jiffy)
> > > > >                         c = nocb_nobypass_lim_per_jiffy;
> > > > >         }
> > > > >         WRITE_ONCE(rdp->nocb_nobypass_count, c);
> > > > > 
> > > > > 
> > > > > Your (Vlad's) approach OTOH is also fine to me, you check if page is full and
> > > > > make that as a 'flood is happening' detector.
> > > > > 
> > > > OK, thank you Joel. I also think that this way we improve batching and utilization
> > > > of the page, which is actually the intention of the patch in question.
> > > > 
> > > Paul, will you pick this patch?
> > 
> > I did pick up the first one:
> > 
> > 16224f4cdf03 ("rcu/kvfree: Remove useless monitor_todo flag")
> > 
> > On the second one, if you use page-fill as your flood detector, can't
> > you simplify things by just using the one longer timeout, as discussed
> > in this thread?
> > 
> > Or did I miss a turn somewhere?
> > 
> No, you did not :) Agreed, I will simplify it with one interval corresponding
> to 1 HZ. The flood is detected when a page is full. When that occurs the work
> will be rearmed to run ASAP. Will resend it.

Very good, and looking forward to it!

> One thing that we have discussed: in case of a flood we can minimize the
> memory footprint by releasing the page directly from our monitor work if
> the grace period has passed for all the page slots. In that case we do not
> need to move the page forward to the RCU core for later reclaim.
> 
> But those are separate patches. I will examine it when I return from
> Norway.

Agreed, those should be separate patches.

							Thanx, Paul


end of thread, other threads:[~2022-06-15 14:03 UTC | newest]

Thread overview: 20+ messages
2022-06-02  8:06 [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Uladzislau Rezki (Sony)
2022-06-02  8:06 ` [PATCH 2/2] rcu/kvfree: Introduce KFREE_DRAIN_JIFFIES_[MAX/MIN] interval Uladzislau Rezki (Sony)
2022-06-02 23:32   ` Joel Fernandes
2022-06-03  9:55     ` Uladzislau Rezki
2022-06-04  3:03       ` Joel Fernandes
2022-06-04 15:51   ` Paul E. McKenney
2022-06-05  9:10     ` Uladzislau Rezki
2022-06-07  3:47       ` Paul E. McKenney
2022-06-09 13:10         ` Uladzislau Rezki
2022-06-10 16:45           ` Joel Fernandes
2022-06-13  9:47             ` Uladzislau Rezki
2022-06-14  6:42               ` Uladzislau Rezki
2022-06-15  5:12                 ` Paul E. McKenney
2022-06-15  7:32                   ` Uladzislau Rezki
2022-06-15 14:02                     ` Paul E. McKenney
2022-06-02 23:43 ` [PATCH 1/2] rcu/kvfree: Remove useless monitor_todo flag Joel Fernandes
2022-06-03  9:51   ` Uladzislau Rezki
2022-06-04  3:07     ` Joel Fernandes
2022-06-04 16:03   ` Paul E. McKenney
2022-06-05  8:52     ` Uladzislau Rezki
