rcu.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] Remove unnecessary RCU grace period chaining
@ 2022-10-11  7:11 Hou Tao
  2022-10-11  7:11 ` [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period Hou Tao
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Hou Tao @ 2022-10-11  7:11 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, Delyan Kratunov, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

Currently bpf uses RCU grace period chaining to wait for the completion
of accesses from both sleepable and non-sleepable bpf programs: it first
calls call_rcu_tasks_trace() to wait for an RCU-tasks-trace grace
period, and then in its callback calls call_rcu() or kfree_rcu() to wait
for a normal RCU grace period.

In the current implementation of RCU Tasks Trace, ->postscan_func() is
invoked to wait for one RCU-tasks-trace grace period, and
rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
one normal RCU grace period, so one RCU-tasks-trace grace period implies
one normal RCU grace period. This also means there is no need to do
call_rcu() or kfree_rcu() again in the callback of call_rcu_tasks_trace()
in these grace period chains, so just remove the unnecessary call_rcu()
and kfree_rcu() calls.
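The before/after chaining pattern can be modeled in plain C with the
kernel APIs stubbed out (the stubs run callbacks immediately and just
count the grace periods waited; everything here other than the
call_rcu*() names is made up for this sketch):

```c
/* Userspace model: callbacks run immediately, grace periods are counted. */
struct rcu_head { void (*func)(struct rcu_head *); };

static int gps_waited;
static int freed;

static void call_rcu(struct rcu_head *h, void (*f)(struct rcu_head *))
{
	gps_waited += 1;	/* one normal RCU grace period */
	f(h);
}

static void call_rcu_tasks_trace(struct rcu_head *h,
				 void (*f)(struct rcu_head *))
{
	gps_waited += 2;	/* tasks-trace GP already includes synchronize_rcu() */
	f(h);
}

struct obj { struct rcu_head rcu; };

static void obj_free(struct rcu_head *h) { freed++; }	/* kfree() in the kernel */

/* Old scheme: chain a normal RCU GP after the tasks-trace GP. */
static void obj_free_chained(struct rcu_head *h) { call_rcu(h, obj_free); }
static void free_obj_old(struct obj *o)
{
	call_rcu_tasks_trace(&o->rcu, obj_free_chained);
}

/* New scheme: the tasks-trace GP alone already covers both. */
static void free_obj_new(struct obj *o)
{
	call_rcu_tasks_trace(&o->rcu, obj_free);
}
```

In the model, the old scheme waits for three grace periods worth of
delay while the new scheme waits for two, which is the saving the series
is after.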

Comments are always welcome.

Hou Tao (3):
  bpf: Free elements after one RCU-tasks-trace grace period
  bpf: Free local storage memory after one RCU-tasks-trace grace period
  bpf: Free trace program array after one RCU-tasks-trace grace period

 kernel/bpf/bpf_local_storage.c | 10 ++++++++--
 kernel/bpf/core.c              |  5 ++++-
 kernel/bpf/memalloc.c          | 17 ++++++-----------
 3 files changed, 18 insertions(+), 14 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-11  7:11 [PATCH bpf-next 0/3] Remove unnecessary RCU grace period chaining Hou Tao
@ 2022-10-11  7:11 ` Hou Tao
  2022-10-11  9:07   ` Paul E. McKenney
  2022-10-11  7:11 ` [PATCH bpf-next 2/3] bpf: Free local storage memory " Hou Tao
  2022-10-11  7:11 ` [PATCH bpf-next 3/3] bpf: Free trace program array " Hou Tao
  2 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-11  7:11 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, Delyan Kratunov, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

In the current implementation of RCU Tasks Trace, ->postscan_func() is
invoked to wait for one RCU-tasks-trace grace period, and
rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
one normal RCU grace period, so one RCU-tasks-trace grace period implies
one RCU grace period.

So there is no need to do call_rcu() again in the callback of
call_rcu_tasks_trace(); it can just free these elements directly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/memalloc.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 5f83be1d2018..6f32dddc804f 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
 	kfree(obj);
 }
 
+/* Now RCU Tasks grace period implies RCU grace period, so no need to do
+ * an extra call_rcu().
+ */
 static void __free_rcu(struct rcu_head *head)
 {
 	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
@@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
 	atomic_set(&c->call_rcu_in_progress, 0);
 }
 
-static void __free_rcu_tasks_trace(struct rcu_head *head)
-{
-	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
-
-	call_rcu(&c->rcu, __free_rcu);
-}
-
 static void enque_to_free(struct bpf_mem_cache *c, void *obj)
 {
 	struct llist_node *llnode = obj;
@@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
 		 * from __free_rcu() and from drain_mem_cache().
 		 */
 		__llist_add(llnode, &c->waiting_for_gp);
-	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
-	 * Then use call_rcu() to wait for normal progs to finish
-	 * and finally do free_one() on each element.
+	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
+	 * progs to finish and finally do free_one() on each element.
 	 */
-	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
+	call_rcu_tasks_trace(&c->rcu, __free_rcu);
 }
 
 static void free_bulk(struct bpf_mem_cache *c)
-- 
2.29.2



* [PATCH bpf-next 2/3] bpf: Free local storage memory after one RCU-tasks-trace grace period
  2022-10-11  7:11 [PATCH bpf-next 0/3] Remove unnecessary RCU grace period chaining Hou Tao
  2022-10-11  7:11 ` [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period Hou Tao
@ 2022-10-11  7:11 ` Hou Tao
  2022-10-11  9:09   ` Paul E. McKenney
  2022-10-11  7:11 ` [PATCH bpf-next 3/3] bpf: Free trace program array " Hou Tao
  2 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-11  7:11 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, Delyan Kratunov, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

A local storage map is accessible to both sleepable and normal bpf
programs, so its memory is freed by using both call_rcu_tasks_trace()
and kfree_rcu() to wait for both an RCU-tasks-trace grace period and a
normal RCU grace period to expire.

However, according to the current implementation of RCU-tasks-trace, one
RCU-tasks-trace grace period waits for one normal RCU grace period, so
there is no need to call kfree_rcu(); it is safe to call kfree()
directly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/bpf_local_storage.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 802fc15b0d73..18a2dd611635 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -84,20 +84,26 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
 	return NULL;
 }
 
+/* Now RCU Tasks grace period implies RCU grace period, so no need to call
+ * kfree_rcu(), just call kfree() directly.
+ */
 void bpf_local_storage_free_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage *local_storage;
 
 	local_storage = container_of(rcu, struct bpf_local_storage, rcu);
-	kfree_rcu(local_storage, rcu);
+	kfree(local_storage);
 }
 
+/* Now RCU Tasks grace period implies RCU grace period, so no need to call
+ * kfree_rcu(), just call kfree() directly.
+ */
 static void bpf_selem_free_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage_elem *selem;
 
 	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
-	kfree_rcu(selem, rcu);
+	kfree(selem);
 }
 
 /* local_storage->lock must be held and selem->local_storage == local_storage.
-- 
2.29.2



* [PATCH bpf-next 3/3] bpf: Free trace program array after one RCU-tasks-trace grace period
  2022-10-11  7:11 [PATCH bpf-next 0/3] Remove unnecessary RCU grace period chaining Hou Tao
  2022-10-11  7:11 ` [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period Hou Tao
  2022-10-11  7:11 ` [PATCH bpf-next 2/3] bpf: Free local storage memory " Hou Tao
@ 2022-10-11  7:11 ` Hou Tao
  2022-10-11  9:09   ` Paul E. McKenney
  2 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-11  7:11 UTC (permalink / raw)
  To: bpf
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Paul E . McKenney, Delyan Kratunov, rcu, houtao1

From: Hou Tao <houtao1@huawei.com>

To support sleepable uprobe bpf programs, the freeing of the trace
program array chains an RCU-tasks-trace grace period with a normal RCU
grace period. But in the current implementation of RCU-tasks-trace, one
RCU-tasks-trace grace period implies one normal RCU grace period, so
such chaining is not needed and it is safe to free the array in the
callback of call_rcu_tasks_trace().

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 711fd293b6de..f943620b55b0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2247,12 +2247,15 @@ void bpf_prog_array_free(struct bpf_prog_array *progs)
 	kfree_rcu(progs, rcu);
 }
 
+/* Now RCU Tasks grace period implies RCU grace period, so no need to call
+ * kfree_rcu(), just call kfree() directly.
+ */
 static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
 {
 	struct bpf_prog_array *progs;
 
 	progs = container_of(rcu, struct bpf_prog_array, rcu);
-	kfree_rcu(progs, rcu);
+	kfree(progs);
 }
 
 void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
-- 
2.29.2



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-11  7:11 ` [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period Hou Tao
@ 2022-10-11  9:07   ` Paul E. McKenney
  2022-10-11 11:31     ` Hou Tao
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-11  9:07 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> In the current implementation of RCU Tasks Trace, ->postscan_func() is
> invoked to wait for one RCU-tasks-trace grace period, and
> rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
> one normal RCU grace period, so one RCU-tasks-trace grace period implies
> one RCU grace period.
> 
> So there is no need to do call_rcu() again in the callback of
> call_rcu_tasks_trace(); it can just free these elements directly.

This is true, but this is an implementation detail that is not guaranteed
in future versions of the kernel.  But if this additional call_rcu()
is causing trouble, I could add some API member that returned true in
kernels where it does happen to be the case that call_rcu_tasks_trace()
implies a call_rcu()-style grace period.

The BPF memory allocator could then complain or adapt, as appropriate.

Thoughts?

							Thanx, Paul

> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/memalloc.c | 17 ++++++-----------
>  1 file changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 5f83be1d2018..6f32dddc804f 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>  	kfree(obj);
>  }
>  
> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
> + * an extra call_rcu().
> + */
>  static void __free_rcu(struct rcu_head *head)
>  {
>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>  	atomic_set(&c->call_rcu_in_progress, 0);
>  }
>  
> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> -{
> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> -
> -	call_rcu(&c->rcu, __free_rcu);
> -}
> -
>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>  {
>  	struct llist_node *llnode = obj;
> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>  		 * from __free_rcu() and from drain_mem_cache().
>  		 */
>  		__llist_add(llnode, &c->waiting_for_gp);
> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> -	 * Then use call_rcu() to wait for normal progs to finish
> -	 * and finally do free_one() on each element.
> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
> +	 * progs to finish and finally do free_one() on each element.
>  	 */
> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>  }
>  
>  static void free_bulk(struct bpf_mem_cache *c)
> -- 
> 2.29.2
> 


* Re: [PATCH bpf-next 2/3] bpf: Free local storage memory after one RCU-tasks-trace grace period
  2022-10-11  7:11 ` [PATCH bpf-next 2/3] bpf: Free local storage memory " Hou Tao
@ 2022-10-11  9:09   ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-11  9:09 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Tue, Oct 11, 2022 at 03:11:27PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> A local storage map is accessible to both sleepable and normal bpf
> programs, so its memory is freed by using both call_rcu_tasks_trace()
> and kfree_rcu() to wait for both an RCU-tasks-trace grace period and a
> normal RCU grace period to expire.
> 
> However, according to the current implementation of RCU-tasks-trace, one
> RCU-tasks-trace grace period waits for one normal RCU grace period, so
> there is no need to call kfree_rcu(); it is safe to call kfree()
> directly.

Again, this is true, but this is an implementation detail that is not
guaranteed in future versions of the kernel.  But if this additional
call_rcu() is causing trouble, I could add some API member that
returned true in kernels where it does happen to be the case that
call_rcu_tasks_trace() implies a call_rcu()-style grace period.

The BPF local storage code could then complain or adapt, as appropriate.

Again, thoughts?

							Thanx, Paul

> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/bpf_local_storage.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index 802fc15b0d73..18a2dd611635 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -84,20 +84,26 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>  	return NULL;
>  }
>  
> +/* Now RCU Tasks grace period implies RCU grace period, so no need to call
> + * kfree_rcu(), just call kfree() directly.
> + */
>  void bpf_local_storage_free_rcu(struct rcu_head *rcu)
>  {
>  	struct bpf_local_storage *local_storage;
>  
>  	local_storage = container_of(rcu, struct bpf_local_storage, rcu);
> -	kfree_rcu(local_storage, rcu);
> +	kfree(local_storage);
>  }
>  
> +/* Now RCU Tasks grace period implies RCU grace period, so no need to call
> + * kfree_rcu(), just call kfree() directly.
> + */
>  static void bpf_selem_free_rcu(struct rcu_head *rcu)
>  {
>  	struct bpf_local_storage_elem *selem;
>  
>  	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
> -	kfree_rcu(selem, rcu);
> +	kfree(selem);
>  }
>  
>  /* local_storage->lock must be held and selem->local_storage == local_storage.
> -- 
> 2.29.2
> 


* Re: [PATCH bpf-next 3/3] bpf: Free trace program array after one RCU-tasks-trace grace period
  2022-10-11  7:11 ` [PATCH bpf-next 3/3] bpf: Free trace program array " Hou Tao
@ 2022-10-11  9:09   ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-11  9:09 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Alexei Starovoitov, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Tue, Oct 11, 2022 at 03:11:28PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> To support sleepable uprobe bpf programs, the freeing of the trace
> program array chains an RCU-tasks-trace grace period with a normal RCU
> grace period. But in the current implementation of RCU-tasks-trace, one
> RCU-tasks-trace grace period implies one normal RCU grace period, so
> such chaining is not needed and it is safe to free the array in the
> callback of call_rcu_tasks_trace().

And the same here.  ;-)

							Thanx, Paul

> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/core.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 711fd293b6de..f943620b55b0 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2247,12 +2247,15 @@ void bpf_prog_array_free(struct bpf_prog_array *progs)
>  	kfree_rcu(progs, rcu);
>  }
>  
> +/* Now RCU Tasks grace period implies RCU grace period, so no need to call
> + * kfree_rcu(), just call kfree() directly.
> + */
>  static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
>  {
>  	struct bpf_prog_array *progs;
>  
>  	progs = container_of(rcu, struct bpf_prog_array, rcu);
> -	kfree_rcu(progs, rcu);
> +	kfree(progs);
>  }
>  
>  void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
> -- 
> 2.29.2
> 


* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-11  9:07   ` Paul E. McKenney
@ 2022-10-11 11:31     ` Hou Tao
  2022-10-12  6:36       ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-11 11:31 UTC (permalink / raw)
  To: paulmck, Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Delyan Kratunov, rcu, houtao1

Hi,

On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> In the current implementation of RCU Tasks Trace, ->postscan_func() is
>> invoked to wait for one RCU-tasks-trace grace period, and
>> rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
>> one normal RCU grace period, so one RCU-tasks-trace grace period implies
>> one RCU grace period.
>>
>> So there is no need to do call_rcu() again in the callback of
>> call_rcu_tasks_trace(); it can just free these elements directly.
> This is true, but this is an implementation detail that is not guaranteed
> in future versions of the kernel.  But if this additional call_rcu()
> is causing trouble, I could add some API member that returned true in
> kernels where it does happen to be the case that call_rcu_tasks_trace()
> implies a call_rcu()-style grace period.
>
> The BPF memory allocator could then complain or adapt, as appropriate.
>
> Thoughts?
It is indeed an implementation detail. But in an idle KVM guest, for
memory reclamation in the bpf memory allocator an RCU-tasks-trace grace
period is about 30 ms and a normal RCU grace period is about 20 ms.
Under stress the grace periods will be much longer. If the extra RCU
grace period can be removed, this memory can be reclaimed more quickly,
which is beneficial under memory pressure.

Now it seems we can use the RCU polling APIs (e.g.
get_state_synchronize_rcu() and poll_state_synchronize_rcu()) to check
whether a normal RCU grace period has passed. But that requires adding
at least one unsigned long to the object being freed. The extra memory
overhead may be OK for the bpf memory allocator, but it is not for small
objects. So could you please show an example of how these new APIs work?
Do they need to modify the to-be-freed object?
>
> 							Thanx, Paul
>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 5f83be1d2018..6f32dddc804f 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>  	kfree(obj);
>>  }
>>  
>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>> + * an extra call_rcu().
>> + */
>>  static void __free_rcu(struct rcu_head *head)
>>  {
>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>  }
>>  
>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>> -{
>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> -
>> -	call_rcu(&c->rcu, __free_rcu);
>> -}
>> -
>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>  {
>>  	struct llist_node *llnode = obj;
>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>  		 * from __free_rcu() and from drain_mem_cache().
>>  		 */
>>  		__llist_add(llnode, &c->waiting_for_gp);
>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>> -	 * Then use call_rcu() to wait for normal progs to finish
>> -	 * and finally do free_one() on each element.
>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>> +	 * progs to finish and finally do free_one() on each element.
>>  	 */
>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>  }
>>  
>>  static void free_bulk(struct bpf_mem_cache *c)
>> -- 
>> 2.29.2
>>



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-11 11:31     ` Hou Tao
@ 2022-10-12  6:36       ` Paul E. McKenney
  2022-10-12  9:26         ` Hou Tao
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-12  6:36 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Tue, Oct 11, 2022 at 07:31:28PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> > On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> In the current implementation of RCU Tasks Trace, ->postscan_func() is
> >> invoked to wait for one RCU-tasks-trace grace period, and
> >> rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
> >> one normal RCU grace period, so one RCU-tasks-trace grace period implies
> >> one RCU grace period.
> >>
> >> So there is no need to do call_rcu() again in the callback of
> >> call_rcu_tasks_trace(); it can just free these elements directly.
> > This is true, but this is an implementation detail that is not guaranteed
> > in future versions of the kernel.  But if this additional call_rcu()
> > is causing trouble, I could add some API member that returned true in
> > kernels where it does happen to be the case that call_rcu_tasks_trace()
> > implies a call_rcu()-style grace period.
> >
> > The BPF memory allocator could then complain or adapt, as appropriate.
> >
> > Thoughts?
> It is indeed an implementation detail. But in an idle KVM guest, for
> memory reclamation in the bpf memory allocator an RCU-tasks-trace grace
> period is about 30 ms and a normal RCU grace period is about 20 ms.
> Under stress the grace periods will be much longer. If the extra RCU
> grace period can be removed, this memory can be reclaimed more quickly,
> which is beneficial under memory pressure.

I understand the benefits.  We just need to get a safe way to take
advantage of them.

> Now it seems we can use the RCU polling APIs (e.g.
> get_state_synchronize_rcu() and poll_state_synchronize_rcu()) to check
> whether a normal RCU grace period has passed. But that requires adding
> at least one unsigned long to the object being freed. The extra memory
> overhead may be OK for the bpf memory allocator, but it is not for small
> objects. So could you please show an example of how these new APIs work?
> Do they need to modify the to-be-freed object?

Good point on the polling APIs, more on this below.

I was thinking in terms of an API like this:

	static inline bool rcu_trace_implies_rcu_gp(void)
	{
		return true;
	}

Along with comments on the synchronize_rcu() pointing at the
rcu_trace_implies_rcu_gp().

Another approach is to wait for the grace periods concurrently, but this
requires not one but two rcu_head structures.
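A sketch of what that concurrent variant might look like, modeled in
userspace with the two call_rcu*() APIs stubbed to run callbacks
immediately so the refcount logic can be shown standalone (the struct
layout and helper names are assumptions for illustration, not an
existing kernel API):

```c
#include <stdatomic.h>
#include <stddef.h>

struct rcu_head { void (*func)(struct rcu_head *); };

/* Stubs: in the kernel these defer f(h) until after the respective GP. */
static void call_rcu(struct rcu_head *h, void (*f)(struct rcu_head *)) { f(h); }
static void call_rcu_tasks_trace(struct rcu_head *h,
				 void (*f)(struct rcu_head *)) { f(h); }

struct obj {
	struct rcu_head rcu;		/* for call_rcu() */
	struct rcu_head rcu_trace;	/* for call_rcu_tasks_trace() */
	atomic_int gp_pending;		/* grace periods still outstanding */
};

static int freed;

static void obj_gp_done(struct obj *o)
{
	/* Free only once both grace periods have elapsed. */
	if (atomic_fetch_sub(&o->gp_pending, 1) == 1)
		freed++;		/* kfree(o) in the kernel */
}

static void obj_rcu_cb(struct rcu_head *h)
{
	obj_gp_done((struct obj *)((char *)h - offsetof(struct obj, rcu)));
}

static void obj_rcu_trace_cb(struct rcu_head *h)
{
	obj_gp_done((struct obj *)((char *)h - offsetof(struct obj, rcu_trace)));
}

static void free_obj_concurrent(struct obj *o)
{
	atomic_init(&o->gp_pending, 2);
	call_rcu(&o->rcu, obj_rcu_cb);
	call_rcu_tasks_trace(&o->rcu_trace, obj_rcu_trace_cb);
}
```

The total wait becomes max() of the two grace periods instead of their
sum, at the cost of the second rcu_head and the counter.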

Back to the polling API.  Are these things freed individually, or can
they be grouped?  If they can be grouped, the storage for the grace-period
state could be associated with the group.
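One way the grouped polling might look, sketched in userspace with the
polling APIs modeled by a single grace-period counter (the batch
structure and helper names are made up for illustration; the real
get_state_synchronize_rcu()/poll_state_synchronize_rcu() hand back and
accept an opaque cookie in the same way):

```c
#include <stdbool.h>

/* Userspace model of the RCU polling APIs: one counter stands in for
 * the grace-period sequence. */
static unsigned long rcu_gp_seq;

static unsigned long get_state_synchronize_rcu(void)
{
	return rcu_gp_seq + 1;		/* cookie: the GP that must complete */
}

static bool poll_state_synchronize_rcu(unsigned long cookie)
{
	return rcu_gp_seq >= cookie;	/* has that GP completed yet? */
}

static void model_rcu_gp_elapse(void) { rcu_gp_seq++; }	/* model only */

/* One cookie per batch of objects, so the unsigned long of state is
 * amortized over the group rather than paid per small object. */
struct free_batch {
	unsigned long rcu_cookie;
	/* an llist of to-be-freed objects would hang off here */
};

static void batch_record_gp(struct free_batch *b)
{
	b->rcu_cookie = get_state_synchronize_rcu();
}

static bool batch_can_free(const struct free_batch *b)
{
	return poll_state_synchronize_rcu(b->rcu_cookie);
}
```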

							Thanx, Paul

> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  kernel/bpf/memalloc.c | 17 ++++++-----------
> >>  1 file changed, 6 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >> index 5f83be1d2018..6f32dddc804f 100644
> >> --- a/kernel/bpf/memalloc.c
> >> +++ b/kernel/bpf/memalloc.c
> >> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
> >>  	kfree(obj);
> >>  }
> >>  
> >> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
> >> + * an extra call_rcu().
> >> + */
> >>  static void __free_rcu(struct rcu_head *head)
> >>  {
> >>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
> >>  	atomic_set(&c->call_rcu_in_progress, 0);
> >>  }
> >>  
> >> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> >> -{
> >> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >> -
> >> -	call_rcu(&c->rcu, __free_rcu);
> >> -}
> >> -
> >>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
> >>  {
> >>  	struct llist_node *llnode = obj;
> >> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
> >>  		 * from __free_rcu() and from drain_mem_cache().
> >>  		 */
> >>  		__llist_add(llnode, &c->waiting_for_gp);
> >> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> >> -	 * Then use call_rcu() to wait for normal progs to finish
> >> -	 * and finally do free_one() on each element.
> >> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
> >> +	 * progs to finish and finally do free_one() on each element.
> >>  	 */
> >> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
> >> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
> >>  }
> >>  
> >>  static void free_bulk(struct bpf_mem_cache *c)
> >> -- 
> >> 2.29.2
> >>
> 


* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-12  6:36       ` Paul E. McKenney
@ 2022-10-12  9:26         ` Hou Tao
  2022-10-12 16:11           ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-12  9:26 UTC (permalink / raw)
  To: paulmck, Alexei Starovoitov
  Cc: bpf, Martin KaFai Lau, Andrii Nakryiko, Song Liu, Hao Luo,
	Yonghong Song, Daniel Borkmann, KP Singh, David S . Miller,
	Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
	Delyan Kratunov, rcu, houtao1

Hi,

On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
> On Tue, Oct 11, 2022 at 07:31:28PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
>>> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> In the current implementation of RCU Tasks Trace, ->postscan_func() is
>>>> invoked to wait for one RCU-tasks-trace grace period, and
>>>> rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
>>>> one normal RCU grace period, so one RCU-tasks-trace grace period implies
>>>> one RCU grace period.
>>>>
>>>> So there is no need to do call_rcu() again in the callback of
>>>> call_rcu_tasks_trace(); it can just free these elements directly.
>>> This is true, but this is an implementation detail that is not guaranteed
>>> in future versions of the kernel.  But if this additional call_rcu()
>>> is causing trouble, I could add some API member that returned true in
>>> kernels where it does happen to be the case that call_rcu_tasks_trace()
>>> implies a call_rcu()-style grace period.
>>>
>>> The BPF memory allocator could then complain or adapt, as appropriate.
>>>
>>> Thoughts?
>> It is indeed an implementation detail. But in an idle KVM guest, for
>> memory reclamation in the bpf memory allocator an RCU-tasks-trace grace
>> period is about 30 ms and a normal RCU grace period is about 20 ms.
>> Under stress the grace periods will be much longer. If the extra RCU
>> grace period can be removed, this memory can be reclaimed more quickly,
>> which is beneficial under memory pressure.
> I understand the benefits.  We just need to get a safe way to take
> advantage of them.
>
>> Now it seems we can use the RCU polling APIs (e.g.
>> get_state_synchronize_rcu() and poll_state_synchronize_rcu()) to check
>> whether a normal RCU grace period has passed. But that requires adding
>> at least one unsigned long to the object being freed. The extra memory
>> overhead may be OK for the bpf memory allocator, but it is not for small
>> objects. So could you please show an example of how these new APIs work?
>> Do they need to modify the to-be-freed object?
> Good point on the polling APIs, more on this below.
>
> I was thinking in terms of an API like this:
>
> 	static inline bool rcu_trace_implies_rcu_gp(void)
> 	{
> 		return true;
> 	}
>
> Along with comments on the synchronize_rcu() pointing at the
> rcu_trace_implies_rcu_gp().
It is a simple API and the modifications for call_rcu_tasks_trace()
users will also be simple. The callback of call_rcu_tasks_trace() will
invoke rcu_trace_implies_rcu_gp(): if it returns true, the callback
invokes the call_rcu() callback directly; otherwise it does so through
call_rcu().
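That adaptation can be sketched as follows (userspace model with the
kernel calls stubbed so the control flow is standalone; only the
rcu_trace_implies_rcu_gp() and call_rcu*() names come from the
discussion, the rest is made up):

```c
#include <stdbool.h>

struct rcu_head { void (*func)(struct rcu_head *); };

static int rcu_chained;	/* counts extra call_rcu() chainings */
static int freed;

/* Stub: in the kernel this defers f(h) until after a normal RCU GP. */
static void call_rcu(struct rcu_head *h, void (*f)(struct rcu_head *))
{
	rcu_chained++;
	f(h);
}

/* The proposed API: true in kernels where a tasks-trace grace period
 * implies a normal RCU grace period. */
static inline bool rcu_trace_implies_rcu_gp(void)
{
	return true;
}

static void free_cb(struct rcu_head *h) { freed++; }	/* kfree() in the kernel */

/* Callback passed to call_rcu_tasks_trace(). */
static void tasks_trace_cb(struct rcu_head *h)
{
	if (rcu_trace_implies_rcu_gp())
		free_cb(h);		/* normal GP already implied: free now */
	else
		call_rcu(h, free_cb);	/* otherwise keep the chaining */
}
```

With the API returning true, the extra call_rcu() and its grace period
disappear; with it returning false, the old chained behavior is kept.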
>
> Another approach is to wait for the grace periods concurrently, but this
> requires not one but two rcu_head structures.
Besides the extra space usage, does it also complicate the logic in the
callback function, because it needs to handle the concurrency between
the two grace periods?
>
> Back to the polling API.  Are these things freed individually, or can
> they be grouped?  If they can be grouped, the storage for the grace-period
> state could be associated with the group.
As said above, it may be OK for the bpf memory allocator because it
frees elements in batches, but a bpf local storage and its elements are
freed individually. So I think that if the implication between the RCU
tasks trace grace period and the normal RCU grace period will not change
in the foreseeable future, adding rcu_trace_implies_rcu_gp() and using
it in bpf is a good idea. What do you think?
>
> 							Thanx, Paul
>
>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>> ---
>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>> index 5f83be1d2018..6f32dddc804f 100644
>>>> --- a/kernel/bpf/memalloc.c
>>>> +++ b/kernel/bpf/memalloc.c
>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>>>  	kfree(obj);
>>>>  }
>>>>  
>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>>>> + * an extra call_rcu().
>>>> + */
>>>>  static void __free_rcu(struct rcu_head *head)
>>>>  {
>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>>>  }
>>>>  
>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>>>> -{
>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>> -
>>>> -	call_rcu(&c->rcu, __free_rcu);
>>>> -}
>>>> -
>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>>>  {
>>>>  	struct llist_node *llnode = obj;
>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>>>  		 * from __free_rcu() and from drain_mem_cache().
>>>>  		 */
>>>>  		__llist_add(llnode, &c->waiting_for_gp);
>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>>>> -	 * Then use call_rcu() to wait for normal progs to finish
>>>> -	 * and finally do free_one() on each element.
>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>>>> +	 * progs to finish and finally do free_one() on each element.
>>>>  	 */
>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>>>  }
>>>>  
>>>>  static void free_bulk(struct bpf_mem_cache *c)
>>>> -- 
>>>> 2.29.2
>>>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-12  9:26         ` Hou Tao
@ 2022-10-12 16:11           ` Paul E. McKenney
  2022-10-13  1:25             ` Hou Tao
  2022-10-13  1:41             ` Hou Tao
  0 siblings, 2 replies; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-12 16:11 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
> > On Tue, Oct 11, 2022 at 07:31:28PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> >>> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
> >>>> From: Hou Tao <houtao1@huawei.com>
> >>>>
> >>>> According to the implementation of RCU Tasks Trace, it invokes
> >>>> ->postscan_func() to wait for one RCU-tasks-trace grace period and
> >>>> rcu_tasks_trace_postscan() invokes synchronize_rcu() to wait for one
> >>>> normal RCU grace period in turn, so one RCU-tasks-trace grace period
> >>>> will imply one RCU grace period.
> >>>>
> >>>> So there is no need to do call_rcu() again in the callback of
> >>>> call_rcu_tasks_trace() and it can just free these elements directly.
> >>> This is true, but this is an implementation detail that is not guaranteed
> >>> in future versions of the kernel.  But if this additional call_rcu()
> >>> is causing trouble, I could add some API member that returned true in
> >>> kernels where it does happen to be the case that call_rcu_tasks_trace()
> >>> implies a call_rcu()-style grace period.
> >>>
> >>> The BPF memory allocator could then complain or adapt, as appropriate.
> >>>
> >>> Thoughts?
> >> It is indeed an implementation detail. But in an idle KVM guest, for memory
> >> reclamation in the bpf memory allocator an RCU-tasks-trace grace period is
> >> about 30 ms and an RCU grace period is about 20 ms. Under stress, the grace
> >> periods will be much longer. If the extra RCU grace period can be removed,
> >> this memory can be reclaimed more quickly, which helps under memory pressure.
> > I understand the benefits.  We just need to get a safe way to take
> > advantage of them.
> >
> >> Now it seems we can use the RCU polling APIs (e.g. get_state_synchronize_rcu()
> >> and poll_state_synchronize_rcu()) to check whether or not an RCU grace period
> >> has passed. But they need to add at least one unsigned long to the to-be-freed
> >> object. The extra memory overhead may be OK for the bpf memory allocator, but
> >> not for small objects. So could you please show an example of how these new
> >> APIs work? Do they need to modify the to-be-freed object?
> > Good point on the polling APIs, more on this below.
> >
> > I was thinking in terms of an API like this:
> >
> > 	static inline bool rcu_trace_implies_rcu_gp(void)
> > 	{
> > 		return true;
> > 	}
> >
> > Along with comments on the synchronize_rcu() pointing at the
> > rcu_trace_implies_rcu_gp().
> 
> It is a simple API and the modifications for call_rcu_tasks_trace() users will
> also be simple: the callback of call_rcu_tasks_trace() will invoke
> rcu_trace_implies_rcu_gp(), and if it returns true, the callback invokes the
> callback for call_rcu() directly; otherwise it does so through call_rcu().

Sounds good!

Please note that if the callback function just does kfree() or equivalent,
this will work fine.  If it acquires spinlocks, you may need to do
local_bh_disable() before invoking it directly and local_bh_enable()
afterwards.

> > Another approach is to wait for the grace periods concurrently, but this
> > requires not one but two rcu_head structures.
> 
> Besides the extra space usage, does it also complicate the logic in the
> callback function, because it needs to handle the concurrency?

Definitely!!!

Perhaps something like this:

	static void cbf(struct rcu_head *rhp)
	{
		p = container_of(rhp, ...);

		if (atomic_dec_and_test(&p->cbs_awaiting))
			kfree(p);
	}

	atomic_set(&p->cbs_awaiting, 2);
	call_rcu(&p->rh1, cbf);
	call_rcu_tasks_trace(&p->rh2, cbf);

Is this worth it?  I have no idea.  I must defer to you.

> > Back to the polling API.  Are these things freed individually, or can
> > they be grouped?  If they can be grouped, the storage for the grace-period
> > state could be associated with the group.
> 
> As said above, it may be OK for the bpf memory allocator because it frees
> elements in batches, but for bpf local storage and its elements these memories
> are freed individually. So I think that if the implication of an RCU-tasks-trace
> grace period will not change in the foreseeable future, adding
> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?

Maybe the BPF memory allocator does it one way and BPF local storage
does it another way.

How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
you carry it with your series?  That way I don't have an unused function
in -rcu and you don't have to wait for me to send it upstream?

							Thanx, Paul

> >>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >>>> ---
> >>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
> >>>>  1 file changed, 6 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >>>> index 5f83be1d2018..6f32dddc804f 100644
> >>>> --- a/kernel/bpf/memalloc.c
> >>>> +++ b/kernel/bpf/memalloc.c
> >>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
> >>>>  	kfree(obj);
> >>>>  }
> >>>>  
> >>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
> >>>> + * an extra call_rcu().
> >>>> + */
> >>>>  static void __free_rcu(struct rcu_head *head)
> >>>>  {
> >>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
> >>>>  	atomic_set(&c->call_rcu_in_progress, 0);
> >>>>  }
> >>>>  
> >>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> >>>> -{
> >>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>> -
> >>>> -	call_rcu(&c->rcu, __free_rcu);
> >>>> -}
> >>>> -
> >>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
> >>>>  {
> >>>>  	struct llist_node *llnode = obj;
> >>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
> >>>>  		 * from __free_rcu() and from drain_mem_cache().
> >>>>  		 */
> >>>>  		__llist_add(llnode, &c->waiting_for_gp);
> >>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> >>>> -	 * Then use call_rcu() to wait for normal progs to finish
> >>>> -	 * and finally do free_one() on each element.
> >>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
> >>>> +	 * progs to finish and finally do free_one() on each element.
> >>>>  	 */
> >>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
> >>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
> >>>>  }
> >>>>  
> >>>>  static void free_bulk(struct bpf_mem_cache *c)
> >>>> -- 
> >>>> 2.29.2
> >>>>
> 


* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-12 16:11           ` Paul E. McKenney
@ 2022-10-13  1:25             ` Hou Tao
  2022-10-13  5:07               ` Martin KaFai Lau
  2022-10-13 19:00               ` Paul E. McKenney
  2022-10-13  1:41             ` Hou Tao
  1 sibling, 2 replies; 22+ messages in thread
From: Hou Tao @ 2022-10-13  1:25 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

Hi,

On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
>>> On Tue, Oct 11, 2022 at 07:31:28PM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
>>>>> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
>>>>>> From: Hou Tao <houtao1@huawei.com>
>>>>>>
>>>>>> According to the implementation of RCU Tasks Trace, it invokes
>>>>>> ->postscan_func() to wait for one RCU-tasks-trace grace period and
>>>>>> rcu_tasks_trace_postscan() invokes synchronize_rcu() to wait for one
>>>>>> normal RCU grace period in turn, so one RCU-tasks-trace grace period
>>>>>> will imply one RCU grace period.
>>>>>>
>>>>>> So there is no need to do call_rcu() again in the callback of
>>>>>> call_rcu_tasks_trace() and it can just free these elements directly.
>>>>> This is true, but this is an implementation detail that is not guaranteed
>>>>> in future versions of the kernel.  But if this additional call_rcu()
>>>>> is causing trouble, I could add some API member that returned true in
>>>>> kernels where it does happen to be the case that call_rcu_tasks_trace()
>>>>> implies a call_rcu()-style grace period.
>>>>>
>>>>> The BPF memory allocator could then complain or adapt, as appropriate.
>>>>>
>>>>> Thoughts?
>>>> It is indeed an implementation detail. But in an idle KVM guest, for memory
>>>> reclamation in the bpf memory allocator an RCU-tasks-trace grace period is
>>>> about 30 ms and an RCU grace period is about 20 ms. Under stress, the grace
>>>> periods will be much longer. If the extra RCU grace period can be removed,
>>>> this memory can be reclaimed more quickly, which helps under memory pressure.
>>> I understand the benefits.  We just need to get a safe way to take
>>> advantage of them.
>>>
>>>> Now it seems we can use the RCU polling APIs (e.g. get_state_synchronize_rcu()
>>>> and poll_state_synchronize_rcu()) to check whether or not an RCU grace period
>>>> has passed. But they need to add at least one unsigned long to the to-be-freed
>>>> object. The extra memory overhead may be OK for the bpf memory allocator, but
>>>> not for small objects. So could you please show an example of how these new
>>>> APIs work? Do they need to modify the to-be-freed object?
>>> Good point on the polling APIs, more on this below.
>>>
>>> I was thinking in terms of an API like this:
>>>
>>> 	static inline bool rcu_trace_implies_rcu_gp(void)
>>> 	{
>>> 		return true;
>>> 	}
>>>
>>> Along with comments on the synchronize_rcu() pointing at the
>>> rcu_trace_implies_rcu_gp().
>> It is a simple API and the modifications for call_rcu_tasks_trace() users will
>> also be simple: the callback of call_rcu_tasks_trace() will invoke
>> rcu_trace_implies_rcu_gp(), and if it returns true, the callback invokes the
>> callback for call_rcu() directly; otherwise it does so through call_rcu().
> Sounds good!
>
> Please note that if the callback function just does kfree() or equivalent,
> this will work fine.  If it acquires spinlocks, you may need to do
> local_bh_disable() before invoking it directly and local_bh_enable()
> afterwards.
What is the purpose of invoking local_bh_disable()? Is it to ensure the
callback is called in soft-irq context, or something else? As far as I know,
Tasks RCU already invokes its callbacks with soft-irqs disabled.
>
>>> Another approach is to wait for the grace periods concurrently, but this
>>> requires not one but two rcu_head structures.
>> Besides the extra space usage, does it also complicate the logic in the
>> callback function, because it needs to handle the concurrency?
> Definitely!!!
>
> Perhaps something like this:
>
> 	static void cbf(struct rcu_head *rhp)
> 	{
> 		p = container_of(rhp, ...);
>
> 		if (atomic_dec_and_test(&p->cbs_awaiting))
> 			kfree(p);
> 	}
>
> 	atomic_set(&p->cbs_awaiting, 2);
> 	call_rcu(&p->rh1, cbf);
> 	call_rcu_tasks_trace(&p->rh2, cbf);
>
> Is this worth it?  I have no idea.  I must defer to you.
I still prefer the simple solution.
>
>>> Back to the polling API.  Are these things freed individually, or can
>>> they be grouped?  If they can be grouped, the storage for the grace-period
>>> state could be associated with the group.
>> As said above, it may be OK for the bpf memory allocator because it frees
>> elements in batches, but for bpf local storage and its elements these memories
>> are freed individually. So I think that if the implication of an RCU-tasks-trace
>> grace period will not change in the foreseeable future, adding
>> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?
> Maybe the BPF memory allocator does it one way and BPF local storage
> does it another way.
Why not? Maybe the bpf experts think the space overhead is also reasonable in
the BPF local storage case.
>
> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> you carry it with your series?  That way I don't have an unused function
> in -rcu and you don't have to wait for me to send it upstream?
Sounds reasonable to me. Also, thanks for your suggestions.
>
> 							Thanx, Paul
>
>>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>>> ---
>>>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>>>> index 5f83be1d2018..6f32dddc804f 100644
>>>>>> --- a/kernel/bpf/memalloc.c
>>>>>> +++ b/kernel/bpf/memalloc.c
>>>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>>>>>  	kfree(obj);
>>>>>>  }
>>>>>>  
>>>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>>>>>> + * an extra call_rcu().
>>>>>> + */
>>>>>>  static void __free_rcu(struct rcu_head *head)
>>>>>>  {
>>>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>>>>>  }
>>>>>>  
>>>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>>>>>> -{
>>>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>> -
>>>>>> -	call_rcu(&c->rcu, __free_rcu);
>>>>>> -}
>>>>>> -
>>>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>>>>>  {
>>>>>>  	struct llist_node *llnode = obj;
>>>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>>>>>  		 * from __free_rcu() and from drain_mem_cache().
>>>>>>  		 */
>>>>>>  		__llist_add(llnode, &c->waiting_for_gp);
>>>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>>>>>> -	 * Then use call_rcu() to wait for normal progs to finish
>>>>>> -	 * and finally do free_one() on each element.
>>>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>>>>>> +	 * progs to finish and finally do free_one() on each element.
>>>>>>  	 */
>>>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>>>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>>>>>  }
>>>>>>  
>>>>>>  static void free_bulk(struct bpf_mem_cache *c)
>>>>>> -- 
>>>>>> 2.29.2
>>>>>>



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-12 16:11           ` Paul E. McKenney
  2022-10-13  1:25             ` Hou Tao
@ 2022-10-13  1:41             ` Hou Tao
  2022-10-13 19:04               ` Paul E. McKenney
  1 sibling, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-13  1:41 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

Hi,

On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
SNIP
>> As said above, it may be OK for the bpf memory allocator because it frees
>> elements in batches, but for bpf local storage and its elements these memories
>> are freed individually. So I think that if the implication of an RCU-tasks-trace
>> grace period will not change in the foreseeable future, adding
>> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?
> Maybe the BPF memory allocator does it one way and BPF local storage
> does it another way.
Another question: I just found out that there are new full-state APIs for RCU
polling (e.g. get_state_synchronize_rcu_full()). According to the comments, the
advantage of the new APIs is that they will never miss a passed grace period.
So for this case, is get_state_synchronize_rcu() enough, or should I switch to
get_state_synchronize_rcu_full() instead?
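
For reference, a minimal sketch of the grouped-polling idea discussed earlier
in the thread (all structure and function names here are hypothetical, not
existing bpf code): one cookie from get_state_synchronize_rcu() is stored per
batch rather than per object, avoiding per-object space overhead:

```c
/* Hypothetical batch structure: one grace-period cookie covers
 * every object queued in the batch.
 */
struct free_batch {
	struct llist_head objs;		/* objects waiting for a GP */
	unsigned long rcu_cookie;	/* from get_state_synchronize_rcu() */
};

static void batch_seal(struct free_batch *b)
{
	/* Snapshot the current RCU grace-period state once per batch. */
	b->rcu_cookie = get_state_synchronize_rcu();
}

static bool batch_poll(struct free_batch *b)
{
	/* True once a full RCU grace period has elapsed since the
	 * snapshot, so every object in the batch may be freed.
	 */
	return poll_state_synchronize_rcu(b->rcu_cookie);
}
```

As I understand the comments, get_state_synchronize_rcu_full() works the same
way but stores a larger struct rcu_gp_oldstate instead of an unsigned long, in
exchange for never missing an already-elapsed grace period.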

Regards
> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> you carry it with your series?  That way I don't have an unused function
> in -rcu and you don't have to wait for me to send it upstream?
>
> 							Thanx, Paul
>
>>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>>> ---
>>>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>>>> index 5f83be1d2018..6f32dddc804f 100644
>>>>>> --- a/kernel/bpf/memalloc.c
>>>>>> +++ b/kernel/bpf/memalloc.c
>>>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>>>>>  	kfree(obj);
>>>>>>  }
>>>>>>  
>>>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>>>>>> + * an extra call_rcu().
>>>>>> + */
>>>>>>  static void __free_rcu(struct rcu_head *head)
>>>>>>  {
>>>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>>>>>  }
>>>>>>  
>>>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>>>>>> -{
>>>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>> -
>>>>>> -	call_rcu(&c->rcu, __free_rcu);
>>>>>> -}
>>>>>> -
>>>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>>>>>  {
>>>>>>  	struct llist_node *llnode = obj;
>>>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>>>>>  		 * from __free_rcu() and from drain_mem_cache().
>>>>>>  		 */
>>>>>>  		__llist_add(llnode, &c->waiting_for_gp);
>>>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>>>>>> -	 * Then use call_rcu() to wait for normal progs to finish
>>>>>> -	 * and finally do free_one() on each element.
>>>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>>>>>> +	 * progs to finish and finally do free_one() on each element.
>>>>>>  	 */
>>>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>>>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>>>>>  }
>>>>>>  
>>>>>>  static void free_bulk(struct bpf_mem_cache *c)
>>>>>> -- 
>>>>>> 2.29.2
>>>>>>



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13  1:25             ` Hou Tao
@ 2022-10-13  5:07               ` Martin KaFai Lau
  2022-10-13  9:02                 ` Hou Tao
  2022-10-13 19:00               ` Paul E. McKenney
  1 sibling, 1 reply; 22+ messages in thread
From: Martin KaFai Lau @ 2022-10-13  5:07 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1, paulmck

On 10/12/22 6:25 PM, Hou Tao wrote:
>>>> Back to the polling API.  Are these things freed individually, or can
>>>> they be grouped?  If they can be grouped, the storage for the grace-period
>>>> state could be associated with the group.
>>> As said above, it may be OK for the bpf memory allocator because it frees
>>> elements in batches, but for bpf local storage and its elements these memories
>>> are freed individually. So I think that if the implication of an RCU-tasks-trace
>>> grace period will not change in the foreseeable future, adding
>>> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?
>> Maybe the BPF memory allocator does it one way and BPF local storage
>> does it another way.
> Why not. Maybe bpf expert think the space overhead is also reasonable in the BPF
> local storage case.

There is only an 8-byte hole left in 'struct bpf_local_storage_elem', so it is 
precious.  Space overhead aside, only deletion of a local storage requires 
call_rcu_tasks_trace().  The common use case for local storage is to allocate 
once and stay there for the whole lifetime of the sk/task/inode.  Thus, deletes 
should happen very infrequently.

It would still be nice to optimize, though, if it does not need extra space, 
and that seems possible from reading the thread.



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13  5:07               ` Martin KaFai Lau
@ 2022-10-13  9:02                 ` Hou Tao
  0 siblings, 0 replies; 22+ messages in thread
From: Hou Tao @ 2022-10-13  9:02 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1, paulmck

Hi,

On 10/13/2022 1:07 PM, Martin KaFai Lau wrote:
> On 10/12/22 6:25 PM, Hou Tao wrote:
>>>>> Back to the polling API.  Are these things freed individually, or can
>>>>> they be grouped?  If they can be grouped, the storage for the grace-period
>>>>> state could be associated with the group.
>>>> As said above, it may be OK for the bpf memory allocator because it frees
>>>> elements in batches, but for bpf local storage and its elements these memories
>>>> are freed individually. So I think that if the implication of an RCU-tasks-trace
>>>> grace period will not change in the foreseeable future, adding
>>>> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?
>>> Maybe the BPF memory allocator does it one way and BPF local storage
>>> does it another way.
>> Why not? Maybe the bpf experts think the space overhead is also reasonable in
>> the BPF local storage case.
>
> There is only an 8-byte hole left in 'struct bpf_local_storage_elem', so it is
> precious.  Space overhead aside, only deletion of a local storage requires
> call_rcu_tasks_trace().  The common use case for local storage is to allocate
> once and stay there for the whole lifetime of the sk/task/inode.  Thus,
> deletes should happen very infrequently.
>
> It would still be nice to optimize, though, if it does not need extra space,
> and that seems possible from reading the thread.
Understood. Yes, using the newly added rcu_trace_implies_rcu_gp(), there will
be no space overhead. I will take the rcu_trace_implies_rcu_gp() approach in
all these cases and defer the use of the RCU polling APIs to future work.



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13  1:25             ` Hou Tao
  2022-10-13  5:07               ` Martin KaFai Lau
@ 2022-10-13 19:00               ` Paul E. McKenney
  2022-10-14  4:20                 ` Hou Tao
  1 sibling, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-13 19:00 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Thu, Oct 13, 2022 at 09:25:31AM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> > On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
> >>> On Tue, Oct 11, 2022 at 07:31:28PM +0800, Hou Tao wrote:
> >>>> Hi,
> >>>>
> >>>> On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> >>>>> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
> >>>>>> From: Hou Tao <houtao1@huawei.com>
> >>>>>>
> >>>>>> According to the implementation of RCU Tasks Trace, it invokes
> >>>>>> ->postscan_func() to wait for one RCU-tasks-trace grace period and
> >>>>>> rcu_tasks_trace_postscan() invokes synchronize_rcu() to wait for one
> >>>>>> normal RCU grace period in turn, so one RCU-tasks-trace grace period
> >>>>>> will imply one RCU grace period.
> >>>>>>
> >>>>>> So there is no need to do call_rcu() again in the callback of
> >>>>>> call_rcu_tasks_trace() and it can just free these elements directly.
> >>>>> This is true, but this is an implementation detail that is not guaranteed
> >>>>> in future versions of the kernel.  But if this additional call_rcu()
> >>>>> is causing trouble, I could add some API member that returned true in
> >>>>> kernels where it does happen to be the case that call_rcu_tasks_trace()
> >>>>> implies a call_rcu()-style grace period.
> >>>>>
> >>>>> The BPF memory allocator could then complain or adapt, as appropriate.
> >>>>>
> >>>>> Thoughts?
> >>>> It is indeed an implementation detail. But in an idle KVM guest, for memory
> >>>> reclamation in the bpf memory allocator an RCU-tasks-trace grace period is
> >>>> about 30 ms and an RCU grace period is about 20 ms. Under stress, the grace
> >>>> periods will be much longer. If the extra RCU grace period can be removed,
> >>>> this memory can be reclaimed more quickly, which helps under memory pressure.
> >>> I understand the benefits.  We just need to get a safe way to take
> >>> advantage of them.
> >>>
> >>>> Now it seems we can use the RCU polling APIs (e.g. get_state_synchronize_rcu()
> >>>> and poll_state_synchronize_rcu()) to check whether or not an RCU grace period
> >>>> has passed. But they need to add at least one unsigned long to the to-be-freed
> >>>> object. The extra memory overhead may be OK for the bpf memory allocator, but
> >>>> not for small objects. So could you please show an example of how these new
> >>>> APIs work? Do they need to modify the to-be-freed object?
> >>> Good point on the polling APIs, more on this below.
> >>>
> >>> I was thinking in terms of an API like this:
> >>>
> >>> 	static inline bool rcu_trace_implies_rcu_gp(void)
> >>> 	{
> >>> 		return true;
> >>> 	}
> >>>
> >>> Along with comments on the synchronize_rcu() pointing at the
> >>> rcu_trace_implies_rcu_gp().
> >> It is a simple API and the modifications for call_rcu_tasks_trace() users will
> >> also be simple: the callback of call_rcu_tasks_trace() will invoke
> >> rcu_trace_implies_rcu_gp(), and if it returns true, the callback invokes the
> >> callback for call_rcu() directly; otherwise it does so through call_rcu().
> > Sounds good!
> >
> > Please note that if the callback function just does kfree() or equivalent,
> > this will work fine.  If it acquires spinlocks, you may need to do
> > local_bh_disable() before invoking it directly and local_bh_enable()
> > afterwards.
> What is the purpose of invoking local_bh_disable()? Is it to ensure the
> callback is called in soft-irq context, or something else? As far as I know,
> Tasks RCU already invokes its callbacks with soft-irqs disabled.
> >
> >>> Another approach is to wait for the grace periods concurrently, but this
> >>> requires not one but two rcu_head structures.
> >> Besides the extra space usage, does it also complicate the logic in the
> >> callback function, because it needs to handle the concurrency?
> > Definitely!!!
> >
> > Perhaps something like this:
> >
> > 	static void cbf(struct rcu_head *rhp)
> > 	{
> > 		p = container_of(rhp, ...);
> >
> > 		if (atomic_dec_and_test(&p->cbs_awaiting))
> > 			kfree(p);
> > 	}
> >
> > 	atomic_set(&p->cbs_awaiting, 2);
> > 	call_rcu(&p->rh1, cbf);
> > 	call_rcu_tasks_trace(&p->rh2, cbf);
> >
> > Is this worth it?  I have no idea.  I must defer to you.
> I still prefer the simple solution.
> >
> >>> Back to the polling API.  Are these things freed individually, or can
> >>> they be grouped?  If they can be grouped, the storage for the grace-period
> >>> state could be associated with the group.
> >> As said above, it may be OK for the bpf memory allocator because it frees
> >> elements in batches, but for bpf local storage and its elements these memories
> >> are freed individually. So I think that if the implication of an RCU-tasks-trace
> >> grace period will not change in the foreseeable future, adding
> >> rcu_trace_implies_rcu_gp() and using it in bpf is a good idea. What do you think?
> > Maybe the BPF memory allocator does it one way and BPF local storage
> > does it another way.
> Why not? Maybe the bpf experts think the space overhead is also reasonable in
> the BPF local storage case.
> >
> > How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> > you carry it with your series?  That way I don't have an unused function
> > in -rcu and you don't have to wait for me to send it upstream?
> Sounds reasonable to me. Also, thanks for your suggestions.

Here you go!  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 2eac2f7a9a6d8921e8084a6acdffa595e99dbd17
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Oct 13 11:54:13 2022 -0700

    rcu-tasks: Provide rcu_trace_implies_rcu_gp()
    
    As an accident of implementation, an RCU Tasks Trace grace period also
    acts as an RCU grace period.  However, this could change at any time.
    This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
    returns true to codify this accident.  Code relying on this accident
    must call this function to verify that this accident is still happening.
    
    Reported-by: Hou Tao <houtao@huaweicloud.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Martin KaFai Lau <martin.lau@linux.dev>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 08605ce7379d7..8822f06e4b40c 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -240,6 +240,18 @@ static inline void exit_tasks_rcu_start(void) { }
 static inline void exit_tasks_rcu_finish(void) { }
 #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
 
+/**
+ * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
+ *
+ * As an accident of implementation, an RCU Tasks Trace grace period also
+ * acts as an RCU grace period.  However, this could change at any time.
+ * Code relying on this accident must call this function to verify that
+ * this accident is still happening.
+ *
+ * You have been warned!
+ */
+static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
+
 /**
  * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
  *
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index b0b885e071fa8..fe9840d90e960 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1535,6 +1535,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
 {
 	// Wait for late-stage exiting tasks to finish exiting.
 	// These might have passed the call to exit_tasks_rcu_finish().
+
+	// If you remove the following line, update rcu_trace_implies_rcu_gp()!!!
 	synchronize_rcu();
 	// Any tasks that exit after this point will set
 	// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
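
[Editorial note: to make the intended use of the new helper concrete, a callback chained via call_rcu_tasks_trace() could consult it along the following lines. This is a sketch only; the struct and function names are invented for illustration and are not code from this series.]

```c
static void obj_free_tasks_trace_cb(struct rcu_head *rhp)
{
	struct obj *p = container_of(rhp, struct obj, rcu);

	/* The RCU-tasks-trace grace period has elapsed.  If it also
	 * implied a normal RCU grace period, free immediately ... */
	if (rcu_trace_implies_rcu_gp())
		obj_free(p);
	else
		/* ... otherwise fall back to chaining a normal GP. */
		call_rcu(&p->rcu, obj_free_rcu_cb);
}
```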

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13  1:41             ` Hou Tao
@ 2022-10-13 19:04               ` Paul E. McKenney
  2022-10-14  4:04                 ` Hou Tao
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-13 19:04 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Thu, Oct 13, 2022 at 09:41:46AM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> > On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
> SNIP
> >> As said above, for bpf memory allocator it may be OK because it frees elements
> >> in batch, but for bpf local storage and its element these memories are freed
> >> individually. So I think if the implication of RCU tasks trace grace period will
> >> not be changed in the foreseeable future, adding rcu_trace_implies_rcu_gp() and
> >> using it in bpf is a good idea. What do you think ?
> > Maybe the BPF memory allocator does it one way and BPF local storage
> > does it another way.
> 
> Another question. I just found out that there are new APIs for RCU polling (e.g.
> get_state_synchronize_rcu_full()). According to the comments, the advantage of
> the new API is that it will never miss a passed grace period. So for this case,
> is get_state_synchronize_rcu() enough? Or should I switch to
> get_state_synchronize_rcu_full() instead?

I suggest starting with get_state_synchronize_rcu(), and moving to the
_full() variants only if experience shows that it is necessary.

Please note that these functions work with normal RCU, that is,
call_rcu(), but not call_rcu_tasks(), call_rcu_tasks_trace(), or
call_rcu_rude().  Please note also that SRCU has its own set of polling
APIs, for example, get_state_synchronize_srcu().
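
[Editorial note: the polling APIs being asked about follow a cookie pattern roughly like the one below. This is a sketch only; the object layout and free helpers are hypothetical.]

```c
/* At queue time: snapshot the current normal-RCU grace-period state. */
obj->rcu_cookie = get_state_synchronize_rcu();

/* Later, before reuse: has a full grace period elapsed since then? */
if (poll_state_synchronize_rcu(obj->rcu_cookie))
	obj_free(obj);				/* yes: safe to free now */
else
	call_rcu(&obj->rcu, obj_free_cb);	/* no: wait for one */
```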

								Thanx, Paul

> Regards
> > How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> > you carry it with your series?  That way I don't have an unused function
> > in -rcu and you don't have to wait for me to send it upstream?
> >
> > 							Thanx, Paul
> >
> >>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >>>>>> ---
> >>>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
> >>>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >>>>>> index 5f83be1d2018..6f32dddc804f 100644
> >>>>>> --- a/kernel/bpf/memalloc.c
> >>>>>> +++ b/kernel/bpf/memalloc.c
> >>>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
> >>>>>>  	kfree(obj);
> >>>>>>  }
> >>>>>>  
> >>>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
> >>>>>> + * an extra call_rcu().
> >>>>>> + */
> >>>>>>  static void __free_rcu(struct rcu_head *head)
> >>>>>>  {
> >>>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
> >>>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
> >>>>>>  }
> >>>>>>  
> >>>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> >>>>>> -{
> >>>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>>>> -
> >>>>>> -	call_rcu(&c->rcu, __free_rcu);
> >>>>>> -}
> >>>>>> -
> >>>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
> >>>>>>  {
> >>>>>>  	struct llist_node *llnode = obj;
> >>>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
> >>>>>>  		 * from __free_rcu() and from drain_mem_cache().
> >>>>>>  		 */
> >>>>>>  		__llist_add(llnode, &c->waiting_for_gp);
> >>>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> >>>>>> -	 * Then use call_rcu() to wait for normal progs to finish
> >>>>>> -	 * and finally do free_one() on each element.
> >>>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
> >>>>>> +	 * progs to finish and finally do free_one() on each element.
> >>>>>>  	 */
> >>>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
> >>>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
> >>>>>>  }
> >>>>>>  
> >>>>>>  static void free_bulk(struct bpf_mem_cache *c)
> >>>>>> -- 
> >>>>>> 2.29.2
> >>>>>>
> 


* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13 19:04               ` Paul E. McKenney
@ 2022-10-14  4:04                 ` Hou Tao
  0 siblings, 0 replies; 22+ messages in thread
From: Hou Tao @ 2022-10-14  4:04 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

Hi,

On 10/14/2022 3:04 AM, Paul E. McKenney wrote:
> On Thu, Oct 13, 2022 at 09:41:46AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
>>> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
>> SNIP
>>>> As said above, for bpf memory allocator it may be OK because it frees elements
>>>> in batch, but for bpf local storage and its element these memories are freed
>>>> individually. So I think if the implication of RCU tasks trace grace period will
>>>> not be changed in the foreseeable future, adding rcu_trace_implies_rcu_gp() and
>>>> using it in bpf is a good idea. What do you think ?
>>> Maybe the BPF memory allocator does it one way and BPF local storage
>>> does it another way.
>> Another question. Just find out that there are new APIs for RCU polling (e.g.
>> get_state_synchronize_rcu_full()). According to comments, the advantage of new
>> API is that it will never miss a passed grace period. So for this case is
>> get_state_synchronize_rcu() enough ? Or should I switch to use
>> get_state_synchronize_rcu_full() instead ?
> I suggest starting with get_state_synchronize_rcu(), and moving to the
> _full() variants only if experience shows that it is necessary.
>
> Please note that these functions work with normal RCU, that is,
> call_rcu(), but not call_rcu_tasks(), call_rcu_tasks_trace(), or
> call_rcu_rude().  Please note also that SRCU has its own set of polling
> APIs, for example, get_state_synchronize_srcu().
I see. Thanks for your suggestion and detailed explanations.
>
> 								Thanx, Paul
>
>> Regards
>>> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
>>> you carry it with your series?  That way I don't have an unused function
>>> in -rcu and you don't have to wait for me to send it upstream?
>>>
>>> 							Thanx, Paul
>>>
>>>>>>>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>>>>>>>> ---
>>>>>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>>>>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>>>>>>>> index 5f83be1d2018..6f32dddc804f 100644
>>>>>>>> --- a/kernel/bpf/memalloc.c
>>>>>>>> +++ b/kernel/bpf/memalloc.c
>>>>>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>>>>>>>  	kfree(obj);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>>>>>>>> + * an extra call_rcu().
>>>>>>>> + */
>>>>>>>>  static void __free_rcu(struct rcu_head *head)
>>>>>>>>  {
>>>>>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>>>>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>>>>>>>> -{
>>>>>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>>>>>>>> -
>>>>>>>> -	call_rcu(&c->rcu, __free_rcu);
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>>>>>>>  {
>>>>>>>>  	struct llist_node *llnode = obj;
>>>>>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>>>>>>>  		 * from __free_rcu() and from drain_mem_cache().
>>>>>>>>  		 */
>>>>>>>>  		__llist_add(llnode, &c->waiting_for_gp);
>>>>>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>>>>>>>> -	 * Then use call_rcu() to wait for normal progs to finish
>>>>>>>> -	 * and finally do free_one() on each element.
>>>>>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>>>>>>>> +	 * progs to finish and finally do free_one() on each element.
>>>>>>>>  	 */
>>>>>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>>>>>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  static void free_bulk(struct bpf_mem_cache *c)
>>>>>>>> -- 
>>>>>>>> 2.29.2
>>>>>>>>



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-13 19:00               ` Paul E. McKenney
@ 2022-10-14  4:20                 ` Hou Tao
  2022-10-14 12:15                   ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-14  4:20 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

Hi,

On 10/14/2022 3:00 AM, Paul E. McKenney wrote:
> On Thu, Oct 13, 2022 at 09:25:31AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
>>> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
SNIP
>>> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
>>> you carry it with your series?  That way I don't have an unused function
>>> in -rcu and you don't have to wait for me to send it upstream?
>> Sound reasonable to me. Also thanks for your suggestions.
> Here you go!  Thoughts?
It looks great and thanks for it.
>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 2eac2f7a9a6d8921e8084a6acdffa595e99dbd17
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Thu Oct 13 11:54:13 2022 -0700
>
>     rcu-tasks: Provide rcu_trace_implies_rcu_gp()
>     
>     As an accident of implementation, an RCU Tasks Trace grace period also
>     acts as an RCU grace period.  However, this could change at any time.
>     This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
>     returns true to codify this accident.  Code relying on this accident
>     must call this function to verify that this accident is still happening.
>     
>     Reported-by: Hou Tao <houtao@huaweicloud.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>     Cc: Alexei Starovoitov <ast@kernel.org>
>     Cc: Martin KaFai Lau <martin.lau@linux.dev>
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 08605ce7379d7..8822f06e4b40c 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -240,6 +240,18 @@ static inline void exit_tasks_rcu_start(void) { }
>  static inline void exit_tasks_rcu_finish(void) { }
>  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
>  
> +/**
> + * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> + *
> + * As an accident of implementation, an RCU Tasks Trace grace period also
> + * acts as an RCU grace period.  However, this could change at any time.
> + * Code relying on this accident must call this function to verify that
> + * this accident is still happening.
> + *
> + * You have been warned!
> + */
> +static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
> +
>  /**
>   * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
>   *
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index b0b885e071fa8..fe9840d90e960 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -1535,6 +1535,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
>  {
>  	// Wait for late-stage exiting tasks to finish exiting.
>  	// These might have passed the call to exit_tasks_rcu_finish().
> +
> +	// If you remove the following line, update rcu_trace_implies_rcu_gp()!!!
>  	synchronize_rcu();
>  	// Any tasks that exit after this point will set
>  	// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-14  4:20                 ` Hou Tao
@ 2022-10-14 12:15                   ` Paul E. McKenney
  2022-10-17  6:55                     ` Hou Tao
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-14 12:15 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Fri, Oct 14, 2022 at 12:20:19PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/14/2022 3:00 AM, Paul E. McKenney wrote:
> > On Thu, Oct 13, 2022 at 09:25:31AM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> >>> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> SNIP
> >>> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> >>> you carry it with your series?  That way I don't have an unused function
> >>> in -rcu and you don't have to wait for me to send it upstream?
> >> Sound reasonable to me. Also thanks for your suggestions.
> > Here you go!  Thoughts?
> 
> It looks great and thanks for it.

Very good!  I will carry it in -rcu for some time, so please let me know
when/if you pull it into a series.

							Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > commit 2eac2f7a9a6d8921e8084a6acdffa595e99dbd17
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Thu Oct 13 11:54:13 2022 -0700
> >
> >     rcu-tasks: Provide rcu_trace_implies_rcu_gp()
> >     
> >     As an accident of implementation, an RCU Tasks Trace grace period also
> >     acts as an RCU grace period.  However, this could change at any time.
> >     This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
> >     returns true to codify this accident.  Code relying on this accident
> >     must call this function to verify that this accident is still happening.
> >     
> >     Reported-by: Hou Tao <houtao@huaweicloud.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >     Cc: Alexei Starovoitov <ast@kernel.org>
> >     Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 08605ce7379d7..8822f06e4b40c 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -240,6 +240,18 @@ static inline void exit_tasks_rcu_start(void) { }
> >  static inline void exit_tasks_rcu_finish(void) { }
> >  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
> >  
> > +/**
> > + * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> > + *
> > + * As an accident of implementation, an RCU Tasks Trace grace period also
> > + * acts as an RCU grace period.  However, this could change at any time.
> > + * Code relying on this accident must call this function to verify that
> > + * this accident is still happening.
> > + *
> > + * You have been warned!
> > + */
> > +static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
> > +
> >  /**
> >   * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
> >   *
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index b0b885e071fa8..fe9840d90e960 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -1535,6 +1535,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
> >  {
> >  	// Wait for late-stage exiting tasks to finish exiting.
> >  	// These might have passed the call to exit_tasks_rcu_finish().
> > +
> > +	// If you remove the following line, update rcu_trace_implies_rcu_gp()!!!
> >  	synchronize_rcu();
> >  	// Any tasks that exit after this point will set
> >  	// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
> 


* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-14 12:15                   ` Paul E. McKenney
@ 2022-10-17  6:55                     ` Hou Tao
  2022-10-17 10:23                       ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Hou Tao @ 2022-10-17  6:55 UTC (permalink / raw)
  To: paulmck
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

Hi,

On 10/14/2022 8:15 PM, Paul E. McKenney wrote:
> On Fri, Oct 14, 2022 at 12:20:19PM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 10/14/2022 3:00 AM, Paul E. McKenney wrote:
>>> On Thu, Oct 13, 2022 at 09:25:31AM +0800, Hou Tao wrote:
>>>> Hi,
>>>>
>>>> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
>>>>> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
>> SNIP
>>>>> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
>>>>> you carry it with your series?  That way I don't have an unused function
>>>>> in -rcu and you don't have to wait for me to send it upstream?
>>>> Sound reasonable to me. Also thanks for your suggestions.
>>> Here you go!  Thoughts?
>> It looks great and thanks for it.
> Very good!  I will carry it in -rcu for some time, so please let me know
> when/if you pull it into a series.
I have already included the patch in the v2 patch set and sent it out [0].

0: https://lore.kernel.org/bpf/20221014113946.965131-1-houtao@huaweicloud.com/T/#t

>
> 							Thanx, Paul
>
>>> ------------------------------------------------------------------------
>>>
>>> commit 2eac2f7a9a6d8921e8084a6acdffa595e99dbd17
>>> Author: Paul E. McKenney <paulmck@kernel.org>
>>> Date:   Thu Oct 13 11:54:13 2022 -0700
>>>
>>>     rcu-tasks: Provide rcu_trace_implies_rcu_gp()
>>>     
>>>     As an accident of implementation, an RCU Tasks Trace grace period also
>>>     acts as an RCU grace period.  However, this could change at any time.
>>>     This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
>>>     returns true to codify this accident.  Code relying on this accident
>>>     must call this function to verify that this accident is still happening.
>>>     
>>>     Reported-by: Hou Tao <houtao@huaweicloud.com>
>>>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>     Cc: Alexei Starovoitov <ast@kernel.org>
>>>     Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>
>>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>> index 08605ce7379d7..8822f06e4b40c 100644
>>> --- a/include/linux/rcupdate.h
>>> +++ b/include/linux/rcupdate.h
>>> @@ -240,6 +240,18 @@ static inline void exit_tasks_rcu_start(void) { }
>>>  static inline void exit_tasks_rcu_finish(void) { }
>>>  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
>>>  
>>> +/**
>>> + * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
>>> + *
>>> + * As an accident of implementation, an RCU Tasks Trace grace period also
>>> + * acts as an RCU grace period.  However, this could change at any time.
>>> + * Code relying on this accident must call this function to verify that
>>> + * this accident is still happening.
>>> + *
>>> + * You have been warned!
>>> + */
>>> +static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
>>> +
>>>  /**
>>>   * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
>>>   *
>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>>> index b0b885e071fa8..fe9840d90e960 100644
>>> --- a/kernel/rcu/tasks.h
>>> +++ b/kernel/rcu/tasks.h
>>> @@ -1535,6 +1535,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
>>>  {
>>>  	// Wait for late-stage exiting tasks to finish exiting.
>>>  	// These might have passed the call to exit_tasks_rcu_finish().
>>> +
>>> +	// If you remove the following line, update rcu_trace_implies_rcu_gp()!!!
>>>  	synchronize_rcu();
>>>  	// Any tasks that exit after this point will set
>>>  	// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
> .



* Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period
  2022-10-17  6:55                     ` Hou Tao
@ 2022-10-17 10:23                       ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2022-10-17 10:23 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Andrii Nakryiko,
	Song Liu, Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, Delyan Kratunov, rcu, houtao1

On Mon, Oct 17, 2022 at 02:55:42PM +0800, Hou Tao wrote:
> Hi,
> 
> On 10/14/2022 8:15 PM, Paul E. McKenney wrote:
> > On Fri, Oct 14, 2022 at 12:20:19PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/14/2022 3:00 AM, Paul E. McKenney wrote:
> >>> On Thu, Oct 13, 2022 at 09:25:31AM +0800, Hou Tao wrote:
> >>>> Hi,
> >>>>
> >>>> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> >>>>> On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> >> SNIP
> >>>>> How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> >>>>> you carry it with your series?  That way I don't have an unused function
> >>>>> in -rcu and you don't have to wait for me to send it upstream?
> >>>> Sound reasonable to me. Also thanks for your suggestions.
> >>> Here you go!  Thoughts?
> >> It looks great and thanks for it.
> > Very good!  I will carry it in -rcu for some time, so please let me know
> > when/if you pull it into a series.
> Have already included the patch in v2 patch-set and send it out [0].
> 
> 0: https://lore.kernel.org/bpf/20221014113946.965131-1-houtao@huaweicloud.com/T/#t

Thank you, I will drop it from -rcu on my next rebase.  Should you need
to refer to it again, it will remain at tag rcutrace-rcu.2022.10.13a.

							Thanx, Paul

> >>> ------------------------------------------------------------------------
> >>>
> >>> commit 2eac2f7a9a6d8921e8084a6acdffa595e99dbd17
> >>> Author: Paul E. McKenney <paulmck@kernel.org>
> >>> Date:   Thu Oct 13 11:54:13 2022 -0700
> >>>
> >>>     rcu-tasks: Provide rcu_trace_implies_rcu_gp()
> >>>     
> >>>     As an accident of implementation, an RCU Tasks Trace grace period also
> >>>     acts as an RCU grace period.  However, this could change at any time.
> >>>     This commit therefore creates an rcu_trace_implies_rcu_gp() that currently
> >>>     returns true to codify this accident.  Code relying on this accident
> >>>     must call this function to verify that this accident is still happening.
> >>>     
> >>>     Reported-by: Hou Tao <houtao@huaweicloud.com>
> >>>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >>>     Cc: Alexei Starovoitov <ast@kernel.org>
> >>>     Cc: Martin KaFai Lau <martin.lau@linux.dev>
> >>>
> >>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >>> index 08605ce7379d7..8822f06e4b40c 100644
> >>> --- a/include/linux/rcupdate.h
> >>> +++ b/include/linux/rcupdate.h
> >>> @@ -240,6 +240,18 @@ static inline void exit_tasks_rcu_start(void) { }
> >>>  static inline void exit_tasks_rcu_finish(void) { }
> >>>  #endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
> >>>  
> >>> +/**
> >>> + * rcu_trace_implies_rcu_gp - does an RCU Tasks Trace grace period imply an RCU grace period?
> >>> + *
> >>> + * As an accident of implementation, an RCU Tasks Trace grace period also
> >>> + * acts as an RCU grace period.  However, this could change at any time.
> >>> + * Code relying on this accident must call this function to verify that
> >>> + * this accident is still happening.
> >>> + *
> >>> + * You have been warned!
> >>> + */
> >>> +static inline bool rcu_trace_implies_rcu_gp(void) { return true; }
> >>> +
> >>>  /**
> >>>   * cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
> >>>   *
> >>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> >>> index b0b885e071fa8..fe9840d90e960 100644
> >>> --- a/kernel/rcu/tasks.h
> >>> +++ b/kernel/rcu/tasks.h
> >>> @@ -1535,6 +1535,8 @@ static void rcu_tasks_trace_postscan(struct list_head *hop)
> >>>  {
> >>>  	// Wait for late-stage exiting tasks to finish exiting.
> >>>  	// These might have passed the call to exit_tasks_rcu_finish().
> >>> +
> >>> +	// If you remove the following line, update rcu_trace_implies_rcu_gp()!!!
> >>>  	synchronize_rcu();
> >>>  	// Any tasks that exit after this point will set
> >>>  	// TRC_NEED_QS_CHECKED in ->trc_reader_special.b.need_qs.
> > .
> 


end of thread, other threads:[~2022-10-17 10:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-11  7:11 [PATCH bpf-next 0/3] Remove unnecessary RCU grace period chaining Hou Tao
2022-10-11  7:11 ` [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period Hou Tao
2022-10-11  9:07   ` Paul E. McKenney
2022-10-11 11:31     ` Hou Tao
2022-10-12  6:36       ` Paul E. McKenney
2022-10-12  9:26         ` Hou Tao
2022-10-12 16:11           ` Paul E. McKenney
2022-10-13  1:25             ` Hou Tao
2022-10-13  5:07               ` Martin KaFai Lau
2022-10-13  9:02                 ` Hou Tao
2022-10-13 19:00               ` Paul E. McKenney
2022-10-14  4:20                 ` Hou Tao
2022-10-14 12:15                   ` Paul E. McKenney
2022-10-17  6:55                     ` Hou Tao
2022-10-17 10:23                       ` Paul E. McKenney
2022-10-13  1:41             ` Hou Tao
2022-10-13 19:04               ` Paul E. McKenney
2022-10-14  4:04                 ` Hou Tao
2022-10-11  7:11 ` [PATCH bpf-next 2/3] bpf: Free local storage memory " Hou Tao
2022-10-11  9:09   ` Paul E. McKenney
2022-10-11  7:11 ` [PATCH bpf-next 3/3] bpf: Free trace program array " Hou Tao
2022-10-11  9:09   ` Paul E. McKenney
