rcu.vger.kernel.org archive mirror
* [PATCH] rcu: shrink each possible cpu krcp
@ 2020-08-14  6:45 qiang.zhang
  2020-08-14 18:51 ` Uladzislau Rezki
  0 siblings, 1 reply; 26+ messages in thread
From: qiang.zhang @ 2020-08-14  6:45 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, joel
  Cc: rcu, linux-kernel

From: Zqiang <qiang.zhang@windriver.com>

Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
been invoked. If the shrinker is triggered at that time, we should
drain the "krcp" of each possible CPU, not just the online ones.

Signed-off-by: Zqiang <qiang.zhang@windriver.com>
---
 kernel/rcu/tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..619ccbb3fe4b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	unsigned long count = 0;
 
 	/* Snapshot count of all CPUs */
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count += READ_ONCE(krcp->count);
@@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	int cpu, freed = 0;
 	unsigned long flags;
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		int count;
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
@@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
 	int cpu;
 	unsigned long flags;
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-14  6:45 [PATCH] rcu: shrink each possible cpu krcp qiang.zhang
@ 2020-08-14 18:51 ` Uladzislau Rezki
  2020-08-17 22:03   ` Joel Fernandes
  0 siblings, 1 reply; 26+ messages in thread
From: Uladzislau Rezki @ 2020-08-14 18:51 UTC (permalink / raw)
  To: qiang.zhang, Joel Fernandes
  Cc: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai, joel,
	rcu, linux-kernel

> From: Zqiang <qiang.zhang@windriver.com>
> 
> Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> been invoked. If the shrinker is triggered at that time, we should
> drain the "krcp" of each possible CPU, not just the online ones.
> 
> Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> ---
>  kernel/rcu/tree.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..619ccbb3fe4b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>  	unsigned long count = 0;
>  
>  	/* Snapshot count of all CPUs */
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
>  		count += READ_ONCE(krcp->count);
> @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	int cpu, freed = 0;
>  	unsigned long flags;
>  
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		int count;
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
> @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
>  	int cpu;
>  	unsigned long flags;
>  
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
>
I agree that it can happen.

Joel, what is your view?

Thanks!

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-14 18:51 ` Uladzislau Rezki
@ 2020-08-17 22:03   ` Joel Fernandes
  2020-08-18 17:18     ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Joel Fernandes @ 2020-08-17 22:03 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: qiang.zhang, Paul E. McKenney, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> > From: Zqiang <qiang.zhang@windriver.com>
> >
> > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > been invoked. If the shrinker is triggered at that time, we should
> > drain the "krcp" of each possible CPU, not just the online ones.
> >
> > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > ---
> >  kernel/rcu/tree.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8ce77d9ac716..619ccbb3fe4b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> >       unsigned long count = 0;
> >
> >       /* Snapshot count of all CPUs */
> > -     for_each_online_cpu(cpu) {
> > +     for_each_possible_cpu(cpu) {
> >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> >               count += READ_ONCE(krcp->count);
> > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> >       int cpu, freed = 0;
> >       unsigned long flags;
> >
> > -     for_each_online_cpu(cpu) {
> > +     for_each_possible_cpu(cpu) {
> >               int count;
> >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> >       int cpu;
> >       unsigned long flags;
> >
> > -     for_each_online_cpu(cpu) {
> > +     for_each_possible_cpu(cpu) {
> >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> >               raw_spin_lock_irqsave(&krcp->lock, flags);
> >
> I agree that it can happen.
>
> Joel, what is your view?

Yes, I also think it is possible. The patch LGTM. Another fix could be
to drain the caches in the CPU offline path and save the memory. But
then it will take a hit during __get_free_page(). If CPU
offlining/onlining is not frequent, then it will save the lost memory.
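
A rough sketch of that alternative (illustrative only: the hook name
below is made up, and whether kfree_rcu_drain_unlock() can safely be
called from the CPU-offline path would need checking):

	/* Hypothetical teardown hook: drain the dying CPU's krcp. */
	static int kfree_rcu_dead_cpu(unsigned int cpu)
	{
		unsigned long flags;
		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

		raw_spin_lock_irqsave(&krcp->lock, flags);
		if (krcp->monitor_todo)
			kfree_rcu_drain_unlock(krcp, flags);
		else
			raw_spin_unlock_irqrestore(&krcp->lock, flags);
		return 0;
	}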

I wonder how other per-cpu caches in the kernel work in such scenarios.

Thoughts?

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-17 22:03   ` Joel Fernandes
@ 2020-08-18 17:18     ` Paul E. McKenney
  2020-08-18 19:00       ` Joel Fernandes
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-18 17:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > > From: Zqiang <qiang.zhang@windriver.com>
> > >
> > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > been invoked. If the shrinker is triggered at that time, we should
> > > drain the "krcp" of each possible CPU, not just the online ones.
> > >
> > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > ---
> > >  kernel/rcu/tree.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > >       unsigned long count = 0;
> > >
> > >       /* Snapshot count of all CPUs */
> > > -     for_each_online_cpu(cpu) {
> > > +     for_each_possible_cpu(cpu) {
> > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > >               count += READ_ONCE(krcp->count);
> > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > >       int cpu, freed = 0;
> > >       unsigned long flags;
> > >
> > > -     for_each_online_cpu(cpu) {
> > > +     for_each_possible_cpu(cpu) {
> > >               int count;
> > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > >       int cpu;
> > >       unsigned long flags;
> > >
> > > -     for_each_online_cpu(cpu) {
> > > +     for_each_possible_cpu(cpu) {
> > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > >
> > I agree that it can happen.
> >
> > Joel, what is your view?
> 
> Yes, I also think it is possible. The patch LGTM. Another fix could be
> to drain the caches in the CPU offline path and save the memory. But
> then it will take a hit during __get_free_page(). If CPU
> offlining/onlining is not frequent, then it will save the lost memory.
> 
> I wonder how other per-cpu caches in the kernel work in such scenarios.
> 
> Thoughts?

Do I count this as an ack or a review?  If not, what precisely would
you like the submitter to do differently?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 17:18     ` Paul E. McKenney
@ 2020-08-18 19:00       ` Joel Fernandes
  2020-08-18 21:03         ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Joel Fernandes @ 2020-08-18 19:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > > From: Zqiang <qiang.zhang@windriver.com>
> > > >
> > > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > > been invoked. If the shrinker is triggered at that time, we should
> > > > drain the "krcp" of each possible CPU, not just the online ones.
> > > >
> > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > ---
> > > >  kernel/rcu/tree.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > >       unsigned long count = 0;
> > > >
> > > >       /* Snapshot count of all CPUs */
> > > > -     for_each_online_cpu(cpu) {
> > > > +     for_each_possible_cpu(cpu) {
> > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > >               count += READ_ONCE(krcp->count);
> > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > >       int cpu, freed = 0;
> > > >       unsigned long flags;
> > > >
> > > > -     for_each_online_cpu(cpu) {
> > > > +     for_each_possible_cpu(cpu) {
> > > >               int count;
> > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > >       int cpu;
> > > >       unsigned long flags;
> > > >
> > > > -     for_each_online_cpu(cpu) {
> > > > +     for_each_possible_cpu(cpu) {
> > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > > >
> > > I agree that it can happen.
> > >
> > > Joel, what is your view?
> >
> > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > to drain the caches in the CPU offline path and save the memory. But
> > then it will take a hit during __get_free_page(). If CPU
> > offlining/onlining is not frequent, then it will save the lost memory.
> >
> > I wonder how other per-cpu caches in the kernel work in such scenarios.
> >
> > Thoughts?
>
> Do I count this as an ack or a review?  If not, what precisely would
> you like the submitter to do differently?

Hi Paul,
The patch is correct and is definitely an improvement. I was thinking
about whether we should always do what the patch is doing when
offlining CPUs to save memory but now I feel that may not be that much
of a win to justify more complexity.

You can take it with my ack:

Acked-by: Joel Fernandes <joel@joelfernandes.org>

 thanks,

 - Joel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 19:00       ` Joel Fernandes
@ 2020-08-18 21:03         ` Paul E. McKenney
  2020-08-18 21:55           ` Uladzislau Rezki
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-18 21:03 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > >
> > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > > >
> > > > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > > > been invoked. If the shrinker is triggered at that time, we should
> > > > > drain the "krcp" of each possible CPU, not just the online ones.
> > > > >
> > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > > ---
> > > > >  kernel/rcu/tree.c | 6 +++---
> > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > >       unsigned long count = 0;
> > > > >
> > > > >       /* Snapshot count of all CPUs */
> > > > > -     for_each_online_cpu(cpu) {
> > > > > +     for_each_possible_cpu(cpu) {
> > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > >               count += READ_ONCE(krcp->count);
> > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > >       int cpu, freed = 0;
> > > > >       unsigned long flags;
> > > > >
> > > > > -     for_each_online_cpu(cpu) {
> > > > > +     for_each_possible_cpu(cpu) {
> > > > >               int count;
> > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > >       int cpu;
> > > > >       unsigned long flags;
> > > > >
> > > > > -     for_each_online_cpu(cpu) {
> > > > > +     for_each_possible_cpu(cpu) {
> > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > >
> > > > I agree that it can happen.
> > > >
> > > > Joel, what is your view?
> > >
> > > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > > to drain the caches in the CPU offline path and save the memory. But
> > > then it will take a hit during __get_free_page(). If CPU
> > > offlining/onlining is not frequent, then it will save the lost memory.
> > >
> > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > >
> > > Thoughts?
> >
> > Do I count this as an ack or a review?  If not, what precisely would
> > you like the submitter to do differently?
> 
> Hi Paul,
> The patch is correct and is definitely an improvement. I was thinking
> about whether we should always do what the patch is doing when
> offlining CPUs to save memory but now I feel that may not be that much
> of a win to justify more complexity.
> 
> You can take it with my ack:
> 
> Acked-by: Joel Fernandes <joel@joelfernandes.org>

Thank you all!  I wordsmithed a bit as shown below, so please let
me know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit fe5d89cc025b3efe682cac122bc4d39f4722821e
Author: Zqiang <qiang.zhang@windriver.com>
Date:   Fri Aug 14 14:45:57 2020 +0800

    rcu: Shrink each possible cpu krcp
    
    CPUs can go offline shortly after kfree_call_rcu() has been invoked,
    which can leave memory stranded until those CPUs come back online.
    This commit therefore drains the krcp of each CPU, not just the
    ones that happen to be online.
    
    Acked-by: Joel Fernandes <joel@joelfernandes.org>
    Signed-off-by: Zqiang <qiang.zhang@windriver.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 02ca8e5..d9f90f6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 	unsigned long count = 0;
 
 	/* Snapshot count of all CPUs */
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		count += READ_ONCE(krcp->count);
@@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	int cpu, freed = 0;
 	unsigned long flags;
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		int count;
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
@@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
 	int cpu;
 	unsigned long flags;
 
-	for_each_online_cpu(cpu) {
+	for_each_possible_cpu(cpu) {
 		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
 
 		raw_spin_lock_irqsave(&krcp->lock, flags);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 21:03         ` Paul E. McKenney
@ 2020-08-18 21:55           ` Uladzislau Rezki
  2020-08-18 22:02             ` Paul E. McKenney
  2020-08-18 23:25             ` Joel Fernandes
  0 siblings, 2 replies; 26+ messages in thread
From: Uladzislau Rezki @ 2020-08-18 21:55 UTC (permalink / raw)
  To: Paul E. McKenney, Joel Fernandes
  Cc: Joel Fernandes, Uladzislau Rezki, qiang.zhang, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

> On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > >
> > > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > > > >
> > > > > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > > > > been invoked. If the shrinker is triggered at that time, we should
> > > > > > drain the "krcp" of each possible CPU, not just the online ones.
> > > > > >
> > > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > > > ---
> > > > > >  kernel/rcu/tree.c | 6 +++---
> > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >       unsigned long count = 0;
> > > > > >
> > > > > >       /* Snapshot count of all CPUs */
> > > > > > -     for_each_online_cpu(cpu) {
> > > > > > +     for_each_possible_cpu(cpu) {
> > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > >               count += READ_ONCE(krcp->count);
> > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >       int cpu, freed = 0;
> > > > > >       unsigned long flags;
> > > > > >
> > > > > > -     for_each_online_cpu(cpu) {
> > > > > > +     for_each_possible_cpu(cpu) {
> > > > > >               int count;
> > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > >       int cpu;
> > > > > >       unsigned long flags;
> > > > > >
> > > > > > -     for_each_online_cpu(cpu) {
> > > > > > +     for_each_possible_cpu(cpu) {
> > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > >
> > > > > I agree that it can happen.
> > > > >
> > > > > Joel, what is your view?
> > > >
> > > > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > > > to drain the caches in the CPU offline path and save the memory. But
> > > > then it will take a hit during __get_free_page(). If CPU
> > > > offlining/onlining is not frequent, then it will save the lost memory.
> > > >
> > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > >
> > > > Thoughts?
> > >
> > > Do I count this as an ack or a review?  If not, what precisely would
> > > you like the submitter to do differently?
> > 
> > Hi Paul,
> > The patch is correct and is definitely an improvement. I was thinking
> > about whether we should always do what the patch is doing when
> > offlining CPUs to save memory but now I feel that may not be that much
> > of a win to justify more complexity.
> > 
> > You can take it with my ack:
> > 
> > Acked-by: Joel Fernandes <joel@joelfernandes.org>
> 
> Thank you all!  I wordsmithed a bit as shown below, so please let
> me know if I messed anything up.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> Author: Zqiang <qiang.zhang@windriver.com>
> Date:   Fri Aug 14 14:45:57 2020 +0800
> 
>     rcu: Shrink each possible cpu krcp
>     
>     CPUs can go offline shortly after kfree_call_rcu() has been invoked,
>     which can leave memory stranded until those CPUs come back online.
>     This commit therefore drains the krcp of each CPU, not just the
>     ones that happen to be online.
>     
>     Acked-by: Joel Fernandes <joel@joelfernandes.org>
>     Signed-off-by: Zqiang <qiang.zhang@windriver.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 02ca8e5..d9f90f6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>  	unsigned long count = 0;
>  
>  	/* Snapshot count of all CPUs */
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
>  		count += READ_ONCE(krcp->count);
> @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  	int cpu, freed = 0;
>  	unsigned long flags;
>  
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		int count;
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
> @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
>  	int cpu;
>  	unsigned long flags;
>  
> -	for_each_online_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
>  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>  
>  		raw_spin_lock_irqsave(&krcp->lock, flags);
>

Should we just clean the krc of a CPU when it goes offline?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b8ccd7b5af82..6decb9ad2421 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
 {
        struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
        struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
+       struct kfree_rcu_cpu *krcp;
 
        if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
                return 0;
 
+       /* Drain the krcp of this CPU. IRQs should be disabled? */
+       krcp = this_cpu_ptr(&krc);
+       schedule_delayed_work(&krcp->monitor_work, 0);
+

A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
Maybe never.

--
Vlad Rezki

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 21:55           ` Uladzislau Rezki
@ 2020-08-18 22:02             ` Paul E. McKenney
  2020-08-19  0:04               ` Joel Fernandes
  2020-08-18 23:25             ` Joel Fernandes
  1 sibling, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-18 22:02 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 11:55:11PM +0200, Uladzislau Rezki wrote:
> > On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > > >
> > > > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > > > > >
> > > > > > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > > > > > been invoked. If the shrinker is triggered at that time, we should
> > > > > > > drain the "krcp" of each possible CPU, not just the online ones.
> > > > > > >
> > > > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 6 +++---
> > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > >       unsigned long count = 0;
> > > > > > >
> > > > > > >       /* Snapshot count of all CPUs */
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > >               count += READ_ONCE(krcp->count);
> > > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > >       int cpu, freed = 0;
> > > > > > >       unsigned long flags;
> > > > > > >
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               int count;
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > >       int cpu;
> > > > > > >       unsigned long flags;
> > > > > > >
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > >
> > > > > > I agree that it can happen.
> > > > > >
> > > > > > Joel, what is your view?
> > > > >
> > > > > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > > > > to drain the caches in the CPU offline path and save the memory. But
> > > > > then it will take a hit during __get_free_page(). If CPU
> > > > > offlining/onlining is not frequent, then it will save the lost memory.
> > > > >
> > > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Do I count this as an ack or a review?  If not, what precisely would
> > > > you like the submitter to do differently?
> > > 
> > > Hi Paul,
> > > The patch is correct and is definitely an improvement. I was thinking
> > > about whether we should always do what the patch is doing when
> > > offlining CPUs to save memory but now I feel that may not be that much
> > > of a win to justify more complexity.
> > > 
> > > You can take it with my ack:
> > > 
> > > Acked-by: Joel Fernandes <joel@joelfernandes.org>
> > 
> > Thank you all!  I wordsmithed a bit as shown below, so please let
> > me know if I messed anything up.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> > Author: Zqiang <qiang.zhang@windriver.com>
> > Date:   Fri Aug 14 14:45:57 2020 +0800
> > 
> >     rcu: Shrink each possible cpu krcp
> >     
> >     CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> >     which can leave memory stranded until those CPUs come back online.
> >     This commit therefore drains the krcp of each CPU, not just the
> >     ones that happen to be online.
> >     
> >     Acked-by: Joel Fernandes <joel@joelfernandes.org>
> >     Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 02ca8e5..d9f90f6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> >  	unsigned long count = 0;
> >  
> >  	/* Snapshot count of all CPUs */
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> >  		count += READ_ONCE(krcp->count);
> > @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> >  	int cpu, freed = 0;
> >  	unsigned long flags;
> >  
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		int count;
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> > @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> >  	int cpu;
> >  	unsigned long flags;
> >  
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> >  		raw_spin_lock_irqsave(&krcp->lock, flags);
> >
> 
> Should we just clean the krc of a CPU when it goes offline?

That is equally valid from my viewpoint.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b8ccd7b5af82..6decb9ad2421 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +       struct kfree_rcu_cpu *krcp;
>  
>         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>                 return 0;
>  
> +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> +       krcp = this_cpu_ptr(&krc);
> +       schedule_delayed_work(&krcp->monitor_work, 0);
> +
> 
> A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> Maybe never.

Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
hard time getting too worried about it.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 21:55           ` Uladzislau Rezki
  2020-08-18 22:02             ` Paul E. McKenney
@ 2020-08-18 23:25             ` Joel Fernandes
  1 sibling, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2020-08-18 23:25 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 11:55:11PM +0200, Uladzislau Rezki wrote:
> > On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > > >
> > > > > > > From: Zqiang <qiang.zhang@windriver.com>
> > > > > > >
> > > > > > > Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has
> > > > > > > been invoked. If the shrinker is triggered at that time, we should
> > > > > > > drain the "krcp" of each possible CPU, not just the online ones.
> > > > > > >
> > > > > > > Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> > > > > > > ---
> > > > > > >  kernel/rcu/tree.c | 6 +++---
> > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > >       unsigned long count = 0;
> > > > > > >
> > > > > > >       /* Snapshot count of all CPUs */
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > >               count += READ_ONCE(krcp->count);
> > > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > >       int cpu, freed = 0;
> > > > > > >       unsigned long flags;
> > > > > > >
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               int count;
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > >       int cpu;
> > > > > > >       unsigned long flags;
> > > > > > >
> > > > > > > -     for_each_online_cpu(cpu) {
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > >               raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > >
> > > > > > I agree that it can happen.
> > > > > >
> > > > > > Joel, what is your view?
> > > > >
> > > > > Yes, I also think it is possible. The patch LGTM. Another fix could be
> > > > > to drain the caches in the CPU offline path and save the memory. But
> > > > > then it will take a hit during __get_free_page(). If CPU
> > > > > offlining/onlining is not frequent, then it will save the lost memory.
> > > > >
> > > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Do I count this as an ack or a review?  If not, what precisely would
> > > > you like the submitter to do differently?
> > > 
> > > Hi Paul,
> > > The patch is correct and is definitely an improvement. I was thinking
> > > about whether we should always do what the patch is doing when
> > > offlining CPUs to save memory but now I feel that may not be that much
> > > of a win to justify more complexity.
> > > 
> > > You can take it with my ack:
> > > 
> > > Acked-by: Joel Fernandes <joel@joelfernandes.org>
> > 
> > Thank you all!  I wordsmithed a bit as shown below, so please let
> > me know if I messed anything up.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> > Author: Zqiang <qiang.zhang@windriver.com>
> > Date:   Fri Aug 14 14:45:57 2020 +0800
> > 
> >     rcu: Shrink each possible cpu krcp
> >     
> >     CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> >     which can leave memory stranded until those CPUs come back online.
> >     This commit therefore drains the krcp of each CPU, not just the
> >     ones that happen to be online.
> >     
> >     Acked-by: Joel Fernandes <joel@joelfernandes.org>
> >     Signed-off-by: Zqiang <qiang.zhang@windriver.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 02ca8e5..d9f90f6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> >  	unsigned long count = 0;
> >  
> >  	/* Snapshot count of all CPUs */
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> >  		count += READ_ONCE(krcp->count);
> > @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> >  	int cpu, freed = 0;
> >  	unsigned long flags;
> >  
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		int count;
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> > @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> >  	int cpu;
> >  	unsigned long flags;
> >  
> > -	for_each_online_cpu(cpu) {
> > +	for_each_possible_cpu(cpu) {
> >  		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >  
> >  		raw_spin_lock_irqsave(&krcp->lock, flags);
> >
> 
> Should we just clean the krc of a CPU when it goes offline?
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b8ccd7b5af82..6decb9ad2421 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
>  {
>         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> +       struct kfree_rcu_cpu *krcp;
>  
>         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
>                 return 0;
>  
> +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> +       krcp = this_cpu_ptr(&krc);
> +       schedule_delayed_work(&krcp->monitor_work, 0);
> +
> 
> A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> Maybe never.
> 

Yes, that is a bug, as we discussed on IRC; thanks for following up as well.

We need to acquire krcp->lock too, so that it does not race with
kfree_rcu_work; if no monitor is scheduled, then there is nothing to do.
So, the same as what the shrinker does:

		raw_spin_lock_irqsave(&krcp->lock, flags);
		if (krcp->monitor_todo)
			kfree_rcu_drain_unlock(krcp, flags);
		else
			raw_spin_unlock_irqrestore(&krcp->lock, flags);

thanks!

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-18 22:02             ` Paul E. McKenney
@ 2020-08-19  0:04               ` Joel Fernandes
  2020-08-19  3:00                 ` Re: " Zhang, Qiang
  2020-08-19 11:22                 ` [PATCH] rcu: shrink each possible cpu krcp Uladzislau Rezki
  0 siblings, 2 replies; 26+ messages in thread
From: Joel Fernandes @ 2020-08-19  0:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +       struct kfree_rcu_cpu *krcp;
> >
> >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >                 return 0;
> >
> > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > +       krcp = this_cpu_ptr(&krc);
> > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > +
> >
> > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > Maybe never.
>
> Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> hard time getting too worried about it.  ;-)

Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
on the cache reaper, whose job is to flush the per-cpu caches. So I
believe that during CPU offlining, the per-cpu slab caches are flushed.
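
For reference, the general shape of such a flush-on-offline hook (not
the actual mm/slab.c code; the cache and its helper below are
hypothetical) is a cpuhp callback registered at init time; a teardown
callback in the PREPARE section runs on a control CPU after the target
CPU is dead:

	/* Hypothetical per-cpu cache: return a dead CPU's cached
	 * objects to the shared pool from a cpuhp teardown callback. */
	static int my_cache_dead_cpu(unsigned int cpu)
	{
		flush_my_percpu_cache(cpu);	/* hypothetical helper */
		return 0;
	}

	static int __init my_cache_init(void)
	{
		int ret;

		/* Dynamic states return the allocated state (> 0). */
		ret = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
						"mycache:dead", NULL,
						my_cache_dead_cpu);
		return ret < 0 ? ret : 0;
	}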

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19  0:04               ` Joel Fernandes
@ 2020-08-19  3:00                 ` Zhang, Qiang
  2020-08-19 13:04                   ` Paul E. McKenney
  2020-08-19 13:56                   ` Joel Fernandes
  2020-08-19 11:22                 ` [PATCH] rcu: shrink each possible cpu krcp Uladzislau Rezki
  1 sibling, 2 replies; 26+ messages in thread
From: Zhang, Qiang @ 2020-08-19  3:00 UTC (permalink / raw)
  To: Joel Fernandes, Paul E. McKenney
  Cc: Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML



________________________________________
From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> on behalf of Joel Fernandes <joel@joelfernandes.org>
Sent: August 19, 2020 8:04
To: Paul E. McKenney
Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  {
> >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > +       struct kfree_rcu_cpu *krcp;
> >
> >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> >                 return 0;
> >
> > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > +       krcp = this_cpu_ptr(&krc);
> > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > +
> >
> > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > Maybe never.
>
> Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> hard time getting too worried about it.  ;-)

>Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
>on the cache reaper, whose job is to flush the per-cpu caches. So I
>believe that during CPU offlining, the per-cpu slab caches are flushed.
>
>thanks,
>
 >- Joel

When a cpu goes offline, the slub or slab only flushes the free objects in the offline cpu's cache, putting these free objects on the node list or returning them to the buddy system; objects that are still in use stay in the offline cpu's cache.

If we want to clean the per-cpu "krcp" objects when a cpu goes offline,
we should free the "krcp" cache objects in "rcutree_offline_cpu", which is called before the other rcu cpu-offline functions; "rcutree_offline_cpu" is called from the "cpuhp/%u" per-cpu thread.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..1812d4a1ac1b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
        unsigned long flags;
        struct rcu_data *rdp;
        struct rcu_node *rnp;
+       struct kfree_rcu_cpu *krcp;
 
        rdp = per_cpu_ptr(&rcu_data, cpu);
        rnp = rdp->mynode;
@@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
 
        // nohz_full CPUs need the tick for stop-machine to work quickly
        tick_dep_set(TICK_DEP_BIT_RCU);
+
+       krcp = per_cpu_ptr(&krc, cpu);
+       raw_spin_lock_irqsave(&krcp->lock, flags);
+       schedule_delayed_work(&krcp->monitor_work, 0);
+       raw_spin_unlock_irqrestore(&krcp->lock, flags);
        return 0;
 }

thanks,

Zqiang

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19  0:04               ` Joel Fernandes
  2020-08-19  3:00                 ` Re: " Zhang, Qiang
@ 2020-08-19 11:22                 ` Uladzislau Rezki
  2020-08-19 13:25                   ` Joel Fernandes
  1 sibling, 1 reply; 26+ messages in thread
From: Uladzislau Rezki @ 2020-08-19 11:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, Uladzislau Rezki, qiang.zhang, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Tue, Aug 18, 2020 at 08:04:20PM -0400, Joel Fernandes wrote:
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  {
> > >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > > +       struct kfree_rcu_cpu *krcp;
> > >
> > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > >                 return 0;
> > >
> > > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > +       krcp = this_cpu_ptr(&krc);
> > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > > Maybe never.
> >
> > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > hard time getting too worried about it.  ;-)
> 
> Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
> on the cache reaper, whose job is to flush the per-cpu caches. So I
> believe that during CPU offlining, the per-cpu slab caches are flushed.
> 
SLAB does it for sure, same as the page allocator. There are special
CPU-offline callbacks for both cases to perform cleanup when a CPU dies.

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19  3:00                 ` Re: " Zhang, Qiang
@ 2020-08-19 13:04                   ` Paul E. McKenney
  2020-08-19 13:56                   ` Joel Fernandes
  1 sibling, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-19 13:04 UTC (permalink / raw)
  To: Zhang, Qiang
  Cc: Joel Fernandes, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> 
> 
> ________________________________________
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> on behalf of Joel Fernandes <joel@joelfernandes.org>
> Sent: August 19, 2020 8:04
> To: Paul E. McKenney
> Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
> 
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  {
> > >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > > +       struct kfree_rcu_cpu *krcp;
> > >
> > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > >                 return 0;
> > >
> > > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > +       krcp = this_cpu_ptr(&krc);
> > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > > Maybe never.
> >
> > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > hard time getting too worried about it.  ;-)
> 
> >Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
> >on the cache reaper, whose job is to flush the per-cpu caches. So I
> >believe that during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
>  >- Joel
> 
> When a cpu goes offline, the slub or slab only flushes the free objects in the offline cpu's cache, putting these free objects on the node list or returning them to the buddy system; objects that are still in use stay in the offline cpu's cache.
>
> If we want to clean the per-cpu "krcp" objects when a cpu goes offline,
> we should free the "krcp" cache objects in "rcutree_offline_cpu", which is called before the other rcu cpu-offline functions; "rcutree_offline_cpu" is called from the "cpuhp/%u" per-cpu thread.

If Joel and Uladzislau are good with this, would you be willing to
resend with a commit log and your Signed-off-by?

							Thanx, Paul

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
>         unsigned long flags;
>         struct rcu_data *rdp;
>         struct rcu_node *rnp;
> +       struct kfree_rcu_cpu *krcp;
>  
>         rdp = per_cpu_ptr(&rcu_data, cpu);
>         rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>  
>         // nohz_full CPUs need the tick for stop-machine to work quickly
>         tick_dep_set(TICK_DEP_BIT_RCU);
> +
> +       krcp = per_cpu_ptr(&krc, cpu);
> +       raw_spin_lock_irqsave(&krcp->lock, flags);
> +       schedule_delayed_work(&krcp->monitor_work, 0);
> +       raw_spin_unlock_irqrestore(&krcp->lock, flags);
>         return 0;
>  }
> 
> thanks,
> 
> Zqiang

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19 11:22                 ` [PATCH] rcu: shrink each possible cpu krcp Uladzislau Rezki
@ 2020-08-19 13:25                   ` Joel Fernandes
  0 siblings, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2020-08-19 13:25 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, qiang.zhang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 01:22:25PM +0200, Uladzislau Rezki wrote:
> On Tue, Aug 18, 2020 at 08:04:20PM -0400, Joel Fernandes wrote:
> > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > >  {
> > > >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > > > +       struct kfree_rcu_cpu *krcp;
> > > >
> > > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > >                 return 0;
> > > >
> > > > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > > +       krcp = this_cpu_ptr(&krc);
> > > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > > +
> > > >
> > > > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > > > Maybe never.
> > >
> > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > hard time getting too worried about it.  ;-)
> > 
> > Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
> > on the cache reaper, whose job is to flush the per-cpu caches. So I
> > believe that during CPU offlining, the per-cpu slab caches are flushed.
> > 
> SLAB does it for sure, same as the page allocator. There are special
> CPU-offline callbacks for both cases to perform cleanup when a CPU dies.

Got it, thanks for confirming, makes sense.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19  3:00                 ` Re: " Zhang, Qiang
  2020-08-19 13:04                   ` Paul E. McKenney
@ 2020-08-19 13:56                   ` Joel Fernandes
  2020-08-19 15:21                     ` Paul E. McKenney
  1 sibling, 1 reply; 26+ messages in thread
From: Joel Fernandes @ 2020-08-19 13:56 UTC (permalink / raw)
  To: Zhang, Qiang
  Cc: Paul E. McKenney, Uladzislau Rezki, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> 
> 
> ________________________________________
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> on behalf of Joel Fernandes <joel@joelfernandes.org>
> Sent: August 19, 2020 8:04
> To: Paul E. McKenney
> Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
> 
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > >  {
> > >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > > +       struct kfree_rcu_cpu *krcp;
> > >
> > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > >                 return 0;
> > >
> > > +       /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > +       krcp = this_cpu_ptr(&krc);
> > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A CPU can be offlined and its krcp will be stuck until the shrinker is invoked.
> > > Maybe never.
> >
> > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > hard time getting too worried about it.  ;-)
> 
> >Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
> >on the cache reaper, whose job is to flush the per-cpu caches. So I
> >believe that during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
>  >- Joel
> 
> When a cpu goes offline, the slub or slab only flushes the free objects in
> the offline cpu's cache, putting these free objects on the node list or
> returning them to the buddy system; objects that are still in use stay in
> the offline cpu's cache.
>
> If we want to clean the per-cpu "krcp" objects when a cpu goes offline, we
> should free the "krcp" cache objects in "rcutree_offline_cpu", which is
> called before the other rcu cpu-offline functions; "rcutree_offline_cpu"
> is called from the "cpuhp/%u" per-cpu thread.
> 

Could you please wrap your text properly when you post to the mailing list?
Thanks. I fixed it for you above.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
>         unsigned long flags;
>         struct rcu_data *rdp;
>         struct rcu_node *rnp;
> +       struct kfree_rcu_cpu *krcp;
>  
>         rdp = per_cpu_ptr(&rcu_data, cpu);
>         rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>  
>         // nohz_full CPUs need the tick for stop-machine to work quickly
>         tick_dep_set(TICK_DEP_BIT_RCU);
> +
> +       krcp = per_cpu_ptr(&krc, cpu);
> +       raw_spin_lock_irqsave(&krcp->lock, flags);
> +       schedule_delayed_work(&krcp->monitor_work, 0);
> +       raw_spin_unlock_irqrestore(&krcp->lock, flags);
>         return 0;

I realized the above is not good enough for what this is trying to do. Unlike
the slab, the new kfree_rcu objects cannot always be drained / submitted to
RCU because the previous batch may still be waiting for a grace period. So
the above code could very well return with the yet-to-be-submitted kfree_rcu
objects still in the cache.

One option is to spin-wait here for monitor_todo to be false and keep calling
kfree_rcu_drain_unlock() till then.

But then that's not good enough either, because if new objects are queued
when interrupts are enabled in the CPU offline path, then the cache will get
new objects after the previous set was drained. Further, spin waiting may
introduce deadlocks.

Another option is to switch the kfree_rcu() path to non-batching (so new
objects cannot be cached in the offline path and are submitted directly to
RCU), wait for a GP, and then submit the work. But then I am not sure if the
1-argument kfree_rcu() will like that.

Probably Qiang's original fix using for_each_possible_cpu() is good enough for
the shrinker case, and then we can tackle the hotplug one.
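
For reference, that spin-wait option would look roughly like this
(fragment only, with krcp and flags as in the shrinker code above;
illustrative and insufficient for the reasons already noted, since new
objects can be queued once it returns and spinning here risks deadlock):

	raw_spin_lock_irqsave(&krcp->lock, flags);
	while (krcp->monitor_todo) {
		/* Drops krcp->lock and queues the batch if possible. */
		kfree_rcu_drain_unlock(krcp, flags);
		cpu_relax();
		raw_spin_lock_irqsave(&krcp->lock, flags);
	}
	raw_spin_unlock_irqrestore(&krcp->lock, flags);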

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19 13:56                   ` Joel Fernandes
@ 2020-08-19 15:21                     ` Paul E. McKenney
  2020-08-19 15:54                       ` Joel Fernandes
  2020-08-19 15:58                       ` Uladzislau Rezki
  0 siblings, 2 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-19 15:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Zhang, Qiang, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> [...]
> 
> Another option is to switch the kfree_rcu() path to non-batching (so new
> objects cannot be cached in the offline path and are submitted directly to
> RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> kfree_rcu() will like that.

Or spawn a workqueue that does something like this:

1.	Get any pending kvfree_rcu() requests sent off to RCU.

2.	Do an rcu_barrier().

3.	Do the cleanup actions.
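
In (untested) strawman code, with a hypothetical ->offline_work field added
to kfree_rcu_cpu just so container_of() has something to chew on:

	static void kfree_rcu_offline_workfn(struct work_struct *work)
	{
		struct kfree_rcu_cpu *krcp =
			container_of(work, struct kfree_rcu_cpu, offline_work);
		unsigned long flags;

		/* 1. Get any pending kvfree_rcu() requests sent off to RCU. */
		raw_spin_lock_irqsave(&krcp->lock, flags);
		kfree_rcu_drain_unlock(krcp, flags);	/* drops krcp->lock */

		/* 2. Wait for the resulting callbacks to be invoked. */
		rcu_barrier();

		/* 3. Do the cleanup actions, e.g., free any cached pages. */
	}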

> Probably Qiang's original fix for for_each_possible_cpu() is good enough for
> the shrinker case, and then we can tackle the hotplug one.

It might take some experimentation to find the best solution.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19 15:21                     ` Paul E. McKenney
@ 2020-08-19 15:54                       ` Joel Fernandes
  2020-08-19 15:58                       ` Uladzislau Rezki
  1 sibling, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2020-08-19 15:54 UTC (permalink / raw)
  To: Paul E. McKenney, boqun.feng
  Cc: Zhang, Qiang, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > > 
> > > 
> > > ________________________________________
> > > 发件人: linux-kernel-owner@vger.kernel.org <linux-kernel-owner@vger.kernel.org> 代表 Joel Fernandes <joel@joelfernandes.org>
> > > 发送时间: 2020年8月19日 8:04
> > > 收件人: Paul E. McKenney
> > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > 
> > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > >  {
> > > > >         struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > >         struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
> > > > > +       struct kfree_rcu_cpu *krcp;
> > > > >
> > > > >         if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > >                 return 0;
> > > > >
> > > > > +       /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > +       krcp = this_cpu_ptr(&krc)
> > > > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > +
> > > > >
> > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > Maybe be never.
> > > >
> > > > Does the same apply to its kmalloc() per-CPU caches?  If so, I have a
> > > > hard time getting too worried about it.  ;-)
> > > 
> > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > >
> > > >thanks,
> > > >
> > >  >- Joel
> > > 
> > > When cpu going offline, the slub or slab only flush free objects in offline
> > > cpu cache,  put these free objects in node list  or return buddy system,
> > > for those who are still in use, they still stay offline cpu cache.
> > > 
> > > If we want clean per-cpu "krcp" objects when cpu going offline.  we should
> > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > called in "cpuhp/%u" per-cpu thread.
> > > 
> > 
> > Could you please wrap text properly when you post to mailing list, thanks. I
> > fixed it for you above.
> > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >         unsigned long flags;
> > >         struct rcu_data *rdp;
> > >         struct rcu_node *rnp;
> > > +       struct kfree_rcu_cpu *krcp;
> > >  
> > >         rdp = per_cpu_ptr(&rcu_data, cpu);
> > >         rnp = rdp->mynode;
> > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >  
> > >         // nohz_full CPUs need the tick for stop-machine to work quickly
> > >         tick_dep_set(TICK_DEP_BIT_RCU);
> > > +
> > > +       krcp = per_cpu_ptr(&krc, cpu);
> > > +       raw_spin_lock_irqsave(&krcp->lock, flags);
> > > +       schedule_delayed_work(&krcp->monitor_work, 0);
> > > +       raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > >         return 0;
> > 
> > I realized the above is not good enough for what this is trying to do. Unlike
> > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > RCU because the previous batch may still be waiting for a grace period. So
> > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > objects still in the cache.
> > 
> > One option is to spin-wait here for monitor_todo to be false and keep calling
> > kfree_rcu_drain_unlock() till then.
> > 
> > But then that's not good enough either, because if new objects are queued
> > when interrupts are enabled in the CPU offline path, then the cache will get
> > new objects after the previous set was drained. Further, spin waiting may
> > introduce deadlocks.
> > 
> > Another option is to switch the kfree_rcu() path to non-batching (so new
> > objects cannot be cached in the offline path and are submitted directly to
> > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > kfree_rcu() will like that.
> 
> Or spawn a workqueue that does something like this:
> 
> 1.	Get any pending kvfree_rcu() requests sent off to RCU.
> 
> 2.	Do an rcu_barrier().
> 
> 3.	Do the cleanup actions.

One more thing I'm not sure about is whether this will even be an issue, because
the delayed-work to run the monitor could be migrated to another CPU (both timers
and wq get migrated).  So in that case, the draining of the kfree_rcu objects
will happen on another CPU.

Adding Boqun as well since I believe he knows a lot about wq and hotplug as
well.

Basically my question is: does delayed_work get migrated to another CPU when
a CPU goes offline?  If so, I am assuming all races between new
schedule_delayed_work() invocations and CPU offlining are handled as well, such
that at some point those new invocations are queued on another CPU.

> > Probably Qiang's original fix for for_each_possible_cpu() is good enough for
> > the shrinker case, and then we can tackle the hotplug one.
> 
> It might take some experimentation to find the best solution.

Sure.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19 15:21                     ` Paul E. McKenney
  2020-08-19 15:54                       ` Joel Fernandes
@ 2020-08-19 15:58                       ` Uladzislau Rezki
  2020-08-20 22:39                         ` Joel Fernandes
  1 sibling, 1 reply; 26+ messages in thread
From: Uladzislau Rezki @ 2020-08-19 15:58 UTC (permalink / raw)
  To: Paul E. McKenney, Joel Fernandes
  Cc: Joel Fernandes, Zhang, Qiang, Uladzislau Rezki, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> [...]
> 
> Or spawn a workqueue that does something like this:
> 
> 1.	Get any pending kvfree_rcu() requests sent off to RCU.
> 
> 2.	Do an rcu_barrier().
> 
> 3.	Do the cleanup actions.
> 
> > Probably Qiang's original fix for for_each_possible_cpu() is good enough for
> > the shrinker case, and then we can tackle the hotplug one.
> 
> It might take some experimentation to find the best solution.
> 

<snip>
static void do_idle(void)
{
...
 while (!need_resched()) {
  rmb();

  local_irq_disable();

  if (cpu_is_offline(cpu)) {
   tick_nohz_idle_stop_tick();
   cpuhp_report_idle_dead();
       -> cpuhp_report_idle_dead(void)
              -> rcu_report_dead(smp_processor_id());
   arch_cpu_idle_dead();
  }
...
<snip>

We have the rcu_report_dead() callback. When it gets called, IRQs are off
and the CPU in question is offline.

    krcp = per_cpu_ptr(&krc, cpu);
    raw_spin_lock_irqsave(&krcp->lock, flags);
    krcp->monitor_todo = true;
    schedule_delayed_work(&krcp->monitor_work, 0);
    raw_spin_unlock_irqrestore(&krcp->lock, flags);

If there is a batch in progress, the job will rearm itself.
But I agree, it requires more experiments.
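
For reference, the rearming happens in kfree_rcu_drain_unlock(), which
currently reads roughly like this (paraphrased from tree.c, so please
double-check against the actual source):

	krcp->monitor_todo = false;
	if (queue_kfree_rcu_work(krcp)) {
		/* Success! The batch is on its way to RCU. */
		raw_spin_unlock_irqrestore(&krcp->lock, flags);
		return;
	}

	/* Previous RCU batch still in progress, try again later. */
	krcp->monitor_todo = true;
	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
	raw_spin_unlock_irqrestore(&krcp->lock, flags);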


--
Vlad Rezki

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-19 15:58                       ` Uladzislau Rezki
@ 2020-08-20 22:39                         ` Joel Fernandes
  2020-08-21 15:33                           ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Joel Fernandes @ 2020-08-20 22:39 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, Zhang, Qiang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> [...]
> 
> <snip>
> static void do_idle(void)
> {
> ...
>  while (!need_resched()) {
>   rmb();
> 
>   local_irq_disable();
> 
>   if (cpu_is_offline(cpu)) {
>    tick_nohz_idle_stop_tick();
>    cpuhp_report_idle_dead();
>        -> cpuhp_report_idle_dead(void)
>               -> rcu_report_dead(smp_processor_id());
>    arch_cpu_idle_dead();
>   }
> ...
> <snip>
> 
> We have the rcu_report_dead() callback. When it gets called, IRQs are off
> and the CPU in question is offline.
> 
>     krcp = per_cpu_ptr(&krc, cpu);
>     raw_spin_lock_irqsave(&krcp->lock, flags);
>     krcp->monitor_todo = true;
>     schedule_delayed_work(&krcp->monitor_work, 0);
>     raw_spin_unlock_irqrestore(&krcp->lock, flags);
> 
> If there is a batch in progress, the job will rearm itself.
> But I agree, it requires more experiments.

I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
get migrated during the CPU offline path, so it is not an issue.

In this case, Qiang's initial patch suffices to fix the shrinker issue.

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-20 22:39                         ` Joel Fernandes
@ 2020-08-21 15:33                           ` Paul E. McKenney
  2020-08-31  9:30                             ` Uladzislau Rezki
  2020-09-09  6:35                             ` Zhang, Qiang
  0 siblings, 2 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-08-21 15:33 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Uladzislau Rezki, Zhang, Qiang, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> [...]
> I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
> get migrated during the CPU offline path, so it is not an issue.
> 
> In this case, Qiang's initial patch suffices to fix the shrinker issue.

As in the patch that is currently in -rcu, correct?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Re: [PATCH] rcu: shrink each possible cpu krcp
  2020-08-21 15:33                           ` Paul E. McKenney
@ 2020-08-31  9:30                             ` Uladzislau Rezki
  2020-09-09  6:35                             ` Zhang, Qiang
  1 sibling, 0 replies; 26+ messages in thread
From: Uladzislau Rezki @ 2020-08-31  9:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Joel Fernandes, Uladzislau Rezki, Zhang, Qiang, Josh Triplett,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Fri, Aug 21, 2020 at 08:33:28AM -0700, Paul E. McKenney wrote:
> [...]
> > I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
> > get migrated during the CPU offline path, so it is not an issue.
> > 
> > In this case, Qiang's initial patch suffices to fix the shrinker issue.
> 
> As in the patch that is currently in -rcu, correct?
> 
The "[PATCH] rcu: shrink each possible cpu krcp" will fix the
shrinker "issue" when CPU goes offline. So that was valid concern.

As for "main track", our drain work or a timer that triggers it
will be migrated anyway. Just to repeat what Joel wrote earlier,
we do not have any issues with this part.

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 26+ messages in thread

* (no subject)
  2020-08-21 15:33                           ` Paul E. McKenney
  2020-08-31  9:30                             ` Uladzislau Rezki
@ 2020-09-09  6:35                             ` Zhang, Qiang
  2020-09-09  7:03                               ` RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func Zhang, Qiang
  1 sibling, 1 reply; 26+ messages in thread
From: Zhang, Qiang @ 2020-09-09  6:35 UTC (permalink / raw)
  To: Joel Fernandes, paulmck
  Cc: Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

 When preemptible RCU is configured, a task switch can happen inside an RCU read-side critical region;
 the current task is then added to the "rnp->blkd_tasks" list, where these rnp are leaf nodes in the RCU tree.
 In the "rcu_gp_fqs_loop" func,

static void rcu_gp_fqs_loop(void)
 {

            struct rcu_node *rnp = rcu_get_root();
}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RCU:  Question   rcu_preempt_blocked_readers_cgp  in  rcu_gp_fqs_loop func
  2020-09-09  6:35                             ` Zhang, Qiang
@ 2020-09-09  7:03                               ` Zhang, Qiang
  2020-09-09 11:22                                 ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Zhang, Qiang @ 2020-09-09  7:03 UTC (permalink / raw)
  To: Joel Fernandes, paulmck
  Cc: Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML


When preemptible RCU is configured and the rcu_node tree has multiple levels, the current task can be preempted inside an RCU read-side critical region.
The preempted task is added to the "rnp->blkd_tasks" list, and "rnp->gp_tasks" may be assigned a value; these rnp are leaf nodes in the RCU tree.

But in "rcu_gp_fqs_loop" func, we check blocked readers in root node. 

static void rcu_gp_fqs_loop(void)
 {
            .....
            struct rcu_node *rnp = rcu_get_root();
            .....
            if (!READ_ONCE(rnp->qsmask) &&
                               !rcu_preempt_blocked_readers_cgp(rnp))    ------> rnp is root node
                     break;
            ....
}

Tasks are never added to the root node's blkd_tasks list, and its "rnp->gp_tasks" is never assigned a value, so this check appears to be ineffective.
Should we check the leaf nodes instead, like this?

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void)
 	return true;
 }
 
+static bool rcu_preempt_blocked_readers(void)
+{
+	struct rcu_node *rnp;
+	unsigned long flags;
+	bool ret = false;
+
+	rcu_for_each_leaf_node(rnp) {
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
+		if (rcu_preempt_blocked_readers_cgp(rnp)) {
+			ret = true;
+			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+			break;
+		}
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	}
+
+	return ret;
+}
+
 /*
  * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state
  * time.
@@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
 		return true;
 
 	// The current grace period has completed.
-	if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp))
+	if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers())
 		return true;
 
 	return false;
@@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void)
 		/* Locking provides needed memory barriers. */
 		/* If grace period done, leave loop. */
 		if (!READ_ONCE(rnp->qsmask) &&
-		    !rcu_preempt_blocked_readers_cgp(rnp))
+		    !rcu_preempt_blocked_readers())
 			break;
 		/* If time for quiescent-state forcing, do it. */
 		if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
-- 


thanks
Qiang

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RCU:  Question   rcu_preempt_blocked_readers_cgp  in rcu_gp_fqs_loop func
  2020-09-09  7:03                               ` RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func Zhang, Qiang
@ 2020-09-09 11:22                                 ` Paul E. McKenney
  2020-09-10  3:25                                   ` Re: " Zhang, Qiang
  2020-09-14 20:06                                   ` Joel Fernandes
  0 siblings, 2 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-09-09 11:22 UTC (permalink / raw)
  To: Zhang, Qiang
  Cc: Joel Fernandes, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote:
> [...]
> Tasks are never added to the root node's blkd_tasks list, and its "rnp->gp_tasks" is never assigned a value, so this check appears to be ineffective.
> Should we check the leaf nodes instead, like this?

There are two cases:

1.	There is only a single rcu_node structure, which is both root
	and leaf.  In this case, the current check is required:  Both
	->qsmask and the ->blkd_tasks list must be checked.  Your
	rcu_preempt_blocked_readers() would work in this case, but
	the current code is a bit faster because it does not need
	to acquire the ->lock nor does it need the loop overhead.

2.	There are multiple levels.  In this case, as you say, the root
	rcu_node structure's ->blkd_tasks list will always be empty.
	But also in this case, the root rcu_node structure's ->qsmask
	cannot be zero until all the leaf rcu_node structures' ->qsmask
	fields are zero and their ->blkd_tasks lists no longer have
	tasks blocking the current grace period.  This means that your
	rcu_preempt_blocked_readers() function would never return
	true in this case.

So the current code is fine.
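
If you want to convince yourself of case 2 at runtime, a throwaway debug
check (illustrative only: it ignores the rnp->lock that a real check would
need, and it is not meant for mainline) might look like this:

	static void rcu_check_root_implies_leaves(void)
	{
		struct rcu_node *rnp;

		if (READ_ONCE(rcu_get_root()->qsmask))
			return;

		/* Root says the GP is done, so no leaf may still block it. */
		rcu_for_each_leaf_node(rnp)
			WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
	}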

Are you seeing failures on mainline kernels?  If so, what is the failure
mode?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RCU:  Question   rcu_preempt_blocked_readers_cgp  in  rcu_gp_fqs_loop func
  2020-09-09 11:22                                 ` Paul E. McKenney
@ 2020-09-10  3:25                                   ` Zhang, Qiang
  2020-09-14 20:06                                   ` Joel Fernandes
  1 sibling, 0 replies; 26+ messages in thread
From: Zhang, Qiang @ 2020-09-10  3:25 UTC (permalink / raw)
  To: paulmck
  Cc: Joel Fernandes, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML



________________________________________
From: Paul E. McKenney <paulmck@kernel.org>
Sent: September 9, 2020, 19:22
To: Zhang, Qiang
Cc: Joel Fernandes; Uladzislau Rezki; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
Subject: Re: RCU:  Question   rcu_preempt_blocked_readers_cgp  in  rcu_gp_fqs_loop func

On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote:
> [...]
> 
> So the current code is fine.
> 
> Are you seeing failures on mainline kernels?  If so, what is the failure
> mode?

 Yes, that's right, thank you for your explanation.

 thanks
 Qiang

>                                                       Thanx, Paul


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RCU:  Question   rcu_preempt_blocked_readers_cgp  in rcu_gp_fqs_loop func
  2020-09-09 11:22                                 ` Paul E. McKenney
  2020-09-10  3:25                                   ` 回复: " Zhang, Qiang
@ 2020-09-14 20:06                                   ` Joel Fernandes
  1 sibling, 0 replies; 26+ messages in thread
From: Joel Fernandes @ 2020-09-14 20:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Zhang, Qiang, Uladzislau Rezki, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, rcu, LKML

On Wed, Sep 09, 2020 at 04:22:41AM -0700, Paul E. McKenney wrote:
> [...]
> 
> So the current code is fine.
> 
> Are you seeing failures on mainline kernels?  If so, what is the failure
> mode?

Nice question! He had me wondering there too for a bit ;-)

Thanks,

 - Joel


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-09-14 20:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14  6:45 [PATCH] rcu: shrink each possible cpu krcp qiang.zhang
2020-08-14 18:51 ` Uladzislau Rezki
2020-08-17 22:03   ` Joel Fernandes
2020-08-18 17:18     ` Paul E. McKenney
2020-08-18 19:00       ` Joel Fernandes
2020-08-18 21:03         ` Paul E. McKenney
2020-08-18 21:55           ` Uladzislau Rezki
2020-08-18 22:02             ` Paul E. McKenney
2020-08-19  0:04               ` Joel Fernandes
2020-08-19  3:00                 ` Re: " Zhang, Qiang
2020-08-19 13:04                   ` Paul E. McKenney
2020-08-19 13:56                   ` Joel Fernandes
2020-08-19 15:21                     ` Paul E. McKenney
2020-08-19 15:54                       ` Joel Fernandes
2020-08-19 15:58                       ` Uladzislau Rezki
2020-08-20 22:39                         ` Joel Fernandes
2020-08-21 15:33                           ` Paul E. McKenney
2020-08-31  9:30                             ` Uladzislau Rezki
2020-09-09  6:35                             ` Zhang, Qiang
2020-09-09  7:03                               ` RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func Zhang, Qiang
2020-09-09 11:22                                 ` Paul E. McKenney
2020-09-10  3:25                                   ` Re: " Zhang, Qiang
2020-09-14 20:06                                   ` Joel Fernandes
2020-08-19 11:22                 ` [PATCH] rcu: shrink each possible cpu krcp Uladzislau Rezki
2020-08-19 13:25                   ` Joel Fernandes
2020-08-18 23:25             ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).