linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
@ 2021-10-22 16:17 Seth Forshee
  2021-10-22 20:36 ` Paul E. McKenney
  2021-10-25 19:48 ` Jakub Kicinski
  0 siblings, 2 replies; 4+ messages in thread
From: Seth Forshee @ 2021-10-22 16:17 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Paul E. McKenney, netdev, linux-kernel

From: Seth Forshee <sforshee@digitalocean.com>

Currently rcu_barrier() is used to ensure that no readers of the
inactive mini_Qdisc buffer remain before it is reused. This waits for
any pending RCU callbacks to complete, when all that is actually
required is to wait for one RCU grace period to elapse after the buffer
was made inactive. This means that using rcu_barrier() may result in
unnecessary waits.

To improve this, store the current RCU state when a buffer is made
inactive and use poll_state_synchronize_rcu() to check whether a full
grace period has elapsed before reusing it. If a full grace period has
not elapsed, wait for a grace period to elapse, and in the non-RT case
use synchronize_rcu_expedited() to hasten it.

Since this approach eliminates the RCU callback it is no longer
necessary to synchronize_rcu() in the tp_head==NULL case. However, the
RCU state should still be saved for the previously active buffer.

Before this change I would typically see mini_qdisc_pair_swap() take
tens of milliseconds to complete. After this change it typcially
finishes in less than 1 ms, and often it takes just a few microseconds.

Thanks to Paul for walking me through the options for improving this.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_generic.c   | 38 +++++++++++++++++++-------------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f8631ad3c868..c725464be814 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1299,7 +1299,7 @@ struct mini_Qdisc {
 	struct tcf_block *block;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
-	struct rcu_head rcu;
+	unsigned long rcu_state;
 };
 
 static inline void mini_qdisc_bstats_cpu_update(struct mini_Qdisc *miniq,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 854d2b38db85..8540c12c9a62 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1407,10 +1407,6 @@ void psched_ratecfg_precompute(struct psched_ratecfg *r,
 }
 EXPORT_SYMBOL(psched_ratecfg_precompute);
 
-static void mini_qdisc_rcu_func(struct rcu_head *head)
-{
-}
-
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 			  struct tcf_proto *tp_head)
 {
@@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 
 	if (!tp_head) {
 		RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
-		/* Wait for flying RCU callback before it is freed. */
-		rcu_barrier();
-		return;
-	}
+	} else {
+		miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
+			&miniqp->miniq1 : &miniqp->miniq2;
 
-	miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
-		&miniqp->miniq1 : &miniqp->miniq2;
+		/* We need to make sure that readers won't see the miniq
+		 * we are about to modify. So ensure that at least one RCU
+		 * grace period has elapsed since the miniq was made
+		 * inactive.
+		 */
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			cond_synchronize_rcu(miniq->rcu_state);
+		else if (!poll_state_synchronize_rcu(miniq->rcu_state))
+			synchronize_rcu_expedited();
 
-	/* We need to make sure that readers won't see the miniq
-	 * we are about to modify. So wait until previous call_rcu callback
-	 * is done.
-	 */
-	rcu_barrier();
-	miniq->filter_list = tp_head;
-	rcu_assign_pointer(*miniqp->p_miniq, miniq);
+		miniq->filter_list = tp_head;
+		rcu_assign_pointer(*miniqp->p_miniq, miniq);
+	}
 
 	if (miniq_old)
-		/* This is counterpart of the rcu barriers above. We need to
+		/* This is counterpart of the rcu sync above. We need to
 		 * block potential new user of miniq_old until all readers
 		 * are not seeing it.
 		 */
-		call_rcu(&miniq_old->rcu, mini_qdisc_rcu_func);
+		miniq_old->rcu_state = start_poll_synchronize_rcu();
 }
 EXPORT_SYMBOL(mini_qdisc_pair_swap);
 
@@ -1463,6 +1461,8 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 	miniqp->miniq1.cpu_qstats = qdisc->cpu_qstats;
 	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
 	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
+	miniqp->miniq1.rcu_state = get_state_synchronize_rcu();
+	miniqp->miniq2.rcu_state = miniqp->miniq1.rcu_state;
 	miniqp->p_miniq = p_miniq;
 }
 EXPORT_SYMBOL(mini_qdisc_pair_init);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
  2021-10-22 16:17 [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() Seth Forshee
@ 2021-10-22 20:36 ` Paul E. McKenney
  2021-10-25 19:48 ` Jakub Kicinski
  1 sibling, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2021-10-22 20:36 UTC (permalink / raw)
  To: Seth Forshee
  Cc: David S. Miller, Jakub Kicinski, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, netdev, linux-kernel

On Fri, Oct 22, 2021 at 11:17:46AM -0500, Seth Forshee wrote:
> From: Seth Forshee <sforshee@digitalocean.com>
> 
> Currently rcu_barrier() is used to ensure that no readers of the
> inactive mini_Qdisc buffer remain before it is reused. This waits for
> any pending RCU callbacks to complete, when all that is actually
> required is to wait for one RCU grace period to elapse after the buffer
> was made inactive. This means that using rcu_barrier() may result in
> unnecessary waits.
> 
> To improve this, store the current RCU state when a buffer is made
> inactive and use poll_state_synchronize_rcu() to check whether a full
> grace period has elapsed before reusing it. If a full grace period has
> not elapsed, wait for a grace period to elapse, and in the non-RT case
> use synchronize_rcu_expedited() to hasten it.
> 
> Since this approach eliminates the RCU callback it is no longer
> necessary to synchronize_rcu() in the tp_head==NULL case. However, the
> RCU state should still be saved for the previously active buffer.
> 
> Before this change I would typically see mini_qdisc_pair_swap() take
> tens of milliseconds to complete. After this change it typcially
> finishes in less than 1 ms, and often it takes just a few microseconds.
> 
> Thanks to Paul for walking me through the options for improving this.
> 
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>

At first glance, this looks plausible to me!

							Thanx, Paul

> ---
>  include/net/sch_generic.h |  2 +-
>  net/sched/sch_generic.c   | 38 +++++++++++++++++++-------------------
>  2 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index f8631ad3c868..c725464be814 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -1299,7 +1299,7 @@ struct mini_Qdisc {
>  	struct tcf_block *block;
>  	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
>  	struct gnet_stats_queue	__percpu *cpu_qstats;
> -	struct rcu_head rcu;
> +	unsigned long rcu_state;
>  };
>  
>  static inline void mini_qdisc_bstats_cpu_update(struct mini_Qdisc *miniq,
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 854d2b38db85..8540c12c9a62 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1407,10 +1407,6 @@ void psched_ratecfg_precompute(struct psched_ratecfg *r,
>  }
>  EXPORT_SYMBOL(psched_ratecfg_precompute);
>  
> -static void mini_qdisc_rcu_func(struct rcu_head *head)
> -{
> -}
> -
>  void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  			  struct tcf_proto *tp_head)
>  {
> @@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  
>  	if (!tp_head) {
>  		RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> -		/* Wait for flying RCU callback before it is freed. */
> -		rcu_barrier();
> -		return;
> -	}
> +	} else {
> +		miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> +			&miniqp->miniq1 : &miniqp->miniq2;
>  
> -	miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> -		&miniqp->miniq1 : &miniqp->miniq2;
> +		/* We need to make sure that readers won't see the miniq
> +		 * we are about to modify. So ensure that at least one RCU
> +		 * grace period has elapsed since the miniq was made
> +		 * inactive.
> +		 */
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			cond_synchronize_rcu(miniq->rcu_state);
> +		else if (!poll_state_synchronize_rcu(miniq->rcu_state))
> +			synchronize_rcu_expedited();
>  
> -	/* We need to make sure that readers won't see the miniq
> -	 * we are about to modify. So wait until previous call_rcu callback
> -	 * is done.
> -	 */
> -	rcu_barrier();
> -	miniq->filter_list = tp_head;
> -	rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +		miniq->filter_list = tp_head;
> +		rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +	}
>  
>  	if (miniq_old)
> -		/* This is counterpart of the rcu barriers above. We need to
> +		/* This is counterpart of the rcu sync above. We need to
>  		 * block potential new user of miniq_old until all readers
>  		 * are not seeing it.
>  		 */
> -		call_rcu(&miniq_old->rcu, mini_qdisc_rcu_func);
> +		miniq_old->rcu_state = start_poll_synchronize_rcu();
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_swap);
>  
> @@ -1463,6 +1461,8 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
>  	miniqp->miniq1.cpu_qstats = qdisc->cpu_qstats;
>  	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
>  	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
> +	miniqp->miniq1.rcu_state = get_state_synchronize_rcu();
> +	miniqp->miniq2.rcu_state = miniqp->miniq1.rcu_state;
>  	miniqp->p_miniq = p_miniq;
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_init);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
  2021-10-22 16:17 [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() Seth Forshee
  2021-10-22 20:36 ` Paul E. McKenney
@ 2021-10-25 19:48 ` Jakub Kicinski
  2021-10-26 12:27   ` Seth Forshee
  1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2021-10-25 19:48 UTC (permalink / raw)
  To: Seth Forshee
  Cc: David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Paul E. McKenney, netdev, linux-kernel

On Fri, 22 Oct 2021 11:17:46 -0500 Seth Forshee wrote:
> From: Seth Forshee <sforshee@digitalocean.com>
> 
> Currently rcu_barrier() is used to ensure that no readers of the
> inactive mini_Qdisc buffer remain before it is reused. This waits for
> any pending RCU callbacks to complete, when all that is actually
> required is to wait for one RCU grace period to elapse after the buffer
> was made inactive. This means that using rcu_barrier() may result in
> unnecessary waits.
> 
> To improve this, store the current RCU state when a buffer is made
> inactive and use poll_state_synchronize_rcu() to check whether a full
> grace period has elapsed before reusing it. If a full grace period has
> not elapsed, wait for a grace period to elapse, and in the non-RT case
> use synchronize_rcu_expedited() to hasten it.
> 
> Since this approach eliminates the RCU callback it is no longer
> necessary to synchronize_rcu() in the tp_head==NULL case. However, the
> RCU state should still be saved for the previously active buffer.
> 
> Before this change I would typically see mini_qdisc_pair_swap() take
> tens of milliseconds to complete. After this change it typcially
> finishes in less than 1 ms, and often it takes just a few microseconds.
> 
> Thanks to Paul for walking me through the options for improving this.
> 
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Signed-off-by: Seth Forshee <sforshee@digitalocean.com>

LGTM, but please rebase and retest on top of latest net-next.

>  void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  			  struct tcf_proto *tp_head)
>  {
> @@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  
>  	if (!tp_head) {
>  		RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> -		/* Wait for flying RCU callback before it is freed. */
> -		rcu_barrier();
> -		return;
> -	}
> +	} else {
> +		miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> +			&miniqp->miniq1 : &miniqp->miniq2;
>  
> -	miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> -		&miniqp->miniq1 : &miniqp->miniq2;

nit: any reason this doesn't read:

	miniq = miniq_old != &miniqp->miniq1 ? 
		&miniqp->miniq1 : &miniqp->miniq2;

Surely it's not equal to miniq1 or miniq2 if it's NULL.

> +		/* We need to make sure that readers won't see the miniq
> +		 * we are about to modify. So ensure that at least one RCU
> +		 * grace period has elapsed since the miniq was made
> +		 * inactive.
> +		 */
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			cond_synchronize_rcu(miniq->rcu_state);
> +		else if (!poll_state_synchronize_rcu(miniq->rcu_state))
> +			synchronize_rcu_expedited();
>  
> -	/* We need to make sure that readers won't see the miniq
> -	 * we are about to modify. So wait until previous call_rcu callback
> -	 * is done.
> -	 */
> -	rcu_barrier();
> -	miniq->filter_list = tp_head;
> -	rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +		miniq->filter_list = tp_head;
> +		rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +	}
>  
>  	if (miniq_old)
> -		/* This is counterpart of the rcu barriers above. We need to
> +		/* This is counterpart of the rcu sync above. We need to
>  		 * block potential new user of miniq_old until all readers
>  		 * are not seeing it.
>  		 */
> -		call_rcu(&miniq_old->rcu, mini_qdisc_rcu_func);
> +		miniq_old->rcu_state = start_poll_synchronize_rcu();
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_swap);
>  
> @@ -1463,6 +1461,8 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
>  	miniqp->miniq1.cpu_qstats = qdisc->cpu_qstats;
>  	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
>  	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
> +	miniqp->miniq1.rcu_state = get_state_synchronize_rcu();
> +	miniqp->miniq2.rcu_state = miniqp->miniq1.rcu_state;
>  	miniqp->p_miniq = p_miniq;
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_init);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
  2021-10-25 19:48 ` Jakub Kicinski
@ 2021-10-26 12:27   ` Seth Forshee
  0 siblings, 0 replies; 4+ messages in thread
From: Seth Forshee @ 2021-10-26 12:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Paul E. McKenney, netdev, linux-kernel

On Mon, Oct 25, 2021 at 12:48:28PM -0700, Jakub Kicinski wrote:
> On Fri, 22 Oct 2021 11:17:46 -0500 Seth Forshee wrote:
> > From: Seth Forshee <sforshee@digitalocean.com>
> > 
> > Currently rcu_barrier() is used to ensure that no readers of the
> > inactive mini_Qdisc buffer remain before it is reused. This waits for
> > any pending RCU callbacks to complete, when all that is actually
> > required is to wait for one RCU grace period to elapse after the buffer
> > was made inactive. This means that using rcu_barrier() may result in
> > unnecessary waits.
> > 
> > To improve this, store the current RCU state when a buffer is made
> > inactive and use poll_state_synchronize_rcu() to check whether a full
> > grace period has elapsed before reusing it. If a full grace period has
> > not elapsed, wait for a grace period to elapse, and in the non-RT case
> > use synchronize_rcu_expedited() to hasten it.
> > 
> > Since this approach eliminates the RCU callback it is no longer
> > necessary to synchronize_rcu() in the tp_head==NULL case. However, the
> > RCU state should still be saved for the previously active buffer.
> > 
> > Before this change I would typically see mini_qdisc_pair_swap() take
> > tens of milliseconds to complete. After this change it typcially
> > finishes in less than 1 ms, and often it takes just a few microseconds.
> > 
> > Thanks to Paul for walking me through the options for improving this.
> > 
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
> 
> LGTM, but please rebase and retest on top of latest net-next.

Will do.

> >  void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> >  			  struct tcf_proto *tp_head)
> >  {
> > @@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> >  
> >  	if (!tp_head) {
> >  		RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> > -		/* Wait for flying RCU callback before it is freed. */
> > -		rcu_barrier();
> > -		return;
> > -	}
> > +	} else {
> > +		miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> > +			&miniqp->miniq1 : &miniqp->miniq2;
> >  
> > -	miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> > -		&miniqp->miniq1 : &miniqp->miniq2;
> 
> nit: any reason this doesn't read:
> 
> 	miniq = miniq_old != &miniqp->miniq1 ? 
> 		&miniqp->miniq1 : &miniqp->miniq2;
> 
> Surely it's not equal to miniq1 or miniq2 if it's NULL.

I agree, that looks simpler and functionally equivalent. It seems
off-topic for this patch though; I'm only touching that line to change
the indentation.

Thanks,
Seth

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-10-26 12:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22 16:17 [PATCH] net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() Seth Forshee
2021-10-22 20:36 ` Paul E. McKenney
2021-10-25 19:48 ` Jakub Kicinski
2021-10-26 12:27   ` Seth Forshee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).