* [PATCH net] net: sched: put back q.qlen into a single location
@ 2019-02-28 20:55 Eric Dumazet
  2019-03-02  7:07 ` David Miller
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Dumazet @ 2019-02-28 20:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, John Fastabend,
	Jamal Hadi Salim, Cong Wang, Jiri Pirko

In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

This is true when pfifo_fast is used as the root qdisc, or as a direct
MQ/MQPRIO child.

But pfifo_fast can also be used as a leaf in classful qdiscs, and
existing logic needs to access the child qlen efficiently.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
  htb_activate() -> WARN_ON()
  htb_dequeue_tree(), to decide whether a class can be htb_deactivated
  once it has no more packets; a sketch of this usage follows.
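
For context, roughly the shape of the HTB check in question (a sketch
adapted from net/sched/sch_htb.c of that era, body abridged -- not
verbatim kernel code):

	static inline void htb_activate(struct htb_sched *q,
					struct htb_class *cl)
	{
		/* With a per-cpu qlen, a lockless leaf can report
		 * q.qlen == 0 here even while packets are queued,
		 * so this WARN_ON() fires spuriously.
		 */
		WARN_ON(cl->level || !cl->leaf.q || !cl->leaf.q->q.qlen);

		if (!cl->prio_activity) {
			cl->prio_activity = 1 << cl->prio;
			htb_activate_prios(q, cl);
		}
	}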

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible CPUs)
in the data path is a non-starter.
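
For reference, the per-cpu summing in question looks like this (the
loop this patch removes from qdisc_qlen_sum()):

	u32 qlen = q->qstats.qlen;
	int i;

	/* Touches one cache line per possible CPU -- far too
	 * costly on a per-packet path.
	 */
	if (q->flags & TCQ_F_NOLOCK)
		for_each_possible_cpu(i)
			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;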

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdiscs but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} operations are correct.

For 'lockless' qdiscs (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might not be held (for example in
pfifo_fast_enqueue() and pfifo_fast_dequeue()); a stand-alone
illustration follows.
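
A minimal user-space illustration (not kernel code) of the lost-update
problem a plain increment has without a lock:

	/* Compile with: cc -O2 -pthread demo.c */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static unsigned int plain_qlen;	/* like q.qlen++ without the lock */
	static atomic_uint atomic_qlen;	/* like atomic_inc(&q.atomic_qlen) */

	static void *worker(void *arg)
	{
		(void)arg;
		for (int i = 0; i < 1000000; i++) {
			plain_qlen++;				/* racy */
			atomic_fetch_add(&atomic_qlen, 1);	/* safe */
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, worker, NULL);
		pthread_create(&t2, NULL, worker, NULL);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);

		/* plain_qlen typically ends below 2000000;
		 * atomic_qlen is exactly 2000000.
		 */
		printf("plain=%u atomic=%u\n", plain_qlen,
		       atomic_load(&atomic_qlen));
		return 0;
	}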

This patch adds atomic_qlen (in the same location as qlen)
and renames the following helpers, since we want to express that
they can be used without the qdisc lock, and that qlen is no longer
percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (in net-next) we might revert this patch by tracking all these
qlen uses and replacing them with a more efficient method (not a
precise qlen, but an empty/non_empty status that might be less
expensive to maintain/track); a hypothetical sketch of that direction
follows.
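
A hypothetical sketch only (the 'empty' field and the helper name are
invented here for illustration; nothing like this is part of this
patch):

	static inline bool qdisc_maybe_empty(const struct Qdisc *q)
	{
		if (q->flags & TCQ_F_NOLOCK)
			/* hypothetical flag maintained at
			 * enqueue/dequeue boundaries
			 */
			return READ_ONCE(q->empty);
		return q->q.qlen == 0;
	}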

Another possibility is to have a legacy pfifo_fast version that would
be used when acting as a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
---
 include/net/sch_generic.h | 31 +++++++++++++------------------
 net/core/gen_stats.c      |  2 --
 net/sched/sch_generic.c   | 13 ++++++-------
 3 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9481f2c142e26ee1174653d673e6134edd9851da..e7eb4aa6ccc94e24f2d08d92d9df59db903664ab 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -51,7 +51,10 @@ struct qdisc_size_table {
 struct qdisc_skb_head {
 	struct sk_buff	*head;
 	struct sk_buff	*tail;
-	__u32		qlen;
+	union {
+		u32		qlen;
+		atomic_t	atomic_qlen;
+	};
 	spinlock_t	lock;
 };
 
@@ -408,27 +411,19 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 	BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
-static inline int qdisc_qlen_cpu(const struct Qdisc *q)
-{
-	return this_cpu_ptr(q->cpu_qstats)->qlen;
-}
-
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
 	return q->q.qlen;
 }
 
-static inline int qdisc_qlen_sum(const struct Qdisc *q)
+static inline u32 qdisc_qlen_sum(const struct Qdisc *q)
 {
-	__u32 qlen = q->qstats.qlen;
-	int i;
+	u32 qlen = q->qstats.qlen;
 
-	if (q->flags & TCQ_F_NOLOCK) {
-		for_each_possible_cpu(i)
-			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
-	} else {
+	if (q->flags & TCQ_F_NOLOCK)
+		qlen += atomic_read(&q->q.atomic_qlen);
+	else
 		qlen += q->q.qlen;
-	}
 
 	return qlen;
 }
@@ -825,14 +820,14 @@ static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
 	this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
 }
 
-static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+static inline void qdisc_qstats_atomic_qlen_inc(struct Qdisc *sch)
 {
-	this_cpu_inc(sch->cpu_qstats->qlen);
+	atomic_inc(&sch->q.atomic_qlen);
 }
 
-static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+static inline void qdisc_qstats_atomic_qlen_dec(struct Qdisc *sch)
 {
-	this_cpu_dec(sch->cpu_qstats->qlen);
+	atomic_dec(&sch->q.atomic_qlen);
 }
 
 static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 9bf1b9ad17806dfaa579317408e5c4707d014cc0..ac679f74ba4754e69c51dd9b3fb9b23611782300 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -291,7 +291,6 @@ __gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
 	for_each_possible_cpu(i) {
 		const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
 
-		qstats->qlen = 0;
 		qstats->backlog += qcpu->backlog;
 		qstats->drops += qcpu->drops;
 		qstats->requeues += qcpu->requeues;
@@ -307,7 +306,6 @@ void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
 	if (cpu) {
 		__gnet_stats_copy_queue_cpu(qstats, cpu);
 	} else {
-		qstats->qlen = q->qlen;
 		qstats->backlog = q->backlog;
 		qstats->drops = q->drops;
 		qstats->requeues = q->requeues;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 968a85fe4d4a9a0f5887989789cdd97ae7033f5a..de31f2f3b9730bc2e5d35648483d43ea2d86d008 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -68,7 +68,7 @@ static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
 			skb = __skb_dequeue(&q->skb_bad_txq);
 			if (qdisc_is_percpu_stats(q)) {
 				qdisc_qstats_cpu_backlog_dec(q, skb);
-				qdisc_qstats_cpu_qlen_dec(q);
+				qdisc_qstats_atomic_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
 				q->q.qlen--;
@@ -108,7 +108,7 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
 
 	if (qdisc_is_percpu_stats(q)) {
 		qdisc_qstats_cpu_backlog_inc(q, skb);
-		qdisc_qstats_cpu_qlen_inc(q);
+		qdisc_qstats_atomic_qlen_inc(q);
 	} else {
 		qdisc_qstats_backlog_inc(q, skb);
 		q->q.qlen++;
@@ -147,7 +147,7 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
 
 		qdisc_qstats_cpu_requeues_inc(q);
 		qdisc_qstats_cpu_backlog_inc(q, skb);
-		qdisc_qstats_cpu_qlen_inc(q);
+		qdisc_qstats_atomic_qlen_inc(q);
 
 		skb = next;
 	}
@@ -252,7 +252,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 			skb = __skb_dequeue(&q->gso_skb);
 			if (qdisc_is_percpu_stats(q)) {
 				qdisc_qstats_cpu_backlog_dec(q, skb);
-				qdisc_qstats_cpu_qlen_dec(q);
+				qdisc_qstats_atomic_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
 				q->q.qlen--;
@@ -645,7 +645,7 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
 	if (unlikely(err))
 		return qdisc_drop_cpu(skb, qdisc, to_free);
 
-	qdisc_qstats_cpu_qlen_inc(qdisc);
+	qdisc_qstats_atomic_qlen_inc(qdisc);
 	/* Note: skb can not be used after skb_array_produce(),
 	 * so we better not use qdisc_qstats_cpu_backlog_inc()
 	 */
@@ -670,7 +670,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 	if (likely(skb)) {
 		qdisc_qstats_cpu_backlog_dec(qdisc, skb);
 		qdisc_bstats_cpu_update(qdisc, skb);
-		qdisc_qstats_cpu_qlen_dec(qdisc);
+		qdisc_qstats_atomic_qlen_dec(qdisc);
 	}
 
 	return skb;
@@ -714,7 +714,6 @@ static void pfifo_fast_reset(struct Qdisc *qdisc)
 		struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
 
 		q->backlog = 0;
-		q->qlen = 0;
 	}
 }
 
-- 
2.21.0.352.gf09ad66450-goog



* Re: [PATCH net] net: sched: put back q.qlen into a single location
  2019-02-28 20:55 [PATCH net] net: sched: put back q.qlen into a single location Eric Dumazet
@ 2019-03-02  7:07 ` David Miller
  2019-03-02 22:11 ` David Miller
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2019-03-02  7:07 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, john.fastabend, jhs, xiyou.wangcong, jiri

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 28 Feb 2019 12:55:43 -0800

> In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
> John made the assumption that the data path had no need to read
> the qdisc qlen (number of packets in the qdisc).
> 
> This is true when pfifo_fast is used as the root qdisc, or as a direct
> MQ/MQPRIO child.
> 
> But pfifo_fast can also be used as a leaf in classful qdiscs, and
> existing logic needs to access the child qlen efficiently.
 ...

I'm going to let this sit a bit for review...


* Re: [PATCH net] net: sched: put back q.qlen into a single location
  2019-02-28 20:55 [PATCH net] net: sched: put back q.qlen into a single location Eric Dumazet
  2019-03-02  7:07 ` David Miller
@ 2019-03-02 22:11 ` David Miller
  2019-03-03 21:32 ` John Fastabend
  2019-03-08 11:01 ` Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2019-03-02 22:11 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, john.fastabend, jhs, xiyou.wangcong, jiri

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 28 Feb 2019 12:55:43 -0800

> In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
> John made the assumption that the data path had no need to read
> the qdisc qlen (number of packets in the qdisc).
> 
> This is true when pfifo_fast is used as the root qdisc, or as a direct
> MQ/MQPRIO child.
> 
> But pfifo_fast can also be used as a leaf in classful qdiscs, and
> existing logic needs to access the child qlen efficiently.
> 
> HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
>   htb_activate() -> WARN_ON()
>   htb_dequeue_tree(), to decide whether a class can be htb_deactivated
>   once it has no more packets.
> 
> HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
> qdisc_tree_reduce_backlog() also read q.qlen directly.
 ...
> This patch adds atomic_qlen (in the same location as qlen)
> and renames the following helpers, since we want to express that
> they can be used without the qdisc lock, and that qlen is no longer
> percpu.
> 
> - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
> - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
> 
> Later (in net-next) we might revert this patch by tracking all these
> qlen uses and replacing them with a more efficient method (not a
> precise qlen, but an empty/non_empty status that might be less
> expensive to maintain/track).
> 
> Another possibility is to have a legacy pfifo_fast version that would
> be used when acting as a child qdisc, since the parent qdisc needs
> a spinlock anyway. But then, future lockless qdiscs would also
> have the same problem.
> 
> Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.


* Re: [PATCH net] net: sched: put back q.qlen into a single location
  2019-02-28 20:55 [PATCH net] net: sched: put back q.qlen into a single location Eric Dumazet
  2019-03-02  7:07 ` David Miller
  2019-03-02 22:11 ` David Miller
@ 2019-03-03 21:32 ` John Fastabend
  2019-03-08 11:01 ` Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: John Fastabend @ 2019-03-03 21:32 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, Jamal Hadi Salim, Cong Wang, Jiri Pirko

On 2/28/19 12:55 PM, Eric Dumazet wrote:
> In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
> John made the assumption that the data path had no need to read
> the qdisc qlen (number of packets in the qdisc).
> 
> This is true when pfifo_fast is used as the root qdisc, or as a direct
> MQ/MQPRIO child.
> 
> But pfifo_fast can also be used as a leaf in classful qdiscs, and
> existing logic needs to access the child qlen efficiently.
> 
> HTB breaks badly, since it uses cl->leaf.q->q.qlen in:
>   htb_activate() -> WARN_ON()
>   htb_dequeue_tree(), to decide whether a class can be htb_deactivated
>   once it has no more packets.
> 
> HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
> qdisc_tree_reduce_backlog() also read q.qlen directly.
> 
> Using qdisc_qlen_sum() (which iterates over all possible CPUs)
> in the data path is a non-starter.
> 

Yep.

> It seems we have to put back qlen in a central location,
> at least for stable kernels.
> 

I can't think of anything better either, unfortunately.

> For all qdiscs but pfifo_fast, qlen is guarded by the qdisc lock,
> so the existing q.qlen{++|--} operations are correct.
> 
> For 'lockless' qdiscs (pfifo_fast so far), we need to use atomic_{inc|dec}()
> because the spinlock might not be held (for example in
> pfifo_fast_enqueue() and pfifo_fast_dequeue()).
> 
> This patch adds atomic_qlen (in the same location as qlen)
> and renames the following helpers, since we want to express that
> they can be used without the qdisc lock, and that qlen is no longer
> percpu.
> 
> - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
> - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()
> 
> Later (in net-next) we might revert this patch by tracking all these
> qlen uses and replacing them with a more efficient method (not a
> precise qlen, but an empty/non_empty status that might be less
> expensive to maintain/track).

Right, I think this is the next step.

> 
> Another possibility is to have a legacy pfifo_fast version that would
> be used when acting as a child qdisc, since the parent qdisc needs
> a spinlock anyway. But then, future lockless qdiscs would also
> have the same problem.
> 

I would prefer to optimize the qlen checks out for each qdisc, but
haven't checked that it's possible yet in all cases.

> Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/net/sch_generic.h | 31 +++++++++++++------------------
>  net/core/gen_stats.c      |  2 --
>  net/sched/sch_generic.c   | 13 ++++++-------
>  3 files changed, 19 insertions(+), 27 deletions(-)
> 

Thanks!

Acked-by: John Fastabend <john.fastabend@gmail.com>


* Re: [PATCH net] net: sched: put back q.qlen into a single location
  2019-02-28 20:55 [PATCH net] net: sched: put back q.qlen into a single location Eric Dumazet
                   ` (2 preceding siblings ...)
  2019-03-03 21:32 ` John Fastabend
@ 2019-03-08 11:01 ` Paolo Abeni
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2019-03-08 11:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Eric Dumazet, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Ivan Vecera, David S . Miller

Hi,

Thank you for fixing this!

On Thu, 2019-02-28 at 12:55 -0800, Eric Dumazet wrote:
> Later (in net-next) we might revert this patch by tracking all these
> qlen uses and replacing them with a more efficient method (not a
> precise qlen, but an empty/non_empty status that might be less
> expensive to maintain/track).

We are working on this. The final goal is using this info to enable
TCQ_F_CAN_BYPASS for nolock qdiscs, too, as we see regressions in
uncontended scenarios with pfifo_fast compared to the pre-TCQ_F_NOLOCK
implementation.
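
For context, the bypass fast path in __dev_xmit_skb() that
TCQ_F_CAN_BYPASS enables looks roughly like this (abridged from
net/core/dev.c of that era; a sketch, not verbatim):

	if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
	    qdisc_run_begin(q)) {
		qdisc_bstats_update(q, skb);
		if (sch_direct_xmit(skb, q, dev, txq, root_lock, true))
			__qdisc_run(q);
		qdisc_run_end(q);
		rc = NET_XMIT_SUCCESS;
	}

Note the !qdisc_qlen(q) test: bypass needs a cheap and reliable
"queue is empty" signal, which is what the empty/non_empty refactor
would provide.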

Please let us know if the "empty/non_empty status" refactor is on your
radar, so we can avoid conflicts.

Thanks!

Paolo


