* [RFC PATCH 00/17] latest qdisc patch series
@ 2017-05-02 15:24 John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
                   ` (14 more replies)
  0 siblings, 15 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

This is my latest series of patches to remove the locking requirement
from qdisc logic. 

I still have a couple of issues to resolve. The main problem at the
moment is that a pfifo_fast qdisc without contention, running under mq
or mqprio, is actually slower by a few hundred kpps in pktgen tests.
I am trying to sort out how to get the performance back now.

The main difference in these patches is recognizing that parking
packets on the gso slot and the bad_txq cases are really edge cases
and can be handled with locked queue operations. This simplifies the
patches and avoids racing with the netif scheduler. Hitting these
paths means either there was a driver overrun, which should be very
rare with BQL, or a TCP session migrated cores with outstanding
packets.
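
For reference, a condensed sketch of the locking pattern used for
these edge cases follows; patches 6 and 8 below implement it in full,
with the stats handling and a separate locked helper that are omitted
here:

    /* Condensed sketch: only lockless (TCQ_F_NOLOCK) qdiscs take the
     * qdisc lock around the shared skb list; locked qdiscs are already
     * serialized by the root lock.  Stats updates omitted.
     */
    static int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
    {
            spinlock_t *lock = NULL;

            if (q->flags & TCQ_F_NOLOCK) {
                    lock = qdisc_lock(q);
                    spin_lock(lock);
            }

            __skb_queue_tail(&q->gso_skb, skb);

            if (lock)
                    spin_unlock(lock);

            __netif_schedule(q);
            return 0;
    }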

Patches 16 and 17 were an attempt to resolve the performance
degradation in the uncontended case. The idea is to take the qdisc
lock around the dequeue operations so that a single lock acquisition
covers the entire bulk operation and packets can be consumed out of
the skb_array without a per-packet spin_lock.
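
Patches 16 and 17 did not make it into this posting, but very roughly
the idea is the sketch below. This is illustrative only; the function
name and the 'budget' parameter are invented here, the real patches
hang this off the pfifo_fast/skb_array dequeue path:

    /* Illustrative sketch of "lock once per bulk dequeue": take the
     * qdisc lock a single time, then drain up to 'budget' skbs from a
     * pfifo_fast band's skb_array with the unlocked __ptr_ring
     * consumer, instead of paying a consumer lock per packet.
     */
    static struct sk_buff *bulk_dequeue_band(struct Qdisc *q,
                                             struct skb_array *band,
                                             int budget)
    {
            struct sk_buff *head = NULL, **tail = &head;
            spinlock_t *lock = qdisc_lock(q);
            struct sk_buff *skb;

            spin_lock(lock);
            while (budget-- > 0) {
                    skb = __ptr_ring_consume(&band->ring);
                    if (!skb)
                            break;
                    *tail = skb;
                    tail = &skb->next;
            }
            spin_unlock(lock);

            *tail = NULL;   /* terminate the bulk list */
            return head;    /* caller walks and transmits the list */
    }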

Another potential issue, I think, is that when we have multiple
packets on the bad_txq we should bulk out of the bad_txq rather than
jumping straight to bulking out of the "normal" qdisc dequeue op.
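
Roughly, inside dequeue_skb() after the first skb has been pulled off
skb_bad_txq (see patch 8), that could look something like the sketch
below. Locking and stats updates are elided; the TCQ_F_NOLOCK case
would need the qdisc lock held around the peek/dequeue, as in
__skb_dequeue_bad_txq():

    /* keep bulking from skb_bad_txq while the next skb maps to the
     * same txq as the first one
     */
    while (qdisc_may_bulk(q)) {
            struct sk_buff *nskb = skb_peek(&q->skb_bad_txq);

            if (!nskb ||
                skb_get_queue_mapping(nskb) != skb_get_queue_mapping(skb))
                    break;

            nskb = __skb_dequeue(&q->skb_bad_txq);
            skb->next = nskb;
            skb = nskb;
            (*packets)++;
    }
    skb->next = NULL;       /* terminate the bulk list */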

I've mostly tested with pktgen at this point but have done some basic
netperf tests and both seem to be working.

I am not going to be able to work on this for a few days, so I figured
it might be worth getting some feedback if there is any. Any thoughts
on how to squeeze a few extra pps out of this would be very useful. I
would like to avoid a degradation in the micro-benchmark if possible,
and as best I can tell that should be achievable.

Thanks!
John

---

John Fastabend (17):
      net: sched: cleanup qdisc_run and __qdisc_run semantics
      net: sched: allow qdiscs to handle locking
      net: sched: remove remaining uses for qdisc_qlen in xmit path
      net: sched: provide per cpu qstat helpers
      net: sched: a dflt qdisc may be used with per cpu stats
      net: sched: explicit locking in gso_cpu fallback
      net: sched: drop qdisc_reset from dev_graft_qdisc
      net: sched: support skb_bad_tx with lockless qdisc
      net: sched: check for frozen queue before skb_bad_txq check
      net: sched: qdisc_qlen for per cpu logic
      net: sched: helper to sum qlen
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
      net: skb_array: expose peek API
      net: sched: pfifo_fast use skb_array
      net: skb_array additions for unlocked consumer
      net: sched: lock once per bulk dequeue


 include/linux/skb_array.h |   10 +
 include/net/gen_stats.h   |    3 
 include/net/pkt_sched.h   |   10 +
 include/net/sch_generic.h |   82 ++++++++-
 net/core/dev.c            |   31 +++
 net/core/gen_stats.c      |    9 +
 net/sched/sch_api.c       |    3 
 net/sched/sch_generic.c   |  400 +++++++++++++++++++++++++++++++++------------
 net/sched/sch_mq.c        |   25 ++-
 net/sched/sch_mqprio.c    |   61 ++++---
 10 files changed, 470 insertions(+), 164 deletions(-)


* [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
@ 2017-05-02 15:25 ` John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking John Fastabend
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

Currently __qdisc_run calls qdisc_run_end() but does not call
qdisc_run_begin(). This makes it hard to track pairs of
qdisc_run_{begin,end} across function calls.

To make these code paths easier to read and the code simpler, this
patch moves the begin/end calls into qdisc_run().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/pkt_sched.h |    4 +++-
 net/core/dev.c          |    5 +++--
 net/sched/sch_generic.c |    2 --
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index bec46f6..b7f5776 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -109,8 +109,10 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 static inline void qdisc_run(struct Qdisc *q)
 {
-	if (qdisc_run_begin(q))
+	if (qdisc_run_begin(q)) {
 		__qdisc_run(q);
+		qdisc_run_end(q);
+	}
 }
 
 int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
diff --git a/net/core/dev.c b/net/core/dev.c
index db6e315..3169074 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3099,9 +3099,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 				contended = false;
 			}
 			__qdisc_run(q);
-		} else
-			qdisc_run_end(q);
+		}
 
+		qdisc_run_end(q);
 		rc = NET_XMIT_SUCCESS;
 	} else {
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
@@ -3111,6 +3111,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 				contended = false;
 			}
 			__qdisc_run(q);
+			qdisc_run_end(q);
 		}
 	}
 	spin_unlock(root_lock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 52a2c55..ef33bf8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -262,8 +262,6 @@ void __qdisc_run(struct Qdisc *q)
 			break;
 		}
 	}
-
-	qdisc_run_end(q);
 }
 
 unsigned long dev_trans_start(struct net_device *dev)


* [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
@ 2017-05-02 15:25 ` John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

This patch adds a flag for queueing disciplines to indicate the stack
does not need to use the qdisc lock to protect operations. This can
be used to build lockless scheduling algorithms and improve
performance.

The flag is checked in the tx path and the qdisc lock is only taken
if it is not set. For now use a conditional if statement. Later we
could be more aggressive if it proves worthwhile and use a static key
or wrap this in a likely().

Also the lockless case drops the TCQ_F_CAN_BYPASS logic. The reason
for this is that synchronizing a qlen counter across threads proves to
cost more than doing the enqueue/dequeue operations themselves when
tested with pktgen.
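
As a rough sketch of the static-key variant mentioned above (not part
of this patch; the key name and helper below are hypothetical, the key
would be enabled the first time a lockless qdisc is attached):

    /* Hypothetical sketch: gate the TCQ_F_NOLOCK check behind a static
     * key so the locked-only fast path pays no extra conditional until
     * a lockless qdisc is actually in use.
     */
    DEFINE_STATIC_KEY_FALSE(qdisc_nolock_used);

    static inline bool qdisc_is_nolock(const struct Qdisc *q)
    {
            return static_branch_unlikely(&qdisc_nolock_used) &&
                   (q->flags & TCQ_F_NOLOCK);
    }

    /* flipped from the qdisc attach path, e.g.
     *      static_branch_enable(&qdisc_nolock_used);
     */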

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    1 +
 net/core/dev.c            |   26 ++++++++++++++++++++++----
 net/sched/sch_generic.c   |   30 ++++++++++++++++++++----------
 3 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 22e5209..f5c8613 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,6 +67,7 @@ struct Qdisc {
 				      * qdisc_tree_decrease_qlen() should stop.
 				      */
 #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
+#define TCQ_F_NOLOCK		0x100 /* qdisc does not require locking */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 3169074..6d2db37 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3069,6 +3069,21 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	int rc;
 
 	qdisc_calculate_pkt_len(skb, q);
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
+			__qdisc_drop(skb, &to_free);
+			rc = NET_XMIT_DROP;
+		} else {
+			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+			__qdisc_run(q);
+		}
+
+		if (unlikely(to_free))
+			kfree_skb_list(to_free);
+		return rc;
+	}
+
 	/*
 	 * Heuristic to force contended enqueues to serialize on a
 	 * separate lock before trying to get qdisc main lock.
@@ -3896,19 +3911,22 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 
 		while (head) {
 			struct Qdisc *q = head;
-			spinlock_t *root_lock;
+			spinlock_t *root_lock = NULL;
 
 			head = head->next_sched;
 
-			root_lock = qdisc_lock(q);
-			spin_lock(root_lock);
+			if (!(q->flags & TCQ_F_NOLOCK)) {
+				root_lock = qdisc_lock(q);
+				spin_lock(root_lock);
+			}
 			/* We need to make sure head->next_sched is read
 			 * before clearing __QDISC_STATE_SCHED
 			 */
 			smp_mb__before_atomic();
 			clear_bit(__QDISC_STATE_SCHED, &q->state);
 			qdisc_run(q);
-			spin_unlock(root_lock);
+			if (root_lock)
+				spin_unlock(root_lock);
 		}
 	}
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ef33bf8..39653e8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -170,7 +170,8 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+	if (root_lock)
+		spin_unlock(root_lock);
 
 	/* Note that we validate skb (GSO, checksum, ...) outside of locks */
 	if (validate)
@@ -183,10 +184,13 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 		HARD_TX_UNLOCK(dev, txq);
 	} else {
-		spin_lock(root_lock);
+		if (root_lock)
+			spin_lock(root_lock);
 		return qdisc_qlen(q);
 	}
-	spin_lock(root_lock);
+
+	if (root_lock)
+		spin_lock(root_lock);
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
@@ -227,9 +231,9 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
  */
 static inline int qdisc_restart(struct Qdisc *q, int *packets)
 {
+	spinlock_t *root_lock = NULL;
 	struct netdev_queue *txq;
 	struct net_device *dev;
-	spinlock_t *root_lock;
 	struct sk_buff *skb;
 	bool validate;
 
@@ -238,7 +242,9 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
 	if (unlikely(!skb))
 		return 0;
 
-	root_lock = qdisc_lock(q);
+	if (!(q->flags & TCQ_F_NOLOCK))
+		root_lock = qdisc_lock(q);
+
 	dev = qdisc_dev(q);
 	txq = skb_get_tx_queue(dev, skb);
 
@@ -875,14 +881,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		dev_queue = netdev_get_tx_queue(dev, i);
 		q = dev_queue->qdisc_sleeping;
-		root_lock = qdisc_lock(q);
 
-		spin_lock_bh(root_lock);
+		if (q->flags & TCQ_F_NOLOCK) {
+			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+		} else {
+			root_lock = qdisc_lock(q);
+			spin_lock_bh(root_lock);
 
-		val = (qdisc_is_running(q) ||
-		       test_bit(__QDISC_STATE_SCHED, &q->state));
+			val = (qdisc_is_running(q) ||
+			       test_bit(__QDISC_STATE_SCHED, &q->state));
 
-		spin_unlock_bh(root_lock);
+			spin_unlock_bh(root_lock);
+		}
 
 		if (val)
 			return true;


* [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 01/17] net: sched: cleanup qdisc_run and __qdisc_run semantics John Fastabend
  2017-05-02 15:25 ` [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking John Fastabend
@ 2017-05-02 15:25 ` John Fastabend
  2017-05-02 15:26 ` [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers John Fastabend
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
of the routine only check if it is zero or not. Simplify the logic so
that we don't need to return an actual queue length value.

This introduces a case where sch_direct_xmit would previously have
returned a qlen of zero but now returns true. However, in this case
all call sites of sch_direct_xmit will attempt a dequeue(), get a
NULL skb, and abort. This trades tracking qlen in the hot path for an
extra dequeue operation. Overall this seems to be good for
performance.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/pkt_sched.h |    6 +++---
 net/sched/sch_generic.c |   23 ++++++++++++-----------
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index b7f5776..69b90a3 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -101,9 +101,9 @@ struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
 void qdisc_put_rtab(struct qdisc_rate_table *tab);
 void qdisc_put_stab(struct qdisc_size_table *tab);
 void qdisc_warn_nonwc(const char *txt, struct Qdisc *qdisc);
-int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
-		    struct net_device *dev, struct netdev_queue *txq,
-		    spinlock_t *root_lock, bool validate);
+bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
+		     struct net_device *dev, struct netdev_queue *txq,
+		     spinlock_t *root_lock, bool validate);
 
 void __qdisc_run(struct Qdisc *q);
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 39653e8..2f66b1f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -160,12 +160,12 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
  * only one CPU can execute this function.
  *
  * Returns to the caller:
- *				0  - queue is empty or throttled.
- *				>0 - queue is not empty.
+ *				false  - hardware queue frozen backoff
+ *				true   - feel free to send more pkts
  */
-int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
-		    struct net_device *dev, struct netdev_queue *txq,
-		    spinlock_t *root_lock, bool validate)
+bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
+		     struct net_device *dev, struct netdev_queue *txq,
+		     spinlock_t *root_lock, bool validate)
 {
 	int ret = NETDEV_TX_BUSY;
 
@@ -186,7 +186,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	} else {
 		if (root_lock)
 			spin_lock(root_lock);
-		return qdisc_qlen(q);
+		return true;
 	}
 
 	if (root_lock)
@@ -194,18 +194,19 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
-		ret = qdisc_qlen(q);
+		ret = true;
 	} else {
 		/* Driver returned NETDEV_TX_BUSY - requeue skb */
 		if (unlikely(ret != NETDEV_TX_BUSY))
 			net_warn_ratelimited("BUG %s code %d qlen %d\n",
 					     dev->name, ret, q->q.qlen);
 
-		ret = dev_requeue_skb(skb, q);
+		dev_requeue_skb(skb, q);
+		ret = false;
 	}
 
 	if (ret && netif_xmit_frozen_or_stopped(txq))
-		ret = 0;
+		ret = false;
 
 	return ret;
 }
@@ -229,7 +230,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
  *				>0 - queue is not empty.
  *
  */
-static inline int qdisc_restart(struct Qdisc *q, int *packets)
+static inline bool qdisc_restart(struct Qdisc *q, int *packets)
 {
 	spinlock_t *root_lock = NULL;
 	struct netdev_queue *txq;
@@ -240,7 +241,7 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
 	/* Dequeue packet */
 	skb = dequeue_skb(q, &validate, packets);
 	if (unlikely(!skb))
-		return 0;
+		return false;
 
 	if (!(q->flags & TCQ_F_NOLOCK))
 		root_lock = qdisc_lock(q);


* [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (2 preceding siblings ...)
  2017-05-02 15:25 ` [RFC PATCH 03/17] net: sched: remove remaining uses for qdisc_qlen in xmit path John Fastabend
@ 2017-05-02 15:26 ` John Fastabend
  2017-05-02 15:26 ` [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

The per cpu qstats support was added along with the per cpu bstat
support that is currently used by the ingress qdisc. This patch adds
the set of helpers needed so that other qdiscs can use per cpu qstats
as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f5c8613..9531b81 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -590,12 +590,39 @@ static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
+						const struct sk_buff *skb)
+{
+	this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
+}
+
 static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
 	sch->qstats.backlog += qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
+						const struct sk_buff *skb)
+{
+	this_cpu_add(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
+}
+
+static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+{
+	this_cpu_inc(sch->cpu_qstats->qlen);
+}
+
+static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+{
+	this_cpu_dec(sch->cpu_qstats->qlen);
+}
+
+static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
+{
+	this_cpu_inc(sch->cpu_qstats->requeues);
+}
+
 static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 {
 	sch->qstats.drops += count;
@@ -800,6 +827,14 @@ static inline void rtnl_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
 	qdisc_qstats_drop(sch);
 }
 
+static inline int qdisc_drop_cpu(struct sk_buff *skb, struct Qdisc *sch,
+				 struct sk_buff **to_free)
+{
+	__qdisc_drop(skb, to_free);
+	qdisc_qstats_cpu_drop(sch);
+
+	return NET_XMIT_DROP;
+}
 
 static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
 			     struct sk_buff **to_free)


* [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (3 preceding siblings ...)
  2017-05-02 15:26 ` [RFC PATCH 04/17] net: sched: provide per cpu qstat helpers John Fastabend
@ 2017-05-02 15:26 ` John Fastabend
  2017-05-02 15:26 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

Enable dflt qdisc support for per cpu stats. Before this patch a dflt
qdisc was required to use the global qstats and bstats statistics.

This adds a static flags field to qdisc_ops that is propagated into
qdisc->flags in the qdisc allocation call. This allows the allocation
block to completely allocate the qdisc object, so we don't have
dangling allocations after qdisc init.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    1 +
 net/sched/sch_generic.c   |   16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9531b81..83952af 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -172,6 +172,7 @@ struct Qdisc_ops {
 	const struct Qdisc_class_ops	*cl_ops;
 	char			id[IFNAMSIZ];
 	int			priv_size;
+	unsigned int		static_flags;
 
 	int 			(*enqueue)(struct sk_buff *skb,
 					   struct Qdisc *sch,
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 2f66b1f..8a03738 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -625,6 +625,19 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	qdisc_skb_head_init(&sch->q);
 	spin_lock_init(&sch->q.lock);
 
+	if (ops->static_flags & TCQ_F_CPUSTATS) {
+		sch->cpu_bstats =
+			netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		if (!sch->cpu_bstats)
+			goto errout1;
+
+		sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!sch->cpu_qstats) {
+			free_percpu(sch->cpu_bstats);
+			goto errout1;
+		}
+	}
+
 	spin_lock_init(&sch->busylock);
 	lockdep_set_class(&sch->busylock,
 			  dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
@@ -634,6 +647,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 			  dev->qdisc_running_key ?: &qdisc_running_key);
 
 	sch->ops = ops;
+	sch->flags = ops->static_flags;
 	sch->enqueue = ops->enqueue;
 	sch->dequeue = ops->dequeue;
 	sch->dev_queue = dev_queue;
@@ -641,6 +655,8 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	atomic_set(&sch->refcnt, 1);
 
 	return sch;
+errout1:
+	kfree(p);
 errout:
 	return ERR_PTR(err);
 }


* [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (4 preceding siblings ...)
  2017-05-02 15:26 ` [RFC PATCH 05/17] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
@ 2017-05-02 15:26 ` John Fastabend
  2017-05-02 15:27 ` [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc John Fastabend
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

This work prepares the qdisc layer to support egress lockless
qdiscs. When the egress qdisc is run lockless and we overrun the
netdev, for whatever reason, the netdev returns a busy error code and
the skb is parked on the gso_skb pointer. With many cores all hitting
this case at once it is possible to have multiple sk_buffs here, so we
turn gso_skb into a queue.

This should be an edge case; if we see it frequently then the
netdev/qdisc layer needs to back off.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   20 +++++++----
 net/sched/sch_generic.c   |   81 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 80 insertions(+), 21 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83952af..3021bbb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -85,7 +85,7 @@ struct Qdisc {
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
 	 */
-	struct sk_buff		*gso_skb ____cacheline_aligned_in_smp;
+	struct sk_buff_head	gso_skb ____cacheline_aligned_in_smp;
 	struct qdisc_skb_head	q;
 	struct gnet_stats_basic_packed bstats;
 	seqcount_t		running;
@@ -754,26 +754,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
 /* generic pseudo peek method for non-work-conserving qdisc */
 static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 {
+	struct sk_buff *skb = skb_peek(&sch->gso_skb);
+
 	/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
-	if (!sch->gso_skb) {
-		sch->gso_skb = sch->dequeue(sch);
-		if (sch->gso_skb) {
+	if (!skb) {
+		skb = sch->dequeue(sch);
+
+		if (skb) {
+			__skb_queue_head(&sch->gso_skb, skb);
 			/* it's still part of the queue */
-			qdisc_qstats_backlog_inc(sch, sch->gso_skb);
+			qdisc_qstats_backlog_inc(sch, skb);
 			sch->q.qlen++;
 		}
 	}
 
-	return sch->gso_skb;
+	return skb;
 }
 
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
-	struct sk_buff *skb = sch->gso_skb;
+	struct sk_buff *skb = skb_peek(&sch->gso_skb);
 
 	if (skb) {
-		sch->gso_skb = NULL;
+		skb = __skb_dequeue(&sch->gso_skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
 	} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 8a03738..f179fc9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,10 +44,9 @@
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-	q->gso_skb = skb;
+	__skb_queue_head(&q->gso_skb, skb);
 	q->qstats.requeues++;
 	qdisc_qstats_backlog_inc(q, skb);
 	q->q.qlen++;	/* it's still part of the queue */
@@ -56,6 +55,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 	return 0;
 }
 
+static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
+{
+	spinlock_t *lock = qdisc_lock(q);
+
+	spin_lock(lock);
+	__skb_queue_tail(&q->gso_skb, skb);
+	spin_unlock(lock);
+
+	qdisc_qstats_cpu_requeues_inc(q);
+	qdisc_qstats_cpu_backlog_inc(q, skb);
+	qdisc_qstats_cpu_qlen_inc(q);
+	__netif_schedule(q);
+
+	return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+	if (q->flags & TCQ_F_NOLOCK)
+		return dev_requeue_skb_locked(skb, q);
+	else
+		return __dev_requeue_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 				 struct sk_buff *skb,
 				 const struct netdev_queue *txq,
@@ -111,23 +134,50 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				   int *packets)
 {
-	struct sk_buff *skb = q->gso_skb;
 	const struct netdev_queue *txq = q->dev_queue;
+	struct sk_buff *skb;
 
 	*packets = 1;
-	if (unlikely(skb)) {
+	if (unlikely(!skb_queue_empty(&q->gso_skb))) {
+		spinlock_t *lock = NULL;
+
+		if (q->flags & TCQ_F_NOLOCK) {
+			lock = qdisc_lock(q);
+			spin_lock(lock);
+		}
+
+		skb = skb_peek(&q->gso_skb);
+
+		/* skb may be null if another cpu pulls gso_skb off in between
+		 * empty check and lock.
+		 */
+		if (!skb) {
+			if (lock)
+				spin_unlock(lock);
+			goto validate;
+		}
+
 		/* skb in gso_skb were already validated */
 		*validate = false;
 		/* check the reason of requeuing without tx lock first */
 		txq = skb_get_tx_queue(txq->dev, skb);
 		if (!netif_xmit_frozen_or_stopped(txq)) {
-			q->gso_skb = NULL;
-			qdisc_qstats_backlog_dec(q, skb);
-			q->q.qlen--;
+			skb = __skb_dequeue(&q->gso_skb);
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_dec(q, skb);
+				qdisc_qstats_cpu_qlen_dec(q);
+			} else {
+				qdisc_qstats_backlog_dec(q, skb);
+				q->q.qlen--;
+			}
 		} else
 			skb = NULL;
+
+		if (lock)
+			spin_unlock(lock);
 		return skb;
 	}
+validate:
 	*validate = true;
 	skb = q->skb_bad_txq;
 	if (unlikely(skb)) {
@@ -622,6 +672,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
 		sch->padded = (char *) sch - (char *) p;
 	}
+	__skb_queue_head_init(&sch->gso_skb);
 	qdisc_skb_head_init(&sch->q);
 	spin_lock_init(&sch->q.lock);
 
@@ -690,6 +741,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 void qdisc_reset(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops *ops = qdisc->ops;
+	struct sk_buff *skb, *tmp;
 
 	if (ops->reset)
 		ops->reset(qdisc);
@@ -697,10 +749,9 @@ void qdisc_reset(struct Qdisc *qdisc)
 	kfree_skb(qdisc->skb_bad_txq);
 	qdisc->skb_bad_txq = NULL;
 
-	if (qdisc->gso_skb) {
-		kfree_skb_list(qdisc->gso_skb);
-		qdisc->gso_skb = NULL;
-	}
+	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
+		kfree_skb_list(skb);
+
 	qdisc->q.qlen = 0;
 }
 EXPORT_SYMBOL(qdisc_reset);
@@ -720,6 +771,7 @@ static void qdisc_rcu_free(struct rcu_head *head)
 void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
+	struct sk_buff *skb, *tmp;
 
 	if (qdisc->flags & TCQ_F_BUILTIN ||
 	    !atomic_dec_and_test(&qdisc->refcnt))
@@ -739,7 +791,9 @@ void qdisc_destroy(struct Qdisc *qdisc)
 	module_put(ops->owner);
 	dev_put(qdisc_dev(qdisc));
 
-	kfree_skb_list(qdisc->gso_skb);
+	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
+		kfree_skb_list(skb);
+
 	kfree_skb(qdisc->skb_bad_txq);
 	/*
 	 * gen_estimator est_timer() might access qdisc->q.lock,
@@ -971,6 +1025,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 
 	rcu_assign_pointer(dev_queue->qdisc, qdisc);
 	dev_queue->qdisc_sleeping = qdisc;
+	__skb_queue_head_init(&qdisc->gso_skb);
 }
 
 void dev_init_scheduler(struct net_device *dev)


* [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (5 preceding siblings ...)
  2017-05-02 15:26 ` [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback John Fastabend
@ 2017-05-02 15:27 ` John Fastabend
  2017-05-02 15:27 ` [RFC PATCH 08/17] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
operation is called on the old qdisc. The destroy operation will wait
an RCU grace period and call qdisc_rcu_free(), at which point
gso_cpu_skb is freed along with all stats, so there is no need to zero
stats and gso_cpu_skb from the graft operation itself.

Further, after dropping the qdisc locks we can not continue to call
qdisc_reset before waiting an RCU grace period so that the qdisc is
detached from all cpus. By removing the qdisc_reset() here we get the
correct property of waiting an RCU grace period and letting the
qdisc_destroy operation clean up the qdisc correctly.

Note, a refcnt greater than 1 would cause the destroy operation to be
aborted; however, if this ever happened the reference to the qdisc
would be lost and we would have a memory leak.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |   28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f179fc9..37cd54e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -813,10 +813,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 	root_lock = qdisc_lock(oqdisc);
 	spin_lock_bh(root_lock);
 
-	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
 	/* ... and graft new one */
 	if (qdisc == NULL)
 		qdisc = &noop_qdisc;
@@ -971,6 +967,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 	return false;
 }
 
+static void dev_qdisc_reset(struct net_device *dev,
+			    struct netdev_queue *dev_queue,
+			    void *none)
+{
+	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+	if (qdisc)
+		qdisc_reset(qdisc);
+}
+
 /**
  * 	dev_deactivate_many - deactivate transmissions on several devices
  * 	@head: list of devices to deactivate
@@ -981,7 +987,6 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 void dev_deactivate_many(struct list_head *head)
 {
 	struct net_device *dev;
-	bool sync_needed = false;
 
 	list_for_each_entry(dev, head, close_list) {
 		netdev_for_each_tx_queue(dev, dev_deactivate_queue,
@@ -991,20 +996,25 @@ void dev_deactivate_many(struct list_head *head)
 					     &noop_qdisc);
 
 		dev_watchdog_down(dev);
-		sync_needed |= !dev->dismantle;
 	}
 
 	/* Wait for outstanding qdisc-less dev_queue_xmit calls.
 	 * This is avoided if all devices are in dismantle phase :
 	 * Caller will call synchronize_net() for us
 	 */
-	if (sync_needed)
-		synchronize_net();
+	synchronize_net();
 
 	/* Wait for outstanding qdisc_run calls. */
-	list_for_each_entry(dev, head, close_list)
+	list_for_each_entry(dev, head, close_list) {
 		while (some_qdisc_is_busy(dev))
 			yield();
+		/* The new qdisc is assigned at this point so we can safely
+		 * unwind stale skb lists and qdisc statistics
+		 */
+		netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+		if (dev_ingress_queue(dev))
+			dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+	}
 }
 
 void dev_deactivate(struct net_device *dev)


* [RFC PATCH 08/17] net: sched: support skb_bad_tx with lockless qdisc
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (6 preceding siblings ...)
  2017-05-02 15:27 ` [RFC PATCH 07/17] net: sched: drop qdisc_reset from dev_graft_qdisc John Fastabend
@ 2017-05-02 15:27 ` John Fastabend
  2017-05-02 15:27 ` [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check John Fastabend
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

Similar to how gso is handled, skb_bad_tx needs to be per cpu to
handle lockless qdiscs with multiple writers/producers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    2 -
 net/sched/sch_generic.c   |  100 ++++++++++++++++++++++++++++++++++++---------
 2 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3021bbb..4288222 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -92,7 +92,7 @@ struct Qdisc {
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
 	struct Qdisc            *next_sched;
-	struct sk_buff		*skb_bad_txq;
+	struct sk_buff_head	skb_bad_txq;
 	struct rcu_head		rcu_head;
 	int			padded;
 	atomic_t		refcnt;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 37cd54e..d4194c2 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,6 +44,68 @@
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
+
+static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
+{
+	const struct netdev_queue *txq = q->dev_queue;
+	spinlock_t *lock = NULL;
+	struct sk_buff *skb;
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		lock = qdisc_lock(q);
+		spin_lock(lock);
+	}
+
+	skb = skb_peek(&q->skb_bad_txq);
+	if (skb) {
+		/* check the reason of requeuing without tx lock first */
+		txq = skb_get_tx_queue(txq->dev, skb);
+		if (!netif_xmit_frozen_or_stopped(txq)) {
+			skb = __skb_dequeue(&q->skb_bad_txq);
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_dec(q, skb);
+				qdisc_qstats_cpu_qlen_dec(q);
+			} else {
+				qdisc_qstats_backlog_dec(q, skb);
+				q->q.qlen--;
+			}
+		} else {
+			skb = NULL;
+		}
+	}
+
+	if (lock)
+		spin_unlock(lock);
+
+	return skb;
+}
+
+static inline struct sk_buff *qdisc_dequeue_skb_bad_txq(struct Qdisc *q)
+{
+	struct sk_buff *skb = skb_peek(&q->skb_bad_txq);
+
+	if (unlikely(skb))
+		skb = __skb_dequeue_bad_txq(q);
+
+	return skb;
+}
+
+static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
+					     struct sk_buff *skb)
+{
+	spinlock_t *lock = NULL;
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		lock = qdisc_lock(q);
+		spin_lock(lock);
+	}
+
+	__skb_queue_tail(&q->skb_bad_txq, skb);
+
+	if (lock)
+		spin_unlock(lock);
+}
+
 static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
 	__skb_queue_head(&q->gso_skb, skb);
@@ -116,9 +178,15 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 		if (!nskb)
 			break;
 		if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
-			q->skb_bad_txq = nskb;
-			qdisc_qstats_backlog_inc(q, nskb);
-			q->q.qlen++;
+			qdisc_enqueue_skb_bad_txq(q, nskb);
+
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_inc(q, nskb);
+				qdisc_qstats_cpu_qlen_inc(q);
+			} else {
+				qdisc_qstats_backlog_inc(q, nskb);
+				q->q.qlen++;
+			}
 			break;
 		}
 		skb->next = nskb;
@@ -179,18 +247,9 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 	}
 validate:
 	*validate = true;
-	skb = q->skb_bad_txq;
-	if (unlikely(skb)) {
-		/* check the reason of requeuing without tx lock first */
-		txq = skb_get_tx_queue(txq->dev, skb);
-		if (!netif_xmit_frozen_or_stopped(txq)) {
-			q->skb_bad_txq = NULL;
-			qdisc_qstats_backlog_dec(q, skb);
-			q->q.qlen--;
-			goto bulk;
-		}
-		return NULL;
-	}
+	skb = qdisc_dequeue_skb_bad_txq(q);
+	if (unlikely(skb))
+		goto bulk;
 	if (!(q->flags & TCQ_F_ONETXQUEUE) ||
 	    !netif_xmit_frozen_or_stopped(txq))
 		skb = q->dequeue(q);
@@ -673,6 +732,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 		sch->padded = (char *) sch - (char *) p;
 	}
 	__skb_queue_head_init(&sch->gso_skb);
+	__skb_queue_head_init(&sch->skb_bad_txq);
 	qdisc_skb_head_init(&sch->q);
 	spin_lock_init(&sch->q.lock);
 
@@ -746,12 +806,12 @@ void qdisc_reset(struct Qdisc *qdisc)
 	if (ops->reset)
 		ops->reset(qdisc);
 
-	kfree_skb(qdisc->skb_bad_txq);
-	qdisc->skb_bad_txq = NULL;
-
 	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
 		kfree_skb_list(skb);
 
+	skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp)
+		kfree_skb_list(skb);
+
 	qdisc->q.qlen = 0;
 }
 EXPORT_SYMBOL(qdisc_reset);
@@ -793,8 +853,9 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	skb_queue_walk_safe(&qdisc->gso_skb, skb, tmp)
 		kfree_skb_list(skb);
+	skb_queue_walk_safe(&qdisc->skb_bad_txq, skb, tmp)
+		kfree_skb_list(skb);
 
-	kfree_skb(qdisc->skb_bad_txq);
 	/*
 	 * gen_estimator est_timer() might access qdisc->q.lock,
 	 * wait a RCU grace period before freeing qdisc.
@@ -1036,6 +1097,7 @@ static void dev_init_scheduler_queue(struct net_device *dev,
 	rcu_assign_pointer(dev_queue->qdisc, qdisc);
 	dev_queue->qdisc_sleeping = qdisc;
 	__skb_queue_head_init(&qdisc->gso_skb);
+	__skb_queue_head_init(&qdisc->skb_bad_txq);
 }
 
 void dev_init_scheduler(struct net_device *dev)


* [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (7 preceding siblings ...)
  2017-05-02 15:27 ` [RFC PATCH 08/17] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
@ 2017-05-02 15:27 ` John Fastabend
  2017-05-02 15:27 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

I cannot think of any good reason to pull the bad txq skb off the
qdisc if the txq we plan to send it on is still frozen. So check for a
frozen queue first and abort before dequeuing either the skb_bad_txq
skb or a normal qdisc dequeue() skb.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index d4194c2..db5f7a0 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -203,7 +203,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				   int *packets)
 {
 	const struct netdev_queue *txq = q->dev_queue;
-	struct sk_buff *skb;
+	struct sk_buff *skb = NULL;
 
 	*packets = 1;
 	if (unlikely(!skb_queue_empty(&q->gso_skb))) {
@@ -247,12 +247,15 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 	}
 validate:
 	*validate = true;
+
+	if ((q->flags & TCQ_F_ONETXQUEUE) &&
+	    netif_xmit_frozen_or_stopped(txq))
+		return skb;
+
 	skb = qdisc_dequeue_skb_bad_txq(q);
 	if (unlikely(skb))
 		goto bulk;
-	if (!(q->flags & TCQ_F_ONETXQUEUE) ||
-	    !netif_xmit_frozen_or_stopped(txq))
-		skb = q->dequeue(q);
+	skb = q->dequeue(q);
 	if (skb) {
 bulk:
 		if (qdisc_may_bulk(q))


* [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (8 preceding siblings ...)
  2017-05-02 15:27 ` [RFC PATCH 09/17] net: sched: check for frozen queue before skb_bad_txq check John Fastabend
@ 2017-05-02 15:27 ` John Fastabend
  2017-05-02 15:28 ` [RFC PATCH 11/17] net: sched: helper to sum qlen John Fastabend
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

Add qdisc qlen helper routines for lockless qdiscs to use.

The qdisc qlen is no longer used in the hotpath but it is reported
via stats query on the qdisc so it still needs to be tracked. This
adds the per cpu operations needed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4288222..4c446fe 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -257,8 +257,16 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 	BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
+static inline int qdisc_qlen_cpu(const struct Qdisc *q)
+{
+	return this_cpu_ptr(q->cpu_qstats)->qlen;
+}
+
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
+	if (q->flags & TCQ_F_NOLOCK)
+		return qdisc_qlen_cpu(q);
+
 	return q->q.qlen;
 }
 


* [RFC PATCH 11/17] net: sched: helper to sum qlen
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (9 preceding siblings ...)
  2017-05-02 15:27 ` [RFC PATCH 10/17] net: sched: qdisc_qlen for per cpu logic John Fastabend
@ 2017-05-02 15:28 ` John Fastabend
  2017-05-02 15:28 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

Reporting qlen when qlen is per cpu requires aggregating the per
cpu counters. This adds a helper routine for this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   15 +++++++++++++++
 net/sched/sch_api.c       |    3 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4c446fe..5ae07dd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -270,6 +270,21 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline int qdisc_qlen_sum(const struct Qdisc *q)
+{
+	__u32 qlen = 0;
+	int i;
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		for_each_possible_cpu(i)
+			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+	} else {
+		qlen = q->q.qlen;
+	}
+
+	return qlen;
+}
+
 static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
 {
 	return (struct qdisc_skb_cb *)skb->cb;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index bbe57d5..1bb9c89 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1378,7 +1378,8 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		goto nla_put_failure;
 	if (q->ops->dump && q->ops->dump(q, skb) < 0)
 		goto nla_put_failure;
-	qlen = q->q.qlen;
+
+	qlen = qdisc_qlen_sum(q);
 
 	stab = rtnl_dereference(q->stab);
 	if (stab && qdisc_dump_stab(skb, stab) < 0)


* [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (10 preceding siblings ...)
  2017-05-02 15:28 ` [RFC PATCH 11/17] net: sched: helper to sum qlen John Fastabend
@ 2017-05-02 15:28 ` John Fastabend
  2017-05-02 15:28 ` [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

The sch_mq qdisc creates a sub-qdisc per tx queue; these are then
called independently for enqueue and dequeue operations. However,
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Also export __gnet_stats_copy_queue() for use as a helper function.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/gen_stats.h |    3 +++
 net/core/gen_stats.c    |    9 +++++----
 net/sched/sch_mq.c      |   25 ++++++++++++++++++-------
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 8b7aa37..2bf0ec2 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -48,6 +48,9 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 int gnet_stats_copy_queue(struct gnet_dump *d,
 			  struct gnet_stats_queue __percpu *cpu_q,
 			  struct gnet_stats_queue *q, __u32 qlen);
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+			     const struct gnet_stats_queue __percpu *cpu_q,
+			     const struct gnet_stats_queue *q, __u32 qlen);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 87f2855..b2b2323b 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -252,10 +252,10 @@
 	}
 }
 
-static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
-				    const struct gnet_stats_queue __percpu *cpu,
-				    const struct gnet_stats_queue *q,
-				    __u32 qlen)
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+			     const struct gnet_stats_queue __percpu *cpu,
+			     const struct gnet_stats_queue *q,
+			     __u32 qlen)
 {
 	if (cpu) {
 		__gnet_stats_copy_queue_cpu(qstats, cpu);
@@ -269,6 +269,7 @@ static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
 
 	qstats->qlen = qlen;
 }
+EXPORT_SYMBOL(__gnet_stats_copy_queue);
 
 /**
  * gnet_stats_copy_queue - copy queue statistics into statistics TLV
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index cadfdd4..7871db4 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -17,6 +17,7 @@
 #include <linux/skbuff.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/sch_generic.h>
 
 struct mq_sched {
 	struct Qdisc		**qdiscs;
@@ -103,15 +104,25 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+		struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+		__u32 qlen = 0;
+
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		if (qdisc_is_percpu_stats(qdisc)) {
+			cpu_bstats = qdisc->cpu_bstats;
+			cpu_qstats = qdisc->cpu_qstats;
+		}
+
+		qlen = qdisc_qlen_sum(qdisc);
+
+		__gnet_stats_copy_basic(NULL, &sch->bstats,
+					cpu_bstats, &qdisc->bstats);
+		__gnet_stats_copy_queue(&sch->qstats,
+					cpu_qstats, &qdisc->qstats, qlen);
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 	return 0;


* [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (11 preceding siblings ...)
  2017-05-02 15:28 ` [RFC PATCH 12/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
@ 2017-05-02 15:28 ` John Fastabend
  2017-05-02 15:29 ` [RFC PATCH 14/17] net: skb_array: expose peek API John Fastabend
  2017-05-02 15:32 ` [RFC PATCH 00/17] latest qdisc patch series John Fastabend
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

The sch_mqprio qdisc creates a sub-qdisc per tx queue; these are then
called independently for enqueue and dequeue operations. However,
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_mqprio.c |   61 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 0a4cf27..d2f33ac 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -233,22 +233,32 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
 	struct Qdisc *qdisc;
-	unsigned int i;
+	unsigned int ntx, tc;
 
 	sch->q.qlen = 0;
 	memset(&sch->bstats, 0, sizeof(sch->bstats));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
+	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+		struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+		__u32 qlen = 0;
+
+		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		if (qdisc_is_percpu_stats(qdisc)) {
+			cpu_bstats = qdisc->cpu_bstats;
+			cpu_qstats = qdisc->cpu_qstats;
+		}
+
+		qlen = qdisc_qlen_sum(qdisc);
+
+		__gnet_stats_copy_basic(NULL, &sch->bstats,
+					cpu_bstats, &qdisc->bstats);
+		__gnet_stats_copy_queue(&sch->qstats,
+					cpu_qstats, &qdisc->qstats, qlen);
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 
@@ -256,9 +266,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
 	opt.hw = priv->hw_offload;
 
-	for (i = 0; i < netdev_get_num_tc(dev); i++) {
-		opt.count[i] = dev->tc_to_txq[i].count;
-		opt.offset[i] = dev->tc_to_txq[i].offset;
+	for (tc = 0; tc < netdev_get_num_tc(dev); tc++) {
+		opt.count[tc] = dev->tc_to_txq[tc].count;
+		opt.offset[tc] = dev->tc_to_txq[tc].offset;
 	}
 
 	if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
@@ -336,7 +346,6 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	if (cl <= netdev_get_num_tc(dev)) {
 		int i;
 		__u32 qlen = 0;
-		struct Qdisc *qdisc;
 		struct gnet_stats_queue qstats = {0};
 		struct gnet_stats_basic_packed bstats = {0};
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl - 1];
@@ -351,18 +360,26 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
 			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+			struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
+			struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+			struct gnet_stats_queue __percpu *cpu_qstats = NULL;
 
-			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
-			qlen		  += qdisc->q.qlen;
-			bstats.bytes      += qdisc->bstats.bytes;
-			bstats.packets    += qdisc->bstats.packets;
-			qstats.backlog    += qdisc->qstats.backlog;
-			qstats.drops      += qdisc->qstats.drops;
-			qstats.requeues   += qdisc->qstats.requeues;
-			qstats.overlimits += qdisc->qstats.overlimits;
+			if (qdisc_is_percpu_stats(qdisc)) {
+				cpu_bstats = qdisc->cpu_bstats;
+				cpu_qstats = qdisc->cpu_qstats;
+			}
+
+			qlen = qdisc_qlen_sum(qdisc);
+			__gnet_stats_copy_basic(NULL, &sch->bstats,
+						cpu_bstats, &qdisc->bstats);
+			__gnet_stats_copy_queue(&sch->qstats,
+						cpu_qstats,
+						&qdisc->qstats,
+						qlen);
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
+
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
 			spin_lock_bh(d->lock);


* [RFC PATCH 14/17] net: skb_array: expose peek API
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (12 preceding siblings ...)
  2017-05-02 15:28 ` [RFC PATCH 13/17] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
@ 2017-05-02 15:29 ` John Fastabend
  2017-05-02 15:32 ` [RFC PATCH 00/17] latest qdisc patch series John Fastabend
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, john.fastabend

This adds a peek routine to skb_array.h for use by the qdisc layer.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/skb_array.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index f4dfade..33f1f0c 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -72,6 +72,11 @@ static inline bool __skb_array_empty(struct skb_array *a)
 	return !__ptr_ring_peek(&a->ring);
 }
 
+static inline struct sk_buff *skb_array_peek(struct skb_array *a)
+{
+	return __ptr_ring_peek(&a->ring);
+}
+
 static inline bool skb_array_empty(struct skb_array *a)
 {
 	return ptr_ring_empty(&a->ring);
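
Patch 15 (net: sched: pfifo_fast use skb_array) did not make it into
this posting. Building on the peek helper above, the rough shape is
that each pfifo_fast band becomes an skb_array and enqueue/dequeue
reduce to produce/consume on the band's ring, using the per cpu stat
helpers from patch 4. An illustrative sketch, not the actual patch:

    struct pfifo_fast_priv {
            struct skb_array q[PFIFO_FAST_BANDS];
    };

    static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
                                  struct sk_buff **to_free)
    {
            unsigned int band = prio2band[skb->priority & TC_PRIO_MAX];
            struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
            int err = skb_array_produce(&priv->q[band], skb);

            if (unlikely(err))      /* ring full */
                    return qdisc_drop_cpu(skb, qdisc, to_free);

            qdisc_qstats_cpu_qlen_inc(qdisc);
            qdisc_qstats_cpu_backlog_inc(qdisc, skb);
            return NET_XMIT_SUCCESS;
    }

    static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
    {
            struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
            struct sk_buff *skb = NULL;
            int band;

            for (band = 0; !skb && band < PFIFO_FAST_BANDS; band++)
                    skb = skb_array_consume(&priv->q[band]);

            if (likely(skb)) {
                    qdisc_qstats_cpu_backlog_dec(qdisc, skb);
                    qdisc_bstats_cpu_update(qdisc, skb);
                    qdisc_qstats_cpu_qlen_dec(qdisc);
            }
            return skb;
    }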


* Re: [RFC PATCH 00/17] latest qdisc patch series
  2017-05-02 15:24 [RFC PATCH 00/17] latest qdisc patch series John Fastabend
                   ` (13 preceding siblings ...)
  2017-05-02 15:29 ` [RFC PATCH 14/17] net: skb_array: expose peek API John Fastabend
@ 2017-05-02 15:32 ` John Fastabend
  14 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-05-02 15:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

On 17-05-02 08:24 AM, John Fastabend wrote:
> This is my latest series of patches to remove the locking requirement
> from qdisc logic. 
> 

Dang, unexpected git error mid-series; I'll send a v2. Sorry for the
noise :(


* [RFC PATCH 02/17] net: sched: allow qdiscs to handle locking
  2017-11-13 20:07 [RFC PATCH 00/17] lockless qdisc John Fastabend
@ 2017-11-13 20:08 ` John Fastabend
  0 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2017-11-13 20:08 UTC (permalink / raw)
  To: willemdebruijn.kernel, daniel, eric.dumazet
  Cc: make0818, netdev, jiri, xiyou.wangcong

This patch adds a flag for queueing disciplines to indicate the stack
does not need to use the qdisc lock to protect operations. This can
be used to build lockless scheduling algorithms and improve
performance.

The flag is checked in the tx path and the qdisc lock is only taken
if it is not set. For now use a conditional if statement. Later we
could be more aggressive if it proves worthwhile and use a static key
or wrap this in a likely().

Also the lockless case drops the TCQ_F_CAN_BYPASS logic. The reason
for this is that synchronizing a qlen counter across threads proves to
cost more than doing the enqueue/dequeue operations themselves when
tested with pktgen.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 0 files changed

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 65d0d25..bb806a0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -71,6 +71,7 @@ struct Qdisc {
 				      * qdisc_tree_decrease_qlen() should stop.
 				      */
 #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
+#define TCQ_F_NOLOCK		0x100 /* qdisc does not require locking */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 9f5fa78..e03bee6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3179,6 +3179,21 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	int rc;
 
 	qdisc_calculate_pkt_len(skb, q);
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
+			__qdisc_drop(skb, &to_free);
+			rc = NET_XMIT_DROP;
+		} else {
+			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+			__qdisc_run(q);
+		}
+
+		if (unlikely(to_free))
+			kfree_skb_list(to_free);
+		return rc;
+	}
+
 	/*
 	 * Heuristic to force contended enqueues to serialize on a
 	 * separate lock before trying to get qdisc main lock.
@@ -4161,19 +4176,22 @@ static __latent_entropy void net_tx_action(struct softirq_action *h)
 
 		while (head) {
 			struct Qdisc *q = head;
-			spinlock_t *root_lock;
+			spinlock_t *root_lock = NULL;
 
 			head = head->next_sched;
 
-			root_lock = qdisc_lock(q);
-			spin_lock(root_lock);
+			if (!(q->flags & TCQ_F_NOLOCK)) {
+				root_lock = qdisc_lock(q);
+				spin_lock(root_lock);
+			}
 			/* We need to make sure head->next_sched is read
 			 * before clearing __QDISC_STATE_SCHED
 			 */
 			smp_mb__before_atomic();
 			clear_bit(__QDISC_STATE_SCHED, &q->state);
 			qdisc_run(q);
-			spin_unlock(root_lock);
+			if (root_lock)
+				spin_unlock(root_lock);
 		}
 	}
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f6803e1..ec757f6 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -174,7 +174,8 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+	if (root_lock)
+		spin_unlock(root_lock);
 
 	/* Note that we validate skb (GSO, checksum, ...) outside of locks */
 	if (validate)
@@ -187,10 +188,13 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 		HARD_TX_UNLOCK(dev, txq);
 	} else {
-		spin_lock(root_lock);
+		if (root_lock)
+			spin_lock(root_lock);
 		return qdisc_qlen(q);
 	}
-	spin_lock(root_lock);
+
+	if (root_lock)
+		spin_lock(root_lock);
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
@@ -231,9 +235,9 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
  */
 static inline int qdisc_restart(struct Qdisc *q, int *packets)
 {
+	spinlock_t *root_lock = NULL;
 	struct netdev_queue *txq;
 	struct net_device *dev;
-	spinlock_t *root_lock;
 	struct sk_buff *skb;
 	bool validate;
 
@@ -242,7 +246,9 @@ static inline int qdisc_restart(struct Qdisc *q, int *packets)
 	if (unlikely(!skb))
 		return 0;
 
-	root_lock = qdisc_lock(q);
+	if (!(q->flags & TCQ_F_NOLOCK))
+		root_lock = qdisc_lock(q);
+
 	dev = qdisc_dev(q);
 	txq = skb_get_tx_queue(dev, skb);
 
@@ -880,14 +886,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		dev_queue = netdev_get_tx_queue(dev, i);
 		q = dev_queue->qdisc_sleeping;
-		root_lock = qdisc_lock(q);
 
-		spin_lock_bh(root_lock);
+		if (q->flags & TCQ_F_NOLOCK) {
+			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+		} else {
+			root_lock = qdisc_lock(q);
+			spin_lock_bh(root_lock);
 
-		val = (qdisc_is_running(q) ||
-		       test_bit(__QDISC_STATE_SCHED, &q->state));
+			val = (qdisc_is_running(q) ||
+			       test_bit(__QDISC_STATE_SCHED, &q->state));
 
-		spin_unlock_bh(root_lock);
+			spin_unlock_bh(root_lock);
+		}
 
 		if (val)
 			return true;

