* [RFC PATCH 00/13] Series short description
@ 2016-08-17 19:33 John Fastabend
  2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:33 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

I've been working on this for a bit now and figured it's time for a v2 RFC.
As usual, any comments, suggestions, observations, musings, etc. are
appreciated.

This is the latest round of the lockless qdisc patch set, with performance
measured primarily by using pktgen to inject packets into the qdisc layer.
Some simple netperf tests are included below as well, but those still need
to be done properly.

This v2 RFC fixes a couple of flaws in the original series. The first
major one was that the per-cpu accounting of qlen was not correct with
respect to the qdisc bypass. Using per-cpu counters for qlen allows a
flow to be enqueuing packets into the qdisc, then get scheduled on
another core and bypass the qdisc completely if that core appears idle.
I've reworked the logic to use an atomic which is _correct_ now but
unfortunately costs a lot in performance. With a single pfifo_fast and
12 threads of pktgen I still see a ~200k pps improvement even with
atomic accounting, so it is still a win, but nothing like the +1Mpps
gain without the atomic accounting.
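
To make the bypass race concrete, here is a minimal userspace sketch
(my own illustration, not kernel code; the names and the two-cpu setup
are invented for the example). Cpu 0 enqueues and bumps only its own
counter; if the flow then runs on cpu 1, the per-cpu check sees an
empty queue and would bypass, sending ahead of the still-queued packet,
while the shared atomic sees the true qlen:

/* Minimal userspace sketch of the bypass race described above.  All
 * names here are invented for the illustration; this is not the kernel
 * code.  Build with: gcc -std=c11 -o qlen_race qlen_race.c
 */
#include <stdatomic.h>
#include <stdio.h>

#define NCPU 2

static int percpu_qlen[NCPU];	/* per-cpu accounting (breaks bypass) */
static atomic_int atomic_qlen;	/* shared accounting (correct, but contended) */

static int can_bypass(int cpu, int use_atomic)
{
	if (use_atomic)
		return atomic_load(&atomic_qlen) == 0;
	/* only sees packets this cpu enqueued itself */
	return percpu_qlen[cpu] == 0;
}

int main(void)
{
	/* cpu 0 enqueues one packet */
	percpu_qlen[0]++;
	atomic_fetch_add(&atomic_qlen, 1);

	/* the flow migrates to cpu 1 before the packet is dequeued */
	printf("per-cpu qlen: cpu1 bypasses? %d (out of order)\n",
	       can_bypass(1, 0));
	printf("atomic  qlen: cpu1 bypasses? %d (correct)\n",
	       can_bypass(1, 1));
	return 0;
}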

On the mq tests, atomic vs per-cpu seems to be in the noise, I believe
because the mq qdisc is already aligned with one pfifo_fast qdisc per
core given the XPS setup I'm running maps queues 1:1.

Any thoughts on this would be interesting to hear. My general thinking
is to submit the atomic version for inclusion and then start improving
it with the items listed below.

Additionally, I've added a __netif_schedule() to the bad_skb_tx path;
otherwise I observed a packet getting stuck on the bad_txq_cpu pointer
and sitting in the qdisc structure until it was kicked again by another
packet or a netif_schedule(). And on the netif_schedule() topic, to
support per-cpu handling of gso and bad_txq_cpu we have to allow the
netif_schedule() logic to fire on a per-cpu model as well.

Otherwise, a bunch of small stylistic changes were made. I still need
to do another pass to catch checkpatch warnings/errors and to clean up
the statistics if/else branching a bit more. This series also carries
both the atomic qlen code and the per-cpu qlen code while I continue
to think up some scheme around the atomic qlen issue.

But this series seems to be working.

Future work includes the following:

	- convert all qdiscs over to per-cpu handling and clean up the
	  rather ugly if/else statistics handling. Although it is a bit
	  of work, it's mechanical and should help some.

	- I'm looking at fq_codel to see how to make it "lockless".

	- It seems we can drop the HARD_TX_LOCK in cases where the
	  nic exposes a queue per core, now that we have enqueue/dequeue
	  decoupled. The idea is that a bunch of threads enqueue and the
	  per-core dequeue logic runs. This requires XPS to be set up.

	- qlen improvements somehow

	- look at improvements to the skb_array structure. We can look
	  at drop-in replacements and/or improve it in place.


Below is the data I took from pktgen,

./samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -t $NUM -i eth3

I did four runs of each test and took the total summation across the
threads. There are four different sets of runs for the mq and
pfifo_fast cases. "without qlen atomic" uses per-queue qlen values and
allows bypassing the qdisc via the bypass flag; this is incorrect but
shows the impact of having an atomic in the mix. "with qlen atomic"
shows the correct implementation with atomics and bypass enabled. And
finally, "without qlen atomic and no bypass" uses per-cpu qlen values
and disables bypass to ensure out-of-order packets are not created.
To be clear, the patches submitted here correspond to the "with qlen
atomic" metrics.

nolock pfifo_fast (without qlen atomic)

1:  1440293 1421602 1409553 1393469 1424543
2:  1754890 1819292 1727948 1797711 1743427
4:  3282665 3344095 3315220 3332777 3348972
8:  2940079 1644450 2950777 2922085 2946310
12: 2042084 2610060 2857581 3493162 3104611

nolock pfifo_fast (with qlen atomic)
1:  1425231 1417176 1402862 1432880
2:  1631437 1633398 1630867 1628816
4:  1704383 1709900 1706274 1710198
8:  1348672 1344343 1339072 1334288
12: 1262988 1280724 1262237 1262615 

nolock pfifo_fast (without qlen atomic and no bypass)
1:  1435796 1458522 1471855 1455658
2:  1880642 1876359 1872879 1884578
4:  1922935 1914589 1912832 1912116
8:  1585055 1576887 1577086 1570236
12: 1479273 1450706 1447056 1466330

lock (pfifo_fast)
1:  1471479 1469142 1458825 1456788 1453952
2:  1746231 1749490 1753176 1753780 1755959
4:  1119626 1120515 1121478 1119220 1121115
8:  1001471  999308 1000318 1000776 1000384
12:  989269  992122  991590  986581  990430

nolock (mq with per cpu qlen)
1:   1435952  1459523  1448860  1385451   1435031
2:   2850662  2855702  2859105  2855443   2843382
4:   5288135  5271192  5252242  5270192   5311642
8:  10042731 10018063  9891813  9968382   9956727
12: 13265277 13384199 13438955 13363771  13436198

nolock (mq with qlen atomic)
1:   1558253  1562285  1555037  1558422
2:   2917449  2952852  2921697  2892313
4:   5518243  5375300  5625724  5219599
8:  10183153 10169389 10163161 10202530
12: 13877976 13459987 13081520 13996757 

nolock (mq with !bypass and per cpu qlen)
1:   1369110  1379992  1359407  1397014
2:   2575546  2557471  2580782  2593226
4:   4632570  4871850  4830725  4968439
8:   8974135  8951107  9134641  9084347
12: 12982673 12737426 12808364 

lock (mq)
1:   1448374  1444208  1437459  1437088  1452453
2:   2687963  2679221  2651059  2691630  2667479
4:   5153884  4684153  5091728  4635261  4902381
8:   9292395  9625869  9681835  9711651  9660498
12: 13553918 13682410 14084055 13946138 13724726

######################################################

A few arbitrary netperf sessions... (TBD lots of sessions, etc).

nolock (mq with !bypass and per cpu qlen)

root@john-Precision-Tower-5810:~# netperf -H 22.1 -t TCP_RR -- -s 128K -S 128K -b 0   
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate         
bytes  Bytes  bytes    bytes   secs.    per sec   

262144 262144 1        1       10.00    19910.37   
262144 262144


nolock (pfifo_fast with !bypass and per cpu qlen)

root@john-Precision-Tower-5810:~# netperf -H 22.1 -t TCP_RR -- -s 128K -S 128K -b 0 
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate         
bytes  Bytes  bytes    bytes   secs.    per sec   

262144 262144 1        1       10.00    20358.90   
262144 262144

nolock (mq with qlen atomic)

root@john-Precision-Tower-5810:/home/john/git/kernel.org/master# netperf -H 22.1 -t TCP_RR -- -s 128K -S 128K -b 0 
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate         
bytes  Bytes  bytes    bytes   secs.    per sec   

262144 262144 1        1       10.00    20202.38   
262144 262144

nolock (pfifo_fast with qlen_atomic)

root@john-Precision-Tower-5810:/home/john/git/kernel.org/master# netperf -H 22.1 -t TCP_RR -- -s 128K -S 128K -b 0 
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 22.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate         
bytes  Bytes  bytes    bytes   secs.    per sec   

262144 262144 1        1       10.00    20059.41   
262144 262144

lock (mq)

TBD


lock (pfifo_fast) 

TBD



---

John Fastabend (13):
      net: sched: allow qdiscs to handle locking
      net: sched: qdisc_qlen for per cpu logic
      net: sched: provide per cpu qstat helpers
      net: sched: provide atomic qlen helpers for bypass case
      net: sched: a dflt qdisc may be used with per cpu stats
      net: sched: per cpu gso handlers
      net: sched: support qdisc_reset on NOLOCK qdisc
      net: sched: support skb_bad_tx with lockless qdisc
      net: sched: helper to sum qlen
      net: sched: lockless support for netif_schedule
      net: sched: pfifo_fast use alf_queue
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio


 include/net/gen_stats.h   |    3 
 include/net/pkt_sched.h   |    4 
 include/net/sch_generic.h |  127 ++++++++++++++
 net/core/dev.c            |   60 +++++--
 net/core/gen_stats.c      |    9 +
 net/sched/sch_api.c       |   21 ++
 net/sched/sch_generic.c   |  404 +++++++++++++++++++++++++++++++++++----------
 net/sched/sch_mq.c        |   25 ++-
 net/sched/sch_mqprio.c    |   61 ++++---
 9 files changed, 577 insertions(+), 137 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
@ 2016-08-17 19:33 ` John Fastabend
  2016-08-17 22:33   ` Eric Dumazet
  2016-08-17 22:34   ` Eric Dumazet
  2016-08-17 19:34 ` [RFC PATCH 02/13] net: sched: qdisc_qlen for per cpu logic John Fastabend
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:33 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

This patch adds a flag for queueing disciplines to indicate the stack
does not need to use the qdisc lock to protect operations. This can
be used to build lockless scheduling algorithms and to improve
performance.

The flag is checked in the tx path and the qdisc lock is only taken
if it is not set. For now use a conditional if statement. Later we
could be more aggressive if it proves worthwhile and use a static key
or wrap this in a likely().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/pkt_sched.h   |    4 +++-
 include/net/sch_generic.h |    1 +
 net/core/dev.c            |   32 ++++++++++++++++++++++++++++----
 net/sched/sch_generic.c   |   26 ++++++++++++++++----------
 4 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7caa99b..69540c6 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -107,8 +107,10 @@ void __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
-	if (qdisc_run_begin(q))
+	if (qdisc_run_begin(q)) {
 		__qdisc_run(q);
+		qdisc_run_end(q);
+	}
 }
 
 int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 909aff2..3de6a8c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -58,6 +58,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT		0x40 /* root of its hierarchy :
 				      * qdisc_tree_decrease_qlen() should stop.
 				      */
+#define TCQ_F_NOLOCK		0x80 /* qdisc does not require locking */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 4ce07dc..5db395d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3076,6 +3076,26 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	int rc;
 
 	qdisc_calculate_pkt_len(skb, q);
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
+			__qdisc_drop(skb, &to_free);
+			rc = NET_XMIT_DROP;
+		} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q)) {
+			qdisc_bstats_cpu_update(q, skb);
+			if (sch_direct_xmit(skb, q, dev, txq, root_lock, true))
+				__qdisc_run(q);
+			rc = NET_XMIT_SUCCESS;
+		} else {
+			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+			__qdisc_run(q);
+		}
+
+		if (unlikely(to_free))
+			kfree_skb_list(to_free);
+		return rc;
+	}
+
 	/*
 	 * Heuristic to force contended enqueues to serialize on a
 	 * separate lock before trying to get qdisc main lock.
@@ -3118,6 +3138,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 				contended = false;
 			}
 			__qdisc_run(q);
+			qdisc_run_end(q);
 		}
 	}
 	spin_unlock(root_lock);
@@ -3897,19 +3918,22 @@ static void net_tx_action(struct softirq_action *h)
 
 		while (head) {
 			struct Qdisc *q = head;
-			spinlock_t *root_lock;
+			spinlock_t *root_lock = NULL;
 
 			head = head->next_sched;
 
-			root_lock = qdisc_lock(q);
-			spin_lock(root_lock);
+			if (!(q->flags & TCQ_F_NOLOCK)) {
+				root_lock = qdisc_lock(q);
+				spin_lock(root_lock);
+			}
 			/* We need to make sure head->next_sched is read
 			 * before clearing __QDISC_STATE_SCHED
 			 */
 			smp_mb__before_atomic();
 			clear_bit(__QDISC_STATE_SCHED, &q->state);
 			qdisc_run(q);
-			spin_unlock(root_lock);
+			if (!(q->flags & TCQ_F_NOLOCK))
+				spin_unlock(root_lock);
 		}
 	}
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e95b67c..af32418 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -170,7 +170,8 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 	int ret = NETDEV_TX_BUSY;
 
 	/* And release qdisc */
-	spin_unlock(root_lock);
+	if (!(q->flags & TCQ_F_NOLOCK))
+		spin_unlock(root_lock);
 
 	/* Note that we validate skb (GSO, checksum, ...) outside of locks */
 	if (validate)
@@ -183,10 +184,13 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 		HARD_TX_UNLOCK(dev, txq);
 	} else {
-		spin_lock(root_lock);
+		if (!(q->flags & TCQ_F_NOLOCK))
+			spin_lock(root_lock);
 		return qdisc_qlen(q);
 	}
-	spin_lock(root_lock);
+
+	if (!(q->flags & TCQ_F_NOLOCK))
+		spin_lock(root_lock);
 
 	if (dev_xmit_complete(ret)) {
 		/* Driver sent out skb successfully or skb was consumed */
@@ -262,8 +266,6 @@ void __qdisc_run(struct Qdisc *q)
 			break;
 		}
 	}
-
-	qdisc_run_end(q);
 }
 
 unsigned long dev_trans_start(struct net_device *dev)
@@ -868,14 +870,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		dev_queue = netdev_get_tx_queue(dev, i);
 		q = dev_queue->qdisc_sleeping;
-		root_lock = qdisc_lock(q);
 
-		spin_lock_bh(root_lock);
+		if (q->flags & TCQ_F_NOLOCK) {
+			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+		} else {
+			root_lock = qdisc_lock(q);
+			spin_lock_bh(root_lock);
 
-		val = (qdisc_is_running(q) ||
-		       test_bit(__QDISC_STATE_SCHED, &q->state));
+			val = (qdisc_is_running(q) ||
+			       test_bit(__QDISC_STATE_SCHED, &q->state));
 
-		spin_unlock_bh(root_lock);
+			spin_unlock_bh(root_lock);
+		}
 
 		if (val)
 			return true;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 02/13] net: sched: qdisc_qlen for per cpu logic
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
  2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
@ 2016-08-17 19:34 ` John Fastabend
  2016-08-17 19:34 ` [RFC PATCH 03/13] net: sched: provide per cpu qstat helpers John Fastabend
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:34 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

This is a bit interesting because it means sch_direct_xmit will
return a positive value, causing the dequeue/xmit cycle to continue,
only when a specific cpu has a qlen > 0.

However, checking each cpu for qlen would hurt performance, so it's
important to note that qdiscs that set the no-lock bit need to have
some sort of per-cpu enqueue/dequeue data structure that maps to the
per-cpu qlen value.
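
Purely as an illustration of that pairing, here is a minimal userspace
sketch (my own example, not kernel code; the helpers below are invented
stand-ins for the per-cpu qlen idea):

/* Userspace-only illustration, not the kernel code: a NOLOCK qdisc is
 * expected to pair a per-cpu qlen with a per-cpu queue, so the
 * dequeue/xmit loop can key off the cheap local counter instead of
 * summing every cpu's counter on each iteration.
 */
#include <stdio.h>

#define NCPU 4

static int percpu_qlen[NCPU];

static int qlen_this_cpu(int cpu)
{
	return percpu_qlen[cpu];	/* one local read */
}

static int qlen_sum(void)
{
	int i, sum = 0;

	for (i = 0; i < NCPU; i++)	/* walks every cpu: too costly per dequeue */
		sum += percpu_qlen[i];
	return sum;
}

int main(void)
{
	int cpu = 1, sent = 0;

	percpu_qlen[1] = 3;		/* packets queued by this cpu */
	percpu_qlen[2] = 5;		/* packets owned by another cpu */

	/* the dequeue/xmit cycle continues while *this* cpu has work */
	while (qlen_this_cpu(cpu) > 0) {
		percpu_qlen[cpu]--;	/* "transmit" one packet */
		sent++;
	}

	printf("cpu %d sent %d packets, %d still queued elsewhere\n",
	       cpu, sent, qlen_sum());
	return 0;
}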

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3de6a8c..354951d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -247,8 +247,16 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 	BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
+static inline int qdisc_qlen_cpu(const struct Qdisc *q)
+{
+	return this_cpu_ptr(q->cpu_qstats)->qlen;
+}
+
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
+	if (q->flags & TCQ_F_NOLOCK)
+		return qdisc_qlen_cpu(q);
+
 	return q->q.qlen;
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 03/13] net: sched: provide per cpu qstat helpers
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
  2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
  2016-08-17 19:34 ` [RFC PATCH 02/13] net: sched: qdisc_qlen for per cpu logic John Fastabend
@ 2016-08-17 19:34 ` John Fastabend
  2016-08-17 19:35 ` [RFC PATCH 04/13] net: sched: provide atomic qlen helpers for bypass case John Fastabend
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:34 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The per-cpu qstats support was added along with per-cpu bstats support,
which is currently used by the ingress qdisc. This patch adds the set
of helpers needed so that other qdiscs can use per-cpu qstats as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 354951d..f69da4b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -573,12 +573,43 @@ static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
+						const struct sk_buff *skb)
+{
+	struct gnet_stats_queue *q = this_cpu_ptr(sch->cpu_qstats);
+
+	q->backlog -= qdisc_pkt_len(skb);
+}
+
 static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
 					    const struct sk_buff *skb)
 {
 	sch->qstats.backlog += qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
+						const struct sk_buff *skb)
+{
+	struct gnet_stats_queue *q = this_cpu_ptr(sch->cpu_qstats);
+
+	q->backlog += qdisc_pkt_len(skb);
+}
+
+static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+{
+	this_cpu_ptr(sch->cpu_qstats)->qlen++;
+}
+
+static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+{
+	this_cpu_ptr(sch->cpu_qstats)->qlen--;
+}
+
+static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
+{
+	this_cpu_ptr(sch->cpu_qstats)->requeues++;
+}
+
 static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 {
 	sch->qstats.drops += count;
@@ -751,6 +782,14 @@ static inline void rtnl_qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
 	qdisc_qstats_drop(sch);
 }
 
+static inline int qdisc_drop_cpu(struct sk_buff *skb, struct Qdisc *sch,
+				 struct sk_buff **to_free)
+{
+	__qdisc_drop(skb, to_free);
+	qdisc_qstats_cpu_drop(sch);
+
+	return NET_XMIT_DROP;
+}
 
 static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
 			     struct sk_buff **to_free)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 04/13] net: sched: provide atomic qlen helpers for bypass case
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (2 preceding siblings ...)
  2016-08-17 19:34 ` [RFC PATCH 03/13] net: sched: provide per cpu qstat helpers John Fastabend
@ 2016-08-17 19:35 ` John Fastabend
  2016-08-17 19:35 ` [RFC PATCH 05/13] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:35 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The qlen is used by net/core/dev.c to determine if a packet
can skip the qdisc on qdiscs with bypass enabled. In these
cases a per-cpu qlen value can cause one cpu to bypass a
qdisc that still has packets in it.

To avoid this, use the simplest solution I could come
up with for now and add an atomic qlen value to the qdisc
to use in these cases.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f69da4b..193cf8c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -78,6 +78,7 @@ struct Qdisc {
 	 */
 	struct sk_buff		*gso_skb ____cacheline_aligned_in_smp;
 	struct sk_buff_head	q;
+	atomic_t		qlen_atomic;
 	struct gnet_stats_basic_packed bstats;
 	seqcount_t		running;
 	struct gnet_stats_queue	qstats;
@@ -247,6 +248,11 @@ static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 	BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
+static inline int qdisc_qlen_atomic(const struct Qdisc *q)
+{
+	return atomic_read(&q->qlen_atomic);
+}
+
 static inline int qdisc_qlen_cpu(const struct Qdisc *q)
 {
 	return this_cpu_ptr(q->cpu_qstats)->qlen;
@@ -254,8 +260,11 @@ static inline int qdisc_qlen_cpu(const struct Qdisc *q)
 
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
+	/* current default is to use atomic ops for qdisc qlen when
+	 * running with TCQ_F_NOLOCK.
+	 */
 	if (q->flags & TCQ_F_NOLOCK)
-		return qdisc_qlen_cpu(q);
+		return qdisc_qlen_atomic(q);
 
 	return q->q.qlen;
 }
@@ -595,6 +604,16 @@ static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
 	q->backlog += qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_atomic_qlen_inc(struct Qdisc *sch)
+{
+	atomic_inc(&sch->qlen_atomic);
+}
+
+static inline void qdisc_qstats_atomic_qlen_dec(struct Qdisc *sch)
+{
+	atomic_dec(&sch->qlen_atomic);
+}
+
 static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
 {
 	this_cpu_ptr(sch->cpu_qstats)->qlen++;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 05/13] net: sched: a dflt qdisc may be used with per cpu stats
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (3 preceding siblings ...)
  2016-08-17 19:35 ` [RFC PATCH 04/13] net: sched: provide atomic qlen helpers for bypass case John Fastabend
@ 2016-08-17 19:35 ` John Fastabend
  2016-08-17 19:35 ` [RFC PATCH 06/13] net: sched: per cpu gso handlers John Fastabend
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:35 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

Enable dflt qdisc support for per-cpu stats. Before this patch a
dflt qdisc was required to use the global qstats and bstats
statistics.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index af32418..f8fec81 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -645,18 +645,34 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 	struct Qdisc *sch;
 
 	if (!try_module_get(ops->owner))
-		goto errout;
+		return NULL;
 
 	sch = qdisc_alloc(dev_queue, ops);
 	if (IS_ERR(sch))
-		goto errout;
+		return NULL;
 	sch->parent = parentid;
 
-	if (!ops->init || ops->init(sch, NULL) == 0)
+	if (!ops->init)
 		return sch;
 
-	qdisc_destroy(sch);
+	if (ops->init(sch, NULL))
+		goto errout;
+
+	/* init() may have set percpu flags so init data structures */
+	if (qdisc_is_percpu_stats(sch)) {
+		sch->cpu_bstats =
+			netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+		if (!sch->cpu_bstats)
+			goto errout;
+
+		sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+		if (!sch->cpu_qstats)
+			goto errout;
+	}
+
+	return sch;
 errout:
+	qdisc_destroy(sch);
 	return NULL;
 }
 EXPORT_SYMBOL(qdisc_create_dflt);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 06/13] net: sched: per cpu gso handlers
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (4 preceding siblings ...)
  2016-08-17 19:35 ` [RFC PATCH 05/13] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
@ 2016-08-17 19:35 ` John Fastabend
  2016-08-17 19:36 ` [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc John Fastabend
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:35 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The net sched infrastructure has a gso ptr that points to skb structs
that have failed to be enqueued by the device driver.

This can happen when multiple cores try to push an skb onto the same
underlying hardware queue, resulting in lock contention. This case is
handled by the cpu collision handler handle_dev_cpu_collision(). Another
case occurs when the stack overruns the driver's low level tx queue
capacity. Ideally these should be rare occurrences in a well-tuned
system, but they do happen.

To handle this in the lockless case, use a per-cpu gso field to park
the skb until the conflict can be resolved. Note that at this point the
skb has already been popped off the qdisc, so it has to be handled
by the infrastructure.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   39 +++++++++++++++++++++++++
 net/sched/sch_api.c       |    7 ++++
 net/sched/sch_generic.c   |   71 ++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 193cf8c..0864813 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -36,6 +36,10 @@ struct qdisc_size_table {
 	u16			data[];
 };
 
+struct gso_cell {
+	struct sk_buff *skb;
+};
+
 struct Qdisc {
 	int 			(*enqueue)(struct sk_buff *skb,
 					   struct Qdisc *sch,
@@ -73,6 +77,8 @@ struct Qdisc {
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 
+	struct gso_cell __percpu *gso_cpu_skb;
+
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
 	 */
@@ -744,6 +750,23 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 	return sch->gso_skb;
 }
 
+static inline struct sk_buff *qdisc_peek_dequeued_cpu(struct Qdisc *sch)
+{
+	struct gso_cell *gso = this_cpu_ptr(sch->gso_cpu_skb);
+
+	if (!gso->skb) {
+		struct sk_buff *skb = sch->dequeue(sch);
+
+		if (skb) {
+			gso->skb = skb;
+			qdisc_qstats_cpu_backlog_inc(sch, skb);
+			qdisc_qstats_cpu_qlen_inc(sch);
+		}
+	}
+
+	return gso->skb;
+}
+
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
@@ -760,6 +783,22 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 	return skb;
 }
 
+static inline struct sk_buff *qdisc_dequeue_peeked_skb(struct Qdisc *sch)
+{
+	struct gso_cell *gso = this_cpu_ptr(sch->gso_cpu_skb);
+	struct sk_buff *skb = gso->skb;
+
+	if (skb) {
+		gso->skb = NULL;
+		qdisc_qstats_cpu_backlog_dec(sch, skb);
+		qdisc_qstats_cpu_qlen_dec(sch);
+	} else {
+		skb = sch->dequeue(sch);
+	}
+
+	return skb;
+}
+
 static inline void __qdisc_reset_queue(struct sk_buff_head *list)
 {
 	/*
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 12ebde8..d713052 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -966,6 +966,12 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				goto err_out4;
 		}
 
+		if (sch->flags & TCQ_F_NOLOCK) {
+			sch->gso_cpu_skb = alloc_percpu(struct gso_cell);
+			if (!sch->gso_cpu_skb)
+				goto err_out4;
+		}
+
 		if (tca[TCA_STAB]) {
 			stab = qdisc_get_stab(tca[TCA_STAB]);
 			if (IS_ERR(stab)) {
@@ -1014,6 +1020,7 @@ err_out:
 err_out4:
 	free_percpu(sch->cpu_bstats);
 	free_percpu(sch->cpu_qstats);
+	free_percpu(sch->gso_cpu_skb);
 	/*
 	 * Any broken qdiscs that would require a ops->reset() here?
 	 * The qdisc was never in action so it shouldn't be necessary.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f8fec81..3b9a21f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,8 +44,25 @@ EXPORT_SYMBOL(default_qdisc_ops);
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
+static inline struct sk_buff *qdisc_dequeue_gso_skb(struct Qdisc *sch)
+{
+	if (sch->gso_cpu_skb)
+		return (this_cpu_ptr(sch->gso_cpu_skb))->skb;
 
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+	return sch->gso_skb;
+}
+
+static inline void qdisc_null_gso_skb(struct Qdisc *sch)
+{
+	if (sch->gso_cpu_skb) {
+		(this_cpu_ptr(sch->gso_cpu_skb))->skb = NULL;
+		return;
+	}
+
+	sch->gso_skb = NULL;
+}
+
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
 	q->gso_skb = skb;
 	q->qstats.requeues++;
@@ -56,6 +73,25 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 	return 0;
 }
 
+static inline int dev_requeue_cpu_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+	this_cpu_ptr(q->gso_cpu_skb)->skb = skb;
+	qdisc_qstats_cpu_requeues_inc(q);
+	qdisc_qstats_cpu_backlog_inc(q, skb);
+	qdisc_qstats_cpu_qlen_inc(q);
+	__netif_schedule(q);
+
+	return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+	if (q->flags & TCQ_F_NOLOCK)
+		return dev_requeue_cpu_skb(skb, q);
+	else
+		return __dev_requeue_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 				 struct sk_buff *skb,
 				 const struct netdev_queue *txq,
@@ -111,7 +147,7 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 				   int *packets)
 {
-	struct sk_buff *skb = q->gso_skb;
+	struct sk_buff *skb = qdisc_dequeue_gso_skb(q);
 	const struct netdev_queue *txq = q->dev_queue;
 
 	*packets = 1;
@@ -121,9 +157,15 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 		/* check the reason of requeuing without tx lock first */
 		txq = skb_get_tx_queue(txq->dev, skb);
 		if (!netif_xmit_frozen_or_stopped(txq)) {
-			q->gso_skb = NULL;
-			qdisc_qstats_backlog_dec(q, skb);
-			q->q.qlen--;
+			qdisc_null_gso_skb(q);
+
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_inc(q, skb);
+				qdisc_qstats_cpu_qlen_dec(q);
+			} else {
+				qdisc_qstats_backlog_dec(q, skb);
+				q->q.qlen--;
+			}
 		} else
 			skb = NULL;
 		return skb;
@@ -670,6 +712,12 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 			goto errout;
 	}
 
+	if (sch->flags & TCQ_F_NOLOCK) {
+		sch->gso_cpu_skb = alloc_percpu(struct gso_cell);
+		if (!sch->gso_cpu_skb)
+			goto errout;
+	}
+
 	return sch;
 errout:
 	qdisc_destroy(sch);
@@ -706,6 +754,19 @@ static void qdisc_rcu_free(struct rcu_head *head)
 		free_percpu(qdisc->cpu_qstats);
 	}
 
+	if (qdisc->gso_cpu_skb) {
+		int i;
+
+		for_each_possible_cpu(i) {
+			struct gso_cell *cell;
+
+			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+			kfree_skb_list(cell->skb);
+		}
+
+		free_percpu(qdisc->gso_cpu_skb);
+	}
+
 	kfree((char *) qdisc - qdisc->padded);
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (5 preceding siblings ...)
  2016-08-17 19:35 ` [RFC PATCH 06/13] net: sched: per cpu gso handlers John Fastabend
@ 2016-08-17 19:36 ` John Fastabend
  2016-08-17 22:53   ` Eric Dumazet
  2016-08-17 19:36 ` [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:36 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The qdisc_reset operation depends on the qdisc lock at the moment
to halt any additions to gso_skb and statistics while the list is
free'd and the stats zeroed.

Without the qdisc lock we can not guarantee another cpu is not in
the process of adding an skb to one of the "cells". Here are the
two cases we have to handle.

 case 1: qdisc_graft operation. In this case a "new" qdisc is attached
	 and the 'qdisc_destroy' operation is called on the old qdisc.
	 The destroy operation will wait a rcu grace period and call
	 qdisc_rcu_free(). At which point gso_cpu_skb is free'd along
	 with all stats so no need to zero stats and gso_cpu_skb from
	 the reset operation itself.

	 Because we can not continue to call qdisc_reset before waiting
	 an rcu grace period (which is what guarantees the qdisc is
	 detached from all cpus), simply do not call qdisc_reset() at
	 all and let the qdisc_destroy operation clean up the qdisc.
	 Note, a refcnt greater than 1 would cause the destroy operation
	 to be aborted; however, if this ever happened the reference to
	 the qdisc would be lost and we would have a memory leak.

 case 2: dev_deactivate sequence. This can come from a user bringing
	 the interface down which causes the gso_skb list to be flushed
	 and the qlen zero'd. At the moment this is protected by the
	 qdisc lock so while we clear the qlen/gso_skb fields we are
	 guaranteed no new skbs are added. For the lockless case
	 though this is not true. To resolve this move the qdisc_reset
	 call after the new qdisc is assigned and a grace period is
	 exercised to ensure no new skbs can be enqueued. Further
	 the RTNL lock is held so we can not get another call to
	 activate the qdisc while the skb lists are being free'd.

	 Finally, fix qdisc_reset to handle the per cpu stats and
	 skb lists.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3b9a21f..29238c4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -737,6 +737,20 @@ void qdisc_reset(struct Qdisc *qdisc)
 	kfree_skb(qdisc->skb_bad_txq);
 	qdisc->skb_bad_txq = NULL;
 
+	if (qdisc->gso_cpu_skb) {
+		int i;
+
+		for_each_possible_cpu(i) {
+			struct gso_cell *cell;
+
+			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+			if (cell) {
+				kfree_skb_list(cell->skb);
+				cell = NULL;
+			}
+		}
+	}
+
 	if (qdisc->gso_skb) {
 		kfree_skb_list(qdisc->gso_skb);
 		qdisc->gso_skb = NULL;
@@ -812,10 +826,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 	root_lock = qdisc_lock(oqdisc);
 	spin_lock_bh(root_lock);
 
-	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
 	/* ... and graft new one */
 	if (qdisc == NULL)
 		qdisc = &noop_qdisc;
@@ -929,7 +939,6 @@ static void dev_deactivate_queue(struct net_device *dev,
 			set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
 
 		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
-		qdisc_reset(qdisc);
 
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
@@ -966,6 +975,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 	return false;
 }
 
+static void dev_qdisc_reset(struct net_device *dev,
+			    struct netdev_queue *dev_queue,
+			    void *none)
+{
+	struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+	if (qdisc)
+		qdisc_reset(qdisc);
+}
+
 /**
  * 	dev_deactivate_many - deactivate transmissions on several devices
  * 	@head: list of devices to deactivate
@@ -976,7 +995,6 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 void dev_deactivate_many(struct list_head *head)
 {
 	struct net_device *dev;
-	bool sync_needed = false;
 
 	list_for_each_entry(dev, head, close_list) {
 		netdev_for_each_tx_queue(dev, dev_deactivate_queue,
@@ -986,20 +1004,27 @@ void dev_deactivate_many(struct list_head *head)
 					     &noop_qdisc);
 
 		dev_watchdog_down(dev);
-		sync_needed |= !dev->dismantle;
 	}
 
 	/* Wait for outstanding qdisc-less dev_queue_xmit calls.
 	 * This is avoided if all devices are in dismantle phase :
 	 * Caller will call synchronize_net() for us
 	 */
-	if (sync_needed)
-		synchronize_net();
+	synchronize_net();
 
 	/* Wait for outstanding qdisc_run calls. */
-	list_for_each_entry(dev, head, close_list)
+	list_for_each_entry(dev, head, close_list) {
 		while (some_qdisc_is_busy(dev))
 			yield();
+
+		/* The new qdisc is assigned at this point so we can safely
+		 * unwind stale skb lists and qdisc statistics
+		 */
+		netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+		if (dev_ingress_queue(dev))
+			dev_qdisc_reset(dev, dev_ingress_queue(dev), NULL);
+	}
+
 }
 
 void dev_deactivate(struct net_device *dev)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (6 preceding siblings ...)
  2016-08-17 19:36 ` [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc John Fastabend
@ 2016-08-17 19:36 ` John Fastabend
  2016-08-17 22:58   ` Eric Dumazet
  2016-08-17 19:37 ` [RFC PATCH 09/13] net: sched: helper to sum qlen John Fastabend
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:36 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

Similar to how gso is handled, skb_bad_tx needs to be per cpu to handle
a lockless qdisc with multiple writers/producers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |    7 +++
 net/sched/sch_api.c       |    6 +++
 net/sched/sch_generic.c   |   95 +++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0864813..d465fb9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -40,6 +40,10 @@ struct gso_cell {
 	struct sk_buff *skb;
 };
 
+struct bad_txq_cell {
+	struct sk_buff *skb;
+};
+
 struct Qdisc {
 	int 			(*enqueue)(struct sk_buff *skb,
 					   struct Qdisc *sch,
@@ -77,7 +81,8 @@ struct Qdisc {
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 
-	struct gso_cell __percpu *gso_cpu_skb;
+	struct gso_cell     __percpu *gso_cpu_skb;
+	struct bad_txq_cell __percpu *skb_bad_txq_cpu;
 
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d713052..b90a23a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -970,6 +970,11 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 			sch->gso_cpu_skb = alloc_percpu(struct gso_cell);
 			if (!sch->gso_cpu_skb)
 				goto err_out4;
+
+			sch->skb_bad_txq_cpu =
+				alloc_percpu(struct bad_txq_cell);
+			if (!sch->skb_bad_txq_cpu)
+				goto err_out4;
 		}
 
 		if (tca[TCA_STAB]) {
@@ -1021,6 +1026,7 @@ err_out4:
 	free_percpu(sch->cpu_bstats);
 	free_percpu(sch->cpu_qstats);
 	free_percpu(sch->gso_cpu_skb);
+	free_percpu(sch->skb_bad_txq_cpu);
 	/*
 	 * Any broken qdiscs that would require a ops->reset() here?
 	 * The qdisc was never in action so it shouldn't be necessary.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 29238c4..d10b762 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,6 +44,43 @@ EXPORT_SYMBOL(default_qdisc_ops);
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
+static inline struct sk_buff *qdisc_dequeue_skb_bad_txq(struct Qdisc *sch)
+{
+	if (sch->skb_bad_txq_cpu) {
+		struct bad_txq_cell *cell = this_cpu_ptr(sch->skb_bad_txq_cpu);
+
+		return cell->skb;
+	}
+
+	return sch->skb_bad_txq;
+}
+
+static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *sch,
+					     struct sk_buff *skb)
+{
+	if (sch->skb_bad_txq_cpu) {
+		struct bad_txq_cell *cell = this_cpu_ptr(sch->skb_bad_txq_cpu);
+
+		cell->skb = skb;
+		__netif_schedule(sch);
+		return;
+	}
+
+	sch->skb_bad_txq = skb;
+}
+
+static inline void qdisc_null_skb_bad_txq(struct Qdisc *sch)
+{
+	if (sch->skb_bad_txq_cpu) {
+		struct bad_txq_cell *cell = this_cpu_ptr(sch->skb_bad_txq_cpu);
+
+		cell->skb = NULL;
+		return;
+	}
+
+	sch->skb_bad_txq = NULL;
+}
+
 static inline struct sk_buff *qdisc_dequeue_gso_skb(struct Qdisc *sch)
 {
 	if (sch->gso_cpu_skb)
@@ -129,9 +166,15 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 		if (!nskb)
 			break;
 		if (unlikely(skb_get_queue_mapping(nskb) != mapping)) {
-			q->skb_bad_txq = nskb;
-			qdisc_qstats_backlog_inc(q, nskb);
-			q->q.qlen++;
+			qdisc_enqueue_skb_bad_txq(q, nskb);
+
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_inc(q, nskb);
+				qdisc_qstats_cpu_qlen_inc(q);
+			} else {
+				qdisc_qstats_backlog_inc(q, nskb);
+				q->q.qlen++;
+			}
 			break;
 		}
 		skb->next = nskb;
@@ -160,7 +203,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 			qdisc_null_gso_skb(q);
 
 			if (qdisc_is_percpu_stats(q)) {
-				qdisc_qstats_cpu_backlog_inc(q, skb);
+				qdisc_qstats_cpu_backlog_dec(q, skb);
 				qdisc_qstats_cpu_qlen_dec(q);
 			} else {
 				qdisc_qstats_backlog_dec(q, skb);
@@ -171,14 +214,19 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
 		return skb;
 	}
 	*validate = true;
-	skb = q->skb_bad_txq;
+	skb = qdisc_dequeue_skb_bad_txq(q);
 	if (unlikely(skb)) {
 		/* check the reason of requeuing without tx lock first */
 		txq = skb_get_tx_queue(txq->dev, skb);
 		if (!netif_xmit_frozen_or_stopped(txq)) {
-			q->skb_bad_txq = NULL;
-			qdisc_qstats_backlog_dec(q, skb);
-			q->q.qlen--;
+			qdisc_null_skb_bad_txq(q);
+			if (qdisc_is_percpu_stats(q)) {
+				qdisc_qstats_cpu_backlog_dec(q, skb);
+				qdisc_qstats_cpu_qlen_dec(q);
+			} else {
+				qdisc_qstats_backlog_dec(q, skb);
+				q->q.qlen--;
+			}
 			goto bulk;
 		}
 		return NULL;
@@ -716,6 +764,10 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 		sch->gso_cpu_skb = alloc_percpu(struct gso_cell);
 		if (!sch->gso_cpu_skb)
 			goto errout;
+
+		sch->skb_bad_txq_cpu = alloc_percpu(struct bad_txq_cell);
+		if (!sch->skb_bad_txq_cpu)
+			goto errout;
 	}
 
 	return sch;
@@ -746,6 +798,20 @@ void qdisc_reset(struct Qdisc *qdisc)
 			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
 			if (cell) {
 				kfree_skb_list(cell->skb);
+				cell->skb = NULL;
+			}
+		}
+	}
+
+	if (qdisc->skb_bad_txq_cpu) {
+		int i;
+
+		for_each_possible_cpu(i) {
+			struct bad_txq_cell *cell;
+
+			cell = per_cpu_ptr(qdisc->skb_bad_txq_cpu, i);
+			if (cell) {
+				kfree_skb(cell->skb);
 				cell = NULL;
 			}
 		}
@@ -781,6 +847,19 @@ static void qdisc_rcu_free(struct rcu_head *head)
 		free_percpu(qdisc->gso_cpu_skb);
 	}
 
+	if (qdisc->skb_bad_txq_cpu) {
+		int i;
+
+		for_each_possible_cpu(i) {
+			struct bad_txq_cell *cell;
+
+			cell = per_cpu_ptr(qdisc->skb_bad_txq_cpu, i);
+			kfree_skb(cell->skb);
+		}
+
+		free_percpu(qdisc->skb_bad_txq_cpu);
+	}
+
 	kfree((char *) qdisc - qdisc->padded);
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 09/13] net: sched: helper to sum qlen
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (7 preceding siblings ...)
  2016-08-17 19:36 ` [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
@ 2016-08-17 19:37 ` John Fastabend
  2016-08-17 19:37 ` [RFC PATCH 10/13] net: sched: lockless support for netif_schedule John Fastabend
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:37 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

Reporting qlen when qlen is per cpu requires aggregating the per
cpu counters. This adds a helper routine for this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/sch_generic.h |   15 +++++++++++++++
 net/sched/sch_api.c       |    3 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d465fb9..cc28af0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -280,6 +280,21 @@ static inline int qdisc_qlen(const struct Qdisc *q)
 	return q->q.qlen;
 }
 
+static inline int qdisc_qlen_sum(const struct Qdisc *q)
+{
+	__u32 qlen = 0;
+	int i;
+
+	if (q->flags & TCQ_F_NOLOCK) {
+		for_each_possible_cpu(i)
+			qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+	} else {
+		qlen = q->q.qlen;
+	}
+
+	return qlen;
+}
+
 static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
 {
 	return (struct qdisc_skb_cb *)skb->cb;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index b90a23a..6c5bf13 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1370,7 +1370,8 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
 		goto nla_put_failure;
 	if (q->ops->dump && q->ops->dump(q, skb) < 0)
 		goto nla_put_failure;
-	qlen = q->q.qlen;
+
+	qlen = qdisc_qlen_sum(q);
 
 	stab = rtnl_dereference(q->stab);
 	if (stab && qdisc_dump_stab(skb, stab) < 0)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 10/13] net: sched: lockless support for netif_schedule
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (8 preceding siblings ...)
  2016-08-17 19:37 ` [RFC PATCH 09/13] net: sched: helper to sum qlen John Fastabend
@ 2016-08-17 19:37 ` John Fastabend
  2016-08-17 19:46   ` John Fastabend
  2016-08-17 23:01   ` Eric Dumazet
  2016-08-17 19:38 ` [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue John Fastabend
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:37 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

netif_schedule uses a bit, QDISC_STATE_SCHED, to tell the qdisc layer
if a run of the qdisc has been scheduled. This is important when
tearing down qdisc instances. We can rcu_free an instance, for example,
if it's possible that we might have outstanding references to it.

Perhaps more importantly, in the per-cpu lockless case we need to
schedule a run of the qdisc on all cpus that are enqueueing packets
and hitting the gso_skb requeue logic, or else the skb may get stuck
on the gso_skb queue without anything to finish the xmit.

This patch uses a reference counter instead of a bit to account for
the multiple CPUs.
---
 include/net/sch_generic.h |    1 +
 net/core/dev.c            |   32 +++++++++++++++++++++++---------
 net/sched/sch_api.c       |    5 +++++
 net/sched/sch_generic.c   |   16 +++++++++++++++-
 4 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index cc28af0..2e0e5b0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -94,6 +94,7 @@ struct Qdisc {
 	seqcount_t		running;
 	struct gnet_stats_queue	qstats;
 	unsigned long		state;
+	unsigned long __percpu	*cpu_state;
 	struct Qdisc            *next_sched;
 	struct sk_buff		*skb_bad_txq;
 	struct rcu_head		rcu_head;
diff --git a/net/core/dev.c b/net/core/dev.c
index 5db395d..f491845 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2272,8 +2272,14 @@ static void __netif_reschedule(struct Qdisc *q)
 
 void __netif_schedule(struct Qdisc *q)
 {
-	if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state))
+	if (q->flags & TCQ_F_NOLOCK) {
+		unsigned long *s = this_cpu_ptr(q->cpu_state);
+
+		if (!test_and_set_bit(__QDISC_STATE_SCHED, s))
+			__netif_reschedule(q);
+	} else if (!test_and_set_bit(__QDISC_STATE_SCHED, &q->state)) {
 		__netif_reschedule(q);
+	}
 }
 EXPORT_SYMBOL(__netif_schedule);
 
@@ -3925,15 +3931,23 @@ static void net_tx_action(struct softirq_action *h)
 			if (!(q->flags & TCQ_F_NOLOCK)) {
 				root_lock = qdisc_lock(q);
 				spin_lock(root_lock);
-			}
-			/* We need to make sure head->next_sched is read
-			 * before clearing __QDISC_STATE_SCHED
-			 */
-			smp_mb__before_atomic();
-			clear_bit(__QDISC_STATE_SCHED, &q->state);
-			qdisc_run(q);
-			if (!(q->flags & TCQ_F_NOLOCK))
+
+				/* We need to make sure head->next_sched is read
+				 * before clearing __QDISC_STATE_SCHED
+				 */
+				smp_mb__before_atomic();
+				clear_bit(__QDISC_STATE_SCHED, &q->state);
+
+				qdisc_run(q);
+
 				spin_unlock(root_lock);
+			} else {
+				unsigned long *s = this_cpu_ptr(q->cpu_state);
+
+				smp_mb__before_atomic();
+				clear_bit(__QDISC_STATE_SCHED, s);
+				__qdisc_run(q);
+			}
 		}
 	}
 }
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6c5bf13..89989a6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -975,6 +975,10 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
 				alloc_percpu(struct bad_txq_cell);
 			if (!sch->skb_bad_txq_cpu)
 				goto err_out4;
+
+			sch->cpu_state = alloc_percpu(unsigned long);
+			if (!sch->cpu_state)
+				goto err_out4;
 		}
 
 		if (tca[TCA_STAB]) {
@@ -1027,6 +1031,7 @@ err_out4:
 	free_percpu(sch->cpu_qstats);
 	free_percpu(sch->gso_cpu_skb);
 	free_percpu(sch->skb_bad_txq_cpu);
+	free_percpu(sch->cpu_state);
 	/*
 	 * Any broken qdiscs that would require a ops->reset() here?
 	 * The qdisc was never in action so it shouldn't be necessary.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index d10b762..f5b7254 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -171,6 +171,7 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 			if (qdisc_is_percpu_stats(q)) {
 				qdisc_qstats_cpu_backlog_inc(q, nskb);
 				qdisc_qstats_cpu_qlen_inc(q);
+				set_thread_flag(TIF_NEED_RESCHED);
 			} else {
 				qdisc_qstats_backlog_inc(q, nskb);
 				q->q.qlen++;
@@ -768,6 +769,10 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 		sch->skb_bad_txq_cpu = alloc_percpu(struct bad_txq_cell);
 		if (!sch->skb_bad_txq_cpu)
 			goto errout;
+
+		sch->cpu_state = alloc_percpu(unsigned long);
+		if (!sch->cpu_state)
+			goto errout;
 	}
 
 	return sch;
@@ -1037,7 +1042,16 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 		q = dev_queue->qdisc_sleeping;
 
 		if (q->flags & TCQ_F_NOLOCK) {
-			val = test_bit(__QDISC_STATE_SCHED, &q->state);
+			int i;
+
+			for_each_possible_cpu(i) {
+				unsigned long *s;
+
+				s = per_cpu_ptr(q->cpu_state, i);
+				val = test_bit(__QDISC_STATE_SCHED, s);
+				if (val)
+					break;
+			}
 		} else {
 			root_lock = qdisc_lock(q);
 			spin_lock_bh(root_lock);

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (9 preceding siblings ...)
  2016-08-17 19:37 ` [RFC PATCH 10/13] net: sched: lockless support for netif_schedule John Fastabend
@ 2016-08-17 19:38 ` John Fastabend
  2016-08-19 10:13   ` Jesper Dangaard Brouer
  2016-08-17 19:38 ` [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
  2016-08-17 19:39 ` [RFC PATCH 13/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
  12 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:38 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

This converts the pfifo_fast qdisc to use the alf_queue enqueue and
dequeue routines and then sets the NOLOCK bit.

This also removes the logic used to pick the next band to dequeue from
and instead just checks each alf_queue for packets, from top priority
to lowest. This might need to be a bit more clever but seems to work
for now.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_generic.c |  133 +++++++++++++++++++++++++++--------------------
 1 file changed, 77 insertions(+), 56 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f5b7254..c41a5dd 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -26,6 +26,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/if_vlan.h>
+#include <linux/skb_array.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
 #include <net/dst.h>
@@ -555,88 +556,79 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = {
 
 /*
  * Private data for a pfifo_fast scheduler containing:
- * 	- queues for the three band
- * 	- bitmap indicating which of the bands contain skbs
+ *	- rings for priority bands
  */
 struct pfifo_fast_priv {
-	u32 bitmap;
-	struct sk_buff_head q[PFIFO_FAST_BANDS];
+	struct skb_array q[PFIFO_FAST_BANDS];
 };
 
-/*
- * Convert a bitmap to the first band number where an skb is queued, where:
- * 	bitmap=0 means there are no skbs on any band.
- * 	bitmap=1 means there is an skb on band 0.
- *	bitmap=7 means there are skbs on all 3 bands, etc.
- */
-static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0};
-
-static inline struct sk_buff_head *band2list(struct pfifo_fast_priv *priv,
-					     int band)
+static inline struct skb_array *band2list(struct pfifo_fast_priv *priv,
+					  int band)
 {
-	return priv->q + band;
+	return &priv->q[band];
 }
 
 static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
 			      struct sk_buff **to_free)
 {
-	if (skb_queue_len(&qdisc->q) < qdisc_dev(qdisc)->tx_queue_len) {
-		int band = prio2band[skb->priority & TC_PRIO_MAX];
-		struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-		struct sk_buff_head *list = band2list(priv, band);
-
-		priv->bitmap |= (1 << band);
-		qdisc->q.qlen++;
-		return __qdisc_enqueue_tail(skb, qdisc, list);
-	}
+	int band = prio2band[skb->priority & TC_PRIO_MAX];
+	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
+	struct skb_array *q = band2list(priv, band);
+	int err;
 
-	return qdisc_drop(skb, qdisc, to_free);
+	err = skb_array_produce(q, skb);
+
+	if (unlikely(err))
+		return qdisc_drop_cpu(skb, qdisc, to_free);
+
+	qdisc_qstats_cpu_qlen_inc(qdisc);
+	qdisc_qstats_cpu_backlog_inc(qdisc, skb);
+	return NET_XMIT_SUCCESS;
 }
 
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 {
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
+	struct sk_buff *skb = NULL;
+	int band;
 
-	if (likely(band >= 0)) {
-		struct sk_buff_head *list = band2list(priv, band);
-		struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
+	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
+		struct skb_array *q = band2list(priv, band);
 
-		qdisc->q.qlen--;
-		if (skb_queue_empty(list))
-			priv->bitmap &= ~(1 << band);
+		if (__skb_array_empty(q))
+			continue;
 
-		return skb;
+		skb = skb_array_consume(q);
 	}
 
-	return NULL;
-}
-
-static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
-{
-	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
-
-	if (band >= 0) {
-		struct sk_buff_head *list = band2list(priv, band);
-
-		return skb_peek(list);
+	if (likely(skb)) {
+		qdisc_qstats_cpu_backlog_dec(qdisc, skb);
+		qdisc_bstats_cpu_update(qdisc, skb);
+		qdisc_qstats_cpu_qlen_dec(qdisc);
 	}
 
-	return NULL;
+	return skb;
 }
 
 static void pfifo_fast_reset(struct Qdisc *qdisc)
 {
-	int prio;
+	int i, band;
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
 
-	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
-		__qdisc_reset_queue(band2list(priv, prio));
+	for (band = 0; band < PFIFO_FAST_BANDS; band++) {
+		struct skb_array *q = band2list(priv, band);
+		struct sk_buff *skb;
 
-	priv->bitmap = 0;
-	qdisc->qstats.backlog = 0;
-	qdisc->q.qlen = 0;
+		while ((skb = skb_array_consume(q)) != NULL)
+			kfree_skb(skb);
+	}
+
+	for_each_possible_cpu(i) {
+		struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
+
+		q->backlog = 0;
+		q->qlen = 0;
+	}
 }
 
 static int pfifo_fast_dump(struct Qdisc *qdisc, struct sk_buff *skb)
@@ -654,24 +646,53 @@ nla_put_failure:
 
 static int pfifo_fast_init(struct Qdisc *qdisc, struct nlattr *opt)
 {
-	int prio;
+	unsigned int qlen = qdisc_dev(qdisc)->tx_queue_len;
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
+	int prio;
+
+	/* guard against zero length rings */
+	if (!qlen)
+		return -EINVAL;
 
-	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
-		__skb_queue_head_init(band2list(priv, prio));
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
+		struct skb_array *q = band2list(priv, prio);
+		int err;
+
+		err = skb_array_init(q, qlen, GFP_KERNEL);
+		if (err)
+			return -ENOMEM;
+	}
+
+	atomic_set(&qdisc->qlen_atomic, 0);
 
 	/* Can by-pass the queue discipline */
 	qdisc->flags |= TCQ_F_CAN_BYPASS;
+	qdisc->flags |= TCQ_F_NOLOCK;
+	qdisc->flags |= TCQ_F_CPUSTATS;
+
 	return 0;
 }
 
+static void pfifo_fast_destroy(struct Qdisc *sch)
+{
+	struct pfifo_fast_priv *priv = qdisc_priv(sch);
+	int prio;
+
+	for (prio = 0; prio < PFIFO_FAST_BANDS; prio++) {
+		struct skb_array *q = band2list(priv, prio);
+
+		skb_array_cleanup(q);
+	}
+}
+
 struct Qdisc_ops pfifo_fast_ops __read_mostly = {
 	.id		=	"pfifo_fast",
 	.priv_size	=	sizeof(struct pfifo_fast_priv),
 	.enqueue	=	pfifo_fast_enqueue,
 	.dequeue	=	pfifo_fast_dequeue,
-	.peek		=	pfifo_fast_peek,
+	.peek		=	qdisc_peek_dequeued_cpu,
 	.init		=	pfifo_fast_init,
+	.destroy	=	pfifo_fast_destroy,
 	.reset		=	pfifo_fast_reset,
 	.dump		=	pfifo_fast_dump,
 	.owner		=	THIS_MODULE,


* [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (10 preceding siblings ...)
  2016-08-17 19:38 ` [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue John Fastabend
@ 2016-08-17 19:38 ` John Fastabend
  2016-08-17 19:49   ` John Fastabend
  2016-08-17 23:04   ` Eric Dumazet
  2016-08-17 19:39 ` [RFC PATCH 13/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
  12 siblings, 2 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:38 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The sch_mq qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Also exports __gnet_stats_copy_queue() to use as a helper function.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/net/gen_stats.h |    3 +++
 net/core/gen_stats.c    |    9 +++++----
 net/sched/sch_mq.c      |   25 ++++++++++++++++++-------
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 231e121..5ddc88b 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -47,6 +47,9 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 int gnet_stats_copy_queue(struct gnet_dump *d,
 			  struct gnet_stats_queue __percpu *cpu_q,
 			  struct gnet_stats_queue *q, __u32 qlen);
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+			     const struct gnet_stats_queue __percpu *cpu_q,
+			     const struct gnet_stats_queue *q, __u32 qlen);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 508e051..a503547 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -254,10 +254,10 @@ __gnet_stats_copy_queue_cpu(struct gnet_stats_queue *qstats,
 	}
 }
 
-static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
-				    const struct gnet_stats_queue __percpu *cpu,
-				    const struct gnet_stats_queue *q,
-				    __u32 qlen)
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+			     const struct gnet_stats_queue __percpu *cpu,
+			     const struct gnet_stats_queue *q,
+			     __u32 qlen)
 {
 	if (cpu) {
 		__gnet_stats_copy_queue_cpu(qstats, cpu);
@@ -271,6 +271,7 @@ static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
 
 	qstats->qlen = qlen;
 }
+EXPORT_SYMBOL(__gnet_stats_copy_queue);
 
 /**
  * gnet_stats_copy_queue - copy queue statistics into statistics TLV
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index b943982..f4b5bbb 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -17,6 +17,7 @@
 #include <linux/skbuff.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/sch_generic.h>
 
 struct mq_sched {
 	struct Qdisc		**qdiscs;
@@ -107,15 +108,25 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
 	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+		struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+		__u32 qlen = 0;
+
 		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		if (qdisc_is_percpu_stats(qdisc)) {
+			cpu_bstats = qdisc->cpu_bstats;
+			cpu_qstats = qdisc->cpu_qstats;
+		}
+
+		qlen = qdisc_qlen_sum(qdisc);
+
+		__gnet_stats_copy_basic(NULL, &sch->bstats,
+					cpu_bstats, &qdisc->bstats);
+		__gnet_stats_copy_queue(&sch->qstats,
+					cpu_qstats, &qdisc->qstats, qlen);
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 	return 0;


* [RFC PATCH 13/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio
  2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
                   ` (11 preceding siblings ...)
  2016-08-17 19:38 ` [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
@ 2016-08-17 19:39 ` John Fastabend
  12 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:39 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, john.fastabend, davem

The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then
called independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 net/sched/sch_mqprio.c |   61 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 549c663..24f0360 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -229,22 +229,32 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
 	struct Qdisc *qdisc;
-	unsigned int i;
+	unsigned int ntx, tc;
 
 	sch->q.qlen = 0;
 	memset(&sch->bstats, 0, sizeof(sch->bstats));
 	memset(&sch->qstats, 0, sizeof(sch->qstats));
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, i)->qdisc);
+	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+		struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+		struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+		__u32 qlen = 0;
+
+		qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
 		spin_lock_bh(qdisc_lock(qdisc));
-		sch->q.qlen		+= qdisc->q.qlen;
-		sch->bstats.bytes	+= qdisc->bstats.bytes;
-		sch->bstats.packets	+= qdisc->bstats.packets;
-		sch->qstats.backlog	+= qdisc->qstats.backlog;
-		sch->qstats.drops	+= qdisc->qstats.drops;
-		sch->qstats.requeues	+= qdisc->qstats.requeues;
-		sch->qstats.overlimits	+= qdisc->qstats.overlimits;
+
+		if (qdisc_is_percpu_stats(qdisc)) {
+			cpu_bstats = qdisc->cpu_bstats;
+			cpu_qstats = qdisc->cpu_qstats;
+		}
+
+		qlen = qdisc_qlen_sum(qdisc);
+
+		__gnet_stats_copy_basic(NULL, &sch->bstats,
+					cpu_bstats, &qdisc->bstats);
+		__gnet_stats_copy_queue(&sch->qstats,
+					cpu_qstats, &qdisc->qstats, qlen);
+
 		spin_unlock_bh(qdisc_lock(qdisc));
 	}
 
@@ -252,9 +262,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
 	opt.hw = priv->hw_owned;
 
-	for (i = 0; i < netdev_get_num_tc(dev); i++) {
-		opt.count[i] = dev->tc_to_txq[i].count;
-		opt.offset[i] = dev->tc_to_txq[i].offset;
+	for (tc = 0; tc < netdev_get_num_tc(dev); tc++) {
+		opt.count[tc] = dev->tc_to_txq[tc].count;
+		opt.offset[tc] = dev->tc_to_txq[tc].offset;
 	}
 
 	if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
@@ -332,7 +342,6 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	if (cl <= netdev_get_num_tc(dev)) {
 		int i;
 		__u32 qlen = 0;
-		struct Qdisc *qdisc;
 		struct gnet_stats_queue qstats = {0};
 		struct gnet_stats_basic_packed bstats = {0};
 		struct netdev_tc_txq tc = dev->tc_to_txq[cl - 1];
@@ -347,18 +356,26 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 
 		for (i = tc.offset; i < tc.offset + tc.count; i++) {
 			struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+			struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
+			struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+			struct gnet_stats_queue __percpu *cpu_qstats = NULL;
 
-			qdisc = rtnl_dereference(q->qdisc);
 			spin_lock_bh(qdisc_lock(qdisc));
-			qlen		  += qdisc->q.qlen;
-			bstats.bytes      += qdisc->bstats.bytes;
-			bstats.packets    += qdisc->bstats.packets;
-			qstats.backlog    += qdisc->qstats.backlog;
-			qstats.drops      += qdisc->qstats.drops;
-			qstats.requeues   += qdisc->qstats.requeues;
-			qstats.overlimits += qdisc->qstats.overlimits;
+			if (qdisc_is_percpu_stats(qdisc)) {
+				cpu_bstats = qdisc->cpu_bstats;
+				cpu_qstats = qdisc->cpu_qstats;
+			}
+
+			qlen = qdisc_qlen_sum(qdisc);
+			__gnet_stats_copy_basic(NULL, &sch->bstats,
+						cpu_bstats, &qdisc->bstats);
+			__gnet_stats_copy_queue(&sch->qstats,
+						cpu_qstats,
+						&qdisc->qstats,
+						qlen);
 			spin_unlock_bh(qdisc_lock(qdisc));
 		}
+
 		/* Reclaim root sleeping lock before completing stats */
 		if (d->lock)
 			spin_lock_bh(d->lock);


* Re: [RFC PATCH 10/13] net: sched: lockless support for netif_schedule
  2016-08-17 19:37 ` [RFC PATCH 10/13] net: sched: lockless support for netif_schedule John Fastabend
@ 2016-08-17 19:46   ` John Fastabend
  2016-08-17 23:01   ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:46 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, davem

On 16-08-17 12:37 PM, John Fastabend wrote:
> netif_schedule uses a bit QDISC_STATE_SCHED to tell the qdisc layer
> if a run of the qdisc has been scheduled. This is important when
> tearing down qdisc instances. We can rcu_free an instance, for example,
> if it's possible that we might have outstanding references to it.
> 
> Perhaps more importantly, in the per cpu lockless case we need to
> schedule a run of the qdisc on all qdiscs that are enqueuing packets
> and hitting the gso_skb requeue logic or else the skb may get stuck
> on the gso_skb queue without anything to finish the xmit.
> 
> This patch uses a reference counter instead of a bit to account for
> the multiple CPUs.
> ---

Oops, the commit message is incorrect here: it actually uses a per cpu
state bitmask to track this.
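
For reference, the per cpu scheduled-state idea looks roughly like the
kernel-style sketch below. The cpu_state field and both helper names
are made up for illustration only; they are not the names the patch
uses.

struct qdisc_cpu_state {
	unsigned long state;	/* this CPU's copy of __QDISC_STATE_SCHED */
};

/* producer side: mark a run pending on this CPU only, so CPUs
 * enqueuing concurrently never contend on a single state word
 */
static bool qdisc_cpu_mark_scheduled(struct Qdisc *q)
{
	unsigned long *s = &this_cpu_ptr(q->cpu_state)->state;

	return !test_and_set_bit(__QDISC_STATE_SCHED, s);
}

/* consumer side: clear the local bit once the qdisc run has happened */
static void qdisc_cpu_clear_scheduled(struct Qdisc *q)
{
	unsigned long *s = &this_cpu_ptr(q->cpu_state)->state;

	clear_bit(__QDISC_STATE_SCHED, s);
}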


* Re: [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  2016-08-17 19:38 ` [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
@ 2016-08-17 19:49   ` John Fastabend
  2016-08-17 23:04   ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 19:49 UTC (permalink / raw)
  To: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet, brouer
  Cc: john.r.fastabend, netdev, davem

On 16-08-17 12:38 PM, John Fastabend wrote:
> The sch_mq qdisc creates a sub-qdisc per tx queue which are then
> called independently for enqueue and dequeue operations. However
> statistics are aggregated and pushed up to the "master" qdisc.
> 
> This patch adds support for any of the sub-qdiscs to be per cpu
> statistic qdiscs. To handle this case add a check when calculating
> stats and aggregate the per cpu stats if needed.
> 
> Also exports __gnet_stats_copy_queue() to use as a helper function.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---

[...]

> +		if (qdisc_is_percpu_stats(qdisc)) {
> +			cpu_bstats = qdisc->cpu_bstats;
> +			cpu_qstats = qdisc->cpu_qstats;
> +		}
> +
> +		qlen = qdisc_qlen_sum(qdisc);
> +
> +		__gnet_stats_copy_basic(NULL, &sch->bstats,
> +					cpu_bstats, &qdisc->bstats);
> +		__gnet_stats_copy_queue(&sch->qstats,
> +					cpu_qstats, &qdisc->qstats, qlen);


Also, I forgot to move this over to the atomic qlen, so it is still
summing the per cpu counters and reporting an incorrect qlen total.
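
Roughly the difference (the helper names below are illustrative, the
real helpers are added earlier in the series; qlen_atomic is the field
from the atomic qlen patch):

/* what the dump path effectively does today: sum the per cpu counters.
 * Fine for statistics, but each counter is only locally consistent.
 */
static unsigned int qdisc_qlen_percpu_sum(struct Qdisc *q)
{
	unsigned int qlen = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		qlen += per_cpu_ptr(q->cpu_qstats, cpu)->qlen;

	return qlen;
}

/* what it should read once the atomic accounting is used instead */
static unsigned int qdisc_qlen_atomic_read(struct Qdisc *q)
{
	return atomic_read(&q->qlen_atomic);
}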


* Re: [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking
  2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
@ 2016-08-17 22:33   ` Eric Dumazet
  2016-08-17 22:49     ` John Fastabend
  2016-08-17 22:34   ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 22:33 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:33 -0700, John Fastabend wrote:
> This patch adds a flag for queueing disciplines to indicate the stack
> does not need to use the qdisc lock to protect operations. This can
> be used to build lockless scheduling algorithms and improve
> performance.
> 
> The flag is checked in the tx path and the qdisc lock is only taken
> if it is not set. For now use a conditional if statement. Later we
> could be more aggressive if it proves worthwhile and use a static key
> or wrap this in a likely().
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  include/net/pkt_sched.h   |    4 +++-
>  include/net/sch_generic.h |    1 +
>  net/core/dev.c            |   32 ++++++++++++++++++++++++++++----
>  net/sched/sch_generic.c   |   26 ++++++++++++++++----------
>  4 files changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 7caa99b..69540c6 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -107,8 +107,10 @@ void __qdisc_run(struct Qdisc *q);
>  
>  static inline void qdisc_run(struct Qdisc *q)
>  {
> -	if (qdisc_run_begin(q))
> +	if (qdisc_run_begin(q)) {
>  		__qdisc_run(q);
> +		qdisc_run_end(q);
> +	}
>  }


Looks like you could have a separate patch, removing qdisc_run_end()
call done in __qdisc_run(q) ?

Then the 'allow qdiscs to handle locking'


* Re: [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking
  2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
  2016-08-17 22:33   ` Eric Dumazet
@ 2016-08-17 22:34   ` Eric Dumazet
  2016-08-17 22:48     ` John Fastabend
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 22:34 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:33 -0700, John Fastabend wrote:


> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4ce07dc..5db395d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3076,6 +3076,26 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	int rc;
>  
>  	qdisc_calculate_pkt_len(skb, q);
> +
> +	if (q->flags & TCQ_F_NOLOCK) {
> +		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
> +			__qdisc_drop(skb, &to_free);
> +			rc = NET_XMIT_DROP;
> +		} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q)) {

For a lockless qdisc, do you believe TCQ_F_CAN_BYPASS is still a gain ?

Also !qdisc_qlen(q) looks racy anyway ?

> +			qdisc_bstats_cpu_update(q, skb);
> +			if (sch_direct_xmit(skb, q, dev, txq, root_lock, true))
> +				__qdisc_run(q);
> +			rc = NET_XMIT_SUCCESS;
> +		} else {
> +			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> +			__qdisc_run(q);
> +		}
> +
> +		if (unlikely(to_free))
> +			kfree_skb_list(to_free);
> +		return rc;
> +	}
> +


* Re: [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking
  2016-08-17 22:34   ` Eric Dumazet
@ 2016-08-17 22:48     ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 22:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 03:34 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:33 -0700, John Fastabend wrote:
> 
> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 4ce07dc..5db395d 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3076,6 +3076,26 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>>  	int rc;
>>  
>>  	qdisc_calculate_pkt_len(skb, q);
>> +
>> +	if (q->flags & TCQ_F_NOLOCK) {
>> +		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
>> +			__qdisc_drop(skb, &to_free);
>> +			rc = NET_XMIT_DROP;
>> +		} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q)) {
> 
> For a lockless qdisc, do you believe TCQ_F_CAN_BYPASS is still a gain ?
> 

For the pktgen benchmarks it appears to be a win, or at worst moot, to
just drop TCQ_F_CAN_BYPASS (looking at one sample below, per pktgen
thread count):

threads   nolock & nobypass   locked (current master)
------------------------------------------------------
  1       1435796             1471479
  2       1880642             1746231
  4       1922935             1119626
  8       1585055             1001471
 12       1479273              989269

The only thing would be to test a bunch of netperf RR sessions to be
sure.

> Also !qdisc_qlen(q) looks racy anyway ?

Yep, it's racy unless you make it an atomic, and that hurts the
performance numbers. There is a patch further in the series that adds
the atomic variants (sketched at the end of this mail), but I tend to
think we can just drop the bypass logic in the lockless case, assuming
the netperf tests look good.

> 
>> +			qdisc_bstats_cpu_update(q, skb);
>> +			if (sch_direct_xmit(skb, q, dev, txq, root_lock, true))
>> +				__qdisc_run(q);
>> +			rc = NET_XMIT_SUCCESS;
>> +		} else {
>> +			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
>> +			__qdisc_run(q);
>> +		}
>> +
>> +		if (unlikely(to_free))
>> +			kfree_skb_list(to_free);
>> +		return rc;
>> +	}
>> +
> 
> 
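
The shape of the atomic accounting is roughly the following; the
function names are just for illustration, only the qlen_atomic field
itself comes from the series:

static inline void qdisc_qlen_atomic_inc(struct Qdisc *q)
{
	/* one cacheline shared by all producing CPUs -- this is
	 * where the pktgen numbers lose most of the lockless win
	 */
	atomic_inc(&q->qlen_atomic);
}

static inline void qdisc_qlen_atomic_dec(struct Qdisc *q)
{
	atomic_dec(&q->qlen_atomic);
}

static inline bool qdisc_may_bypass(struct Qdisc *q)
{
	/* a cross-CPU view of "is the qdisc empty", so the
	 * TCQ_F_CAN_BYPASS check is no longer racy
	 */
	return (q->flags & TCQ_F_CAN_BYPASS) &&
	       atomic_read(&q->qlen_atomic) == 0;
}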


* Re: [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking
  2016-08-17 22:33   ` Eric Dumazet
@ 2016-08-17 22:49     ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 22:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 03:33 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:33 -0700, John Fastabend wrote:
>> This patch adds a flag for queueing disciplines to indicate the stack
>> does not need to use the qdisc lock to protect operations. This can
>> be used to build lockless scheduling algorithms and improve
>> performance.
>>
>> The flag is checked in the tx path and the qdisc lock is only taken
>> if it is not set. For now use a conditional if statement. Later we
>> could be more aggressive if it proves worthwhile and use a static key
>> or wrap this in a likely().
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  include/net/pkt_sched.h   |    4 +++-
>>  include/net/sch_generic.h |    1 +
>>  net/core/dev.c            |   32 ++++++++++++++++++++++++++++----
>>  net/sched/sch_generic.c   |   26 ++++++++++++++++----------
>>  4 files changed, 48 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
>> index 7caa99b..69540c6 100644
>> --- a/include/net/pkt_sched.h
>> +++ b/include/net/pkt_sched.h
>> @@ -107,8 +107,10 @@ void __qdisc_run(struct Qdisc *q);
>>  
>>  static inline void qdisc_run(struct Qdisc *q)
>>  {
>> -	if (qdisc_run_begin(q))
>> +	if (qdisc_run_begin(q)) {
>>  		__qdisc_run(q);
>> +		qdisc_run_end(q);
>> +	}
>>  }
> 
> 
> Looks like you could have a separate patch, removing qdisc_run_end()
> call done in __qdisc_run(q) ?
> 
> Then the 'allow qdiscs to handle locking'
> 
> 

Agreed that would clean this up a bit. Will do for next rev.
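
Something like the split below, with __qdisc_run() abbreviated (not an
exact copy of sch_generic.c):

void __qdisc_run(struct Qdisc *q)
{
	int quota = weight_p;
	int packets;

	while (qdisc_restart(q, &packets)) {
		quota -= packets;
		if (quota <= 0 || need_resched()) {
			__netif_schedule(q);
			break;
		}
	}
	/* qdisc_run_end(q) dropped here ... */
}

static inline void qdisc_run(struct Qdisc *q)
{
	if (qdisc_run_begin(q)) {
		__qdisc_run(q);
		qdisc_run_end(q);	/* ... the caller ends the run instead */
	}
}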


* Re: [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc
  2016-08-17 19:36 ` [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc John Fastabend
@ 2016-08-17 22:53   ` Eric Dumazet
  2016-08-17 22:59     ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 22:53 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:36 -0700, John Fastabend wrote:
> The qdisc_reset operation depends on the qdisc lock at the moment
> to halt any additions to gso_skb and statistics while the list is
> free'd and the stats zeroed.

...

> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  net/sched/sch_generic.c |   45 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 3b9a21f..29238c4 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -737,6 +737,20 @@ void qdisc_reset(struct Qdisc *qdisc)
>  	kfree_skb(qdisc->skb_bad_txq);
>  	qdisc->skb_bad_txq = NULL;
>  
> +	if (qdisc->gso_cpu_skb) {
> +		int i;
> +
> +		for_each_possible_cpu(i) {
> +			struct gso_cell *cell;
> +
> +			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
> +			if (cell) {
> +				kfree_skb_list(cell->skb);
> +				cell = NULL;

	You probably wanted :
			cell->skb = NULL;


> +			}
> +		}
> +	}
> +
>  	if (qdisc->gso_skb) {
>  		kfree_skb_list(qdisc->gso_skb);
>  		qdisc->gso_skb = NULL;
> @@ -812,10 +826,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
>  	root_lock = qdisc_lock(oqdisc);
>  	spin_lock_bh(root_lock);
>  
> -	/* Prune old scheduler */
> -	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
> -		qdisc_reset(oqdisc);
> -
>  

This probably belongs to a separate patch, before any per cpu / lockless
qdisc changes ?


* Re: [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc
  2016-08-17 19:36 ` [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
@ 2016-08-17 22:58   ` Eric Dumazet
  2016-08-17 23:00     ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 22:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:36 -0700, John Fastabend wrote:
> Similar to how gso is handled skb_bad_tx needs to be per cpu to handle
> lockless qdisc with multiple writer/producers.
\
> @@ -1021,6 +1026,7 @@ err_out4:
>  	free_percpu(sch->cpu_bstats);
>  	free_percpu(sch->cpu_qstats);
>  	free_percpu(sch->gso_cpu_skb);
> +	free_percpu(sch->skb_bad_txq_cpu);


This might be the time to group all these per cpu allocations to a
single one, to help data locality and decrease overhead of having XX
pointers.


* Re: [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc
  2016-08-17 22:53   ` Eric Dumazet
@ 2016-08-17 22:59     ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 22:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 03:53 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:36 -0700, John Fastabend wrote:
>> The qdisc_reset operation depends on the qdisc lock at the moment
>> to halt any additions to gso_skb and statistics while the list is
>> free'd and the stats zeroed.
> 
> ...
> 
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  net/sched/sch_generic.c |   45 +++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 35 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 3b9a21f..29238c4 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -737,6 +737,20 @@ void qdisc_reset(struct Qdisc *qdisc)
>>  	kfree_skb(qdisc->skb_bad_txq);
>>  	qdisc->skb_bad_txq = NULL;
>>  
>> +	if (qdisc->gso_cpu_skb) {
>> +		int i;
>> +
>> +		for_each_possible_cpu(i) {
>> +			struct gso_cell *cell;
>> +
>> +			cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
>> +			if (cell) {
>> +				kfree_skb_list(cell->skb);
>> +				cell = NULL;
> 
> 	You probably wanted :
> 			cell->skb = NULL;
> 

Yep thanks!

> 
>> +			}
>> +		}
>> +	}
>> +
>>  	if (qdisc->gso_skb) {
>>  		kfree_skb_list(qdisc->gso_skb);
>>  		qdisc->gso_skb = NULL;
>> @@ -812,10 +826,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
>>  	root_lock = qdisc_lock(oqdisc);
>>  	spin_lock_bh(root_lock);
>>  
>> -	/* Prune old scheduler */
>> -	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
>> -		qdisc_reset(oqdisc);
>> -
>>  
> 
> This probably belongs to a separate patch, before any per cpu / lockless
> qdisc changes ?
> 
> 
> 

Agreed will do for next rev thanks for reviewing.


* Re: [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc
  2016-08-17 22:58   ` Eric Dumazet
@ 2016-08-17 23:00     ` John Fastabend
  2016-08-23 20:11       ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2016-08-17 23:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 03:58 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:36 -0700, John Fastabend wrote:
>> Similar to how gso is handled skb_bad_tx needs to be per cpu to handle
>> lockless qdisc with multiple writer/producers.
> \
>> @@ -1021,6 +1026,7 @@ err_out4:
>>  	free_percpu(sch->cpu_bstats);
>>  	free_percpu(sch->cpu_qstats);
>>  	free_percpu(sch->gso_cpu_skb);
>> +	free_percpu(sch->skb_bad_txq_cpu);
> 
> 
> This might be the time to group all these per cpu allocations to a
> single one, to help data locality and decrease overhead of having XX
> pointers.
> 
> 
> 

Sounds like a good idea to me. I'll go ahead and add a patch to the
front to consolidate the stats and then add these there.


* Re: [RFC PATCH 10/13] net: sched: lockless support for netif_schedule
  2016-08-17 19:37 ` [RFC PATCH 10/13] net: sched: lockless support for netif_schedule John Fastabend
  2016-08-17 19:46   ` John Fastabend
@ 2016-08-17 23:01   ` Eric Dumazet
  2016-08-17 23:17     ` John Fastabend
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 23:01 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:37 -0700, John Fastabend wrote:

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index d10b762..f5b7254 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -171,6 +171,7 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
>  			if (qdisc_is_percpu_stats(q)) {
>  				qdisc_qstats_cpu_backlog_inc(q, nskb);
>  				qdisc_qstats_cpu_qlen_inc(q);
> +				set_thread_flag(TIF_NEED_RESCHED);
>  			} else {
>  				qdisc_qstats_backlog_inc(q, nskb);
>  				q->q.qlen++;

Hmm... care to elaborate this bit ?


* Re: [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  2016-08-17 19:38 ` [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
  2016-08-17 19:49   ` John Fastabend
@ 2016-08-17 23:04   ` Eric Dumazet
  2016-08-17 23:18     ` John Fastabend
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 23:04 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 12:38 -0700, John Fastabend wrote:
> The sch_mq qdisc creates a sub-qdisc per tx queue which are then
> called independently for enqueue and dequeue operations. However
> statistics are aggregated and pushed up to the "master" qdisc.
> 
> This patch adds support for any of the sub-qdiscs to be per cpu
> statistic qdiscs. To handle this case add a check when calculating
> stats and aggregate the per cpu stats if needed.
> 
> Also exports __gnet_stats_copy_queue() to use as a helper function.


Looks like this patch should be happening earlier in the series ?


* Re: [RFC PATCH 10/13] net: sched: lockless support for netif_schedule
  2016-08-17 23:01   ` Eric Dumazet
@ 2016-08-17 23:17     ` John Fastabend
  2016-08-17 23:33       ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2016-08-17 23:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 04:01 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:37 -0700, John Fastabend wrote:
> 
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index d10b762..f5b7254 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -171,6 +171,7 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
>>  			if (qdisc_is_percpu_stats(q)) {
>>  				qdisc_qstats_cpu_backlog_inc(q, nskb);
>>  				qdisc_qstats_cpu_qlen_inc(q);
>> +				set_thread_flag(TIF_NEED_RESCHED);
>>  			} else {
>>  				qdisc_qstats_backlog_inc(q, nskb);
>>  				q->q.qlen++;
> 
> Hmm... care to elaborate this bit ?
> 
> 
> 

Ah, dang, that's leftover from trying to resolve an skb getting stuck
on the bad_txq_cell from qdisc_enqueue_skb_bad_txq(). You'll notice I
added a __netif_schedule() call in qdisc_enqueue_skb_bad_txq(), which
resolves this, so the set_thread_flag() here can just be removed.

.John


* Re: [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  2016-08-17 23:04   ` Eric Dumazet
@ 2016-08-17 23:18     ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-17 23:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 04:04 PM, Eric Dumazet wrote:
> On Wed, 2016-08-17 at 12:38 -0700, John Fastabend wrote:
>> The sch_mq qdisc creates a sub-qdisc per tx queue which are then
>> called independently for enqueue and dequeue operations. However
>> statistics are aggregated and pushed up to the "master" qdisc.
>>
>> This patch adds support for any of the sub-qdiscs to be per cpu
>> statistic qdiscs. To handle this case add a check when calculating
>> stats and aggregate the per cpu stats if needed.
>>
>> Also exports __gnet_stats_copy_queue() to use as a helper function.
> 
> 
> Looks like this patch should be happening earlier in the series ?
> 
> 

Hmm, yep, patches 12 and 13 should come before 11 to avoid introducing
a bug and then fixing it later in the series.


* Re: [RFC PATCH 10/13] net: sched: lockless support for netif_schedule
  2016-08-17 23:17     ` John Fastabend
@ 2016-08-17 23:33       ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2016-08-17 23:33 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On Wed, 2016-08-17 at 16:17 -0700, John Fastabend wrote:
> On 16-08-17 04:01 PM, Eric Dumazet wrote:
> > On Wed, 2016-08-17 at 12:37 -0700, John Fastabend wrote:
> > 
> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> >> index d10b762..f5b7254 100644
> >> --- a/net/sched/sch_generic.c
> >> +++ b/net/sched/sch_generic.c
> >> @@ -171,6 +171,7 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
> >>  			if (qdisc_is_percpu_stats(q)) {
> >>  				qdisc_qstats_cpu_backlog_inc(q, nskb);
> >>  				qdisc_qstats_cpu_qlen_inc(q);
> >> +				set_thread_flag(TIF_NEED_RESCHED);
> >>  			} else {
> >>  				qdisc_qstats_backlog_inc(q, nskb);
> >>  				q->q.qlen++;
> > 
> > Hmm... care to elaborate this bit ?
> > 
> > 
> > 
> 
> Ah, dang, that's leftover from trying to resolve an skb getting stuck
> on the bad_txq_cell from qdisc_enqueue_skb_bad_txq(). You'll notice I
> added a __netif_schedule() call in qdisc_enqueue_skb_bad_txq(), which
> resolves this, so the set_thread_flag() here can just be removed.

OK I feel much better now ;)

Thanks !


* Re: [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue
  2016-08-17 19:38 ` [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue John Fastabend
@ 2016-08-19 10:13   ` Jesper Dangaard Brouer
  2016-08-19 15:44     ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2016-08-19 10:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet,
	john.r.fastabend, netdev, davem, brouer, Michael S. Tsirkin

On Wed, 17 Aug 2016 12:38:10 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> This converts the pfifo_fast qdisc to use the alf_queue enqueue and
                                                ^^^^^^^^^
> dequeue routines then sets the NOLOCK bit.
> 
> This also removes the logic used to pick the next band to dequeue from
> and instead just checks each alf_queue for packets from top priority
                               ^^^^^^^^^
> to lowest. This might need to be a bit more clever but seems to work
> for now.

You need to fix the description, as you are no longer using my
alf_queue implementation but instead are using the skb_array/ptr_ring
queue (by MST).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue
  2016-08-19 10:13   ` Jesper Dangaard Brouer
@ 2016-08-19 15:44     ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-19 15:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, eric.dumazet,
	john.r.fastabend, netdev, davem, Michael S. Tsirkin

On 16-08-19 03:13 AM, Jesper Dangaard Brouer wrote:
> On Wed, 17 Aug 2016 12:38:10 -0700
> John Fastabend <john.fastabend@gmail.com> wrote:
> 
>> This converts the pfifo_fast qdisc to use the alf_queue enqueue and
>                                                 ^^^^^^^^^
>> dequeue routines then sets the NOLOCK bit.
>>
>> This also removes the logic used to pick the next band to dequeue from
>> and instead just checks each alf_queue for packets from top priority
>                                ^^^^^^^^^
>> to lowest. This might need to be a bit more clever but seems to work
>> for now.
> 
> You need to fix the description, as you are no longer using my
> alf_queue implementation but instead are using the skb_array/ptr_ring
> queue (by MST).
> 

Yep, I forgot to change this even though, IIRC, you may have had the
same comment on the last rev. Thanks! I'll get it fixed up this time.

.John


* Re: [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc
  2016-08-17 23:00     ` John Fastabend
@ 2016-08-23 20:11       ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2016-08-23 20:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: xiyou.wangcong, jhs, alexei.starovoitov, brouer,
	john.r.fastabend, netdev, davem

On 16-08-17 04:00 PM, John Fastabend wrote:
> On 16-08-17 03:58 PM, Eric Dumazet wrote:
>> On Wed, 2016-08-17 at 12:36 -0700, John Fastabend wrote:
>>> Similar to how gso is handled skb_bad_tx needs to be per cpu to handle
>>> lockless qdisc with multiple writer/producers.
>> \
>>> @@ -1021,6 +1026,7 @@ err_out4:
>>>  	free_percpu(sch->cpu_bstats);
>>>  	free_percpu(sch->cpu_qstats);
>>>  	free_percpu(sch->gso_cpu_skb);
>>> +	free_percpu(sch->skb_bad_txq_cpu);
>>
>>
>> This might be the time to group all these per cpu allocations to a
>> single one, to help data locality and decrease overhead of having XX
>> pointers.
>>
>>
>>
> 
> Sounds like a good idea to me. I'll go ahead and add a patch to the
> front to consolidate the stats and then add these there.
> 

Actually, this turned out to be not so trivial. Doing it reasonably
requires changes to how the gnet stats helpers work. I'm going to
propose pushing this into a follow-up series after the initial lockless
set; otherwise the patch set is going to grow past 20+ patches.

Also, a follow-on series to make all the qdiscs support per cpu stats
would be nice and would let us remove a lot of the annoying if/else
cases around stats. It's a bit tedious to go and change all the qdiscs,
but it's mostly mechanical.
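
For the record, the consolidation itself would look roughly like the
sketch below (struct and field names are illustrative only); the
non-trivial part is teaching the gnet stats helpers to find their
counters inside the combined block.

/* one per cpu block instead of several separate per cpu allocations */
struct qdisc_cpu_data {
	struct gnet_stats_basic_cpu	bstats;
	struct gnet_stats_queue		qstats;
	struct sk_buff			*gso_skb;
	struct sk_buff			*skb_bad_txq;
};

	/* in qdisc_alloc(), one allocation ... */
	sch->cpu_data = alloc_percpu(struct qdisc_cpu_data);
	if (!sch->cpu_data)
		goto errout;

	/* ... and a single free_percpu(sch->cpu_data) on the error and
	 * destroy paths replaces the four separate free_percpu() calls.
	 */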

.John



Thread overview: 32+ messages
2016-08-17 19:33 [RFC PATCH 00/13] Series short description John Fastabend
2016-08-17 19:33 ` [RFC PATCH 01/13] net: sched: allow qdiscs to handle locking John Fastabend
2016-08-17 22:33   ` Eric Dumazet
2016-08-17 22:49     ` John Fastabend
2016-08-17 22:34   ` Eric Dumazet
2016-08-17 22:48     ` John Fastabend
2016-08-17 19:34 ` [RFC PATCH 02/13] net: sched: qdisc_qlen for per cpu logic John Fastabend
2016-08-17 19:34 ` [RFC PATCH 03/13] net: sched: provide per cpu qstat helpers John Fastabend
2016-08-17 19:35 ` [RFC PATCH 04/13] net: sched: provide atomic qlen helpers for bypass case John Fastabend
2016-08-17 19:35 ` [RFC PATCH 05/13] net: sched: a dflt qdisc may be used with per cpu stats John Fastabend
2016-08-17 19:35 ` [RFC PATCH 06/13] net: sched: per cpu gso handlers John Fastabend
2016-08-17 19:36 ` [RFC PATCH 07/13] net: sched: support qdisc_reset on NOLOCK qdisc John Fastabend
2016-08-17 22:53   ` Eric Dumazet
2016-08-17 22:59     ` John Fastabend
2016-08-17 19:36 ` [RFC PATCH 08/13] net: sched: support skb_bad_tx with lockless qdisc John Fastabend
2016-08-17 22:58   ` Eric Dumazet
2016-08-17 23:00     ` John Fastabend
2016-08-23 20:11       ` John Fastabend
2016-08-17 19:37 ` [RFC PATCH 09/13] net: sched: helper to sum qlen John Fastabend
2016-08-17 19:37 ` [RFC PATCH 10/13] net: sched: lockless support for netif_schedule John Fastabend
2016-08-17 19:46   ` John Fastabend
2016-08-17 23:01   ` Eric Dumazet
2016-08-17 23:17     ` John Fastabend
2016-08-17 23:33       ` Eric Dumazet
2016-08-17 19:38 ` [RFC PATCH 11/13] net: sched: pfifo_fast use alf_queue John Fastabend
2016-08-19 10:13   ` Jesper Dangaard Brouer
2016-08-19 15:44     ` John Fastabend
2016-08-17 19:38 ` [RFC PATCH 12/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq John Fastabend
2016-08-17 19:49   ` John Fastabend
2016-08-17 23:04   ` Eric Dumazet
2016-08-17 23:18     ` John Fastabend
2016-08-17 19:39 ` [RFC PATCH 13/13] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio John Fastabend
