All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] tc cleanup?
@ 2015-04-21 19:27 Alexei Starovoitov
  2015-04-21 19:27 ` [RFC 1/3] tc: fix return values of ingress qdisc Alexei Starovoitov
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-21 19:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

Hi,

I've started cleaning TC a bit.
Before I go too far, need your feedback on this RFC:

patch 1 - stop abuse of return values in ingress qdisc
patch 2 - deprecate TC_ACT_QUEUED
patch 3 - reduce copy-paste around tc_classify()

Lightly tested so far. Waiting for John's and other tc test scripts.

Alexei Starovoitov (3):
  tc: fix return values of ingress qdisc
  tc: deprecate TC_ACT_QUEUED
  tc: cleanup tc_classify

 include/net/pkt_sched.h      |    2 ++
 include/net/sch_generic.h    |    7 +++++++
 include/uapi/linux/pkt_cls.h |    2 +-
 net/core/dev.c               |    8 ++------
 net/sched/sch_api.c          |   22 ++++++++++++++++++++++
 net/sched/sch_atm.c          |    1 -
 net/sched/sch_cbq.c          |    1 -
 net/sched/sch_choke.c        |   17 +++--------------
 net/sched/sch_drr.c          |   18 +++---------------
 net/sched/sch_dsmark.c       |    1 -
 net/sched/sch_fq_codel.c     |   25 ++++++-------------------
 net/sched/sch_hfsc.c         |    1 -
 net/sched/sch_htb.c          |    1 -
 net/sched/sch_ingress.c      |   10 ++++------
 net/sched/sch_multiq.c       |    1 -
 net/sched/sch_prio.c         |    1 -
 net/sched/sch_qfq.c          |   16 ++--------------
 net/sched/sch_sfb.c          |   17 +++--------------
 net/sched/sch_sfq.c          |   26 ++++++--------------------
 19 files changed, 61 insertions(+), 116 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC 1/3] tc: fix return values of ingress qdisc
  2015-04-21 19:27 [RFC 0/3] tc cleanup? Alexei Starovoitov
@ 2015-04-21 19:27 ` Alexei Starovoitov
  2015-04-22  4:59   ` Cong Wang
  2015-04-21 19:27 ` [RFC 2/3] tc: deprecate TC_ACT_QUEUED Alexei Starovoitov
  2015-04-21 19:27 ` [RFC 3/3] tc: cleanup tc_classify Alexei Starovoitov
  2 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-21 19:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

ingress qdisc should return NET_XMIT_* values just like all other qdiscs.

Since it's invoked via qdisc_enqueue_root() (which suppose to return
only NET_XMIT_* values as well), it was working by accident,
since TC_ACT_* values fit within NET_XMIT_MASK.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/core/dev.c          |    8 ++------
 net/sched/sch_ingress.c |    9 ++++-----
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1796cef55ab5..ac6233f6f353 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3533,7 +3533,7 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
 {
 	struct net_device *dev = skb->dev;
 	u32 ttl = G_TC_RTTL(skb->tc_verd);
-	int result = TC_ACT_OK;
+	int result = NET_XMIT_SUCCESS;
 	struct Qdisc *q;
 
 	if (unlikely(MAX_RED_LOOP < ttl++)) {
@@ -3570,12 +3570,8 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 		*pt_prev = NULL;
 	}
 
-	switch (ing_filter(skb, rxq)) {
-	case TC_ACT_SHOT:
-	case TC_ACT_STOLEN:
-		kfree_skb(skb);
+	if (ing_filter(skb, rxq) == NET_XMIT_DROP)
 		return NULL;
-	}
 
 	return skb;
 }
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 4cdbfb85686a..e68f4a5dbeba 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -65,21 +65,20 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	result = tc_classify(skb, fl, &res);
 
-	qdisc_bstats_update(sch, skb);
 	switch (result) {
 	case TC_ACT_SHOT:
-		result = TC_ACT_SHOT;
 		qdisc_qstats_drop(sch);
-		break;
 	case TC_ACT_STOLEN:
 	case TC_ACT_QUEUED:
-		result = TC_ACT_STOLEN;
+		result = NET_XMIT_DROP;
+		kfree_skb(skb);
 		break;
 	case TC_ACT_RECLASSIFY:
 	case TC_ACT_OK:
 		skb->tc_index = TC_H_MIN(res.classid);
 	default:
-		result = TC_ACT_OK;
+		qdisc_bstats_update(sch, skb);
+		result = NET_XMIT_SUCCESS;
 		break;
 	}
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-21 19:27 [RFC 0/3] tc cleanup? Alexei Starovoitov
  2015-04-21 19:27 ` [RFC 1/3] tc: fix return values of ingress qdisc Alexei Starovoitov
@ 2015-04-21 19:27 ` Alexei Starovoitov
  2015-04-22  5:02   ` Cong Wang
  2015-04-21 19:27 ` [RFC 3/3] tc: cleanup tc_classify Alexei Starovoitov
  2 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-21 19:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
Get rid of redundant checks in all qdiscs.
Instead do it once.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/uapi/linux/pkt_cls.h |    2 +-
 net/sched/sch_api.c          |    2 ++
 net/sched/sch_atm.c          |    1 -
 net/sched/sch_cbq.c          |    1 -
 net/sched/sch_choke.c        |    1 -
 net/sched/sch_drr.c          |    1 -
 net/sched/sch_dsmark.c       |    1 -
 net/sched/sch_fq_codel.c     |    1 -
 net/sched/sch_hfsc.c         |    1 -
 net/sched/sch_htb.c          |    1 -
 net/sched/sch_ingress.c      |    1 -
 net/sched/sch_multiq.c       |    1 -
 net/sched/sch_prio.c         |    1 -
 net/sched/sch_qfq.c          |    1 -
 net/sched/sch_sfb.c          |    1 -
 net/sched/sch_sfq.c          |    1 -
 16 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index bf08e76bf505..208e5ed5256c 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -102,7 +102,7 @@ enum {
 #define TC_ACT_SHOT		2
 #define TC_ACT_PIPE		3
 #define TC_ACT_STOLEN		4
-#define TC_ACT_QUEUED		5
+#define TC_ACT_QUEUED		5 /* deprecated. same as TC_ACT_STOLEN */
 #define TC_ACT_REPEAT		6
 #define TC_ACT_JUMP		0x10000000
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ad9eed70bc8f..f7950327bb22 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1820,6 +1820,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 #ifdef CONFIG_NET_CLS_ACT
 			if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
 				skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
+			if (err == TC_ACT_QUEUED)
+				return TC_ACT_STOLEN;
 #endif
 			return err;
 		}
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index e3e2cc5fd068..7ef0bf4bdce6 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -396,7 +396,6 @@ done:
 		/*@@@ looks good ... but it's not supposed to work :-) */
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			kfree_skb(skb);
 			return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beeb75f80fdb..eca4725e3273 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -258,7 +258,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			goto fallback;
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index c009eb9045ce..a3bc7cf151d3 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -212,7 +212,6 @@ static bool choke_classify(struct sk_buff *skb,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return false;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 338706092c27..1051c5d4e85b 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -335,7 +335,6 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 66700a6116aa..ce9d4123cbbe 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -236,7 +236,6 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 		switch (result) {
 #ifdef CONFIG_NET_CLS_ACT
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			kfree_skb(skb);
 			return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 1e52decb7b59..4d00ece3243d 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -104,7 +104,6 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return 0;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e6c7416d0332..3438f9b4fd8b 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1168,7 +1168,6 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index f1acb0f60dc3..32c1f081bed9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -232,7 +232,6 @@ static struct htb_class *htb_classify(struct sk_buff *skb, struct Qdisc *sch,
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index e68f4a5dbeba..63cbd1e066bd 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -69,7 +69,6 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	case TC_ACT_SHOT:
 		qdisc_qstats_drop(sch);
 	case TC_ACT_STOLEN:
-	case TC_ACT_QUEUED:
 		result = NET_XMIT_DROP;
 		kfree_skb(skb);
 		break;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 42dd218871e0..dc2a708a772c 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -50,7 +50,6 @@ multiq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #ifdef CONFIG_NET_CLS_ACT
 	switch (err) {
 	case TC_ACT_STOLEN:
-	case TC_ACT_QUEUED:
 		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 	case TC_ACT_SHOT:
 		return NULL;
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 8e5cd34aaa74..de4825a9d0aa 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -46,7 +46,6 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #ifdef CONFIG_NET_CLS_ACT
 		switch (err) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return NULL;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 3ec7e88a43ca..9d1e43ea0ccb 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -723,7 +723,6 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (result >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
-		case TC_ACT_QUEUED:
 		case TC_ACT_STOLEN:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 5819dd82630d..61e18108d0d3 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -264,7 +264,6 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return false;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index b877140beda5..b508ff655da3 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -207,7 +207,6 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 #ifdef CONFIG_NET_CLS_ACT
 		switch (result) {
 		case TC_ACT_STOLEN:
-		case TC_ACT_QUEUED:
 			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
 		case TC_ACT_SHOT:
 			return 0;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 3/3] tc: cleanup tc_classify
  2015-04-21 19:27 [RFC 0/3] tc cleanup? Alexei Starovoitov
  2015-04-21 19:27 ` [RFC 1/3] tc: fix return values of ingress qdisc Alexei Starovoitov
  2015-04-21 19:27 ` [RFC 2/3] tc: deprecate TC_ACT_QUEUED Alexei Starovoitov
@ 2015-04-21 19:27 ` Alexei Starovoitov
  2015-04-22  5:05   ` Cong Wang
  2 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-21 19:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
copy-paste among different qdiscs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 include/net/pkt_sched.h   |    2 ++
 include/net/sch_generic.h |    7 +++++++
 net/sched/sch_api.c       |   20 ++++++++++++++++++++
 net/sched/sch_choke.c     |   16 +++-------------
 net/sched/sch_drr.c       |   17 +++--------------
 net/sched/sch_fq_codel.c  |   24 ++++++------------------
 net/sched/sch_qfq.c       |   15 ++-------------
 net/sched/sch_sfb.c       |   16 +++-------------
 net/sched/sch_sfq.c       |   25 ++++++-------------------
 9 files changed, 52 insertions(+), 90 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2342bf12cb78..7c73cbe95169 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -114,6 +114,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 		       struct tcf_result *res);
 int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		struct tcf_result *res);
+int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp,
+		    struct tcf_result *res, int *qerr);
 
 static inline __be16 tc_skb_protocol(const struct sk_buff *skb)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6d778efcfdfd..9a50bad24b1d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -715,6 +715,13 @@ static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_DROP;
 }
 
+static inline void qdisc_drop_bypass(struct sk_buff *skb, struct Qdisc *sch, int err)
+{
+	if (err & __NET_XMIT_BYPASS)
+		qdisc_qstats_drop(sch);
+	kfree_skb(skb);
+}
+
 static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch)
 {
 	qdisc_qstats_drop(sch);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f7950327bb22..c7c4a672eb35 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1860,6 +1860,26 @@ reclassify:
 }
 EXPORT_SYMBOL(tc_classify);
 
+int tc_classify_act(struct sk_buff *skb, const struct tcf_proto *tp,
+		    struct tcf_result *res, int *qerr)
+{
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	result = tc_classify(skb, tp, res);
+
+#ifdef CONFIG_NET_CLS_ACT
+	switch (result) {
+	case TC_ACT_STOLEN:
+		*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+	case TC_ACT_SHOT:
+		return -1;
+	}
+#endif
+	return result;
+}
+EXPORT_SYMBOL(tc_classify_act);
+
 bool tcf_destroy(struct tcf_proto *tp, bool force)
 {
 	if (tp->ops->destroy(tp, force)) {
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index a3bc7cf151d3..8d8ad5303497 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -207,16 +207,8 @@ static bool choke_classify(struct sk_buff *skb,
 	int result;
 
 	fl = rcu_dereference_bh(q->filter_list);
-	result = tc_classify(skb, fl, &res);
+	result = tc_classify_act(skb, fl, &res, qerr);
 	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return false;
-		}
-#endif
 		choke_set_classid(skb, TC_H_MIN(res.classid));
 		return true;
 	}
@@ -268,9 +260,9 @@ static bool choke_match_random(const struct choke_sched_data *q,
 
 static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
-	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	struct choke_sched_data *q = qdisc_priv(sch);
 	const struct red_parms *p = &q->parms;
+	int ret;
 
 	if (rcu_access_pointer(q->filter_list)) {
 		/* If using external classifiers, get result and record it. */
@@ -343,9 +335,7 @@ congestion_drop:
 	return NET_XMIT_CN;
 
 other_drop:
-	if (ret & __NET_XMIT_BYPASS)
-		qdisc_qstats_drop(sch);
-	kfree_skb(skb);
+	qdisc_drop_bypass(skb, sch, ret);
 	return ret;
 }
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 1051c5d4e85b..36ab69375c79 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -329,18 +329,9 @@ static struct drr_class *drr_classify(struct sk_buff *skb, struct Qdisc *sch,
 			return cl;
 	}
 
-	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
-	result = tc_classify(skb, fl, &res);
+	result = tc_classify_act(skb, fl, &res, qerr);
 	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return NULL;
-		}
-#endif
 		cl = (struct drr_class *)res.class;
 		if (cl == NULL)
 			cl = drr_find_class(sch, res.classid);
@@ -353,13 +344,11 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct drr_sched *q = qdisc_priv(sch);
 	struct drr_class *cl;
-	int err = 0;
+	int err;
 
 	cl = drr_classify(skb, sch, &err);
 	if (cl == NULL) {
-		if (err & __NET_XMIT_BYPASS)
-			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		qdisc_drop_bypass(skb, sch, err);
 		return err;
 	}
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 4d00ece3243d..84094637807f 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -98,20 +98,10 @@ static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
 	if (!filter)
 		return fq_codel_hash(q, skb) + 1;
 
-	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, filter, &res);
-	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return 0;
-		}
-#endif
-		if (TC_H_MIN(res.classid) <= q->flows_cnt)
-			return TC_H_MIN(res.classid);
-	}
+	result = tc_classify_act(skb, filter, &res, qerr);
+	if (result >= 0 && TC_H_MIN(res.classid) <= q->flows_cnt)
+		return TC_H_MIN(res.classid);
+
 	return 0;
 }
 
@@ -174,13 +164,11 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	unsigned int idx;
 	struct fq_codel_flow *flow;
-	int uninitialized_var(ret);
+	int ret;
 
 	idx = fq_codel_classify(skb, sch, &ret);
 	if (idx == 0) {
-		if (ret & __NET_XMIT_BYPASS)
-			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		qdisc_drop_bypass(skb, sch, ret);
 		return ret;
 	}
 	idx--;
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 9d1e43ea0ccb..7bed6b5bc500 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -717,18 +717,9 @@ static struct qfq_class *qfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 			return cl;
 	}
 
-	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
 	fl = rcu_dereference_bh(q->filter_list);
-	result = tc_classify(skb, fl, &res);
+	result = tc_classify_act(skb, fl, &res, qerr);
 	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return NULL;
-		}
-#endif
 		cl = (struct qfq_class *)res.class;
 		if (cl == NULL)
 			cl = qfq_find_class(sch, res.classid);
@@ -1227,9 +1218,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	cl = qfq_classify(skb, sch, &err);
 	if (cl == NULL) {
-		if (err & __NET_XMIT_BYPASS)
-			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		qdisc_drop_bypass(skb, sch, err);
 		return err;
 	}
 	pr_debug("qfq_enqueue: cl = %x\n", cl->common.classid);
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 61e18108d0d3..c36c596ba56b 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -259,16 +259,8 @@ static bool sfb_classify(struct sk_buff *skb, struct tcf_proto *fl,
 	struct tcf_result res;
 	int result;
 
-	result = tc_classify(skb, fl, &res);
+	result = tc_classify_act(skb, fl, &res, qerr);
 	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return false;
-		}
-#endif
 		*salt = TC_H_MIN(res.classid);
 		return true;
 	}
@@ -285,7 +277,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	u32 p_min = ~0;
 	u32 minqlen = ~0;
 	u32 r, slot, salt, sfbhash;
-	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	int ret;
 	struct flow_keys keys;
 
 	if (unlikely(sch->q.qlen >= q->limit)) {
@@ -418,9 +410,7 @@ drop:
 	qdisc_drop(skb, sch);
 	return NET_XMIT_CN;
 other_drop:
-	if (ret & __NET_XMIT_BYPASS)
-		qdisc_qstats_drop(sch);
-	kfree_skb(skb);
+	qdisc_drop_bypass(skb, sch, ret);
 	return ret;
 }
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index b508ff655da3..7d9ad84bc8f9 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -201,20 +201,10 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 		return sfq_hash(q, skb) + 1;
 	}
 
-	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
-	result = tc_classify(skb, fl, &res);
-	if (result >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
-		switch (result) {
-		case TC_ACT_STOLEN:
-			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
-		case TC_ACT_SHOT:
-			return 0;
-		}
-#endif
-		if (TC_H_MIN(res.classid) <= q->divisor)
-			return TC_H_MIN(res.classid);
-	}
+	result = tc_classify_act(skb, fl, &res, qerr);
+	if (result >= 0 && TC_H_MIN(res.classid) <= q->divisor)
+		return TC_H_MIN(res.classid);
+
 	return 0;
 }
 
@@ -371,15 +361,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	unsigned int hash;
 	sfq_index x, qlen;
 	struct sfq_slot *slot;
-	int uninitialized_var(ret);
 	struct sk_buff *head;
-	int delta;
+	int delta, ret;
 
 	hash = sfq_classify(skb, sch, &ret);
 	if (hash == 0) {
-		if (ret & __NET_XMIT_BYPASS)
-			qdisc_qstats_drop(sch);
-		kfree_skb(skb);
+		qdisc_drop_bypass(skb, sch, ret);
 		return ret;
 	}
 	hash--;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC 1/3] tc: fix return values of ingress qdisc
  2015-04-21 19:27 ` [RFC 1/3] tc: fix return values of ingress qdisc Alexei Starovoitov
@ 2015-04-22  4:59   ` Cong Wang
  2015-04-22 22:04     ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-22  4:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
>

XMIT already means egress...

> Since it's invoked via qdisc_enqueue_root() (which suppose to return
> only NET_XMIT_* values as well), it was working by accident,
> since TC_ACT_* values fit within NET_XMIT_MASK.
>

Why not just add a BUILD_BUG_ON() to capture this?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-21 19:27 ` [RFC 2/3] tc: deprecate TC_ACT_QUEUED Alexei Starovoitov
@ 2015-04-22  5:02   ` Cong Wang
  2015-04-22  7:43     ` Daniel Borkmann
  2015-04-22 22:22     ` Alexei Starovoitov
  0 siblings, 2 replies; 30+ messages in thread
From: Cong Wang @ 2015-04-22  5:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
> Get rid of redundant checks in all qdiscs.
> Instead do it once.

The current code can be easily extended, while your code not.
I don't see the need of this change.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/3] tc: cleanup tc_classify
  2015-04-21 19:27 ` [RFC 3/3] tc: cleanup tc_classify Alexei Starovoitov
@ 2015-04-22  5:05   ` Cong Wang
  2015-04-22 22:27     ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-22  5:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
> copy-paste among different qdiscs


I don't think qdisc_drop_bypass() is more readable than without it,
maybe you need a better name, or just leave the code as it is.

tc_classify_act() seems ok.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-22  5:02   ` Cong Wang
@ 2015-04-22  7:43     ` Daniel Borkmann
  2015-04-22 22:22     ` Alexei Starovoitov
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Borkmann @ 2015-04-22  7:43 UTC (permalink / raw)
  To: Cong Wang, Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 04/22/2015 07:02 AM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>> Get rid of redundant checks in all qdiscs.
>> Instead do it once.
>
> The current code can be easily extended, while your code not.

So, do you plan to extend it ?

The alias has been there since pre-git times, so more than
10 years now, where no-one felt the need to extended the
semantics of it ...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/3] tc: fix return values of ingress qdisc
  2015-04-22  4:59   ` Cong Wang
@ 2015-04-22 22:04     ` Alexei Starovoitov
  2015-04-22 23:29       ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-22 22:04 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 4/21/15 9:59 PM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
>>
>
> XMIT already means egress...

may be then it should be renamed as well.
from include/linux/netdevice.h:
/* qdisc ->enqueue() return codes. */
#define NET_XMIT_SUCCESS        0x00
...

the point is that qdisc->enqeue() must return NET_XMIT_* values.
ingress qdisc is violating this and therefore should be fixed.

>> Since it's invoked via qdisc_enqueue_root() (which suppose to return
>> only NET_XMIT_* values as well), it was working by accident,
>> since TC_ACT_* values fit within NET_XMIT_MASK.
>>
>
> Why not just add a BUILD_BUG_ON() to capture this?

ingress qdisc returning TC_ACT_* values is an obvious layering
violation. I'm puzzled why it's been this way for so long.
Adding BUILD_BUG_ON is not an option.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-22  5:02   ` Cong Wang
  2015-04-22  7:43     ` Daniel Borkmann
@ 2015-04-22 22:22     ` Alexei Starovoitov
  2015-04-22 23:39       ` Cong Wang
  2015-04-23 20:45       ` Jamal Hadi Salim
  1 sibling, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-22 22:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 4/21/15 10:02 PM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>> Get rid of redundant checks in all qdiscs.
>> Instead do it once.
>
> The current code can be easily extended, while your code not.
> I don't see the need of this change.

well, iproute2 doesn't use TC_ACT_QUEUED action at all and
TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
If you're saying that some future actions together with
some future qdiscs may take advantage of that, then why they didn't
use it over the last 10 years?
Having both that do the same thing is only confusing.
I think having one value to indicate 'stolen' condition makes TC
code easier to understand.
Jamal, what's your take on this?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/3] tc: cleanup tc_classify
  2015-04-22  5:05   ` Cong Wang
@ 2015-04-22 22:27     ` Alexei Starovoitov
  2015-04-22 23:38       ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-22 22:27 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 4/21/15 10:05 PM, Cong Wang wrote:
> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce
>> copy-paste among different qdiscs
>
>
> I don't think qdisc_drop_bypass() is more readable than without it,
> maybe you need a better name, or just leave the code as it is.

what would be a better name? I'm open to suggestions.

We already have qdisc_drop() that does:
         kfree_skb(skb);
         qdisc_qstats_drop(sch);

my proposed qdisc_drop_bypass() does stats math conditionally:
         if (err & __NET_XMIT_BYPASS)
                 qdisc_qstats_drop(sch);
         kfree_skb(skb);

So together I think they fit nicely. With this helper the sch_choke.c
looks like:
congestion_drop:
         qdisc_drop(skb, sch);
         return NET_XMIT_CN;
other_drop:
         qdisc_drop_bypass(skb, sch, ret);
         return ret;

and in the next set of cleanups I'm planning to combine these two
helpers into one, but I need to cleanup __NET_XMIT_BYPASS flag first.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/3] tc: fix return values of ingress qdisc
  2015-04-22 22:04     ` Alexei Starovoitov
@ 2015-04-22 23:29       ` Cong Wang
  2015-04-23  8:46         ` Thomas Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-22 23:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Wed, Apr 22, 2015 at 3:04 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/21/15 9:59 PM, Cong Wang wrote:
>>
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
>>>
>>
>> XMIT already means egress...
>
>
> may be then it should be renamed as well.
> from include/linux/netdevice.h:
> /* qdisc ->enqueue() return codes. */
> #define NET_XMIT_SUCCESS        0x00
> ...
>
> the point is that qdisc->enqeue() must return NET_XMIT_* values.
> ingress qdisc is violating this and therefore should be fixed.

XMIT is non-sense for ingress, you really need to pick another
name for it if TC_ACT_OK isn't okay for you (it is okay for me).


>
>>> Since it's invoked via qdisc_enqueue_root() (which suppose to return
>>> only NET_XMIT_* values as well), it was working by accident,
>>> since TC_ACT_* values fit within NET_XMIT_MASK.
>>>
>>
>> Why not just add a BUILD_BUG_ON() to capture this?
>
>
> ingress qdisc returning TC_ACT_* values is an obvious layering
> violation. I'm puzzled why it's been this way for so long.

Why? Everyone knows ingress is the only qdisc on ingress
and it has no queue and can only accept filters (actions are on filters).

> Adding BUILD_BUG_ON is not an option.
>

It at least tells people we know their value are same, IOW, not a bug.

I don't see the need for the change except a BUILD_BUG_ON().

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/3] tc: cleanup tc_classify
  2015-04-22 22:27     ` Alexei Starovoitov
@ 2015-04-22 23:38       ` Cong Wang
  2015-04-23  8:49         ` Thomas Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-22 23:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Wed, Apr 22, 2015 at 3:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/21/15 10:05 PM, Cong Wang wrote:
>>
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to
>>> reduce
>>> copy-paste among different qdiscs
>>
>>
>>
>> I don't think qdisc_drop_bypass() is more readable than without it,
>> maybe you need a better name, or just leave the code as it is.
>
>
> what would be a better name? I'm open to suggestions.

My reading for "qdisc_drop_bypass()" is it bypasses packet
dropping for some case, apparently doesn't match its definition.

I can't think out a better name therefore I don't think it deserves
a function, just leave as it is.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-22 22:22     ` Alexei Starovoitov
@ 2015-04-22 23:39       ` Cong Wang
  2015-04-23  2:46         ` Alexei Starovoitov
  2015-04-23 20:45       ` Jamal Hadi Salim
  1 sibling, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-22 23:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/21/15 10:02 PM, Cong Wang wrote:
>>
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>>>
>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>> Get rid of redundant checks in all qdiscs.
>>> Instead do it once.
>>
>>
>> The current code can be easily extended, while your code not.
>> I don't see the need of this change.
>
>
> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
> If you're saying that some future actions together with
> some future qdiscs may take advantage of that, then why they didn't
> use it over the last 10 years?
> Having both that do the same thing is only confusing.
> I think having one value to indicate 'stolen' condition makes TC
> code easier to understand.

Then remove it, I am all for this. ;)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-22 23:39       ` Cong Wang
@ 2015-04-23  2:46         ` Alexei Starovoitov
  2015-04-23  7:13           ` Daniel Borkmann
  2015-04-23 18:12           ` Cong Wang
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-23  2:46 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 4/22/15 4:39 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On 4/21/15 10:02 PM, Cong Wang wrote:
>>>
>>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
>>> wrote:
>>>>
>>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>>> Get rid of redundant checks in all qdiscs.
>>>> Instead do it once.
>>>
>>>
>>> The current code can be easily extended, while your code not.
>>> I don't see the need of this change.
>>
>>
>> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
>> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
>> If you're saying that some future actions together with
>> some future qdiscs may take advantage of that, then why they didn't
>> use it over the last 10 years?
>> Having both that do the same thing is only confusing.
>> I think having one value to indicate 'stolen' condition makes TC
>> code easier to understand.
>
> Then remove it, I am all for this. ;)

TC_ACT_QUEUED cannot be removed.
Only deprecated with backwards compatibility the way this patch did it.
That should have been obvious.

The other two threads degenerated into non-technical comments.

Anyway, this set was RFC to answer my main question whether I should
continue with tc cleanup or stop right here. I got my answer.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23  2:46         ` Alexei Starovoitov
@ 2015-04-23  7:13           ` Daniel Borkmann
  2015-04-23 18:12           ` Cong Wang
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Borkmann @ 2015-04-23  7:13 UTC (permalink / raw)
  To: Alexei Starovoitov, Cong Wang
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 04/23/2015 04:46 AM, Alexei Starovoitov wrote:
...
> The other two threads degenerated into non-technical comments.

Yep. :-/

> Anyway, this set was RFC to answer my main question whether I should
> continue with tc cleanup or stop right here. I got my answer.

I think it's worth proceeding with this, it's long overdue.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 1/3] tc: fix return values of ingress qdisc
  2015-04-22 23:29       ` Cong Wang
@ 2015-04-23  8:46         ` Thomas Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-04-23  8:46 UTC (permalink / raw)
  To: Cong Wang
  Cc: Alexei Starovoitov, David S. Miller, Eric Dumazet,
	Jamal Hadi Salim, John Fastabend, netdev

On 04/22/15 at 04:29pm, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:04 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> > On 4/21/15 9:59 PM, Cong Wang wrote:
> >>
> >> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
> >> wrote:
> >>>
> >>> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
> >>>
> >>
> >> XMIT already means egress...
> >
> >
> > may be then it should be renamed as well.
> > from include/linux/netdevice.h:
> > /* qdisc ->enqueue() return codes. */
> > #define NET_XMIT_SUCCESS        0x00
> > ...
> >
> > the point is that qdisc->enqeue() must return NET_XMIT_* values.
> > ingress qdisc is violating this and therefore should be fixed.
> 
> XMIT is non-sense for ingress, you really need to pick another
> name for it if TC_ACT_OK isn't okay for you (it is okay for me).

You transmit into a qdisc. If that terminology doesn't suit
you then rename it to NET_QUEUE_* but moving away from returning
TC_ACT_* is definitely the right thing to do here.
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 3/3] tc: cleanup tc_classify
  2015-04-22 23:38       ` Cong Wang
@ 2015-04-23  8:49         ` Thomas Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-04-23  8:49 UTC (permalink / raw)
  To: Cong Wang
  Cc: Alexei Starovoitov, David S. Miller, Eric Dumazet,
	Jamal Hadi Salim, John Fastabend, netdev

On 04/22/15 at 04:38pm, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:27 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> > On 4/21/15 10:05 PM, Cong Wang wrote:
> >>
> >> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov <ast@plumgrid.com>
> >> wrote:
> >>>
> >>> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to
> >>> reduce
> >>> copy-paste among different qdiscs

I like this cleanup. It aligns all skb dropping in qdiscs to a
qdisc_drop*() function.

> >> I don't think qdisc_drop_bypass() is more readable than without it,
> >> maybe you need a better name, or just leave the code as it is.
> >
> >
> > what would be a better name? I'm open to suggestions.
> 
> My reading for "qdisc_drop_bypass()" is it bypasses packet
> dropping for some case, apparently doesn't match its definition.
> 
> I can't think out a better name therefore I don't think it deserves
> a function, just leave as it is.

Interesting logic ;-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23  2:46         ` Alexei Starovoitov
  2015-04-23  7:13           ` Daniel Borkmann
@ 2015-04-23 18:12           ` Cong Wang
  2015-04-23 18:21             ` Daniel Borkmann
  1 sibling, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-23 18:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>
> TC_ACT_QUEUED cannot be removed.
> Only deprecated with backwards compatibility the way this patch did it.
> That should have been obvious.
>

It is at least the third time I have to repeat that: we really don't care
about out-of-tree modules.

Everyone MUST read Documentation/stable_api_nonsense.txt.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 18:12           ` Cong Wang
@ 2015-04-23 18:21             ` Daniel Borkmann
  2015-04-23 18:30               ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Borkmann @ 2015-04-23 18:21 UTC (permalink / raw)
  To: Cong Wang, Alexei Starovoitov
  Cc: David S. Miller, Eric Dumazet, Jamal Hadi Salim, John Fastabend, netdev

On 04/23/2015 08:12 PM, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
...
>> TC_ACT_QUEUED cannot be removed.
>> Only deprecated with backwards compatibility the way this patch did it.
>> That should have been obvious.
>
> It is at least the third time I have to repeat that: we really don't care
> about out-of-tree modules.
>
> Everyone MUST read Documentation/stable_api_nonsense.txt.

Seriously, what are you talking about ???

$ git grep -n TC_ACT_QUEUED include/uapi/
include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED          5

This is *UAPI*, no-one is even remotely talking about out-of-tree
kernel modules.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 18:21             ` Daniel Borkmann
@ 2015-04-23 18:30               ` Cong Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Cong Wang @ 2015-04-23 18:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David S. Miller, Eric Dumazet,
	Jamal Hadi Salim, John Fastabend, netdev

On Thu, Apr 23, 2015 at 11:21 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 04/23/2015 08:12 PM, Cong Wang wrote:
>>
>> On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov <ast@plumgrid.com>
>> wrote:
>
> ...
>>>
>>> TC_ACT_QUEUED cannot be removed.
>>> Only deprecated with backwards compatibility the way this patch did it.
>>> That should have been obvious.
>>
>>
>> It is at least the third time I have to repeat that: we really don't care
>> about out-of-tree modules.
>>
>> Everyone MUST read Documentation/stable_api_nonsense.txt.
>
>
> Seriously, what are you talking about ???
>
> $ git grep -n TC_ACT_QUEUED include/uapi/
> include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED          5
>
> This is *UAPI*, no-one is even remotely talking about out-of-tree
> kernel modules.

I am not stupid. You should figure out we can just leave its definition
by removing it, leaving the default as stolen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-22 22:22     ` Alexei Starovoitov
  2015-04-22 23:39       ` Cong Wang
@ 2015-04-23 20:45       ` Jamal Hadi Salim
  2015-04-23 21:20         ` Florian Westphal
  2015-04-23 22:13         ` Alexei Starovoitov
  1 sibling, 2 replies; 30+ messages in thread
From: Jamal Hadi Salim @ 2015-04-23 20:45 UTC (permalink / raw)
  To: Alexei Starovoitov, Cong Wang
  Cc: David S. Miller, Eric Dumazet, John Fastabend, netdev

On 04/22/15 18:22, Alexei Starovoitov wrote:
> On 4/21/15 10:02 PM, Cong Wang wrote:
>> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov
>> <ast@plumgrid.com> wrote:
>>> TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN.
>>> Get rid of redundant checks in all qdiscs.
>>> Instead do it once.
>>
>> The current code can be easily extended, while your code not.
>> I don't see the need of this change.
>
> well, iproute2 doesn't use TC_ACT_QUEUED action at all and
> TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them.
> If you're saying that some future actions together with
> some future qdiscs may take advantage of that, then why they didn't
> use it over the last 10 years?
> Having both that do the same thing is only confusing.
> I think having one value to indicate 'stolen' condition makes TC
> code easier to understand.
> Jamal, what's your take on this?


Sorry - I am on travel so not very helpful/responsive.
Several things:

1) the _XMIT semantics are useful on the egress side because in fact
we do have queues and they can be attached to qdiscs etc.
The TC_ACT_XXX codes were _intentional_ since ingress works as a
classifier shell. The fact that qdiscs dealt with these codes directly
allows for specialized handling. Moving them to a generic function
seems to defeat that purpose. So I am siding with Cong on this.
You probably have some other motivation for doing this which is not 
clear yet? Are you planning to queue things on ingress?

2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something
that was stolen (eg redirection should definetely have been returning
STOLEN not QUEUED); something that queues for later re-injection
(with any/all metadata) was intended to use QUEUED. I believe netfilter
may have  followed suit and introduced similar codes (so it would be
interesting to see how they use them). In any case, it is clear it is
not being used - but i want to point the reason it was there
(which may still be useful; problem is TheLinuxWay is cutnpaste where
someone repeats the code that exists).

cheers,
jamal

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 20:45       ` Jamal Hadi Salim
@ 2015-04-23 21:20         ` Florian Westphal
  2015-04-23 22:13         ` Alexei Starovoitov
  1 sibling, 0 replies; 30+ messages in thread
From: Florian Westphal @ 2015-04-23 21:20 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Alexei Starovoitov, Cong Wang, David S. Miller, Eric Dumazet,
	John Fastabend, netdev

Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> 2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something
> that was stolen (eg redirection should definetely have been returning
> STOLEN not QUEUED); something that queues for later re-injection
> (with any/all metadata) was intended to use QUEUED. I believe netfilter
> may have  followed suit and introduced similar codes (so it would be
> interesting to see how they use them).

Hooks (Targets) don't queue themselves, i.e. NF_QUEUE tells the
netfilter core that the skb is to be handed off to nf_queue machinery,
while NF_STOLEN is the more obvious "don't touch this skb ever again".

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 20:45       ` Jamal Hadi Salim
  2015-04-23 21:20         ` Florian Westphal
@ 2015-04-23 22:13         ` Alexei Starovoitov
  2015-04-23 22:33           ` Cong Wang
  2015-04-23 22:51           ` Jamal Hadi Salim
  1 sibling, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-23 22:13 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang
  Cc: David S. Miller, Eric Dumazet, John Fastabend, netdev

On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:
>
> 1) the _XMIT semantics are useful on the egress side because in fact
> we do have queues and they can be attached to qdiscs etc.
> The TC_ACT_XXX codes were _intentional_ since ingress works as a
> classifier shell.

then it is worse mess than I thought :(
Why call it _qdisc_ then? and have special and convoluted handling for
it in qdisc_create, qdisc_graft and other places?

 > Are you planning to queue things on ingress?

I thought that was the whole purpose of ingress qdisc.
why then we have dev->ingress_queue?

If queueing was never a goal, may be we should kill ingress qdisc
and replace it with a simple shim that only does cls/act.
The code overall will get much simpler and faster.

Feels like falling into rabbit hole.

> The fact that qdiscs dealt with these codes directly
> allows for specialized handling. Moving them to a generic function
> seems to defeat that purpose. So I am siding with Cong on this.

that's not what patch 1 is doing. It is still doing specialized
handling... but in light of what you said above, it looks like much
bigger cleanup is needed. We'll continue arguing when I refactor
this set and resubmit when net-next reopens.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 22:13         ` Alexei Starovoitov
@ 2015-04-23 22:33           ` Cong Wang
  2015-04-23 22:51           ` Jamal Hadi Salim
  1 sibling, 0 replies; 30+ messages in thread
From: Cong Wang @ 2015-04-23 22:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet, John Fastabend, netdev

On Thu, Apr 23, 2015 at 3:13 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:
>>
>>
>> 1) the _XMIT semantics are useful on the egress side because in fact
>> we do have queues and they can be attached to qdiscs etc.
>> The TC_ACT_XXX codes were _intentional_ since ingress works as a
>> classifier shell.
>
>
> then it is worse mess than I thought :(
> Why call it _qdisc_ then? and have special and convoluted handling for
> it in qdisc_create, qdisc_graft and other places?

It needs to glue into qdisc layer, unify the abstractions.

>
>> Are you planning to queue things on ingress?
>
> I thought that was the whole purpose of ingress qdisc.

There is no queue for ingress qdisc.

> why then we have dev->ingress_queue?

Glue layer, qdisc ties too much with txq, which is what I want to solve.

We have struct netdev_rx_queue too, but look at it, there is nothing
but some sysfs stuffs needed by RPS.

>
> If queueing was never a goal, may be we should kill ingress qdisc
> and replace it with a simple shim that only does cls/act.
> The code overall will get much simpler and faster.
>

As I said in response to your patch for skb->data, we need to align
ingress with egress, rather than making more differences, this includes
qdisc's too. We do have per-cpu queues for ingress, at least in theory
we have a queue for ingress to use.

Of course, RX is naturally different from TX, which is the root cause
why we have so many differences here.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 22:13         ` Alexei Starovoitov
  2015-04-23 22:33           ` Cong Wang
@ 2015-04-23 22:51           ` Jamal Hadi Salim
  2015-04-24  0:59             ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Jamal Hadi Salim @ 2015-04-23 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov, Cong Wang
  Cc: David S. Miller, Eric Dumazet, John Fastabend, netdev

On 04/23/15 18:13, Alexei Starovoitov wrote:
> On 4/23/15 1:45 PM, Jamal Hadi Salim wrote:

> then it is worse mess than I thought :(
> Why call it _qdisc_ then? and have special and convoluted handling for
> it in qdisc_create, qdisc_graft and other places?

Convinience for tooling and using existing abstractions is the
historical. You can attach qdiscs to netdevs.
You can then use all sorts of well understood tools.

>  > Are you planning to queue things on ingress?
>
> I thought that was the whole purpose of ingress qdisc.
> why then we have dev->ingress_queue?
>

So you are planning to add queues? If you are that is a different
discussion (and the use case needs some clarity).

> If queueing was never a goal, may be we should kill ingress qdisc
> and replace it with a simple shim that only does cls/act.
> The code overall will get much simpler and faster.
>

Attaching to the device was considered "simpler" based on the tooling
thought process. On whether we should queue on ingress - it was not
considered useful if the packets were going to the host i.e we want to
proceed to completion likely queueing to some socket layer further
upstream.
For packets being forwarded we already had egress qdiscs which had
queues so it didnt seem to make sense to enqueue on ingress for such
use cases.
There was a use case later where multiple ingress ports had to be shared
and ifb was born - which is pseudo temporary enqueueing on ingress.


> Feels like falling into rabbit hole.
>
>> The fact that qdiscs dealt with these codes directly
>> allows for specialized handling. Moving them to a generic function
>> seems to defeat that purpose. So I am siding with Cong on this.
>
> that's not what patch 1 is doing. It is still doing specialized
> handling... but in light of what you said above, it looks like much
> bigger cleanup is needed. We'll continue arguing when I refactor
> this set and resubmit when net-next reopens.
>

Sorry - i was refereing to the patch where you agregated things after
the qdisc invokes a classifier.

cheers,
jamal

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-23 22:51           ` Jamal Hadi Salim
@ 2015-04-24  0:59             ` Alexei Starovoitov
  2015-04-24  3:37               ` Cong Wang
  2015-04-27 12:31               ` Jamal Hadi Salim
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-04-24  0:59 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang
  Cc: David S. Miller, Eric Dumazet, John Fastabend, netdev

On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>
> So you are planning to add queues? If you are that is a different
> discussion (and the use case needs some clarity).

nope. I wasn't planning to do that.

 > For packets being forwarded we already had egress qdiscs which had
 > queues so it didnt seem to make sense to enqueue on ingress for such
 > use cases.
 > There was a use case later where multiple ingress ports had to be
 > shared
 > and ifb was born - which is pseudo temporary enqueueing on ingress.

agree. imo ifb approach is more flexible, since it has full hierarchy
of qdiscs. As you're saying above and from the old ifb logs I thought
that ifb is _temporary_ and long term plan is to use ingress_queue,
but looks like this is not the case. Also not too long ago we decided
that we don't want another ingress qdisc. Therefore I think it makes
sense to simplify the code and boost performance.
I'm not proposing to change tooling, of course.
 From iproute2 point of view we'll still have ingress qdisc.
Right now we're pointlessly allocating memory for it and for
ingress_queue, whereas we only need to call cls/act.
I'm proposing to kill them and used tcf_proto in net_device instead.
Right now to reach cls in critical path on ingress we do:
rxq = skb->dev->ingress_queue
sch = rxq->qdisc
sch->enqueue
sch->filter_list
with a bunch of 'if' conditions and useless memory accesses in-between.
If we add 'ingress_filter_list' directly to net_device,
it will be just:
skb->dev->ingress_filter_list
which is huge performance boost. Code size will shrink as well.
iproute2 and all existing tools will work as-is. Looks as win-win to me.

>>> The fact that qdiscs dealt with these codes directly
>>> allows for specialized handling. Moving them to a generic function
>>> seems to defeat that purpose. So I am siding with Cong on this.
>>
>> that's not what patch 1 is doing. It is still doing specialized
>> handling... but in light of what you said above, it looks like much
>> bigger cleanup is needed. We'll continue arguing when I refactor
>> this set and resubmit when net-next reopens.
>
> Sorry - i was refereing to the patch where you agregated things after
> the qdisc invokes a classifier.

hmm. There I also didn't convert all qdiscs into single helper.
Only those that have exactly the same logic after tc_classify.
All other qdiscs have custom handling.
No worries, it's hard to review things while traveling. Been there :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-24  0:59             ` Alexei Starovoitov
@ 2015-04-24  3:37               ` Cong Wang
  2015-04-24  8:12                 ` Daniel Borkmann
  2015-04-27 12:31               ` Jamal Hadi Salim
  1 sibling, 1 reply; 30+ messages in thread
From: Cong Wang @ 2015-04-24  3:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet, John Fastabend, netdev

On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>>
>>
>> So you are planning to add queues? If you are that is a different
>> discussion (and the use case needs some clarity).
>
>
> nope. I wasn't planning to do that.
>
>> For packets being forwarded we already had egress qdiscs which had
>> queues so it didnt seem to make sense to enqueue on ingress for such
>> use cases.
>> There was a use case later where multiple ingress ports had to be
>> shared
>> and ifb was born - which is pseudo temporary enqueueing on ingress.
>
> agree. imo ifb approach is more flexible, since it has full hierarchy
> of qdiscs. As you're saying above and from the old ifb logs I thought
> that ifb is _temporary_ and long term plan is to use ingress_queue,
> but looks like this is not the case. Also not too long ago we decided
> that we don't want another ingress qdisc. Therefore I think it makes
> sense to simplify the code and boost performance.

Which performance problem did you see? Did you hit the spinlock
bottleneck? That spinlock should be eliminated before you
try to do more, and it can be replaced by RCU read lock after
John's work (I am still waiting for his patch btw).


> I'm not proposing to change tooling, of course.
> From iproute2 point of view we'll still have ingress qdisc.
> Right now we're pointlessly allocating memory for it and for
> ingress_queue, whereas we only need to call cls/act.
> I'm proposing to kill them and used tcf_proto in net_device instead.
> Right now to reach cls in critical path on ingress we do:
> rxq = skb->dev->ingress_queue
> sch = rxq->qdisc
> sch->enqueue
> sch->filter_list
> with a bunch of 'if' conditions and useless memory accesses in-between.
> If we add 'ingress_filter_list' directly to net_device,
> it will be just:
> skb->dev->ingress_filter_list
> which is huge performance boost. Code size will shrink as well.
> iproute2 and all existing tools will work as-is. Looks as win-win to me.

Nope, it breaks Qdisc abstraction, filters never attach to netdevice
directly. You should stop, since you really don't understand the code.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-24  3:37               ` Cong Wang
@ 2015-04-24  8:12                 ` Daniel Borkmann
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Borkmann @ 2015-04-24  8:12 UTC (permalink / raw)
  To: Cong Wang, Alexei Starovoitov
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet, John Fastabend, netdev

On 04/24/2015 05:37 AM, Cong Wang wrote:
> On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
...
>> agree. imo ifb approach is more flexible, since it has full hierarchy
>> of qdiscs. As you're saying above and from the old ifb logs I thought
>> that ifb is _temporary_ and long term plan is to use ingress_queue,
>> but looks like this is not the case. Also not too long ago we decided
>> that we don't want another ingress qdisc. Therefore I think it makes
>> sense to simplify the code and boost performance.
>
> Which performance problem did you see? Did you hit the spinlock
> bottleneck? That spinlock should be eliminated before you
> try to do more  [...]

Those are already the next steps for improving performance after
that, by attempting to get rid of these additional indirections
and shrinking code.

>> I'm not proposing to change tooling, of course.
>>  From iproute2 point of view we'll still have ingress qdisc.
>> Right now we're pointlessly allocating memory for it and for
>> ingress_queue, whereas we only need to call cls/act.
>> I'm proposing to kill them and used tcf_proto in net_device instead.
>> Right now to reach cls in critical path on ingress we do:
>> rxq = skb->dev->ingress_queue
>> sch = rxq->qdisc
>> sch->enqueue
>> sch->filter_list
>> with a bunch of 'if' conditions and useless memory accesses in-between.
>> If we add 'ingress_filter_list' directly to net_device,
>> it will be just:
>> skb->dev->ingress_filter_list
>> which is huge performance boost. Code size will shrink as well.
>> iproute2 and all existing tools will work as-is. Looks as win-win to me.
>
> Nope, it breaks Qdisc abstraction, filters never attach to netdevice
> directly. You should stop, since you really don't understand the code.

Would be great if you make netdev a better place, stop such immature
comments, and keep it technical as everyone else.

The point here is to try thinking outside the box, so lets see how
the result in the end looks like, then we can discuss.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
  2015-04-24  0:59             ` Alexei Starovoitov
  2015-04-24  3:37               ` Cong Wang
@ 2015-04-27 12:31               ` Jamal Hadi Salim
  1 sibling, 0 replies; 30+ messages in thread
From: Jamal Hadi Salim @ 2015-04-27 12:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Cong Wang
  Cc: David S. Miller, Eric Dumazet, John Fastabend, netdev

On 04/23/15 20:59, Alexei Starovoitov wrote:
> On 4/23/15 3:51 PM, Jamal Hadi Salim wrote:
>>
>> So you are planning to add queues? If you are that is a different
>> discussion (and the use case needs some clarity).
>
> nope. I wasn't planning to do that.
>

Then i would say, lets just keep the naming convention. Maybe
better documentation is needed.

>  > For packets being forwarded we already had egress qdiscs which had
>  > queues so it didnt seem to make sense to enqueue on ingress for such
>  > use cases.
>  > There was a use case later where multiple ingress ports had to be
>  > shared
>  > and ifb was born - which is pseudo temporary enqueueing on ingress.
>
> agree. imo ifb approach is more flexible, since it has full hierarchy
> of qdiscs. As you're saying above and from the old ifb logs I thought
> that ifb is _temporary_ and long term plan is to use ingress_queue,
> but looks like this is not the case.

ifb represented a real use case which questioned the lack
of ingress queue. But do note, that problem was solved. And in
engineering we just move on unless a compelling reason makes us
rethink.
IMO, the original use case for ifb no longer requires it but the
internets and the googles havent caught up with it yet. Refer to
my netdev01 preso/paper where i talk about "sharing".
However, ifb addressed another issue. Instead of having per port/netdev
policies we can now have per-aggregate-port-group policies. While this
comes at a small cost of redirecting packets, that penalty is only paid
for by people interested in defining such policies.
This in itself is useful.

>Also not too long ago we decided
> that we don't want another ingress qdisc. Therefore I think it makes
> sense to simplify the code and boost performance.
> I'm not proposing to change tooling, of course.
>  From iproute2 point of view we'll still have ingress qdisc.
> Right now we're pointlessly allocating memory for it and for
> ingress_queue, whereas we only need to call cls/act.
> I'm proposing to kill them and used tcf_proto in net_device instead.
> Right now to reach cls in critical path on ingress we do:
> rxq = skb->dev->ingress_queue
> sch = rxq->qdisc
> sch->enqueue
> sch->filter_list
> with a bunch of 'if' conditions and useless memory accesses in-between.
> If we add 'ingress_filter_list' directly to net_device,
> it will be just:
> skb->dev->ingress_filter_list
> which is huge performance boost. Code size will shrink as well.
> iproute2 and all existing tools will work as-is. Looks as win-win to me.
>

I hear you; any extra cycle we can remove from the equation is useful.
But do note: in this case, it comes down to the thin line between
usability and  performance. Do take a closer look at the tooling
interfaces from user space on ingress qdisc. The serialization,
the ops already provided for manipulating filters etc. Those are
for free if you use the qdisc abstraction.
If you can still keep that and get rid of the opcodes you mention
above then it is win-win. My feeling is it is a lot more challenging.

>>>> The fact that qdiscs dealt with these codes directly
>>>> allows for specialized handling. Moving them to a generic function
>>>> seems to defeat that purpose. So I am siding with Cong on this.
>>>
>>> that's not what patch 1 is doing. It is still doing specialized
>>> handling... but in light of what you said above, it looks like much
>>> bigger cleanup is needed. We'll continue arguing when I refactor
>>> this set and resubmit when net-next reopens.
>>
>> Sorry - i was refereing to the patch where you agregated things after
>> the qdisc invokes a classifier.
>
> hmm. There I also didn't convert all qdiscs into single helper.
> Only those that have exactly the same logic after tc_classify.
> All other qdiscs have custom handling.
> No worries, it's hard to review things while traveling. Been there :)
>

I am sorry again - will look when i get out of travel mode.

cheers,
jamal

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-04-27 12:31 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-21 19:27 [RFC 0/3] tc cleanup? Alexei Starovoitov
2015-04-21 19:27 ` [RFC 1/3] tc: fix return values of ingress qdisc Alexei Starovoitov
2015-04-22  4:59   ` Cong Wang
2015-04-22 22:04     ` Alexei Starovoitov
2015-04-22 23:29       ` Cong Wang
2015-04-23  8:46         ` Thomas Graf
2015-04-21 19:27 ` [RFC 2/3] tc: deprecate TC_ACT_QUEUED Alexei Starovoitov
2015-04-22  5:02   ` Cong Wang
2015-04-22  7:43     ` Daniel Borkmann
2015-04-22 22:22     ` Alexei Starovoitov
2015-04-22 23:39       ` Cong Wang
2015-04-23  2:46         ` Alexei Starovoitov
2015-04-23  7:13           ` Daniel Borkmann
2015-04-23 18:12           ` Cong Wang
2015-04-23 18:21             ` Daniel Borkmann
2015-04-23 18:30               ` Cong Wang
2015-04-23 20:45       ` Jamal Hadi Salim
2015-04-23 21:20         ` Florian Westphal
2015-04-23 22:13         ` Alexei Starovoitov
2015-04-23 22:33           ` Cong Wang
2015-04-23 22:51           ` Jamal Hadi Salim
2015-04-24  0:59             ` Alexei Starovoitov
2015-04-24  3:37               ` Cong Wang
2015-04-24  8:12                 ` Daniel Borkmann
2015-04-27 12:31               ` Jamal Hadi Salim
2015-04-21 19:27 ` [RFC 3/3] tc: cleanup tc_classify Alexei Starovoitov
2015-04-22  5:05   ` Cong Wang
2015-04-22 22:27     ` Alexei Starovoitov
2015-04-22 23:38       ` Cong Wang
2015-04-23  8:49         ` Thomas Graf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.