All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake
@ 2020-06-25 11:55 Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags Toke Høiland-Jørgensen
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

Hi Dave

This series contains a number of fixes and optimisations for sch_cake that we've
accumulated in the out-of-tree version. See the individual patch descriptions
for more details.

The first three patches in the series are candidates for inclusion into stable.

-Toke

---

Ilya Ponetayev (2):
      sch_cake: fix IP protocol handling in the presence of VLAN tags
      sch_cake: don't try to reallocate or unshare skb unconditionally

Kevin Darbyshire-Bryant (1):
      sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling

Toke Høiland-Jørgensen (2):
      sch_cake: don't call diffserv parsing code when it is not needed
      sch_cake: fix a few style nits


 net/sched/sch_cake.c | 66 ++++++++++++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 21 deletions(-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
@ 2020-06-25 11:55 ` Toke Høiland-Jørgensen
  2020-06-25 19:29   ` David Miller
  2020-06-25 11:55 ` [PATCH net-next 2/5] sch_cake: don't try to reallocate or unshare skb unconditionally Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

From: Ilya Ponetayev <i.ponetaev@ndmsystems.com>

CAKE was using the return value of tc_skb_protocol() and expecting it to be
the IP protocol type. This can fail in the presence of QinQ VLAN tags,
making CAKE unable to handle ECN marking and diffserv parsing in this case.
Fix this by implementing our own version of tc_skb_protocol(), which will
use skb->protocol directly, but also parse and skip over any VLAN tags and
return the inner protocol number instead.

Also fix CE marking by implementing a version of INET_ECN_set_ce() that
uses the same parsing routine.

Fixes: ea82511518f4 ("sch_cake: Add NAT awareness to packet classifier")
Fixes: b2100cc56fca ("sch_cake: Use tc_skb_protocol() helper for getting packet protocol")
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
[ squash original two patches, rewrite commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/sched/sch_cake.c |   52 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 60f8ae578819..0f594d88a957 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -497,6 +497,52 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
 	return down;
 }
 
+static __be16 cake_skb_proto(const struct sk_buff *skb)
+{
+	unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
+	__be16 proto = skb->protocol;
+	struct vlan_hdr vhdr, *vh;
+
+	while (proto == htons(ETH_P_8021Q) || proto == htons(ETH_P_8021AD)) {
+		vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
+		if (!vh)
+			break;
+
+		proto = vh->h_vlan_encapsulated_proto;
+		offset += sizeof(vhdr);
+	}
+
+	return proto;
+}
+
+static int cake_set_ce(struct sk_buff *skb)
+{
+	int wlen = skb_network_offset(skb);
+
+	switch (cake_skb_proto(skb)) {
+	case htons(ETH_P_IP):
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen) ||
+		    skb_try_make_writable(skb, wlen))
+			return 0;
+
+		return IP_ECN_set_ce(ip_hdr(skb));
+
+	case htons(ETH_P_IPV6):
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen) ||
+		    skb_try_make_writable(skb, wlen))
+			return 0;
+
+		return IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+
+	default:
+		return 0;
+	}
+
+	return 0;
+}
+
 /* Call this with a freshly dequeued packet for possible congestion marking.
  * Returns true as an instruction to drop the packet, false for delivery.
  */
@@ -549,7 +595,7 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
 
 	if (next_due && vars->dropping) {
 		/* Use ECN mark if possible, otherwise drop */
-		drop = !(vars->ecn_marked = INET_ECN_set_ce(skb));
+		drop = !(vars->ecn_marked = cake_set_ce(skb));
 
 		vars->count++;
 		if (!vars->count)
@@ -592,7 +638,7 @@ static bool cake_update_flowkeys(struct flow_keys *keys,
 	bool rev = !skb->_nfct, upd = false;
 	__be32 ip;
 
-	if (tc_skb_protocol(skb) != htons(ETH_P_IP))
+	if (cake_skb_proto(skb) != htons(ETH_P_IP))
 		return false;
 
 	if (!nf_ct_get_tuple_skb(&tuple, skb))
@@ -1556,7 +1602,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 	int wlen = skb_network_offset(skb);
 	u8 dscp;
 
-	switch (tc_skb_protocol(skb)) {
+	switch (cake_skb_proto(skb)) {
 	case htons(ETH_P_IP):
 		wlen += sizeof(struct iphdr);
 		if (!pskb_may_pull(skb, wlen) ||


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 2/5] sch_cake: don't try to reallocate or unshare skb unconditionally
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags Toke Høiland-Jørgensen
@ 2020-06-25 11:55 ` Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 3/5] sch_cake: don't call diffserv parsing code when it is not needed Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

From: Ilya Ponetayev <i.ponetaev@ndmsystems.com>

cake_handle_diffserv() tries to linearize mac and network header parts of
skb and to make it writable unconditionally. In some cases it leads to full
skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core
CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates skb, if skb was allocated in ethernet driver via so-called
'build skb' method from page cache (it was discovered by strange increase
of kmalloc-2048 slab at first).

Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
[ fix a few style issues, reflow commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/sched/sch_cake.c |   41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 0f594d88a957..cebcc36755ac 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1599,30 +1599,49 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 
 static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 {
-	int wlen = skb_network_offset(skb);
+	const int offset = skb_network_offset(skb);
+	u16 *buf, buf_;
 	u8 dscp;
 
 	switch (cake_skb_proto(skb)) {
 	case htons(ETH_P_IP):
-		wlen += sizeof(struct iphdr);
-		if (!pskb_may_pull(skb, wlen) ||
-		    skb_try_make_writable(skb, wlen))
+		buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+		if (unlikely(!buf))
 			return 0;
 
-		dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
-		if (wash && dscp)
+		/* ToS is in the second byte of iphdr */
+		dscp = ipv4_get_dsfield((struct iphdr *)buf) >> 2;
+
+		if (wash && dscp) {
+			const int wlen = offset + sizeof(struct iphdr);
+
+			if (!pskb_may_pull(skb, wlen) ||
+			    skb_try_make_writable(skb, wlen))
+				return 0;
+
 			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0);
+		}
+
 		return dscp;
 
 	case htons(ETH_P_IPV6):
-		wlen += sizeof(struct ipv6hdr);
-		if (!pskb_may_pull(skb, wlen) ||
-		    skb_try_make_writable(skb, wlen))
+		buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+		if (unlikely(!buf))
 			return 0;
 
-		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
-		if (wash && dscp)
+		/* Traffic class is in the first and second bytes of ipv6hdr */
+		dscp = ipv6_get_dsfield((struct ipv6hdr *)buf) >> 2;
+
+		if (wash && dscp) {
+			const int wlen = offset + sizeof(struct ipv6hdr);
+
+			if (!pskb_may_pull(skb, wlen) ||
+			    skb_try_make_writable(skb, wlen))
+				return 0;
+
 			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0);
+		}
+
 		return dscp;
 
 	case htons(ETH_P_ARP):


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 3/5] sch_cake: don't call diffserv parsing code when it is not needed
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 2/5] sch_cake: don't try to reallocate or unshare skb unconditionally Toke Høiland-Jørgensen
@ 2020-06-25 11:55 ` Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 4/5] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

From: Toke Høiland-Jørgensen <toke@redhat.com>

As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is neither configured to use diffserv-based
classification, nor to zero out the diffserv bits.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/sched/sch_cake.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index cebcc36755ac..958523c777be 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1597,7 +1597,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
 	const int offset = skb_network_offset(skb);
 	u16 *buf, buf_;
@@ -1658,14 +1658,17 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
 	u32 tin, mark;
+	bool wash;
 	u8 dscp;
 
 	/* Tin selection: Default to diffserv-based selection, allow overriding
-	 * using firewall marks or skb->priority.
+	 * using firewall marks or skb->priority. Call DSCP parsing early if
+	 * wash is enabled, otherwise defer to below to skip unneeded parsing.
 	 */
-	dscp = cake_handle_diffserv(skb,
-				    q->rate_flags & CAKE_FLAG_WASH);
 	mark = (skb->mark & q->fwmark_mask) >> q->fwmark_shft;
+	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
+	if (wash)
+		dscp = cake_handle_diffserv(skb, wash);
 
 	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
 		tin = 0;
@@ -1679,6 +1682,8 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
 
 	else {
+		if (!wash)
+			dscp = cake_handle_diffserv(skb, wash);
 		tin = q->tin_index[dscp];
 
 		if (unlikely(tin >= q->tin_cnt))


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 4/5] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2020-06-25 11:55 ` [PATCH net-next 3/5] sch_cake: don't call diffserv parsing code when it is not needed Toke Høiland-Jørgensen
@ 2020-06-25 11:55 ` Toke Høiland-Jørgensen
  2020-06-25 11:55 ` [PATCH net-next 5/5] sch_cake: fix a few style nits Toke Høiland-Jørgensen
  2020-06-25 19:31 ` [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake David Miller
  5 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>

Change tin mapping on diffserv3, 4 & 8 for LE PHB support, in essence
making LE a member of the Bulk tin.

Bulk has the least priority and minimum of 1/16th total bandwidth in the
face of higher priority traffic.

NB: Diffserv 3 & 4 swap tin 0 & 1 priorities from the default order as
found in diffserv8, in case anyone is wondering why it looks a bit odd.

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
[ reword commit message slightly ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/sched/sch_cake.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 958523c777be..78a702a4e1d4 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -312,8 +312,8 @@ static const u8 precedence[] = {
 };
 
 static const u8 diffserv8[] = {
-	2, 5, 1, 2, 4, 2, 2, 2,
-	0, 2, 1, 2, 1, 2, 1, 2,
+	2, 0, 1, 2, 4, 2, 2, 2,
+	1, 2, 1, 2, 1, 2, 1, 2,
 	5, 2, 4, 2, 4, 2, 4, 2,
 	3, 2, 3, 2, 3, 2, 3, 2,
 	6, 2, 3, 2, 3, 2, 3, 2,
@@ -323,7 +323,7 @@ static const u8 diffserv8[] = {
 };
 
 static const u8 diffserv4[] = {
-	0, 2, 0, 0, 2, 0, 0, 0,
+	0, 1, 0, 0, 2, 0, 0, 0,
 	1, 0, 0, 0, 0, 0, 0, 0,
 	2, 0, 2, 0, 2, 0, 2, 0,
 	2, 0, 2, 0, 2, 0, 2, 0,
@@ -334,7 +334,7 @@ static const u8 diffserv4[] = {
 };
 
 static const u8 diffserv3[] = {
-	0, 0, 0, 0, 2, 0, 0, 0,
+	0, 1, 0, 0, 2, 0, 0, 0,
 	1, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 5/5] sch_cake: fix a few style nits
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2020-06-25 11:55 ` [PATCH net-next 4/5] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling Toke Høiland-Jørgensen
@ 2020-06-25 11:55 ` Toke Høiland-Jørgensen
  2020-06-25 19:31 ` [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake David Miller
  5 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

From: Toke Høiland-Jørgensen <toke@redhat.com>

I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these to
minimise the delta.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/sched/sch_cake.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 78a702a4e1d4..e075913b2fd7 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -2761,7 +2761,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr *opt,
 	qdisc_watchdog_init(&q->watchdog, sch);
 
 	if (opt) {
-		int err = cake_change(sch, opt, extack);
+		err = cake_change(sch, opt, extack);
 
 		if (err)
 			return err;
@@ -3078,7 +3078,7 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 			PUT_STAT_S32(BLUE_TIMER_US,
 				     ktime_to_us(
 					     ktime_sub(now,
-						     flow->cvars.blue_timer)));
+						       flow->cvars.blue_timer)));
 		}
 		if (flow->cvars.dropping) {
 			PUT_STAT_S32(DROP_NEXT_US,


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-25 11:55 ` [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags Toke Høiland-Jørgensen
@ 2020-06-25 19:29   ` David Miller
  2020-06-25 19:53     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2020-06-25 19:29 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Thu, 25 Jun 2020 13:55:03 +0200

> From: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
> 
> CAKE was using the return value of tc_skb_protocol() and expecting it to be
> the IP protocol type. This can fail in the presence of QinQ VLAN tags,
> making CAKE unable to handle ECN marking and diffserv parsing in this case.
> Fix this by implementing our own version of tc_skb_protocol(), which will
> use skb->protocol directly, but also parse and skip over any VLAN tags and
> return the inner protocol number instead.
> 
> Also fix CE marking by implementing a version of INET_ECN_set_ce() that
> uses the same parsing routine.
> 
> Fixes: ea82511518f4 ("sch_cake: Add NAT awareness to packet classifier")
> Fixes: b2100cc56fca ("sch_cake: Use tc_skb_protocol() helper for getting packet protocol")
> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
> Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
> [ squash original two patches, rewrite commit message ]
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

First, this is a bug fix and should probably be steered to 'net'.

Also, other users of tc_skb_protocol() are almost certainly hitting a
similar problem aren't they?  Maybe fix this generically.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake
  2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2020-06-25 11:55 ` [PATCH net-next 5/5] sch_cake: fix a few style nits Toke Høiland-Jørgensen
@ 2020-06-25 19:31 ` David Miller
  2020-06-25 19:49   ` Toke Høiland-Jørgensen
  5 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2020-06-25 19:31 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Thu, 25 Jun 2020 13:55:02 +0200

> The first three patches in the series are candidates for inclusion into stable.

Stable candidates, ie. fixes, should target 'net' not 'net-next'.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake
  2020-06-25 19:31 ` [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake David Miller
@ 2020-06-25 19:49   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 19:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@redhat.com>
> Date: Thu, 25 Jun 2020 13:55:02 +0200
>
>> The first three patches in the series are candidates for inclusion into stable.
>
> Stable candidates, ie. fixes, should target 'net' not 'net-next'.

Right, sure, can do; I was just being lazy and putting everything in one
series :)

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-25 19:29   ` David Miller
@ 2020-06-25 19:53     ` Toke Høiland-Jørgensen
  2020-06-25 20:00       ` David Miller
  2020-06-26  8:27       ` Davide Caratti
  0 siblings, 2 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-25 19:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@redhat.com>
> Date: Thu, 25 Jun 2020 13:55:03 +0200
>
>> From: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
>> 
>> CAKE was using the return value of tc_skb_protocol() and expecting it to be
>> the IP protocol type. This can fail in the presence of QinQ VLAN tags,
>> making CAKE unable to handle ECN marking and diffserv parsing in this case.
>> Fix this by implementing our own version of tc_skb_protocol(), which will
>> use skb->protocol directly, but also parse and skip over any VLAN tags and
>> return the inner protocol number instead.
>> 
>> Also fix CE marking by implementing a version of INET_ECN_set_ce() that
>> uses the same parsing routine.
>> 
>> Fixes: ea82511518f4 ("sch_cake: Add NAT awareness to packet classifier")
>> Fixes: b2100cc56fca ("sch_cake: Use tc_skb_protocol() helper for getting packet protocol")
>> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
>> Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
>> [ squash original two patches, rewrite commit message ]
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> First, this is a bug fix and should probably be steered to 'net'.
>
> Also, other users of tc_skb_protocol() are almost certainly hitting a
> similar problem aren't they?  Maybe fix this generically.

I think it depends a little on the use case; some callers actually care
about the VLAN tags themselves and handle that specially (e.g.,
act_csum). Whereas others (e.g., sch_dsmark) probably will have the same
issue. I guess I can trying going through them all and figuring out if
there's a more generic solution.

I'll split out the diffserv parsing fixes and send those for your net
tree straight away, then circle back to this one...

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-25 19:53     ` Toke Høiland-Jørgensen
@ 2020-06-25 20:00       ` David Miller
  2020-06-26  8:27       ` Davide Caratti
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2020-06-25 20:00 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Thu, 25 Jun 2020 21:53:53 +0200

> I think it depends a little on the use case; some callers actually care
> about the VLAN tags themselves and handle that specially (e.g.,
> act_csum). Whereas others (e.g., sch_dsmark) probably will have the same
> issue. I guess I can trying going through them all and figuring out if
> there's a more generic solution.

That makes sense.

> I'll split out the diffserv parsing fixes and send those for your net
> tree straight away, then circle back to this one...

Great, thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-25 19:53     ` Toke Høiland-Jørgensen
  2020-06-25 20:00       ` David Miller
@ 2020-06-26  8:27       ` Davide Caratti
  2020-06-26 12:52         ` [Cake] " Toke Høiland-Jørgensen
  2020-06-26 13:11         ` Jonathan Morton
  1 sibling, 2 replies; 18+ messages in thread
From: Davide Caratti @ 2020-06-26  8:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Miller; +Cc: netdev, cake

hello,

my 2 cents:

On Thu, 2020-06-25 at 21:53 +0200, Toke Høiland-Jørgensen wrote:
> I think it depends a little on the use case; some callers actually care
> about the VLAN tags themselves and handle that specially (e.g.,
> act_csum).

I remember taht something similar was discussed about 1 year ago [1].

> Whereas others (e.g., sch_dsmark) probably will have the same
> issue.

I'd say that the issue "propagates" to all qdiscs that mangle the ECN-CE
bit (i.e., calling INET_ECN_set_ce() [2]), most notably all the RED
variants and "codel/fq_codel".

>  I guess I can trying going through them all and figuring out if
> there's a more generic solution.

For sch_cake, I think that the qdisc shouldn't look at the IP header when
it schedules packets having a VLAN tag.

Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
should look at the VLAN priority (PCP) bits (and that's something that
cake doesn't do currently - but I have a small patch in my stash that
implements this: please let me know if you are interested in seeing it :)
).

Then, to ensure that the IP precedence is respected, even with different
VLAN tags, users should explicitly configure TC filters that "map" the
DSCP value to a PCP value. This would ensure that configured priority is
respected by the scheduler, and would also be flexible enough to allow
different "mappings". 

Sure, my proposal does not cover the problem of mangling the CE bit inside
VLAN-tagged packets, i.e. if we should understand if qdiscs should allow
it or not.

WDYT?

thank you in advance!
-- 
davide


[1] https://lore.kernel.org/netdev/CAM_iQpUmuHH8S35ERuJ-sFS=17aa-C8uHSWF-WF7toANX2edCQ@mail.gmail.com/#t
[2] https://elixir.bootlin.com/linux/latest/C/ident/INET_ECN_set_ce


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26  8:27       ` Davide Caratti
@ 2020-06-26 12:52         ` Toke Høiland-Jørgensen
  2020-06-26 14:01           ` Jamal Hadi Salim
  2020-06-26 18:52           ` Davide Caratti
  2020-06-26 13:11         ` Jonathan Morton
  1 sibling, 2 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-26 12:52 UTC (permalink / raw)
  To: Davide Caratti, David Miller; +Cc: cake, netdev, Jiri Pirko, Jamal Hadi Salim

Davide Caratti <dcaratti@redhat.com> writes:

> hello,
>
> my 2 cents:
>
> On Thu, 2020-06-25 at 21:53 +0200, Toke Høiland-Jørgensen wrote:
>> I think it depends a little on the use case; some callers actually care
>> about the VLAN tags themselves and handle that specially (e.g.,
>> act_csum).
>
> I remember taht something similar was discussed about 1 year ago [1].

Ah, thank you for the pointer!

>> Whereas others (e.g., sch_dsmark) probably will have the same
>> issue.
>
> I'd say that the issue "propagates" to all qdiscs that mangle the ECN-CE
> bit (i.e., calling INET_ECN_set_ce() [2]), most notably all the RED
> variants and "codel/fq_codel".

Yeah, I think we should fix INET_ECN_set_ce() instead of re-implementing
it in CAKE. See below, though.

>>  I guess I can trying going through them all and figuring out if
>> there's a more generic solution.
>
> For sch_cake, I think that the qdisc shouldn't look at the IP header when
> it schedules packets having a VLAN tag.
>
> Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
> should look at the VLAN priority (PCP) bits (and that's something that
> cake doesn't do currently - but I have a small patch in my stash that
> implements this: please let me know if you are interested in seeing it :)
> ).
>
> Then, to ensure that the IP precedence is respected, even with different
> VLAN tags, users should explicitly configure TC filters that "map" the
> DSCP value to a PCP value. This would ensure that configured priority is
> respected by the scheduler, and would also be flexible enough to allow
> different "mappings".

I think you have this the wrong way around :)

I.e., classifying based on VLAN priority is even more esoteric than
using diffserv markings, so that should not be the default. Making it
the default would also make the behaviour change for the same traffic if
there's a VLAN tag present, which is bound to confuse people. I suppose
it could be an option, but not really sure that's needed, since as you
say you could just implement it with your own TC filters...

> Sure, my proposal does not cover the problem of mangling the CE bit
> inside VLAN-tagged packets, i.e. if we should understand if qdiscs
> should allow it or not.

Hmm, yeah, that's the rub, isn't it? I think this is related to this
commit, which first introduced tc_skb_protocol():

d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")

That commit at least made the behaviour consistent across
accelerated/non-accelerated VLANs. However, the patch description
asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
Looking at the various callers, I'm not actually sure that's true, in
the sense that most of the callers don't handle VLAN ethertypes at all,
but expects to find an IP header. This is the case for at least:

- act_ctinfo
- act_skbedit
- cls_flow
- em_ipset
- em_ipt
- sch_cake
- sch_dsmark

In fact the only caller that explicitly handles a VLAN ethertype seems
to be act_csum; and that handles it in a way that also just skips the
VLAN headers, albeit by skb_pull()'ing the header.

cls_api, em_meta and sch_teql don't explicitly handle it; but returning
the VLAN ethertypes to those does seem to make sense, since they just
pass the value somewhere else.

So my suggestion would be to add a new helper that skips the VLAN tags
and finds the L3 ethertype (i.e., basically cake_skb_proto() as
introduced in this patch), then switch all the callers listed above, as
well as the INET_ECN_set_ce() over to using that. Maybe something like
'skb_l3_protocol()' which could go into skbuff.h itself, so the ECN code
can also find it?

Any objections to this? It's not actually clear to me how the discussion
you quoted above landed; but this will at least make things consistent
across all the different actions/etc.

Adding in Jiri and Jamal as well since they were involved in the patch I
quoted above.

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26  8:27       ` Davide Caratti
  2020-06-26 12:52         ` [Cake] " Toke Høiland-Jørgensen
@ 2020-06-26 13:11         ` Jonathan Morton
  2020-06-26 22:00           ` Stephen Hemminger
  1 sibling, 1 reply; 18+ messages in thread
From: Jonathan Morton @ 2020-06-26 13:11 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Toke Høiland-Jørgensen, David Miller, cake, netdev

Toke has already replied, but:

> Sure, my proposal does not cover the problem of mangling the CE bit inside
> VLAN-tagged packets, i.e. if we should understand if qdiscs should allow
> it or not.

This is clearly wrong-headed by itself.

Everything I've heard about VLAN tags thus far indicates that they should be *transparent* to nodes which don't care about them; they determine where the packet goes within the LAN, but not how it behaves.  In particular this means that AQM should be able to apply congestion control signals to them in the normal way, by modifying the ECN field of the IP header encapsulated within.

The most I would entertain is to incorporate a VLAN tag into the hashes that Cake uses to distinguish hosts and/or flows.  This would account for the case where two hosts on different VLANs of the same physical network have the same IP address.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26 12:52         ` [Cake] " Toke Høiland-Jørgensen
@ 2020-06-26 14:01           ` Jamal Hadi Salim
  2020-06-26 18:52           ` Davide Caratti
  1 sibling, 0 replies; 18+ messages in thread
From: Jamal Hadi Salim @ 2020-06-26 14:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Davide Caratti, David Miller
  Cc: cake, netdev, Jiri Pirko, Simon Horman

On 2020-06-26 8:52 a.m., Toke Høiland-Jørgensen wrote:
> Davide Caratti <dcaratti@redhat.com> writes:
> 
>> hello,
>>
>> my 2 cents:
>>
>> On Thu, 2020-06-25 at 21:53 +0200, Toke Høiland-Jørgensen wrote:
>>> I think it depends a little on the use case; some callers actually care
>>> about the VLAN tags themselves and handle that specially (e.g.,
>>> act_csum).
>>
>> I remember taht something similar was discussed about 1 year ago [1].
> 
> Ah, thank you for the pointer!
> 
>>> Whereas others (e.g., sch_dsmark) probably will have the same
>>> issue.
>>
>> I'd say that the issue "propagates" to all qdiscs that mangle the ECN-CE
>> bit (i.e., calling INET_ECN_set_ce() [2]), most notably all the RED
>> variants and "codel/fq_codel".
> 
> Yeah, I think we should fix INET_ECN_set_ce() instead of re-implementing
> it in CAKE. See below, though.
> 
>>>   I guess I can trying going through them all and figuring out if
>>> there's a more generic solution.
>>
>> For sch_cake, I think that the qdisc shouldn't look at the IP header when
>> it schedules packets having a VLAN tag.
>>
>> Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
>> should look at the VLAN priority (PCP) bits (and that's something that
>> cake doesn't do currently - but I have a small patch in my stash that
>> implements this: please let me know if you are interested in seeing it :)
>> ).
>>
>> Then, to ensure that the IP precedence is respected, even with different
>> VLAN tags, users should explicitly configure TC filters that "map" the
>> DSCP value to a PCP value. This would ensure that configured priority is
>> respected by the scheduler, and would also be flexible enough to allow
>> different "mappings".
> 
> I think you have this the wrong way around :)
> 
> I.e., classifying based on VLAN priority is even more esoteric than
> using diffserv markings, so that should not be the default. Making it
> the default would also make the behaviour change for the same traffic if
> there's a VLAN tag present, which is bound to confuse people. I suppose
> it could be an option, but not really sure that's needed, since as you
> say you could just implement it with your own TC filters...
> 
>> Sure, my proposal does not cover the problem of mangling the CE bit
>> inside VLAN-tagged packets, i.e. if we should understand if qdiscs
>> should allow it or not.
> 
> Hmm, yeah, that's the rub, isn't it? I think this is related to this
> commit, which first introduced tc_skb_protocol():
> 
> d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
> 

I didnt quiet follow the discussion - but the patch you are referencing
was to fix an earlier commit which had broken things (we didnt
have the "fixes" tag back then).

> That commit at least made the behaviour consistent across
> accelerated/non-accelerated VLANs. However, the patch description
> asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
> Looking at the various callers, I'm not actually sure that's true, in
> the sense that most of the callers don't handle VLAN ethertypes at all,
> but expects to find an IP header. This is the case for at least:
> 
> - act_ctinfo
> - act_skbedit
> - cls_flow
> - em_ipset
> - em_ipt
> - sch_cake
> - sch_dsmark
> 
> In fact the only caller that explicitly handles a VLAN ethertype seems
> to be act_csum; and that handles it in a way that also just skips the
> VLAN headers, albeit by skb_pull()'ing the header.
> 

The earlier change broke a few things unfortunately. There was a more
recent discussion with Simon Horman that i cant find now on breakage
with some classifiers in presence of double vlans.
+cc Simon maybe he can find the discussion.

> cls_api, em_meta and sch_teql don't explicitly handle it; but returning
> the VLAN ethertypes to those does seem to make sense, since they just
> pass the value somewhere else.
> 
> So my suggestion would be to add a new helper that skips the VLAN tags
> and finds the L3 ethertype (i.e., basically cake_skb_proto() as
> introduced in this patch), then switch all the callers listed above, as
> well as the INET_ECN_set_ce() over to using that. Maybe something like
> 'skb_l3_protocol()' which could go into skbuff.h itself, so the ECN code
> can also find it?
> 
> Any objections to this? It's not actually clear to me how the discussion
> you quoted above landed; but this will at least make things consistent
> across all the different actions/etc.
> 

I didnt follow the original discussion - will try to read in order
to form an opinion.

cheers,
jamal

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26 12:52         ` [Cake] " Toke Høiland-Jørgensen
  2020-06-26 14:01           ` Jamal Hadi Salim
@ 2020-06-26 18:52           ` Davide Caratti
  2020-06-29 10:27             ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 18+ messages in thread
From: Davide Caratti @ 2020-06-26 18:52 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Miller
  Cc: cake, netdev, Jiri Pirko, Jamal Hadi Salim

hi Toke,

thanks for answering.

On Fri, 2020-06-26 at 14:52 +0200, Toke Høiland-Jørgensen wrote:
> Davide Caratti <dcaratti@redhat.com> writes:

[...]

> > 
> > >  I guess I can trying going through them all and figuring out if
> > > there's a more generic solution.
> > 
> > For sch_cake, I think that the qdisc shouldn't look at the IP header when
> > it schedules packets having a VLAN tag.
> > 
> > Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
> > should look at the VLAN priority (PCP) bits (and that's something that
> > cake doesn't do currently - but I have a small patch in my stash that
> > implements this: please let me know if you are interested in seeing it :)
> > ).
> > 
> > Then, to ensure that the IP precedence is respected, even with different
> > VLAN tags, users should explicitly configure TC filters that "map" the
> > DSCP value to a PCP value. This would ensure that configured priority is
> > respected by the scheduler, and would also be flexible enough to allow
> > different "mappings".
> 
> I think you have this the wrong way around :)
> 
> I.e., classifying based on VLAN priority is even more esoteric than
> using diffserv markings,

is it so uncommon? I knew that almost every wifi card did something
similar with 802.11 'access categories'. More generally, I'm not sure if
it's ok to ignore any QoS information present in the L2 header. Anyway,

> so that should not be the default. Making it
> the default would also make the behaviour change for the same traffic if
> there's a VLAN tag present, which is bound to confuse people. I suppose
> it could be an option, but not really sure that's needed, since as you
> say you could just implement it with your own TC filters...

you caught me :) ,

I wrote that patch in my stash to fix cake on my home router, where voice
and data are encapsulated in IP over PPPoE over VLANs, and different
services go over different VLAN ids (one VLAN dedicated for voice, the
other one for data) [1]. The quickest thing I did was: to prioritize
packets having VLAN id equal to 1035.

Now that I look at cake code again (where again means: after almost 1
year) it would be probably better to assign skb->priority using flower +
act_skbedit, and then prioritize in the qdisc: if I read the code well,
this would avoid voice and data falling into the same traffic class (that
was my original problem).

please note: I didn't try this patch - but I think that even with this
code I would have voice and data mixed together, because there is PPPoE
between VLAN and IP.

> > Sure, my proposal does not cover the problem of mangling the CE bit
> > inside VLAN-tagged packets, i.e. if we should understand if qdiscs
> > should allow it or not.
> 
> Hmm, yeah, that's the rub, isn't it? I think this is related to this
> commit, which first introduced tc_skb_protocol():
> 
> d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
> 
> That commit at least made the behaviour consistent across
> accelerated/non-accelerated VLANs. However, the patch description
> asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
> Looking at the various callers, I'm not actually sure that's true, in
> the sense that most of the callers don't handle VLAN ethertypes at all,
> but expects to find an IP header. This is the case for at least:
> 
> - act_ctinfo
> - act_skbedit
> - cls_flow
> - em_ipset
> - em_ipt
> - sch_cake
> - sch_dsmark

sure, I'm not saying it's not possible to look inside IP headers. What I
understood from Cong's replies [2], and he sort-of convinced me, was: when
I have IP and one or more VLAN tags, no matter whether it is accelerated
or not, it should be sufficient to access the IP header daisy-chaining
'act_vlan pop actions' -> access to the IP header -> ' act_vlan push
actions (in the reversed order).

oh well, that's still not sufficient in my home router because of PPPoE. I
should practice with cls_bpf more seriously :-) 

or write act_pppoe.c :D

> In fact the only caller that explicitly handles a VLAN ethertype seems
> to be act_csum; and that handles it in a way that also just skips the
> VLAN headers, albeit by skb_pull()'ing the header.


> cls_api, em_meta and sch_teql don't explicitly handle it; but returning
> the VLAN ethertypes to those does seem to make sense, since they just
> pass the value somewhere else.
> 
> So my suggestion would be to add a new helper that skips the VLAN tags
> and finds the L3 ethertype (i.e., basically cake_skb_proto() as
> introduced in this patch), then switch all the callers listed above, as
> well as the INET_ECN_set_ce() over to using that. Maybe something like
> 'skb_l3_protocol()' which could go into skbuff.h itself, so the ECN code
> can also find it?

for setting the CE bit, that's understandable - in one way or the other,
the behaviour should be made consistent.

> Any objections to this? It's not actually clear to me how the discussion
> you quoted above landed; but this will at least make things consistent
> across all the different actions/etc.

well, it just didn't "land". But I agree, inconsistency here can make some
TC configurations "unreliable" (i.e., they don't do the job they were
configured for).

thanks!

-- 
davide

[1] https://gist.github.com/teknoraver/9524e539061d0b1e9f8774aa96902082
(by the way, thanks to Matteo Croce for this :) )
[2] https://lore.kernel.org/netdev/CAM_iQpWir7R3AQ7KSeFA5QNXSPHGK-1Nc7WsRM1vhkFyxB5ekA@mail.gmail.com/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26 13:11         ` Jonathan Morton
@ 2020-06-26 22:00           ` Stephen Hemminger
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2020-06-26 22:00 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Davide Caratti, cake, netdev, David Miller

On Fri, 26 Jun 2020 16:11:49 +0300
Jonathan Morton <chromatix99@gmail.com> wrote:

> Toke has already replied, but:
> 
> > Sure, my proposal does not cover the problem of mangling the CE bit inside
> > VLAN-tagged packets, i.e. if we should understand if qdiscs should allow
> > it or not.  
> 
> This is clearly wrong-headed by itself.
> 
> Everything I've heard about VLAN tags thus far indicates that they should be *transparent* to nodes which don't care about them; they determine where the packet goes within the LAN, but not how it behaves.  In particular this means that AQM should be able to apply congestion control signals to them in the normal way, by modifying the ECN field of the IP header encapsulated within.
> 
> The most I would entertain is to incorporate a VLAN tag into the hashes that Cake uses to distinguish hosts and/or flows.  This would account for the case where two hosts on different VLANs of the same physical network have the same IP address.
> 
>  - Jonathan Morton
> 
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake

The implementation of VLAN's is awkward/flawed. The outer VLAN tag is transparent
but the inner VLAN is visible. Similarly the outer VLAN tag doesn't count towards
the MTU but inner one does.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags
  2020-06-26 18:52           ` Davide Caratti
@ 2020-06-29 10:27             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-06-29 10:27 UTC (permalink / raw)
  To: Davide Caratti, David Miller; +Cc: cake, netdev, Jiri Pirko, Jamal Hadi Salim

Davide Caratti <dcaratti@redhat.com> writes:

> hi Toke,
>
> thanks for answering.
>
> On Fri, 2020-06-26 at 14:52 +0200, Toke Høiland-Jørgensen wrote:
>> Davide Caratti <dcaratti@redhat.com> writes:
>
> [...]
>
>> > 
>> > >  I guess I can trying going through them all and figuring out if
>> > > there's a more generic solution.
>> > 
>> > For sch_cake, I think that the qdisc shouldn't look at the IP header when
>> > it schedules packets having a VLAN tag.
>> > 
>> > Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
>> > should look at the VLAN priority (PCP) bits (and that's something that
>> > cake doesn't do currently - but I have a small patch in my stash that
>> > implements this: please let me know if you are interested in seeing it :)
>> > ).
>> > 
>> > Then, to ensure that the IP precedence is respected, even with different
>> > VLAN tags, users should explicitly configure TC filters that "map" the
>> > DSCP value to a PCP value. This would ensure that configured priority is
>> > respected by the scheduler, and would also be flexible enough to allow
>> > different "mappings".
>> 
>> I think you have this the wrong way around :)
>> 
>> I.e., classifying based on VLAN priority is even more esoteric than
>> using diffserv markings,
>
> is it so uncommon? I knew that almost every wifi card did something
> similar with 802.11 'access categories'. More generally, I'm not sure if
> it's ok to ignore any QoS information present in the L2 header. Anyway,
>
>> so that should not be the default. Making it
>> the default would also make the behaviour change for the same traffic if
>> there's a VLAN tag present, which is bound to confuse people. I suppose
>> it could be an option, but not really sure that's needed, since as you
>> say you could just implement it with your own TC filters...
>
> you caught me :) ,
>
> I wrote that patch in my stash to fix cake on my home router, where voice
> and data are encapsulated in IP over PPPoE over VLANs, and different
> services go over different VLAN ids (one VLAN dedicated for voice, the
> other one for data) [1]. The quickest thing I did was: to prioritize
> packets having VLAN id equal to 1035.
>
> Now that I look at cake code again (where again means: after almost 1
> year) it would be probably better to assign skb->priority using flower +
> act_skbedit, and then prioritize in the qdisc: if I read the code well,
> this would avoid voice and data falling into the same traffic class (that
> was my original problem).
>
> please note: I didn't try this patch - but I think that even with this
> code I would have voice and data mixed together, because there is PPPoE
> between VLAN and IP.
>
>> > Sure, my proposal does not cover the problem of mangling the CE bit
>> > inside VLAN-tagged packets, i.e. if we should understand if qdiscs
>> > should allow it or not.
>> 
>> Hmm, yeah, that's the rub, isn't it? I think this is related to this
>> commit, which first introduced tc_skb_protocol():
>> 
>> d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
>> 
>> That commit at least made the behaviour consistent across
>> accelerated/non-accelerated VLANs. However, the patch description
>> asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
>> Looking at the various callers, I'm not actually sure that's true, in
>> the sense that most of the callers don't handle VLAN ethertypes at all,
>> but expects to find an IP header. This is the case for at least:
>> 
>> - act_ctinfo
>> - act_skbedit
>> - cls_flow
>> - em_ipset
>> - em_ipt
>> - sch_cake
>> - sch_dsmark
>
> sure, I'm not saying it's not possible to look inside IP headers. What I
> understood from Cong's replies [2], and he sort-of convinced me, was: when
> I have IP and one or more VLAN tags, no matter whether it is accelerated
> or not, it should be sufficient to access the IP header daisy-chaining
> 'act_vlan pop actions' -> access to the IP header -> ' act_vlan push
> actions (in the reversed order).
>
> oh well, that's still not sufficient in my home router because of PPPoE. I
> should practice with cls_bpf more seriously :-) 
>
> or write act_pppoe.c :D
>
>> In fact the only caller that explicitly handles a VLAN ethertype seems
>> to be act_csum; and that handles it in a way that also just skips the
>> VLAN headers, albeit by skb_pull()'ing the header.
>
>
>> cls_api, em_meta and sch_teql don't explicitly handle it; but returning
>> the VLAN ethertypes to those does seem to make sense, since they just
>> pass the value somewhere else.
>> 
>> So my suggestion would be to add a new helper that skips the VLAN tags
>> and finds the L3 ethertype (i.e., basically cake_skb_proto() as
>> introduced in this patch), then switch all the callers listed above, as
>> well as the INET_ECN_set_ce() over to using that. Maybe something like
>> 'skb_l3_protocol()' which could go into skbuff.h itself, so the ECN code
>> can also find it?
>
> for setting the CE bit, that's understandable - in one way or the other,
> the behaviour should be made consistent.
>
>> Any objections to this? It's not actually clear to me how the discussion
>> you quoted above landed; but this will at least make things consistent
>> across all the different actions/etc.
>
> well, it just didn't "land". But I agree, inconsistency here can make some
> TC configurations "unreliable" (i.e., they don't do the job they were
> configured for).

Right, I'll send a patch to try to make all this consistent :)

-Toke


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-29 19:30 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25 11:55 [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake Toke Høiland-Jørgensen
2020-06-25 11:55 ` [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags Toke Høiland-Jørgensen
2020-06-25 19:29   ` David Miller
2020-06-25 19:53     ` Toke Høiland-Jørgensen
2020-06-25 20:00       ` David Miller
2020-06-26  8:27       ` Davide Caratti
2020-06-26 12:52         ` [Cake] " Toke Høiland-Jørgensen
2020-06-26 14:01           ` Jamal Hadi Salim
2020-06-26 18:52           ` Davide Caratti
2020-06-29 10:27             ` Toke Høiland-Jørgensen
2020-06-26 13:11         ` Jonathan Morton
2020-06-26 22:00           ` Stephen Hemminger
2020-06-25 11:55 ` [PATCH net-next 2/5] sch_cake: don't try to reallocate or unshare skb unconditionally Toke Høiland-Jørgensen
2020-06-25 11:55 ` [PATCH net-next 3/5] sch_cake: don't call diffserv parsing code when it is not needed Toke Høiland-Jørgensen
2020-06-25 11:55 ` [PATCH net-next 4/5] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling Toke Høiland-Jørgensen
2020-06-25 11:55 ` [PATCH net-next 5/5] sch_cake: fix a few style nits Toke Høiland-Jørgensen
2020-06-25 19:31 ` [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake David Miller
2020-06-25 19:49   ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.