* [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6
@ 2020-06-24 19:23 Justin Iurman
  2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC
  To: netdev; +Cc: davem, justin.iurman

In-situ Operations, Administration, and Maintenance (IOAM) records
operational and telemetry information in a packet while it traverses
a path between two points in an IOAM domain. It is defined in
draft-ietf-ippm-ioam-data-09 [1]. IOAM data fields can be encapsulated
into a variety of protocols. The IPv6 encapsulation is defined in
draft-ietf-ippm-ioam-ipv6-options-01 [2], via extension headers. IOAM
can be used to complement OAM mechanisms based on e.g. ICMP or other
types of probe packets.

This patchset implements support for the Pre-allocated Trace option,
carried in a Hop-by-Hop Options header. To that end, a new IPv6
Hop-by-Hop TLV option is introduced, see IANA [3]. The three other IOAM
options (Incremental Trace, Proof-of-Transit and Edge-to-Edge) are not
included in this patchset. The main idea behind the IOAM Pre-allocated
Trace is that a node pre-allocates some room in packets for IOAM data.
Then, each IOAM node on the path inserts its data. There are several
interesting use cases, e.g. fast failure detection/isolation or smart
service selection. Another use case is what we have called Cross-Layer
Telemetry, see the demo video in its repository [4], which aims to make
the entire stack (L2/L3 -> L7) visible to distributed tracing tools
(e.g. Jaeger), instead of the current limited L5 -> L7 view. We believe
this makes IOAM a useful feature for the Linux kernel.
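
To make the pre-allocation idea concrete, here is a minimal userspace
sketch. It is not the kernel code from this series. It assumes that
lengths are counted in 4-octet units, as described in [1], and that
nodes consume the pre-allocated space from the end towards the start.
The names struct trace and trace_insert() are hypothetical and only
serve this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a Pre-allocated Trace option.
 * Lengths are in 4-octet units (assumption taken from the IOAM draft).
 */
struct trace {
	uint8_t nodelen;   /* per-node data size, in 4-octet units */
	uint8_t remlen;    /* free space left, in 4-octet units */
	int overflow;      /* set when a node could not insert its data */
	uint8_t data[64];  /* pre-allocated space (size is arbitrary here) */
};

/* Insert one node's data. Returns 0 on success, -1 on overflow.
 * Assumption: space is consumed from the end towards the start.
 */
static int trace_insert(struct trace *t, const uint8_t *node_data)
{
	size_t off;

	/* The remaining space must fit in the modelled buffer. */
	assert(t->remlen * 4u <= sizeof(t->data));

	/* Not enough room left: flag the overflow and leave data untouched. */
	if (t->remlen < t->nodelen) {
		t->overflow = 1;
		return -1;
	}

	/* Write at the tail of the free space, then shrink the free space. */
	off = (size_t)(t->remlen - t->nodelen) * 4;
	memcpy(t->data + off, node_data, (size_t)t->nodelen * 4);
	t->remlen -= t->nodelen;
	return 0;
}
```

With nodelen = 1 and remlen = 2, two nodes can insert their data and a
third one only sets the overflow flag, which is why the ingress node has
to size the pre-allocated space for the expected path length.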

IOAM options must be 4n-aligned. Here is what a Hop-by-Hop Options
header looks like with IOAM:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Option Type  |  Opt Data Len |    Reserved   |   IOAM Type   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Namespace-ID          | NodeLen | Flags | RemainingLen|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                IOAM-Trace-Type                |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+
|                                                               |  |
|                         node data [0]                         |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  D
|                                                               |  a
|                         node data [1]                         |  t
|                                                               |  a
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                             ...                               ~  S
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  p
|                                                               |  a
|                         node data [n-1]                       |  c
|                                                               |  e
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                                                               |  |
|                         node data [n]                         |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<-+

Namespace-ID represents an IOAM namespace identifier, not to be confused
with Linux namespaces. IOAM namespaces add further context to IOAM
options and associated data, and allow devices which are IOAM capable to
determine whether IOAM needs to be processed, updated or removed. They
can also be used by an operator to distinguish different operational
domains or to identify different sets of devices. Other fields are also
explained in [1] and [2].
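
As an illustration of the layout above, the 16 bits that follow the
Namespace-ID can be decoded as shown below. This is a sketch under the
assumption that the field widths are 5 bits (NodeLen), 4 bits (Flags)
and 7 bits (RemainingLen), as read from the diagram. The names struct
trace_info and trace_info_decode() are hypothetical.

```c
#include <stdint.h>

/* Decoded view of the 16-bit field following the Namespace-ID. */
struct trace_info {
	uint8_t nodelen; /* 5 bits */
	uint8_t flags;   /* 4 bits */
	uint8_t remlen;  /* 7 bits */
};

/* Decode two octets in network byte order (assumed widths: 5/4/7). */
static struct trace_info trace_info_decode(const uint8_t be[2])
{
	uint16_t v = (uint16_t)((be[0] << 8) | be[1]);
	struct trace_info ti;

	ti.nodelen = (v >> 11) & 0x1f;
	ti.flags = (v >> 7) & 0x0f;
	ti.remlen = v & 0x7f;
	return ti;
}
```

For instance, the octets 0x1c 0x0c decode to NodeLen = 3, Flags = 0x8
and RemainingLen = 12.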

This patchset does not provide support for the control plane part, i.e.
the IOAM encapsulation or inline insertion (ingress node behavior). It
will come in a separate patch since some design choices still need to
be discussed (talk @ Netdev 0x14). Overall, this patchset contains:

- Patch 1-3: Data plane support for the IOAM Pre-allocated Trace
- Patch 4:   Generic Netlink to configure IOAM from userspace (iproute2)
- Patch 5:   IOAM sysctls documentation

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
  [4] https://github.com/iurmanj/cross-layer-telemetry

Justin Iurman (5):
  ipv6: eh: Introduce removable TLVs
  ipv6: IOAM tunnel decapsulation
  ipv6: ioam: Data plane support for Pre-allocated Trace
  ipv6: ioam: Generic Netlink to configure IOAM
  ipv6: ioam: Documentation for new IOAM sysctls

 Documentation/networking/ioam6-sysctl.rst |  20 +
 Documentation/networking/ip-sysctl.rst    |   5 +
 include/linux/ioam6.h                     |   7 +
 include/linux/ipv6.h                      |   3 +
 include/net/ioam6.h                       |  98 +++
 include/net/netns/ipv6.h                  |   2 +
 include/uapi/linux/in6.h                  |   1 +
 include/uapi/linux/ioam6.h                |  43 ++
 include/uapi/linux/ipv6.h                 |   2 +
 net/ipv6/Makefile                         |   2 +-
 net/ipv6/addrconf.c                       |  20 +
 net/ipv6/af_inet6.c                       |   7 +
 net/ipv6/exthdrs.c                        | 201 +++++-
 net/ipv6/ioam6.c                          | 839 ++++++++++++++++++++++
 net/ipv6/ip6_input.c                      |  22 +
 net/ipv6/sysctl_net_ipv6.c                |   7 +
 16 files changed, 1252 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/networking/ioam6-sysctl.rst
 create mode 100644 include/linux/ioam6.h
 create mode 100644 include/net/ioam6.h
 create mode 100644 include/uapi/linux/ioam6.h
 create mode 100644 net/ipv6/ioam6.c

-- 
2.17.1



* [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
  2020-06-24 20:32   ` Tom Herbert
  2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC
  To: netdev; +Cc: davem, justin.iurman

Add the possibility to remove one or more consecutive TLVs without
messing up the alignment of others. For now, only IOAM requires this
behavior.

By default, an 8-octet boundary is automatically assumed. This is the
price to pay (at most a useless 4-octet padding) to make sure everything
is still aligned after the removal.

Proof: let's assume for instance the following alignments 2n, 4n and 8n
respectively for options X, Y and Z, inside a Hop-by-Hop extension
header.

Example 1:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                Option to be removed (8 octets)                ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Padding    |    Padding    |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
boundary (same result in both cases).

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Padding    |    Padding    |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Example 2:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Option to be removed (4 octets)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
of 8 anymore.

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Therefore, the largest (8-octet) boundary is assumed by default for all
options, which means that blocks are only moved in multiples of 8. This
guarantees that the remaining options keep their alignment and that the
Hop-by-Hop size remains a multiple of 8.
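
The arithmetic behind this rule can be sketched in userspace as follows.
This is a simplified model, not the remove_tlv() function from the diff
below: it shifts the tail of a flat buffer towards the start, whereas
the kernel code moves the preceding headers forward in the skb. The
function name remove_span() and the TLV_PAD1/TLV_PADN macros are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0 /* Pad1 option type */
#define TLV_PADN 1 /* PadN option type */

/* Remove the span [start, end) from a header of `total` octets.
 * Only the largest multiple of 8 is really removed; the remainder of
 * the span is overwritten with padding. Returns the new total length.
 */
static size_t remove_span(uint8_t *h, size_t total, size_t start, size_t end)
{
	size_t len, padlen, rlen;

	assert(start <= end && end <= total);

	len = end - start;
	padlen = len % 8;
	rlen = len - padlen;

	/* Drop the 8n part by shifting the tail towards the start. */
	memmove(h + start, h + start + rlen, total - (start + rlen));
	total -= rlen;

	/* Overwrite what is left of the span with Pad1 or PadN. */
	if (padlen == 1) {
		h[start] = TLV_PAD1;
	} else if (padlen > 1) {
		h[start] = TLV_PADN;
		h[start + 1] = (uint8_t)(padlen - 2);
		memset(h + start + 2, 0, padlen - 2);
	}

	return total;
}
```

Applied to Example 1 (8 octets removed at offset 8 of a 32-octet
header), the header shrinks to 24 octets. Applied to Example 2 (4 octets
at offset 8 of a 24-octet header), nothing is removed and the span is
turned into a 4-octet PadN, so option Z stays 8n-aligned.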

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 108 insertions(+), 26 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index e9b366994475..f27ab3bf2e0c 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -52,17 +52,27 @@
 
 #include <linux/uaccess.h>
 
-/*
- *	Parsing tlv encoded headers.
+/* States for TLV parsing functions. */
+
+enum {
+	TLV_ACCEPT,
+	TLV_REJECT,
+	TLV_REMOVE,
+	__TLV_MAX
+};
+
+/* Parsing TLV encoded headers.
  *
- *	Parsing function "func" returns true, if parsing succeed
- *	and false, if it failed.
- *	It MUST NOT touch skb->h.
+ * Parsing function "func" returns either:
+ *  - TLV_ACCEPT if parsing succeeds
+ *  - TLV_REJECT if parsing fails
+ *  - TLV_REMOVE if TLV must be removed
+ * It MUST NOT touch skb->h.
  */
 
 struct tlvtype_proc {
 	int	type;
-	bool	(*func)(struct sk_buff *skb, int offset);
+	int	(*func)(struct sk_buff *skb, int offset);
 };
 
 /*********************
@@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
 	return false;
 }
 
+/* Remove one or several consecutive TLVs and recompute offsets, lengths */
+
+static int remove_tlv(int start, int end, struct sk_buff *skb)
+{
+	int len = end - start;
+	int padlen = len % 8;
+	unsigned char *h;
+	int rlen, off;
+	u16 pl_len;
+
+	rlen = len - padlen;
+	if (rlen) {
+		skb_pull(skb, rlen);
+		memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
+			start);
+		skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
+
+		skb_reset_network_header(skb);
+		skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+		pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
+		ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
+
+		skb_transport_header(skb)[1] -= rlen >> 3;
+		end -= rlen;
+	}
+
+	if (padlen) {
+		off = end - padlen;
+		h = skb_network_header(skb);
+
+		if (padlen == 1) {
+			h[off] = IPV6_TLV_PAD1;
+		} else {
+			padlen -= 2;
+
+			h[off] = IPV6_TLV_PADN;
+			h[off + 1] = padlen;
+			memset(&h[off + 2], 0, padlen);
+		}
+	}
+
+	return end;
+}
+
 /* Parse tlv encoded option header (hop-by-hop or destination) */
 
 static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			  struct sk_buff *skb,
-			  int max_count)
+			  int max_count,
+			  bool removable)
 {
 	int len = (skb_transport_header(skb)[1] + 1) << 3;
-	const unsigned char *nh = skb_network_header(skb);
+	unsigned char *nh = skb_network_header(skb);
 	int off = skb_network_header_len(skb);
 	const struct tlvtype_proc *curr;
 	bool disallow_unknowns = false;
+	int off_remove = 0;
 	int tlv_count = 0;
 	int padlen = 0;
+	int ret;
 
 	if (unlikely(max_count < 0)) {
 		disallow_unknowns = true;
@@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			if (tlv_count > max_count)
 				goto bad;
 
+			ret = -1;
 			for (curr = procs; curr->type >= 0; curr++) {
 				if (curr->type == nh[off]) {
 					/* type specific length/alignment
 					   checks will be performed in the
 					   func(). */
-					if (curr->func(skb, off) == false)
+					ret = curr->func(skb, off);
+					if (ret == TLV_REJECT)
 						return false;
 					break;
 				}
@@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			    !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
 				return false;
 
+			if (removable) {
+				if (ret == TLV_REMOVE) {
+					if (!off_remove)
+						off_remove = off - padlen;
+				} else if (off_remove) {
+					off = remove_tlv(off_remove, off, skb);
+					nh = skb_network_header(skb);
+					off_remove = 0;
+				}
+			}
+
 			padlen = 0;
 			break;
 		}
@@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 		len -= optlen;
 	}
 
-	if (len == 0)
+	if (len == 0) {
+		/* Don't forget last TLV if it must be removed */
+		if (off_remove)
+			remove_tlv(off_remove, off, skb);
+
 		return true;
+	}
 bad:
 	kfree_skb(skb);
 	return false;
@@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
  *****************************/
 
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
-static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
+static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
 {
 	struct ipv6_destopt_hao *hao;
 	struct inet6_skb_parm *opt = IP6CB(skb);
@@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
 	if (skb->tstamp == 0)
 		__net_timestamp(skb);
 
-	return true;
+	return TLV_ACCEPT;
 
  discard:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }
 #endif
 
@@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
 #endif
 
 	if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
-			  init_net.ipv6.sysctl.max_dst_opts_cnt)) {
+			  init_net.ipv6.sysctl.max_dst_opts_cnt,
+			  false)) {
 		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
@@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff *skb)
 
 /* Router Alert as of RFC 2711 */
 
-static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
+static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);
 
 	if (nh[optoff + 1] == 2) {
 		IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
 		memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
-		return true;
+		return TLV_ACCEPT;
 	}
 	net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
 			    nh[optoff + 1]);
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }
 
 /* Jumbo payload */
 
-static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
+static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);
 	struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
@@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 	if (pkt_len <= IPV6_MAXPLEN) {
 		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
 		icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
-		return false;
+		return TLV_REJECT;
 	}
 	if (ipv6_hdr(skb)->payload_len) {
 		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
 		icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
-		return false;
+		return TLV_REJECT;
 	}
 
 	if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
@@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 		goto drop;
 
 	IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
-	return true;
+	return TLV_ACCEPT;
 
 drop:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }
 
 /* CALIPSO RFC 5570 */
 
-static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
+static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);
 
@@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
 	if (!calipso_validate(skb, nh + optoff))
 		goto drop;
 
-	return true;
+	return TLV_ACCEPT;
 
 drop:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }
 
 static const struct tlvtype_proc tlvprochopopt_lst[] = {
@@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
 
 	opt->flags |= IP6SKB_HOPBYHOP;
 	if (ip6_parse_tlv(tlvprochopopt_lst, skb,
-			  init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
+			  init_net.ipv6.sysctl.max_hbh_opts_cnt,
+			  true)) {
+		/* we need to refresh the length in case
+		 * at least one TLV was removed
+		 */
+		extlen = (skb_transport_header(skb)[1] + 1) << 3;
 		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 		opt->nhoff = sizeof(struct ipv6hdr);
-- 
2.17.1



* [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
  2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
  2020-06-25  2:32   ` Tom Herbert
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC
  To: netdev; +Cc: davem, justin.iurman

Implement the IOAM egress behavior.

According to RFC 8200:
"Extension headers (except for the Hop-by-Hop Options header) are not
 processed, inserted, or deleted by any node along a packet's delivery
 path, until the packet reaches the node (or each of the set of nodes,
 in the case of multicast) identified in the Destination Address field
 of the IPv6 header."

Therefore, an ingress node (an IOAM domain border) must encapsulate an
incoming IPv6 packet with another similar IPv6 header that will contain
IOAM data while it traverses the domain. When leaving, the egress node,
another IOAM domain border which is also the tunnel destination, must
decapsulate the packet.
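
The decapsulation step can be pictured with the following userspace
sketch on a flat buffer. It assumes the simplest possible packet, i.e.
an outer IPv6 header followed by a single Hop-by-Hop Options header and
the inner IPv6 packet, and it ignores the IP6SKB_IOAM flag check done in
the diff below. The function name ioam_decap() and the macros are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40 /* fixed IPv6 header size */
#define NH_HOPOPTS   0  /* Next Header value: Hop-by-Hop Options */
#define NH_IPV6      41 /* Next Header value: IPv6 encapsulation */

/* Returns the offset of the inner IPv6 header, or 0 if the packet is
 * not "outer IPv6 + Hop-by-Hop + inner IPv6".
 */
static size_t ioam_decap(uint8_t *pkt, size_t len)
{
	size_t hbh_len, inner;

	/* Octet 6 of the IPv6 header is the Next Header field. */
	if (len < IPV6_HDR_LEN + 8 || pkt[6] != NH_HOPOPTS)
		return 0;

	/* Hdr Ext Len counts 8-octet units, not including the first 8. */
	hbh_len = ((size_t)pkt[IPV6_HDR_LEN + 1] + 1) * 8;
	inner = IPV6_HDR_LEN + hbh_len;

	if (pkt[IPV6_HDR_LEN] != NH_IPV6 || len < inner + IPV6_HDR_LEN)
		return 0;

	/* Carry the outer hop limit (octet 7) over to the inner header,
	 * mirroring what the patch does before re-injecting the packet.
	 */
	pkt[inner + 7] = pkt[7];
	return inner;
}
```

The caller would then treat pkt + offset as a fresh IPv6 packet, which
corresponds to the packet re-entering the stack in the kernel.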

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 include/linux/ipv6.h |  1 +
 net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 2cb445a8fc9e..5312a718bc7a 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -138,6 +138,7 @@ struct inet6_skb_parm {
 #define IP6SKB_HOPBYHOP        32
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
+#define IP6SKB_IOAM           256
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index e96304d8a4a7..8cf75cc5e806 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
 void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
 			      bool have_final)
 {
+	struct inet6_skb_parm *opt = IP6CB(skb);
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
 	unsigned int nhoff;
+	u8 hop_limit;
 	bool raw;
 
 	/*
@@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
 	} else {
 		if (!raw) {
 			if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+				/* IOAM Tunnel Decapsulation
+				 * Packet is going to re-enter the stack
+				 */
+				if (nexthdr == NEXTHDR_IPV6 &&
+				    (opt->flags & IP6SKB_IOAM)) {
+					hop_limit = ipv6_hdr(skb)->hop_limit;
+
+					skb_reset_network_header(skb);
+					skb_reset_transport_header(skb);
+					skb->encapsulation = 0;
+
+					ipv6_hdr(skb)->hop_limit = hop_limit;
+					__skb_tunnel_rx(skb, skb->dev,
+							dev_net(skb->dev));
+
+					netif_rx(skb);
+					goto out;
+				}
+
 				__IP6_INC_STATS(net, idev,
 						IPSTATS_MIB_INUNKNOWNPROTOS);
 				icmpv6_send(skb, ICMPV6_PARAMPROB,
@@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
 			consume_skb(skb);
 		}
 	}
+out:
 	return;
 
 discard:
-- 
2.17.1



* [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
  2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
  2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
  2020-06-24 21:37     ` kernel test robot
                     ` (4 more replies)
  2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
  2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
  4 siblings, 5 replies; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC
  To: netdev; +Cc: davem, justin.iurman

Implement support for processing the IOAM Pre-allocated Trace with IPv6,
see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
IPV6_TLV_IOAM_HOPOPTS, see IANA [3].

A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
packets. Default is drop.

Another per-interface sysctl ioam6_id is provided to define the IOAM
(unique) identifier of the interface.

A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
identifier of the node.
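
The resulting accept/remove/drop decision, as read from ipv6_hop_ioam()
in the diff below, can be summarized by the following sketch. It leaves
out the 4n-alignment check and models each condition as a plain boolean.
The enum values and the function name ioam_verdict() are hypothetical.

```c
#include <stdbool.h>

enum verdict { V_ACCEPT, V_REJECT, V_REMOVE };

/* in_enabled:   ioam6_enabled on the ingress interface
 * ns_known:     the IOAM namespace of the option is configured
 * ns_remove:    the namespace asks for the TLV to be removed
 * out_enabled:  ioam6_enabled on the egress interface
 * out_loopback: the egress device is a loopback device
 */
static enum verdict ioam_verdict(bool in_enabled, bool ns_known,
				 bool ns_remove, bool out_enabled,
				 bool out_loopback)
{
	if (!in_enabled)
		return V_REJECT;

	/* Unknown namespace: ignore the option if egress allows IOAM. */
	if (!ns_known)
		return (out_enabled || out_loopback) ? V_ACCEPT : V_REJECT;

	/* Known namespace flagged for removal, packet not for local delivery. */
	if (ns_remove && !out_loopback)
		return V_REMOVE;

	/* Known namespace kept in place: egress must allow IOAM. */
	return (out_enabled || out_loopback) ? V_ACCEPT : V_REJECT;
}
```

Since both sysctls default to 0, a packet carrying the IOAM option is
dropped unless an operator explicitly enables IOAM on the ingress
interface.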

Two relativistic hash tables are used: one for IOAM namespaces, the
other for IOAM schemas. A namespace can only have a single active schema
and a schema can only be attached to a single namespace (1:1
relationship).
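
The 1:1 relationship can be modelled with two back-pointers, as in the
sketch below. It is only an illustration of the invariant: the kernel
structures in this patch also carry hash table and RCU members, and the
attach logic itself lives in the Netlink patch. The names struct ns,
struct schema and ns_set_schema() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct schema;

struct ns {
	uint16_t id;
	struct schema *schema; /* at most one active schema */
};

struct schema {
	uint32_t id;
	struct ns *ns; /* at most one owning namespace */
};

/* Attach `sc` to `ns` (or detach when sc is NULL) while keeping the
 * relationship 1:1: any previous partner on either side is unlinked.
 */
static void ns_set_schema(struct ns *ns, struct schema *sc)
{
	if (ns->schema)
		ns->schema->ns = NULL;
	if (sc && sc->ns)
		sc->ns->schema = NULL;

	ns->schema = sc;
	if (sc)
		sc->ns = ns;
}
```

Attaching a schema that already belongs to another namespace therefore
silently detaches it from that namespace first.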

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 include/linux/ipv6.h       |   2 +
 include/net/ioam6.h        |  98 +++++++++++
 include/net/netns/ipv6.h   |   2 +
 include/uapi/linux/in6.h   |   1 +
 include/uapi/linux/ipv6.h  |   2 +
 net/ipv6/Makefile          |   2 +-
 net/ipv6/addrconf.c        |  20 +++
 net/ipv6/af_inet6.c        |   7 +
 net/ipv6/exthdrs.c         |  67 ++++++++
 net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
 net/ipv6/sysctl_net_ipv6.c |   7 +
 11 files changed, 533 insertions(+), 1 deletion(-)
 create mode 100644 include/net/ioam6.h
 create mode 100644 net/ipv6/ioam6.c

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 5312a718bc7a..15732f964c6e 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -75,6 +75,8 @@ struct ipv6_devconf {
 	__s32		disable_policy;
 	__s32           ndisc_tclass;
 	__s32		rpl_seg_enabled;
+	__u32		ioam6_enabled;
+	__u32           ioam6_id;
 
 	struct ctl_table_header *sysctl_header;
 };
diff --git a/include/net/ioam6.h b/include/net/ioam6.h
new file mode 100644
index 000000000000..2a910bc99947
--- /dev/null
+++ b/include/net/ioam6.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ *  IOAM IPv6 implementation
+ *
+ *  Author:
+ *  Justin Iurman <justin.iurman@uliege.be>
+ */
+
+#ifndef _NET_IOAM6_H
+#define _NET_IOAM6_H
+
+#include <linux/net.h>
+#include <linux/ipv6.h>
+#include <linux/rhashtable-types.h>
+
+#define IOAM6_OPT_TRACE_PREALLOC 0
+
+#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
+
+#define IOAM6_TRACE_TYPE0  (1 << 31)
+#define IOAM6_TRACE_TYPE1  (1 << 30)
+#define IOAM6_TRACE_TYPE2  (1 << 29)
+#define IOAM6_TRACE_TYPE3  (1 << 28)
+#define IOAM6_TRACE_TYPE4  (1 << 27)
+#define IOAM6_TRACE_TYPE5  (1 << 26)
+#define IOAM6_TRACE_TYPE6  (1 << 25)
+#define IOAM6_TRACE_TYPE7  (1 << 24)
+#define IOAM6_TRACE_TYPE8  (1 << 23)
+#define IOAM6_TRACE_TYPE9  (1 << 22)
+#define IOAM6_TRACE_TYPE10 (1 << 21)
+#define IOAM6_TRACE_TYPE11 (1 << 20)
+#define IOAM6_TRACE_TYPE22 (1 << 9)
+
+#define IOAM6_EMPTY_FIELD_u16 0xffff
+#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
+#define IOAM6_EMPTY_FIELD_u32 0xffffffff
+#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
+#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
+
+struct ioam6_common_hdr {
+	u8 opt_type;
+	u8 opt_len;
+	u8 res;
+	u8 ioam_type;
+	__be16 namespace_id;
+} __packed;
+
+struct ioam6_trace_hdr {
+	__be16 info;
+	__be32 type;
+} __packed;
+
+struct ioam6_namespace {
+	struct rhash_head head;
+	struct rcu_head rcu;
+
+	__be16 id;
+	__be64 data;
+	bool remove_tlv;
+
+	struct ioam6_schema *schema;
+};
+
+struct ioam6_schema {
+	struct rhash_head head;
+	struct rcu_head rcu;
+
+	u32 id;
+	int len;
+	__be32 hdr;
+	u8 *data;
+
+	struct ioam6_namespace *ns;
+};
+
+struct ioam6_pernet_data {
+	struct mutex lock;
+	struct rhashtable namespaces;
+	struct rhashtable schemas;
+};
+
+static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	return net->ipv6.ioam6_data;
+#else
+	return NULL;
+#endif
+}
+
+extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
+extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
+				  struct ioam6_namespace *ns);
+
+extern int ioam6_init(void);
+extern void ioam6_exit(void);
+
+#endif
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 5ec054473d81..89b27fa721f4 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
 	int max_hbh_opts_len;
 	int seg6_flowlabel;
 	bool skip_notify_on_dev_down;
+	unsigned int ioam6_id;
 };
 
 struct netns_ipv6 {
@@ -115,6 +116,7 @@ struct netns_ipv6 {
 		spinlock_t	lock;
 		u32		seq;
 	} ip6addrlbl_table;
+	struct ioam6_pernet_data *ioam6_data;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index 9f2273a08356..1c98435220c9 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -145,6 +145,7 @@ struct in6_flowlabel_req {
 #define IPV6_TLV_PADN		1
 #define IPV6_TLV_ROUTERALERT	5
 #define IPV6_TLV_CALIPSO	7	/* RFC 5570 */
+#define IPV6_TLV_IOAM_HOPOPTS	49
 #define IPV6_TLV_JUMBO		194
 #define IPV6_TLV_HAO		201	/* home address option */
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 13e8751bf24a..eb521b2dd885 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -189,6 +189,8 @@ enum {
 	DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
 	DEVCONF_NDISC_TCLASS,
 	DEVCONF_RPL_SEG_ENABLED,
+	DEVCONF_IOAM6_ENABLED,
+	DEVCONF_IOAM6_ID,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index cf7b47bdb9b3..b7ef10d417d6 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -10,7 +10,7 @@ ipv6-objs :=	af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
 		route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
 		raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
 		exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
-		udp_offload.o seg6.o fib6_notifier.o rpl.o
+		udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
 
 ipv6-offload :=	ip6_offload.o tcpv6_offload.o exthdrs_offload.o
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 840bfdb3d7bd..6c952a28ade2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.addr_gen_mode		= IN6_ADDR_GEN_MODE_EUI64,
 	.disable_policy		= 0,
 	.rpl_seg_enabled	= 0,
+	.ioam6_enabled		= 0,
+	.ioam6_id               = 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.addr_gen_mode		= IN6_ADDR_GEN_MODE_EUI64,
 	.disable_policy		= 0,
 	.rpl_seg_enabled	= 0,
+	.ioam6_enabled		= 0,
+	.ioam6_id               = 0,
 };
 
 /* Check if link is ready: is it up and is a valid qdisc available */
@@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
 	array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
 	array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
+	array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
+	array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "ioam6_enabled",
+		.data		= &ipv6_devconf.ioam6_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "ioam6_id",
+		.data		= &ipv6_devconf.ioam6_id,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{
 		/* sentinel */
 	}
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index b304b882e031..63a9ffc4b283 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -62,6 +62,7 @@
 #include <net/rpl.h>
 #include <net/compat.h>
 #include <net/xfrm.h>
+#include <net/ioam6.h>
 
 #include <linux/uaccess.h>
 #include <linux/mroute6.h>
@@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
 	if (err)
 		goto rpl_fail;
 
+	err = ioam6_init();
+	if (err)
+		goto ioam6_fail;
+
 	err = igmp6_late_init();
 	if (err)
 		goto igmp6_late_err;
@@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
 #endif
igmp6_late_err:
+	ioam6_exit();
+ioam6_fail:
 	rpl_exit();
 rpl_fail:
 	seg6_exit();
 seg6_fail:
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index f27ab3bf2e0c..00aee1358f1c 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -49,6 +49,8 @@
 #include <net/seg6_hmac.h>
 #endif
 #include <net/rpl.h>
+#include <net/ioam6.h>
+#include <net/dst_metadata.h>
 
 #include <linux/uaccess.h>
 
@@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
 	return TLV_REJECT;
 }
 
+/* IOAM */
+
+static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
+{
+	struct ioam6_common_hdr *ioamh;
+	struct ioam6_namespace *ns;
+
+	/* Must be 4n-aligned */
+	if (optoff & 3)
+		goto drop;
+
+	if (!skb_valid_dst(skb))
+		ip6_route_input(skb);
+
+	/* IOAM must be enabled on ingress interface */
+	if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
+		goto drop;
+
+	ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
+	ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
+
+	/* Unknown IOAM namespace, either:
+	 *  - Drop it if IOAM is not enabled on egress interface (if any)
+	 *  - Ignore it otherwise
+	 */
+	if (!ns) {
+		if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
+		    !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+			goto drop;
+
+		goto accept;
+	}
+
+	if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+		goto remove;
+
+	/* Known IOAM namespace which must not be removed:
+	 * IOAM must be enabled on egress interface
+	 */
+	if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
+	    !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+		goto drop;
+
+	switch (ioamh->ioam_type) {
+	case IOAM6_OPT_TRACE_PREALLOC:
+		ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
+		IP6CB(skb)->flags |= IP6SKB_IOAM;
+		break;
+	default:
+		break;
+	}
+
+accept:
+	return TLV_ACCEPT;
+remove:
+	return TLV_REMOVE;
+drop:
+	kfree_skb(skb);
+	return TLV_REJECT;
+}
+
 /* Jumbo payload */
 
 static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
@@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
 		.type	= IPV6_TLV_ROUTERALERT,
 		.func	= ipv6_hop_ra,
 	},
+	{
+		.type	= IPV6_TLV_IOAM_HOPOPTS,
+		.func	= ipv6_hop_ioam,
+	},
 	{
 		.type	= IPV6_TLV_JUMBO,
 		.func	= ipv6_hop_jumbo,
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
new file mode 100644
index 000000000000..406aa78eb504
--- /dev/null
+++ b/net/ipv6/ioam6.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *  IOAM IPv6 implementation
+ *
+ *  Author:
+ *  Justin Iurman <justin.iurman@uliege.be>
+ */
+
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/net.h>
+#include <linux/rhashtable.h>
+
+#include <net/addrconf.h>
+#include <net/ioam6.h>
+
+static inline void ioam6_ns_release(struct ioam6_namespace *ns)
+{
+	kfree_rcu(ns, rcu);
+}
+
+static inline void ioam6_sc_release(struct ioam6_schema *sc)
+{
+	kfree_rcu(sc, rcu);
+}
+
+static void ioam6_free_ns(void *ptr, void *arg)
+{
+	struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
+
+	if (ns)
+		ioam6_ns_release(ns);
+}
+
+static void ioam6_free_sc(void *ptr, void *arg)
+{
+	struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
+
+	if (sc)
+		ioam6_sc_release(sc);
+}
+
+static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+	const struct ioam6_namespace *ns = obj;
+
+	return (ns->id != *(__be16 *)arg->key);
+}
+
+static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+	const struct ioam6_schema *sc = obj;
+
+	return (sc->id != *(u32 *)arg->key);
+}
+
+static const struct rhashtable_params rht_ns_params = {
+	.key_len		= sizeof(__be16),
+	.key_offset		= offsetof(struct ioam6_namespace, id),
+	.head_offset		= offsetof(struct ioam6_namespace, head),
+	.automatic_shrinking	= true,
+	.obj_cmpfn		= ioam6_ns_cmpfn,
+};
+
+static const struct rhashtable_params rht_sc_params = {
+	.key_len		= sizeof(u32),
+	.key_offset		= offsetof(struct ioam6_schema, id),
+	.head_offset		= offsetof(struct ioam6_schema, head),
+	.automatic_shrinking	= true,
+	.obj_cmpfn		= ioam6_sc_cmpfn,
+};
+
+struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
+{
+	struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
+
+	return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
+}
+
+void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
+				u32 trace_type, struct ioam6_namespace *ns)
+{
+	u8 *data = skb_network_header(skb) + nodeoff;
+	struct __kernel_sock_timeval ts;
+	u64 raw_u64;
+	u32 raw_u32;
+	u16 raw_u16;
+	u8 byte;
+
+	/* hop_lim and node_id */
+	if (trace_type & IOAM6_TRACE_TYPE0) {
+		byte = ipv6_hdr(skb)->hop_limit - 1;
+		raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
+		if (!raw_u32)
+			raw_u32 = IOAM6_EMPTY_FIELD_u24;
+		else
+			raw_u32 &= IOAM6_EMPTY_FIELD_u24;
+		*(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
+		data += sizeof(__be32);
+	}
+
+	/* ingress_if_id and egress_if_id */
+	if (trace_type & IOAM6_TRACE_TYPE1) {
+		raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
+		if (!raw_u16)
+			raw_u16 = IOAM6_EMPTY_FIELD_u16;
+		*(__be16 *)data = cpu_to_be16(raw_u16);
+		data += sizeof(__be16);
+
+		raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
+		if (!raw_u16)
+			raw_u16 = IOAM6_EMPTY_FIELD_u16;
+		*(__be16 *)data = cpu_to_be16(raw_u16);
+		data += sizeof(__be16);
+	}
+
+	/* timestamp seconds */
+	if (trace_type & IOAM6_TRACE_TYPE2) {
+		if (!skb->tstamp) {
+			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		} else {
+			skb_get_new_timestamp(skb, &ts);
+			*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
+		}
+		data += sizeof(__be32);
+	}
+
+	/* timestamp subseconds */
+	if (trace_type & IOAM6_TRACE_TYPE3) {
+		if (!skb->tstamp) {
+			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		} else {
+			if (!(trace_type & IOAM6_TRACE_TYPE2))
+				skb_get_new_timestamp(skb, &ts);
+			*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
+		}
+		data += sizeof(__be32);
+	}
+
+	/* transit delay */
+	if (trace_type & IOAM6_TRACE_TYPE4) {
+		*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		data += sizeof(__be32);
+	}
+
+	/* namespace data */
+	if (trace_type & IOAM6_TRACE_TYPE5) {
+		*(__be32 *)data = (__be32)ns->data;
+		data += sizeof(__be32);
+	}
+
+	/* queue depth */
+	if (trace_type & IOAM6_TRACE_TYPE6) {
+		*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		data += sizeof(__be32);
+	}
+
+	/* hop_lim and node_id (wide) */
+	if (trace_type & IOAM6_TRACE_TYPE7) {
+		byte = ipv6_hdr(skb)->hop_limit - 1;
+		raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
+		if (!raw_u64)
+			raw_u64 = IOAM6_EMPTY_FIELD_u56;
+		else
+			raw_u64 &= IOAM6_EMPTY_FIELD_u56;
+		*(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
+		data += sizeof(__be64);
+	}
+
+	/* ingress_if_id and egress_if_id (wide) */
+	if (trace_type & IOAM6_TRACE_TYPE8) {
+		raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
+		if (!raw_u32)
+			raw_u32 = IOAM6_EMPTY_FIELD_u32;
+		*(__be32 *)data = cpu_to_be32(raw_u32);
+		data += sizeof(__be32);
+
+		raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
+		if (!raw_u32)
+			raw_u32 = IOAM6_EMPTY_FIELD_u32;
+		*(__be32 *)data = cpu_to_be32(raw_u32);
+		data += sizeof(__be32);
+	}
+
+	/* namespace data (wide) */
+	if (trace_type & IOAM6_TRACE_TYPE9) {
+		*(__be64 *)data = ns->data;
+		data += sizeof(__be64);
+	}
+
+	/* buffer occupancy */
+	if (trace_type & IOAM6_TRACE_TYPE10) {
+		*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		data += sizeof(__be32);
+	}
+
+	/* checksum complement */
+	if (trace_type & IOAM6_TRACE_TYPE11) {
+		*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+		data += sizeof(__be32);
+	}
+
+	/* opaque state snapshot */
+	if (trace_type & IOAM6_TRACE_TYPE22) {
+		if (!ns->schema) {
+			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
+		} else {
+			*(__be32 *)data = ns->schema->hdr;
+			data += sizeof(__be32);
+			memcpy(data, ns->schema->data, ns->schema->len);
+		}
+	}
+}
+
+void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
+			   struct ioam6_namespace *ns)
+{
+	u8 nodelen, flags, remlen, sclen = 0;
+	struct ioam6_trace_hdr *trh;
+	int nodeoff;
+	u16 info;
+	u32 type;
+
+	trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
+	info = be16_to_cpu(trh->info);
+	type = be32_to_cpu(trh->type);
+
+	nodelen = info >> 11;
+	flags = (info >> 7) & 0xf;
+	remlen = info & 0x7f;
+
+	/* Skip if Overflow bit is set OR
+	 * if an unknown type (bit 12-21) is set
+	 */
+	if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
+		return;
+
+	/* NodeLen does not include Opaque State Snapshot length. We need to
+	 * take it into account if the corresponding bit is set and if current
+	 * IOAM namespace has an active schema attached to it
+	 */
+	if (type & IOAM6_TRACE_TYPE22) {
+		/* Opaque State Snapshot header size */
+		sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
+
+		if (ns->schema)
+			sclen += ns->schema->len / 4;
+	}
+
+	/* Not enough space remaining: set Overflow bit and skip */
+	if (!remlen || remlen < (nodelen + sclen)) {
+		info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
+		trh->info = cpu_to_be16(info);
+		return;
+	}
+
+	nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
+	ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
+
+	/* Update RemainingLen */
+	remlen -= nodelen + sclen;
+	info = (info & 0xff80) | remlen;
+	trh->info = cpu_to_be16(info);
+}
+
+static int __net_init ioam6_net_init(struct net *net)
+{
+	struct ioam6_pernet_data *nsdata;
+	int err = -ENOMEM;
+
+	nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
+	if (!nsdata)
+		goto out;
+
+	mutex_init(&nsdata->lock);
+	net->ipv6.ioam6_data = nsdata;
+
+	err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
+	if (err)
+		goto free_nsdata;
+
+	err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
+	if (err)
+		goto free_rht_ns;
+
+out:
+	return err;
+free_rht_ns:
+	rhashtable_destroy(&nsdata->namespaces);
+free_nsdata:
+	kfree(nsdata);
+	net->ipv6.ioam6_data = NULL;
+	goto out;
+}
+
+static void __net_exit ioam6_net_exit(struct net *net)
+{
+	struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
+
+	rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
+	rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
+
+	kfree(nsdata);
+}
+
+static struct pernet_operations ioam6_net_ops = {
+	.init = ioam6_net_init,
+	.exit = ioam6_net_exit,
+};
+
+int __init ioam6_init(void)
+{
+	int err = register_pernet_subsys(&ioam6_net_ops);
+
+	if (err)
+		return err;
+
+	pr_info("In-situ OAM (IOAM) with IPv6\n");
+	return 0;
+}
+
+void ioam6_exit(void)
+{
+	unregister_pernet_subsys(&ioam6_net_ops);
+}
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index fac2135aa47b..da49b33ab6fc 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "ioam6_id",
+		.data		= &init_net.ipv6.sysctl.ioam6_id,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
-- 
2.17.1
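
For readers following ioam6_fill_trace_data() in the patch above, here is
a minimal userspace sketch of how the 16-bit info field is split into
NodeLen (5 bits), Flags (4 bits) and RemainingLen (7 bits), and how the
slot for the current node is taken from the end of the remaining space.
It is not the kernel code: the sketch_ names are hypothetical, the value
of SKETCH_FLAG_OVERFLOW is an assumption (the real
IOAM6_TRACE_FLAG_OVERFLOW lives in a header not shown here), and the
trace header length is passed in as a parameter because the layout of
struct ioam6_trace_hdr is not part of this excerpt. The trace type check
is left out.

```c
#include <stdint.h>

/* Assumed value, for illustration only. */
#define SKETCH_FLAG_OVERFLOW 0x8

struct sketch_trace_info {
	uint8_t nodelen;	/* per-node data length, in 4-octet units */
	uint8_t flags;		/* 4 flag bits */
	uint8_t remlen;		/* free space left, in 4-octet units */
};

/* Split the host-order info field the same way the patch does. */
static struct sketch_trace_info sketch_decode_info(uint16_t info)
{
	struct sketch_trace_info ti;

	ti.nodelen = info >> 11;
	ti.flags = (info >> 7) & 0xf;
	ti.remlen = info & 0x7f;
	return ti;
}

/*
 * Reserve room for one node. Returns the byte offset of the node data
 * relative to the start of the trace header, or -1 if nothing is written.
 * sclen is the Opaque State Snapshot length in 4-octet units (0 if unused).
 */
static int sketch_claim_slot(uint16_t *info, unsigned int trh_len,
			     unsigned int sclen)
{
	struct sketch_trace_info ti = sketch_decode_info(*info);
	unsigned int need = ti.nodelen + sclen;

	/* A previous node already ran out of space: leave the trace alone. */
	if (ti.flags & SKETCH_FLAG_OVERFLOW)
		return -1;

	/* Not enough room: set the Overflow flag and skip. */
	if (!ti.remlen || ti.remlen < need) {
		*info |= SKETCH_FLAG_OVERFLOW << 7;
		return -1;
	}

	/* Keep NodeLen and Flags, store the reduced RemainingLen. */
	*info = (*info & 0xff80) | (ti.remlen - need);

	/* Slots are filled from the end of the pre-allocated area. */
	return (int)(trh_len + (ti.remlen - need) * 4);
}
```

With NodeLen = 3, RemainingLen = 9 and an assumed 8-octet trace header,
successive nodes would get offsets 32, 20 and 8, and a fourth node would
only set the Overflow flag.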



* [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM
  2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
                   ` (2 preceding siblings ...)
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
  2020-06-25 10:52     ` Dan Carpenter
  2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
  4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, justin.iurman

Add Generic Netlink commands to allow userspace to configure IOAM
namespaces and schemas. The intended user is iproute2; that patch is
ready and will be posted as soon as this patchset is merged. Here is a
taste:

$ sudo ip ioam
Usage: ip ioam { namespace | schema } { show | del ID }
               schema add ID DATA
               namespace add ID [ DATA ] [ POP ]
               namespace set ID schema { ID | none }
POP := { true | false }

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 include/linux/ioam6.h      |   7 +
 include/uapi/linux/ioam6.h |  43 +++
 net/ipv6/ioam6.c           | 519 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 566 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/ioam6.h
 create mode 100644 include/uapi/linux/ioam6.h

diff --git a/include/linux/ioam6.h b/include/linux/ioam6.h
new file mode 100644
index 000000000000..156223095e57
--- /dev/null
+++ b/include/linux/ioam6.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IOAM6_H
+#define _LINUX_IOAM6_H
+
+#include <uapi/linux/ioam6.h>
+
+#endif
diff --git a/include/uapi/linux/ioam6.h b/include/uapi/linux/ioam6.h
new file mode 100644
index 000000000000..d2be5f820dc5
--- /dev/null
+++ b/include/uapi/linux/ioam6.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_IOAM6_H
+#define _UAPI_LINUX_IOAM6_H
+
+#define IOAM6_GENL_NAME "IOAM6"
+#define IOAM6_GENL_VERSION 0x1
+
+enum {
+	IOAM6_ATTR_UNSPEC,
+
+	IOAM6_ATTR_NS_ID,	/* u16 */
+	IOAM6_ATTR_NS_DATA,	/* u64 */
+	IOAM6_ATTR_NS_POP,	/* Flag */
+
+#define IOAM6_MAX_SCHEMA_DATA_LEN (255 * 4)
+	IOAM6_ATTR_SC_ID,	/* u32 */
+	IOAM6_ATTR_SC_DATA,	/* Binary */
+	IOAM6_ATTR_SC_NONE,	/* Flag */
+
+	IOAM6_ATTR_PAD,
+
+	__IOAM6_ATTR_MAX,
+};
+#define IOAM6_ATTR_MAX (__IOAM6_ATTR_MAX - 1)
+
+enum {
+	IOAM6_CMD_UNSPEC,
+
+	IOAM6_CMD_ADD_NAMESPACE,
+	IOAM6_CMD_DEL_NAMESPACE,
+	IOAM6_CMD_DUMP_NAMESPACES,
+
+	IOAM6_CMD_ADD_SCHEMA,
+	IOAM6_CMD_DEL_SCHEMA,
+	IOAM6_CMD_DUMP_SCHEMAS,
+
+	IOAM6_CMD_NS_SET_SCHEMA,
+
+	__IOAM6_CMD_MAX,
+};
+#define IOAM6_CMD_MAX (__IOAM6_CMD_MAX - 1)
+
+#endif
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 406aa78eb504..e414e915bf1e 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -11,8 +11,10 @@
 #include <linux/kernel.h>
 #include <linux/net.h>
 #include <linux/rhashtable.h>
+#include <linux/ioam6.h>
 
 #include <net/addrconf.h>
+#include <net/genetlink.h>
 #include <net/ioam6.h>
 
 static inline void ioam6_ns_release(struct ioam6_namespace *ns)
@@ -71,6 +73,507 @@ static const struct rhashtable_params rht_sc_params = {
 	.obj_cmpfn		= ioam6_sc_cmpfn,
 };
 
+static struct genl_family ioam6_genl_family;
+
+static const struct nla_policy ioam6_genl_policy[IOAM6_ATTR_MAX + 1] = {
+	[IOAM6_ATTR_NS_ID]	= { .type = NLA_U16 },
+	[IOAM6_ATTR_NS_DATA]	= { .type = NLA_U64 },
+	[IOAM6_ATTR_NS_POP]	= { .type = NLA_FLAG },
+	[IOAM6_ATTR_SC_ID]	= { .type = NLA_U32 },
+	[IOAM6_ATTR_SC_DATA]	= { .type = NLA_BINARY,
+				    .len = IOAM6_MAX_SCHEMA_DATA_LEN },
+	[IOAM6_ATTR_SC_NONE]	= { .type = NLA_FLAG },
+};
+
+static int ioam6_genl_addns(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ioam6_pernet_data *nsdata;
+	struct ioam6_namespace *ns;
+	__be16 ns_id;
+	int err;
+
+	if (!info->attrs[IOAM6_ATTR_NS_ID])
+		return -EINVAL;
+
+	ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+	nsdata = ioam6_pernet(net);
+
+	mutex_lock(&nsdata->lock);
+
+	ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+	if (ns) {
+		err = -EEXIST;
+		goto out_unlock;
+	}
+
+	ns = kzalloc(sizeof(*ns), GFP_KERNEL);
+	if (!ns) {
+		err = -ENOMEM;
+		goto out_unlock;
+	}
+
+	ns->id = ns_id;
+	ns->remove_tlv = info->attrs[IOAM6_ATTR_NS_POP] ? true : false;
+
+	if (!info->attrs[IOAM6_ATTR_NS_DATA]) {
+		ns->data = cpu_to_be64(IOAM6_EMPTY_FIELD_u64);
+	} else {
+		ns->data = cpu_to_be64(
+				nla_get_u64(info->attrs[IOAM6_ATTR_NS_DATA]));
+	}
+
+	err = rhashtable_lookup_insert_fast(&nsdata->namespaces, &ns->head,
+					    rht_ns_params);
+	if (err)
+		kfree(ns);
+
+out_unlock:
+	mutex_unlock(&nsdata->lock);
+	return err;
+}
+
+static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ioam6_pernet_data *nsdata;
+	struct ioam6_namespace *ns;
+	__be16 ns_id;
+	int err;
+
+	if (!info->attrs[IOAM6_ATTR_NS_ID])
+		return -EINVAL;
+
+	ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+	nsdata = ioam6_pernet(net);
+
+	mutex_lock(&nsdata->lock);
+
+	ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+	if (!ns) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (ns->schema)
+		ns->schema->ns = NULL;
+
+	err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
+				     rht_ns_params);
+	if (err) {
+		ns->schema->ns = ns;
+		goto out_unlock;
+	}
+
+	ioam6_ns_release(ns);
+
+out_unlock:
+	mutex_unlock(&nsdata->lock);
+	return err;
+}
+
+static int __ioam6_genl_dumpns_element(struct ioam6_namespace *ns,
+				       u32 portid, u32 seq, u32 flags,
+				       struct sk_buff *skb, u8 cmd)
+{
+	void *hdr;
+	u64 data;
+
+	hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
+	if (!hdr)
+		return -ENOMEM;
+
+	data = be64_to_cpu(ns->data);
+
+	if (nla_put_u16(skb, IOAM6_ATTR_NS_ID, be16_to_cpu(ns->id)) ||
+	    (data != IOAM6_EMPTY_FIELD_u64 &&
+	     nla_put_u64_64bit(skb, IOAM6_ATTR_NS_DATA, data, IOAM6_ATTR_PAD)) ||
+	    (ns->remove_tlv && nla_put_flag(skb, IOAM6_ATTR_NS_POP)) ||
+	    (ns->schema && nla_put_u32(skb, IOAM6_ATTR_SC_ID, ns->schema->id)))
+		goto nla_put_failure;
+
+	genlmsg_end(skb, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int ioam6_genl_dumpns_start(struct netlink_callback *cb)
+{
+	struct net *net = sock_net(cb->skb->sk);
+	struct ioam6_pernet_data *nsdata;
+	struct rhashtable_iter *iter;
+
+	nsdata = ioam6_pernet(net);
+	iter = (struct rhashtable_iter *)cb->args[0];
+
+	if (!iter) {
+		iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+		if (!iter)
+			return -ENOMEM;
+
+		cb->args[0] = (long)iter;
+	}
+
+	rhashtable_walk_enter(&nsdata->namespaces, iter);
+
+	return 0;
+}
+
+static int ioam6_genl_dumpns_done(struct netlink_callback *cb)
+{
+	struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+
+	rhashtable_walk_exit(iter);
+	kfree(iter);
+
+	return 0;
+}
+
+static int ioam6_genl_dumpns(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+	struct ioam6_namespace *ns;
+	int err;
+
+	rhashtable_walk_start(iter);
+
+	for (;;) {
+		ns = rhashtable_walk_next(iter);
+
+		if (IS_ERR(ns)) {
+			if (PTR_ERR(ns) == -EAGAIN)
+				continue;
+			err = PTR_ERR(ns);
+			goto done;
+		} else if (!ns) {
+			break;
+		}
+
+		err = __ioam6_genl_dumpns_element(ns,
+						  NETLINK_CB(cb->skb).portid,
+						  cb->nlh->nlmsg_seq,
+						  NLM_F_MULTI,
+						  skb,
+						  IOAM6_CMD_DUMP_NAMESPACES);
+		if (err)
+			goto done;
+	}
+
+	err = skb->len;
+
+done:
+	rhashtable_walk_stop(iter);
+	return err;
+}
+
+static int ioam6_genl_addsc(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ioam6_pernet_data *nsdata;
+	struct ioam6_schema *sc;
+	int len, pad, err;
+	u32 sc_id;
+
+	if (!info->attrs[IOAM6_ATTR_SC_ID] || !info->attrs[IOAM6_ATTR_SC_DATA])
+		return -EINVAL;
+
+	sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+	nsdata = ioam6_pernet(net);
+
+	mutex_lock(&nsdata->lock);
+
+	sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
+	if (sc) {
+		err = -EEXIST;
+		goto out_unlock;
+	}
+
+	sc = kzalloc(sizeof(*sc), GFP_KERNEL);
+	if (!sc) {
+		err = -ENOMEM;
+		goto out_unlock;
+	}
+
+	len = nla_len(info->attrs[IOAM6_ATTR_SC_DATA]);
+	pad = (4 - (len % 4)) % 4;
+
+	sc->data = kzalloc(len + pad, GFP_KERNEL);
+	if (!sc->data) {
+		err = -ENOMEM;
+		goto free_sc;
+	}
+
+	sc->id = sc_id;
+	sc->len = len + pad;
+	sc->hdr = cpu_to_be32(sc->id | ((u8)(sc->len / 4) << 24));
+
+	nla_memcpy(sc->data, info->attrs[IOAM6_ATTR_SC_DATA], len);
+
+	err = rhashtable_lookup_insert_fast(&nsdata->schemas, &sc->head,
+					    rht_sc_params);
+	if (err)
+		goto free_data;
+
+out_unlock:
+	mutex_unlock(&nsdata->lock);
+	return err;
+free_data:
+	kfree(sc->data);
+free_sc:
+	kfree(sc);
+	goto out_unlock;
+}
+
+static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ioam6_pernet_data *nsdata;
+	struct ioam6_schema *sc;
+	u32 sc_id;
+	int err;
+
+	if (!info->attrs[IOAM6_ATTR_SC_ID])
+		return -EINVAL;
+
+	sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+	nsdata = ioam6_pernet(net);
+
+	mutex_lock(&nsdata->lock);
+
+	sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
+	if (!sc) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (sc->ns)
+		sc->ns->schema = NULL;
+
+	err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
+				     rht_sc_params);
+	if (err) {
+		sc->ns->schema = sc;
+		goto out_unlock;
+	}
+
+	ioam6_sc_release(sc);
+
+out_unlock:
+	mutex_unlock(&nsdata->lock);
+	return err;
+}
+
+static int __ioam6_genl_dumpsc_element(struct ioam6_schema *sc,
+				       u32 portid, u32 seq, u32 flags,
+				       struct sk_buff *skb, u8 cmd)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
+	if (!hdr)
+		return -ENOMEM;
+
+	if (nla_put_u32(skb, IOAM6_ATTR_SC_ID, sc->id) ||
+	    nla_put(skb, IOAM6_ATTR_SC_DATA, sc->len, sc->data) ||
+	    (sc->ns && nla_put_u16(skb, IOAM6_ATTR_NS_ID,
+				   be16_to_cpu(sc->ns->id))))
+		goto nla_put_failure;
+
+	genlmsg_end(skb, hdr);
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int ioam6_genl_dumpsc_start(struct netlink_callback *cb)
+{
+	struct net *net = sock_net(cb->skb->sk);
+	struct ioam6_pernet_data *nsdata;
+	struct rhashtable_iter *iter;
+
+	nsdata = ioam6_pernet(net);
+	iter = (struct rhashtable_iter *)cb->args[0];
+
+	if (!iter) {
+		iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+		if (!iter)
+			return -ENOMEM;
+
+		cb->args[0] = (long)iter;
+	}
+
+	rhashtable_walk_enter(&nsdata->schemas, iter);
+
+	return 0;
+}
+
+static int ioam6_genl_dumpsc_done(struct netlink_callback *cb)
+{
+	struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+
+	rhashtable_walk_exit(iter);
+	kfree(iter);
+
+	return 0;
+}
+
+static int ioam6_genl_dumpsc(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+	struct ioam6_schema *sc;
+	int err;
+
+	rhashtable_walk_start(iter);
+
+	for (;;) {
+		sc = rhashtable_walk_next(iter);
+
+		if (IS_ERR(sc)) {
+			if (PTR_ERR(sc) == -EAGAIN)
+				continue;
+			err = PTR_ERR(sc);
+			goto done;
+		} else if (!sc) {
+			break;
+		}
+
+		err = __ioam6_genl_dumpsc_element(sc,
+						  NETLINK_CB(cb->skb).portid,
+						  cb->nlh->nlmsg_seq,
+						  NLM_F_MULTI,
+						  skb,
+						  IOAM6_CMD_DUMP_SCHEMAS);
+		if (err)
+			goto done;
+	}
+
+	err = skb->len;
+
+done:
+	rhashtable_walk_stop(iter);
+	return err;
+}
+
+static int ioam6_genl_ns_set_schema(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct ioam6_pernet_data *nsdata;
+	struct ioam6_namespace *ns;
+	struct ioam6_schema *sc;
+	__be16 ns_id;
+	int err = 0;
+	u32 sc_id;
+
+	if (!info->attrs[IOAM6_ATTR_NS_ID] ||
+	    (!info->attrs[IOAM6_ATTR_SC_ID] &&
+	     !info->attrs[IOAM6_ATTR_SC_NONE]))
+		return -EINVAL;
+
+	ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+	nsdata = ioam6_pernet(net);
+
+	mutex_lock(&nsdata->lock);
+
+	ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+	if (!ns) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
+
+	if (info->attrs[IOAM6_ATTR_SC_NONE]) {
+		sc = NULL;
+	} else {
+		sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+		sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id,
+					    rht_sc_params);
+		if (!sc) {
+			err = -ENOENT;
+			goto out_unlock;
+		}
+	}
+
+	if (ns->schema)
+		ns->schema->ns = NULL;
+	ns->schema = sc;
+
+	if (sc) {
+		if (sc->ns)
+			sc->ns->schema = NULL;
+		sc->ns = ns;
+	}
+
+out_unlock:
+	mutex_unlock(&nsdata->lock);
+	return err;
+}
+
+static const struct genl_ops ioam6_genl_ops[] = {
+	{
+		.cmd	= IOAM6_CMD_ADD_NAMESPACE,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit	= ioam6_genl_addns,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_DEL_NAMESPACE,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit	= ioam6_genl_delns,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_DUMP_NAMESPACES,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.start	= ioam6_genl_dumpns_start,
+		.dumpit	= ioam6_genl_dumpns,
+		.done	= ioam6_genl_dumpns_done,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_ADD_SCHEMA,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit	= ioam6_genl_addsc,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_DEL_SCHEMA,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit	= ioam6_genl_delsc,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_DUMP_SCHEMAS,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.start	= ioam6_genl_dumpsc_start,
+		.dumpit	= ioam6_genl_dumpsc,
+		.done	= ioam6_genl_dumpsc_done,
+		.flags	= GENL_ADMIN_PERM,
+	},
+	{
+		.cmd	= IOAM6_CMD_NS_SET_SCHEMA,
+		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+		.doit	= ioam6_genl_ns_set_schema,
+		.flags	= GENL_ADMIN_PERM,
+	},
+};
+
+static struct genl_family ioam6_genl_family __ro_after_init = {
+	.hdrsize	= 0,
+	.name		= IOAM6_GENL_NAME,
+	.version	= IOAM6_GENL_VERSION,
+	.maxattr	= IOAM6_ATTR_MAX,
+	.policy		= ioam6_genl_policy,
+	.netnsok	= true,
+	.parallel_ops	= true,
+	.ops		= ioam6_genl_ops,
+	.n_ops		= ARRAY_SIZE(ioam6_genl_ops),
+	.module		= THIS_MODULE,
+};
+
 struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
 {
 	struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
@@ -311,16 +814,26 @@ static struct pernet_operations ioam6_net_ops = {
 
 int __init ioam6_init(void)
 {
-	int err = register_pernet_subsys(&ioam6_net_ops);
+	int err = genl_register_family(&ioam6_genl_family);
+
+	if (err)
+		goto out;
 
+	err = register_pernet_subsys(&ioam6_net_ops);
 	if (err)
-		return err;
+		goto out_unregister_genl;
 
 	pr_info("In-situ OAM (IOAM) with IPv6\n");
-	return 0;
+
+out:
+	return err;
+out_unregister_genl:
+	genl_unregister_family(&ioam6_genl_family);
+	goto out;
 }
 
 void ioam6_exit(void)
 {
 	unregister_pernet_subsys(&ioam6_net_ops);
+	genl_unregister_family(&ioam6_genl_family);
 }
-- 
2.17.1
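
As a worked example of the arithmetic in ioam6_genl_addsc() above, the
following userspace sketch shows how schema data is padded to a multiple
of 4 octets and how the header word is built from the schema id and the
padded length. The sketch_ names are hypothetical and the header word is
kept in host byte order (the patch converts it with cpu_to_be32()). Like
the patch, it ORs the id in unmasked, so an id wider than 24 bits would
overlap the length byte.

```c
#include <stdint.h>

/* Number of zero octets needed to reach the next multiple of 4. */
static unsigned int sketch_schema_pad(unsigned int len)
{
	return (4 - (len % 4)) % 4;
}

/*
 * Host-order header word: padded length in 4-octet units in the top
 * 8 bits, schema id in the remaining bits.
 */
static uint32_t sketch_schema_hdr(uint32_t sc_id, unsigned int padded_len)
{
	return sc_id | ((uint32_t)(uint8_t)(padded_len / 4) << 24);
}
```

For instance, 5 octets of schema data would be padded with 3 octets to 8,
giving a length field of 2. The maximum of 255 * 4 octets
(IOAM6_MAX_SCHEMA_DATA_LEN) gives a length field of 255.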



* [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
  2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
                   ` (3 preceding siblings ...)
  2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
  2020-06-25  2:53   ` Tom Herbert
  4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, justin.iurman

Add documentation for new IOAM sysctls:
 - ioam6_id: a per-netns sysctl (the IOAM id of the node)
 - ioam6_enabled and ioam6_id: two per-interface sysctls

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
 Documentation/networking/ip-sysctl.rst    |  5 +++++
 2 files changed, 25 insertions(+)
 create mode 100644 Documentation/networking/ioam6-sysctl.rst

diff --git a/Documentation/networking/ioam6-sysctl.rst b/Documentation/networking/ioam6-sysctl.rst
new file mode 100644
index 000000000000..bad6c64907bc
--- /dev/null
+++ b/Documentation/networking/ioam6-sysctl.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+IOAM6 Sysctl variables
+======================
+
+
+/proc/sys/net/ipv6/conf/<iface>/ioam6_* variables:
+==================================================
+
+ioam6_enabled - BOOLEAN
+	Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
+
+	* 0 - disabled (default)
+	* not 0 - enabled
+
+ioam6_id - INTEGER
+	Define the IOAM id of this interface.
+
+	Default is 0.
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index b72f89d5694c..5ba11f2766bd 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
 	and extraneous notifications.
 	Default: true (backward compat mode)
 
+ioam6_id - INTEGER
+	Define the IOAM id of this node.
+
+	Default: 0
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
-- 
2.17.1
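
To illustrate how the ioam6_id sysctls documented above end up in the
trace, here is a small userspace sketch modelled on
ioam6_fill_trace_data_node() from patch 3/5. It packs the hop limit with
the node id, and the ingress and egress interface ids, into host-order
32-bit words (the patch then stores them big-endian). The sketch_ names
are hypothetical, and the two "empty field" values are assumptions
inferred from how IOAM6_EMPTY_FIELD_u24 and IOAM6_EMPTY_FIELD_u16 are
used; their definitions are not shown in this thread excerpt.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_EMPTY_U24 0x00ffffffu
#define SKETCH_EMPTY_U16 0xffffu

/* hop_lim (8 bits) followed by node_id (24 bits). */
static uint32_t sketch_pack_hop_node(uint8_t hop_limit, uint32_t node_id)
{
	uint8_t hop = hop_limit - 1;
	uint32_t id;

	/* An id of 0 (the sysctl default) is reported as "empty". */
	if (!node_id)
		id = SKETCH_EMPTY_U24;
	else
		id = node_id & SKETCH_EMPTY_U24;

	return ((uint32_t)hop << 24) | id;
}

/* ingress_if_id (16 bits) followed by egress_if_id (16 bits). */
static uint32_t sketch_pack_if_ids(uint16_t ingress_id, uint16_t egress_id)
{
	uint16_t in = ingress_id ? ingress_id : SKETCH_EMPTY_U16;
	uint16_t eg = egress_id ? egress_id : SKETCH_EMPTY_U16;

	return ((uint32_t)in << 16) | eg;
}
```

Under these assumptions, a node with ioam6_id = 1 that receives a packet
with hop limit 64 would record 0x3f000001, and a node left at the default
id of 0 would record 0x3fffffff.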



* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
@ 2020-06-24 20:32   ` Tom Herbert
  2020-06-25 17:47     ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-24 20:32 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Add the possibility to remove one or more consecutive TLVs without
> messing up the alignment of others. For now, only IOAM requires this
> behavior.
>
Hi Justin,

Can you explain the motivation for this? Per RFC8200, extension
headers in flight are not to be added, removed, or modified outside of
the standard rules for processing modifiable HBH and DO TLVs; that
would include adding and removing TLVs in EH. One obvious problem this
creates is that it breaks AH if the TLVs are removed in HBH before AH
is processed (AH is processed after HBH).

Tom
> By default, an 8-octet boundary is automatically assumed. This is the
> price to pay (at most a useless 4-octet padding) to make sure everything
> is still aligned after the removal.
>
> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> header.
>
> Example 1:
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       X       |       X       |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                                                               |
> ~                Option to be removed (8 octets)                ~
> |                                                               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Y       |       Y       |       Y       |       Y       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |    Padding    |    Padding    |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> boundary (same result in both cases).
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       X       |       X       |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Y       |       Y       |       Y       |       Y       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |    Padding    |    Padding    |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Example 2:
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       X       |       X       |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                Option to be removed (4 octets)                |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Y       |       Y       |       Y       |       Y       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> of 8 anymore.
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       X       |       X       |    Padding    |    Padding    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Y       |       Y       |       Y       |       Y       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Z       |       Z       |       Z       |       Z       |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Therefore, the largest (8-octet) boundary is always assumed, which means
> that blocks are only removed in multiples of 8 octets. Any remainder is
> rewritten as padding, which keeps the remaining options aligned.
>
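To make the argument above easy to check by hand, here is a minimal
userspace sketch of the arithmetic. It is not part of the patch: the
helper names droppable() and still_aligned() are hypothetical, and
offsets are counted from the first octet of the Hop-by-Hop header, as
in the diagrams.

```c
#include <assert.h>
#include <stdbool.h>

/* Octets that can really be dropped when `len` octets are removable and
 * blocks may only move in multiples of `boundary` octets. */
static int droppable(int len, int boundary)
{
	assert(len >= 0 && boundary > 0);
	return len - (len % boundary);
}

/* True if an option that must sit on an `align`-octet boundary is still
 * aligned after everything in front of it was shortened by `shift` octets. */
static bool still_aligned(int off, int align, int shift)
{
	return (off - shift) % align == 0;
}
```

In Example 2, Z starts at offset 16. With a 4-octet boundary,
droppable(4, 4) is 4 and Z would move to offset 12, which is not
8n-aligned. With an 8-octet boundary, droppable(4, 8) is 0, so nothing
moves and the 4 octets have to be turned into padding instead.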
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 108 insertions(+), 26 deletions(-)
>
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index e9b366994475..f27ab3bf2e0c 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -52,17 +52,27 @@
>
>  #include <linux/uaccess.h>
>
> -/*
> - *     Parsing tlv encoded headers.
> +/* States for TLV parsing functions. */
> +
> +enum {
> +       TLV_ACCEPT,
> +       TLV_REJECT,
> +       TLV_REMOVE,
> +       __TLV_MAX
> +};
> +
> +/* Parsing TLV encoded headers.
>   *
> - *     Parsing function "func" returns true, if parsing succeed
> - *     and false, if it failed.
> - *     It MUST NOT touch skb->h.
> + * Parsing function "func" returns either:
> + *  - TLV_ACCEPT if parsing succeeds
> + *  - TLV_REJECT if parsing fails
> + *  - TLV_REMOVE if TLV must be removed
> + * It MUST NOT touch skb->h.
>   */
>
>  struct tlvtype_proc {
>         int     type;
> -       bool    (*func)(struct sk_buff *skb, int offset);
> +       int     (*func)(struct sk_buff *skb, int offset);
>  };
>
>  /*********************
> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
>         return false;
>  }
>
> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> +
> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> +{
> +       int len = end - start;
> +       int padlen = len % 8;
> +       unsigned char *h;
> +       int rlen, off;
> +       u16 pl_len;
> +
> +       rlen = len - padlen;
> +       if (rlen) {
> +               skb_pull(skb, rlen);
> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> +                       start);
> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> +
> +               skb_reset_network_header(skb);
> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> +
> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> +
> +               skb_transport_header(skb)[1] -= rlen >> 3;
> +               end -= rlen;
> +       }
> +
> +       if (padlen) {
> +               off = end - padlen;
> +               h = skb_network_header(skb);
> +
> +               if (padlen == 1) {
> +                       h[off] = IPV6_TLV_PAD1;
> +               } else {
> +                       padlen -= 2;
> +
> +                       h[off] = IPV6_TLV_PADN;
> +                       h[off + 1] = padlen;
> +                       memset(&h[off + 2], 0, padlen);
> +               }
> +       }
> +
> +       return end;
> +}
> +
>  /* Parse tlv encoded option header (hop-by-hop or destination) */
>
>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>                           struct sk_buff *skb,
> -                         int max_count)
> +                         int max_count,
> +                         bool removable)
>  {
>         int len = (skb_transport_header(skb)[1] + 1) << 3;
> -       const unsigned char *nh = skb_network_header(skb);
> +       unsigned char *nh = skb_network_header(skb);
>         int off = skb_network_header_len(skb);
>         const struct tlvtype_proc *curr;
>         bool disallow_unknowns = false;
> +       int off_remove = 0;
>         int tlv_count = 0;
>         int padlen = 0;
> +       int ret;
>
>         if (unlikely(max_count < 0)) {
>                 disallow_unknowns = true;
> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>                         if (tlv_count > max_count)
>                                 goto bad;
>
> +                       ret = -1;
>                         for (curr = procs; curr->type >= 0; curr++) {
>                                 if (curr->type == nh[off]) {
>                                         /* type specific length/alignment
>                                            checks will be performed in the
>                                            func(). */
> -                                       if (curr->func(skb, off) == false)
> +                                       ret = curr->func(skb, off);
> +                                       if (ret == TLV_REJECT)
>                                                 return false;
>                                         break;
>                                 }
> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>                                 return false;
>
> +                       if (removable) {
> +                               if (ret == TLV_REMOVE) {
> +                                       if (!off_remove)
> +                                               off_remove = off - padlen;
> +                               } else if (off_remove) {
> +                                       off = remove_tlv(off_remove, off, skb);
> +                                       nh = skb_network_header(skb);
> +                                       off_remove = 0;
> +                               }
> +                       }
> +
>                         padlen = 0;
>                         break;
>                 }
> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>                 len -= optlen;
>         }
>
> -       if (len == 0)
> +       if (len == 0) {
> +               /* Don't forget last TLV if it must be removed */
> +               if (off_remove)
> +                       remove_tlv(off_remove, off, skb);
> +
>                 return true;
> +       }
>  bad:
>         kfree_skb(skb);
>         return false;
> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>   *****************************/
>
>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>  {
>         struct ipv6_destopt_hao *hao;
>         struct inet6_skb_parm *opt = IP6CB(skb);
> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>         if (skb->tstamp == 0)
>                 __net_timestamp(skb);
>
> -       return true;
> +       return TLV_ACCEPT;
>
>   discard:
>         kfree_skb(skb);
> -       return false;
> +       return TLV_REJECT;
>  }
>  #endif
>
> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>  #endif
>
>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
> +                         false)) {
>                 skb->transport_header += extlen;
>                 opt = IP6CB(skb);
>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff *skb)
>
>  /* Router Alert as of RFC 2711 */
>
> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>  {
>         const unsigned char *nh = skb_network_header(skb);
>
>         if (nh[optoff + 1] == 2) {
>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> -               return true;
> +               return TLV_ACCEPT;
>         }
>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>                             nh[optoff + 1]);
>         kfree_skb(skb);
> -       return false;
> +       return TLV_REJECT;
>  }
>
>  /* Jumbo payload */
>
> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>  {
>         const unsigned char *nh = skb_network_header(skb);
>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>         if (pkt_len <= IPV6_MAXPLEN) {
>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> -               return false;
> +               return TLV_REJECT;
>         }
>         if (ipv6_hdr(skb)->payload_len) {
>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> -               return false;
> +               return TLV_REJECT;
>         }
>
>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>                 goto drop;
>
>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> -       return true;
> +       return TLV_ACCEPT;
>
>  drop:
>         kfree_skb(skb);
> -       return false;
> +       return TLV_REJECT;
>  }
>
>  /* CALIPSO RFC 5570 */
>
> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>  {
>         const unsigned char *nh = skb_network_header(skb);
>
> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>         if (!calipso_validate(skb, nh + optoff))
>                 goto drop;
>
> -       return true;
> +       return TLV_ACCEPT;
>
>  drop:
>         kfree_skb(skb);
> -       return false;
> +       return TLV_REJECT;
>  }
>
>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>
>         opt->flags |= IP6SKB_HOPBYHOP;
>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
> +                         true)) {
> +               /* we need to refresh the length in case
> +                * at least one TLV was removed
> +                */
> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
>                 skb->transport_header += extlen;
>                 opt = IP6CB(skb);
>                 opt->nhoff = sizeof(struct ipv6hdr);
> --
> 2.17.1
>
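For readers who want to experiment with the removal logic outside the
kernel, below is a rough userspace model of what remove_tlv() does to
the Hop-by-Hop header. It is a sketch under assumptions, not the patch
itself: hbh_remove_span() is a hypothetical name, it works on a flat
byte buffer that starts at the Hop-by-Hop header, and it closes the gap
by shifting the tail of the header left, whereas the patch pulls the
skb and moves the leading headers forward. The padding type values 0
and 1 match IPV6_TLV_PAD1 and IPV6_TLV_PADN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0	/* same value as IPV6_TLV_PAD1 */
#define TLV_PADN 1	/* same value as IPV6_TLV_PADN */

/* Hypothetical model: `hbh` points at the Hop-by-Hop header (Next header,
 * Hdr Ext Len, options...). Removes the span [start, end) in multiples of
 * 8 octets, rewrites the remainder as padding and returns the new offset
 * of the first octet after the span. */
static int hbh_remove_span(uint8_t *hbh, int start, int end)
{
	int total = (hbh[1] + 1) << 3;
	int len = end - start;
	int padlen = len % 8;
	int rlen = len - padlen;

	assert(start >= 2 && start <= end && end <= total);

	if (rlen) {
		/* Close the gap by shifting the rest of the header left. */
		memmove(hbh + end - rlen, hbh + end, (size_t)(total - end));
		hbh[1] = (uint8_t)(hbh[1] - (rlen >> 3));
		end -= rlen;
	}

	if (padlen == 1) {
		hbh[end - 1] = TLV_PAD1;
	} else if (padlen > 1) {
		int off = end - padlen;

		/* PadN: type, length of the data part, then zeroes. */
		hbh[off] = TLV_PADN;
		hbh[off + 1] = (uint8_t)(padlen - 2);
		memset(hbh + off + 2, 0, (size_t)(padlen - 2));
	}

	return end;
}
```

Applied to Example 1 with the span [8, 16), the header shrinks from 32
to 24 octets and Z lands on offset 16. Applied to Example 2 with the
span [8, 12), the length is unchanged and the 4 octets become a PadN
option, so Z stays on offset 16.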

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 21:37     ` kernel test robot
  2020-06-24 23:11     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 21:37 UTC (permalink / raw)
  To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman

[-- Attachment #1: Type: text/plain, Size: 7932 bytes --]

Hi Justin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: um-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=um 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
   In file included from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/net.h:23,
                    from net/ipv6/ioam6.c:12:
   arch/um/include/asm/uaccess.h: In function '__access_ok':
   arch/um/include/asm/uaccess.h:17:29: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      17 |    (((unsigned long) (addr) >= FIXADDR_USER_START) && \
         |                             ^~
   arch/um/include/asm/uaccess.h:45:3: note: in expansion of macro '__access_ok_vsyscall'
      45 |   __access_ok_vsyscall(addr, size) ||
         |   ^~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/kernel.h:11,
                    from net/ipv6/ioam6.c:11:
   include/asm-generic/fixmap.h: In function 'fix_to_virt':
   include/asm-generic/fixmap.h:32:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
      32 |  BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
         |                   ^~
   include/linux/compiler.h:372:9: note: in definition of macro '__compiletime_assert'
     372 |   if (!(condition))     \
         |         ^~~~~~~~~
   include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
     392 |  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |  ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      50 |  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |  ^~~~~~~~~~~~~~~~
   include/asm-generic/fixmap.h:32:2: note: in expansion of macro 'BUILD_BUG_ON'
      32 |  BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
         |  ^~~~~~~~~~~~
   net/ipv6/ioam6.c: At top level:
>> net/ipv6/ioam6.c:81:6: warning: no previous prototype for 'ioam6_fill_trace_data_node' [-Wmissing-prototypes]
      81 | void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/ioam6_fill_trace_data_node +81 net/ipv6/ioam6.c

    80	
  > 81	void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
    82					u32 trace_type, struct ioam6_namespace *ns)
    83	{
    84		u8 *data = skb_network_header(skb) + nodeoff;
    85		struct __kernel_sock_timeval ts;
    86		u64 raw_u64;
    87		u32 raw_u32;
    88		u16 raw_u16;
    89		u8 byte;
    90	
    91		/* hop_lim and node_id */
    92		if (trace_type & IOAM6_TRACE_TYPE0) {
    93			byte = ipv6_hdr(skb)->hop_limit - 1;
    94			raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
    95			if (!raw_u32)
    96				raw_u32 = IOAM6_EMPTY_FIELD_u24;
    97			else
    98				raw_u32 &= IOAM6_EMPTY_FIELD_u24;
    99			*(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
   100			data += sizeof(__be32);
   101		}
   102	
   103		/* ingress_if_id and egress_if_id */
   104		if (trace_type & IOAM6_TRACE_TYPE1) {
   105			raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
   106			if (!raw_u16)
   107				raw_u16 = IOAM6_EMPTY_FIELD_u16;
   108			*(__be16 *)data = cpu_to_be16(raw_u16);
   109			data += sizeof(__be16);
   110	
   111			raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
   112			if (!raw_u16)
   113				raw_u16 = IOAM6_EMPTY_FIELD_u16;
   114			*(__be16 *)data = cpu_to_be16(raw_u16);
   115			data += sizeof(__be16);
   116		}
   117	
   118		/* timestamp seconds */
   119		if (trace_type & IOAM6_TRACE_TYPE2) {
   120			if (!skb->tstamp) {
   121				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   122			} else {
   123				skb_get_new_timestamp(skb, &ts);
   124				*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
   125			}
   126			data += sizeof(__be32);
   127		}
   128	
   129		/* timestamp subseconds */
   130		if (trace_type & IOAM6_TRACE_TYPE3) {
   131			if (!skb->tstamp) {
   132				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   133			} else {
   134				if (!(trace_type & IOAM6_TRACE_TYPE2))
   135					skb_get_new_timestamp(skb, &ts);
   136				*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
   137			}
   138			data += sizeof(__be32);
   139		}
   140	
   141		/* transit delay */
   142		if (trace_type & IOAM6_TRACE_TYPE4) {
   143			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   144			data += sizeof(__be32);
   145		}
   146	
   147		/* namespace data */
   148		if (trace_type & IOAM6_TRACE_TYPE5) {
   149			*(__be32 *)data = (__be32)ns->data;
   150			data += sizeof(__be32);
   151		}
   152	
   153		/* queue depth */
   154		if (trace_type & IOAM6_TRACE_TYPE6) {
   155			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   156			data += sizeof(__be32);
   157		}
   158	
   159		/* hop_lim and node_id (wide) */
   160		if (trace_type & IOAM6_TRACE_TYPE7) {
   161			byte = ipv6_hdr(skb)->hop_limit - 1;
   162			raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
   163			if (!raw_u64)
   164				raw_u64 = IOAM6_EMPTY_FIELD_u56;
   165			else
   166				raw_u64 &= IOAM6_EMPTY_FIELD_u56;
   167			*(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
   168			data += sizeof(__be64);
   169		}
   170	
   171		/* ingress_if_id and egress_if_id (wide) */
   172		if (trace_type & IOAM6_TRACE_TYPE8) {
   173			raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
   174			if (!raw_u32)
   175				raw_u32 = IOAM6_EMPTY_FIELD_u32;
   176			*(__be32 *)data = cpu_to_be32(raw_u32);
   177			data += sizeof(__be32);
   178	
   179			raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
   180			if (!raw_u32)
   181				raw_u32 = IOAM6_EMPTY_FIELD_u32;
   182			*(__be32 *)data = cpu_to_be32(raw_u32);
   183			data += sizeof(__be32);
   184		}
   185	
   186		/* namespace data (wide) */
   187		if (trace_type & IOAM6_TRACE_TYPE9) {
   188			*(__be64 *)data = ns->data;
   189			data += sizeof(__be64);
   190		}
   191	
   192		/* buffer occupancy */
   193		if (trace_type & IOAM6_TRACE_TYPE10) {
   194			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   195			data += sizeof(__be32);
   196		}
   197	
   198		/* checksum complement */
   199		if (trace_type & IOAM6_TRACE_TYPE11) {
   200			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   201			data += sizeof(__be32);
   202		}
   203	
   204		/* opaque state snapshot */
   205		if (trace_type & IOAM6_TRACE_TYPE22) {
   206			if (!ns->schema) {
   207				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
   208			} else {
   209				*(__be32 *)data = ns->schema->hdr;
   210				data += sizeof(__be32);
   211				memcpy(data, ns->schema->data, ns->schema->len);
   212			}
   213		}
   214	}
   215	
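
As a hedged illustration of the two usual ways to silence a
-Wmissing-prototypes warning, here is a small standalone sketch. The
names pack_hop_lim_node_id() and node_id_or_empty() are hypothetical,
and the value chosen for EMPTY_FIELD_U24 is an assumption about
IOAM6_EMPTY_FIELD_u24, based only on how line 98 uses it as a mask.
Which fix suits ioam6_fill_trace_data_node() depends on whether other
files call it, which this report does not show.

```c
#include <stdint.h>

#define EMPTY_FIELD_U24 0x00ffffffu	/* assumed value of IOAM6_EMPTY_FIELD_u24 */

/* Fix 1: a helper used only in this file is marked static, so the
 * compiler does not expect a previous prototype. */
static uint32_t node_id_or_empty(uint32_t id)
{
	return id ? (id & EMPTY_FIELD_U24) : EMPTY_FIELD_U24;
}

/* Fix 2: a function called from other files gets a declaration before
 * its definition. In a real tree this line would sit in a shared header. */
uint32_t pack_hop_lim_node_id(uint8_t hop_limit, uint32_t node_id);

uint32_t pack_hop_lim_node_id(uint8_t hop_limit, uint32_t node_id)
{
	uint8_t byte = (uint8_t)(hop_limit - 1);

	/* Widen before shifting so the shift never reaches an int sign bit. */
	return ((uint32_t)byte << 24) | node_id_or_empty(node_id);
}
```

The packing mirrors lines 93 to 99 of the listing: hop limit minus one
in the top octet, node id in the low 24 bits, and all ones in the low
24 bits when no id is configured.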

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22959 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 23:11     ` kernel test robot
  2020-06-24 23:11     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 23:11 UTC (permalink / raw)
  To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman

[-- Attachment #1: Type: text/plain, Size: 5791 bytes --]

Hi Justin,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: i386-randconfig-s002-20200624 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=i386 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> net/ipv6/ioam6.c:149:36: sparse: sparse: cast to restricted __be32
>> net/ipv6/ioam6.c:149:36: sparse: sparse: cast from restricted __be64
>> net/ipv6/ioam6.c:81:6: sparse: sparse: symbol 'ioam6_fill_trace_data_node' was not declared. Should it be static?

Please review and possibly fold the followup patch.
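
For context on the first two warnings: line 149 casts a __be64 value
straight to __be32. A plain cast truncates in CPU byte order, so which
four octets of the big-endian field end up on the wire depends on the
host endianness. The sketch below shows one endian-safe way to narrow
such a field in portable userspace C. The helper names are
hypothetical, and keeping the low-order 32 bits is an assumption, since
the patch does not say which half is intended. In kernel code the
equivalent would presumably go through be64_to_cpu() and cpu_to_be32().

```c
#include <stdint.h>

/* Read a 64-bit big-endian field octet by octet, independent of the
 * host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Write a 32-bit value in big-endian (network) order. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Hypothetical helper: keep the low-order 32 bits of a big-endian
 * 64-bit field and emit them as a big-endian 32-bit field. */
static void narrow_be64_to_be32(uint8_t *dst, const uint8_t *src)
{
	store_be32(dst, (uint32_t)load_be64(src));
}
```

With this approach the same four octets are written on little-endian
and big-endian hosts, and the conversion is explicit enough for an
endianness checker to follow.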

vim +149 net/ipv6/ioam6.c

    80	
  > 81	void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
    82					u32 trace_type, struct ioam6_namespace *ns)
    83	{
    84		u8 *data = skb_network_header(skb) + nodeoff;
    85		struct __kernel_sock_timeval ts;
    86		u64 raw_u64;
    87		u32 raw_u32;
    88		u16 raw_u16;
    89		u8 byte;
    90	
    91		/* hop_lim and node_id */
    92		if (trace_type & IOAM6_TRACE_TYPE0) {
    93			byte = ipv6_hdr(skb)->hop_limit - 1;
    94			raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
    95			if (!raw_u32)
    96				raw_u32 = IOAM6_EMPTY_FIELD_u24;
    97			else
    98				raw_u32 &= IOAM6_EMPTY_FIELD_u24;
    99			*(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
   100			data += sizeof(__be32);
   101		}
   102	
   103		/* ingress_if_id and egress_if_id */
   104		if (trace_type & IOAM6_TRACE_TYPE1) {
   105			raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
   106			if (!raw_u16)
   107				raw_u16 = IOAM6_EMPTY_FIELD_u16;
   108			*(__be16 *)data = cpu_to_be16(raw_u16);
   109			data += sizeof(__be16);
   110	
   111			raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
   112			if (!raw_u16)
   113				raw_u16 = IOAM6_EMPTY_FIELD_u16;
   114			*(__be16 *)data = cpu_to_be16(raw_u16);
   115			data += sizeof(__be16);
   116		}
   117	
   118		/* timestamp seconds */
   119		if (trace_type & IOAM6_TRACE_TYPE2) {
   120			if (!skb->tstamp) {
   121				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   122			} else {
   123				skb_get_new_timestamp(skb, &ts);
   124				*(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
   125			}
   126			data += sizeof(__be32);
   127		}
   128	
   129		/* timestamp subseconds */
   130		if (trace_type & IOAM6_TRACE_TYPE3) {
   131			if (!skb->tstamp) {
   132				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   133			} else {
   134				if (!(trace_type & IOAM6_TRACE_TYPE2))
   135					skb_get_new_timestamp(skb, &ts);
   136				*(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
   137			}
   138			data += sizeof(__be32);
   139		}
   140	
   141		/* transit delay */
   142		if (trace_type & IOAM6_TRACE_TYPE4) {
   143			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   144			data += sizeof(__be32);
   145		}
   146	
   147		/* namespace data */
   148		if (trace_type & IOAM6_TRACE_TYPE5) {
 > 149			*(__be32 *)data = (__be32)ns->data;
   150			data += sizeof(__be32);
   151		}
   152	
   153		/* queue depth */
   154		if (trace_type & IOAM6_TRACE_TYPE6) {
   155			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   156			data += sizeof(__be32);
   157		}
   158	
   159		/* hop_lim and node_id (wide) */
   160		if (trace_type & IOAM6_TRACE_TYPE7) {
   161			byte = ipv6_hdr(skb)->hop_limit - 1;
   162			raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
   163			if (!raw_u64)
   164				raw_u64 = IOAM6_EMPTY_FIELD_u56;
   165			else
   166				raw_u64 &= IOAM6_EMPTY_FIELD_u56;
   167			*(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
   168			data += sizeof(__be64);
   169		}
   170	
   171		/* ingress_if_id and egress_if_id (wide) */
   172		if (trace_type & IOAM6_TRACE_TYPE8) {
   173			raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
   174			if (!raw_u32)
   175				raw_u32 = IOAM6_EMPTY_FIELD_u32;
   176			*(__be32 *)data = cpu_to_be32(raw_u32);
   177			data += sizeof(__be32);
   178	
   179			raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
   180			if (!raw_u32)
   181				raw_u32 = IOAM6_EMPTY_FIELD_u32;
   182			*(__be32 *)data = cpu_to_be32(raw_u32);
   183			data += sizeof(__be32);
   184		}
   185	
   186		/* namespace data (wide) */
   187		if (trace_type & IOAM6_TRACE_TYPE9) {
   188			*(__be64 *)data = ns->data;
   189			data += sizeof(__be64);
   190		}
   191	
   192		/* buffer occupancy */
   193		if (trace_type & IOAM6_TRACE_TYPE10) {
   194			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   195			data += sizeof(__be32);
   196		}
   197	
   198		/* checksum complement */
   199		if (trace_type & IOAM6_TRACE_TYPE11) {
   200			*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
   201			data += sizeof(__be32);
   202		}
   203	
   204		/* opaque state snapshot */
   205		if (trace_type & IOAM6_TRACE_TYPE22) {
   206			if (!ns->schema) {
   207				*(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
   208			} else {
   209				*(__be32 *)data = ns->schema->hdr;
   210				data += sizeof(__be32);
   211				memcpy(data, ns->schema->data, ns->schema->len);
   212			}
   213		}
   214	}
   215	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32299 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC PATCH] ipv6: ioam: ioam6_fill_trace_data_node() can be static
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 23:11     ` kernel test robot
  2020-06-24 23:11     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 23:11 UTC (permalink / raw)
  To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman


Signed-off-by: kernel test robot <lkp@intel.com>
---
 ioam6.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 406aa78eb504c..4a4e72bb54cc5 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -78,8 +78,8 @@ struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
 	return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
 }
 
-void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
-				u32 trace_type, struct ioam6_namespace *ns)
+static void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
+				       u32 trace_type, struct ioam6_namespace *ns)
 {
 	u8 *data = skb_network_header(skb) + nodeoff;
 	struct __kernel_sock_timeval ts;

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
@ 2020-06-25  2:32   ` Tom Herbert
  2020-06-25 17:56     ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25  2:32 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement the IOAM egress behavior.
>
> According to RFC 8200:
> "Extension headers (except for the Hop-by-Hop Options header) are not
>  processed, inserted, or deleted by any node along a packet's delivery
>  path, until the packet reaches the node (or each of the set of nodes,
>  in the case of multicast) identified in the Destination Address field
>  of the IPv6 header."
>
> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> incoming IPv6 packet with another similar IPv6 header that will contain
> IOAM data while it traverses the domain. When leaving, the egress node,
> another IOAM domain border which is also the tunnel destination, must
> decapsulate the packet.

This is just IP in IP encapsulation that happens to be terminated at
an egress node of the IOAM domain. The fact that it's IOAM isn't
germane; this IP in IP is done in a variety of ways. We should be
using the normal protocol handler for NEXTHDR_IPV6 instead of
special-case code.
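
To illustrate the table-driven approach suggested here, the following is a small user-space model, not kernel code. It assumes a simplified packet in which the Next Header field of the fixed IPv6 header directly names the payload (no extension headers in between), and every name in it (struct pkt, register_handler, demo_ip6ip6_rcv, deliver) is hypothetical. The decapsulation handler is looked up by protocol number and knows nothing about IOAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_IPV6 41	/* IPv6-in-IPv6 protocol number */
#define IPV6_HDR_LEN 40	/* size of the fixed IPv6 header */

struct pkt {
	const uint8_t *data;	/* start of the current IPv6 header */
	size_t len;		/* bytes remaining from data */
	int decap_count;	/* outer headers stripped so far */
};

typedef int (*proto_handler)(struct pkt *p);

static proto_handler handlers[256];

/* Register a handler for one protocol number; fails if already taken. */
static int register_handler(uint8_t nexthdr, proto_handler h)
{
	if (handlers[nexthdr])
		return -1;
	handlers[nexthdr] = h;
	return 0;
}

/* Generic IP-in-IP receive: strip the outer header so that the inner
 * packet can be delivered again through the same table.
 */
static int demo_ip6ip6_rcv(struct pkt *p)
{
	if (p->len < 2 * IPV6_HDR_LEN)
		return -1;
	p->data += IPV6_HDR_LEN;
	p->len -= IPV6_HDR_LEN;
	p->decap_count++;
	return 0;
}

/* Dispatch on the Next Header field (byte 6 of the fixed IPv6 header). */
static int deliver(struct pkt *p)
{
	proto_handler h;

	assert(p != NULL && p->data != NULL);
	if (p->len < IPV6_HDR_LEN)
		return -1;
	h = handlers[p->data[6]];
	return h ? h(p) : -1;
}
```

In this model the receive path needs no IOAM-specific branch: once a handler is registered for protocol 41, any IP in IP packet is decapsulated the same way regardless of why it was encapsulated.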

>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  include/linux/ipv6.h |  1 +
>  net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 2cb445a8fc9e..5312a718bc7a 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
>  #define IP6SKB_HOPBYHOP        32
>  #define IP6SKB_L3SLAVE         64
>  #define IP6SKB_JUMBOGRAM      128
> +#define IP6SKB_IOAM           256
>  };
>
>  #if defined(CONFIG_NET_L3_MASTER_DEV)
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index e96304d8a4a7..8cf75cc5e806 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
>  void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>                               bool have_final)
>  {
> +       struct inet6_skb_parm *opt = IP6CB(skb);
>         const struct inet6_protocol *ipprot;
>         struct inet6_dev *idev;
>         unsigned int nhoff;
> +       u8 hop_limit;
>         bool raw;
>
>         /*
> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>         } else {
>                 if (!raw) {
>                         if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> +                               /* IOAM Tunnel Decapsulation
> +                                * Packet is going to re-enter the stack
> +                                */
> +                               if (nexthdr == NEXTHDR_IPV6 &&
> +                                   (opt->flags & IP6SKB_IOAM)) {
> +                                       hop_limit = ipv6_hdr(skb)->hop_limit;
> +
> +                                       skb_reset_network_header(skb);
> +                                       skb_reset_transport_header(skb);
> +                                       skb->encapsulation = 0;
> +
> +                                       ipv6_hdr(skb)->hop_limit = hop_limit;
> +                                       __skb_tunnel_rx(skb, skb->dev,
> +                                                       dev_net(skb->dev));
> +
> +                                       netif_rx(skb);
> +                                       goto out;
> +                               }
> +
>                                 __IP6_INC_STATS(net, idev,
>                                                 IPSTATS_MIB_INUNKNOWNPROTOS);
>                                 icmpv6_send(skb, ICMPV6_PARAMPROB,
> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>                         consume_skb(skb);
>                 }
>         }
> +out:
>         return;
>
>  discard:
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
                     ` (2 preceding siblings ...)
  2020-06-24 23:11     ` kernel test robot
@ 2020-06-25  2:42   ` Tom Herbert
  2020-06-25 14:29   ` Tom Herbert
  4 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-25  2:42 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>
> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> packets. Default is drop.
>
> Another per-interface sysctl ioam6_id is provided to define the IOAM
> (unique) identifier of the interface.
>
> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> identifier of the node.
>
> Two relativistic hash tables: one for IOAM namespaces, the other for
> IOAM schemas. A namespace can only have a single active schema and a
> schema can only be attached to a single namespace (1:1 relationship).
>
>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>   [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  include/linux/ipv6.h       |   2 +
>  include/net/ioam6.h        |  98 +++++++++++
>  include/net/netns/ipv6.h   |   2 +
>  include/uapi/linux/in6.h   |   1 +
>  include/uapi/linux/ipv6.h  |   2 +
>  net/ipv6/Makefile          |   2 +-
>  net/ipv6/addrconf.c        |  20 +++
>  net/ipv6/af_inet6.c        |   7 +
>  net/ipv6/exthdrs.c         |  67 ++++++++
>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
>  net/ipv6/sysctl_net_ipv6.c |   7 +
>  11 files changed, 533 insertions(+), 1 deletion(-)
>  create mode 100644 include/net/ioam6.h
>  create mode 100644 net/ipv6/ioam6.c
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 5312a718bc7a..15732f964c6e 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>         __s32           disable_policy;
>         __s32           ndisc_tclass;
>         __s32           rpl_seg_enabled;
> +       __u32           ioam6_enabled;
> +       __u32           ioam6_id;
>
>         struct ctl_table_header *sysctl_header;
>  };
> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> new file mode 100644
> index 000000000000..2a910bc99947
> --- /dev/null
> +++ b/include/net/ioam6.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + *  IOAM IPv6 implementation
> + *
> + *  Author:
> + *  Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#ifndef _NET_IOAM6_H
> +#define _NET_IOAM6_H
> +
> +#include <linux/net.h>
> +#include <linux/ipv6.h>
> +#include <linux/rhashtable-types.h>
> +
> +#define IOAM6_OPT_TRACE_PREALLOC 0
> +
> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> +
> +#define IOAM6_TRACE_TYPE0  (1 << 31)
> +#define IOAM6_TRACE_TYPE1  (1 << 30)
> +#define IOAM6_TRACE_TYPE2  (1 << 29)
> +#define IOAM6_TRACE_TYPE3  (1 << 28)
> +#define IOAM6_TRACE_TYPE4  (1 << 27)
> +#define IOAM6_TRACE_TYPE5  (1 << 26)
> +#define IOAM6_TRACE_TYPE6  (1 << 25)
> +#define IOAM6_TRACE_TYPE7  (1 << 24)
> +#define IOAM6_TRACE_TYPE8  (1 << 23)
> +#define IOAM6_TRACE_TYPE9  (1 << 22)
> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> +
> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> +
> +struct ioam6_common_hdr {
> +       u8 opt_type;
> +       u8 opt_len;
> +       u8 res;
> +       u8 ioam_type;
> +       __be16 namespace_id;
> +} __packed;
> +
> +struct ioam6_trace_hdr {
> +       __be16 info;
> +       __be32 type;
> +} __packed;
> +
> +struct ioam6_namespace {
> +       struct rhash_head head;
> +       struct rcu_head rcu;
> +
> +       __be16 id;
> +       __be64 data;
> +       bool remove_tlv;
> +
> +       struct ioam6_schema *schema;
> +};
> +
> +struct ioam6_schema {
> +       struct rhash_head head;
> +       struct rcu_head rcu;
> +
> +       u32 id;
> +       int len;
> +       __be32 hdr;
> +       u8 *data;
> +
> +       struct ioam6_namespace *ns;
> +};
> +
> +struct ioam6_pernet_data {
> +       struct mutex lock;
> +       struct rhashtable namespaces;
> +       struct rhashtable schemas;
> +};
> +
> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> +       return net->ipv6.ioam6_data;
> +#else
> +       return NULL;
> +#endif
> +}
> +
> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> +                                 struct ioam6_namespace *ns);
> +
> +extern int ioam6_init(void);
> +extern void ioam6_exit(void);
> +
> +#endif
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 5ec054473d81..89b27fa721f4 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>         int max_hbh_opts_len;
>         int seg6_flowlabel;
>         bool skip_notify_on_dev_down;
> +       unsigned int ioam6_id;
>  };
>
>  struct netns_ipv6 {
> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>                 spinlock_t      lock;
>                 u32             seq;
>         } ip6addrlbl_table;
> +       struct ioam6_pernet_data *ioam6_data;
>  };
>
>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 9f2273a08356..1c98435220c9 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>  #define IPV6_TLV_PADN          1
>  #define IPV6_TLV_ROUTERALERT   5
>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
> +#define IPV6_TLV_IOAM_HOPOPTS  49
>  #define IPV6_TLV_JUMBO         194
>  #define IPV6_TLV_HAO           201     /* home address option */
>
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 13e8751bf24a..eb521b2dd885 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -189,6 +189,8 @@ enum {
>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>         DEVCONF_NDISC_TCLASS,
>         DEVCONF_RPL_SEG_ENABLED,
> +       DEVCONF_IOAM6_ENABLED,
> +       DEVCONF_IOAM6_ID,
>         DEVCONF_MAX
>  };
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index cf7b47bdb9b3..b7ef10d417d6 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>
>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 840bfdb3d7bd..6c952a28ade2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>         .disable_policy         = 0,
>         .rpl_seg_enabled        = 0,
> +       .ioam6_enabled          = 0,
> +       .ioam6_id               = 0,
>  };
>
>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>         .disable_policy         = 0,
>         .rpl_seg_enabled        = 0,
> +       .ioam6_enabled          = 0,
> +       .ioam6_id               = 0,
>  };
>
>  /* Check if link is ready: is it up and is a valid qdisc available */
> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>  }
>
>  static inline size_t inet6_ifla6_size(void)
> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec,
>         },
> +       {
> +               .procname       = "ioam6_enabled",
> +               .data           = &ipv6_devconf.ioam6_enabled,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
> +       {
> +               .procname       = "ioam6_id",
> +               .data           = &ipv6_devconf.ioam6_id,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
>         {
>                 /* sentinel */
>         }
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index b304b882e031..63a9ffc4b283 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -62,6 +62,7 @@
>  #include <net/rpl.h>
>  #include <net/compat.h>
>  #include <net/xfrm.h>
> +#include <net/ioam6.h>
>
>  #include <linux/uaccess.h>
>  #include <linux/mroute6.h>
> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>         if (err)
>                 goto rpl_fail;
>
> +       err = ioam6_init();
> +       if (err)
> +               goto ioam6_fail;
> +
>         err = igmp6_late_init();
>         if (err)
>                 goto igmp6_late_err;
> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>  #endif
>  igmp6_late_err:
>         rpl_exit();
> +ioam6_fail:
> +       ioam6_exit();
>  rpl_fail:
>         seg6_exit();
>  seg6_fail:
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index f27ab3bf2e0c..00aee1358f1c 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -49,6 +49,8 @@
>  #include <net/seg6_hmac.h>
>  #endif
>  #include <net/rpl.h>
> +#include <net/ioam6.h>
> +#include <net/dst_metadata.h>
>
>  #include <linux/uaccess.h>
>
> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>         return TLV_REJECT;
>  }
>
> +/* IOAM */
> +
> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> +{
> +       struct ioam6_common_hdr *ioamh;
> +       struct ioam6_namespace *ns;
> +
> +       /* Must be 4n-aligned */
> +       if (optoff & 3)
> +               goto drop;
> +
> +       if (!skb_valid_dst(skb))
> +               ip6_route_input(skb);
> +
> +       /* IOAM must be enabled on ingress interface */
> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> +               goto drop;
> +
> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> +
> +       /* Unknown IOAM namespace, either:
> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
> +        *  - Ignore it otherwise
> +        */
> +       if (!ns) {
> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +                       goto drop;
> +
> +               goto accept;
> +       }
> +
> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +               goto remove;
> +
> +       /* Known IOAM namespace which must not be removed:
> +        * IOAM must be enabled on egress interface
> +        */
> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +               goto drop;
> +
> +       switch (ioamh->ioam_type) {
> +       case IOAM6_OPT_TRACE_PREALLOC:
> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
> +               break;
> +       default:
> +               break;
> +       }
> +
> +accept:
> +       return TLV_ACCEPT;
> +remove:
> +       return TLV_REMOVE;
> +drop:
> +       kfree_skb(skb);
> +       return TLV_REJECT;
> +}

Hardcoding another TLV in exthdrs.c. I still hope we can eventually
allow TLVs to be registered from modules, as any other protocol does...
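
As a rough sketch of what registering TLV handlers at run time could look like, here is a user-space model rather than a proposal for the kernel interface. All names (tlv_register, tlv_unregister, tlv_dispatch, demo_ioam_handler) are hypothetical. The fallback for unregistered options follows RFC 8200, where the two high-order bits of the option type select the action (00 means skip, anything else means discard), and the demo handler reproduces only the 4n alignment check from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tlv_verdict { TLV_SKIP, TLV_DROP, TLV_OK };

typedef enum tlv_verdict (*tlv_handler)(const uint8_t *opt, size_t optoff);

static tlv_handler tlv_table[256];

/* Register a handler for one option type; fails if the slot is taken. */
static int tlv_register(uint8_t type, tlv_handler h)
{
	if (tlv_table[type])
		return -1;
	tlv_table[type] = h;
	return 0;
}

static void tlv_unregister(uint8_t type)
{
	tlv_table[type] = NULL;
}

/* Call the registered handler, or fall back to the RFC 8200 action
 * encoded in the two high-order bits of the option type.
 */
static enum tlv_verdict tlv_dispatch(const uint8_t *opt, size_t optoff)
{
	tlv_handler h;

	assert(opt != NULL);
	h = tlv_table[opt[0]];
	if (h)
		return h(opt, optoff);
	return (opt[0] >> 6) == 0 ? TLV_SKIP : TLV_DROP;
}

/* Stand-in for an IOAM handler: only the alignment check is modelled. */
static enum tlv_verdict demo_ioam_handler(const uint8_t *opt, size_t optoff)
{
	(void)opt;
	return (optoff & 3) ? TLV_DROP : TLV_OK;
}
```

With nothing registered, option type 49 (the IOAM value used in this patch) is skipped because its two high-order bits are 00, while type 194 (Jumbo) is dropped. After tlv_register(49, demo_ioam_handler), the same option is accepted at a 4n-aligned offset and rejected otherwise.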

> +
>  /* Jumbo payload */
>
>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>                 .type   = IPV6_TLV_ROUTERALERT,
>                 .func   = ipv6_hop_ra,
>         },
> +       {
> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
> +               .func   = ipv6_hop_ioam,
> +       },
>         {
>                 .type   = IPV6_TLV_JUMBO,
>                 .func   = ipv6_hop_jumbo,
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> new file mode 100644
> index 000000000000..406aa78eb504
> --- /dev/null
> +++ b/net/ipv6/ioam6.c
> @@ -0,0 +1,326 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + *  IOAM IPv6 implementation
> + *
> + *  Author:
> + *  Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/net.h>
> +#include <linux/rhashtable.h>
> +
> +#include <net/addrconf.h>
> +#include <net/ioam6.h>
> +
> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> +{
> +       kfree_rcu(ns, rcu);
> +}
> +
> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> +{
> +       kfree_rcu(sc, rcu);
> +}
> +
> +static void ioam6_free_ns(void *ptr, void *arg)
> +{
> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> +
> +       if (ns)
> +               ioam6_ns_release(ns);
> +}
> +
> +static void ioam6_free_sc(void *ptr, void *arg)
> +{
> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> +
> +       if (sc)
> +               ioam6_sc_release(sc);
> +}
> +
> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> +       const struct ioam6_namespace *ns = obj;
> +
> +       return (ns->id != *(__be16 *)arg->key);
> +}
> +
> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> +       const struct ioam6_schema *sc = obj;
> +
> +       return (sc->id != *(u32 *)arg->key);
> +}
> +
> +static const struct rhashtable_params rht_ns_params = {
> +       .key_len                = sizeof(__be16),
> +       .key_offset             = offsetof(struct ioam6_namespace, id),
> +       .head_offset            = offsetof(struct ioam6_namespace, head),
> +       .automatic_shrinking    = true,
> +       .obj_cmpfn              = ioam6_ns_cmpfn,
> +};
> +
> +static const struct rhashtable_params rht_sc_params = {
> +       .key_len                = sizeof(u32),
> +       .key_offset             = offsetof(struct ioam6_schema, id),
> +       .head_offset            = offsetof(struct ioam6_schema, head),
> +       .automatic_shrinking    = true,
> +       .obj_cmpfn              = ioam6_sc_cmpfn,
> +};
> +
> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> +{
> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> +}
> +
> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> +                               u32 trace_type, struct ioam6_namespace *ns)
> +{
> +       u8 *data = skb_network_header(skb) + nodeoff;
> +       struct __kernel_sock_timeval ts;
> +       u64 raw_u64;
> +       u32 raw_u32;
> +       u16 raw_u16;
> +       u8 byte;
> +
> +       /* hop_lim and node_id */
> +       if (trace_type & IOAM6_TRACE_TYPE0) {
> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
> +               else
> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* ingress_if_id and egress_if_id */
> +       if (trace_type & IOAM6_TRACE_TYPE1) {
> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> +               if (!raw_u16)
> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> +               data += sizeof(__be16);
> +
> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> +               if (!raw_u16)
> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> +               data += sizeof(__be16);
> +       }
> +
> +       /* timestamp seconds */
> +       if (trace_type & IOAM6_TRACE_TYPE2) {
> +               if (!skb->tstamp) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               } else {
> +                       skb_get_new_timestamp(skb, &ts);
> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> +               }
> +               data += sizeof(__be32);
> +       }
> +
> +       /* timestamp subseconds */
> +       if (trace_type & IOAM6_TRACE_TYPE3) {
> +               if (!skb->tstamp) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               } else {
> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
> +                               skb_get_new_timestamp(skb, &ts);
> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> +               }
> +               data += sizeof(__be32);
> +       }
> +
> +       /* transit delay */
> +       if (trace_type & IOAM6_TRACE_TYPE4) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* namespace data */
> +       if (trace_type & IOAM6_TRACE_TYPE5) {
> +               *(__be32 *)data = (__be32)ns->data;
> +               data += sizeof(__be32);
> +       }
> +
> +       /* queue depth */
> +       if (trace_type & IOAM6_TRACE_TYPE6) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* hop_lim and node_id (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE7) {
> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> +               if (!raw_u64)
> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
> +               else
> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> +               data += sizeof(__be64);
> +       }
> +
> +       /* ingress_if_id and egress_if_id (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE8) {
> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> +               *(__be32 *)data = cpu_to_be32(raw_u32);

Hmm, I wonder if the compiler is implementing this as:

 *(__be32 *)data = raw_u32 ? cpu_to_be32(raw_u32) :
                             cpu_to_be32(IOAM6_EMPTY_FIELD_u32);

That is, it realizes that cpu_to_be32(IOAM6_EMPTY_FIELD_u32) is a constant
expression.
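
As a rough userspace illustration of the two forms (htonl() stands in for
cpu_to_be32(), and EMPTY_FIELD_U32 is an assumed stand-in for
IOAM6_EMPTY_FIELD_u32, whose real value lives in the patch's headers), both
functions below store the same bytes. Whether the compiler actually folds the
constant can only be confirmed by looking at the generated assembly:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Assumed placeholder value; not taken from the patch. */
#define EMPTY_FIELD_U32 0xffffffffu

/* Form used in the patch: substitute the placeholder, then byte-swap. */
static void store_swap_after(uint8_t *data, uint32_t raw)
{
	uint32_t be;

	if (!raw)
		raw = EMPTY_FIELD_U32;
	be = htonl(raw);
	memcpy(data, &be, sizeof(be));
}

/* Form from the comment: the placeholder branch swaps a constant. */
static void store_select_swapped(uint8_t *data, uint32_t raw)
{
	uint32_t be = raw ? htonl(raw) : htonl(EMPTY_FIELD_U32);

	memcpy(data, &be, sizeof(be));
}
```

With an all-ones placeholder the swap of the constant is a no-op anyway, so
the question mostly matters for placeholders that are not byte-symmetric.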

> +               data += sizeof(__be32);
> +
> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* namespace data (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE9) {
> +               *(__be64 *)data = ns->data;
> +               data += sizeof(__be64);
> +       }
> +
> +       /* buffer occupancy */
> +       if (trace_type & IOAM6_TRACE_TYPE10) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* checksum complement */
> +       if (trace_type & IOAM6_TRACE_TYPE11) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* opaque state snapshot */
> +       if (trace_type & IOAM6_TRACE_TYPE22) {
> +               if (!ns->schema) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> +               } else {
> +                       *(__be32 *)data = ns->schema->hdr;
> +                       data += sizeof(__be32);
> +                       memcpy(data, ns->schema->data, ns->schema->len);
> +               }
> +       }
> +}
> +
> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> +                          struct ioam6_namespace *ns)
> +{
> +       u8 nodelen, flags, remlen, sclen = 0;
> +       struct ioam6_trace_hdr *trh;
> +       int nodeoff;
> +       u16 info;
> +       u32 type;
> +
> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> +       info = be16_to_cpu(trh->info);
> +       type = be32_to_cpu(trh->type);
> +
> +       nodelen = info >> 11;
> +       flags = (info >> 7) & 0xf;
> +       remlen = info & 0x7f;
> +
> +       /* Skip if Overflow bit is set OR
> +        * if an unknown type (bit 12-21) is set
> +        */
> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> +               return;
> +
> +       /* NodeLen does not include Opaque State Snapshot length. We need to
> +        * take it into account if the corresponding bit is set and if current
> +        * IOAM namespace has an active schema attached to it
> +        */
> +       if (type & IOAM6_TRACE_TYPE22) {
> +               /* Opaque State Snapshot header size */
> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> +
> +               if (ns->schema)
> +                       sclen += ns->schema->len / 4;
> +       }
> +
> +       /* Not enough space remaining: set Overflow bit and skip */
> +       if (!remlen || remlen < (nodelen + sclen)) {
> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> +               trh->info = cpu_to_be16(info);
> +               return;
> +       }
> +
> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> +
> +       /* Update RemainingLen */
> +       remlen -= nodelen + sclen;
> +       info = (info & 0xff80) | remlen;
> +       trh->info = cpu_to_be16(info);
> +}
> +
> +static int __net_init ioam6_net_init(struct net *net)
> +{
> +       struct ioam6_pernet_data *nsdata;
> +       int err = -ENOMEM;
> +
> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> +       if (!nsdata)
> +               goto out;
> +
> +       mutex_init(&nsdata->lock);
> +       net->ipv6.ioam6_data = nsdata;
> +
> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> +       if (err)
> +               goto free_nsdata;
> +
> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> +       if (err)
> +               goto free_rht_ns;
> +
> +out:
> +       return err;
> +free_rht_ns:
> +       rhashtable_destroy(&nsdata->namespaces);
> +free_nsdata:
> +       kfree(nsdata);
> +       net->ipv6.ioam6_data = NULL;
> +       goto out;
> +}
> +
> +static void __net_exit ioam6_net_exit(struct net *net)
> +{
> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> +
> +       kfree(nsdata);
> +}
> +
> +static struct pernet_operations ioam6_net_ops = {
> +       .init = ioam6_net_init,
> +       .exit = ioam6_net_exit,
> +};
> +
> +int __init ioam6_init(void)
> +{
> +       int err = register_pernet_subsys(&ioam6_net_ops);
> +
> +       if (err)
> +               return err;
> +
> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
> +       return 0;
> +}
> +
> +void ioam6_exit(void)
> +{
> +       unregister_pernet_subsys(&ioam6_net_ops);
> +}
> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index fac2135aa47b..da49b33ab6fc 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec
>         },
> +       {
> +               .procname       = "ioam6_id",
> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec
> +       },
>         { }
>  };
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
  2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
@ 2020-06-25  2:53   ` Tom Herbert
  2020-06-25 18:00     ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25  2:53 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Add documentation for new IOAM sysctls:
>  - ioam6_id: a namespace sysctl
>  - ioam6_enabled and ioam6_id: two per-interface sysctls
>
Are you planning to add a more detailed description of the feature and
how to use it? (Would be nice, I think :-) )

> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
>  Documentation/networking/ip-sysctl.rst    |  5 +++++
>  2 files changed, 25 insertions(+)
>  create mode 100644 Documentation/networking/ioam6-sysctl.rst
>
> diff --git a/Documentation/networking/ioam6-sysctl.rst b/Documentation/networking/ioam6-sysctl.rst
> new file mode 100644
> index 000000000000..bad6c64907bc
> --- /dev/null
> +++ b/Documentation/networking/ioam6-sysctl.rst
> @@ -0,0 +1,20 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +IOAM6 Sysfs variables
> +=====================
> +
> +
> +/proc/sys/net/conf/<iface>/ioam6_* variables:
> +============================================
> +
> +ioam6_enabled - BOOL
> +       Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
> +
> +       * 0 - disabled (default)
> +       * not 0 - enabled
> +
> +ioam6_id - INTEGER
> +       Define the IOAM id of this interface.
> +
> +       Default is 0.
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index b72f89d5694c..5ba11f2766bd 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
>         and extraneous notifications.
>         Default: true (backward compat mode)
>
> +ioam6_id - INTEGER
> +       Define the IOAM id of this node.
> +
> +       Default: 0
> +
>  IPv6 Fragmentation:
>
>  ip6frag_high_thresh - INTEGER
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM
  2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
  2020-06-25 10:52     ` Dan Carpenter
@ 2020-06-25 10:52     ` Dan Carpenter
  0 siblings, 0 replies; 42+ messages in thread
From: Dan Carpenter @ 2020-06-25 10:52 UTC (permalink / raw)
  To: kbuild, Justin Iurman, netdev; +Cc: lkp, kbuild-all, davem, justin.iurman

[-- Attachment #1: Type: text/plain, Size: 6579 bytes --]

Hi Justin,

url:    https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: microblaze-randconfig-m031-20200624 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
net/ipv6/ioam6.c:164 ioam6_genl_delns() error: we previously assumed 'ns->schema' could be null (see line 158)
net/ipv6/ioam6.c:358 ioam6_genl_delsc() error: we previously assumed 'sc->ns' could be null (see line 352)

# https://github.com/0day-ci/linux/commit/ce303f2d7c40f84739505f1daa7dac53daa6c4c5
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout ce303f2d7c40f84739505f1daa7dac53daa6c4c5
vim +164 net/ipv6/ioam6.c

ce303f2d7c40f8 Justin Iurman 2020-06-24  135  
ce303f2d7c40f8 Justin Iurman 2020-06-24  136  static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24  137  {
ce303f2d7c40f8 Justin Iurman 2020-06-24  138  	struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24  139  	struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24  140  	struct ioam6_namespace *ns;
ce303f2d7c40f8 Justin Iurman 2020-06-24  141  	__be16 ns_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24  142  	int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  143  
ce303f2d7c40f8 Justin Iurman 2020-06-24  144  	if (!info->attrs[IOAM6_ATTR_NS_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24  145  		return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  146  
ce303f2d7c40f8 Justin Iurman 2020-06-24  147  	ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
ce303f2d7c40f8 Justin Iurman 2020-06-24  148  	nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24  149  
ce303f2d7c40f8 Justin Iurman 2020-06-24  150  	mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  151  
ce303f2d7c40f8 Justin Iurman 2020-06-24  152  	ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  153  	if (!ns) {
ce303f2d7c40f8 Justin Iurman 2020-06-24  154  		err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24  155  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  156  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  157  
ce303f2d7c40f8 Justin Iurman 2020-06-24 @158  	if (ns->schema)
                                                    ^^^^^^^^^^
Check for NULL

ce303f2d7c40f8 Justin Iurman 2020-06-24  159  		ns->schema->ns = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  160  
ce303f2d7c40f8 Justin Iurman 2020-06-24  161  	err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24  162  				     rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  163  	if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @164  		ns->schema->ns = ns;
                                                        ^^^^^^^^^^^^^^
Unchecked dereference.

ce303f2d7c40f8 Justin Iurman 2020-06-24  165  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  166  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  167  
ce303f2d7c40f8 Justin Iurman 2020-06-24  168  	ioam6_ns_release(ns);
ce303f2d7c40f8 Justin Iurman 2020-06-24  169  
ce303f2d7c40f8 Justin Iurman 2020-06-24  170  out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24  171  	mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  172  	return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  173  }

[ snip ]

ce303f2d7c40f8 Justin Iurman 2020-06-24  330  static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24  331  {
ce303f2d7c40f8 Justin Iurman 2020-06-24  332  	struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24  333  	struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24  334  	struct ioam6_schema *sc;
ce303f2d7c40f8 Justin Iurman 2020-06-24  335  	u32 sc_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24  336  	int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  337  
ce303f2d7c40f8 Justin Iurman 2020-06-24  338  	if (!info->attrs[IOAM6_ATTR_SC_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24  339  		return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  340  
ce303f2d7c40f8 Justin Iurman 2020-06-24  341  	sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
ce303f2d7c40f8 Justin Iurman 2020-06-24  342  	nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24  343  
ce303f2d7c40f8 Justin Iurman 2020-06-24  344  	mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  345  
ce303f2d7c40f8 Justin Iurman 2020-06-24  346  	sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  347  	if (!sc) {
ce303f2d7c40f8 Justin Iurman 2020-06-24  348  		err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24  349  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  350  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  351  
ce303f2d7c40f8 Justin Iurman 2020-06-24 @352  	if (sc->ns)
                                                    ^^^^^^
Check for NULL

ce303f2d7c40f8 Justin Iurman 2020-06-24  353  		sc->ns->schema = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  354  
ce303f2d7c40f8 Justin Iurman 2020-06-24  355  	err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24  356  				     rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  357  	if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @358  		sc->ns->schema = sc;
                                                        ^^^^^^^^^^^^^^
Unchecked dereference

ce303f2d7c40f8 Justin Iurman 2020-06-24  359  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  360  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  361  
ce303f2d7c40f8 Justin Iurman 2020-06-24  362  	ioam6_sc_release(sc);
ce303f2d7c40f8 Justin Iurman 2020-06-24  363  
ce303f2d7c40f8 Justin Iurman 2020-06-24  364  out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24  365  	mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  366  	return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  367  }
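
One possible way to address both warnings is to remember the peer pointer
before clearing it and to roll back only when it was non-NULL. The sketch
below is plain userspace C with hypothetical, stripped-down types:
struct ioam_ns, struct ioam_sc, del_namespace() and the remove_fn callback
(standing in for rhashtable_remove_fast()) are illustrative names, not code
from the patch.

```c
#include <stddef.h>

/* Hypothetical simplified versions of the namespace/schema pair. */
struct ioam_sc;

struct ioam_ns {
	struct ioam_sc *schema;
};

struct ioam_sc {
	struct ioam_ns *ns;
};

/*
 * Detach ns from its schema, try to remove it, and undo the detach on
 * failure. remove_fn returns 0 on success or a negative error code.
 */
static int del_namespace(struct ioam_ns *ns,
			 int (*remove_fn)(struct ioam_ns *))
{
	struct ioam_sc *sc = ns->schema;
	int err;

	if (sc)
		sc->ns = NULL;

	err = remove_fn(ns);
	if (err) {
		/* Roll back only if there was a schema to begin with. */
		if (sc)
			sc->ns = ns;
		return err;
	}

	return 0;
}

/* Stub removal callbacks used to exercise both paths. */
static int remove_ok(struct ioam_ns *ns) { (void)ns; return 0; }
static int remove_fail(struct ioam_ns *ns) { (void)ns; return -2; }
```

The same pattern would apply symmetrically to the schema deletion path, with
sc->ns saved before it is cleared.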

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26285 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM
@ 2020-06-25 10:52     ` Dan Carpenter
  0 siblings, 0 replies; 42+ messages in thread
From: Dan Carpenter @ 2020-06-25 10:52 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6698 bytes --]

Hi Justin,

url:    https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: microblaze-randconfig-m031-20200624 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
net/ipv6/ioam6.c:164 ioam6_genl_delns() error: we previously assumed 'ns->schema' could be null (see line 158)
net/ipv6/ioam6.c:358 ioam6_genl_delsc() error: we previously assumed 'sc->ns' could be null (see line 352)

# https://github.com/0day-ci/linux/commit/ce303f2d7c40f84739505f1daa7dac53daa6c4c5
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout ce303f2d7c40f84739505f1daa7dac53daa6c4c5
vim +164 net/ipv6/ioam6.c

ce303f2d7c40f8 Justin Iurman 2020-06-24  135  
ce303f2d7c40f8 Justin Iurman 2020-06-24  136  static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24  137  {
ce303f2d7c40f8 Justin Iurman 2020-06-24  138  	struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24  139  	struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24  140  	struct ioam6_namespace *ns;
ce303f2d7c40f8 Justin Iurman 2020-06-24  141  	__be16 ns_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24  142  	int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  143  
ce303f2d7c40f8 Justin Iurman 2020-06-24  144  	if (!info->attrs[IOAM6_ATTR_NS_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24  145  		return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  146  
ce303f2d7c40f8 Justin Iurman 2020-06-24  147  	ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
ce303f2d7c40f8 Justin Iurman 2020-06-24  148  	nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24  149  
ce303f2d7c40f8 Justin Iurman 2020-06-24  150  	mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  151  
ce303f2d7c40f8 Justin Iurman 2020-06-24  152  	ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  153  	if (!ns) {
ce303f2d7c40f8 Justin Iurman 2020-06-24  154  		err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24  155  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  156  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  157  
ce303f2d7c40f8 Justin Iurman 2020-06-24 @158  	if (ns->schema)
                                                    ^^^^^^^^^^
Check for NULL

ce303f2d7c40f8 Justin Iurman 2020-06-24  159  		ns->schema->ns = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  160  
ce303f2d7c40f8 Justin Iurman 2020-06-24  161  	err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24  162  				     rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  163  	if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @164  		ns->schema->ns = ns;
                                                        ^^^^^^^^^^^^^^
Unchecked dereference.

ce303f2d7c40f8 Justin Iurman 2020-06-24  165  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  166  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  167  
ce303f2d7c40f8 Justin Iurman 2020-06-24  168  	ioam6_ns_release(ns);
ce303f2d7c40f8 Justin Iurman 2020-06-24  169  
ce303f2d7c40f8 Justin Iurman 2020-06-24  170  out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24  171  	mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  172  	return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  173  }

[ snip ]

ce303f2d7c40f8 Justin Iurman 2020-06-24  330  static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24  331  {
ce303f2d7c40f8 Justin Iurman 2020-06-24  332  	struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24  333  	struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24  334  	struct ioam6_schema *sc;
ce303f2d7c40f8 Justin Iurman 2020-06-24  335  	u32 sc_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24  336  	int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  337  
ce303f2d7c40f8 Justin Iurman 2020-06-24  338  	if (!info->attrs[IOAM6_ATTR_SC_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24  339  		return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  340  
ce303f2d7c40f8 Justin Iurman 2020-06-24  341  	sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
ce303f2d7c40f8 Justin Iurman 2020-06-24  342  	nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24  343  
ce303f2d7c40f8 Justin Iurman 2020-06-24  344  	mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  345  
ce303f2d7c40f8 Justin Iurman 2020-06-24  346  	sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  347  	if (!sc) {
ce303f2d7c40f8 Justin Iurman 2020-06-24  348  		err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24  349  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  350  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  351  
ce303f2d7c40f8 Justin Iurman 2020-06-24 @352  	if (sc->ns)
                                                    ^^^^^^
Check for NULL

ce303f2d7c40f8 Justin Iurman 2020-06-24  353  		sc->ns->schema = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24  354  
ce303f2d7c40f8 Justin Iurman 2020-06-24  355  	err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24  356  				     rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24  357  	if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @358  		sc->ns->schema = sc;
                                                        ^^^^^^^^^^^^^^
Unchecked dereference

ce303f2d7c40f8 Justin Iurman 2020-06-24  359  		goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24  360  	}
ce303f2d7c40f8 Justin Iurman 2020-06-24  361  
ce303f2d7c40f8 Justin Iurman 2020-06-24  362  	ioam6_sc_release(sc);
ce303f2d7c40f8 Justin Iurman 2020-06-24  363  
ce303f2d7c40f8 Justin Iurman 2020-06-24  364  out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24  365  	mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24  366  	return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24  367  }
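
Both warnings share one pattern: the back-pointer is NULL-checked before it is
cleared, but the error path restores it without repeating the check. One
possible way to address this, shown as a minimal userspace sketch rather than
the actual fix, is to cache the pointer and guard the rollback with the same
check. The types `ns_model`/`sc_model` and the helper `remove_model()` are
hypothetical stand-ins for the kernel structures and for
rhashtable_remove_fast(); locking and RCU release are left out.

```c
#include <errno.h>
#include <stddef.h>

struct sc_model;

/* Hypothetical stand-ins for struct ioam6_namespace / struct ioam6_schema */
struct ns_model {
	struct sc_model *schema;	/* may be NULL: no schema attached */
};

struct sc_model {
	struct ns_model *ns;		/* may be NULL: not attached */
};

/* Hypothetical stand-in for rhashtable_remove_fast(); fails on demand */
static int remove_model(struct ns_model *ns, int fail)
{
	(void)ns;
	return fail ? -ENOENT : 0;
}

static int del_ns(struct ns_model *ns, int fail)
{
	struct sc_model *sc = ns->schema;	/* cache once, may be NULL */
	int err;

	if (sc)
		sc->ns = NULL;			/* detach before removal */

	err = remove_model(ns, fail);
	if (err) {
		if (sc)				/* same check on rollback */
			sc->ns = ns;
		return err;
	}

	return 0;
}
```

The same shape would apply to the schema side, with the roles of the two
pointers swapped.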

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 26285 bytes --]


* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
                     ` (3 preceding siblings ...)
  2020-06-25  2:42   ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Tom Herbert
@ 2020-06-25 14:29   ` Tom Herbert
  2020-06-25 18:23     ` Justin Iurman
  4 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 14:29 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>

The IANA allocation is TEMPORARY, with an expiration date of
4/16/2021. Note from RFC 7120:

"Implementers and deployers need to be aware that deprecation and
de-allocation could take place at any time after expiry; therefore, an
expired early allocation is best considered as deprecated."

Please add a comment in the code and in the Documentation to this effect.

> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> packets. Default is drop.

I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
packet containing the IOAM HBH option. Note that the act bits of the
option type are 00, which means the TLV is skipped if the option isn't
processed, so I don't think it's correct to drop these packets by
default.
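
To make the point about the act bits concrete, here is a small sketch of how
the two highest-order bits of an option type encode the action for a node that
does not recognize the option (RFC 8200, section 4.2), and how the third bit
tells whether the option data may change en route. The enum and helper names
are made up for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Action for an unrecognized option, from the top two bits of its type */
enum tlv_unrec_action {
	TLV_ACT_SKIP = 0,		/* 00: skip over the option */
	TLV_ACT_DISCARD = 1,		/* 01: discard the packet */
	TLV_ACT_DISCARD_ICMP = 2,	/* 10: discard, send ICMP Parameter Problem */
	TLV_ACT_DISCARD_ICMP_UCAST = 3,	/* 11: same, unless destination was multicast */
};

/* Hypothetical helper: extract the act bits (bits 7-6) */
static inline enum tlv_unrec_action tlv_unrec_act(uint8_t opt_type)
{
	return (enum tlv_unrec_action)(opt_type >> 6);
}

/* Hypothetical helper: extract the chg bit (bit 5) */
static inline int tlv_data_may_change(uint8_t opt_type)
{
	return (opt_type >> 5) & 1;
}
```

With the option type 49 (0x31) used here, the act bits are 00 (skip) and the
chg bit is 1; for comparison, the Jumbo option type 194 (0xC2) has act bits 11
and chg bit 0.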

>
> Another per-interface sysctl ioam6_id is provided to define the IOAM
> (unique) identifier of the interface.
>
> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> identifier of the node.
>
> Two relativistic hash tables: one for IOAM namespaces, the other for
> IOAM schemas. A namespace can only have a single active schema and a
> schema can only be attached to a single namespace (1:1 relationship).
>
>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>   [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
>  include/linux/ipv6.h       |   2 +
>  include/net/ioam6.h        |  98 +++++++++++
>  include/net/netns/ipv6.h   |   2 +
>  include/uapi/linux/in6.h   |   1 +
>  include/uapi/linux/ipv6.h  |   2 +
>  net/ipv6/Makefile          |   2 +-
>  net/ipv6/addrconf.c        |  20 +++
>  net/ipv6/af_inet6.c        |   7 +
>  net/ipv6/exthdrs.c         |  67 ++++++++
>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
>  net/ipv6/sysctl_net_ipv6.c |   7 +
>  11 files changed, 533 insertions(+), 1 deletion(-)
>  create mode 100644 include/net/ioam6.h
>  create mode 100644 net/ipv6/ioam6.c
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 5312a718bc7a..15732f964c6e 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>         __s32           disable_policy;
>         __s32           ndisc_tclass;
>         __s32           rpl_seg_enabled;
> +       __u32           ioam6_enabled;
> +       __u32           ioam6_id;
>
>         struct ctl_table_header *sysctl_header;
>  };
> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> new file mode 100644
> index 000000000000..2a910bc99947
> --- /dev/null
> +++ b/include/net/ioam6.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + *  IOAM IPv6 implementation
> + *
> + *  Author:
> + *  Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#ifndef _NET_IOAM6_H
> +#define _NET_IOAM6_H
> +
> +#include <linux/net.h>
> +#include <linux/ipv6.h>
> +#include <linux/rhashtable-types.h>
> +
> +#define IOAM6_OPT_TRACE_PREALLOC 0
> +
> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> +
> +#define IOAM6_TRACE_TYPE0  (1 << 31)
> +#define IOAM6_TRACE_TYPE1  (1 << 30)
> +#define IOAM6_TRACE_TYPE2  (1 << 29)
> +#define IOAM6_TRACE_TYPE3  (1 << 28)
> +#define IOAM6_TRACE_TYPE4  (1 << 27)
> +#define IOAM6_TRACE_TYPE5  (1 << 26)
> +#define IOAM6_TRACE_TYPE6  (1 << 25)
> +#define IOAM6_TRACE_TYPE7  (1 << 24)
> +#define IOAM6_TRACE_TYPE8  (1 << 23)
> +#define IOAM6_TRACE_TYPE9  (1 << 22)
> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> +
> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> +
> +struct ioam6_common_hdr {
> +       u8 opt_type;
> +       u8 opt_len;
> +       u8 res;
> +       u8 ioam_type;
> +       __be16 namespace_id;
> +} __packed;
> +
> +struct ioam6_trace_hdr {
> +       __be16 info;
> +       __be32 type;
> +} __packed;
> +
> +struct ioam6_namespace {
> +       struct rhash_head head;
> +       struct rcu_head rcu;
> +
> +       __be16 id;
> +       __be64 data;
> +       bool remove_tlv;
> +
> +       struct ioam6_schema *schema;
> +};
> +
> +struct ioam6_schema {
> +       struct rhash_head head;
> +       struct rcu_head rcu;
> +
> +       u32 id;
> +       int len;
> +       __be32 hdr;
> +       u8 *data;
> +
> +       struct ioam6_namespace *ns;
> +};
> +
> +struct ioam6_pernet_data {
> +       struct mutex lock;
> +       struct rhashtable namespaces;
> +       struct rhashtable schemas;
> +};
> +
> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> +       return net->ipv6.ioam6_data;
> +#else
> +       return NULL;
> +#endif
> +}
> +
> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> +                                 struct ioam6_namespace *ns);
> +
> +extern int ioam6_init(void);
> +extern void ioam6_exit(void);
> +
> +#endif
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 5ec054473d81..89b27fa721f4 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>         int max_hbh_opts_len;
>         int seg6_flowlabel;
>         bool skip_notify_on_dev_down;
> +       unsigned int ioam6_id;
>  };
>
>  struct netns_ipv6 {
> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>                 spinlock_t      lock;
>                 u32             seq;
>         } ip6addrlbl_table;
> +       struct ioam6_pernet_data *ioam6_data;
>  };
>
>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 9f2273a08356..1c98435220c9 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>  #define IPV6_TLV_PADN          1
>  #define IPV6_TLV_ROUTERALERT   5
>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
> +#define IPV6_TLV_IOAM_HOPOPTS  49

The IANA allocation is TEMPORARY; the expiration date is 4/16/2021.
Note from RFC 7120:

"Implementers and deployers need to be aware that deprecation and
de-allocation could take place at any time after expiry; therefore, an
expired early allocation is best considered as deprecated. It is not
IANA's responsibility to track the status of allocations, their
expirations, or when they may be re-allocated."

Please add a comment here and in the Documentation to this effect.
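
Something along these lines might do; the wording is only a suggestion and is
not taken from the patch:

```c
/* IOAM Hop-by-Hop option type.
 *
 * NOTE: this is a TEMPORARY IANA early allocation (see RFC 7120) with an
 * expiration date of 4/16/2021. Deprecation and de-allocation could take
 * place at any time after expiry, so an expired early allocation is best
 * considered as deprecated.
 */
#define IPV6_TLV_IOAM_HOPOPTS	49
```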

>  #define IPV6_TLV_JUMBO         194
>  #define IPV6_TLV_HAO           201     /* home address option */
>
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 13e8751bf24a..eb521b2dd885 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -189,6 +189,8 @@ enum {
>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>         DEVCONF_NDISC_TCLASS,
>         DEVCONF_RPL_SEG_ENABLED,
> +       DEVCONF_IOAM6_ENABLED,
> +       DEVCONF_IOAM6_ID,
>         DEVCONF_MAX
>  };
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index cf7b47bdb9b3..b7ef10d417d6 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>
>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 840bfdb3d7bd..6c952a28ade2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>         .disable_policy         = 0,
>         .rpl_seg_enabled        = 0,
> +       .ioam6_enabled          = 0,
> +       .ioam6_id               = 0,
>  };
>
>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>         .disable_policy         = 0,
>         .rpl_seg_enabled        = 0,
> +       .ioam6_enabled          = 0,
> +       .ioam6_id               = 0,
>  };
>
>  /* Check if link is ready: is it up and is a valid qdisc available */
> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>  }
>
>  static inline size_t inet6_ifla6_size(void)
> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec,
>         },
> +       {
> +               .procname       = "ioam6_enabled",
> +               .data           = &ipv6_devconf.ioam6_enabled,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
> +       {
> +               .procname       = "ioam6_id",
> +               .data           = &ipv6_devconf.ioam6_id,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec,
> +       },
>         {
>                 /* sentinel */
>         }
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index b304b882e031..63a9ffc4b283 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -62,6 +62,7 @@
>  #include <net/rpl.h>
>  #include <net/compat.h>
>  #include <net/xfrm.h>
> +#include <net/ioam6.h>
>
>  #include <linux/uaccess.h>
>  #include <linux/mroute6.h>
> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>         if (err)
>                 goto rpl_fail;
>
> +       err = ioam6_init();
> +       if (err)
> +               goto ioam6_fail;
> +
>         err = igmp6_late_init();
>         if (err)
>                 goto igmp6_late_err;
> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>  #endif
>  igmp6_late_err:
>         rpl_exit();
> +ioam6_fail:
> +       ioam6_exit();
>  rpl_fail:
>         seg6_exit();
>  seg6_fail:
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index f27ab3bf2e0c..00aee1358f1c 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -49,6 +49,8 @@
>  #include <net/seg6_hmac.h>
>  #endif
>  #include <net/rpl.h>
> +#include <net/ioam6.h>
> +#include <net/dst_metadata.h>
>
>  #include <linux/uaccess.h>
>
> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>         return TLV_REJECT;
>  }
>
> +/* IOAM */
> +
> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> +{
> +       struct ioam6_common_hdr *ioamh;
> +       struct ioam6_namespace *ns;
> +
> +       /* Must be 4n-aligned */
> +       if (optoff & 3)
> +               goto drop;
> +
> +       if (!skb_valid_dst(skb))
> +               ip6_route_input(skb);
> +
> +       /* IOAM must be enabled on ingress interface */
> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> +               goto drop;
> +
> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> +
> +       /* Unknown IOAM namespace, either:
> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
> +        *  - Ignore it otherwise
> +        */
> +       if (!ns) {
> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +                       goto drop;
> +
> +               goto accept;
> +       }
> +
> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +               goto remove;
> +
> +       /* Known IOAM namespace which must not be removed:
> +        * IOAM must be enabled on egress interface
> +        */
> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> +               goto drop;
> +
> +       switch (ioamh->ioam_type) {
> +       case IOAM6_OPT_TRACE_PREALLOC:
> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
> +               break;
> +       default:
> +               break;
> +       }
> +
> +accept:
> +       return TLV_ACCEPT;
> +remove:
> +       return TLV_REMOVE;
> +drop:
> +       kfree_skb(skb);
> +       return TLV_REJECT;
> +}
> +
>  /* Jumbo payload */
>
>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>                 .type   = IPV6_TLV_ROUTERALERT,
>                 .func   = ipv6_hop_ra,
>         },
> +       {
> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
> +               .func   = ipv6_hop_ioam,
> +       },
>         {
>                 .type   = IPV6_TLV_JUMBO,
>                 .func   = ipv6_hop_jumbo,
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> new file mode 100644
> index 000000000000..406aa78eb504
> --- /dev/null
> +++ b/net/ipv6/ioam6.c
> @@ -0,0 +1,326 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + *  IOAM IPv6 implementation
> + *
> + *  Author:
> + *  Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/net.h>
> +#include <linux/rhashtable.h>
> +
> +#include <net/addrconf.h>
> +#include <net/ioam6.h>
> +
> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> +{
> +       kfree_rcu(ns, rcu);
> +}
> +
> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> +{
> +       kfree_rcu(sc, rcu);
> +}
> +
> +static void ioam6_free_ns(void *ptr, void *arg)
> +{
> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> +
> +       if (ns)
> +               ioam6_ns_release(ns);
> +}
> +
> +static void ioam6_free_sc(void *ptr, void *arg)
> +{
> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> +
> +       if (sc)
> +               ioam6_sc_release(sc);
> +}
> +
> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> +       const struct ioam6_namespace *ns = obj;
> +
> +       return (ns->id != *(__be16 *)arg->key);
> +}
> +
> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> +       const struct ioam6_schema *sc = obj;
> +
> +       return (sc->id != *(u32 *)arg->key);
> +}
> +
> +static const struct rhashtable_params rht_ns_params = {
> +       .key_len                = sizeof(__be16),
> +       .key_offset             = offsetof(struct ioam6_namespace, id),
> +       .head_offset            = offsetof(struct ioam6_namespace, head),
> +       .automatic_shrinking    = true,
> +       .obj_cmpfn              = ioam6_ns_cmpfn,
> +};
> +
> +static const struct rhashtable_params rht_sc_params = {
> +       .key_len                = sizeof(u32),
> +       .key_offset             = offsetof(struct ioam6_schema, id),
> +       .head_offset            = offsetof(struct ioam6_schema, head),
> +       .automatic_shrinking    = true,
> +       .obj_cmpfn              = ioam6_sc_cmpfn,
> +};
> +
> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> +{
> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> +}
> +
> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> +                               u32 trace_type, struct ioam6_namespace *ns)
> +{
> +       u8 *data = skb_network_header(skb) + nodeoff;
> +       struct __kernel_sock_timeval ts;
> +       u64 raw_u64;
> +       u32 raw_u32;
> +       u16 raw_u16;
> +       u8 byte;
> +
> +       /* hop_lim and node_id */
> +       if (trace_type & IOAM6_TRACE_TYPE0) {
> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
> +               else
> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* ingress_if_id and egress_if_id */
> +       if (trace_type & IOAM6_TRACE_TYPE1) {
> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> +               if (!raw_u16)
> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> +               data += sizeof(__be16);
> +
> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> +               if (!raw_u16)
> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> +               data += sizeof(__be16);
> +       }
> +
> +       /* timestamp seconds */
> +       if (trace_type & IOAM6_TRACE_TYPE2) {
> +               if (!skb->tstamp) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               } else {
> +                       skb_get_new_timestamp(skb, &ts);
> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> +               }
> +               data += sizeof(__be32);
> +       }
> +
> +       /* timestamp subseconds */
> +       if (trace_type & IOAM6_TRACE_TYPE3) {
> +               if (!skb->tstamp) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               } else {
> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
> +                               skb_get_new_timestamp(skb, &ts);
> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> +               }
> +               data += sizeof(__be32);
> +       }
> +
> +       /* transit delay */
> +       if (trace_type & IOAM6_TRACE_TYPE4) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* namespace data */
> +       if (trace_type & IOAM6_TRACE_TYPE5) {
> +               *(__be32 *)data = (__be32)ns->data;
> +               data += sizeof(__be32);
> +       }
> +
> +       /* queue depth */
> +       if (trace_type & IOAM6_TRACE_TYPE6) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* hop_lim and node_id (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE7) {
> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> +               if (!raw_u64)
> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
> +               else
> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> +               data += sizeof(__be64);
> +       }
> +
> +       /* ingress_if_id and egress_if_id (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE8) {
> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> +               data += sizeof(__be32);
> +
> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> +               if (!raw_u32)
> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* namespace data (wide) */
> +       if (trace_type & IOAM6_TRACE_TYPE9) {
> +               *(__be64 *)data = ns->data;
> +               data += sizeof(__be64);
> +       }
> +
> +       /* buffer occupancy */
> +       if (trace_type & IOAM6_TRACE_TYPE10) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* checksum complement */
> +       if (trace_type & IOAM6_TRACE_TYPE11) {
> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> +               data += sizeof(__be32);
> +       }
> +
> +       /* opaque state snapshot */
> +       if (trace_type & IOAM6_TRACE_TYPE22) {
> +               if (!ns->schema) {
> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> +               } else {
> +                       *(__be32 *)data = ns->schema->hdr;
> +                       data += sizeof(__be32);
> +                       memcpy(data, ns->schema->data, ns->schema->len);
> +               }
> +       }
> +}
> +
> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> +                          struct ioam6_namespace *ns)
> +{
> +       u8 nodelen, flags, remlen, sclen = 0;
> +       struct ioam6_trace_hdr *trh;
> +       int nodeoff;
> +       u16 info;
> +       u32 type;
> +
> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> +       info = be16_to_cpu(trh->info);
> +       type = be32_to_cpu(trh->type);
> +
> +       nodelen = info >> 11;
> +       flags = (info >> 7) & 0xf;
> +       remlen = info & 0x7f;
> +
> +       /* Skip if Overflow bit is set OR
> +        * if an unknown type (bit 12-21) is set
> +        */
> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> +               return;
> +
> +       /* NodeLen does not include Opaque State Snapshot length. We need to
> +        * take it into account if the corresponding bit is set and if current
> +        * IOAM namespace has an active schema attached to it
> +        */
> +       if (type & IOAM6_TRACE_TYPE22) {
> +               /* Opaque State Snapshot header size */
> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> +
> +               if (ns->schema)
> +                       sclen += ns->schema->len / 4;
> +       }
> +
> +       /* Not enough space remaining: set Overflow bit and skip */
> +       if (!remlen || remlen < (nodelen + sclen)) {
> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> +               trh->info = cpu_to_be16(info);
> +               return;
> +       }
> +
> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> +
> +       /* Update RemainingLen */
> +       remlen -= nodelen + sclen;
> +       info = (info & 0xff80) | remlen;
> +       trh->info = cpu_to_be16(info);
> +}
> +
> +static int __net_init ioam6_net_init(struct net *net)
> +{
> +       struct ioam6_pernet_data *nsdata;
> +       int err = -ENOMEM;
> +
> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> +       if (!nsdata)
> +               goto out;
> +
> +       mutex_init(&nsdata->lock);
> +       net->ipv6.ioam6_data = nsdata;
> +
> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> +       if (err)
> +               goto free_nsdata;
> +
> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> +       if (err)
> +               goto free_rht_ns;
> +
> +out:
> +       return err;
> +free_rht_ns:
> +       rhashtable_destroy(&nsdata->namespaces);
> +free_nsdata:
> +       kfree(nsdata);
> +       net->ipv6.ioam6_data = NULL;
> +       goto out;
> +}
> +
> +static void __net_exit ioam6_net_exit(struct net *net)
> +{
> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> +
> +       kfree(nsdata);
> +}
> +
> +static struct pernet_operations ioam6_net_ops = {
> +       .init = ioam6_net_init,
> +       .exit = ioam6_net_exit,
> +};
> +
> +int __init ioam6_init(void)
> +{
> +       int err = register_pernet_subsys(&ioam6_net_ops);
> +
> +       if (err)
> +               return err;
> +
> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
> +       return 0;
> +}
> +
> +void ioam6_exit(void)
> +{
> +       unregister_pernet_subsys(&ioam6_net_ops);
> +}
> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index fac2135aa47b..da49b33ab6fc 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>                 .mode           = 0644,
>                 .proc_handler   = proc_dointvec
>         },
> +       {
> +               .procname       = "ioam6_id",
> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
> +               .maxlen         = sizeof(int),
> +               .mode           = 0644,
> +               .proc_handler   = proc_dointvec
> +       },
>         { }
>  };
>
> --
> 2.17.1
>


* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-24 20:32   ` Tom Herbert
@ 2020-06-25 17:47     ` Justin Iurman
  2020-06-25 20:53       ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 17:47 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

Hi Tom,

>> Add the possibility to remove one or more consecutive TLVs without
>> messing up the alignment of others. For now, only IOAM requires this
>> behavior.
>>
> Hi Justin,
> 
> Can you explain the motivation for this? Per RFC8200, extension
> headers in flight are not to be added, removed, or modified outside of
> the standard rules for processing modifiable HBH and DO TLVs., that
> would include adding and removing TLVs in EH. One obvious problem this

As you already know from our last meeting, IOAM may be configured on a node such that a specific IOAM namespace should be removed. Therefore, this patch provides support for the deletion of a TLV (or consecutive TLVs), without removing the entire EH (if the EH ends up empty, it is filled with padding). Note that there is a similar "problem" with the Incremental Trace where you'd need to expand the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is against modification of in-flight EHs, but there are several reasons that, I believe, mitigate this statement.

Let's keep in mind that IOAM's purpose is "private" (= limited to an IOAM domain), i.e. it is not meant to be widely deployed on the Internet. We can distinguish two main scenarios: (i) in-transit traffic, which is encapsulated (IPv6-in-IPv6), and (ii) traffic inside the domain, i.e. from one IOAM node inside the domain to another (no need for encapsulation). In both cases, we kind of own the traffic: in (i) we only modify "our" encapsulating header, and in (ii) we already own the traffic.

And if someone is still angry about this, well, the good news is that such modification can be avoided most of the time. Indeed, operators are advised to remove an IOAM namespace only on egress nodes. This way, the destination (either the tunnel destination or the real destination, depending on the scenario) will receive EHs and take care of them without the need to remove anything. But, again, operators can do what they want and I'd tend to adhere to David's philosophy [1] and give them the possibility to choose what to do.

> creates is that it breaks AH if the TLVs are removed in HBH before AH
> is processed (AH is processed after HBH).

Correct. But I don't think it should prevent us from having IOAM in the kernel. Again, operators could simply apply IOAM on a subset of the traffic that does not include AHs, for example.

Justin

  [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html

> Tom
>> By default, an 8-octet boundary is automatically assumed. This is the
>> price to pay (at most a useless 4-octet padding) to make sure everything
>> is still aligned after the removal.
>>
>> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> header.
>>
>> Example 1:
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       X       |       X       |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |                                                               |
>> ~                Option to be removed (8 octets)                ~
>> |                                                               |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Y       |       Y       |       Y       |       Y       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |    Padding    |    Padding    |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> boundary (same result in both cases).
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       X       |       X       |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Y       |       Y       |       Y       |       Y       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |    Padding    |    Padding    |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Example 2:
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       X       |       X       |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |                Option to be removed (4 octets)                |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Y       |       Y       |       Y       |       Y       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> of 8 anymore.
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       X       |       X       |    Padding    |    Padding    |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Y       |       Y       |       Y       |       Y       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> |       Z       |       Z       |       Z       |       Z       |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Therefore, the largest (8-octet) boundary is assumed by default and for
>> all, which means that blocks are only moved in multiples of 8. This
>> assertion guarantees good alignment.
>>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 108 insertions(+), 26 deletions(-)
>>
>> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> index e9b366994475..f27ab3bf2e0c 100644
>> --- a/net/ipv6/exthdrs.c
>> +++ b/net/ipv6/exthdrs.c
>> @@ -52,17 +52,27 @@
>>
>>  #include <linux/uaccess.h>
>>
>> -/*
>> - *     Parsing tlv encoded headers.
>> +/* States for TLV parsing functions. */
>> +
>> +enum {
>> +       TLV_ACCEPT,
>> +       TLV_REJECT,
>> +       TLV_REMOVE,
>> +       __TLV_MAX
>> +};
>> +
>> +/* Parsing TLV encoded headers.
>>   *
>> - *     Parsing function "func" returns true, if parsing succeed
>> - *     and false, if it failed.
>> - *     It MUST NOT touch skb->h.
>> + * Parsing function "func" returns either:
>> + *  - TLV_ACCEPT if parsing succeeds
>> + *  - TLV_REJECT if parsing fails
>> + *  - TLV_REMOVE if TLV must be removed
>> + * It MUST NOT touch skb->h.
>>   */
>>
>>  struct tlvtype_proc {
>>         int     type;
>> -       bool    (*func)(struct sk_buff *skb, int offset);
>> +       int     (*func)(struct sk_buff *skb, int offset);
>>  };
>>
>>  /*********************
>> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
>> optoff,
>>         return false;
>>  }
>>
>> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> +
>> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> +{
>> +       int len = end - start;
>> +       int padlen = len % 8;
>> +       unsigned char *h;
>> +       int rlen, off;
>> +       u16 pl_len;
>> +
>> +       rlen = len - padlen;
>> +       if (rlen) {
>> +               skb_pull(skb, rlen);
>> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> +                       start);
>> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> +
>> +               skb_reset_network_header(skb);
>> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> +
>> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> +
>> +               skb_transport_header(skb)[1] -= rlen >> 3;
>> +               end -= rlen;
>> +       }
>> +
>> +       if (padlen) {
>> +               off = end - padlen;
>> +               h = skb_network_header(skb);
>> +
>> +               if (padlen == 1) {
>> +                       h[off] = IPV6_TLV_PAD1;
>> +               } else {
>> +                       padlen -= 2;
>> +
>> +                       h[off] = IPV6_TLV_PADN;
>> +                       h[off + 1] = padlen;
>> +                       memset(&h[off + 2], 0, padlen);
>> +               }
>> +       }
>> +
>> +       return end;
>> +}
>> +
>>  /* Parse tlv encoded option header (hop-by-hop or destination) */
>>
>>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>>                           struct sk_buff *skb,
>> -                         int max_count)
>> +                         int max_count,
>> +                         bool removable)
>>  {
>>         int len = (skb_transport_header(skb)[1] + 1) << 3;
>> -       const unsigned char *nh = skb_network_header(skb);
>> +       unsigned char *nh = skb_network_header(skb);
>>         int off = skb_network_header_len(skb);
>>         const struct tlvtype_proc *curr;
>>         bool disallow_unknowns = false;
>> +       int off_remove = 0;
>>         int tlv_count = 0;
>>         int padlen = 0;
>> +       int ret;
>>
>>         if (unlikely(max_count < 0)) {
>>                 disallow_unknowns = true;
>> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
>> *procs,
>>                         if (tlv_count > max_count)
>>                                 goto bad;
>>
>> +                       ret = -1;
>>                         for (curr = procs; curr->type >= 0; curr++) {
>>                                 if (curr->type == nh[off]) {
>>                                         /* type specific length/alignment
>>                                            checks will be performed in the
>>                                            func(). */
>> -                                       if (curr->func(skb, off) == false)
>> +                                       ret = curr->func(skb, off);
>> +                                       if (ret == TLV_REJECT)
>>                                                 return false;
>>                                         break;
>>                                 }
>> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>>                                 return false;
>>
>> +                       if (removable) {
>> +                               if (ret == TLV_REMOVE) {
>> +                                       if (!off_remove)
>> +                                               off_remove = off - padlen;
>> +                               } else if (off_remove) {
>> +                                       off = remove_tlv(off_remove, off, skb);
>> +                                       nh = skb_network_header(skb);
>> +                                       off_remove = 0;
>> +                               }
>> +                       }
>> +
>>                         padlen = 0;
>>                         break;
>>                 }
>> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>>                 len -= optlen;
>>         }
>>
>> -       if (len == 0)
>> +       if (len == 0) {
>> +               /* Don't forget last TLV if it must be removed */
>> +               if (off_remove)
>> +                       remove_tlv(off_remove, off, skb);
>> +
>>                 return true;
>> +       }
>>  bad:
>>         kfree_skb(skb);
>>         return false;
>> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>>   *****************************/
>>
>>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>>  {
>>         struct ipv6_destopt_hao *hao;
>>         struct inet6_skb_parm *opt = IP6CB(skb);
>> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>>         if (skb->tstamp == 0)
>>                 __net_timestamp(skb);
>>
>> -       return true;
>> +       return TLV_ACCEPT;
>>
>>   discard:
>>         kfree_skb(skb);
>> -       return false;
>> +       return TLV_REJECT;
>>  }
>>  #endif
>>
>> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>>  #endif
>>
>>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
>> +                         false)) {
>>                 skb->transport_header += extlen;
>>                 opt = IP6CB(skb);
>>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
>> *skb)
>>
>>  /* Router Alert as of RFC 2711 */
>>
>> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>>  {
>>         const unsigned char *nh = skb_network_header(skb);
>>
>>         if (nh[optoff + 1] == 2) {
>>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> -               return true;
>> +               return TLV_ACCEPT;
>>         }
>>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>>                             nh[optoff + 1]);
>>         kfree_skb(skb);
>> -       return false;
>> +       return TLV_REJECT;
>>  }
>>
>>  /* Jumbo payload */
>>
>> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>>  {
>>         const unsigned char *nh = skb_network_header(skb);
>>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> optoff)
>>         if (pkt_len <= IPV6_MAXPLEN) {
>>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> -               return false;
>> +               return TLV_REJECT;
>>         }
>>         if (ipv6_hdr(skb)->payload_len) {
>>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> -               return false;
>> +               return TLV_REJECT;
>>         }
>>
>>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> optoff)
>>                 goto drop;
>>
>>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> -       return true;
>> +       return TLV_ACCEPT;
>>
>>  drop:
>>         kfree_skb(skb);
>> -       return false;
>> +       return TLV_REJECT;
>>  }
>>
>>  /* CALIPSO RFC 5570 */
>>
>> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>>  {
>>         const unsigned char *nh = skb_network_header(skb);
>>
>> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
>> optoff)
>>         if (!calipso_validate(skb, nh + optoff))
>>                 goto drop;
>>
>> -       return true;
>> +       return TLV_ACCEPT;
>>
>>  drop:
>>         kfree_skb(skb);
>> -       return false;
>> +       return TLV_REJECT;
>>  }
>>
>>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>>
>>         opt->flags |= IP6SKB_HOPBYHOP;
>>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> +                         true)) {
>> +               /* we need to refresh the length in case
>> +                * at least one TLV was removed
>> +                */
>> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
>>                 skb->transport_header += extlen;
>>                 opt = IP6CB(skb);
>>                 opt->nhoff = sizeof(struct ipv6hdr);
>> --
>> 2.17.1

* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-25  2:32   ` Tom Herbert
@ 2020-06-25 17:56     ` Justin Iurman
  2020-06-26  0:48       ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 17:56 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

>> Implement the IOAM egress behavior.
>>
>> According to RFC 8200:
>> "Extension headers (except for the Hop-by-Hop Options header) are not
>>  processed, inserted, or deleted by any node along a packet's delivery
>>  path, until the packet reaches the node (or each of the set of nodes,
>>  in the case of multicast) identified in the Destination Address field
>>  of the IPv6 header."
>>
>> Therefore, an ingress node (an IOAM domain border) must encapsulate an
>> incoming IPv6 packet with another similar IPv6 header that will contain
>> IOAM data while it traverses the domain. When leaving, the egress node,
>> another IOAM domain border which is also the tunnel destination, must
>> decapsulate the packet.
> 
> This is just IP in IP encapsulation that happens to be terminated at
> an egress node of the IOAM domain. The fact that it's IOAM isn't
> germane; this IP in IP is done in a variety of ways. We should be
> using the normal protocol handler for NEXTHDR_IPV6 instead of special
> case code.

Agreed. The only reason for this special-case code is that I was not aware of a more elegant solution.

Justin

>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>>  include/linux/ipv6.h |  1 +
>>  net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> index 2cb445a8fc9e..5312a718bc7a 100644
>> --- a/include/linux/ipv6.h
>> +++ b/include/linux/ipv6.h
>> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
>>  #define IP6SKB_HOPBYHOP        32
>>  #define IP6SKB_L3SLAVE         64
>>  #define IP6SKB_JUMBOGRAM      128
>> +#define IP6SKB_IOAM           256
>>  };
>>
>>  #if defined(CONFIG_NET_L3_MASTER_DEV)
>> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> index e96304d8a4a7..8cf75cc5e806 100644
>> --- a/net/ipv6/ip6_input.c
>> +++ b/net/ipv6/ip6_input.c
>> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
>> *));
>>  void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>>                               bool have_final)
>>  {
>> +       struct inet6_skb_parm *opt = IP6CB(skb);
>>         const struct inet6_protocol *ipprot;
>>         struct inet6_dev *idev;
>>         unsigned int nhoff;
>> +       u8 hop_limit;
>>         bool raw;
>>
>>         /*
>> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> sk_buff *skb, int nexthdr,
>>         } else {
>>                 if (!raw) {
>>                         if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
>> +                               /* IOAM Tunnel Decapsulation
>> +                                * Packet is going to re-enter the stack
>> +                                */
>> +                               if (nexthdr == NEXTHDR_IPV6 &&
>> +                                   (opt->flags & IP6SKB_IOAM)) {
>> +                                       hop_limit = ipv6_hdr(skb)->hop_limit;
>> +
>> +                                       skb_reset_network_header(skb);
>> +                                       skb_reset_transport_header(skb);
>> +                                       skb->encapsulation = 0;
>> +
>> +                                       ipv6_hdr(skb)->hop_limit = hop_limit;
>> +                                       __skb_tunnel_rx(skb, skb->dev,
>> +                                                       dev_net(skb->dev));
>> +
>> +                                       netif_rx(skb);
>> +                                       goto out;
>> +                               }
>> +
>>                                 __IP6_INC_STATS(net, idev,
>>                                                 IPSTATS_MIB_INUNKNOWNPROTOS);
>>                                 icmpv6_send(skb, ICMPV6_PARAMPROB,
>> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> sk_buff *skb, int nexthdr,
>>                         consume_skb(skb);
>>                 }
>>         }
>> +out:
>>         return;
>>
>>  discard:
>> --
>> 2.17.1

* Re: [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
  2020-06-25  2:53   ` Tom Herbert
@ 2020-06-25 18:00     ` Justin Iurman
  0 siblings, 0 replies; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 18:00 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

>> Add documentation for new IOAM sysctls:
>>  - ioam6_id: a namespace sysctl
>>  - ioam6_enabled and ioam6_id: two per-interface sysctls
>>
> Are you planning to add a more detailed description of the feature and
> how to use it (would be nice I think :-) )

Of course, will do that ASAP!

Justin

>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>>  Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
>>  Documentation/networking/ip-sysctl.rst    |  5 +++++
>>  2 files changed, 25 insertions(+)
>>  create mode 100644 Documentation/networking/ioam6-sysctl.rst
>>
>> diff --git a/Documentation/networking/ioam6-sysctl.rst
>> b/Documentation/networking/ioam6-sysctl.rst
>> new file mode 100644
>> index 000000000000..bad6c64907bc
>> --- /dev/null
>> +++ b/Documentation/networking/ioam6-sysctl.rst
>> @@ -0,0 +1,20 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +======================
>> +IOAM6 Sysctl variables
>> +======================
>> +
>> +
>> +/proc/sys/net/ipv6/conf/<iface>/ioam6_* variables:
>> +==================================================
>> +
>> +ioam6_enabled - BOOL
>> +       Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
>> +
>> +       * 0 - disabled (default)
>> +       * not 0 - enabled
>> +
>> +ioam6_id - INTEGER
>> +       Define the IOAM id of this interface.
>> +
>> +       Default is 0.
>> diff --git a/Documentation/networking/ip-sysctl.rst
>> b/Documentation/networking/ip-sysctl.rst
>> index b72f89d5694c..5ba11f2766bd 100644
>> --- a/Documentation/networking/ip-sysctl.rst
>> +++ b/Documentation/networking/ip-sysctl.rst
>> @@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
>>         and extraneous notifications.
>>         Default: true (backward compat mode)
>>
>> +ioam6_id - INTEGER
>> +       Define the IOAM id of this node.
>> +
>> +       Default: 0
>> +
>>  IPv6 Fragmentation:
>>
>>  ip6frag_high_thresh - INTEGER
>> --
>> 2.17.1

-- 
Justin Iurman
Université de Liège (ULg)
Bât. B28  Algorithmique des Grands Systèmes
Quartier Polytech 1
Allée de la Découverte 10
4000 Liège
Phone: +32 4 366 28 09

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-25 14:29   ` Tom Herbert
@ 2020-06-25 18:23     ` Justin Iurman
  2020-06-25 20:32       ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 18:23 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

>> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
>> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
>> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>>
> 
> The IANA allocation is TEMPORARY, with an expiration date of
> 4/16/2021. Note from RFC7120:
> 
> "Implementers and deployers need to be aware that deprecation and
> de-allocation could take place at any time after expiry; therefore, an
> expired early allocation is best considered as deprecated."
> 
> Please add a comment in the code and in the Documentation to this effect.

I'll do that, thanks. What kind of comment should it be (is there an official pattern?), and where in the Documentation should I add it?

>> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
>> packets. Default is drop.
> 
> I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> packet containing the IOAM HBH option. Note that the act bits of the

Correct, the term "IOAM packets" is indeed shorthand for IPv6 packets containing the IOAM HBH option.

> option type are 00 which means the TLV is skipped if the option isn't
> processed, so I don't think it's correct to drop these packets by
> default.

Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for this option, I do believe it should be disabled (dropped) by default on nodes that "speak IOAM". Indeed, you don't want everyone running a kernel that includes IOAM to accept IOAM packets by default, which would mean that anyone could create an IOAM domain (potentially without being aware of it). Dropping by default also avoids spreading leaks of IOAM data.

Justin

>> Another per-interface sysctl ioam6_id is provided to define the IOAM
>> (unique) identifier of the interface.
>>
>> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
>> identifier of the node.
>>
>> Two relativistic hash tables: one for IOAM namespaces, the other for
>> IOAM schemas. A namespace can only have a single active schema and a
>> schema can only be attached to a single namespace (1:1 relationship).
>>
>>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>>   [3]
>>   https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>>  include/linux/ipv6.h       |   2 +
>>  include/net/ioam6.h        |  98 +++++++++++
>>  include/net/netns/ipv6.h   |   2 +
>>  include/uapi/linux/in6.h   |   1 +
>>  include/uapi/linux/ipv6.h  |   2 +
>>  net/ipv6/Makefile          |   2 +-
>>  net/ipv6/addrconf.c        |  20 +++
>>  net/ipv6/af_inet6.c        |   7 +
>>  net/ipv6/exthdrs.c         |  67 ++++++++
>>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
>>  net/ipv6/sysctl_net_ipv6.c |   7 +
>>  11 files changed, 533 insertions(+), 1 deletion(-)
>>  create mode 100644 include/net/ioam6.h
>>  create mode 100644 net/ipv6/ioam6.c
>>
>> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> index 5312a718bc7a..15732f964c6e 100644
>> --- a/include/linux/ipv6.h
>> +++ b/include/linux/ipv6.h
>> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>>         __s32           disable_policy;
>>         __s32           ndisc_tclass;
>>         __s32           rpl_seg_enabled;
>> +       __u32           ioam6_enabled;
>> +       __u32           ioam6_id;
>>
>>         struct ctl_table_header *sysctl_header;
>>  };
>> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
>> new file mode 100644
>> index 000000000000..2a910bc99947
>> --- /dev/null
>> +++ b/include/net/ioam6.h
>> @@ -0,0 +1,98 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + *  IOAM IPv6 implementation
>> + *
>> + *  Author:
>> + *  Justin Iurman <justin.iurman@uliege.be>
>> + */
>> +
>> +#ifndef _NET_IOAM6_H
>> +#define _NET_IOAM6_H
>> +
>> +#include <linux/net.h>
>> +#include <linux/ipv6.h>
>> +#include <linux/rhashtable-types.h>
>> +
>> +#define IOAM6_OPT_TRACE_PREALLOC 0
>> +
>> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
>> +
>> +#define IOAM6_TRACE_TYPE0  (1 << 31)
>> +#define IOAM6_TRACE_TYPE1  (1 << 30)
>> +#define IOAM6_TRACE_TYPE2  (1 << 29)
>> +#define IOAM6_TRACE_TYPE3  (1 << 28)
>> +#define IOAM6_TRACE_TYPE4  (1 << 27)
>> +#define IOAM6_TRACE_TYPE5  (1 << 26)
>> +#define IOAM6_TRACE_TYPE6  (1 << 25)
>> +#define IOAM6_TRACE_TYPE7  (1 << 24)
>> +#define IOAM6_TRACE_TYPE8  (1 << 23)
>> +#define IOAM6_TRACE_TYPE9  (1 << 22)
>> +#define IOAM6_TRACE_TYPE10 (1 << 21)
>> +#define IOAM6_TRACE_TYPE11 (1 << 20)
>> +#define IOAM6_TRACE_TYPE22 (1 << 9)
>> +
>> +#define IOAM6_EMPTY_FIELD_u16 0xffff
>> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
>> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
>> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
>> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
>> +
>> +struct ioam6_common_hdr {
>> +       u8 opt_type;
>> +       u8 opt_len;
>> +       u8 res;
>> +       u8 ioam_type;
>> +       __be16 namespace_id;
>> +} __packed;
>> +
>> +struct ioam6_trace_hdr {
>> +       __be16 info;
>> +       __be32 type;
>> +} __packed;
>> +
>> +struct ioam6_namespace {
>> +       struct rhash_head head;
>> +       struct rcu_head rcu;
>> +
>> +       __be16 id;
>> +       __be64 data;
>> +       bool remove_tlv;
>> +
>> +       struct ioam6_schema *schema;
>> +};
>> +
>> +struct ioam6_schema {
>> +       struct rhash_head head;
>> +       struct rcu_head rcu;
>> +
>> +       u32 id;
>> +       int len;
>> +       __be32 hdr;
>> +       u8 *data;
>> +
>> +       struct ioam6_namespace *ns;
>> +};
>> +
>> +struct ioam6_pernet_data {
>> +       struct mutex lock;
>> +       struct rhashtable namespaces;
>> +       struct rhashtable schemas;
>> +};
>> +
>> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +       return net->ipv6.ioam6_data;
>> +#else
>> +       return NULL;
>> +#endif
>> +}
>> +
>> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
>> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> +                                 struct ioam6_namespace *ns);
>> +
>> +extern int ioam6_init(void);
>> +extern void ioam6_exit(void);
>> +
>> +#endif
>> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
>> index 5ec054473d81..89b27fa721f4 100644
>> --- a/include/net/netns/ipv6.h
>> +++ b/include/net/netns/ipv6.h
>> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>>         int max_hbh_opts_len;
>>         int seg6_flowlabel;
>>         bool skip_notify_on_dev_down;
>> +       unsigned int ioam6_id;
>>  };
>>
>>  struct netns_ipv6 {
>> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>>                 spinlock_t      lock;
>>                 u32             seq;
>>         } ip6addrlbl_table;
>> +       struct ioam6_pernet_data *ioam6_data;
>>  };
>>
>>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
>> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
>> index 9f2273a08356..1c98435220c9 100644
>> --- a/include/uapi/linux/in6.h
>> +++ b/include/uapi/linux/in6.h
>> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>>  #define IPV6_TLV_PADN          1
>>  #define IPV6_TLV_ROUTERALERT   5
>>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
>> +#define IPV6_TLV_IOAM_HOPOPTS  49
> 
> The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> Note from RFC7120:
> 
> "Implementers and deployers need to be aware that deprecation and
> de-allocation could take place at any time after expiry; therefore, an
> expired early allocation is best considered as deprecated. It is not
> IANA's responsibility to track the status of allocations, their
> expirations, or when they may be re-allocated."
> 
> Please add a comment here and in the
> Documentation to this effect.
> 
>>  #define IPV6_TLV_JUMBO         194
>>  #define IPV6_TLV_HAO           201     /* home address option */
>>
>> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> index 13e8751bf24a..eb521b2dd885 100644
>> --- a/include/uapi/linux/ipv6.h
>> +++ b/include/uapi/linux/ipv6.h
>> @@ -189,6 +189,8 @@ enum {
>>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>>         DEVCONF_NDISC_TCLASS,
>>         DEVCONF_RPL_SEG_ENABLED,
>> +       DEVCONF_IOAM6_ENABLED,
>> +       DEVCONF_IOAM6_ID,
>>         DEVCONF_MAX
>>  };
>>
>> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
>> index cf7b47bdb9b3..b7ef10d417d6 100644
>> --- a/net/ipv6/Makefile
>> +++ b/net/ipv6/Makefile
>> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o
>> addrconf.o \
>>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
>> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
>> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>>
>>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 840bfdb3d7bd..6c952a28ade2 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>>         .disable_policy         = 0,
>>         .rpl_seg_enabled        = 0,
>> +       .ioam6_enabled          = 0,
>> +       .ioam6_id               = 0,
>>  };
>>
>>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
>> {
>>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>>         .disable_policy         = 0,
>>         .rpl_seg_enabled        = 0,
>> +       .ioam6_enabled          = 0,
>> +       .ioam6_id               = 0,
>>  };
>>
>>  /* Check if link is ready: is it up and is a valid qdisc available */
>> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
>> *cnf,
>>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
>> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
>> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>>  }
>>
>>  static inline size_t inet6_ifla6_size(void)
>> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>>                 .mode           = 0644,
>>                 .proc_handler   = proc_dointvec,
>>         },
>> +       {
>> +               .procname       = "ioam6_enabled",
>> +               .data           = &ipv6_devconf.ioam6_enabled,
>> +               .maxlen         = sizeof(int),
>> +               .mode           = 0644,
>> +               .proc_handler   = proc_dointvec,
>> +       },
>> +       {
>> +               .procname       = "ioam6_id",
>> +               .data           = &ipv6_devconf.ioam6_id,
>> +               .maxlen         = sizeof(int),
>> +               .mode           = 0644,
>> +               .proc_handler   = proc_dointvec,
>> +       },
>>         {
>>                 /* sentinel */
>>         }
>> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
>> index b304b882e031..63a9ffc4b283 100644
>> --- a/net/ipv6/af_inet6.c
>> +++ b/net/ipv6/af_inet6.c
>> @@ -62,6 +62,7 @@
>>  #include <net/rpl.h>
>>  #include <net/compat.h>
>>  #include <net/xfrm.h>
>> +#include <net/ioam6.h>
>>
>>  #include <linux/uaccess.h>
>>  #include <linux/mroute6.h>
>> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>>         if (err)
>>                 goto rpl_fail;
>>
>> +       err = ioam6_init();
>> +       if (err)
>> +               goto ioam6_fail;
>> +
>>         err = igmp6_late_init();
>>         if (err)
>>                 goto igmp6_late_err;
>> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>>  #endif
>>  igmp6_late_err:
>>         rpl_exit();
>> +ioam6_fail:
>> +       ioam6_exit();
>>  rpl_fail:
>>         seg6_exit();
>>  seg6_fail:
>> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> index f27ab3bf2e0c..00aee1358f1c 100644
>> --- a/net/ipv6/exthdrs.c
>> +++ b/net/ipv6/exthdrs.c
>> @@ -49,6 +49,8 @@
>>  #include <net/seg6_hmac.h>
>>  #endif
>>  #include <net/rpl.h>
>> +#include <net/ioam6.h>
>> +#include <net/dst_metadata.h>
>>
>>  #include <linux/uaccess.h>
>>
>> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>>         return TLV_REJECT;
>>  }
>>
>> +/* IOAM */
>> +
>> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
>> +{
>> +       struct ioam6_common_hdr *ioamh;
>> +       struct ioam6_namespace *ns;
>> +
>> +       /* Must be 4n-aligned */
>> +       if (optoff & 3)
>> +               goto drop;
>> +
>> +       if (!skb_valid_dst(skb))
>> +               ip6_route_input(skb);
>> +
>> +       /* IOAM must be enabled on ingress interface */
>> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
>> +               goto drop;
>> +
>> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
>> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
>> +
>> +       /* Unknown IOAM namespace, either:
>> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
>> +        *  - Ignore it otherwise
>> +        */
>> +       if (!ns) {
>> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> +                       goto drop;
>> +
>> +               goto accept;
>> +       }
>> +
>> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> +               goto remove;
>> +
>> +       /* Known IOAM namespace which must not be removed:
>> +        * IOAM must be enabled on egress interface
>> +        */
>> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> +               goto drop;
>> +
>> +       switch (ioamh->ioam_type) {
>> +       case IOAM6_OPT_TRACE_PREALLOC:
>> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
>> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +
>> +accept:
>> +       return TLV_ACCEPT;
>> +remove:
>> +       return TLV_REMOVE;
>> +drop:
>> +       kfree_skb(skb);
>> +       return TLV_REJECT;
>> +}
>> +
>>  /* Jumbo payload */
>>
>>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>>                 .type   = IPV6_TLV_ROUTERALERT,
>>                 .func   = ipv6_hop_ra,
>>         },
>> +       {
>> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
>> +               .func   = ipv6_hop_ioam,
>> +       },
>>         {
>>                 .type   = IPV6_TLV_JUMBO,
>>                 .func   = ipv6_hop_jumbo,
>> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
>> new file mode 100644
>> index 000000000000..406aa78eb504
>> --- /dev/null
>> +++ b/net/ipv6/ioam6.c
>> @@ -0,0 +1,326 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + *  IOAM IPv6 implementation
>> + *
>> + *  Author:
>> + *  Justin Iurman <justin.iurman@uliege.be>
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/kernel.h>
>> +#include <linux/net.h>
>> +#include <linux/rhashtable.h>
>> +
>> +#include <net/addrconf.h>
>> +#include <net/ioam6.h>
>> +
>> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
>> +{
>> +       kfree_rcu(ns, rcu);
>> +}
>> +
>> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
>> +{
>> +       kfree_rcu(sc, rcu);
>> +}
>> +
>> +static void ioam6_free_ns(void *ptr, void *arg)
>> +{
>> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
>> +
>> +       if (ns)
>> +               ioam6_ns_release(ns);
>> +}
>> +
>> +static void ioam6_free_sc(void *ptr, void *arg)
>> +{
>> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
>> +
>> +       if (sc)
>> +               ioam6_sc_release(sc);
>> +}
>> +
>> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> +{
>> +       const struct ioam6_namespace *ns = obj;
>> +
>> +       return (ns->id != *(__be16 *)arg->key);
>> +}
>> +
>> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> +{
>> +       const struct ioam6_schema *sc = obj;
>> +
>> +       return (sc->id != *(u32 *)arg->key);
>> +}
>> +
>> +static const struct rhashtable_params rht_ns_params = {
>> +       .key_len                = sizeof(__be16),
>> +       .key_offset             = offsetof(struct ioam6_namespace, id),
>> +       .head_offset            = offsetof(struct ioam6_namespace, head),
>> +       .automatic_shrinking    = true,
>> +       .obj_cmpfn              = ioam6_ns_cmpfn,
>> +};
>> +
>> +static const struct rhashtable_params rht_sc_params = {
>> +       .key_len                = sizeof(u32),
>> +       .key_offset             = offsetof(struct ioam6_schema, id),
>> +       .head_offset            = offsetof(struct ioam6_schema, head),
>> +       .automatic_shrinking    = true,
>> +       .obj_cmpfn              = ioam6_sc_cmpfn,
>> +};
>> +
>> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
>> +{
>> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> +
>> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
>> +}
>> +
>> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
>> +                               u32 trace_type, struct ioam6_namespace *ns)
>> +{
>> +       u8 *data = skb_network_header(skb) + nodeoff;
>> +       struct __kernel_sock_timeval ts;
>> +       u64 raw_u64;
>> +       u32 raw_u32;
>> +       u16 raw_u16;
>> +       u8 byte;
>> +
>> +       /* hop_lim and node_id */
>> +       if (trace_type & IOAM6_TRACE_TYPE0) {
>> +               byte = ipv6_hdr(skb)->hop_limit - 1;
>> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> +               if (!raw_u32)
>> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
>> +               else
>> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
>> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* ingress_if_id and egress_if_id */
>> +       if (trace_type & IOAM6_TRACE_TYPE1) {
>> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> +               if (!raw_u16)
>> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> +               *(__be16 *)data = cpu_to_be16(raw_u16);
>> +               data += sizeof(__be16);
>> +
>> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> +               if (!raw_u16)
>> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> +               *(__be16 *)data = cpu_to_be16(raw_u16);
>> +               data += sizeof(__be16);
>> +       }
>> +
>> +       /* timestamp seconds */
>> +       if (trace_type & IOAM6_TRACE_TYPE2) {
>> +               if (!skb->tstamp) {
>> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               } else {
>> +                       skb_get_new_timestamp(skb, &ts);
>> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
>> +               }
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* timestamp subseconds */
>> +       if (trace_type & IOAM6_TRACE_TYPE3) {
>> +               if (!skb->tstamp) {
>> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               } else {
>> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
>> +                               skb_get_new_timestamp(skb, &ts);
>> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
>> +               }
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* transit delay */
>> +       if (trace_type & IOAM6_TRACE_TYPE4) {
>> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* namespace data */
>> +       if (trace_type & IOAM6_TRACE_TYPE5) {
>> +               *(__be32 *)data = (__be32)ns->data;
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* queue depth */
>> +       if (trace_type & IOAM6_TRACE_TYPE6) {
>> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* hop_lim and node_id (wide) */
>> +       if (trace_type & IOAM6_TRACE_TYPE7) {
>> +               byte = ipv6_hdr(skb)->hop_limit - 1;
>> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> +               if (!raw_u64)
>> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
>> +               else
>> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
>> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
>> +               data += sizeof(__be64);
>> +       }
>> +
>> +       /* ingress_if_id and egress_if_id (wide) */
>> +       if (trace_type & IOAM6_TRACE_TYPE8) {
>> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> +               if (!raw_u32)
>> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> +               *(__be32 *)data = cpu_to_be32(raw_u32);
>> +               data += sizeof(__be32);
>> +
>> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> +               if (!raw_u32)
>> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> +               *(__be32 *)data = cpu_to_be32(raw_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* namespace data (wide) */
>> +       if (trace_type & IOAM6_TRACE_TYPE9) {
>> +               *(__be64 *)data = ns->data;
>> +               data += sizeof(__be64);
>> +       }
>> +
>> +       /* buffer occupancy */
>> +       if (trace_type & IOAM6_TRACE_TYPE10) {
>> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* checksum complement */
>> +       if (trace_type & IOAM6_TRACE_TYPE11) {
>> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> +               data += sizeof(__be32);
>> +       }
>> +
>> +       /* opaque state snapshot */
>> +       if (trace_type & IOAM6_TRACE_TYPE22) {
>> +               if (!ns->schema) {
>> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
>> +               } else {
>> +                       *(__be32 *)data = ns->schema->hdr;
>> +                       data += sizeof(__be32);
>> +                       memcpy(data, ns->schema->data, ns->schema->len);
>> +               }
>> +       }
>> +}
>> +
>> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> +                          struct ioam6_namespace *ns)
>> +{
>> +       u8 nodelen, flags, remlen, sclen = 0;
>> +       struct ioam6_trace_hdr *trh;
>> +       int nodeoff;
>> +       u16 info;
>> +       u32 type;
>> +
>> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
>> +       info = be16_to_cpu(trh->info);
>> +       type = be32_to_cpu(trh->type);
>> +
>> +       nodelen = info >> 11;
>> +       flags = (info >> 7) & 0xf;
>> +       remlen = info & 0x7f;
>> +
>> +       /* Skip if Overflow bit is set OR
>> +        * if an unknown type (bit 12-21) is set
>> +        */
>> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
>> +               return;
>> +
>> +       /* NodeLen does not include Opaque State Snapshot length. We need to
>> +        * take it into account if the corresponding bit is set and if current
>> +        * IOAM namespace has an active schema attached to it
>> +        */
>> +       if (type & IOAM6_TRACE_TYPE22) {
>> +               /* Opaque State Snapshot header size */
>> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
>> +
>> +               if (ns->schema)
>> +                       sclen += ns->schema->len / 4;
>> +       }
>> +
>> +       /* Not enough space remaining: set Overflow bit and skip */
>> +       if (!remlen || remlen < (nodelen + sclen)) {
>> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
>> +               trh->info = cpu_to_be16(info);
>> +               return;
>> +       }
>> +
>> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
>> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
>> +
>> +       /* Update RemainingLen */
>> +       remlen -= nodelen + sclen;
>> +       info = (info & 0xff80) | remlen;
>> +       trh->info = cpu_to_be16(info);
>> +}
>> +
>> +static int __net_init ioam6_net_init(struct net *net)
>> +{
>> +       struct ioam6_pernet_data *nsdata;
>> +       int err = -ENOMEM;
>> +
>> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
>> +       if (!nsdata)
>> +               goto out;
>> +
>> +       mutex_init(&nsdata->lock);
>> +       net->ipv6.ioam6_data = nsdata;
>> +
>> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
>> +       if (err)
>> +               goto free_nsdata;
>> +
>> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
>> +       if (err)
>> +               goto free_rht_ns;
>> +
>> +out:
>> +       return err;
>> +free_rht_ns:
>> +       rhashtable_destroy(&nsdata->namespaces);
>> +free_nsdata:
>> +       kfree(nsdata);
>> +       net->ipv6.ioam6_data = NULL;
>> +       goto out;
>> +}
>> +
>> +static void __net_exit ioam6_net_exit(struct net *net)
>> +{
>> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> +
>> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
>> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
>> +
>> +       kfree(nsdata);
>> +}
>> +
>> +static struct pernet_operations ioam6_net_ops = {
>> +       .init = ioam6_net_init,
>> +       .exit = ioam6_net_exit,
>> +};
>> +
>> +int __init ioam6_init(void)
>> +{
>> +       int err = register_pernet_subsys(&ioam6_net_ops);
>> +
>> +       if (err)
>> +               return err;
>> +
>> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
>> +       return 0;
>> +}
>> +
>> +void ioam6_exit(void)
>> +{
>> +       unregister_pernet_subsys(&ioam6_net_ops);
>> +}
>> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>> index fac2135aa47b..da49b33ab6fc 100644
>> --- a/net/ipv6/sysctl_net_ipv6.c
>> +++ b/net/ipv6/sysctl_net_ipv6.c
>> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>>                 .mode           = 0644,
>>                 .proc_handler   = proc_dointvec
>>         },
>> +       {
>> +               .procname       = "ioam6_id",
>> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
>> +               .maxlen         = sizeof(int),
>> +               .mode           = 0644,
>> +               .proc_handler   = proc_dointvec
>> +       },
>>         { }
>>  };
>>
>> --
>> 2.17.1

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-25 18:23     ` Justin Iurman
@ 2020-06-25 20:32       ` Tom Herbert
  2020-06-26  8:13         ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 20:32 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Thu, Jun 25, 2020 at 11:23 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
> >>
> >
> > The IANA allocation is TEMPORARY, with an expiration date of
> > 4/16/2021. Note from RFC7120:
> >
> > "Implementers and deployers need to be aware that deprecation and
> > de-allocation could take place at any time after expiry; therefore, an
> > expired early allocation is best considered as deprecated."
> >
> > Please add a comment in the code and in the Documentation to this effect.
>
> I'll do that, thanks. What kind of comment (is there an official pattern?), and where in the Documentation should I add it?
>
> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> >> packets. Default is drop.
> >
> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> > packet containing the IOAM HBH option. Note that the act bits of the
>
> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets containing the IOAM HBH option.
>
> > option type are 00 which means the TLV is skipped if the option isn't
> > processed, so I don't think it's correct to drop these packets by
> > default.
>
> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for this option, I do believe it should be disabled (dropped) by default for nodes that "speak IOAM". Indeed, you don't want anyone with a kernel that includes IOAM to accept IOAM packets by default, which would mean that anyone would create (potentially without being aware) an IOAM domain. And, also, to avoid spreading leaks.
>
I think you're conflating two questions: whether a node processes the
IOAM option, and whether it needs to drop the packet when it doesn't.
Yes, on an IOAM system it makes sense to allow configuration of whether
to process the TLV. However, even when the node doesn't process it, the
TLV should be skipped and the packet not dropped. We know this is the
correct behavior since a system that isn't IOAM aware, i.e. all
deployed nodes right now, will skip the TLV per the act bits. If we
want to change the default behavior, the only way to do that is to
change the act bits to non-zero.
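
To make the act-bits point concrete, here is a minimal sketch in C (not
part of the patch; the enum and helper names are made up for
illustration) that decodes the high-order bits of an IPv6 option type
as defined in RFC 8200, section 4.2. For the value 49 (0x31) used by
IPV6_TLV_IOAM_HOPOPTS, the two act bits are 00, i.e. skip over the
option and continue processing the header:

```c
#include <stdint.h>

/* Action a node takes for an option type it does not recognize,
 * encoded in the two high-order bits of the option type
 * (RFC 8200, section 4.2). Names are illustrative only.
 */
enum opt_unrec_action {
	OPT_ACT_SKIP = 0,			/* 00: skip option, keep going */
	OPT_ACT_DISCARD = 1,			/* 01: discard the packet */
	OPT_ACT_DISCARD_ICMP = 2,		/* 10: discard, send ICMP */
	OPT_ACT_DISCARD_ICMP_NOT_MCAST = 3,	/* 11: discard, ICMP unless multicast dst */
};

/* Extract the two "act" bits from an option type. */
static enum opt_unrec_action opt_act_bits(uint8_t opt_type)
{
	return (enum opt_unrec_action)(opt_type >> 6);
}

/* Third-highest bit: 1 if the option data may change en route. */
static int opt_may_change_en_route(uint8_t opt_type)
{
	return (opt_type >> 5) & 1;
}
```

With these helpers, opt_act_bits(49) yields OPT_ACT_SKIP, whereas the
Jumbo option (194, 0xC2) yields OPT_ACT_DISCARD_ICMP_NOT_MCAST.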

For the leakage problem, that is a firewall issue. The expectation is
that border devices will have rules that prevent leaking packets out
of their domain. This is an orthogonal mechanism that is needed for
other protocols as well, SRH for instance. The filtering is simple:
just drop the packet when the TLV matches (although I suspect most
sites probably just drop packets with EH at this point). This doesn't
require any changes to the implementation, and doesn't require that
border devices even implement IOAM; they just drop on pattern
matching.
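
As a rough sketch of that pattern matching (again not part of the
patch; hbh_has_option() is a hypothetical helper, and a real border
device would do this in its own filtering engine), the following C
function walks the TLVs of a Hop-by-Hop Options header and reports
whether a given option type is present. It assumes the buffer starts
at the Next Header byte of the Hop-by-Hop header:

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1		0
#define TLV_IOAM_HOPOPTS	49	/* value from the patch; temporary IANA allocation */

/* Scan a Hop-by-Hop Options header for an option of type 'wanted'.
 * Returns 1 if found, 0 if absent, -1 if the header is malformed.
 * Illustrative helper, not an existing kernel function.
 */
static int hbh_has_option(const uint8_t *hbh, size_t buflen, uint8_t wanted)
{
	size_t hdrlen, off = 2;	/* options start after Next Header + Hdr Ext Len */

	if (buflen < 8)
		return -1;

	/* Hdr Ext Len is in 8-octet units, not counting the first 8 octets */
	hdrlen = ((size_t)hbh[1] + 1) * 8;
	if (hdrlen > buflen)
		return -1;

	while (off < hdrlen) {
		uint8_t type = hbh[off];
		size_t optlen;

		/* Pad1 is a single byte with no length field */
		if (type == TLV_PAD1) {
			off++;
			continue;
		}

		/* Every other option is type, length, then 'length' data bytes */
		if (off + 2 > hdrlen)
			return -1;
		optlen = hbh[off + 1];
		if (off + 2 + optlen > hdrlen)
			return -1;

		if (type == wanted)
			return 1;

		off += 2 + optlen;
	}

	return 0;
}
```

A border filter built on this would drop the packet whenever
hbh_has_option(hbh, len, TLV_IOAM_HOPOPTS) returns 1, without knowing
anything else about IOAM.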


Tom
> Justin
>
> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
> >> (unique) identifier of the interface.
> >>
> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> >> identifier of the node.
> >>
> >> Two relativistic hash tables: one for IOAM namespaces, the other for
> >> IOAM schemas. A namespace can only have a single active schema and a
> >> schema can only be attached to a single namespace (1:1 relationship).
> >>
> >>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >>   [3]
> >>   https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
> >>
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >>  include/linux/ipv6.h       |   2 +
> >>  include/net/ioam6.h        |  98 +++++++++++
> >>  include/net/netns/ipv6.h   |   2 +
> >>  include/uapi/linux/in6.h   |   1 +
> >>  include/uapi/linux/ipv6.h  |   2 +
> >>  net/ipv6/Makefile          |   2 +-
> >>  net/ipv6/addrconf.c        |  20 +++
> >>  net/ipv6/af_inet6.c        |   7 +
> >>  net/ipv6/exthdrs.c         |  67 ++++++++
> >>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
> >>  net/ipv6/sysctl_net_ipv6.c |   7 +
> >>  11 files changed, 533 insertions(+), 1 deletion(-)
> >>  create mode 100644 include/net/ioam6.h
> >>  create mode 100644 net/ipv6/ioam6.c
> >>
> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> index 5312a718bc7a..15732f964c6e 100644
> >> --- a/include/linux/ipv6.h
> >> +++ b/include/linux/ipv6.h
> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> >>         __s32           disable_policy;
> >>         __s32           ndisc_tclass;
> >>         __s32           rpl_seg_enabled;
> >> +       __u32           ioam6_enabled;
> >> +       __u32           ioam6_id;
> >>
> >>         struct ctl_table_header *sysctl_header;
> >>  };
> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> >> new file mode 100644
> >> index 000000000000..2a910bc99947
> >> --- /dev/null
> >> +++ b/include/net/ioam6.h
> >> @@ -0,0 +1,98 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> +/*
> >> + *  IOAM IPv6 implementation
> >> + *
> >> + *  Author:
> >> + *  Justin Iurman <justin.iurman@uliege.be>
> >> + */
> >> +
> >> +#ifndef _NET_IOAM6_H
> >> +#define _NET_IOAM6_H
> >> +
> >> +#include <linux/net.h>
> >> +#include <linux/ipv6.h>
> >> +#include <linux/rhashtable-types.h>
> >> +
> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
> >> +
> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> >> +
> >> +#define IOAM6_TRACE_TYPE0  (1 << 31)
> >> +#define IOAM6_TRACE_TYPE1  (1 << 30)
> >> +#define IOAM6_TRACE_TYPE2  (1 << 29)
> >> +#define IOAM6_TRACE_TYPE3  (1 << 28)
> >> +#define IOAM6_TRACE_TYPE4  (1 << 27)
> >> +#define IOAM6_TRACE_TYPE5  (1 << 26)
> >> +#define IOAM6_TRACE_TYPE6  (1 << 25)
> >> +#define IOAM6_TRACE_TYPE7  (1 << 24)
> >> +#define IOAM6_TRACE_TYPE8  (1 << 23)
> >> +#define IOAM6_TRACE_TYPE9  (1 << 22)
> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> >> +
> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> >> +
> >> +struct ioam6_common_hdr {
> >> +       u8 opt_type;
> >> +       u8 opt_len;
> >> +       u8 res;
> >> +       u8 ioam_type;
> >> +       __be16 namespace_id;
> >> +} __packed;
> >> +
> >> +struct ioam6_trace_hdr {
> >> +       __be16 info;
> >> +       __be32 type;
> >> +} __packed;
> >> +
> >> +struct ioam6_namespace {
> >> +       struct rhash_head head;
> >> +       struct rcu_head rcu;
> >> +
> >> +       __be16 id;
> >> +       __be64 data;
> >> +       bool remove_tlv;
> >> +
> >> +       struct ioam6_schema *schema;
> >> +};
> >> +
> >> +struct ioam6_schema {
> >> +       struct rhash_head head;
> >> +       struct rcu_head rcu;
> >> +
> >> +       u32 id;
> >> +       int len;
> >> +       __be32 hdr;
> >> +       u8 *data;
> >> +
> >> +       struct ioam6_namespace *ns;
> >> +};
> >> +
> >> +struct ioam6_pernet_data {
> >> +       struct mutex lock;
> >> +       struct rhashtable namespaces;
> >> +       struct rhashtable schemas;
> >> +};
> >> +
> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> >> +{
> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> +       return net->ipv6.ioam6_data;
> >> +#else
> >> +       return NULL;
> >> +#endif
> >> +}
> >> +
> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> +                                 struct ioam6_namespace *ns);
> >> +
> >> +extern int ioam6_init(void);
> >> +extern void ioam6_exit(void);
> >> +
> >> +#endif
> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> >> index 5ec054473d81..89b27fa721f4 100644
> >> --- a/include/net/netns/ipv6.h
> >> +++ b/include/net/netns/ipv6.h
> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> >>         int max_hbh_opts_len;
> >>         int seg6_flowlabel;
> >>         bool skip_notify_on_dev_down;
> >> +       unsigned int ioam6_id;
> >>  };
> >>
> >>  struct netns_ipv6 {
> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> >>                 spinlock_t      lock;
> >>                 u32             seq;
> >>         } ip6addrlbl_table;
> >> +       struct ioam6_pernet_data *ioam6_data;
> >>  };
> >>
> >>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> >> index 9f2273a08356..1c98435220c9 100644
> >> --- a/include/uapi/linux/in6.h
> >> +++ b/include/uapi/linux/in6.h
> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> >>  #define IPV6_TLV_PADN          1
> >>  #define IPV6_TLV_ROUTERALERT   5
> >>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
> >> +#define IPV6_TLV_IOAM_HOPOPTS  49
> >
> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> > Note from RFC7120:
> >
> > "Implementers and deployers need to be aware that deprecation and
> > de-allocation could take place at any time after expiry; therefore, an
> > expired early allocation is best considered as deprecated. It is not
> > IANA's responsibility to track the status of allocations, their
> > expirations, or when they may be re-allocated."
> >
> > Please add a comment here and in the
> > Documentation to this effect.
> >
> >>  #define IPV6_TLV_JUMBO         194
> >>  #define IPV6_TLV_HAO           201     /* home address option */
> >>
> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> index 13e8751bf24a..eb521b2dd885 100644
> >> --- a/include/uapi/linux/ipv6.h
> >> +++ b/include/uapi/linux/ipv6.h
> >> @@ -189,6 +189,8 @@ enum {
> >>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> >>         DEVCONF_NDISC_TCLASS,
> >>         DEVCONF_RPL_SEG_ENABLED,
> >> +       DEVCONF_IOAM6_ENABLED,
> >> +       DEVCONF_IOAM6_ID,
> >>         DEVCONF_MAX
> >>  };
> >>
> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> >> index cf7b47bdb9b3..b7ef10d417d6 100644
> >> --- a/net/ipv6/Makefile
> >> +++ b/net/ipv6/Makefile
> >> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o
> >> addrconf.o \
> >>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> >>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> >>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> >> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
> >> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
> >>
> >>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
> >>
> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> index 840bfdb3d7bd..6c952a28ade2 100644
> >> --- a/net/ipv6/addrconf.c
> >> +++ b/net/ipv6/addrconf.c
> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
> >>         .disable_policy         = 0,
> >>         .rpl_seg_enabled        = 0,
> >> +       .ioam6_enabled          = 0,
> >> +       .ioam6_id               = 0,
> >>  };
> >>
> >>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
> >> {
> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
> >>         .disable_policy         = 0,
> >>         .rpl_seg_enabled        = 0,
> >> +       .ioam6_enabled          = 0,
> >> +       .ioam6_id               = 0,
> >>  };
> >>
> >>  /* Check if link is ready: is it up and is a valid qdisc available */
> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
> >> *cnf,
> >>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> >>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> >>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> >> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> >> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> >>  }
> >>
> >>  static inline size_t inet6_ifla6_size(void)
> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> >>                 .mode           = 0644,
> >>                 .proc_handler   = proc_dointvec,
> >>         },
> >> +       {
> >> +               .procname       = "ioam6_enabled",
> >> +               .data           = &ipv6_devconf.ioam6_enabled,
> >> +               .maxlen         = sizeof(int),
> >> +               .mode           = 0644,
> >> +               .proc_handler   = proc_dointvec,
> >> +       },
> >> +       {
> >> +               .procname       = "ioam6_id",
> >> +               .data           = &ipv6_devconf.ioam6_id,
> >> +               .maxlen         = sizeof(int),
> >> +               .mode           = 0644,
> >> +               .proc_handler   = proc_dointvec,
> >> +       },
> >>         {
> >>                 /* sentinel */
> >>         }
> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> >> index b304b882e031..63a9ffc4b283 100644
> >> --- a/net/ipv6/af_inet6.c
> >> +++ b/net/ipv6/af_inet6.c
> >> @@ -62,6 +62,7 @@
> >>  #include <net/rpl.h>
> >>  #include <net/compat.h>
> >>  #include <net/xfrm.h>
> >> +#include <net/ioam6.h>
> >>
> >>  #include <linux/uaccess.h>
> >>  #include <linux/mroute6.h>
> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> >>         if (err)
> >>                 goto rpl_fail;
> >>
> >> +       err = ioam6_init();
> >> +       if (err)
> >> +               goto ioam6_fail;
> >> +
> >>         err = igmp6_late_init();
> >>         if (err)
> >>                 goto igmp6_late_err;
> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> >>  #endif
> >>  igmp6_late_err:
> >> +       ioam6_exit();
> >> +ioam6_fail:
> >>         rpl_exit();
> >>  rpl_fail:
> >>         seg6_exit();
> >>  seg6_fail:
> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> index f27ab3bf2e0c..00aee1358f1c 100644
> >> --- a/net/ipv6/exthdrs.c
> >> +++ b/net/ipv6/exthdrs.c
> >> @@ -49,6 +49,8 @@
> >>  #include <net/seg6_hmac.h>
> >>  #endif
> >>  #include <net/rpl.h>
> >> +#include <net/ioam6.h>
> >> +#include <net/dst_metadata.h>
> >>
> >>  #include <linux/uaccess.h>
> >>
> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >>         return TLV_REJECT;
> >>  }
> >>
> >> +/* IOAM */
> >> +
> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> >> +{
> >> +       struct ioam6_common_hdr *ioamh;
> >> +       struct ioam6_namespace *ns;
> >> +
> >> +       /* Must be 4n-aligned */
> >> +       if (optoff & 3)
> >> +               goto drop;
> >> +
> >> +       if (!skb_valid_dst(skb))
> >> +               ip6_route_input(skb);
> >> +
> >> +       /* IOAM must be enabled on ingress interface */
> >> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> >> +               goto drop;
> >> +
> >> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> >> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> >> +
> >> +       /* Unknown IOAM namespace, either:
> >> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
> >> +        *  - Ignore it otherwise
> >> +        */
> >> +       if (!ns) {
> >> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> +                       goto drop;
> >> +
> >> +               goto accept;
> >> +       }
> >> +
> >> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> +               goto remove;
> >> +
> >> +       /* Known IOAM namespace which must not be removed:
> >> +        * IOAM must be enabled on egress interface
> >> +        */
> >> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> +               goto drop;
> >> +
> >> +       switch (ioamh->ioam_type) {
> >> +       case IOAM6_OPT_TRACE_PREALLOC:
> >> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> >> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
> >> +               break;
> >> +       default:
> >> +               break;
> >> +       }
> >> +
> >> +accept:
> >> +       return TLV_ACCEPT;
> >> +remove:
> >> +       return TLV_REMOVE;
> >> +drop:
> >> +       kfree_skb(skb);
> >> +       return TLV_REJECT;
> >> +}
> >> +
> >>  /* Jumbo payload */
> >>
> >>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >>                 .type   = IPV6_TLV_ROUTERALERT,
> >>                 .func   = ipv6_hop_ra,
> >>         },
> >> +       {
> >> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
> >> +               .func   = ipv6_hop_ioam,
> >> +       },
> >>         {
> >>                 .type   = IPV6_TLV_JUMBO,
> >>                 .func   = ipv6_hop_jumbo,
> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> >> new file mode 100644
> >> index 000000000000..406aa78eb504
> >> --- /dev/null
> >> +++ b/net/ipv6/ioam6.c
> >> @@ -0,0 +1,326 @@
> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> +/*
> >> + *  IOAM IPv6 implementation
> >> + *
> >> + *  Author:
> >> + *  Justin Iurman <justin.iurman@uliege.be>
> >> + */
> >> +
> >> +#include <linux/errno.h>
> >> +#include <linux/types.h>
> >> +#include <linux/kernel.h>
> >> +#include <linux/net.h>
> >> +#include <linux/rhashtable.h>
> >> +
> >> +#include <net/addrconf.h>
> >> +#include <net/ioam6.h>
> >> +
> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> >> +{
> >> +       kfree_rcu(ns, rcu);
> >> +}
> >> +
> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> >> +{
> >> +       kfree_rcu(sc, rcu);
> >> +}
> >> +
> >> +static void ioam6_free_ns(void *ptr, void *arg)
> >> +{
> >> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> >> +
> >> +       if (ns)
> >> +               ioam6_ns_release(ns);
> >> +}
> >> +
> >> +static void ioam6_free_sc(void *ptr, void *arg)
> >> +{
> >> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> >> +
> >> +       if (sc)
> >> +               ioam6_sc_release(sc);
> >> +}
> >> +
> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> +{
> >> +       const struct ioam6_namespace *ns = obj;
> >> +
> >> +       return (ns->id != *(__be16 *)arg->key);
> >> +}
> >> +
> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> +{
> >> +       const struct ioam6_schema *sc = obj;
> >> +
> >> +       return (sc->id != *(u32 *)arg->key);
> >> +}
> >> +
> >> +static const struct rhashtable_params rht_ns_params = {
> >> +       .key_len                = sizeof(__be16),
> >> +       .key_offset             = offsetof(struct ioam6_namespace, id),
> >> +       .head_offset            = offsetof(struct ioam6_namespace, head),
> >> +       .automatic_shrinking    = true,
> >> +       .obj_cmpfn              = ioam6_ns_cmpfn,
> >> +};
> >> +
> >> +static const struct rhashtable_params rht_sc_params = {
> >> +       .key_len                = sizeof(u32),
> >> +       .key_offset             = offsetof(struct ioam6_schema, id),
> >> +       .head_offset            = offsetof(struct ioam6_schema, head),
> >> +       .automatic_shrinking    = true,
> >> +       .obj_cmpfn              = ioam6_sc_cmpfn,
> >> +};
> >> +
> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> >> +{
> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> +
> >> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> >> +}
> >> +
> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> >> +                               u32 trace_type, struct ioam6_namespace *ns)
> >> +{
> >> +       u8 *data = skb_network_header(skb) + nodeoff;
> >> +       struct __kernel_sock_timeval ts;
> >> +       u64 raw_u64;
> >> +       u32 raw_u32;
> >> +       u16 raw_u16;
> >> +       u8 byte;
> >> +
> >> +       /* hop_lim and node_id */
> >> +       if (trace_type & IOAM6_TRACE_TYPE0) {
> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> >> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> +               if (!raw_u32)
> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
> >> +               else
> >> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> >> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* ingress_if_id and egress_if_id */
> >> +       if (trace_type & IOAM6_TRACE_TYPE1) {
> >> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> +               if (!raw_u16)
> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> >> +               data += sizeof(__be16);
> >> +
> >> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> +               if (!raw_u16)
> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> >> +               data += sizeof(__be16);
> >> +       }
> >> +
> >> +       /* timestamp seconds */
> >> +       if (trace_type & IOAM6_TRACE_TYPE2) {
> >> +               if (!skb->tstamp) {
> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               } else {
> >> +                       skb_get_new_timestamp(skb, &ts);
> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> >> +               }
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* timestamp subseconds */
> >> +       if (trace_type & IOAM6_TRACE_TYPE3) {
> >> +               if (!skb->tstamp) {
> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               } else {
> >> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
> >> +                               skb_get_new_timestamp(skb, &ts);
> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> >> +               }
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* transit delay */
> >> +       if (trace_type & IOAM6_TRACE_TYPE4) {
> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* namespace data */
> >> +       if (trace_type & IOAM6_TRACE_TYPE5) {
> >> +               *(__be32 *)data = (__be32)ns->data;
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* queue depth */
> >> +       if (trace_type & IOAM6_TRACE_TYPE6) {
> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* hop_lim and node_id (wide) */
> >> +       if (trace_type & IOAM6_TRACE_TYPE7) {
> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> >> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> +               if (!raw_u64)
> >> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
> >> +               else
> >> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> >> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> >> +               data += sizeof(__be64);
> >> +       }
> >> +
> >> +       /* ingress_if_id and egress_if_id (wide) */
> >> +       if (trace_type & IOAM6_TRACE_TYPE8) {
> >> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> +               if (!raw_u32)
> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> >> +               data += sizeof(__be32);
> >> +
> >> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> +               if (!raw_u32)
> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* namespace data (wide) */
> >> +       if (trace_type & IOAM6_TRACE_TYPE9) {
> >> +               *(__be64 *)data = ns->data;
> >> +               data += sizeof(__be64);
> >> +       }
> >> +
> >> +       /* buffer occupancy */
> >> +       if (trace_type & IOAM6_TRACE_TYPE10) {
> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* checksum complement */
> >> +       if (trace_type & IOAM6_TRACE_TYPE11) {
> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> +               data += sizeof(__be32);
> >> +       }
> >> +
> >> +       /* opaque state snapshot */
> >> +       if (trace_type & IOAM6_TRACE_TYPE22) {
> >> +               if (!ns->schema) {
> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> >> +               } else {
> >> +                       *(__be32 *)data = ns->schema->hdr;
> >> +                       data += sizeof(__be32);
> >> +                       memcpy(data, ns->schema->data, ns->schema->len);
> >> +               }
> >> +       }
> >> +}
> >> +
> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> +                          struct ioam6_namespace *ns)
> >> +{
> >> +       u8 nodelen, flags, remlen, sclen = 0;
> >> +       struct ioam6_trace_hdr *trh;
> >> +       int nodeoff;
> >> +       u16 info;
> >> +       u32 type;
> >> +
> >> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> >> +       info = be16_to_cpu(trh->info);
> >> +       type = be32_to_cpu(trh->type);
> >> +
> >> +       nodelen = info >> 11;
> >> +       flags = (info >> 7) & 0xf;
> >> +       remlen = info & 0x7f;
> >> +
> >> +       /* Skip if Overflow bit is set OR
> >> +        * if an unknown type (bit 12-21) is set
> >> +        */
> >> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> >> +               return;
> >> +
> >> +       /* NodeLen does not include Opaque State Snapshot length. We need to
> >> +        * take it into account if the corresponding bit is set and if current
> >> +        * IOAM namespace has an active schema attached to it
> >> +        */
> >> +       if (type & IOAM6_TRACE_TYPE22) {
> >> +               /* Opaque State Snapshot header size */
> >> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> >> +
> >> +               if (ns->schema)
> >> +                       sclen += ns->schema->len / 4;
> >> +       }
> >> +
> >> +       /* Not enough space remaining: set Overflow bit and skip */
> >> +       if (!remlen || remlen < (nodelen + sclen)) {
> >> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> >> +               trh->info = cpu_to_be16(info);
> >> +               return;
> >> +       }
> >> +
> >> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> >> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> >> +
> >> +       /* Update RemainingLen */
> >> +       remlen -= nodelen + sclen;
> >> +       info = (info & 0xff80) | remlen;
> >> +       trh->info = cpu_to_be16(info);
> >> +}
> >> +
> >> +static int __net_init ioam6_net_init(struct net *net)
> >> +{
> >> +       struct ioam6_pernet_data *nsdata;
> >> +       int err = -ENOMEM;
> >> +
> >> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> >> +       if (!nsdata)
> >> +               goto out;
> >> +
> >> +       mutex_init(&nsdata->lock);
> >> +       net->ipv6.ioam6_data = nsdata;
> >> +
> >> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> >> +       if (err)
> >> +               goto free_nsdata;
> >> +
> >> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> >> +       if (err)
> >> +               goto free_rht_ns;
> >> +
> >> +out:
> >> +       return err;
> >> +free_rht_ns:
> >> +       rhashtable_destroy(&nsdata->namespaces);
> >> +free_nsdata:
> >> +       kfree(nsdata);
> >> +       net->ipv6.ioam6_data = NULL;
> >> +       goto out;
> >> +}
> >> +
> >> +static void __net_exit ioam6_net_exit(struct net *net)
> >> +{
> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> +
> >> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> >> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> >> +
> >> +       kfree(nsdata);
> >> +}
> >> +
> >> +static struct pernet_operations ioam6_net_ops = {
> >> +       .init = ioam6_net_init,
> >> +       .exit = ioam6_net_exit,
> >> +};
> >> +
> >> +int __init ioam6_init(void)
> >> +{
> >> +       int err = register_pernet_subsys(&ioam6_net_ops);
> >> +
> >> +       if (err)
> >> +               return err;
> >> +
> >> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
> >> +       return 0;
> >> +}
> >> +
> >> +void ioam6_exit(void)
> >> +{
> >> +       unregister_pernet_subsys(&ioam6_net_ops);
> >> +}
> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> index fac2135aa47b..da49b33ab6fc 100644
> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> >>                 .mode           = 0644,
> >>                 .proc_handler   = proc_dointvec
> >>         },
> >> +       {
> >> +               .procname       = "ioam6_id",
> >> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
> >> +               .maxlen         = sizeof(int),
> >> +               .mode           = 0644,
> >> +               .proc_handler   = proc_dointvec
> >> +       },
> >>         { }
> >>  };
> >>
> >> --
> >> 2.17.1
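
For readers following ioam6_fill_trace_data() in the patch quoted above, here is a minimal standalone sketch of the bookkeeping it does on the 16-bit info word: NodeLen in the top 5 bits, Flags in the next 4, RemainingLen in the low 7, with all lengths counted in 4-octet units. The shifts, masks and the slot formula are taken from the quoted code. The names trace_info, decode_info, claim_slot and SKETCH_FLAG_OVERFLOW are made up for illustration, and the value 0x8 for the Overflow flag is an assumption, since the definition of IOAM6_TRACE_FLAG_OVERFLOW is not part of the quoted hunk. The check on unknown trace type bits is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the Overflow flag within the 4-bit Flags field. */
#define SKETCH_FLAG_OVERFLOW 0x8

/* Hypothetical decoded view of the 16-bit info word. */
struct trace_info {
	uint8_t nodelen;	/* per-node data length, in 4-octet units */
	uint8_t flags;		/* 4 flag bits */
	uint8_t remlen;		/* free space left, in 4-octet units */
};

static struct trace_info decode_info(uint16_t info)
{
	struct trace_info ti;

	ti.nodelen = (uint8_t)(info >> 11);
	ti.flags = (info >> 7) & 0xf;
	ti.remlen = info & 0x7f;
	return ti;
}

/* Reserve room for one node (plus sclen units of opaque snapshot).
 * On success, update RemainingLen in *info and return the offset of the
 * slot in octets, counted from the end of the trace header.
 * Return -1 if the trace is already full or becomes full; in the second
 * case the Overflow flag is set in *info.
 */
static int claim_slot(uint16_t *info, uint8_t sclen)
{
	struct trace_info ti = decode_info(*info);
	int need = ti.nodelen + sclen;

	assert(sclen <= 0x7f);

	if (ti.flags & SKETCH_FLAG_OVERFLOW)
		return -1;

	if (!ti.remlen || ti.remlen < need) {
		*info = (uint16_t)(*info | (SKETCH_FLAG_OVERFLOW << 7));
		return -1;
	}

	*info = (uint16_t)((*info & 0xff80) | (ti.remlen - need));
	return (ti.remlen - need) * 4;
}
```

With NodeLen = 2 and RemainingLen = 6, successive calls hand out the slots at octet offsets 16, 8 and 0, i.e. the pre-allocated area is filled from the end towards the trace header. A fourth call finds RemainingLen = 0 and sets the Overflow flag instead of writing.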

* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-25 17:47     ` Justin Iurman
@ 2020-06-25 20:53       ` Tom Herbert
  2020-06-26  8:22         ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 20:53 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Thu, Jun 25, 2020 at 10:47 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Hi Tom,
>
> >> Add the possibility to remove one or more consecutive TLVs without
> >> messing up the alignment of others. For now, only IOAM requires this
> >> behavior.
> >>
> > Hi Justin,
> >
> > Can you explain the motivation for this? Per RFC8200, extension
> > headers in flight are not to be added, removed, or modified outside of
> > the standard rules for processing modifiable HBH and DO TLVs; that
> > would include adding and removing TLVs in EH. One obvious problem this
>
> As you already know from our last meeting, IOAM may be configured on a node such that a specific IOAM namespace should be removed. Therefore, this patch provides support for the deletion of a TLV (or consecutive TLVs), without removing the entire EH (if it's empty, there will be padding). Note that there is a similar "problem" with the Incremental Trace where you'd need to expand the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is against modification of in-flight EHs, but there are several reasons that, I believe, mitigate this statement.
>
> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely deployed on the Internet. We can distinguish two big scenarios: (i) in-transit traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the domain, i.e. from an IOAM node inside the domain to another one (no need for encapsulation). In both cases, we kind of own the traffic: (i) encapsulation, so we modify "our" header and (ii) we already own the traffic.
>
> And if someone is still angry about this, well, the good news is that such modification can be avoided most of the time. Indeed, operators are advised to remove an IOAM namespace only on egress nodes. This way, the destination (either the tunnel destination or the real destination, depending on the scenario) will receive EHs and take care of them without the need to remove anything. But, again, operators can do what they want and I'd tend to adhere to David's philosophy [1] and give them the possibility to choose what to do.
>

Justin,

6man WG has had a _long_ and sometimes bitter discussion around this,
particularly with regard to insertion of SRH. The current consensus
of the IETF is that it is a violation of RFC8200. We've heard all the
arguments that it's only for limited domains and narrow use cases;
nevertheless, there are several problems that the header
insertion/deletion advocates never answered: it breaks AH, it breaks
PMTU discovery, it breaks ICMP. There is also a risk that a
non-standard modification could cause a packet to be dropped
downstream from the node that modifies it. There is no attribution on
who created the problem, and hence this can lead to systematic
blackholes which are the most miserable sort of problem to debug.
Fundamentally, it is not robust per Postel's law (I actually wrote a
draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
you're interested).

IMO, we shouldn't be using Linux as a backdoor to implement a protocol
that the IETF is saying isn't robust. Can you point out where in the
IOAM drafts this requirement is specified? Then I can take it up in
the IOAM WG or 6man if needed...

Tom

> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> > is processed (AH is processed after HBH).
>
> Correct. But I don't think it should prevent us from having IOAM in the kernel. Again, operators could simply apply IOAM on a subset of the traffic that does not include AHs, for example.
>
> Justin
>
>   [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>
> > Tom
> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> is still aligned after the removal.
> >>
> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> header.
> >>
> >> Example 1:
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       X       |       X       |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |                                                               |
> >> ~                Option to be removed (8 octets)                ~
> >> |                                                               |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Y       |       Y       |       Y       |       Y       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> boundary (same result in both cases).
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       X       |       X       |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Y       |       Y       |       Y       |       Y       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Example 2:
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       X       |       X       |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |                Option to be removed (4 octets)                |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Y       |       Y       |       Y       |       Y       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> of 8 anymore.
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       X       |       X       |    Padding    |    Padding    |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Y       |       Y       |       Y       |       Y       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> |       Z       |       Z       |       Z       |       Z       |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Therefore, the largest (8-octet) boundary is assumed by default for
> >> all options, which means that blocks are only moved in multiples of
> >> 8 octets. This guarantees that the remaining options stay aligned.
> >>
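
The 8-octet rule described above can be checked with a small standalone sketch. It mirrors the arithmetic at the top of remove_tlv() in this patch: of the span to remove, len % 8 octets stay in place and are rewritten as padding, and only the remaining multiple of 8 is actually cut out, with Hdr Ext Len reduced accordingly. The names tlv_cut, split_removal and new_hdr_ext_len are invented for illustration and do not exist in the patch.

```c
#include <assert.h>

/* Hypothetical helper type, for illustration only. */
struct tlv_cut {
	int rlen;	/* octets really removed, always a multiple of 8 */
	int padlen;	/* octets left in place and rewritten as Pad1/PadN */
};

/* Split the span [start, end) of TLVs to remove, as remove_tlv() does. */
static struct tlv_cut split_removal(int start, int end)
{
	struct tlv_cut cut;
	int len = end - start;

	assert(len >= 0);

	cut.padlen = len % 8;
	cut.rlen = len - cut.padlen;
	return cut;
}

/* New Hdr Ext Len value (8-octet units, not counting the first 8 octets). */
static int new_hdr_ext_len(int hdr_ext_len, struct tlv_cut cut)
{
	return hdr_ext_len - (cut.rlen >> 3);
}
```

Taking offsets from the start of the IPv6 header (so the Hop-by-Hop starts at 40): in Example 1 the 8-octet option spans 48..56, giving rlen = 8 and padlen = 0, and Hdr Ext Len goes from 3 to 2. In Example 2 the 4-octet option spans 48..52, giving rlen = 0 and padlen = 4, so nothing moves, the option is overwritten with a 4-octet PadN, and Z keeps its 8n alignment.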
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >>  1 file changed, 108 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> index e9b366994475..f27ab3bf2e0c 100644
> >> --- a/net/ipv6/exthdrs.c
> >> +++ b/net/ipv6/exthdrs.c
> >> @@ -52,17 +52,27 @@
> >>
> >>  #include <linux/uaccess.h>
> >>
> >> -/*
> >> - *     Parsing tlv encoded headers.
> >> +/* States for TLV parsing functions. */
> >> +
> >> +enum {
> >> +       TLV_ACCEPT,
> >> +       TLV_REJECT,
> >> +       TLV_REMOVE,
> >> +       __TLV_MAX
> >> +};
> >> +
> >> +/* Parsing TLV encoded headers.
> >>   *
> >> - *     Parsing function "func" returns true, if parsing succeed
> >> - *     and false, if it failed.
> >> - *     It MUST NOT touch skb->h.
> >> + * Parsing function "func" returns either:
> >> + *  - TLV_ACCEPT if parsing succeeds
> >> + *  - TLV_REJECT if parsing fails
> >> + *  - TLV_REMOVE if TLV must be removed
> >> + * It MUST NOT touch skb->h.
> >>   */
> >>
> >>  struct tlvtype_proc {
> >>         int     type;
> >> -       bool    (*func)(struct sk_buff *skb, int offset);
> >> +       int     (*func)(struct sk_buff *skb, int offset);
> >>  };
> >>
> >>  /*********************
> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
> >> optoff,
> >>         return false;
> >>  }
> >>
> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> +
> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> +{
> >> +       int len = end - start;
> >> +       int padlen = len % 8;
> >> +       unsigned char *h;
> >> +       int rlen, off;
> >> +       u16 pl_len;
> >> +
> >> +       rlen = len - padlen;
> >> +       if (rlen) {
> >> +               skb_pull(skb, rlen);
> >> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> +                       start);
> >> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> +
> >> +               skb_reset_network_header(skb);
> >> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> +
> >> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> +
> >> +               skb_transport_header(skb)[1] -= rlen >> 3;
> >> +               end -= rlen;
> >> +       }
> >> +
> >> +       if (padlen) {
> >> +               off = end - padlen;
> >> +               h = skb_network_header(skb);
> >> +
> >> +               if (padlen == 1) {
> >> +                       h[off] = IPV6_TLV_PAD1;
> >> +               } else {
> >> +                       padlen -= 2;
> >> +
> >> +                       h[off] = IPV6_TLV_PADN;
> >> +                       h[off + 1] = padlen;
> >> +                       memset(&h[off + 2], 0, padlen);
> >> +               }
> >> +       }
> >> +
> >> +       return end;
> >> +}
> >> +
> >>  /* Parse tlv encoded option header (hop-by-hop or destination) */
> >>
> >>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >>                           struct sk_buff *skb,
> >> -                         int max_count)
> >> +                         int max_count,
> >> +                         bool removable)
> >>  {
> >>         int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> -       const unsigned char *nh = skb_network_header(skb);
> >> +       unsigned char *nh = skb_network_header(skb);
> >>         int off = skb_network_header_len(skb);
> >>         const struct tlvtype_proc *curr;
> >>         bool disallow_unknowns = false;
> >> +       int off_remove = 0;
> >>         int tlv_count = 0;
> >>         int padlen = 0;
> >> +       int ret;
> >>
> >>         if (unlikely(max_count < 0)) {
> >>                 disallow_unknowns = true;
> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
> >> *procs,
> >>                         if (tlv_count > max_count)
> >>                                 goto bad;
> >>
> >> +                       ret = -1;
> >>                         for (curr = procs; curr->type >= 0; curr++) {
> >>                                 if (curr->type == nh[off]) {
> >>                                         /* type specific length/alignment
> >>                                            checks will be performed in the
> >>                                            func(). */
> >> -                                       if (curr->func(skb, off) == false)
> >> +                                       ret = curr->func(skb, off);
> >> +                                       if (ret == TLV_REJECT)
> >>                                                 return false;
> >>                                         break;
> >>                                 }
> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >>                                 return false;
> >>
> >> +                       if (removable) {
> >> +                               if (ret == TLV_REMOVE) {
> >> +                                       if (!off_remove)
> >> +                                               off_remove = off - padlen;
> >> +                               } else if (off_remove) {
> >> +                                       off = remove_tlv(off_remove, off, skb);
> >> +                                       nh = skb_network_header(skb);
> >> +                                       off_remove = 0;
> >> +                               }
> >> +                       }
> >> +
> >>                         padlen = 0;
> >>                         break;
> >>                 }
> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >>                 len -= optlen;
> >>         }
> >>
> >> -       if (len == 0)
> >> +       if (len == 0) {
> >> +               /* Don't forget last TLV if it must be removed */
> >> +               if (off_remove)
> >> +                       remove_tlv(off_remove, off, skb);
> >> +
> >>                 return true;
> >> +       }
> >>  bad:
> >>         kfree_skb(skb);
> >>         return false;
> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >>   *****************************/
> >>
> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >>  {
> >>         struct ipv6_destopt_hao *hao;
> >>         struct inet6_skb_parm *opt = IP6CB(skb);
> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >>         if (skb->tstamp == 0)
> >>                 __net_timestamp(skb);
> >>
> >> -       return true;
> >> +       return TLV_ACCEPT;
> >>
> >>   discard:
> >>         kfree_skb(skb);
> >> -       return false;
> >> +       return TLV_REJECT;
> >>  }
> >>  #endif
> >>
> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >>  #endif
> >>
> >>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> +                         false)) {
> >>                 skb->transport_header += extlen;
> >>                 opt = IP6CB(skb);
> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
> >> *skb)
> >>
> >>  /* Router Alert as of RFC 2711 */
> >>
> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >>  {
> >>         const unsigned char *nh = skb_network_header(skb);
> >>
> >>         if (nh[optoff + 1] == 2) {
> >>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> -               return true;
> >> +               return TLV_ACCEPT;
> >>         }
> >>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >>                             nh[optoff + 1]);
> >>         kfree_skb(skb);
> >> -       return false;
> >> +       return TLV_REJECT;
> >>  }
> >>
> >>  /* Jumbo payload */
> >>
> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >>  {
> >>         const unsigned char *nh = skb_network_header(skb);
> >>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> optoff)
> >>         if (pkt_len <= IPV6_MAXPLEN) {
> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> -               return false;
> >> +               return TLV_REJECT;
> >>         }
> >>         if (ipv6_hdr(skb)->payload_len) {
> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> -               return false;
> >> +               return TLV_REJECT;
> >>         }
> >>
> >>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> optoff)
> >>                 goto drop;
> >>
> >>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> -       return true;
> >> +       return TLV_ACCEPT;
> >>
> >>  drop:
> >>         kfree_skb(skb);
> >> -       return false;
> >> +       return TLV_REJECT;
> >>  }
> >>
> >>  /* CALIPSO RFC 5570 */
> >>
> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >>  {
> >>         const unsigned char *nh = skb_network_header(skb);
> >>
> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
> >> optoff)
> >>         if (!calipso_validate(skb, nh + optoff))
> >>                 goto drop;
> >>
> >> -       return true;
> >> +       return TLV_ACCEPT;
> >>
> >>  drop:
> >>         kfree_skb(skb);
> >> -       return false;
> >> +       return TLV_REJECT;
> >>  }
> >>
> >>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >>
> >>         opt->flags |= IP6SKB_HOPBYHOP;
> >>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> +                         true)) {
> >> +               /* we need to refresh the length in case
> >> +                * at least one TLV was removed
> >> +                */
> >> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >>                 skb->transport_header += extlen;
> >>                 opt = IP6CB(skb);
> >>                 opt->nhoff = sizeof(struct ipv6hdr);
> >> --
> >> 2.17.1


* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-25 17:56     ` Justin Iurman
@ 2020-06-26  0:48       ` Tom Herbert
  2020-06-26  8:31         ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-26  0:48 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Thu, Jun 25, 2020 at 10:56 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Implement the IOAM egress behavior.
> >>
> >> According to RFC 8200:
> >> "Extension headers (except for the Hop-by-Hop Options header) are not
> >>  processed, inserted, or deleted by any node along a packet's delivery
> >>  path, until the packet reaches the node (or each of the set of nodes,
> >>  in the case of multicast) identified in the Destination Address field
> >>  of the IPv6 header."
> >>
> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> >> incoming IPv6 packet with another similar IPv6 header that will contain
> >> IOAM data while it traverses the domain. When leaving, the egress node,
> >> another IOAM domain border which is also the tunnel destination, must
> >> decapsulate the packet.
> >
> > This is just IP in IP encapsulation that happens to be terminated at
> > an egress node of the IOAM domain. The fact that it's IOAM isn't
> > germane; this IP in IP is done in a variety of ways. We should be
> > using the normal protocol handler for NEXTHDR_IPV6 instead of special
> > case code.
>
> Agree. The reason for this special case code is that I was not aware of a more elegant solution.
>
The current implementation might not be what you're looking for since
ip6ip6 wants a tunnel configured. What we really want is more like
anonymous decapsulation, that is just decap the ip6ip6 packet and
resubmit the packet into the stack (this is what your patch is doing).
The idea has been kicked around before, especially in the use case
where we're tunneling across a domain and there could be hundreds of
such tunnels to some device. I think it's generally okay to do this,
although someone might raise security concerns since it sort of
obfuscates the "real packet". Probably makes sense to have a sysctl to
enable this, and it could probably default to on. Of course, if we do
this, the next question is whether we should also implement anonymous
decapsulation for 44, 64, and 46 tunnels.
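
A minimal userspace sketch of the anonymous decapsulation idea described
above, assuming a plain byte buffer rather than an sk_buff. The function
name anon_decap_ip6ip6() and the anon_decap_enabled toggle (standing in
for the suggested sysctl) are hypothetical and exist only for
illustration. The sketch only locates the inner IPv6 header, optionally
skipping one Hop-by-Hop header; it does not resubmit anything into a
stack and does not copy the hop limit the way the patch does.

```c
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN   40  /* fixed IPv6 header size */
#define NEXTHDR_HOP    0  /* Hop-by-Hop Options header */
#define NEXTHDR_IPV6  41  /* IPv6-in-IPv6 */

/* Hypothetical stand-in for the sysctl suggested above. */
static int anon_decap_enabled = 1;

/*
 * Return the offset of the inner IPv6 header within buf, or -1 if the
 * packet is not an ip6ip6 packet that can be decapsulated.
 */
static long anon_decap_ip6ip6(const uint8_t *buf, size_t len)
{
	size_t off = IP6_HDR_LEN;
	uint8_t nexthdr;

	if (!anon_decap_enabled || len < IP6_HDR_LEN || (buf[0] >> 4) != 6)
		return -1;

	nexthdr = buf[6];

	/* Skip one optional Hop-by-Hop header (e.g. the one carrying IOAM). */
	if (nexthdr == NEXTHDR_HOP) {
		size_t hbh_len;

		if (len < off + 2)
			return -1;
		hbh_len = ((size_t)buf[off + 1] + 1) << 3;
		if (len < off + hbh_len)
			return -1;
		nexthdr = buf[off];
		off += hbh_len;
	}

	if (nexthdr != NEXTHDR_IPV6)
		return -1;

	/* The inner packet needs a full fixed header with version 6. */
	if (len < off + IP6_HDR_LEN || (buf[off] >> 4) != 6)
		return -1;

	return (long)off;
}
```

No tunnel configuration is consulted here, which is the point of the
"anonymous" variant: the only gate is the global toggle.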

Tom

> Justin
>
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >>  include/linux/ipv6.h |  1 +
> >>  net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
> >>  2 files changed, 23 insertions(+)
> >>
> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> index 2cb445a8fc9e..5312a718bc7a 100644
> >> --- a/include/linux/ipv6.h
> >> +++ b/include/linux/ipv6.h
> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
> >>  #define IP6SKB_HOPBYHOP        32
> >>  #define IP6SKB_L3SLAVE         64
> >>  #define IP6SKB_JUMBOGRAM      128
> >> +#define IP6SKB_IOAM           256
> >>  };
> >>
> >>  #if defined(CONFIG_NET_L3_MASTER_DEV)
> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> >> index e96304d8a4a7..8cf75cc5e806 100644
> >> --- a/net/ipv6/ip6_input.c
> >> +++ b/net/ipv6/ip6_input.c
> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
> >> *));
> >>  void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >>                               bool have_final)
> >>  {
> >> +       struct inet6_skb_parm *opt = IP6CB(skb);
> >>         const struct inet6_protocol *ipprot;
> >>         struct inet6_dev *idev;
> >>         unsigned int nhoff;
> >> +       u8 hop_limit;
> >>         bool raw;
> >>
> >>         /*
> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> sk_buff *skb, int nexthdr,
> >>         } else {
> >>                 if (!raw) {
> >>                         if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> >> +                               /* IOAM Tunnel Decapsulation
> >> +                                * Packet is going to re-enter the stack
> >> +                                */
> >> +                               if (nexthdr == NEXTHDR_IPV6 &&
> >> +                                   (opt->flags & IP6SKB_IOAM)) {
> >> +                                       hop_limit = ipv6_hdr(skb)->hop_limit;
> >> +
> >> +                                       skb_reset_network_header(skb);
> >> +                                       skb_reset_transport_header(skb);
> >> +                                       skb->encapsulation = 0;
> >> +
> >> +                                       ipv6_hdr(skb)->hop_limit = hop_limit;
> >> +                                       __skb_tunnel_rx(skb, skb->dev,
> >> +                                                       dev_net(skb->dev));
> >> +
> >> +                                       netif_rx(skb);
> >> +                                       goto out;
> >> +                               }
> >> +
> >>                                 __IP6_INC_STATS(net, idev,
> >>                                                 IPSTATS_MIB_INUNKNOWNPROTOS);
> >>                                 icmpv6_send(skb, ICMPV6_PARAMPROB,
> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> sk_buff *skb, int nexthdr,
> >>                         consume_skb(skb);
> >>                 }
> >>         }
> >> +out:
> >>         return;
> >>
> >>  discard:
> >> --
> >> 2.17.1


* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-25 20:32       ` Tom Herbert
@ 2020-06-26  8:13         ` Justin Iurman
  2020-06-26 14:53           ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26  8:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

Tom,

>> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
>> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
>> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>> >>
>> >
>> > The IANA allocation is TEMPORARY, with an expiration date of
>> > 4/16/2021. Note from RFC7120:
>> >
>> > "Implementers and deployers need to be aware that deprecation and
>> > de-allocation could take place at any time after expiry; therefore, an
>> > expired early allocation is best considered as deprecated."
>> >
>> > Please add a comment in the code and in the Documentation to this effect.
>>
>> I'll do that, thanks. What kind of comment should it be (is there an official
>> pattern?), and where in the Documentation should I add it?
>>
>> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
>> >> packets. Default is drop.
>> >
>> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
>> > packet containing the IOAM HBH option. Note that the act bits of the
>>
>> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets
>> containing the IOAM HBH option.
>>
>> > option type are 00 which means the TLV is skipped if the option isn't
>> > processed so I don't think it's correct to drop these packets by
>> > default.
>>
>> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for
>> this option, I do believe it should be disabled (dropped) by default for nodes
>> that "speak IOAM". Indeed, you don't want anyone with a kernel that includes
>> IOAM to accept IOAM packets by default, which would mean that anyone would
>> create (potentially without being aware) an IOAM domain. And, also, to avoid
>> spreading leaks.
>>
> I think you're conflating whether a node processes an IOAM option with
> whether it needs to drop the packet because it doesn't process it. Yes,
> on an IOAM system it makes sense to allow configuring whether to process
> the TLV. However, even when it doesn't, the TLV should be skipped and
> the packet not dropped. We know this is the correct behavior since on a
> system that isn't IOAM aware, i.e. all deployed nodes right now, they
> will skip the TLV per the act bits. If we want to change the default
> behavior, the only way to do that is to change the act bits to
> non-zero.

Makes sense, you're right indeed. But still, I'm a bit worried about enabling it by default. That would open the door to things we don't want. We'd end up in a situation where IOAM is not "privately" deployed. And think about someone who runs a kernel with IOAM but does not know anything about it. Of course, they would not have a firewall rule to drop IOAM. Therefore, anyone could simply "create" an IOAM domain with them by sending IPv6 packets with an IOAM HBH option and steal data. This is something similar to the leak problem.

So, I think there are 2 possibilities to address the above: (i) the current one, i.e. drop by default, or (ii) use 01 for the act bits. This topic has been widely discussed in the WG and is still open, though the trend seems to be "00" with the drop-by-default compromise.
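
For reference, a small sketch of how the act bits discussed above are
read out of an option type, following the encoding in RFC 8200 section
4.2 (the two highest-order bits give the action for an unrecognized
option, the third-highest bit says whether the option data may change en
route). The enum and helper names are illustrative assumptions, not
kernel identifiers; only the value 49 comes from the patch.

```c
#include <stdint.h>

#define IPV6_TLV_IOAM_HOPOPTS 49 /* 0x31, value used in the patch */

/* Action a node takes when it does not recognize the option type. */
enum tlv_unknown_action {
	TLV_UNK_SKIP = 0,               /* 00: skip over the option */
	TLV_UNK_DISCARD = 1,            /* 01: discard the packet */
	TLV_UNK_DISCARD_ICMP = 2,       /* 10: discard, send ICMP Parameter Problem */
	TLV_UNK_DISCARD_ICMP_UCAST = 3, /* 11: same, only if dest is not multicast */
};

/* Extract the two highest-order bits of the option type. */
static enum tlv_unknown_action tlv_act_bits(uint8_t opt_type)
{
	return (enum tlv_unknown_action)(opt_type >> 6);
}

/* Third-highest bit: 1 if the option data may change en route. */
static int tlv_may_change_en_route(uint8_t opt_type)
{
	return (opt_type >> 5) & 1;
}
```

With the temporary code point 49 (binary 00110001), the act bits are 00,
so a node that does not know the option skips it, and the change bit is
set since IOAM nodes write into the option along the path.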

> For the leakage problem, that is a firewall issue. The expectation is
> that border devices will have rules that prevent leaking packets out
> of their domain. This is an orthogonal mechanism that needs to be done
> for other protocols-- SRH for instance. The filtering is simple, just
> drop the packet when TLV matches (although I suspect most sites
> probably just drop packets with EH at this point). This doesn't
> require any changes to the implementation and doesn't require that
> border devices even implement IOAM-- they just drop on pattern
> matching.

+1
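
To illustrate the "drop on pattern matching" point, here is a hedged
userspace sketch of what a border filter has to do: walk the TLVs of a
Hop-by-Hop Options header and report whether a given option type is
present. The helper name hbh_has_option() is hypothetical, the buffer is
assumed to start at the first byte of the Hop-by-Hop header, and this is
not how any particular firewall implements the match.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_TLV_PAD1 0 /* single-byte padding, has no length field */

/*
 * Scan a Hop-by-Hop Options header for an option of the given type
 * (any type other than Pad1). Returns 1 if found, 0 if absent and
 * -1 if the header is malformed.
 */
static int hbh_has_option(const uint8_t *hbh, size_t len, uint8_t wanted)
{
	size_t hdr_len;
	size_t off = 2; /* options start after Next Header and Hdr Ext Len */

	if (len < 2)
		return -1;

	/* Hdr Ext Len counts 8-byte units, not including the first 8 bytes. */
	hdr_len = ((size_t)hbh[1] + 1) << 3;
	if (hdr_len > len)
		return -1;

	while (off < hdr_len) {
		uint8_t type = hbh[off];
		size_t optlen;

		if (type == IPV6_TLV_PAD1) {
			off++;
			continue;
		}

		/* Every other option carries a type byte and a length byte. */
		if (off + 2 > hdr_len)
			return -1;
		optlen = (size_t)hbh[off + 1] + 2;
		if (off + optlen > hdr_len)
			return -1;

		if (type == wanted)
			return 1;

		off += optlen;
	}

	return 0;
}
```

A border device would drop the packet when this returns 1 for option
type 49; it needs no knowledge of the IOAM data itself.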

Justin

> Tom
>> Justin
>>
>> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
>> >> (unique) identifier of the interface.
>> >>
>> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
>> >> identifier of the node.
>> >>
>> >> Two relativistic hash tables: one for IOAM namespaces, the other for
>> >> IOAM schemas. A namespace can only have a single active schema and a
>> >> schema can only be attached to a single namespace (1:1 relationship).
>> >>
>> >>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>> >>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>> >>   [3]
>> >>   https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>> >>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >>  include/linux/ipv6.h       |   2 +
>> >>  include/net/ioam6.h        |  98 +++++++++++
>> >>  include/net/netns/ipv6.h   |   2 +
>> >>  include/uapi/linux/in6.h   |   1 +
>> >>  include/uapi/linux/ipv6.h  |   2 +
>> >>  net/ipv6/Makefile          |   2 +-
>> >>  net/ipv6/addrconf.c        |  20 +++
>> >>  net/ipv6/af_inet6.c        |   7 +
>> >>  net/ipv6/exthdrs.c         |  67 ++++++++
>> >>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
>> >>  net/ipv6/sysctl_net_ipv6.c |   7 +
>> >>  11 files changed, 533 insertions(+), 1 deletion(-)
>> >>  create mode 100644 include/net/ioam6.h
>> >>  create mode 100644 net/ipv6/ioam6.c
>> >>
>> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> >> index 5312a718bc7a..15732f964c6e 100644
>> >> --- a/include/linux/ipv6.h
>> >> +++ b/include/linux/ipv6.h
>> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>> >>         __s32           disable_policy;
>> >>         __s32           ndisc_tclass;
>> >>         __s32           rpl_seg_enabled;
>> >> +       __u32           ioam6_enabled;
>> >> +       __u32           ioam6_id;
>> >>
>> >>         struct ctl_table_header *sysctl_header;
>> >>  };
>> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
>> >> new file mode 100644
>> >> index 000000000000..2a910bc99947
>> >> --- /dev/null
>> >> +++ b/include/net/ioam6.h
>> >> @@ -0,0 +1,98 @@
>> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> >> +/*
>> >> + *  IOAM IPv6 implementation
>> >> + *
>> >> + *  Author:
>> >> + *  Justin Iurman <justin.iurman@uliege.be>
>> >> + */
>> >> +
>> >> +#ifndef _NET_IOAM6_H
>> >> +#define _NET_IOAM6_H
>> >> +
>> >> +#include <linux/net.h>
>> >> +#include <linux/ipv6.h>
>> >> +#include <linux/rhashtable-types.h>
>> >> +
>> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
>> >> +
>> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
>> >> +
>> >> +#define IOAM6_TRACE_TYPE0  (1 << 31)
>> >> +#define IOAM6_TRACE_TYPE1  (1 << 30)
>> >> +#define IOAM6_TRACE_TYPE2  (1 << 29)
>> >> +#define IOAM6_TRACE_TYPE3  (1 << 28)
>> >> +#define IOAM6_TRACE_TYPE4  (1 << 27)
>> >> +#define IOAM6_TRACE_TYPE5  (1 << 26)
>> >> +#define IOAM6_TRACE_TYPE6  (1 << 25)
>> >> +#define IOAM6_TRACE_TYPE7  (1 << 24)
>> >> +#define IOAM6_TRACE_TYPE8  (1 << 23)
>> >> +#define IOAM6_TRACE_TYPE9  (1 << 22)
>> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
>> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
>> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
>> >> +
>> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
>> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
>> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
>> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
>> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
>> >> +
>> >> +struct ioam6_common_hdr {
>> >> +       u8 opt_type;
>> >> +       u8 opt_len;
>> >> +       u8 res;
>> >> +       u8 ioam_type;
>> >> +       __be16 namespace_id;
>> >> +} __packed;
>> >> +
>> >> +struct ioam6_trace_hdr {
>> >> +       __be16 info;
>> >> +       __be32 type;
>> >> +} __packed;
>> >> +
>> >> +struct ioam6_namespace {
>> >> +       struct rhash_head head;
>> >> +       struct rcu_head rcu;
>> >> +
>> >> +       __be16 id;
>> >> +       __be64 data;
>> >> +       bool remove_tlv;
>> >> +
>> >> +       struct ioam6_schema *schema;
>> >> +};
>> >> +
>> >> +struct ioam6_schema {
>> >> +       struct rhash_head head;
>> >> +       struct rcu_head rcu;
>> >> +
>> >> +       u32 id;
>> >> +       int len;
>> >> +       __be32 hdr;
>> >> +       u8 *data;
>> >> +
>> >> +       struct ioam6_namespace *ns;
>> >> +};
>> >> +
>> >> +struct ioam6_pernet_data {
>> >> +       struct mutex lock;
>> >> +       struct rhashtable namespaces;
>> >> +       struct rhashtable schemas;
>> >> +};
>> >> +
>> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
>> >> +{
>> >> +#if IS_ENABLED(CONFIG_IPV6)
>> >> +       return net->ipv6.ioam6_data;
>> >> +#else
>> >> +       return NULL;
>> >> +#endif
>> >> +}
>> >> +
>> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
>> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> >> +                                 struct ioam6_namespace *ns);
>> >> +
>> >> +extern int ioam6_init(void);
>> >> +extern void ioam6_exit(void);
>> >> +
>> >> +#endif
>> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
>> >> index 5ec054473d81..89b27fa721f4 100644
>> >> --- a/include/net/netns/ipv6.h
>> >> +++ b/include/net/netns/ipv6.h
>> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>> >>         int max_hbh_opts_len;
>> >>         int seg6_flowlabel;
>> >>         bool skip_notify_on_dev_down;
>> >> +       unsigned int ioam6_id;
>> >>  };
>> >>
>> >>  struct netns_ipv6 {
>> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>> >>                 spinlock_t      lock;
>> >>                 u32             seq;
>> >>         } ip6addrlbl_table;
>> >> +       struct ioam6_pernet_data *ioam6_data;
>> >>  };
>> >>
>> >>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
>> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
>> >> index 9f2273a08356..1c98435220c9 100644
>> >> --- a/include/uapi/linux/in6.h
>> >> +++ b/include/uapi/linux/in6.h
>> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>> >>  #define IPV6_TLV_PADN          1
>> >>  #define IPV6_TLV_ROUTERALERT   5
>> >>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
>> >> +#define IPV6_TLV_IOAM_HOPOPTS  49
>> >
>> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
>> > Note from RFC7120:
>> >
>> > "Implementers and deployers need to be aware that deprecation and
>> > de-allocation could take place at any time after expiry; therefore, an
>> > expired early allocation is best considered as deprecated. It is not
>> > IANA's responsibility to track the status of allocations, their
>> > expirations, or when they may be re-allocated."
>> >
>> > Please add a comment here and in the Documentation to this effect.
>> >
>> >>  #define IPV6_TLV_JUMBO         194
>> >>  #define IPV6_TLV_HAO           201     /* home address option */
>> >>
>> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> >> index 13e8751bf24a..eb521b2dd885 100644
>> >> --- a/include/uapi/linux/ipv6.h
>> >> +++ b/include/uapi/linux/ipv6.h
>> >> @@ -189,6 +189,8 @@ enum {
>> >>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>> >>         DEVCONF_NDISC_TCLASS,
>> >>         DEVCONF_RPL_SEG_ENABLED,
>> >> +       DEVCONF_IOAM6_ENABLED,
>> >> +       DEVCONF_IOAM6_ID,
>> >>         DEVCONF_MAX
>> >>  };
>> >>
>> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
>> >> index cf7b47bdb9b3..b7ef10d417d6 100644
>> >> --- a/net/ipv6/Makefile
>> >> +++ b/net/ipv6/Makefile
>> >> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o
>> >> addrconf.o \
>> >>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>> >>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>> >>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
>> >> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
>> >> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>> >>
>> >>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>> >>
>> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> >> index 840bfdb3d7bd..6c952a28ade2 100644
>> >> --- a/net/ipv6/addrconf.c
>> >> +++ b/net/ipv6/addrconf.c
>> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>> >>         .disable_policy         = 0,
>> >>         .rpl_seg_enabled        = 0,
>> >> +       .ioam6_enabled          = 0,
>> >> +       .ioam6_id               = 0,
>> >>  };
>> >>
>> >>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
>> >> {
>> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
>> >>         .disable_policy         = 0,
>> >>         .rpl_seg_enabled        = 0,
>> >> +       .ioam6_enabled          = 0,
>> >> +       .ioam6_id               = 0,
>> >>  };
>> >>
>> >>  /* Check if link is ready: is it up and is a valid qdisc available */
>> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
>> >> *cnf,
>> >>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>> >>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>> >>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
>> >> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
>> >> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>> >>  }
>> >>
>> >>  static inline size_t inet6_ifla6_size(void)
>> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>> >>                 .mode           = 0644,
>> >>                 .proc_handler   = proc_dointvec,
>> >>         },
>> >> +       {
>> >> +               .procname       = "ioam6_enabled",
>> >> +               .data           = &ipv6_devconf.ioam6_enabled,
>> >> +               .maxlen         = sizeof(int),
>> >> +               .mode           = 0644,
>> >> +               .proc_handler   = proc_dointvec,
>> >> +       },
>> >> +       {
>> >> +               .procname       = "ioam6_id",
>> >> +               .data           = &ipv6_devconf.ioam6_id,
>> >> +               .maxlen         = sizeof(int),
>> >> +               .mode           = 0644,
>> >> +               .proc_handler   = proc_dointvec,
>> >> +       },
>> >>         {
>> >>                 /* sentinel */
>> >>         }
>> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
>> >> index b304b882e031..63a9ffc4b283 100644
>> >> --- a/net/ipv6/af_inet6.c
>> >> +++ b/net/ipv6/af_inet6.c
>> >> @@ -62,6 +62,7 @@
>> >>  #include <net/rpl.h>
>> >>  #include <net/compat.h>
>> >>  #include <net/xfrm.h>
>> >> +#include <net/ioam6.h>
>> >>
>> >>  #include <linux/uaccess.h>
>> >>  #include <linux/mroute6.h>
>> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>> >>         if (err)
>> >>                 goto rpl_fail;
>> >>
>> >> +       err = ioam6_init();
>> >> +       if (err)
>> >> +               goto ioam6_fail;
>> >> +
>> >>         err = igmp6_late_init();
>> >>         if (err)
>> >>                 goto igmp6_late_err;
>> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>> >>  #endif
>> >>  igmp6_late_err:
>> >>         rpl_exit();
>> >> +ioam6_fail:
>> >> +       ioam6_exit();
>> >>  rpl_fail:
>> >>         seg6_exit();
>> >>  seg6_fail:
>> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> index f27ab3bf2e0c..00aee1358f1c 100644
>> >> --- a/net/ipv6/exthdrs.c
>> >> +++ b/net/ipv6/exthdrs.c
>> >> @@ -49,6 +49,8 @@
>> >>  #include <net/seg6_hmac.h>
>> >>  #endif
>> >>  #include <net/rpl.h>
>> >> +#include <net/ioam6.h>
>> >> +#include <net/dst_metadata.h>
>> >>
>> >>  #include <linux/uaccess.h>
>> >>
>> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >>         return TLV_REJECT;
>> >>  }
>> >>
>> >> +/* IOAM */
>> >> +
>> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
>> >> +{
>> >> +       struct ioam6_common_hdr *ioamh;
>> >> +       struct ioam6_namespace *ns;
>> >> +
>> >> +       /* Must be 4n-aligned */
>> >> +       if (optoff & 3)
>> >> +               goto drop;
>> >> +
>> >> +       if (!skb_valid_dst(skb))
>> >> +               ip6_route_input(skb);
>> >> +
>> >> +       /* IOAM must be enabled on ingress interface */
>> >> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
>> >> +               goto drop;
>> >> +
>> >> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
>> >> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
>> >> +
>> >> +       /* Unknown IOAM namespace, either:
>> >> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
>> >> +        *  - Ignore it otherwise
>> >> +        */
>> >> +       if (!ns) {
>> >> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> >> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> +                       goto drop;
>> >> +
>> >> +               goto accept;
>> >> +       }
>> >> +
>> >> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> +               goto remove;
>> >> +
>> >> +       /* Known IOAM namespace which must not be removed:
>> >> +        * IOAM must be enabled on egress interface
>> >> +        */
>> >> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> >> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> +               goto drop;
>> >> +
>> >> +       switch (ioamh->ioam_type) {
>> >> +       case IOAM6_OPT_TRACE_PREALLOC:
>> >> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
>> >> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
>> >> +               break;
>> >> +       default:
>> >> +               break;
>> >> +       }
>> >> +
>> >> +accept:
>> >> +       return TLV_ACCEPT;
>> >> +remove:
>> >> +       return TLV_REMOVE;
>> >> +drop:
>> >> +       kfree_skb(skb);
>> >> +       return TLV_REJECT;
>> >> +}
>> >> +
>> >>  /* Jumbo payload */
>> >>
>> >>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >>                 .type   = IPV6_TLV_ROUTERALERT,
>> >>                 .func   = ipv6_hop_ra,
>> >>         },
>> >> +       {
>> >> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
>> >> +               .func   = ipv6_hop_ioam,
>> >> +       },
>> >>         {
>> >>                 .type   = IPV6_TLV_JUMBO,
>> >>                 .func   = ipv6_hop_jumbo,
>> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
>> >> new file mode 100644
>> >> index 000000000000..406aa78eb504
>> >> --- /dev/null
>> >> +++ b/net/ipv6/ioam6.c
>> >> @@ -0,0 +1,326 @@
>> >> +// SPDX-License-Identifier: GPL-2.0-or-later
>> >> +/*
>> >> + *  IOAM IPv6 implementation
>> >> + *
>> >> + *  Author:
>> >> + *  Justin Iurman <justin.iurman@uliege.be>
>> >> + */
>> >> +
>> >> +#include <linux/errno.h>
>> >> +#include <linux/types.h>
>> >> +#include <linux/kernel.h>
>> >> +#include <linux/net.h>
>> >> +#include <linux/rhashtable.h>
>> >> +
>> >> +#include <net/addrconf.h>
>> >> +#include <net/ioam6.h>
>> >> +
>> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
>> >> +{
>> >> +       kfree_rcu(ns, rcu);
>> >> +}
>> >> +
>> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
>> >> +{
>> >> +       kfree_rcu(sc, rcu);
>> >> +}
>> >> +
>> >> +static void ioam6_free_ns(void *ptr, void *arg)
>> >> +{
>> >> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
>> >> +
>> >> +       if (ns)
>> >> +               ioam6_ns_release(ns);
>> >> +}
>> >> +
>> >> +static void ioam6_free_sc(void *ptr, void *arg)
>> >> +{
>> >> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
>> >> +
>> >> +       if (sc)
>> >> +               ioam6_sc_release(sc);
>> >> +}
>> >> +
>> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> >> +{
>> >> +       const struct ioam6_namespace *ns = obj;
>> >> +
>> >> +       return (ns->id != *(__be16 *)arg->key);
>> >> +}
>> >> +
>> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> >> +{
>> >> +       const struct ioam6_schema *sc = obj;
>> >> +
>> >> +       return (sc->id != *(u32 *)arg->key);
>> >> +}
>> >> +
>> >> +static const struct rhashtable_params rht_ns_params = {
>> >> +       .key_len                = sizeof(__be16),
>> >> +       .key_offset             = offsetof(struct ioam6_namespace, id),
>> >> +       .head_offset            = offsetof(struct ioam6_namespace, head),
>> >> +       .automatic_shrinking    = true,
>> >> +       .obj_cmpfn              = ioam6_ns_cmpfn,
>> >> +};
>> >> +
>> >> +static const struct rhashtable_params rht_sc_params = {
>> >> +       .key_len                = sizeof(u32),
>> >> +       .key_offset             = offsetof(struct ioam6_schema, id),
>> >> +       .head_offset            = offsetof(struct ioam6_schema, head),
>> >> +       .automatic_shrinking    = true,
>> >> +       .obj_cmpfn              = ioam6_sc_cmpfn,
>> >> +};
>> >> +
>> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
>> >> +{
>> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> >> +
>> >> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
>> >> +}
>> >> +
>> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
>> >> +                               u32 trace_type, struct ioam6_namespace *ns)
>> >> +{
>> >> +       u8 *data = skb_network_header(skb) + nodeoff;
>> >> +       struct __kernel_sock_timeval ts;
>> >> +       u64 raw_u64;
>> >> +       u32 raw_u32;
>> >> +       u16 raw_u16;
>> >> +       u8 byte;
>> >> +
>> >> +       /* hop_lim and node_id */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE0) {
>> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
>> >> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> >> +               if (!raw_u32)
>> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
>> >> +               else
>> >> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
>> >> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* ingress_if_id and egress_if_id */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE1) {
>> >> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> >> +               if (!raw_u16)
>> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
>> >> +               data += sizeof(__be16);
>> >> +
>> >> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> >> +               if (!raw_u16)
>> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
>> >> +               data += sizeof(__be16);
>> >> +       }
>> >> +
>> >> +       /* timestamp seconds */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE2) {
>> >> +               if (!skb->tstamp) {
>> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               } else {
>> >> +                       skb_get_new_timestamp(skb, &ts);
>> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
>> >> +               }
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* timestamp subseconds */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE3) {
>> >> +               if (!skb->tstamp) {
>> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               } else {
>> >> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
>> >> +                               skb_get_new_timestamp(skb, &ts);
>> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
>> >> +               }
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* transit delay */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE4) {
>> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* namespace data */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE5) {
>> >> +               *(__be32 *)data = (__be32)ns->data;
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* queue depth */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE6) {
>> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* hop_lim and node_id (wide) */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE7) {
>> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
>> >> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> >> +               if (!raw_u64)
>> >> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
>> >> +               else
>> >> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
>> >> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
>> >> +               data += sizeof(__be64);
>> >> +       }
>> >> +
>> >> +       /* ingress_if_id and egress_if_id (wide) */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE8) {
>> >> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> >> +               if (!raw_u32)
>> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
>> >> +               data += sizeof(__be32);
>> >> +
>> >> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> >> +               if (!raw_u32)
>> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* namespace data (wide) */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE9) {
>> >> +               *(__be64 *)data = ns->data;
>> >> +               data += sizeof(__be64);
>> >> +       }
>> >> +
>> >> +       /* buffer occupancy */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE10) {
>> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* checksum complement */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE11) {
>> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> +               data += sizeof(__be32);
>> >> +       }
>> >> +
>> >> +       /* opaque state snapshot */
>> >> +       if (trace_type & IOAM6_TRACE_TYPE22) {
>> >> +               if (!ns->schema) {
>> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
>> >> +               } else {
>> >> +                       *(__be32 *)data = ns->schema->hdr;
>> >> +                       data += sizeof(__be32);
>> >> +                       memcpy(data, ns->schema->data, ns->schema->len);
>> >> +               }
>> >> +       }
>> >> +}
>> >> +
>> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> >> +                          struct ioam6_namespace *ns)
>> >> +{
>> >> +       u8 nodelen, flags, remlen, sclen = 0;
>> >> +       struct ioam6_trace_hdr *trh;
>> >> +       int nodeoff;
>> >> +       u16 info;
>> >> +       u32 type;
>> >> +
>> >> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
>> >> +       info = be16_to_cpu(trh->info);
>> >> +       type = be32_to_cpu(trh->type);
>> >> +
>> >> +       nodelen = info >> 11;
>> >> +       flags = (info >> 7) & 0xf;
>> >> +       remlen = info & 0x7f;
>> >> +
>> >> +       /* Skip if Overflow bit is set OR
>> >> +        * if an unknown type (bit 12-21) is set
>> >> +        */
>> >> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
>> >> +               return;
>> >> +
>> >> +       /* NodeLen does not include Opaque State Snapshot length. We need to
>> >> +        * take it into account if the corresponding bit is set and if current
>> >> +        * IOAM namespace has an active schema attached to it
>> >> +        */
>> >> +       if (type & IOAM6_TRACE_TYPE22) {
>> >> +               /* Opaque State Snapshot header size */
>> >> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
>> >> +
>> >> +               if (ns->schema)
>> >> +                       sclen += ns->schema->len / 4;
>> >> +       }
>> >> +
>> >> +       /* Not enough space remaining: set Overflow bit and skip */
>> >> +       if (!remlen || remlen < (nodelen + sclen)) {
>> >> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
>> >> +               trh->info = cpu_to_be16(info);
>> >> +               return;
>> >> +       }
>> >> +
>> >> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
>> >> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
>> >> +
>> >> +       /* Update RemainingLen */
>> >> +       remlen -= nodelen + sclen;
>> >> +       info = (info & 0xff80) | remlen;
>> >> +       trh->info = cpu_to_be16(info);
>> >> +}
>> >> +
>> >> +static int __net_init ioam6_net_init(struct net *net)
>> >> +{
>> >> +       struct ioam6_pernet_data *nsdata;
>> >> +       int err = -ENOMEM;
>> >> +
>> >> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
>> >> +       if (!nsdata)
>> >> +               goto out;
>> >> +
>> >> +       mutex_init(&nsdata->lock);
>> >> +       net->ipv6.ioam6_data = nsdata;
>> >> +
>> >> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
>> >> +       if (err)
>> >> +               goto free_nsdata;
>> >> +
>> >> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
>> >> +       if (err)
>> >> +               goto free_rht_ns;
>> >> +
>> >> +out:
>> >> +       return err;
>> >> +free_rht_ns:
>> >> +       rhashtable_destroy(&nsdata->namespaces);
>> >> +free_nsdata:
>> >> +       kfree(nsdata);
>> >> +       net->ipv6.ioam6_data = NULL;
>> >> +       goto out;
>> >> +}
>> >> +
>> >> +static void __net_exit ioam6_net_exit(struct net *net)
>> >> +{
>> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> >> +
>> >> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
>> >> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
>> >> +
>> >> +       kfree(nsdata);
>> >> +}
>> >> +
>> >> +static struct pernet_operations ioam6_net_ops = {
>> >> +       .init = ioam6_net_init,
>> >> +       .exit = ioam6_net_exit,
>> >> +};
>> >> +
>> >> +int __init ioam6_init(void)
>> >> +{
>> >> +       int err = register_pernet_subsys(&ioam6_net_ops);
>> >> +
>> >> +       if (err)
>> >> +               return err;
>> >> +
>> >> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
>> >> +       return 0;
>> >> +}
>> >> +
>> >> +void ioam6_exit(void)
>> >> +{
>> >> +       unregister_pernet_subsys(&ioam6_net_ops);
>> >> +}
>> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>> >> index fac2135aa47b..da49b33ab6fc 100644
>> >> --- a/net/ipv6/sysctl_net_ipv6.c
>> >> +++ b/net/ipv6/sysctl_net_ipv6.c
>> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>> >>                 .mode           = 0644,
>> >>                 .proc_handler   = proc_dointvec
>> >>         },
>> >> +       {
>> >> +               .procname       = "ioam6_id",
>> >> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
>> >> +               .maxlen         = sizeof(int),
>> >> +               .mode           = 0644,
>> >> +               .proc_handler   = proc_dointvec
>> >> +       },
>> >>         { }
>> >>  };
>> >>
>> >> --
>> >> 2.17.1
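
For readers following the RemainingLen bookkeeping in ioam6_fill_trace_data() above, here is a minimal user-space sketch, not taken from the patch, of how the 16-bit info field (NodeLen: 5 bits, Flags: 4 bits, RemainingLen: 7 bits, lengths in 4-octet units) is unpacked and how a node claims its slot at the end of the remaining space. The names unpack_info(), reserve_node_slot() and SKETCH_FLAG_OVERFLOW are hypothetical, and the value 0x8 for the Overflow flag is an assumption: the patch uses IOAM6_TRACE_FLAG_OVERFLOW, whose definition is not shown in this message. The check for unknown trace-type bits is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; the patch refers to IOAM6_TRACE_FLAG_OVERFLOW. */
#define SKETCH_FLAG_OVERFLOW 0x8

struct trace_info {
	uint8_t nodelen;	/* per-node data length, in 4-octet units */
	uint8_t flags;		/* 4 flag bits */
	uint8_t remlen;		/* remaining free space, in 4-octet units */
};

/* Split the host-order info field the same way the patch does. */
static struct trace_info unpack_info(uint16_t info)
{
	struct trace_info t;

	t.nodelen = info >> 11;
	t.flags = (info >> 7) & 0xf;
	t.remlen = info & 0x7f;
	return t;
}

/* Claim room for one node (plus sclen units of opaque snapshot).
 * Returns the byte offset of the node data relative to the start of the
 * trace data area, or -1 if the node must skip. On success RemainingLen
 * is decreased in *info; when space runs out the Overflow bit is set.
 */
static int reserve_node_slot(uint16_t *info, uint8_t sclen)
{
	struct trace_info t;
	int off;

	assert(info != NULL);
	t = unpack_info(*info);

	/* A previous node already overflowed: leave the header untouched. */
	if (t.flags & SKETCH_FLAG_OVERFLOW)
		return -1;

	/* Not enough space remaining: set Overflow and skip. */
	if (!t.remlen || t.remlen < t.nodelen + sclen) {
		*info = (uint16_t)(*info | (SKETCH_FLAG_OVERFLOW << 7));
		return -1;
	}

	/* Nodes fill the pre-allocated area from its end towards its start. */
	off = (t.remlen - t.nodelen - sclen) * 4;

	t.remlen -= t.nodelen + sclen;
	*info = (uint16_t)((*info & 0xff80) | t.remlen);
	return off;
}
```

With NodeLen = 3 and RemainingLen = 9, three successive nodes would get data offsets 24, 12 and 0; a fourth node finds RemainingLen at 0 and sets the Overflow bit instead.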

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-25 20:53       ` Tom Herbert
@ 2020-06-26  8:22         ` Justin Iurman
  2020-06-26 15:39           ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26  8:22 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

Tom,

>> Hi Tom,
>>
>> >> Add the possibility to remove one or more consecutive TLVs without
>> >> messing up the alignment of others. For now, only IOAM requires this
>> >> behavior.
>> >>
>> > Hi Justin,
>> >
>> > Can you explain the motivation for this? Per RFC8200, extension
>> > headers in flight are not to be added, removed, or modified outside of
>> > the standard rules for processing modifiable HBH and DO TLVs; that
>> > would include adding and removing TLVs in EH. One obvious problem this
>>
>> As you already know from our last meeting, IOAM may be configured on a node such
>> that a specific IOAM namespace should be removed. Therefore, this patch
>> provides support for the deletion of a TLV (or consecutive TLVs), without
>> removing the entire EH (if it's empty, there will be padding). Note that there
>> is a similar "problem" with the Incremental Trace where you'd need to expand
>> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
>> against modification of in-flight EHs, but there are several reasons that, I
>> believe, mitigate this statement.
>>
>> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely
>> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
>> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
>> domain, i.e. from an IOAM node inside the domain to another one (no need for
>> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
>> so we modify "our" header and (ii) we already own the traffic.
>>
>> And if someone is still angry about this, well, the good news is that such
>> modification can be avoided most of the time. Indeed, operators are advised to
>> remove an IOAM namespace only on egress nodes. This way, the destination
>> (either the tunnel destination or the real destination, depending on the
>> scenario) will receive EHs and take care of them without the need to remove
>> anything. But, again, operators can do what they want and I'd tend to adhere to
>> David's philosophy [1] and give them the possibility to choose what to do.
>>
> 
> Justin,
> 
> 6man WG has had a _long_ and sometimes bitter discussion around this
> particularly with regard to insertion of SRH. The current consensus
> of IETF is that it is a violation of RFC8200.  We've heard all the
> arguments that it's only for limited domains and narrow use cases,
> nevertheless there are several problems that the header
> insertion/deletion advocates never answered-- it breaks AH, it breaks
> PMTU discovery, it breaks ICMP. There is also a risk that a
> non-standard modification could cause a packet to be dropped
> downstream from the node that modifies it. There is no attribution on
> who created the problem, and hence this can lead to systematic
> blackholes which are the most miserable sort of problem to debug.

Yes, I know the whole story and it's been stormy from what I understood.

> Fundamentally, it is not robust per Postel's law (I actually wrote a
> draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> you're interested).

Interesting, I'll take a look.

> IMO, we shouldn't be using Linux as a backdoor to implement protocol
> that IETF is saying isn't robust. Can you point out in the IOAM drafts
> where this requirement is specified, then I can take it up in IOAM WG
> or 6man if needed...

Well, I wouldn't say that the IETF considers IPv6-IOAM not robust, since [1] (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be published.

Justin

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09

> Tom
> 
>> > creates is that it breaks AH if the TLVs are removed in HBH before AH
>> > is processed (AH is processed after HBH).
>>
>> Correct. But I don't think it should prevent us from having IOAM in the kernel.
>> Again, operators could simply apply IOAM on a subset of the traffic that does
>> not include AHs, for example.
>>
>> Justin
>>
>>   [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>>
>> > Tom
>> >> By default, an 8-octet boundary is automatically assumed. This is the
>> >> price to pay (at most a useless 4-octet padding) to make sure everything
>> >> is still aligned after the removal.
>> >>
>> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> >> header.
>> >>
>> >> Example 1:
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       X       |       X       |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |                                                               |
>> >> ~                Option to be removed (8 octets)                ~
>> >> |                                                               |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Y       |       Y       |       Y       |       Y       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |    Padding    |    Padding    |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> >> boundary (same result in both cases).
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       X       |       X       |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Y       |       Y       |       Y       |       Y       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |    Padding    |    Padding    |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Example 2:
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       X       |       X       |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |                Option to be removed (4 octets)                |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Y       |       Y       |       Y       |       Y       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> >> of 8 anymore.
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       X       |       X       |    Padding    |    Padding    |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Y       |       Y       |       Y       |       Y       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> |       Z       |       Z       |       Z       |       Z       |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Therefore, the largest (8-octet) boundary is assumed by default and for
>> >> all, which means that blocks are only moved in multiples of 8. This
>> >> assertion guarantees good alignment.
>> >>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>> >>  1 file changed, 108 insertions(+), 26 deletions(-)
>> >>
>> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> index e9b366994475..f27ab3bf2e0c 100644
>> >> --- a/net/ipv6/exthdrs.c
>> >> +++ b/net/ipv6/exthdrs.c
>> >> @@ -52,17 +52,27 @@
>> >>
>> >>  #include <linux/uaccess.h>
>> >>
>> >> -/*
>> >> - *     Parsing tlv encoded headers.
>> >> +/* States for TLV parsing functions. */
>> >> +
>> >> +enum {
>> >> +       TLV_ACCEPT,
>> >> +       TLV_REJECT,
>> >> +       TLV_REMOVE,
>> >> +       __TLV_MAX
>> >> +};
>> >> +
>> >> +/* Parsing TLV encoded headers.
>> >>   *
>> >> - *     Parsing function "func" returns true, if parsing succeed
>> >> - *     and false, if it failed.
>> >> - *     It MUST NOT touch skb->h.
>> >> + * Parsing function "func" returns either:
>> >> + *  - TLV_ACCEPT if parsing succeeds
>> >> + *  - TLV_REJECT if parsing fails
>> >> + *  - TLV_REMOVE if TLV must be removed
>> >> + * It MUST NOT touch skb->h.
>> >>   */
>> >>
>> >>  struct tlvtype_proc {
>> >>         int     type;
>> >> -       bool    (*func)(struct sk_buff *skb, int offset);
>> >> +       int     (*func)(struct sk_buff *skb, int offset);
>> >>  };
>> >>
>> >>  /*********************
>> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
>> >>         return false;
>> >>  }
>> >>
>> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> >> +
>> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> >> +{
>> >> +       int len = end - start;
>> >> +       int padlen = len % 8;
>> >> +       unsigned char *h;
>> >> +       int rlen, off;
>> >> +       u16 pl_len;
>> >> +
>> >> +       rlen = len - padlen;
>> >> +       if (rlen) {
>> >> +               skb_pull(skb, rlen);
>> >> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> >> +                       start);
>> >> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> >> +
>> >> +               skb_reset_network_header(skb);
>> >> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> >> +
>> >> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> >> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> >> +
>> >> +               skb_transport_header(skb)[1] -= rlen >> 3;
>> >> +               end -= rlen;
>> >> +       }
>> >> +
>> >> +       if (padlen) {
>> >> +               off = end - padlen;
>> >> +               h = skb_network_header(skb);
>> >> +
>> >> +               if (padlen == 1) {
>> >> +                       h[off] = IPV6_TLV_PAD1;
>> >> +               } else {
>> >> +                       padlen -= 2;
>> >> +
>> >> +                       h[off] = IPV6_TLV_PADN;
>> >> +                       h[off + 1] = padlen;
>> >> +                       memset(&h[off + 2], 0, padlen);
>> >> +               }
>> >> +       }
>> >> +
>> >> +       return end;
>> >> +}
>> >> +
>> >>  /* Parse tlv encoded option header (hop-by-hop or destination) */
>> >>
>> >>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >>                           struct sk_buff *skb,
>> >> -                         int max_count)
>> >> +                         int max_count,
>> >> +                         bool removable)
>> >>  {
>> >>         int len = (skb_transport_header(skb)[1] + 1) << 3;
>> >> -       const unsigned char *nh = skb_network_header(skb);
>> >> +       unsigned char *nh = skb_network_header(skb);
>> >>         int off = skb_network_header_len(skb);
>> >>         const struct tlvtype_proc *curr;
>> >>         bool disallow_unknowns = false;
>> >> +       int off_remove = 0;
>> >>         int tlv_count = 0;
>> >>         int padlen = 0;
>> >> +       int ret;
>> >>
>> >>         if (unlikely(max_count < 0)) {
>> >>                 disallow_unknowns = true;
>> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >>                         if (tlv_count > max_count)
>> >>                                 goto bad;
>> >>
>> >> +                       ret = -1;
>> >>                         for (curr = procs; curr->type >= 0; curr++) {
>> >>                                 if (curr->type == nh[off]) {
>> >>                                         /* type specific length/alignment
>> >>                                            checks will be performed in the
>> >>                                            func(). */
>> >> -                                       if (curr->func(skb, off) == false)
>> >> +                                       ret = curr->func(skb, off);
>> >> +                                       if (ret == TLV_REJECT)
>> >>                                                 return false;
>> >>                                         break;
>> >>                                 }
>> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>> >>                                 return false;
>> >>
>> >> +                       if (removable) {
>> >> +                               if (ret == TLV_REMOVE) {
>> >> +                                       if (!off_remove)
>> >> +                                               off_remove = off - padlen;
>> >> +                               } else if (off_remove) {
>> >> +                                       off = remove_tlv(off_remove, off, skb);
>> >> +                                       nh = skb_network_header(skb);
>> >> +                                       off_remove = 0;
>> >> +                               }
>> >> +                       }
>> >> +
>> >>                         padlen = 0;
>> >>                         break;
>> >>                 }
>> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >>                 len -= optlen;
>> >>         }
>> >>
>> >> -       if (len == 0)
>> >> +       if (len == 0) {
>> >> +               /* Don't forget last TLV if it must be removed */
>> >> +               if (off_remove)
>> >> +                       remove_tlv(off_remove, off, skb);
>> >> +
>> >>                 return true;
>> >> +       }
>> >>  bad:
>> >>         kfree_skb(skb);
>> >>         return false;
>> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >>   *****************************/
>> >>
>> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >>  {
>> >>         struct ipv6_destopt_hao *hao;
>> >>         struct inet6_skb_parm *opt = IP6CB(skb);
>> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >>         if (skb->tstamp == 0)
>> >>                 __net_timestamp(skb);
>> >>
>> >> -       return true;
>> >> +       return TLV_ACCEPT;
>> >>
>> >>   discard:
>> >>         kfree_skb(skb);
>> >> -       return false;
>> >> +       return TLV_REJECT;
>> >>  }
>> >>  #endif
>> >>
>> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>> >>  #endif
>> >>
>> >>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> >> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> >> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
>> >> +                         false)) {
>> >>                 skb->transport_header += extlen;
>> >>                 opt = IP6CB(skb);
>> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff *skb)
>> >>
>> >>  /* Router Alert as of RFC 2711 */
>> >>
>> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >>  {
>> >>         const unsigned char *nh = skb_network_header(skb);
>> >>
>> >>         if (nh[optoff + 1] == 2) {
>> >>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>> >>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> >> -               return true;
>> >> +               return TLV_ACCEPT;
>> >>         }
>> >>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>> >>                             nh[optoff + 1]);
>> >>         kfree_skb(skb);
>> >> -       return false;
>> >> +       return TLV_REJECT;
>> >>  }
>> >>
>> >>  /* Jumbo payload */
>> >>
>> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >>  {
>> >>         const unsigned char *nh = skb_network_header(skb);
>> >>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >>         if (pkt_len <= IPV6_MAXPLEN) {
>> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> >> -               return false;
>> >> +               return TLV_REJECT;
>> >>         }
>> >>         if (ipv6_hdr(skb)->payload_len) {
>> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> >> -               return false;
>> >> +               return TLV_REJECT;
>> >>         }
>> >>
>> >>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >>                 goto drop;
>> >>
>> >>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> >> -       return true;
>> >> +       return TLV_ACCEPT;
>> >>
>> >>  drop:
>> >>         kfree_skb(skb);
>> >> -       return false;
>> >> +       return TLV_REJECT;
>> >>  }
>> >>
>> >>  /* CALIPSO RFC 5570 */
>> >>
>> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >>  {
>> >>         const unsigned char *nh = skb_network_header(skb);
>> >>
>> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >>         if (!calipso_validate(skb, nh + optoff))
>> >>                 goto drop;
>> >>
>> >> -       return true;
>> >> +       return TLV_ACCEPT;
>> >>
>> >>  drop:
>> >>         kfree_skb(skb);
>> >> -       return false;
>> >> +       return TLV_REJECT;
>> >>  }
>> >>
>> >>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>> >>
>> >>         opt->flags |= IP6SKB_HOPBYHOP;
>> >>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> >> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> >> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> >> +                         true)) {
>> >> +               /* we need to refresh the length in case
>> >> +                * at least one TLV was removed
>> >> +                */
>> >> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
>> >>                 skb->transport_header += extlen;
>> >>                 opt = IP6CB(skb);
>> >>                 opt->nhoff = sizeof(struct ipv6hdr);
>> >> --
>> >> 2.17.1
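
To make the 8-octet argument from the commit message concrete, here is a minimal user-space sketch, not taken from the patch, of the arithmetic in remove_tlv(): only the largest multiple of 8 octets inside the region is actually cut out, and the remainder (at most 7 octets) is rewritten as Pad1/PadN so every following option keeps its alignment. The function name sketch_remove_tlv() and the flat buffer are hypothetical. Unlike the kernel code, which pulls the skb and moves the leading bytes forward, this version shifts the tail down, which should give the same resulting byte layout. It also does not update Hdr Ext Len or the IPv6 payload length, which the patch does. Pad1 = 0 and PadN = 1 are the option types from RFC 8200.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TLV_PAD1 0	/* Pad1 option type (RFC 8200) */
#define SKETCH_TLV_PADN 1	/* PadN option type (RFC 8200) */

/* Remove the TLV region [start, end) from buf, whose current length is
 * *buflen. Returns the offset just past the rewritten padding, i.e. where
 * parsing would resume, and updates *buflen.
 */
static size_t sketch_remove_tlv(uint8_t *buf, size_t *buflen,
				size_t start, size_t end)
{
	size_t len, padlen, rlen;

	assert(buf && buflen && start <= end && end <= *buflen);

	len = end - start;
	padlen = len % 8;	/* octets that must stay, as padding */
	rlen = len - padlen;	/* octets really removed, multiple of 8 */

	if (rlen) {
		/* Close the gap: everything after the cut moves down. */
		memmove(buf + start, buf + start + rlen,
			*buflen - (start + rlen));
		*buflen -= rlen;
		end -= rlen;
	}

	/* Turn what is left of the region into a single padding option. */
	if (padlen == 1) {
		buf[start] = SKETCH_TLV_PAD1;
	} else if (padlen > 1) {
		buf[start] = SKETCH_TLV_PADN;
		buf[start + 1] = (uint8_t)(padlen - 2);
		memset(buf + start + 2, 0, padlen - 2);
	}

	return end;
}
```

For a 12-octet region, 8 octets are removed and the remaining 4 become a PadN with a 2-octet body; for an 8-octet region nothing is left to pad.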

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-26  0:48       ` Tom Herbert
@ 2020-06-26  8:31         ` Justin Iurman
  2020-06-26 15:52           ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26  8:31 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

Tom,

>> >> Implement the IOAM egress behavior.
>> >>
>> >> According to RFC 8200:
>> >> "Extension headers (except for the Hop-by-Hop Options header) are not
>> >>  processed, inserted, or deleted by any node along a packet's delivery
>> >>  path, until the packet reaches the node (or each of the set of nodes,
>> >>  in the case of multicast) identified in the Destination Address field
>> >>  of the IPv6 header."
>> >>
>> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
>> >> incoming IPv6 packet with another similar IPv6 header that will contain
>> >> IOAM data while it traverses the domain. When leaving, the egress node,
>> >> another IOAM domain border which is also the tunnel destination, must
>> >> decapsulate the packet.
>> >
>> > This is just IP in IP encapsulation that happens to be terminated at
>> > an egress node of the IOAM domain. The fact that it's IOAM isn't
>> > germane; this IP in IP is done in a variety of ways. We should be
>> > using the normal protocol handler for NEXTHDR_IPV6  instead of special
>> > case code.
>>
>> Agree. The reason for this special case code is that I was not aware of a more
>> elegant solution.
>>
> The current implementation might not be what you're looking for since
> ip6ip6 wants a tunnel configured. What we really want is more like
> anonymous decapsulation, that is just decap the ip6ip6 packet and
> resubmit the packet into the stack (this is what your patch is doing).
> The idea has been kicked around before, especially in the use case
> where we're tunneling across a domain and there could be hundreds of
> such tunnels to some device. I think it's generally okay to do this,
> although someone might raise security concerns since it sort of
> obfuscates the "real packet". Probably makes sense to have a sysctl to

Indeed. However, in this specific IOAM case there is no security issue, since you would only decap if an IOAM HBH option is found in the outer header, which is only valid if the node is part of the IOAM domain (i.e. IOAM is enabled on its ingress interface). For the more generic case, though, I agree with the sysctl solution.

> enable this and probably could default to on. Of course, if we do this
> the next question is should we also implement anonymous decapsulation
> for 44,64,46 tunnels.

Interesting question. I'd say that we should only do it if there is at least one use case that is (or will be) part of the kernel.

Justin

> Tom
> 
>> Justin
>>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >>  include/linux/ipv6.h |  1 +
>> >>  net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
>> >>  2 files changed, 23 insertions(+)
>> >>
>> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> >> index 2cb445a8fc9e..5312a718bc7a 100644
>> >> --- a/include/linux/ipv6.h
>> >> +++ b/include/linux/ipv6.h
>> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
>> >>  #define IP6SKB_HOPBYHOP        32
>> >>  #define IP6SKB_L3SLAVE         64
>> >>  #define IP6SKB_JUMBOGRAM      128
>> >> +#define IP6SKB_IOAM           256
>> >>  };
>> >>
>> >>  #if defined(CONFIG_NET_L3_MASTER_DEV)
>> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> >> index e96304d8a4a7..8cf75cc5e806 100644
>> >> --- a/net/ipv6/ip6_input.c
>> >> +++ b/net/ipv6/ip6_input.c
>> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
>> >> *));
>> >>  void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>> >>                               bool have_final)
>> >>  {
>> >> +       struct inet6_skb_parm *opt = IP6CB(skb);
>> >>         const struct inet6_protocol *ipprot;
>> >>         struct inet6_dev *idev;
>> >>         unsigned int nhoff;
>> >> +       u8 hop_limit;
>> >>         bool raw;
>> >>
>> >>         /*
>> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> >> sk_buff *skb, int nexthdr,
>> >>         } else {
>> >>                 if (!raw) {
>> >>                         if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
>> >> +                               /* IOAM Tunnel Decapsulation
>> >> +                                * Packet is going to re-enter the stack
>> >> +                                */
>> >> +                               if (nexthdr == NEXTHDR_IPV6 &&
>> >> +                                   (opt->flags & IP6SKB_IOAM)) {
>> >> +                                       hop_limit = ipv6_hdr(skb)->hop_limit;
>> >> +
>> >> +                                       skb_reset_network_header(skb);
>> >> +                                       skb_reset_transport_header(skb);
>> >> +                                       skb->encapsulation = 0;
>> >> +
>> >> +                                       ipv6_hdr(skb)->hop_limit = hop_limit;
>> >> +                                       __skb_tunnel_rx(skb, skb->dev,
>> >> +                                                       dev_net(skb->dev));
>> >> +
>> >> +                                       netif_rx(skb);
>> >> +                                       goto out;
>> >> +                               }
>> >> +
>> >>                                 __IP6_INC_STATS(net, idev,
>> >>                                                 IPSTATS_MIB_INUNKNOWNPROTOS);
>> >>                                 icmpv6_send(skb, ICMPV6_PARAMPROB,
>> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> >> sk_buff *skb, int nexthdr,
>> >>                         consume_skb(skb);
>> >>                 }
>> >>         }
>> >> +out:
>> >>         return;
>> >>
>> >>  discard:
>> >> --
>> >> 2.17.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH net-next] Fix unchecked dereference
  2020-06-25 10:52     ` Dan Carpenter
  (?)
  (?)
@ 2020-06-26  8:54     ` Justin Iurman
  2020-06-26 16:01         ` Jakub Kicinski
  -1 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26  8:54 UTC (permalink / raw)
  To: dan.carpenter; +Cc: kbuild, justin.iurman, netdev, lkp, kbuild-all, davem

If rhashtable_remove_fast() returns an error, the rollback path
dereferences ns->schema (respectively sc->ns) without checking it for
NULL. Add the missing NULL checks.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 net/ipv6/ioam6.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index e414e915bf1e..f1347940245e 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -161,7 +161,8 @@ static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
 	err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
 				     rht_ns_params);
 	if (err) {
-		ns->schema->ns = ns;
+		if (ns->schema)
+			ns->schema->ns = ns;
 		goto out_unlock;
 	}
 
@@ -355,7 +356,8 @@ static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
 	err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
 				     rht_sc_params);
 	if (err) {
-		sc->ns->schema = sc;
+		if (sc->ns)
+			sc->ns->schema = sc;
 		goto out_unlock;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
  2020-06-26  8:13         ` Justin Iurman
@ 2020-06-26 14:53           ` Tom Herbert
  0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 14:53 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Fri, Jun 26, 2020 at 1:13 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> >> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> >> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
> >> >>
> >> >
> >> > The IANA allocation is TEMPORARY, with an expiration date of
> >> > 4/16/2021. Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated."
> >> >
> >> > Please add a comment in the code and in the Documentation to this effect.
> >>
> >> I'll do that, thanks. What kind of comment (is there an official pattern?) and,
> >> where in the Documentation should I add it?
> >>
> >> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> >> >> packets. Default is drop.
> >> >
> >> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> >> > packet containing the IOAM HBH option. Note that the act bits of the
> >>
> >> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets
> >> containing the IOAM HBH option.
> >>
> >> > option type are 00 which means the TLV is skipped if the option isn't
> >> > processed so I don't think it's correct to drop these packets by
> >> > default.
> >>
> >> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for
> >> this option, I do believe it should be disabled (dropped) by default for nodes
> >> that "speak IOAM". Indeed, you don't want anyone with a kernel that includes
> >> IOAM to accept IOAM packets by default, which would mean that anyone would
> >> create (potentially without being aware) an IOAM domain. And, also, to avoid
> >> spreading leaks.
> >>
> > I think you're conflating whether a node processes IOAM with whether
> > it needs to drop because it doesn't process it. Yes, on an IOAM system it
> > makes sense to allow configuration of whether to process the TLV.
> > However, even when it doesn't, the TLV should be skipped and the
> > packet not dropped. We know this is the correct behavior since on a
> > system that isn't IOAM aware, i.e. all deployed nodes right now, they
> > will skip the TLV per the act bits. If we want to change the default
> > behavior, the only way to do that is to change the act bits to
> > non-zero.
>
> Makes sense, you're right indeed. But still, I'm a bit worried about enabling it by default. That would open the door to things we don't want. We'd end up in a situation where IOAM is not "privately" deployed. And think about someone who runs a kernel with IOAM (which he does not know anything about). Of course, he would not have a firewall rule to drop IOAM. Therefore, someone could simply "create" an IOAM domain with him by sending IPv6 packets with an IOAM HBH option and steal data. This is something similar to the leak problem.
>

Indeed, draft-ioametal-ippm-6man-ioam-ipv6-options-02 states: "Unless
a particular interface is explicitly enabled (i.e. explicitly
configured) for IOAM, a router MUST drop packets which contain
extension headers carrying IOAM data-fields." I believe this
requirement contradicts the option type act bits being zero. I've
posted to IOAM list about this.
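
To make the act bits argument concrete, here is a minimal userspace sketch of how RFC 8200 (section 4.2) encodes the handling of an unrecognized option in the option type itself. The enum and function names are hypothetical, chosen only for illustration; the value 49 is the temporary code point from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Action a node takes for an option it does not recognize, encoded in
 * the two highest-order bits of the option type (RFC 8200, section 4.2).
 */
enum opt_unknown_action {
	OPT_ACT_SKIP               = 0, /* 00: skip over the option */
	OPT_ACT_DISCARD            = 1, /* 01: discard the packet */
	OPT_ACT_DISCARD_ICMP       = 2, /* 10: discard, send ICMP Parameter Problem */
	OPT_ACT_DISCARD_ICMP_UCAST = 3, /* 11: same, but only if dst is not multicast */
};

static enum opt_unknown_action opt_action(uint8_t opt_type)
{
	return (enum opt_unknown_action)(opt_type >> 6);
}

/* Third highest-order bit: set if the option data may change en route */
static bool opt_may_change_en_route(uint8_t opt_type)
{
	return (opt_type & 0x20) != 0;
}

/* Self-check for the IOAM code point: 49 == 0x31 == 0b00110001 */
static void check_ioam_opt_type(void)
{
	assert(opt_action(49) == OPT_ACT_SKIP);
	assert(opt_may_change_en_route(49));
}
```

So a node that has never heard of IOAM skips option type 49, whereas, for comparison, the Jumbo Payload option (194, 0xC2) carries act bits 11 and leads to a discard when it is not recognized.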


> So, I think there are two possibilities to address the above: (i) the current one, i.e. drop by default, or (ii) use 01 for the act bits. This topic has been widely discussed in the WG and is still open, though the trend seems to be "00" with the drop-by-default compromise.
>
> > For the leakage problem, that is a firewall issue. The expectation is
> > that border devices will have rules that prevent leaking packets out
> > of their domain. This is an orthogonal mechanism that needs to be done
> > for other protocols-- SRH for instance. The filtering is simple, just
> > drop the packet when TLV matches (although I suspect most sites
> > probably just drop packets with EH at this point). This doesn't
> > require any changes to the implementation and doesn't require that
> > border devices even implement IOAM-- they just drop on pattern
> > matching.
>
> +1

Mentioned that also.
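
As an illustration of the "drop on pattern matching" point, a border filter only needs to walk the Hop-by-Hop TLVs and look for the option type; it does not need to understand IOAM. Below is a minimal userspace sketch under that assumption. The function name hbh_filter_drop() and the macros are hypothetical, and treating a malformed header as "drop" is a design choice of this sketch, not something required by the drafts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0
#define TLV_IOAM 49 /* temporary IANA code point discussed in this thread */

/* Return true if a packet should be dropped: either the Hop-by-Hop
 * Options header at hbh (len bytes available) carries an option of the
 * given type, or the header is malformed.
 */
static bool hbh_filter_drop(const uint8_t *hbh, size_t len, uint8_t type)
{
	size_t hdrlen, off = 2; /* skip Next Header and Hdr Ext Len */

	assert(hbh != NULL);
	if (len < 2)
		return true;
	hdrlen = ((size_t)hbh[1] + 1) << 3;
	if (hdrlen > len)
		return true;

	while (off < hdrlen) {
		if (hbh[off] == type)
			return true;
		if (hbh[off] == TLV_PAD1) {
			off++; /* Pad1 is a single byte with no length field */
			continue;
		}
		if (off + 2 > hdrlen)
			return true; /* truncated option header */
		off += 2 + (size_t)hbh[off + 1];
	}
	return off != hdrlen; /* last option overran the header */
}
```

A border device would call this on every packet whose first extension header is Hop-by-Hop and drop on a true result, with no IOAM state or configuration needed.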

>
> Justin
>
> > Tom
> >> Justin
> >>
> >> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
> >> >> (unique) identifier of the interface.
> >> >>
> >> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> >> >> identifier of the node.
> >> >>
> >> >> Two relativistic hash tables: one for IOAM namespaces, the other for
> >> >> IOAM schemas. A namespace can only have a single active schema and a
> >> >> schema can only be attached to a single namespace (1:1 relationship).
> >> >>
> >> >>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >> >>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >> >>   [3]
> >> >>   https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
> >> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >>  include/linux/ipv6.h       |   2 +
> >> >>  include/net/ioam6.h        |  98 +++++++++++
> >> >>  include/net/netns/ipv6.h   |   2 +
> >> >>  include/uapi/linux/in6.h   |   1 +
> >> >>  include/uapi/linux/ipv6.h  |   2 +
> >> >>  net/ipv6/Makefile          |   2 +-
> >> >>  net/ipv6/addrconf.c        |  20 +++
> >> >>  net/ipv6/af_inet6.c        |   7 +
> >> >>  net/ipv6/exthdrs.c         |  67 ++++++++
> >> >>  net/ipv6/ioam6.c           | 326 +++++++++++++++++++++++++++++++++++++
> >> >>  net/ipv6/sysctl_net_ipv6.c |   7 +
> >> >>  11 files changed, 533 insertions(+), 1 deletion(-)
> >> >>  create mode 100644 include/net/ioam6.h
> >> >>  create mode 100644 net/ipv6/ioam6.c
> >> >>
> >> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> >> index 5312a718bc7a..15732f964c6e 100644
> >> >> --- a/include/linux/ipv6.h
> >> >> +++ b/include/linux/ipv6.h
> >> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> >> >>         __s32           disable_policy;
> >> >>         __s32           ndisc_tclass;
> >> >>         __s32           rpl_seg_enabled;
> >> >> +       __u32           ioam6_enabled;
> >> >> +       __u32           ioam6_id;
> >> >>
> >> >>         struct ctl_table_header *sysctl_header;
> >> >>  };
> >> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> >> >> new file mode 100644
> >> >> index 000000000000..2a910bc99947
> >> >> --- /dev/null
> >> >> +++ b/include/net/ioam6.h
> >> >> @@ -0,0 +1,98 @@
> >> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> >> +/*
> >> >> + *  IOAM IPv6 implementation
> >> >> + *
> >> >> + *  Author:
> >> >> + *  Justin Iurman <justin.iurman@uliege.be>
> >> >> + */
> >> >> +
> >> >> +#ifndef _NET_IOAM6_H
> >> >> +#define _NET_IOAM6_H
> >> >> +
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/ipv6.h>
> >> >> +#include <linux/rhashtable-types.h>
> >> >> +
> >> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
> >> >> +
> >> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> >> >> +
> >> >> +#define IOAM6_TRACE_TYPE0  (1 << 31)
> >> >> +#define IOAM6_TRACE_TYPE1  (1 << 30)
> >> >> +#define IOAM6_TRACE_TYPE2  (1 << 29)
> >> >> +#define IOAM6_TRACE_TYPE3  (1 << 28)
> >> >> +#define IOAM6_TRACE_TYPE4  (1 << 27)
> >> >> +#define IOAM6_TRACE_TYPE5  (1 << 26)
> >> >> +#define IOAM6_TRACE_TYPE6  (1 << 25)
> >> >> +#define IOAM6_TRACE_TYPE7  (1 << 24)
> >> >> +#define IOAM6_TRACE_TYPE8  (1 << 23)
> >> >> +#define IOAM6_TRACE_TYPE9  (1 << 22)
> >> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> >> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> >> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> >> >> +
> >> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> >> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> >> >> +
> >> >> +struct ioam6_common_hdr {
> >> >> +       u8 opt_type;
> >> >> +       u8 opt_len;
> >> >> +       u8 res;
> >> >> +       u8 ioam_type;
> >> >> +       __be16 namespace_id;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_trace_hdr {
> >> >> +       __be16 info;
> >> >> +       __be32 type;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_namespace {
> >> >> +       struct rhash_head head;
> >> >> +       struct rcu_head rcu;
> >> >> +
> >> >> +       __be16 id;
> >> >> +       __be64 data;
> >> >> +       bool remove_tlv;
> >> >> +
> >> >> +       struct ioam6_schema *schema;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_schema {
> >> >> +       struct rhash_head head;
> >> >> +       struct rcu_head rcu;
> >> >> +
> >> >> +       u32 id;
> >> >> +       int len;
> >> >> +       __be32 hdr;
> >> >> +       u8 *data;
> >> >> +
> >> >> +       struct ioam6_namespace *ns;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_pernet_data {
> >> >> +       struct mutex lock;
> >> >> +       struct rhashtable namespaces;
> >> >> +       struct rhashtable schemas;
> >> >> +};
> >> >> +
> >> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> >> >> +{
> >> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> >> +       return net->ipv6.ioam6_data;
> >> >> +#else
> >> >> +       return NULL;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> >> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> +                                 struct ioam6_namespace *ns);
> >> >> +
> >> >> +extern int ioam6_init(void);
> >> >> +extern void ioam6_exit(void);
> >> >> +
> >> >> +#endif
> >> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> >> >> index 5ec054473d81..89b27fa721f4 100644
> >> >> --- a/include/net/netns/ipv6.h
> >> >> +++ b/include/net/netns/ipv6.h
> >> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> >> >>         int max_hbh_opts_len;
> >> >>         int seg6_flowlabel;
> >> >>         bool skip_notify_on_dev_down;
> >> >> +       unsigned int ioam6_id;
> >> >>  };
> >> >>
> >> >>  struct netns_ipv6 {
> >> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> >> >>                 spinlock_t      lock;
> >> >>                 u32             seq;
> >> >>         } ip6addrlbl_table;
> >> >> +       struct ioam6_pernet_data *ioam6_data;
> >> >>  };
> >> >>
> >> >>  #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> >> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> >> >> index 9f2273a08356..1c98435220c9 100644
> >> >> --- a/include/uapi/linux/in6.h
> >> >> +++ b/include/uapi/linux/in6.h
> >> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> >> >>  #define IPV6_TLV_PADN          1
> >> >>  #define IPV6_TLV_ROUTERALERT   5
> >> >>  #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
> >> >> +#define IPV6_TLV_IOAM_HOPOPTS  49
> >> >
> >> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> >> > Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated. It is not
> >> > IANA's responsibility to track the status of allocations, their
> >> > expirations, or when they may be re-allocated."
> >> >
> >> > Please add a comment here and in the Documentation to this
> >> > effect.
> >> >
> >> >>  #define IPV6_TLV_JUMBO         194
> >> >>  #define IPV6_TLV_HAO           201     /* home address option */
> >> >>
> >> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> >> index 13e8751bf24a..eb521b2dd885 100644
> >> >> --- a/include/uapi/linux/ipv6.h
> >> >> +++ b/include/uapi/linux/ipv6.h
> >> >> @@ -189,6 +189,8 @@ enum {
> >> >>         DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> >> >>         DEVCONF_NDISC_TCLASS,
> >> >>         DEVCONF_RPL_SEG_ENABLED,
> >> >> +       DEVCONF_IOAM6_ENABLED,
> >> >> +       DEVCONF_IOAM6_ID,
> >> >>         DEVCONF_MAX
> >> >>  };
> >> >>
> >> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> >> >> index cf7b47bdb9b3..b7ef10d417d6 100644
> >> >> --- a/net/ipv6/Makefile
> >> >> +++ b/net/ipv6/Makefile
> >> >> @@ -10,7 +10,7 @@ ipv6-objs :=  af_inet6.o anycast.o ip6_output.o ip6_input.o
> >> >> addrconf.o \
> >> >>                 route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> >> >>                 raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> >> >>                 exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> >> >> -               udp_offload.o seg6.o fib6_notifier.o rpl.o
> >> >> +               udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
> >> >>
> >> >>  ipv6-offload :=        ip6_offload.o tcpv6_offload.o exthdrs_offload.o
> >> >>
> >> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> >> index 840bfdb3d7bd..6c952a28ade2 100644
> >> >> --- a/net/ipv6/addrconf.c
> >> >> +++ b/net/ipv6/addrconf.c
> >> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> >> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
> >> >>         .disable_policy         = 0,
> >> >>         .rpl_seg_enabled        = 0,
> >> >> +       .ioam6_enabled          = 0,
> >> >> +       .ioam6_id               = 0,
> >> >>  };
> >> >>
> >> >>  static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> >> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
> >> >> {
> >> >>         .addr_gen_mode          = IN6_ADDR_GEN_MODE_EUI64,
> >> >>         .disable_policy         = 0,
> >> >>         .rpl_seg_enabled        = 0,
> >> >> +       .ioam6_enabled          = 0,
> >> >> +       .ioam6_id               = 0,
> >> >>  };
> >> >>
> >> >>  /* Check if link is ready: is it up and is a valid qdisc available */
> >> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
> >> >> *cnf,
> >> >>         array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> >> >>         array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> >> >>         array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> >> >> +       array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> >> >> +       array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> >> >>  }
> >> >>
> >> >>  static inline size_t inet6_ifla6_size(void)
> >> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> >> >>                 .mode           = 0644,
> >> >>                 .proc_handler   = proc_dointvec,
> >> >>         },
> >> >> +       {
> >> >> +               .procname       = "ioam6_enabled",
> >> >> +               .data           = &ipv6_devconf.ioam6_enabled,
> >> >> +               .maxlen         = sizeof(int),
> >> >> +               .mode           = 0644,
> >> >> +               .proc_handler   = proc_dointvec,
> >> >> +       },
> >> >> +       {
> >> >> +               .procname       = "ioam6_id",
> >> >> +               .data           = &ipv6_devconf.ioam6_id,
> >> >> +               .maxlen         = sizeof(int),
> >> >> +               .mode           = 0644,
> >> >> +               .proc_handler   = proc_dointvec,
> >> >> +       },
> >> >>         {
> >> >>                 /* sentinel */
> >> >>         }
> >> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> >> >> index b304b882e031..63a9ffc4b283 100644
> >> >> --- a/net/ipv6/af_inet6.c
> >> >> +++ b/net/ipv6/af_inet6.c
> >> >> @@ -62,6 +62,7 @@
> >> >>  #include <net/rpl.h>
> >> >>  #include <net/compat.h>
> >> >>  #include <net/xfrm.h>
> >> >> +#include <net/ioam6.h>
> >> >>
> >> >>  #include <linux/uaccess.h>
> >> >>  #include <linux/mroute6.h>
> >> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> >> >>         if (err)
> >> >>                 goto rpl_fail;
> >> >>
> >> >> +       err = ioam6_init();
> >> >> +       if (err)
> >> >> +               goto ioam6_fail;
> >> >> +
> >> >>         err = igmp6_late_init();
> >> >>         if (err)
> >> >>                 goto igmp6_late_err;
> >> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> >> >>  #endif
> >> >>  igmp6_late_err:
> >> >>         rpl_exit();
> >> >> +ioam6_fail:
> >> >> +       ioam6_exit();
> >> >>  rpl_fail:
> >> >>         seg6_exit();
> >> >>  seg6_fail:
> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> index f27ab3bf2e0c..00aee1358f1c 100644
> >> >> --- a/net/ipv6/exthdrs.c
> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> @@ -49,6 +49,8 @@
> >> >>  #include <net/seg6_hmac.h>
> >> >>  #endif
> >> >>  #include <net/rpl.h>
> >> >> +#include <net/ioam6.h>
> >> >> +#include <net/dst_metadata.h>
> >> >>
> >> >>  #include <linux/uaccess.h>
> >> >>
> >> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >>         return TLV_REJECT;
> >> >>  }
> >> >>
> >> >> +/* IOAM */
> >> >> +
> >> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> >> >> +{
> >> >> +       struct ioam6_common_hdr *ioamh;
> >> >> +       struct ioam6_namespace *ns;
> >> >> +
> >> >> +       /* Must be 4n-aligned */
> >> >> +       if (optoff & 3)
> >> >> +               goto drop;
> >> >> +
> >> >> +       if (!skb_valid_dst(skb))
> >> >> +               ip6_route_input(skb);
> >> >> +
> >> >> +       /* IOAM must be enabled on ingress interface */
> >> >> +       if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> >> >> +               goto drop;
> >> >> +
> >> >> +       ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> >> >> +       ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> >> >> +
> >> >> +       /* Unknown IOAM namespace, either:
> >> >> +        *  - Drop it if IOAM is not enabled on egress interface (if any)
> >> >> +        *  - Ignore it otherwise
> >> >> +        */
> >> >> +       if (!ns) {
> >> >> +               if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> +                   !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> +                       goto drop;
> >> >> +
> >> >> +               goto accept;
> >> >> +       }
> >> >> +
> >> >> +       if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> +               goto remove;
> >> >> +
> >> >> +       /* Known IOAM namespace which must not be removed:
> >> >> +        * IOAM must be enabled on egress interface
> >> >> +        */
> >> >> +       if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> +           !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> +               goto drop;
> >> >> +
> >> >> +       switch (ioamh->ioam_type) {
> >> >> +       case IOAM6_OPT_TRACE_PREALLOC:
> >> >> +               ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> >> >> +               IP6CB(skb)->flags |= IP6SKB_IOAM;
> >> >> +               break;
> >> >> +       default:
> >> >> +               break;
> >> >> +       }
> >> >> +
> >> >> +accept:
> >> >> +       return TLV_ACCEPT;
> >> >> +remove:
> >> >> +       return TLV_REMOVE;
> >> >> +drop:
> >> >> +       kfree_skb(skb);
> >> >> +       return TLV_REJECT;
> >> >> +}
> >> >> +
> >> >>  /* Jumbo payload */
> >> >>
> >> >>  static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >>                 .type   = IPV6_TLV_ROUTERALERT,
> >> >>                 .func   = ipv6_hop_ra,
> >> >>         },
> >> >> +       {
> >> >> +               .type   = IPV6_TLV_IOAM_HOPOPTS,
> >> >> +               .func   = ipv6_hop_ioam,
> >> >> +       },
> >> >>         {
> >> >>                 .type   = IPV6_TLV_JUMBO,
> >> >>                 .func   = ipv6_hop_jumbo,
> >> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> >> >> new file mode 100644
> >> >> index 000000000000..406aa78eb504
> >> >> --- /dev/null
> >> >> +++ b/net/ipv6/ioam6.c
> >> >> @@ -0,0 +1,326 @@
> >> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> >> +/*
> >> >> + *  IOAM IPv6 implementation
> >> >> + *
> >> >> + *  Author:
> >> >> + *  Justin Iurman <justin.iurman@uliege.be>
> >> >> + */
> >> >> +
> >> >> +#include <linux/errno.h>
> >> >> +#include <linux/types.h>
> >> >> +#include <linux/kernel.h>
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/rhashtable.h>
> >> >> +
> >> >> +#include <net/addrconf.h>
> >> >> +#include <net/ioam6.h>
> >> >> +
> >> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> >> >> +{
> >> >> +       kfree_rcu(ns, rcu);
> >> >> +}
> >> >> +
> >> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> >> >> +{
> >> >> +       kfree_rcu(sc, rcu);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_ns(void *ptr, void *arg)
> >> >> +{
> >> >> +       struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> >> >> +
> >> >> +       if (ns)
> >> >> +               ioam6_ns_release(ns);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_sc(void *ptr, void *arg)
> >> >> +{
> >> >> +       struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> >> >> +
> >> >> +       if (sc)
> >> >> +               ioam6_sc_release(sc);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> +       const struct ioam6_namespace *ns = obj;
> >> >> +
> >> >> +       return (ns->id != *(__be16 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> +       const struct ioam6_schema *sc = obj;
> >> >> +
> >> >> +       return (sc->id != *(u32 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static const struct rhashtable_params rht_ns_params = {
> >> >> +       .key_len                = sizeof(__be16),
> >> >> +       .key_offset             = offsetof(struct ioam6_namespace, id),
> >> >> +       .head_offset            = offsetof(struct ioam6_namespace, head),
> >> >> +       .automatic_shrinking    = true,
> >> >> +       .obj_cmpfn              = ioam6_ns_cmpfn,
> >> >> +};
> >> >> +
> >> >> +static const struct rhashtable_params rht_sc_params = {
> >> >> +       .key_len                = sizeof(u32),
> >> >> +       .key_offset             = offsetof(struct ioam6_schema, id),
> >> >> +       .head_offset            = offsetof(struct ioam6_schema, head),
> >> >> +       .automatic_shrinking    = true,
> >> >> +       .obj_cmpfn              = ioam6_sc_cmpfn,
> >> >> +};
> >> >> +
> >> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> >> >> +{
> >> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> +       return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> >> >> +                               u32 trace_type, struct ioam6_namespace *ns)
> >> >> +{
> >> >> +       u8 *data = skb_network_header(skb) + nodeoff;
> >> >> +       struct __kernel_sock_timeval ts;
> >> >> +       u64 raw_u64;
> >> >> +       u32 raw_u32;
> >> >> +       u16 raw_u16;
> >> >> +       u8 byte;
> >> >> +
> >> >> +       /* hop_lim and node_id */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE0) {
> >> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> +               raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> +               if (!raw_u32)
> >> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u24;
> >> >> +               else
> >> >> +                       raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> >> >> +               *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* ingress_if_id and egress_if_id */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE1) {
> >> >> +               raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> +               if (!raw_u16)
> >> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> +               data += sizeof(__be16);
> >> >> +
> >> >> +               raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> +               if (!raw_u16)
> >> >> +                       raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> +               *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> +               data += sizeof(__be16);
> >> >> +       }
> >> >> +
> >> >> +       /* timestamp seconds */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE2) {
> >> >> +               if (!skb->tstamp) {
> >> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               } else {
> >> >> +                       skb_get_new_timestamp(skb, &ts);
> >> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> >> >> +               }
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* timestamp subseconds */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE3) {
> >> >> +               if (!skb->tstamp) {
> >> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               } else {
> >> >> +                       if (!(trace_type & IOAM6_TRACE_TYPE2))
> >> >> +                               skb_get_new_timestamp(skb, &ts);
> >> >> +                       *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> >> >> +               }
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* transit delay */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE4) {
> >> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* namespace data */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE5) {
> >> >> +               *(__be32 *)data = (__be32)ns->data;
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* queue depth */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE6) {
> >> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* hop_lim and node_id (wide) */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE7) {
> >> >> +               byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> +               raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> +               if (!raw_u64)
> >> >> +                       raw_u64 = IOAM6_EMPTY_FIELD_u56;
> >> >> +               else
> >> >> +                       raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> >> >> +               *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> >> >> +               data += sizeof(__be64);
> >> >> +       }
> >> >> +
> >> >> +       /* ingress_if_id and egress_if_id (wide) */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE8) {
> >> >> +               raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> +               if (!raw_u32)
> >> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +
> >> >> +               raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> +               if (!raw_u32)
> >> >> +                       raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> +               *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* namespace data (wide) */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE9) {
> >> >> +               *(__be64 *)data = ns->data;
> >> >> +               data += sizeof(__be64);
> >> >> +       }
> >> >> +
> >> >> +       /* buffer occupancy */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE10) {
> >> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* checksum complement */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE11) {
> >> >> +               *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> +               data += sizeof(__be32);
> >> >> +       }
> >> >> +
> >> >> +       /* opaque state snapshot */
> >> >> +       if (trace_type & IOAM6_TRACE_TYPE22) {
> >> >> +               if (!ns->schema) {
> >> >> +                       *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> >> >> +               } else {
> >> >> +                       *(__be32 *)data = ns->schema->hdr;
> >> >> +                       data += sizeof(__be32);
> >> >> +                       memcpy(data, ns->schema->data, ns->schema->len);
> >> >> +               }
> >> >> +       }
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> +                          struct ioam6_namespace *ns)
> >> >> +{
> >> >> +       u8 nodelen, flags, remlen, sclen = 0;
> >> >> +       struct ioam6_trace_hdr *trh;
> >> >> +       int nodeoff;
> >> >> +       u16 info;
> >> >> +       u32 type;
> >> >> +
> >> >> +       trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> >> >> +       info = be16_to_cpu(trh->info);
> >> >> +       type = be32_to_cpu(trh->type);
> >> >> +
> >> >> +       nodelen = info >> 11;
> >> >> +       flags = (info >> 7) & 0xf;
> >> >> +       remlen = info & 0x7f;
> >> >> +
> >> >> +       /* Skip if Overflow bit is set OR
> >> >> +        * if an unknown type (bit 12-21) is set
> >> >> +        */
> >> >> +       if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> >> >> +               return;
> >> >> +
> >> >> +       /* NodeLen does not include Opaque State Snapshot length. We need to
> >> >> +        * take it into account if the corresponding bit is set and if current
> >> >> +        * IOAM namespace has an active schema attached to it
> >> >> +        */
> >> >> +       if (type & IOAM6_TRACE_TYPE22) {
> >> >> +               /* Opaque State Snapshot header size */
> >> >> +               sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> >> >> +
> >> >> +               if (ns->schema)
> >> >> +                       sclen += ns->schema->len / 4;
> >> >> +       }
> >> >> +
> >> >> +       /* Not enough space remaining: set Overflow bit and skip */
> >> >> +       if (!remlen || remlen < (nodelen + sclen)) {
> >> >> +               info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> >> >> +               trh->info = cpu_to_be16(info);
> >> >> +               return;
> >> >> +       }
> >> >> +
> >> >> +       nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> >> >> +       ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> >> >> +
> >> >> +       /* Update RemainingLen */
> >> >> +       remlen -= nodelen + sclen;
> >> >> +       info = (info & 0xff80) | remlen;
> >> >> +       trh->info = cpu_to_be16(info);
> >> >> +}
> >> >> +
> >> >> +static int __net_init ioam6_net_init(struct net *net)
> >> >> +{
> >> >> +       struct ioam6_pernet_data *nsdata;
> >> >> +       int err = -ENOMEM;
> >> >> +
> >> >> +       nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> >> >> +       if (!nsdata)
> >> >> +               goto out;
> >> >> +
> >> >> +       mutex_init(&nsdata->lock);
> >> >> +       net->ipv6.ioam6_data = nsdata;
> >> >> +
> >> >> +       err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> >> >> +       if (err)
> >> >> +               goto free_nsdata;
> >> >> +
> >> >> +       err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> >> >> +       if (err)
> >> >> +               goto free_rht_ns;
> >> >> +
> >> >> +out:
> >> >> +       return err;
> >> >> +free_rht_ns:
> >> >> +       rhashtable_destroy(&nsdata->namespaces);
> >> >> +free_nsdata:
> >> >> +       kfree(nsdata);
> >> >> +       net->ipv6.ioam6_data = NULL;
> >> >> +       goto out;
> >> >> +}
> >> >> +
> >> >> +static void __net_exit ioam6_net_exit(struct net *net)
> >> >> +{
> >> >> +       struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> +       rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> >> >> +       rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> >> >> +
> >> >> +       kfree(nsdata);
> >> >> +}
> >> >> +
> >> >> +static struct pernet_operations ioam6_net_ops = {
> >> >> +       .init = ioam6_net_init,
> >> >> +       .exit = ioam6_net_exit,
> >> >> +};
> >> >> +
> >> >> +int __init ioam6_init(void)
> >> >> +{
> >> >> +       int err = register_pernet_subsys(&ioam6_net_ops);
> >> >> +
> >> >> +       if (err)
> >> >> +               return err;
> >> >> +
> >> >> +       pr_info("In-situ OAM (IOAM) with IPv6\n");
> >> >> +       return 0;
> >> >> +}
> >> >> +
> >> >> +void ioam6_exit(void)
> >> >> +{
> >> >> +       unregister_pernet_subsys(&ioam6_net_ops);
> >> >> +}
> >> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> >> index fac2135aa47b..da49b33ab6fc 100644
> >> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> >> >>                 .mode           = 0644,
> >> >>                 .proc_handler   = proc_dointvec
> >> >>         },
> >> >> +       {
> >> >> +               .procname       = "ioam6_id",
> >> >> +               .data           = &init_net.ipv6.sysctl.ioam6_id,
> >> >> +               .maxlen         = sizeof(int),
> >> >> +               .mode           = 0644,
> >> >> +               .proc_handler   = proc_dointvec
> >> >> +       },
> >> >>         { }
> >> >>  };
> >> >>
> >> >> --
> > > >> 2.17.1
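
The RemainingLen bookkeeping done by ioam6_fill_trace_data() in the quoted
patch can be sketched in user space as follows. This is a minimal sketch
under stated assumptions, not the patch's code: the names trace_info,
trace_info_unpack() and trace_info_reserve() are hypothetical, the info word
is taken in host byte order, and the value 0x8 for the Overflow flag is an
assumption because the definition of IOAM6_TRACE_FLAG_OVERFLOW is not shown
in the quoted text. The bit layout (NodeLen in the top 5 bits, 4 flag bits,
RemainingLen in the low 7 bits) is read off the shifts and masks in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the real IOAM6_TRACE_FLAG_OVERFLOW is not shown here. */
#define TRACE_FLAG_OVERFLOW 0x8u

struct trace_info {
	uint8_t nodelen; /* 5 bits, in 4-octet units */
	uint8_t flags;   /* 4 bits */
	uint8_t remlen;  /* 7 bits, in 4-octet units */
};

/* Split the 16-bit info word the same way the patch does. */
static struct trace_info trace_info_unpack(uint16_t info)
{
	struct trace_info ti;

	ti.nodelen = (uint8_t)(info >> 11);
	ti.flags = (uint8_t)((info >> 7) & 0xf);
	ti.remlen = (uint8_t)(info & 0x7f);
	return ti;
}

/* Reserve room for one node (NodeLen plus sclen units for the opaque
 * snapshot). Returns the updated info word. *nodeoff receives the octet
 * offset of the node data from the start of the data area (the area that
 * follows the trace header), or -1 if nothing may be written.
 */
static uint16_t trace_info_reserve(uint16_t info, uint8_t sclen, int *nodeoff)
{
	struct trace_info ti = trace_info_unpack(info);
	unsigned int need = (unsigned int)ti.nodelen + sclen;

	assert(nodeoff != NULL);
	*nodeoff = -1;

	/* Overflow already set by an earlier node: leave the header alone. */
	if (ti.flags & TRACE_FLAG_OVERFLOW)
		return info;

	/* Not enough room left: set Overflow and skip. */
	if (!ti.remlen || ti.remlen < need)
		return (uint16_t)(info | (TRACE_FLAG_OVERFLOW << 7));

	/* Slots are consumed from the end of the data area towards its start. */
	*nodeoff = (int)((ti.remlen - need) * 4);
	return (uint16_t)((info & 0xff80u) | (ti.remlen - need));
}
```

For example, with NodeLen = 3, RemainingLen = 9 and no schema, the node data
lands 24 octets into the data area and RemainingLen drops to 6.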

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-26  8:22         ` Justin Iurman
@ 2020-06-26 15:39           ` Tom Herbert
  2020-06-26 17:14             ` Justin Iurman
  0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 15:39 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Fri, Jun 26, 2020 at 1:22 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> Hi Tom,
> >>
> >> >> Add the possibility to remove one or more consecutive TLVs without
> >> >> messing up the alignment of others. For now, only IOAM requires this
> >> >> behavior.
> >> >>
> >> > Hi Justin,
> >> >
> >> > Can you explain the motivation for this? Per RFC8200, extension
> >> > headers in flight are not to be added, removed, or modified outside of
> >> > the standard rules for processing modifiable HBH and DO TLVs; that
> >> > would include adding and removing TLVs in EH. One obvious problem this
> >>
> >> As you already know from our last meeting, IOAM may be configured on a node such
> >> that a specific IOAM namespace should be removed. Therefore, this patch
> >> provides support for the deletion of a TLV (or consecutive TLVs), without
> >> removing the entire EH (if it's empty, there will be padding). Note that there
> >> is a similar "problem" with the Incremental Trace where you'd need to expand
> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
> >> against modification of in-flight EHs, but there are several reasons that, I
> >> believe, mitigate this statement.
> >>
> >> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely
> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
> >> domain, i.e. from an IOAM node inside the domain to another one (no need for
> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
> >> so we modify "our" header and (ii) we already own the traffic.
> >>
> >> And if someone is still angry about this, well, the good news is that such
> >> modification can be avoided most of the time. Indeed, operators are advised to
> >> remove an IOAM namespace only on egress nodes. This way, the destination
> >> (either the tunnel destination or the real destination, depending on the
> >> scenario) will receive EHs and take care of them without the need to remove
> >> anything. But, again, operators can do what they want and I'd tend to adhere to
> >> David's philosophy [1] and give them the possibility to choose what to do.
> >>
> >
> > Justin,
> >
> > 6man WG has had a _long_ and sometimes bitter discussion around this
> > particularly with regards to insertion of SRH. The current consensus
> > of IETF is that it is a violation of RFC8200.  We've heard all the
> > arguments that it's only for limited domains and narrow use cases,
> > nevertheless there are several problems that the header
> > insertion/deletion advocates never answered-- it breaks AH, it breaks
> > PMTU discovery, it breaks ICMP. There is also a risk that a
> > non-standard modification could cause a packet to be dropped
> > downstream from the node that modifies it. There is no attribution on
> > who created the problem, and hence this can lead to systematic
> > blackholes which are the most miserable sort of problem to debug.
>
> Yes, I know the whole story and it's been stormy from what I understood.
>
> > Fundamentally, it is not robust per Postel's law (I actually wrote a
> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> > you're interested).
>
> Interesting, I'll take a look.
>
> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
> > where this requirement is specified, then I can take it up in IOAM WG
> > or 6man if needed...
>
> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1] (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be published.

I was specifically referring to the requirements around removing the
IOAM TLV from packets in-flight. I don't readily see that in the IOAM
drafts.

Also, be careful about saying that drafts are about to be published by
IETF. Until a draft reaches the RFC editor we really can't say that. I
don't believe the drafts you're referring to have even made it through
WGLC.

Tom

>
> Justin
>
>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>
> > Tom
> >
> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> >> > is processed (AH is processed after HBH).
> >>
> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
> >> Again, operators could simply apply IOAM on a subset of the traffic that does
> >> not include AHs, for example.
> >>
> >> Justin
> >>
> >>   [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
> >>
> >> > Tom
> >> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> >> is still aligned after the removal.
> >> >>
> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> >> header.
> >> >>
> >> >> Example 1:
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |                                                               |
> >> >> ~                Option to be removed (8 octets)                ~
> >> >> |                                                               |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> >> boundary (same result in both cases).
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Example 2:
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |                Option to be removed (4 octets)                |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> >> of 8 anymore.
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
> >> >> all, which means that blocks are only moved in multiples of 8. This
> >> >> assertion guarantees good alignment.
> >> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >> >>  1 file changed, 108 insertions(+), 26 deletions(-)
> >> >>
> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> index e9b366994475..f27ab3bf2e0c 100644
> >> >> --- a/net/ipv6/exthdrs.c
> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> @@ -52,17 +52,27 @@
> >> >>
> >> >>  #include <linux/uaccess.h>
> >> >>
> >> >> -/*
> >> >> - *     Parsing tlv encoded headers.
> >> >> +/* States for TLV parsing functions. */
> >> >> +
> >> >> +enum {
> >> >> +       TLV_ACCEPT,
> >> >> +       TLV_REJECT,
> >> >> +       TLV_REMOVE,
> >> >> +       __TLV_MAX
> >> >> +};
> >> >> +
> >> >> +/* Parsing TLV encoded headers.
> >> >>   *
> >> >> - *     Parsing function "func" returns true, if parsing succeed
> >> >> - *     and false, if it failed.
> >> >> - *     It MUST NOT touch skb->h.
> >> >> + * Parsing function "func" returns either:
> >> >> + *  - TLV_ACCEPT if parsing succeeds
> >> >> + *  - TLV_REJECT if parsing fails
> >> >> + *  - TLV_REMOVE if TLV must be removed
> >> >> + * It MUST NOT touch skb->h.
> >> >>   */
> >> >>
> >> >>  struct tlvtype_proc {
> >> >>         int     type;
> >> >> -       bool    (*func)(struct sk_buff *skb, int offset);
> >> >> +       int     (*func)(struct sk_buff *skb, int offset);
> >> >>  };
> >> >>
> >> >>  /*********************
> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
> >> >>         return false;
> >> >>  }
> >> >>
> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> >> +
> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> >> +{
> >> >> +       int len = end - start;
> >> >> +       int padlen = len % 8;
> >> >> +       unsigned char *h;
> >> >> +       int rlen, off;
> >> >> +       u16 pl_len;
> >> >> +
> >> >> +       rlen = len - padlen;
> >> >> +       if (rlen) {
> >> >> +               skb_pull(skb, rlen);
> >> >> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> >> +                       start);
> >> >> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> >> +
> >> >> +               skb_reset_network_header(skb);
> >> >> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> >> +
> >> >> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> >> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> >> +
> >> >> +               skb_transport_header(skb)[1] -= rlen >> 3;
> >> >> +               end -= rlen;
> >> >> +       }
> >> >> +
> >> >> +       if (padlen) {
> >> >> +               off = end - padlen;
> >> >> +               h = skb_network_header(skb);
> >> >> +
> >> >> +               if (padlen == 1) {
> >> >> +                       h[off] = IPV6_TLV_PAD1;
> >> >> +               } else {
> >> >> +                       padlen -= 2;
> >> >> +
> >> >> +                       h[off] = IPV6_TLV_PADN;
> >> >> +                       h[off + 1] = padlen;
> >> >> +                       memset(&h[off + 2], 0, padlen);
> >> >> +               }
> >> >> +       }
> >> >> +
> >> >> +       return end;
> >> >> +}
> >> >> +
> >> >>  /* Parse tlv encoded option header (hop-by-hop or destination) */
> >> >>
> >> >>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >>                           struct sk_buff *skb,
> >> >> -                         int max_count)
> >> >> +                         int max_count,
> >> >> +                         bool removable)
> >> >>  {
> >> >>         int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> -       const unsigned char *nh = skb_network_header(skb);
> >> >> +       unsigned char *nh = skb_network_header(skb);
> >> >>         int off = skb_network_header_len(skb);
> >> >>         const struct tlvtype_proc *curr;
> >> >>         bool disallow_unknowns = false;
> >> >> +       int off_remove = 0;
> >> >>         int tlv_count = 0;
> >> >>         int padlen = 0;
> >> >> +       int ret;
> >> >>
> >> >>         if (unlikely(max_count < 0)) {
> >> >>                 disallow_unknowns = true;
> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >>                         if (tlv_count > max_count)
> >> >>                                 goto bad;
> >> >>
> >> >> +                       ret = -1;
> >> >>                         for (curr = procs; curr->type >= 0; curr++) {
> >> >>                                 if (curr->type == nh[off]) {
> >> >>                                         /* type specific length/alignment
> >> >>                                            checks will be performed in the
> >> >>                                            func(). */
> >> >> -                                       if (curr->func(skb, off) == false)
> >> >> +                                       ret = curr->func(skb, off);
> >> >> +                                       if (ret == TLV_REJECT)
> >> >>                                                 return false;
> >> >>                                         break;
> >> >>                                 }
> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >> >>                                 return false;
> >> >>
> >> >> +                       if (removable) {
> >> >> +                               if (ret == TLV_REMOVE) {
> >> >> +                                       if (!off_remove)
> >> >> +                                               off_remove = off - padlen;
> >> >> +                               } else if (off_remove) {
> >> >> +                                       off = remove_tlv(off_remove, off, skb);
> >> >> +                                       nh = skb_network_header(skb);
> >> >> +                                       off_remove = 0;
> >> >> +                               }
> >> >> +                       }
> >> >> +
> >> >>                         padlen = 0;
> >> >>                         break;
> >> >>                 }
> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >>                 len -= optlen;
> >> >>         }
> >> >>
> >> >> -       if (len == 0)
> >> >> +       if (len == 0) {
> >> >> +               /* Don't forget last TLV if it must be removed */
> >> >> +               if (off_remove)
> >> >> +                       remove_tlv(off_remove, off, skb);
> >> >> +
> >> >>                 return true;
> >> >> +       }
> >> >>  bad:
> >> >>         kfree_skb(skb);
> >> >>         return false;
> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >>   *****************************/
> >> >>
> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >>  {
> >> >>         struct ipv6_destopt_hao *hao;
> >> >>         struct inet6_skb_parm *opt = IP6CB(skb);
> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >>         if (skb->tstamp == 0)
> >> >>                 __net_timestamp(skb);
> >> >>
> >> >> -       return true;
> >> >> +       return TLV_ACCEPT;
> >> >>
> >> >>   discard:
> >> >>         kfree_skb(skb);
> >> >> -       return false;
> >> >> +       return TLV_REJECT;
> >> >>  }
> >> >>  #endif
> >> >>
> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >> >>  #endif
> >> >>
> >> >>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> >> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> >> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> >> +                         false)) {
> >> >>                 skb->transport_header += extlen;
> >> >>                 opt = IP6CB(skb);
> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff *skb)
> >> >>
> >> >>  /* Router Alert as of RFC 2711 */
> >> >>
> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >>  {
> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >>
> >> >>         if (nh[optoff + 1] == 2) {
> >> >>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >> >>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> >> -               return true;
> >> >> +               return TLV_ACCEPT;
> >> >>         }
> >> >>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >> >>                             nh[optoff + 1]);
> >> >>         kfree_skb(skb);
> >> >> -       return false;
> >> >> +       return TLV_REJECT;
> >> >>  }
> >> >>
> >> >>  /* Jumbo payload */
> >> >>
> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >>  {
> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >>         if (pkt_len <= IPV6_MAXPLEN) {
> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> >> -               return false;
> >> >> +               return TLV_REJECT;
> >> >>         }
> >> >>         if (ipv6_hdr(skb)->payload_len) {
> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> >> -               return false;
> >> >> +               return TLV_REJECT;
> >> >>         }
> >> >>
> >> >>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >>                 goto drop;
> >> >>
> >> >>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> >> -       return true;
> >> >> +       return TLV_ACCEPT;
> >> >>
> >> >>  drop:
> >> >>         kfree_skb(skb);
> >> >> -       return false;
> >> >> +       return TLV_REJECT;
> >> >>  }
> >> >>
> >> >>  /* CALIPSO RFC 5570 */
> >> >>
> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >>  {
> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >>
> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >>         if (!calipso_validate(skb, nh + optoff))
> >> >>                 goto drop;
> >> >>
> >> >> -       return true;
> >> >> +       return TLV_ACCEPT;
> >> >>
> >> >>  drop:
> >> >>         kfree_skb(skb);
> >> >> -       return false;
> >> >> +       return TLV_REJECT;
> >> >>  }
> >> >>
> >> >>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >> >>
> >> >>         opt->flags |= IP6SKB_HOPBYHOP;
> >> >>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> >> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> >> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> >> +                         true)) {
> >> >> +               /* we need to refresh the length in case
> >> >> +                * at least one TLV was removed
> >> >> +                */
> >> >> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >> >>                 skb->transport_header += extlen;
> >> >>                 opt = IP6CB(skb);
> >> >>                 opt->nhoff = sizeof(struct ipv6hdr);
> >> >> --
> > > >> 2.17.1
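
The length split performed by remove_tlv() in the quoted patch, where only a
multiple of 8 octets is physically removed and the remainder is rewritten as
padding, can be sketched on a plain byte buffer as follows. This is a minimal
sketch under stated assumptions, not the patch's code: hbh_remove_span() is a
hypothetical name, the buffer holds only the Hop-by-Hop header (starting at
its Next Header octet), and the gap is closed by shifting the tail of the
header left, whereas the patch pulls the skb and moves the preceding bytes
forward instead. Checksum and IPv6 payload length updates are left out. The
Pad1 and PadN option types (0 and 1) are those of RFC 8200.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0 /* same value as IPV6_TLV_PAD1 */
#define TLV_PADN 1 /* same value as IPV6_TLV_PADN */

/* Remove the options in [start, end) of a Hop-by-Hop header. Offsets are
 * relative to the start of the header. Returns the new offset of whatever
 * followed the removed span.
 */
static size_t hbh_remove_span(uint8_t *hbh, size_t start, size_t end)
{
	size_t total = ((size_t)hbh[1] + 1) * 8; /* Hdr Ext Len to octets */
	size_t len, padlen, rlen;

	assert(start >= 2 && start <= end && end <= total);

	len = end - start;
	padlen = len % 8;    /* octets that stay, rewritten as padding */
	rlen = len - padlen; /* octets physically removed, multiple of 8 */

	if (rlen) {
		/* Close the gap and shrink Hdr Ext Len (8-octet units). */
		memmove(hbh + end - rlen, hbh + end, total - end);
		hbh[1] = (uint8_t)(hbh[1] - rlen / 8);
		end -= rlen;
	}

	if (padlen == 1) {
		hbh[start] = TLV_PAD1;
	} else if (padlen > 1) {
		/* PadN: type, length of the padding data, then zeros. */
		hbh[start] = TLV_PADN;
		hbh[start + 1] = (uint8_t)(padlen - 2);
		memset(hbh + start + 2, 0, padlen - 2);
	}

	return end;
}
```

Applied to Example 2 of the commit message (a 4-octet option at offsets 8 to
12), nothing is removed: the option is rewritten as a 4-octet PadN, and
option Z keeps its 8n alignment. Applied to Example 1 (an 8-octet option at
offsets 8 to 16), the whole option is removed and Hdr Ext Len drops by one.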

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
  2020-06-26  8:31         ` Justin Iurman
@ 2020-06-26 15:52           ` Tom Herbert
  0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 15:52 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Fri, Jun 26, 2020 at 1:31 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> >> Implement the IOAM egress behavior.
> >> >>
> >> >> According to RFC 8200:
> >> >> "Extension headers (except for the Hop-by-Hop Options header) are not
> >> >>  processed, inserted, or deleted by any node along a packet's delivery
> >> >>  path, until the packet reaches the node (or each of the set of nodes,
> >> >>  in the case of multicast) identified in the Destination Address field
> >> >>  of the IPv6 header."
> >> >>
> >> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> >> >> incoming IPv6 packet with another similar IPv6 header that will contain
> >> >> IOAM data while it traverses the domain. When leaving, the egress node,
> >> >> another IOAM domain border which is also the tunnel destination, must
> >> >> decapsulate the packet.
> >> >
> >> > This is just IP in IP encapsulation that happens to be terminated at
> >> > an egress node of the IOAM domain. The fact that it's IOAM isn't
> >> > germane; this IP in IP is done in a variety of ways. We should be
> >> > using the normal protocol handler for NEXTHDR_IPV6  instead of special
> >> > case code.
> >>
> >> Agree. The reason for this special case code is that I was not aware of a more
> >> elegant solution.
> >>
> > The current implementation might not be what you're looking for since
> > ip6ip6 wants a tunnel configured. What we really want is more like
> > anonymous decapsulation, that is just decap the ip6ip6 packet and
> > resubmit the packet into the stack (this is what your patch is doing).
> > The idea has been kicked around before, especially in the use case
> > where we're tunneling across a domain and there could be hundreds of
> > such tunnels to some device. I think it's generally okay to do this,
> > although someone might raise security concerns since it sort of
> > obfuscates the "real packet". Probably makes sense to have a sysctl to
>
> Indeed. However, in this precise case for IOAM, you don't have security issues since you would only decap if an IOAM HBH is found in the outer header, which is only valid if the node is part of the IOAM domain (IOAM is enabled on its ingress interface). But, for a more generic case, I agree for the sysctl solution.

But again there's no such thing as IOAM packets. There are IPv6
packets that have IOAM TLVs in their Hop-by-Hop or Destination
Options. In this case there are IP6IP6 packets that contain an IOAM
TLV in the outer headers, but from a protocol and implementation
perspective there's nothing special about that. The outer headers
could just as easily include an SRH (probably more deployed at this
point), other options and EHs, or maybe no options at all. So we need a
generic solution and not one tied to a particular use case of IP6IP6
tunneling.

Tom
>
> > enable this and probably could default to on. Of course, if we do this
> > the next question is should we also implement anonymous decapsulation
> > for 44,64,46 tunnels.
>
> Interesting question. I'd say that we should only do it if there is at least a use case that is (or will be) part of the kernel.
>
> Justin
>
> > Tom
> >
> >> Justin
> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >>  include/linux/ipv6.h |  1 +
> >> >>  net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
> >> >>  2 files changed, 23 insertions(+)
> >> >>
> >> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> >> index 2cb445a8fc9e..5312a718bc7a 100644
> >> >> --- a/include/linux/ipv6.h
> >> >> +++ b/include/linux/ipv6.h
> >> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
> >> >>  #define IP6SKB_HOPBYHOP        32
> >> >>  #define IP6SKB_L3SLAVE         64
> >> >>  #define IP6SKB_JUMBOGRAM      128
> >> >> +#define IP6SKB_IOAM           256
> >> >>  };
> >> >>
> >> >>  #if defined(CONFIG_NET_L3_MASTER_DEV)
> >> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> >> >> index e96304d8a4a7..8cf75cc5e806 100644
> >> >> --- a/net/ipv6/ip6_input.c
> >> >> +++ b/net/ipv6/ip6_input.c
> >> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
> >> >> *));
> >> >>  void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >> >>                               bool have_final)
> >> >>  {
> >> >> +       struct inet6_skb_parm *opt = IP6CB(skb);
> >> >>         const struct inet6_protocol *ipprot;
> >> >>         struct inet6_dev *idev;
> >> >>         unsigned int nhoff;
> >> >> +       u8 hop_limit;
> >> >>         bool raw;
> >> >>
> >> >>         /*
> >> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> >> sk_buff *skb, int nexthdr,
> >> >>         } else {
> >> >>                 if (!raw) {
> >> >>                         if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> >> >> +                               /* IOAM Tunnel Decapsulation
> >> >> +                                * Packet is going to re-enter the stack
> >> >> +                                */
> >> >> +                               if (nexthdr == NEXTHDR_IPV6 &&
> >> >> +                                   (opt->flags & IP6SKB_IOAM)) {
> >> >> +                                       hop_limit = ipv6_hdr(skb)->hop_limit;
> >> >> +
> >> >> +                                       skb_reset_network_header(skb);
> >> >> +                                       skb_reset_transport_header(skb);
> >> >> +                                       skb->encapsulation = 0;
> >> >> +
> >> >> +                                       ipv6_hdr(skb)->hop_limit = hop_limit;
> >> >> +                                       __skb_tunnel_rx(skb, skb->dev,
> >> >> +                                                       dev_net(skb->dev));
> >> >> +
> >> >> +                                       netif_rx(skb);
> >> >> +                                       goto out;
> >> >> +                               }
> >> >> +
> >> >>                                 __IP6_INC_STATS(net, idev,
> >> >>                                                 IPSTATS_MIB_INUNKNOWNPROTOS);
> >> >>                                 icmpv6_send(skb, ICMPV6_PARAMPROB,
> >> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> >> sk_buff *skb, int nexthdr,
> >> >>                         consume_skb(skb);
> >> >>                 }
> >> >>         }
> >> >> +out:
> >> >>         return;
> >> >>
> >> >>  discard:
> >> >> --
> >> >> 2.17.1
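
The hop-limit handling in the quoted hunk can be illustrated on a flat buffer. This is a userspace sketch under stated assumptions, not the patch's code: ip6ip6_decap() and its inner_off parameter are hypothetical, and the caller is assumed to have already walked the outer extension headers to find where the inner packet starts.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN   40 /* fixed IPv6 header size (RFC 8200) */
#define IPV6_HOP_LIMIT  7 /* byte offset of Hop Limit in the fixed header */

/* Hypothetical helper: "decapsulate" an IPv6-in-IPv6 packet held in a flat
 * buffer. Returns a pointer to the inner IPv6 header, or NULL if the buffer
 * is too short or the inner header is not IPv6. As in the quoted hunk, the
 * outer Hop Limit is carried over into the inner header.
 */
uint8_t *ip6ip6_decap(uint8_t *pkt, size_t pkt_len, size_t inner_off)
{
	uint8_t *inner;

	/* The inner packet must start after a full outer header and must
	 * itself contain a full fixed header.
	 */
	if (inner_off < IPV6_HDR_LEN || pkt_len < inner_off ||
	    pkt_len - inner_off < IPV6_HDR_LEN)
		return NULL;

	inner = pkt + inner_off;
	if ((inner[0] >> 4) != 6) /* version nibble */
		return NULL;

	inner[IPV6_HOP_LIMIT] = pkt[IPV6_HOP_LIMIT];
	return inner;
}
```

Nothing in this sketch depends on IOAM being present in the outer headers, which is in line with the request for a generic decapsulation path.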

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next] Fix unchecked dereference
  2020-06-26  8:54     ` [PATCH net-next] Fix unchecked dereference Justin Iurman
@ 2020-06-26 16:01         ` Jakub Kicinski
  0 siblings, 0 replies; 42+ messages in thread
From: Jakub Kicinski @ 2020-06-26 16:01 UTC (permalink / raw)
  To: Justin Iurman; +Cc: dan.carpenter, kbuild, netdev, lkp, kbuild-all, davem

On Fri, 26 Jun 2020 10:54:35 +0200 Justin Iurman wrote:
> If rhashtable_remove_fast returns an error, a rollback is applied. In
> that case, an unchecked dereference has been fixed.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>

My bot says this doesn't apply to net-next, could you double-check?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-26 15:39           ` Tom Herbert
@ 2020-06-26 17:14             ` Justin Iurman
  2020-06-26 18:35               ` Tom Herbert
  0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 17:14 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller

>> Tom,
>>
>> >> Hi Tom,
>> >>
>> >> >> Add the possibility to remove one or more consecutive TLVs without
>> >> >> messing up the alignment of others. For now, only IOAM requires this
>> >> >> behavior.
>> >> >>
>> >> > Hi Justin,
>> >> >
>> >> > Can you explain the motivation for this? Per RFC8200, extension
>> >> > headers in flight are not to be added, removed, or modified outside of
>> >> > the standard rules for processing modifiable HBH and DO TLVs; that
>> >> > would include adding and removing TLVs in EH. One obvious problem this
>> >>
>> >> As you already know from our last meeting, IOAM may be configured on a node such
>> >> that a specific IOAM namespace should be removed. Therefore, this patch
>> >> provides support for the deletion of a TLV (or consecutive TLVs), without
>> >> removing the entire EH (if it's empty, there will be padding). Note that there
>> >> is a similar "problem" with the Incremental Trace where you'd need to expand
>> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
>> >> against modification of in-flight EHs, but there are several reasons that, I
>> >> believe, mitigate this statement.
>> >>
>> >> Let's keep in mind that IOAM purpose is "private" (= IOAM domain), ie not widely
>> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
>> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
>> >> domain, ie from an IOAM node inside the domain to another one (no need for
>> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
>> >> so we modify "our" header and (ii) we already own the traffic.
>> >>
>> >> And if someone is still angry about this, well, the good news is that such
>> >> modification can be avoided most of the time. Indeed, operators are advised to
>> >> remove an IOAM namespace only on egress nodes. This way, the destination
>> >> (either the tunnel destination or the real destination, depending on the
>> >> scenario) will receive EHs and take care of them without the need to remove
>> >> anything. But, again, operators can do what they want and I'd tend to adhere to
>> >> David's philosophy [1] and give them the possibility to choose what to do.
>> >>
>> >
>> > Justin,
>> >
>> > 6man WG has had a _long_ and sometimes bitter discussion around this
>> > particularly with regards to insertion of SRH. The current consensus
>> > of IETF is that it is a violation of RFC8200.  We've heard all the
>> > arguments that it's only for limited domains and narrow use cases,
>> > nevertheless there are several problems that the header
>> > insertion/deletion advocates never answered-- it breaks AH, it breaks
>> > PMTU discovery, it breaks ICMP. There is also a risk that a
>> > non-standard modification could cause a packet to be dropped
>> > downstream from the node that modifies it. There is no attribution on
>> > who created the problem, and hence this can lead to systematic
>> > blackholes which are the most miserable sort of problem to debug.
>>
>> Yes, I know the whole story and it's been stormy from what I understood.
>>
>> > Fundamentally, it is not robust per Postel's law (I actually wrote a
>> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
>> > you're interested).
>>
>> Interesting, I'll take a look.
>>
>> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
>> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
>> > where this requirement is specified, then I can take it up in IOAM WG
>> > or 6man if needed...
>>
>> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1]
>> (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be
>> published.
> 
> I was specifically referring to the requirements around removing the
> IOAM TLV from packets in-flight. I don't readily see that in the IOAM
> drafts.

Actually, this is not in the draft. The authors wanted to give operators some freedom, and such a requirement would restrict their choices, even if it is the better or even the most logical option we could think of. Maybe we could also discuss on the IPPM mailing list whether we should add it or not? I have two pieces of advice for operators: one about the encapsulation and this one about the removal of an IOAM option.

> Also, be careful about saying that drafts are about to be published by
> IETF. Until a draft reaches the RFC editor we really can't say that. I
> don't believe drafts you're referring to have even made it through
> WGLC.

Indeed, but draft-ietf-ippm-ioam-data is already at its second WGLC, did you miss it on the IPPM mailing list? As for draft-ietf-ippm-ioam-ipv6-options, this is just my prediction, but I guess it should follow soon as well, given the IANA early allocation (there were talks about that in the WG).

Justin

> Tom
> 
>>
>> Justin
>>
>>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>>
>> > Tom
>> >
>> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
>> >> > is processed (AH is processed after HBH).
>> >>
>> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
>> >> Again, operators could simply apply IOAM on a subset of the traffic that does
>> >> not include AHs, for example.
>> >>
>> >> Justin
>> >>
>> >>   [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>> >>
>> >> > Tom
>> >> >> By default, an 8-octet boundary is automatically assumed. This is the
>> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
>> >> >> is still aligned after the removal.
>> >> >>
>> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> >> >> header.
>> >> >>
>> >> >> Example 1:
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       X       |       X       |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |                                                               |
>> >> >> ~                Option to be removed (8 octets)                ~
>> >> >> |                                                               |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Y       |       Y       |       Y       |       Y       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> >> >> boundary (same result in both cases).
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       X       |       X       |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Y       |       Y       |       Y       |       Y       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Example 2:
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       X       |       X       |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |                Option to be removed (4 octets)                |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Y       |       Y       |       Y       |       Y       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> >> >> of 8 anymore.
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       X       |       X       |    Padding    |    Padding    |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Y       |       Y       |       Y       |       Y       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> |       Z       |       Z       |       Z       |       Z       |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
>> >> >> all, which means that blocks are only moved in multiples of 8. This
>> >> >> assertion guarantees good alignment.
>> >> >>
>> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> >> ---
>> >> >>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>> >> >>  1 file changed, 108 insertions(+), 26 deletions(-)
>> >> >>
>> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> >> index e9b366994475..f27ab3bf2e0c 100644
>> >> >> --- a/net/ipv6/exthdrs.c
>> >> >> +++ b/net/ipv6/exthdrs.c
>> >> >> @@ -52,17 +52,27 @@
>> >> >>
>> >> >>  #include <linux/uaccess.h>
>> >> >>
>> >> >> -/*
>> >> >> - *     Parsing tlv encoded headers.
>> >> >> +/* States for TLV parsing functions. */
>> >> >> +
>> >> >> +enum {
>> >> >> +       TLV_ACCEPT,
>> >> >> +       TLV_REJECT,
>> >> >> +       TLV_REMOVE,
>> >> >> +       __TLV_MAX
>> >> >> +};
>> >> >> +
>> >> >> +/* Parsing TLV encoded headers.
>> >> >>   *
>> >> >> - *     Parsing function "func" returns true, if parsing succeed
>> >> >> - *     and false, if it failed.
>> >> >> - *     It MUST NOT touch skb->h.
>> >> >> + * Parsing function "func" returns either:
>> >> >> + *  - TLV_ACCEPT if parsing succeeds
>> >> >> + *  - TLV_REJECT if parsing fails
>> >> >> + *  - TLV_REMOVE if TLV must be removed
>> >> >> + * It MUST NOT touch skb->h.
>> >> >>   */
>> >> >>
>> >> >>  struct tlvtype_proc {
>> >> >>         int     type;
>> >> >> -       bool    (*func)(struct sk_buff *skb, int offset);
>> >> >> +       int     (*func)(struct sk_buff *skb, int offset);
>> >> >>  };
>> >> >>
>> >> >>  /*********************
>> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
>> >> >> optoff,
>> >> >>         return false;
>> >> >>  }
>> >> >>
>> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> >> >> +
>> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> >> >> +{
>> >> >> +       int len = end - start;
>> >> >> +       int padlen = len % 8;
>> >> >> +       unsigned char *h;
>> >> >> +       int rlen, off;
>> >> >> +       u16 pl_len;
>> >> >> +
>> >> >> +       rlen = len - padlen;
>> >> >> +       if (rlen) {
>> >> >> +               skb_pull(skb, rlen);
>> >> >> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> >> >> +                       start);
>> >> >> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> >> >> +
>> >> >> +               skb_reset_network_header(skb);
>> >> >> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> >> >> +
>> >> >> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> >> >> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> >> >> +
>> >> >> +               skb_transport_header(skb)[1] -= rlen >> 3;
>> >> >> +               end -= rlen;
>> >> >> +       }
>> >> >> +
>> >> >> +       if (padlen) {
>> >> >> +               off = end - padlen;
>> >> >> +               h = skb_network_header(skb);
>> >> >> +
>> >> >> +               if (padlen == 1) {
>> >> >> +                       h[off] = IPV6_TLV_PAD1;
>> >> >> +               } else {
>> >> >> +                       padlen -= 2;
>> >> >> +
>> >> >> +                       h[off] = IPV6_TLV_PADN;
>> >> >> +                       h[off + 1] = padlen;
>> >> >> +                       memset(&h[off + 2], 0, padlen);
>> >> >> +               }
>> >> >> +       }
>> >> >> +
>> >> >> +       return end;
>> >> >> +}
>> >> >> +
>> >> >>  /* Parse tlv encoded option header (hop-by-hop or destination) */
>> >> >>
>> >> >>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >>                           struct sk_buff *skb,
>> >> >> -                         int max_count)
>> >> >> +                         int max_count,
>> >> >> +                         bool removable)
>> >> >>  {
>> >> >>         int len = (skb_transport_header(skb)[1] + 1) << 3;
>> >> >> -       const unsigned char *nh = skb_network_header(skb);
>> >> >> +       unsigned char *nh = skb_network_header(skb);
>> >> >>         int off = skb_network_header_len(skb);
>> >> >>         const struct tlvtype_proc *curr;
>> >> >>         bool disallow_unknowns = false;
>> >> >> +       int off_remove = 0;
>> >> >>         int tlv_count = 0;
>> >> >>         int padlen = 0;
>> >> >> +       int ret;
>> >> >>
>> >> >>         if (unlikely(max_count < 0)) {
>> >> >>                 disallow_unknowns = true;
>> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
>> >> >> *procs,
>> >> >>                         if (tlv_count > max_count)
>> >> >>                                 goto bad;
>> >> >>
>> >> >> +                       ret = -1;
>> >> >>                         for (curr = procs; curr->type >= 0; curr++) {
>> >> >>                                 if (curr->type == nh[off]) {
>> >> >>                                         /* type specific length/alignment
>> >> >>                                            checks will be performed in the
>> >> >>                                            func(). */
>> >> >> -                                       if (curr->func(skb, off) == false)
>> >> >> +                                       ret = curr->func(skb, off);
>> >> >> +                                       if (ret == TLV_REJECT)
>> >> >>                                                 return false;
>> >> >>                                         break;
>> >> >>                                 }
>> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>> >> >>                                 return false;
>> >> >>
>> >> >> +                       if (removable) {
>> >> >> +                               if (ret == TLV_REMOVE) {
>> >> >> +                                       if (!off_remove)
>> >> >> +                                               off_remove = off - padlen;
>> >> >> +                               } else if (off_remove) {
>> >> >> +                                       off = remove_tlv(off_remove, off, skb);
>> >> >> +                                       nh = skb_network_header(skb);
>> >> >> +                                       off_remove = 0;
>> >> >> +                               }
>> >> >> +                       }
>> >> >> +
>> >> >>                         padlen = 0;
>> >> >>                         break;
>> >> >>                 }
>> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >>                 len -= optlen;
>> >> >>         }
>> >> >>
>> >> >> -       if (len == 0)
>> >> >> +       if (len == 0) {
>> >> >> +               /* Don't forget last TLV if it must be removed */
>> >> >> +               if (off_remove)
>> >> >> +                       remove_tlv(off_remove, off, skb);
>> >> >> +
>> >> >>                 return true;
>> >> >> +       }
>> >> >>  bad:
>> >> >>         kfree_skb(skb);
>> >> >>         return false;
>> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >>   *****************************/
>> >> >>
>> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >>  {
>> >> >>         struct ipv6_destopt_hao *hao;
>> >> >>         struct inet6_skb_parm *opt = IP6CB(skb);
>> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >>         if (skb->tstamp == 0)
>> >> >>                 __net_timestamp(skb);
>> >> >>
>> >> >> -       return true;
>> >> >> +       return TLV_ACCEPT;
>> >> >>
>> >> >>   discard:
>> >> >>         kfree_skb(skb);
>> >> >> -       return false;
>> >> >> +       return TLV_REJECT;
>> >> >>  }
>> >> >>  #endif
>> >> >>
>> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>> >> >>  #endif
>> >> >>
>> >> >>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> >> >> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> >> >> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
>> >> >> +                         false)) {
>> >> >>                 skb->transport_header += extlen;
>> >> >>                 opt = IP6CB(skb);
>> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
>> >> >> *skb)
>> >> >>
>> >> >>  /* Router Alert as of RFC 2711 */
>> >> >>
>> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> >>  {
>> >> >>         const unsigned char *nh = skb_network_header(skb);
>> >> >>
>> >> >>         if (nh[optoff + 1] == 2) {
>> >> >>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>> >> >>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> >> >> -               return true;
>> >> >> +               return TLV_ACCEPT;
>> >> >>         }
>> >> >>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>> >> >>                             nh[optoff + 1]);
>> >> >>         kfree_skb(skb);
>> >> >> -       return false;
>> >> >> +       return TLV_REJECT;
>> >> >>  }
>> >> >>
>> >> >>  /* Jumbo payload */
>> >> >>
>> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> >>  {
>> >> >>         const unsigned char *nh = skb_network_header(skb);
>> >> >>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >>         if (pkt_len <= IPV6_MAXPLEN) {
>> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> >> >> -               return false;
>> >> >> +               return TLV_REJECT;
>> >> >>         }
>> >> >>         if (ipv6_hdr(skb)->payload_len) {
>> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> >> >> -               return false;
>> >> >> +               return TLV_REJECT;
>> >> >>         }
>> >> >>
>> >> >>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >>                 goto drop;
>> >> >>
>> >> >>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> >> >> -       return true;
>> >> >> +       return TLV_ACCEPT;
>> >> >>
>> >> >>  drop:
>> >> >>         kfree_skb(skb);
>> >> >> -       return false;
>> >> >> +       return TLV_REJECT;
>> >> >>  }
>> >> >>
>> >> >>  /* CALIPSO RFC 5570 */
>> >> >>
>> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> >>  {
>> >> >>         const unsigned char *nh = skb_network_header(skb);
>> >> >>
>> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >>         if (!calipso_validate(skb, nh + optoff))
>> >> >>                 goto drop;
>> >> >>
>> >> >> -       return true;
>> >> >> +       return TLV_ACCEPT;
>> >> >>
>> >> >>  drop:
>> >> >>         kfree_skb(skb);
>> >> >> -       return false;
>> >> >> +       return TLV_REJECT;
>> >> >>  }
>> >> >>
>> >> >>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>> >> >>
>> >> >>         opt->flags |= IP6SKB_HOPBYHOP;
>> >> >>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> >> >> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> >> >> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> >> >> +                         true)) {
>> >> >> +               /* we need to refresh the length in case
>> >> >> +                * at least one TLV was removed
>> >> >> +                */
>> >> >> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
>> >> >>                 skb->transport_header += extlen;
>> >> >>                 opt = IP6CB(skb);
>> >> >>                 opt->nhoff = sizeof(struct ipv6hdr);
>> >> >> --
>> >> >> 2.17.1
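
To make the 8-octet argument from the quoted commit message concrete, here is a userspace sketch of the same arithmetic on a flat buffer. It is an illustration under stated assumptions, not the kernel code: remove_tlv_span() is a hypothetical name, offsets are relative to the start of the Hop-by-Hop header (not the network header), and the tail is moved backward, whereas the patch moves the preceding bytes forward after skb_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0x00 /* Pad1 option type (RFC 8200) */
#define TLV_PADN 0x01 /* PadN option type (RFC 8200) */

/* Hypothetical helper: remove the options occupying [start, end) from a
 * Hop-by-Hop header. Only a multiple of 8 octets is cut out; the remaining
 * 0..7 octets are rewritten as Pad1/PadN, so every later option keeps its
 * offset modulo 8. Returns the new header length in bytes.
 */
size_t remove_tlv_span(uint8_t *hdr, size_t start, size_t end)
{
	size_t hdr_len = ((size_t)hdr[1] + 1) << 3;
	size_t len, padlen, rlen;

	assert(start >= 2 && start <= end && end <= hdr_len);

	len = end - start;
	padlen = len % 8;    /* octets that stay, turned into padding */
	rlen = len - padlen; /* octets really removed, multiple of 8 */

	if (rlen) {
		/* Shift the tail back by rlen and shrink Hdr Ext Len */
		memmove(hdr + start + padlen, hdr + end, hdr_len - end);
		hdr[1] -= (uint8_t)(rlen >> 3);
		hdr_len -= rlen;
	}

	if (padlen == 1) {
		hdr[start] = TLV_PAD1;
	} else if (padlen > 1) {
		hdr[start] = TLV_PADN;
		hdr[start + 1] = (uint8_t)(padlen - 2);
		memset(hdr + start + 2, 0, padlen - 2);
	}

	return hdr_len;
}
```

With the layout of Example 1 (an 8-octet option at offset 8) the header shrinks by 8 octets. With the layout of Example 2 (a 4-octet option at offset 8) nothing moves and the option is replaced by a 4-octet PadN, so Z stays 8n-aligned.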

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next] Fix unchecked dereference
  2020-06-26 16:01         ` Jakub Kicinski
  (?)
@ 2020-06-26 17:23         ` Justin Iurman
  2020-06-27  4:04             ` Jakub Kicinski
  -1 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 17:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: dan carpenter, kbuild, netdev, lkp, kbuild-all, davem

Hi Jakub,

It is an inline modification of patch 4 of this series. The modification in itself should not be a problem. Maybe I sent it the wrong way?

Justin

>> If rhashtable_remove_fast returns an error, a rollback is applied. In
>> that case, an unchecked dereference has been fixed.
>> 
>> Reported-by: kernel test robot <lkp@intel.com>
>> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> 
> My bot says this doesn't apply to net-next, could you double-check?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
  2020-06-26 17:14             ` Justin Iurman
@ 2020-06-26 18:35               ` Tom Herbert
  0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 18:35 UTC (permalink / raw)
  To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller

On Fri, Jun 26, 2020 at 10:14 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Tom,
> >>
> >> >> Hi Tom,
> >> >>
> >> >> >> Add the possibility to remove one or more consecutive TLVs without
> >> >> >> messing up the alignment of others. For now, only IOAM requires this
> >> >> >> behavior.
> >> >> >>
> >> >> > Hi Justin,
> >> >> >
> >> >> > Can you explain the motivation for this? Per RFC8200, extension
> >> >> > headers in flight are not to be added, removed, or modified outside of
> >> >> > the standard rules for processing modifiable HBH and DO TLVs; that
> >> >> > would include adding and removing TLVs in EH. One obvious problem this
> >> >>
> >> >> As you already know from our last meeting, IOAM may be configured on a node such
> >> >> that a specific IOAM namespace should be removed. Therefore, this patch
> >> >> provides support for the deletion of a TLV (or consecutive TLVs), without
> >> >> removing the entire EH (if it's empty, there will be padding). Note that there
> >> >> is a similar "problem" with the Incremental Trace where you'd need to expand
> >> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
> >> >> against modification of in-flight EHs, but there are several reasons that, I
> >> >> believe, mitigate this statement.
> >> >>
> >> >> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), ie not widely
> >> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
> >> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
> >> >> domain, ie from an IOAM node inside the domain to another one (no need for
> >> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
> >> >> so we modify "our" header and (ii) we already own the traffic.
> >> >>
> >> >> And if someone is still angry about this, well, the good news is that such
> >> >> modification can be avoided most of the time. Indeed, operators are advised to
> >> >> remove an IOAM namespace only on egress nodes. This way, the destination
> >> >> (either the tunnel destination or the real destination, depending on the
> >> >> scenario) will receive EHs and take care of them without the need to remove
> >> >> anything. But, again, operators can do what they want and I'd tend to adhere to
> >> >> David's philosophy [1] and give them the possibility to choose what to do.
> >> >>
> >> >
> >> > Justin,
> >> >
> >> > 6man WG has had a _long_ and sometimes bitter discussion around this
> >> > particularly with regards to insertion of SRH. The current consensus
> >> > of IETF is that it is a violation of RFC8200.  We've heard all the
> >> > arguments that it's only for limited domains and narrow use cases,
> >> > nevertheless there are several problems that the header
> >> > insertion/deletion advocates never answered-- it breaks AH, it breaks
> >> > PMTU discovery, it breaks ICMP. There is also a risk that a
> >> > non-standard modification could cause a packet to be dropped
> >> > downstream from the node that modifies it. There is no attribution on
> >> > who created the problem, and hence this can lead to systematic
> >> > blackholes which are the most miserable sort of problem to debug.
> >>
> >> Yes, I know the whole story and it's been stormy from what I understood.
> >>
> >> > Fundamentally, it is not robust per Postel's law (I actually wrote a
> >> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> >> > you're interested).
> >>
> >> Interesting, I'll take a look.
> >>
> >> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
> >> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
> >> > where this requirement is specified, then I can take it up in IOAM WG
> >> > or 6man if needed...
> >>
> >> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1]
> >> (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be
> >> published.
> >
> > I was specifically referring to the requirements around removing the
> > IOAM TLV from packets in-flight. I don't readily see that in the IOAM
> > drafts.
>
> Actually, this is not in the draft. The authors wanted to give operators some freedom, and such a requirement would restrict their choices, even if it is the better or even the most logical option we can think of. Maybe we could also discuss on the IPPM mailing list whether we should add it? I have two pieces of advice for operators: one about encapsulation and this one about the removal of an IOAM option.
>

Justin,

You're welcome to take it up on the IPPM list, but beware there is
going to be pushback. Make sure you can show a clear justification and
how any potential issues it causes are mitigated. If
draft-herbert-6man-eh-attrib-00 facilitates that, we can take a look
at implementing it.

Until we have clarity on the protocol requirements and the need for
this, I don't think this patch should be accepted.

Tom
> > Also, be careful about saying that drafts are about to be published by
> > IETF. Until a draft reaches the RFC editor we really can't say that. I
> > don't believe drafts you're referring to have even made it through
> > WGLC.
>
> Indeed, but draft-ietf-ippm-ioam-data is already at its second WGLC; did you miss it on the IPPM mailing list? As for draft-ietf-ippm-ioam-ipv6-options, it is just my prediction, but I guess it should follow soon as well, given the IANA early allocation (there were talks about that in the WG).
>
> Justin
>
> > Tom
> >
> >>
> >> Justin
> >>
> >>   [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >>   [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >>
> >> > Tom
> >> >
> >> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> >> >> > is processed (AH is processed after HBH).
> >> >>
> >> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
> >> >> Again, operators could simply apply IOAM on a subset of the traffic that does
> >> >> not include AHs, for example.
> >> >>
> >> >> Justin
> >> >>
> >> >>   [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
> >> >>
> >> >> > Tom
> >> >> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> >> >> is still aligned after the removal.
> >> >> >>
> >> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> >> >> header.
> >> >> >>
> >> >> >> Example 1:
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |                                                               |
> >> >> >> ~                Option to be removed (8 octets)                ~
> >> >> >> |                                                               |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> >> >> boundary (same result in both cases).
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |    Padding    |    Padding    |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Example 2:
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |                Option to be removed (4 octets)                |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> >> >> of 8 anymore.
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |  Next header  |  Hdr Ext Len  |       X       |       X       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       X       |       X       |    Padding    |    Padding    |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Y       |       Y       |       Y       |       Y       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> |       Z       |       Z       |       Z       |       Z       |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
> >> >> >> all, which means that blocks are only moved in multiples of 8. This
> >> >> >> assertion guarantees good alignment.
> >> >> >>
> >> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> >> ---
> >> >> >>  net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >> >> >>  1 file changed, 108 insertions(+), 26 deletions(-)
> >> >> >>
> >> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> >> index e9b366994475..f27ab3bf2e0c 100644
> >> >> >> --- a/net/ipv6/exthdrs.c
> >> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> >> @@ -52,17 +52,27 @@
> >> >> >>
> >> >> >>  #include <linux/uaccess.h>
> >> >> >>
> >> >> >> -/*
> >> >> >> - *     Parsing tlv encoded headers.
> >> >> >> +/* States for TLV parsing functions. */
> >> >> >> +
> >> >> >> +enum {
> >> >> >> +       TLV_ACCEPT,
> >> >> >> +       TLV_REJECT,
> >> >> >> +       TLV_REMOVE,
> >> >> >> +       __TLV_MAX
> >> >> >> +};
> >> >> >> +
> >> >> >> +/* Parsing TLV encoded headers.
> >> >> >>   *
> >> >> >> - *     Parsing function "func" returns true, if parsing succeed
> >> >> >> - *     and false, if it failed.
> >> >> >> - *     It MUST NOT touch skb->h.
> >> >> >> + * Parsing function "func" returns either:
> >> >> >> + *  - TLV_ACCEPT if parsing succeeds
> >> >> >> + *  - TLV_REJECT if parsing fails
> >> >> >> + *  - TLV_REMOVE if TLV must be removed
> >> >> >> + * It MUST NOT touch skb->h.
> >> >> >>   */
> >> >> >>
> >> >> >>  struct tlvtype_proc {
> >> >> >>         int     type;
> >> >> >> -       bool    (*func)(struct sk_buff *skb, int offset);
> >> >> >> +       int     (*func)(struct sk_buff *skb, int offset);
> >> >> >>  };
> >> >> >>
> >> >> >>  /*********************
> >> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
> >> >> >> optoff,
> >> >> >>         return false;
> >> >> >>  }
> >> >> >>
> >> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> >> >> +
> >> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> >> >> +{
> >> >> >> +       int len = end - start;
> >> >> >> +       int padlen = len % 8;
> >> >> >> +       unsigned char *h;
> >> >> >> +       int rlen, off;
> >> >> >> +       u16 pl_len;
> >> >> >> +
> >> >> >> +       rlen = len - padlen;
> >> >> >> +       if (rlen) {
> >> >> >> +               skb_pull(skb, rlen);
> >> >> >> +               memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> >> >> +                       start);
> >> >> >> +               skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> >> >> +
> >> >> >> +               skb_reset_network_header(skb);
> >> >> >> +               skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> >> >> +
> >> >> >> +               pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> >> >> +               ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> >> >> +
> >> >> >> +               skb_transport_header(skb)[1] -= rlen >> 3;
> >> >> >> +               end -= rlen;
> >> >> >> +       }
> >> >> >> +
> >> >> >> +       if (padlen) {
> >> >> >> +               off = end - padlen;
> >> >> >> +               h = skb_network_header(skb);
> >> >> >> +
> >> >> >> +               if (padlen == 1) {
> >> >> >> +                       h[off] = IPV6_TLV_PAD1;
> >> >> >> +               } else {
> >> >> >> +                       padlen -= 2;
> >> >> >> +
> >> >> >> +                       h[off] = IPV6_TLV_PADN;
> >> >> >> +                       h[off + 1] = padlen;
> >> >> >> +                       memset(&h[off + 2], 0, padlen);
> >> >> >> +               }
> >> >> >> +       }
> >> >> >> +
> >> >> >> +       return end;
> >> >> >> +}
> >> >> >> +
> >> >> >>  /* Parse tlv encoded option header (hop-by-hop or destination) */
> >> >> >>
> >> >> >>  static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >>                           struct sk_buff *skb,
> >> >> >> -                         int max_count)
> >> >> >> +                         int max_count,
> >> >> >> +                         bool removable)
> >> >> >>  {
> >> >> >>         int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> >> -       const unsigned char *nh = skb_network_header(skb);
> >> >> >> +       unsigned char *nh = skb_network_header(skb);
> >> >> >>         int off = skb_network_header_len(skb);
> >> >> >>         const struct tlvtype_proc *curr;
> >> >> >>         bool disallow_unknowns = false;
> >> >> >> +       int off_remove = 0;
> >> >> >>         int tlv_count = 0;
> >> >> >>         int padlen = 0;
> >> >> >> +       int ret;
> >> >> >>
> >> >> >>         if (unlikely(max_count < 0)) {
> >> >> >>                 disallow_unknowns = true;
> >> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
> >> >> >> *procs,
> >> >> >>                         if (tlv_count > max_count)
> >> >> >>                                 goto bad;
> >> >> >>
> >> >> >> +                       ret = -1;
> >> >> >>                         for (curr = procs; curr->type >= 0; curr++) {
> >> >> >>                                 if (curr->type == nh[off]) {
> >> >> >>                                         /* type specific length/alignment
> >> >> >>                                            checks will be performed in the
> >> >> >>                                            func(). */
> >> >> >> -                                       if (curr->func(skb, off) == false)
> >> >> >> +                                       ret = curr->func(skb, off);
> >> >> >> +                                       if (ret == TLV_REJECT)
> >> >> >>                                                 return false;
> >> >> >>                                         break;
> >> >> >>                                 }
> >> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >>                             !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >> >> >>                                 return false;
> >> >> >>
> >> >> >> +                       if (removable) {
> >> >> >> +                               if (ret == TLV_REMOVE) {
> >> >> >> +                                       if (!off_remove)
> >> >> >> +                                               off_remove = off - padlen;
> >> >> >> +                               } else if (off_remove) {
> >> >> >> +                                       off = remove_tlv(off_remove, off, skb);
> >> >> >> +                                       nh = skb_network_header(skb);
> >> >> >> +                                       off_remove = 0;
> >> >> >> +                               }
> >> >> >> +                       }
> >> >> >> +
> >> >> >>                         padlen = 0;
> >> >> >>                         break;
> >> >> >>                 }
> >> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >>                 len -= optlen;
> >> >> >>         }
> >> >> >>
> >> >> >> -       if (len == 0)
> >> >> >> +       if (len == 0) {
> >> >> >> +               /* Don't forget last TLV if it must be removed */
> >> >> >> +               if (off_remove)
> >> >> >> +                       remove_tlv(off_remove, off, skb);
> >> >> >> +
> >> >> >>                 return true;
> >> >> >> +       }
> >> >> >>  bad:
> >> >> >>         kfree_skb(skb);
> >> >> >>         return false;
> >> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >>   *****************************/
> >> >> >>
> >> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >>  {
> >> >> >>         struct ipv6_destopt_hao *hao;
> >> >> >>         struct inet6_skb_parm *opt = IP6CB(skb);
> >> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >>         if (skb->tstamp == 0)
> >> >> >>                 __net_timestamp(skb);
> >> >> >>
> >> >> >> -       return true;
> >> >> >> +       return TLV_ACCEPT;
> >> >> >>
> >> >> >>   discard:
> >> >> >>         kfree_skb(skb);
> >> >> >> -       return false;
> >> >> >> +       return TLV_REJECT;
> >> >> >>  }
> >> >> >>  #endif
> >> >> >>
> >> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >> >> >>  #endif
> >> >> >>
> >> >> >>         if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> >> >> -                         init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> >> >> +                         init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> >> >> +                         false)) {
> >> >> >>                 skb->transport_header += extlen;
> >> >> >>                 opt = IP6CB(skb);
> >> >> >>  #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
> >> >> >> *skb)
> >> >> >>
> >> >> >>  /* Router Alert as of RFC 2711 */
> >> >> >>
> >> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> >>  {
> >> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >> >>
> >> >> >>         if (nh[optoff + 1] == 2) {
> >> >> >>                 IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >> >> >>                 memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> >> >> -               return true;
> >> >> >> +               return TLV_ACCEPT;
> >> >> >>         }
> >> >> >>         net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >> >> >>                             nh[optoff + 1]);
> >> >> >>         kfree_skb(skb);
> >> >> >> -       return false;
> >> >> >> +       return TLV_REJECT;
> >> >> >>  }
> >> >> >>
> >> >> >>  /* Jumbo payload */
> >> >> >>
> >> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> >>  {
> >> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >> >>         struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >>         if (pkt_len <= IPV6_MAXPLEN) {
> >> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> >> >> -               return false;
> >> >> >> +               return TLV_REJECT;
> >> >> >>         }
> >> >> >>         if (ipv6_hdr(skb)->payload_len) {
> >> >> >>                 __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> >>                 icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> >> >> -               return false;
> >> >> >> +               return TLV_REJECT;
> >> >> >>         }
> >> >> >>
> >> >> >>         if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >>                 goto drop;
> >> >> >>
> >> >> >>         IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> >> >> -       return true;
> >> >> >> +       return TLV_ACCEPT;
> >> >> >>
> >> >> >>  drop:
> >> >> >>         kfree_skb(skb);
> >> >> >> -       return false;
> >> >> >> +       return TLV_REJECT;
> >> >> >>  }
> >> >> >>
> >> >> >>  /* CALIPSO RFC 5570 */
> >> >> >>
> >> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> >>  {
> >> >> >>         const unsigned char *nh = skb_network_header(skb);
> >> >> >>
> >> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >>         if (!calipso_validate(skb, nh + optoff))
> >> >> >>                 goto drop;
> >> >> >>
> >> >> >> -       return true;
> >> >> >> +       return TLV_ACCEPT;
> >> >> >>
> >> >> >>  drop:
> >> >> >>         kfree_skb(skb);
> >> >> >> -       return false;
> >> >> >> +       return TLV_REJECT;
> >> >> >>  }
> >> >> >>
> >> >> >>  static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >> >> >>
> >> >> >>         opt->flags |= IP6SKB_HOPBYHOP;
> >> >> >>         if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> >> >> -                         init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> >> >> +                         init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> >> >> +                         true)) {
> >> >> >> +               /* we need to refresh the length in case
> >> >> >> +                * at least one TLV was removed
> >> >> >> +                */
> >> >> >> +               extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> >>                 skb->transport_header += extlen;
> >> >> >>                 opt = IP6CB(skb);
> >> >> >>                 opt->nhoff = sizeof(struct ipv6hdr);
> >> >> >> --
> >> >> >> 2.17.1
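
The 8-octet rule argued in the quoted commit message can be sketched outside
the kernel. The code below is only an illustration under stated assumptions:
split_removal and write_padding are hypothetical names, and the buffer is a
plain byte array rather than an skb. It splits the length of a run of TLVs
into the part that can really be cut (a multiple of 8) and the remainder that
has to stay in place as a Pad1 or PadN option:

```c
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0 /* Pad1 option type (RFC 8200) */
#define TLV_PADN 1 /* PadN option type (RFC 8200) */

/* Hypothetical sketch: return the number of bytes that can be cut
 * (a multiple of 8) and store the leftover in *padlen.
 */
static int split_removal(int len, int *padlen)
{
	*padlen = len % 8;
	return len - *padlen;
}

/* Overwrite padlen bytes (1..7) at buf with a single padding option. */
static void write_padding(uint8_t *buf, int padlen)
{
	if (padlen == 1) {
		buf[0] = TLV_PAD1;
	} else if (padlen >= 2) {
		buf[0] = TLV_PADN;
		/* Opt Data Len excludes the type and length bytes */
		buf[1] = (uint8_t)(padlen - 2);
		memset(buf + 2, 0, (size_t)(padlen - 2));
	}
}
```

For Example 2 in the commit message, a 4-octet option yields nothing to cut
and 4 octets of padding, so Y and Z keep their offsets. A 12-octet run would
yield 8 octets cut and 4 octets of padding.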

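The parsing loop in the quoted patch also coalesces consecutive TLV_REMOVE
verdicts into a single removal, flushed either when the next TLV is kept or
when the header ends. A simplified sketch of that bookkeeping follows. It is
not the patch's code: find_removal_runs and struct run are hypothetical, the
sentinel is -1 rather than 0, and offsets are only reported, whereas the
patch rewrites them after each removal.

```c
enum { TLV_ACCEPT, TLV_REJECT, TLV_REMOVE };

struct run {
	int start; /* offset of the first byte to remove */
	int end;   /* offset just past the last byte to remove */
};

/* verdicts[i] is the handler result for TLV i, offs[i] its start offset,
 * and offs[n] the end of the options area. Returns the number of runs
 * written to out, or -1 on TLV_REJECT or if out is too small.
 */
static int find_removal_runs(const int *verdicts, const int *offs, int n,
			     struct run *out, int max_out)
{
	int count = 0, start = -1, i;

	for (i = 0; i < n; i++) {
		if (verdicts[i] == TLV_REJECT)
			return -1;
		if (verdicts[i] == TLV_REMOVE) {
			if (start < 0)
				start = offs[i]; /* open a new run */
		} else if (start >= 0) {
			if (count == max_out)
				return -1;
			out[count].start = start; /* close the run */
			out[count].end = offs[i];
			count++;
			start = -1;
		}
	}
	if (start >= 0) { /* last TLV was removable */
		if (count == max_out)
			return -1;
		out[count].start = start;
		out[count].end = offs[n];
		count++;
	}
	return count;
}
```
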
^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH net-next] Fix unchecked dereference
  2020-06-26 17:23         ` Justin Iurman
@ 2020-06-27  4:04             ` Jakub Kicinski
  0 siblings, 0 replies; 42+ messages in thread
From: Jakub Kicinski @ 2020-06-27  4:04 UTC (permalink / raw)
  To: Justin Iurman; +Cc: dan carpenter, kbuild, netdev, lkp, kbuild-all, davem

On Fri, 26 Jun 2020 19:23:21 +0200 (CEST) Justin Iurman wrote:
> Hi Jakub,
> 
> It is an inline modification of patch 4 of this series. The
> modification in itself cannot be a problem. Maybe I sent it the
> wrong way?

Ah, sorry I didn't notice the threading. Please don't tag fixups like
this with [PATCH $tree], the series would need a revision.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
  2021-03-10 16:44 [PATCH net-next 0/5] Support for the IOAM Pre-allocated Trace with IPv6 Justin Iurman
@ 2021-03-10 16:44 ` Justin Iurman
  0 siblings, 0 replies; 42+ messages in thread
From: Justin Iurman @ 2021-03-10 16:44 UTC (permalink / raw)
  To: netdev; +Cc: davem, kuba, tom, justin.iurman

Add documentation for new IOAM sysctls:
 - ioam6_id: a namespace sysctl
 - ioam6_enabled and ioam6_id: two per-interface sysctls

Example of IOAM configuration based on the following simple topology:

 _____              _____              _____
|     | eth0  eth0 |     | eth1  eth0 |     |
|  A  |.----------.|  B  |.----------.|  C  |
|_____|            |_____|            |_____|

1) Node and interface IDs can be configured for IOAM:

  # IOAM ID of A = 1, IOAM ID of A.eth0 = 11
  (A) sysctl -w net.ipv6.ioam6_id=1
  (A) sysctl -w net.ipv6.conf.eth0.ioam6_id=11

  # IOAM ID of B = 2, IOAM ID of B.eth0 = 21, IOAM ID of B.eth1 = 22
  (B) sysctl -w net.ipv6.ioam6_id=2
  (B) sysctl -w net.ipv6.conf.eth0.ioam6_id=21
  (B) sysctl -w net.ipv6.conf.eth1.ioam6_id=22

  # IOAM ID of C = 3, IOAM ID of C.eth0 = 31
  (C) sysctl -w net.ipv6.ioam6_id=3
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_id=31

2) Each node can be configured to form an IOAM domain. For instance,
   we allow IOAM from A to C, i.e. enable IOAM on ingress for B.eth0
   and C.eth0:

  (B) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1
  (C) sysctl -w net.ipv6.conf.eth0.ioam6_enabled=1

3) An IOAM domain (e.g. ID=123) is defined and made known to each node:

  (A) ip ioam namespace add 123
  (B) ip ioam namespace add 123
  (C) ip ioam namespace add 123

4) Finally, an IOAM Pre-allocated Trace can be inserted in traffic sent
   by A when C (e.g. db02::2) is the destination:

  (A) ip -6 route add db02::2/128 encap ioam6 trace type 0x800000 ns 123
      size 12 dev eth0

Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
 Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
 Documentation/networking/ip-sysctl.rst    |  5 +++++
 2 files changed, 25 insertions(+)
 create mode 100644 Documentation/networking/ioam6-sysctl.rst

diff --git a/Documentation/networking/ioam6-sysctl.rst b/Documentation/networking/ioam6-sysctl.rst
new file mode 100644
index 000000000000..37a9b4e731a0
--- /dev/null
+++ b/Documentation/networking/ioam6-sysctl.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+IOAM6 Sysctl variables
+======================
+
+
+/proc/sys/net/ipv6/conf/<iface>/ioam6_* variables:
+==================================================
+
+ioam6_enabled - BOOL
+	Accept or ignore IPv6 IOAM options for ingress on this interface.
+
+	* 0 - disabled (default)
+	* not 0 - enabled
+
+ioam6_id - INTEGER
+	Define the IOAM id of this interface.
+
+	Default is 0.
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index c7952ac5bd2f..bd7ca536ba27 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1835,6 +1835,11 @@ fib_notify_on_flag_change - INTEGER
         - 1 - Emit notifications.
         - 2 - Emit notifications only for RTM_F_OFFLOAD_FAILED flag change.
 
+ioam6_id - INTEGER
+	Define the IOAM id of this node.
+
+	Default: 0
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
-- 
2.17.1
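
The sysctl names used in the commit message map onto procfs by replacing
dots with slashes, so net.ipv6.conf.eth0.ioam6_id lives under
/proc/sys/net/ipv6/conf/eth0/. A small sketch of building that path in C
follows; the helper name ioam6_sysctl_path is hypothetical and the code only
formats the string, it does not touch the running system:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the procfs path behind the per-interface
 * sysctl net.ipv6.conf.<iface>.<name>.
 * Returns 0 on success, -1 if the result did not fit in buf.
 */
static int ioam6_sysctl_path(char *buf, size_t len,
			     const char *iface, const char *name)
{
	int n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/%s",
			 iface, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with "eth0" and "ioam6_enabled" gives the file that step 2 of the
example sets to 1 on nodes B and C.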


^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2021-03-10 16:55 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
2020-06-24 20:32   ` Tom Herbert
2020-06-25 17:47     ` Justin Iurman
2020-06-25 20:53       ` Tom Herbert
2020-06-26  8:22         ` Justin Iurman
2020-06-26 15:39           ` Tom Herbert
2020-06-26 17:14             ` Justin Iurman
2020-06-26 18:35               ` Tom Herbert
2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
2020-06-25  2:32   ` Tom Herbert
2020-06-25 17:56     ` Justin Iurman
2020-06-26  0:48       ` Tom Herbert
2020-06-26  8:31         ` Justin Iurman
2020-06-26 15:52           ` Tom Herbert
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
2020-06-24 21:37   ` kernel test robot
2020-06-24 21:37     ` kernel test robot
2020-06-24 23:11   ` kernel test robot
2020-06-24 23:11     ` kernel test robot
2020-06-24 23:11   ` [RFC PATCH] ipv6: ioam: ioam6_fill_trace_data_node() can be static kernel test robot
2020-06-24 23:11     ` kernel test robot
2020-06-25  2:42   ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Tom Herbert
2020-06-25 14:29   ` Tom Herbert
2020-06-25 18:23     ` Justin Iurman
2020-06-25 20:32       ` Tom Herbert
2020-06-26  8:13         ` Justin Iurman
2020-06-26 14:53           ` Tom Herbert
2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
2020-06-25 10:52   ` Dan Carpenter
2020-06-25 10:52     ` Dan Carpenter
2020-06-25 10:52     ` Dan Carpenter
2020-06-26  8:54     ` [PATCH net-next] Fix unchecked dereference Justin Iurman
2020-06-26 16:01       ` Jakub Kicinski
2020-06-26 16:01         ` Jakub Kicinski
2020-06-26 17:23         ` Justin Iurman
2020-06-27  4:04           ` Jakub Kicinski
2020-06-27  4:04             ` Jakub Kicinski
2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
2020-06-25  2:53   ` Tom Herbert
2020-06-25 18:00     ` Justin Iurman
2021-03-10 16:44 [PATCH net-next 0/5] Support for the IOAM Pre-allocated Trace with IPv6 Justin Iurman
2021-03-10 16:44 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
