* [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
2020-06-24 20:32 ` Tom Herbert
2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
` (3 subsequent siblings)
4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
To: netdev; +Cc: davem, justin.iurman
Add the possibility to remove one or more consecutive TLVs without
messing up the alignment of others. For now, only IOAM requires this
behavior.
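How consecutive removable TLVs can be grouped into a single span before removal can be sketched in user-space C as follows. This is only an illustration under assumptions, not code from the patch: `struct opt`, `struct span` and `collect_spans()` are hypothetical names, the verdicts mirror the TLV_ACCEPT/TLV_REJECT/TLV_REMOVE states introduced below, and the sketch ignores both the padding that precedes a removable TLV and the offset shift that happens after each removal.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the TLV_ACCEPT / TLV_REJECT / TLV_REMOVE states of the patch. */
enum verdict { ACCEPT, REJECT, REMOVE };

/* Hypothetical description of one already-parsed option. */
struct opt {
	int off;		/* offset of the option inside the header */
	int len;		/* total length, type and length octets included */
	enum verdict v;		/* what the option's handler decided */
};

struct span {
	int start;		/* first octet of the run */
	int end;		/* one past the last octet of the run */
};

/* Collect maximal runs of consecutive REMOVE options into [start, end)
 * spans. Returns the number of spans written to out (at most max).
 */
static size_t collect_spans(const struct opt *opts, size_t n,
			    struct span *out, size_t max)
{
	size_t i, count = 0;
	int start = -1;		/* -1 means "no run in progress" */

	assert(opts && out);

	for (i = 0; i < n; i++) {
		if (opts[i].v == REMOVE) {
			if (start < 0)
				start = opts[i].off;	/* a new run begins */
		} else if (start >= 0) {
			/* a kept option closes the current run */
			if (count < max) {
				out[count].start = start;
				out[count].end = opts[i].off;
				count++;
			}
			start = -1;
		}
	}

	/* do not forget a run that reaches the end of the header */
	if (start >= 0 && count < max) {
		out[count].start = start;
		out[count].end = opts[n - 1].off + opts[n - 1].len;
		count++;
	}

	return count;
}
```

With options at offsets 2, 8, 16, 20 and 24 where only those at 8, 16 and 24 are marked REMOVE, the sketch yields two spans, [8, 20) and [24, 32), so the two adjacent removable options are handled as one block.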
By default, an 8-octet boundary is automatically assumed. This is the
price to pay (at most a useless 4-octet padding) to make sure everything
is still aligned after the removal.
Proof: let's assume for instance the following alignments 2n, 4n and 8n
respectively for options X, Y and Z, inside a Hop-by-Hop extension
header.
Example 1:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
~                Option to be removed (8 octets)                ~
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Padding    |    Padding    |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
boundary (same result in both cases).
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Padding    |    Padding    |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Example 2:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Option to be removed (4 octets)                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
of 8 anymore.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next header  |  Hdr Ext Len  |       X       |       X       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       X       |       X       |    Padding    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Y       |       Y       |       Y       |       Y       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Z       |       Z       |       Z       |       Z       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Therefore, the largest (8-octet) boundary is assumed by default and for
all, which means that blocks are only moved in multiples of 8. This
assertion guarantees good alignment.
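The rounding rule argued for above can be written down in a few lines of user-space C. This is a sketch under assumptions rather than code from the patch: `struct removal`, `split_removal()` and `hdr_ext_len_after()` are hypothetical names, and the arithmetic simply mirrors the `len % 8` split that remove_tlv() performs below.

```c
#include <assert.h>

/* Hypothetical helper type: how a span of octets to remove is split. */
struct removal {
	int rlen;	/* octets really cut out, always a multiple of 8 */
	int padlen;	/* octets left in place and rewritten as padding */
};

/* Split a span of len octets on an 8-octet boundary. */
static struct removal split_removal(int len)
{
	struct removal r;

	assert(len >= 0);
	r.padlen = len % 8;
	r.rlen = len - r.padlen;
	return r;
}

/* Hdr Ext Len counts 8-octet units (not including the first 8 octets),
 * so it drops by rlen / 8 when a span of len octets is removed.
 */
static int hdr_ext_len_after(int hdr_ext_len, int len)
{
	return hdr_ext_len - (split_removal(len).rlen >> 3);
}
```

For the 8-octet option of Example 1 this gives rlen = 8 and padlen = 0, and Hdr Ext Len goes from 3 to 2. For the 4-octet option of Example 2 it gives rlen = 0 and padlen = 4: nothing moves, the option is overwritten with padding, and Y and Z keep their alignment.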
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 108 insertions(+), 26 deletions(-)
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index e9b366994475..f27ab3bf2e0c 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -52,17 +52,27 @@

 #include <linux/uaccess.h>

-/*
- *	Parsing tlv encoded headers.
+/* States for TLV parsing functions. */
+
+enum {
+	TLV_ACCEPT,
+	TLV_REJECT,
+	TLV_REMOVE,
+	__TLV_MAX
+};
+
+/* Parsing TLV encoded headers.
  *
- *	Parsing function "func" returns true, if parsing succeed
- *	and false, if it failed.
- *	It MUST NOT touch skb->h.
+ * Parsing function "func" returns either:
+ * - TLV_ACCEPT if parsing succeeds
+ * - TLV_REJECT if parsing fails
+ * - TLV_REMOVE if TLV must be removed
+ * It MUST NOT touch skb->h.
  */

 struct tlvtype_proc {
 	int	type;
-	bool	(*func)(struct sk_buff *skb, int offset);
+	int	(*func)(struct sk_buff *skb, int offset);
 };

 /*********************
/*********************
@@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int optoff,
 	return false;
 }

+/* Remove one or several consecutive TLVs and recompute offsets, lengths */
+
+static int remove_tlv(int start, int end, struct sk_buff *skb)
+{
+	int len = end - start;
+	int padlen = len % 8;
+	unsigned char *h;
+	int rlen, off;
+	u16 pl_len;
+
+	rlen = len - padlen;
+	if (rlen) {
+		skb_pull(skb, rlen);
+		memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
+			start);
+		skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
+
+		skb_reset_network_header(skb);
+		skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+
+		pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
+		ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
+
+		skb_transport_header(skb)[1] -= rlen >> 3;
+		end -= rlen;
+	}
+
+	if (padlen) {
+		off = end - padlen;
+		h = skb_network_header(skb);
+
+		if (padlen == 1) {
+			h[off] = IPV6_TLV_PAD1;
+		} else {
+			padlen -= 2;
+
+			h[off] = IPV6_TLV_PADN;
+			h[off + 1] = padlen;
+			memset(&h[off + 2], 0, padlen);
+		}
+	}
+
+	return end;
+}
+
 /* Parse tlv encoded option header (hop-by-hop or destination) */

 static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			  struct sk_buff *skb,
-			  int max_count)
+			  int max_count,
+			  bool removable)
 {
 	int len = (skb_transport_header(skb)[1] + 1) << 3;
-	const unsigned char *nh = skb_network_header(skb);
+	unsigned char *nh = skb_network_header(skb);
 	int off = skb_network_header_len(skb);
 	const struct tlvtype_proc *curr;
 	bool disallow_unknowns = false;
+	int off_remove = 0;
 	int tlv_count = 0;
 	int padlen = 0;
+	int ret;

 	if (unlikely(max_count < 0)) {
 		disallow_unknowns = true;
@@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			if (tlv_count > max_count)
 				goto bad;

+			ret = -1;
 			for (curr = procs; curr->type >= 0; curr++) {
 				if (curr->type == nh[off]) {
 					/* type specific length/alignment
 					   checks will be performed in the
 					   func(). */
-					if (curr->func(skb, off) == false)
+					ret = curr->func(skb, off);
+					if (ret == TLV_REJECT)
 						return false;
 					break;
 				}
@@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 			    !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
 				return false;

+			if (removable) {
+				if (ret == TLV_REMOVE) {
+					if (!off_remove)
+						off_remove = off - padlen;
+				} else if (off_remove) {
+					off = remove_tlv(off_remove, off, skb);
+					nh = skb_network_header(skb);
+					off_remove = 0;
+				}
+			}
+
 			padlen = 0;
 			break;
 		}
@@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
 		len -= optlen;
 	}

-	if (len == 0)
+	if (len == 0) {
+		/* Don't forget last TLV if it must be removed */
+		if (off_remove)
+			remove_tlv(off_remove, off, skb);
+
 		return true;
+	}
 bad:
 	kfree_skb(skb);
 	return false;
@@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
  *****************************/

 #if IS_ENABLED(CONFIG_IPV6_MIP6)
-static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
+static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
 {
 	struct ipv6_destopt_hao *hao;
 	struct inet6_skb_parm *opt = IP6CB(skb);
@@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
 	if (skb->tstamp == 0)
 		__net_timestamp(skb);

-	return true;
+	return TLV_ACCEPT;

 discard:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }
 #endif

@@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
 #endif

 	if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
-			  init_net.ipv6.sysctl.max_dst_opts_cnt)) {
+			  init_net.ipv6.sysctl.max_dst_opts_cnt,
+			  false)) {
 		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 #if IS_ENABLED(CONFIG_IPV6_MIP6)
@@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff *skb)

 /* Router Alert as of RFC 2711 */

-static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
+static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);

 	if (nh[optoff + 1] == 2) {
 		IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
 		memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
-		return true;
+		return TLV_ACCEPT;
 	}
 	net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
 			    nh[optoff + 1]);
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }

 /* Jumbo payload */

-static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
+static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);
 	struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
@@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 	if (pkt_len <= IPV6_MAXPLEN) {
 		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
 		icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
-		return false;
+		return TLV_REJECT;
 	}
 	if (ipv6_hdr(skb)->payload_len) {
 		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
 		icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
-		return false;
+		return TLV_REJECT;
 	}

 	if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
@@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
 		goto drop;

 	IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
-	return true;
+	return TLV_ACCEPT;

 drop:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }

 /* CALIPSO RFC 5570 */

-static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
+static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
 {
 	const unsigned char *nh = skb_network_header(skb);

@@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
 	if (!calipso_validate(skb, nh + optoff))
 		goto drop;

-	return true;
+	return TLV_ACCEPT;

 drop:
 	kfree_skb(skb);
-	return false;
+	return TLV_REJECT;
 }

 static const struct tlvtype_proc tlvprochopopt_lst[] = {
@@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)

 	opt->flags |= IP6SKB_HOPBYHOP;
 	if (ip6_parse_tlv(tlvprochopopt_lst, skb,
-			  init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
+			  init_net.ipv6.sysctl.max_hbh_opts_cnt,
+			  true)) {
+		/* we need to refresh the length in case
+		 * at least one TLV was removed
+		 */
+		extlen = (skb_transport_header(skb)[1] + 1) << 3;
 		skb->transport_header += extlen;
 		opt = IP6CB(skb);
 		opt->nhoff = sizeof(struct ipv6hdr);
--
2.17.1
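
The effect of remove_tlv() on the option bytes can be tried outside the kernel with a plain byte array. The sketch below is an assumption-laden illustration, not the patch's implementation: `remove_span()`, `write_padding()` and the `OPT_*` macros are hypothetical names, the buffer holds only the options header (so offsets are relative to it), and the gap is closed by shifting the tail of the header down, whereas the patch pulls the skb and moves the preceding headers forward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPT_PAD1 0	/* Pad1 option type (RFC 8200) */
#define OPT_PADN 1	/* PadN option type (RFC 8200) */

/* Overwrite padlen octets (1..7) at buf[off] with one padding option. */
static void write_padding(unsigned char *buf, int off, int padlen)
{
	assert(padlen >= 1 && padlen <= 7);

	if (padlen == 1) {
		buf[off] = OPT_PAD1;
		return;
	}
	buf[off] = OPT_PADN;
	buf[off + 1] = (unsigned char)(padlen - 2);	/* Opt Data Len */
	memset(&buf[off + 2], 0, (size_t)(padlen - 2));
}

/* Remove the options in [start, end) from an options header stored in
 * buf[0..*hdr_len). Only a multiple of 8 octets is cut out; what is left
 * of the span is rewritten as padding. Returns the offset just past the
 * span in the updated header.
 */
static int remove_span(unsigned char *buf, int *hdr_len, int start, int end)
{
	int len = end - start;
	int padlen = len % 8;
	int rlen = len - padlen;

	assert(start >= 2 && start <= end && end <= *hdr_len);

	if (rlen) {
		/* close the gap by shifting the tail of the header down */
		memmove(&buf[start + padlen], &buf[end],
			(size_t)(*hdr_len - end));
		*hdr_len -= rlen;
		/* Hdr Ext Len is expressed in 8-octet units */
		buf[1] = (unsigned char)(buf[1] - (rlen >> 3));
		end -= rlen;
	}

	if (padlen)
		write_padding(buf, end - padlen, padlen);

	return end;
}
```

Applied to the layout of Example 1 (a 32-octet header with the removable option at offsets 8 to 16), the header shrinks to 24 octets and Hdr Ext Len drops from 3 to 2. Applied to Example 2 (a 4-octet option at offsets 8 to 12), nothing is moved and the option is replaced by a 4-octet PadN.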
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
@ 2020-06-24 20:32 ` Tom Herbert
2020-06-25 17:47 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-24 20:32 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Add the possibility to remove one or more consecutive TLVs without
> messing up the alignment of others. For now, only IOAM requires this
> behavior.
>
Hi Justin,
Can you explain the motivation for this? Per RFC8200, extension
headers in flight are not to be added, removed, or modified outside of
the standard rules for processing modifiable HBH and DO TLVs; that
would include adding and removing TLVs in EH. One obvious problem this
creates is that it breaks AH if the TLVs are removed in HBH before AH
is processed (AH is processed after HBH).
Tom
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-24 20:32 ` Tom Herbert
@ 2020-06-25 17:47 ` Justin Iurman
2020-06-25 20:53 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 17:47 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
Hi Tom,
>> Add the possibility to remove one or more consecutive TLVs without
>> messing up the alignment of others. For now, only IOAM requires this
>> behavior.
>>
> Hi Justin,
>
> Can you explain the motivation for this? Per RFC8200, extension
> headers in flight are not to be added, removed, or modified outside of
> the standard rules for processing modifiable HBH and DO TLVs; that
> would include adding and removing TLVs in EH. One obvious problem this
As you already know from our last meeting, IOAM may be configured on a
node such that a specific IOAM namespace should be removed. Therefore,
this patch provides support for the deletion of a TLV (or consecutive
TLVs), without removing the entire EH (if it's empty, there will be
padding). Note that there is a similar "problem" with the Incremental
Trace, where you'd need to expand the Hop-by-Hop (not included in this
patchset). I agree that RFC 8200 is against modification of in-flight
EHs, but there are several reasons that, I believe, mitigate this
statement.

Let's keep in mind that IOAM's purpose is "private" (= IOAM domain),
i.e. it is not widely deployed on the Internet. We can distinguish two
big scenarios: (i) in-transit traffic, which is encapsulated
(IPv6-in-IPv6), and (ii) traffic inside the domain, i.e. from one IOAM
node inside the domain to another (no need for encapsulation). In both
cases, we kind of own the traffic: (i) with encapsulation, we modify
"our" header, and (ii) we already own the traffic.

And if someone is still angry about this, well, the good news is that
such modification can be avoided most of the time. Indeed, operators
are advised to remove an IOAM namespace only on egress nodes. This way,
the destination (either the tunnel destination or the real destination,
depending on the scenario) will receive the EHs and take care of them
without the need to remove anything. But, again, operators can do what
they want, and I'd tend to adhere to David's philosophy [1] and give
them the possibility to choose what to do.
> creates is that it breaks AH if the TLVs are removed in HBH before AH
> is processed (AH is processed after HBH).
Correct. But I don't think it should prevent us from having IOAM in the
kernel. Again, operators could simply apply IOAM on a subset of the
traffic that does not include AHs, for example.
Justin
[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
> Tom
>> By default, an 8-octet boundary is automatically assumed. This is the
>> price to pay (at most a useless 4-octet padding) to make sure everything
>> is still aligned after the removal.
>>
>> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> header.
>>
>> Example 1:
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Next header | Hdr Ext Len | X | X |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | X | X | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | |
>> ~ Option to be removed (8 octets) ~
>> | |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Y | Y | Y | Y |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Padding | Padding | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> boundary (same result in both cases).
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Next header | Hdr Ext Len | X | X |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | X | X | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Y | Y | Y | Y |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Padding | Padding | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Example 2:
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Next header | Hdr Ext Len | X | X |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | X | X | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Option to be removed (4 octets) |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Y | Y | Y | Y |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> of 8 anymore.
>>
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Next header | Hdr Ext Len | X | X |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | X | X | Padding | Padding |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Y | Y | Y | Y |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> | Z | Z | Z | Z |
>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Therefore, the largest (8-octet) boundary is assumed by default and for
>> all, which means that blocks are only moved in multiples of 8. This
>> assertion guarantees good alignment.
>>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>> 1 file changed, 108 insertions(+), 26 deletions(-)
>>
>> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> index e9b366994475..f27ab3bf2e0c 100644
>> --- a/net/ipv6/exthdrs.c
>> +++ b/net/ipv6/exthdrs.c
>> @@ -52,17 +52,27 @@
>>
>> #include <linux/uaccess.h>
>>
>> -/*
>> - * Parsing tlv encoded headers.
>> +/* States for TLV parsing functions. */
>> +
>> +enum {
>> + TLV_ACCEPT,
>> + TLV_REJECT,
>> + TLV_REMOVE,
>> + __TLV_MAX
>> +};
>> +
>> +/* Parsing TLV encoded headers.
>> *
>> - * Parsing function "func" returns true, if parsing succeed
>> - * and false, if it failed.
>> - * It MUST NOT touch skb->h.
>> + * Parsing function "func" returns either:
>> + * - TLV_ACCEPT if parsing succeeds
>> + * - TLV_REJECT if parsing fails
>> + * - TLV_REMOVE if TLV must be removed
>> + * It MUST NOT touch skb->h.
>> */
>>
>> struct tlvtype_proc {
>> int type;
>> - bool (*func)(struct sk_buff *skb, int offset);
>> + int (*func)(struct sk_buff *skb, int offset);
>> };
>>
>> /*********************
>> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
>> optoff,
>> return false;
>> }
>>
>> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> +
>> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> +{
>> +	int len = end - start;
>> +	int padlen = len % 8;
>> +	unsigned char *h;
>> +	int rlen, off;
>> +	u16 pl_len;
>> +
>> +	rlen = len - padlen;
>> +	if (rlen) {
>> +		skb_pull(skb, rlen);
>> +		memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> +			start);
>> +		skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> +
>> +		skb_reset_network_header(skb);
>> +		skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> +
>> +		pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> +		ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> +
>> +		skb_transport_header(skb)[1] -= rlen >> 3;
>> +		end -= rlen;
>> +	}
>> +
>> +	if (padlen) {
>> +		off = end - padlen;
>> +		h = skb_network_header(skb);
>> +
>> +		if (padlen == 1) {
>> +			h[off] = IPV6_TLV_PAD1;
>> +		} else {
>> +			padlen -= 2;
>> +
>> +			h[off] = IPV6_TLV_PADN;
>> +			h[off + 1] = padlen;
>> +			memset(&h[off + 2], 0, padlen);
>> +		}
>> +	}
>> +
>> +	return end;
>> +}
>> +
>> /* Parse tlv encoded option header (hop-by-hop or destination) */
>>
>> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> struct sk_buff *skb,
>> - int max_count)
>> + int max_count,
>> + bool removable)
>> {
>> int len = (skb_transport_header(skb)[1] + 1) << 3;
>> - const unsigned char *nh = skb_network_header(skb);
>> + unsigned char *nh = skb_network_header(skb);
>> int off = skb_network_header_len(skb);
>> const struct tlvtype_proc *curr;
>> bool disallow_unknowns = false;
>> + int off_remove = 0;
>> int tlv_count = 0;
>> int padlen = 0;
>> + int ret;
>>
>> if (unlikely(max_count < 0)) {
>> disallow_unknowns = true;
>> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
>> *procs,
>> if (tlv_count > max_count)
>> goto bad;
>>
>> + ret = -1;
>> for (curr = procs; curr->type >= 0; curr++) {
>> if (curr->type == nh[off]) {
>> /* type specific length/alignment
>> checks will be performed in the
>> func(). */
>> - if (curr->func(skb, off) == false)
>> + ret = curr->func(skb, off);
>> + if (ret == TLV_REJECT)
>> return false;
>> break;
>> }
>> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>> return false;
>>
>> + if (removable) {
>> + if (ret == TLV_REMOVE) {
>> + if (!off_remove)
>> + off_remove = off - padlen;
>> + } else if (off_remove) {
>> + off = remove_tlv(off_remove, off, skb);
>> + nh = skb_network_header(skb);
>> + off_remove = 0;
>> + }
>> + }
>> +
>> padlen = 0;
>> break;
>> }
>> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> len -= optlen;
>> }
>>
>> - if (len == 0)
>> + if (len == 0) {
>> + /* Don't forget last TLV if it must be removed */
>> + if (off_remove)
>> + remove_tlv(off_remove, off, skb);
>> +
>> return true;
>> + }
>> bad:
>> kfree_skb(skb);
>> return false;
>> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> *****************************/
>>
>> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> {
>> struct ipv6_destopt_hao *hao;
>> struct inet6_skb_parm *opt = IP6CB(skb);
>> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> if (skb->tstamp == 0)
>> __net_timestamp(skb);
>>
>> - return true;
>> + return TLV_ACCEPT;
>>
>> discard:
>> kfree_skb(skb);
>> - return false;
>> + return TLV_REJECT;
>> }
>> #endif
>>
>> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>> #endif
>>
>> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> + init_net.ipv6.sysctl.max_dst_opts_cnt,
>> + false)) {
>> skb->transport_header += extlen;
>> opt = IP6CB(skb);
>> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
>> *skb)
>>
>> /* Router Alert as of RFC 2711 */
>>
>> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> {
>> const unsigned char *nh = skb_network_header(skb);
>>
>> if (nh[optoff + 1] == 2) {
>> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> - return true;
>> + return TLV_ACCEPT;
>> }
>> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>> nh[optoff + 1]);
>> kfree_skb(skb);
>> - return false;
>> + return TLV_REJECT;
>> }
>>
>> /* Jumbo payload */
>>
>> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> {
>> const unsigned char *nh = skb_network_header(skb);
>> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> optoff)
>> if (pkt_len <= IPV6_MAXPLEN) {
>> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> - return false;
>> + return TLV_REJECT;
>> }
>> if (ipv6_hdr(skb)->payload_len) {
>> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> - return false;
>> + return TLV_REJECT;
>> }
>>
>> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> optoff)
>> goto drop;
>>
>> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> - return true;
>> + return TLV_ACCEPT;
>>
>> drop:
>> kfree_skb(skb);
>> - return false;
>> + return TLV_REJECT;
>> }
>>
>> /* CALIPSO RFC 5570 */
>>
>> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> {
>> const unsigned char *nh = skb_network_header(skb);
>>
>> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
>> optoff)
>> if (!calipso_validate(skb, nh + optoff))
>> goto drop;
>>
>> - return true;
>> + return TLV_ACCEPT;
>>
>> drop:
>> kfree_skb(skb);
>> - return false;
>> + return TLV_REJECT;
>> }
>>
>> static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>>
>> opt->flags |= IP6SKB_HOPBYHOP;
>> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> + true)) {
>> + /* we need to refresh the length in case
>> + * at least one TLV was removed
>> + */
>> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
>> skb->transport_header += extlen;
>> opt = IP6CB(skb);
>> opt->nhoff = sizeof(struct ipv6hdr);
>> --
>> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-25 17:47 ` Justin Iurman
@ 2020-06-25 20:53 ` Tom Herbert
2020-06-26 8:22 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 20:53 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Thu, Jun 25, 2020 at 10:47 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Hi Tom,
>
> >> Add the possibility to remove one or more consecutive TLVs without
> >> messing up the alignment of others. For now, only IOAM requires this
> >> behavior.
> >>
> > Hi Justin,
> >
> > Can you explain the motivation for this? Per RFC8200, extension
> > headers in flight are not to be added, removed, or modified outside of
> > the standard rules for processing modifiable HBH and DO TLVs; that
> > would include adding and removing TLVs in EH. One obvious problem this
>
> As you already know from our last meeting, IOAM may be configured on a node such that a specific IOAM namespace should be removed. Therefore, this patch provides support for the deletion of a TLV (or consecutive TLVs), without removing the entire EH (if it's empty, there will be padding). Note that there is a similar "problem" with the Incremental Trace where you'd need to expand the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is against modification of in-flight EHs, but there are several reasons that, I believe, mitigate this statement.
>
> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely deployed on the Internet. We can distinguish two big scenarios: (i) in-transit traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the domain, i.e. from an IOAM node inside the domain to another one (no need for encapsulation). In both cases, we kind of own the traffic: (i) encapsulation, so we modify "our" header and (ii) we already own the traffic.
>
> And if someone is still angry about this, well, the good news is that such modification can be avoided most of the time. Indeed, operators are advised to remove an IOAM namespace only on egress nodes. This way, the destination (either the tunnel destination or the real destination, depending on the scenario) will receive EHs and take care of them without the need to remove anything. But, again, operators can do what they want and I'd tend to adhere to David's philosophy [1] and give them the possibility to choose what to do.
>
Justin,
The 6man WG has had a _long_ and sometimes bitter discussion around this,
particularly with regard to insertion of SRH. The current consensus
of the IETF is that it is a violation of RFC8200. We've heard all the
arguments that it's only for limited domains and narrow use cases;
nevertheless, there are several problems that the header
insertion/deletion advocates never answered: it breaks AH, it breaks
PMTU discovery, it breaks ICMP. There is also a risk that a
non-standard modification could cause a packet to be dropped
downstream from the node that modifies it. There is no attribution of
who created the problem, and hence this can lead to systematic
blackholes, which are the most miserable sort of problem to debug.
Fundamentally, it is not robust per Postel's law (I actually wrote a
draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
you're interested).
IMO, we shouldn't be using Linux as a backdoor to implement a protocol
that the IETF is saying isn't robust. Can you point out where in the IOAM
drafts this requirement is specified? Then I can take it up in the IOAM WG
or 6man if needed...
Tom
> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> > is processed (AH is processed after HBH).
>
> Correct. But I don't think it should prevent us from having IOAM in the kernel. Again, operators could simply apply IOAM on a subset of the traffic that does not include AHs, for example.
>
> Justin
>
> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>
> > Tom
> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> is still aligned after the removal.
> >>
> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> header.
> >>
> >> Example 1:
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Next header | Hdr Ext Len | X | X |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | X | X | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | |
> >> ~ Option to be removed (8 octets) ~
> >> | |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Y | Y | Y | Y |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Padding | Padding | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> boundary (same result in both cases).
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Next header | Hdr Ext Len | X | X |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | X | X | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Y | Y | Y | Y |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Padding | Padding | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Example 2:
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Next header | Hdr Ext Len | X | X |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | X | X | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Option to be removed (4 octets) |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Y | Y | Y | Y |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> of 8 anymore.
> >>
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Next header | Hdr Ext Len | X | X |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | X | X | Padding | Padding |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Y | Y | Y | Y |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> | Z | Z | Z | Z |
> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >>
> >> Therefore, the largest (8-octet) boundary is assumed by default and for
> >> all, which means that blocks are only moved in multiples of 8. This
> >> assertion guarantees good alignment.
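For illustration, the 8-octet rule described above can be sketched on a flat byte buffer. This is a minimal sketch under stated assumptions, not the patch's code: remove_span() and write_pad() are hypothetical helpers, the header is modelled as a plain array rather than an sk_buff, and the tail of the header is shifted left (the patch instead pulls the skb and moves the leading bytes forward, which should yield the same layout). Only whole multiples of 8 octets are cut; the remainder (1 to 7 octets) is overwritten with a Pad1 or PadN option so that the options that follow keep their offset modulo 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_PAD1 0 /* Pad1 option type (RFC 8200) */
#define TLV_PADN 1 /* PadN option type (RFC 8200) */

/* Hypothetical helper: fill n octets (0..7) at h[off] with valid padding. */
static void write_pad(uint8_t *h, size_t off, size_t n)
{
	if (n == 0)
		return;
	if (n == 1) {
		h[off] = TLV_PAD1;            /* Pad1 is a single octet */
		return;
	}
	h[off] = TLV_PADN;
	h[off + 1] = (uint8_t)(n - 2);        /* Opt Data Len */
	memset(&h[off + 2], 0, n - 2);        /* zero-valued pad data */
}

/* Hypothetical helper: remove the options occupying [start, end) from an
 * options header of hdr_len octets (hdr_len is a multiple of 8, h[1] is
 * Hdr Ext Len). Returns the new header length.
 */
static size_t remove_span(uint8_t *h, size_t hdr_len, size_t start, size_t end)
{
	size_t len, pad, cut;

	assert(hdr_len % 8 == 0);
	assert(start >= 2 && start <= end && end <= hdr_len);

	len = end - start;
	pad = len % 8;   /* octets that must stay, rewritten as padding */
	cut = len - pad; /* octets really removed: a multiple of 8 */

	/* Shift everything after the removed span left by 'cut' octets. */
	memmove(&h[start + pad], &h[end], hdr_len - end);
	write_pad(h, start, pad);

	hdr_len -= cut;
	h[1] = (uint8_t)(hdr_len / 8 - 1); /* Hdr Ext Len, in 8-octet units */
	return hdr_len;
}
```

With the layout of Example 2 (a 4-octet option at offset 8 in a 24-octet header), remove_span(h, 24, 8, 12) cuts nothing and rewrites the four octets as a PadN with two zero data octets, so Y and Z stay at offsets 12 and 16. With Example 1 (an 8-octet option at offset 8 in a 32-octet header), the header shrinks to 24 octets and Hdr Ext Len drops from 3 to 2.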
> >>
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >> 1 file changed, 108 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> index e9b366994475..f27ab3bf2e0c 100644
> >> --- a/net/ipv6/exthdrs.c
> >> +++ b/net/ipv6/exthdrs.c
> >> @@ -52,17 +52,27 @@
> >>
> >> #include <linux/uaccess.h>
> >>
> >> -/*
> >> - * Parsing tlv encoded headers.
> >> +/* States for TLV parsing functions. */
> >> +
> >> +enum {
> >> + TLV_ACCEPT,
> >> + TLV_REJECT,
> >> + TLV_REMOVE,
> >> + __TLV_MAX
> >> +};
> >> +
> >> +/* Parsing TLV encoded headers.
> >> *
> >> - * Parsing function "func" returns true, if parsing succeed
> >> - * and false, if it failed.
> >> - * It MUST NOT touch skb->h.
> >> + * Parsing function "func" returns either:
> >> + * - TLV_ACCEPT if parsing succeeds
> >> + * - TLV_REJECT if parsing fails
> >> + * - TLV_REMOVE if TLV must be removed
> >> + * It MUST NOT touch skb->h.
> >> */
> >>
> >> struct tlvtype_proc {
> >> int type;
> >> - bool (*func)(struct sk_buff *skb, int offset);
> >> + int (*func)(struct sk_buff *skb, int offset);
> >> };
> >>
> >> /*********************
> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
> >> optoff,
> >> return false;
> >> }
> >>
> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> +
> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> +{
> >> +	int len = end - start;
> >> +	int padlen = len % 8;
> >> +	unsigned char *h;
> >> +	int rlen, off;
> >> +	u16 pl_len;
> >> +
> >> +	rlen = len - padlen;
> >> +	if (rlen) {
> >> +		skb_pull(skb, rlen);
> >> +		memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> +			start);
> >> +		skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> +
> >> +		skb_reset_network_header(skb);
> >> +		skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> +
> >> +		pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> +		ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> +
> >> +		skb_transport_header(skb)[1] -= rlen >> 3;
> >> +		end -= rlen;
> >> +	}
> >> +
> >> +	if (padlen) {
> >> +		off = end - padlen;
> >> +		h = skb_network_header(skb);
> >> +
> >> +		if (padlen == 1) {
> >> +			h[off] = IPV6_TLV_PAD1;
> >> +		} else {
> >> +			padlen -= 2;
> >> +
> >> +			h[off] = IPV6_TLV_PADN;
> >> +			h[off + 1] = padlen;
> >> +			memset(&h[off + 2], 0, padlen);
> >> +		}
> >> +	}
> >> +
> >> +	return end;
> >> +}
> >> +
> >> /* Parse tlv encoded option header (hop-by-hop or destination) */
> >>
> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> struct sk_buff *skb,
> >> - int max_count)
> >> + int max_count,
> >> + bool removable)
> >> {
> >> int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> - const unsigned char *nh = skb_network_header(skb);
> >> + unsigned char *nh = skb_network_header(skb);
> >> int off = skb_network_header_len(skb);
> >> const struct tlvtype_proc *curr;
> >> bool disallow_unknowns = false;
> >> + int off_remove = 0;
> >> int tlv_count = 0;
> >> int padlen = 0;
> >> + int ret;
> >>
> >> if (unlikely(max_count < 0)) {
> >> disallow_unknowns = true;
> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
> >> *procs,
> >> if (tlv_count > max_count)
> >> goto bad;
> >>
> >> + ret = -1;
> >> for (curr = procs; curr->type >= 0; curr++) {
> >> if (curr->type == nh[off]) {
> >> /* type specific length/alignment
> >> checks will be performed in the
> >> func(). */
> >> - if (curr->func(skb, off) == false)
> >> + ret = curr->func(skb, off);
> >> + if (ret == TLV_REJECT)
> >> return false;
> >> break;
> >> }
> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >> return false;
> >>
> >> + if (removable) {
> >> + if (ret == TLV_REMOVE) {
> >> + if (!off_remove)
> >> + off_remove = off - padlen;
> >> + } else if (off_remove) {
> >> + off = remove_tlv(off_remove, off, skb);
> >> + nh = skb_network_header(skb);
> >> + off_remove = 0;
> >> + }
> >> + }
> >> +
> >> padlen = 0;
> >> break;
> >> }
> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> len -= optlen;
> >> }
> >>
> >> - if (len == 0)
> >> + if (len == 0) {
> >> + /* Don't forget last TLV if it must be removed */
> >> + if (off_remove)
> >> + remove_tlv(off_remove, off, skb);
> >> +
> >> return true;
> >> + }
> >> bad:
> >> kfree_skb(skb);
> >> return false;
> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> *****************************/
> >>
> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> {
> >> struct ipv6_destopt_hao *hao;
> >> struct inet6_skb_parm *opt = IP6CB(skb);
> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> if (skb->tstamp == 0)
> >> __net_timestamp(skb);
> >>
> >> - return true;
> >> + return TLV_ACCEPT;
> >>
> >> discard:
> >> kfree_skb(skb);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >> #endif
> >>
> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >> #endif
> >>
> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> + init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> + false)) {
> >> skb->transport_header += extlen;
> >> opt = IP6CB(skb);
> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
> >> *skb)
> >>
> >> /* Router Alert as of RFC 2711 */
> >>
> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> {
> >> const unsigned char *nh = skb_network_header(skb);
> >>
> >> if (nh[optoff + 1] == 2) {
> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> - return true;
> >> + return TLV_ACCEPT;
> >> }
> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >> nh[optoff + 1]);
> >> kfree_skb(skb);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >>
> >> /* Jumbo payload */
> >>
> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> {
> >> const unsigned char *nh = skb_network_header(skb);
> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> optoff)
> >> if (pkt_len <= IPV6_MAXPLEN) {
> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >> if (ipv6_hdr(skb)->payload_len) {
> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >>
> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> optoff)
> >> goto drop;
> >>
> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> - return true;
> >> + return TLV_ACCEPT;
> >>
> >> drop:
> >> kfree_skb(skb);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >>
> >> /* CALIPSO RFC 5570 */
> >>
> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> {
> >> const unsigned char *nh = skb_network_header(skb);
> >>
> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
> >> optoff)
> >> if (!calipso_validate(skb, nh + optoff))
> >> goto drop;
> >>
> >> - return true;
> >> + return TLV_ACCEPT;
> >>
> >> drop:
> >> kfree_skb(skb);
> >> - return false;
> >> + return TLV_REJECT;
> >> }
> >>
> >> static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >>
> >> opt->flags |= IP6SKB_HOPBYHOP;
> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> + true)) {
> >> + /* we need to refresh the length in case
> >> + * at least one TLV was removed
> >> + */
> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >> skb->transport_header += extlen;
> >> opt = IP6CB(skb);
> >> opt->nhoff = sizeof(struct ipv6hdr);
> >> --
> >> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-25 20:53 ` Tom Herbert
@ 2020-06-26 8:22 ` Justin Iurman
2020-06-26 15:39 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 8:22 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
Tom,
>> Hi Tom,
>>
>> >> Add the possibility to remove one or more consecutive TLVs without
>> >> messing up the alignment of others. For now, only IOAM requires this
>> >> behavior.
>> >>
>> > Hi Justin,
>> >
>> > Can you explain the motivation for this? Per RFC8200, extension
>> > headers in flight are not to be added, removed, or modified outside of
>> > the standard rules for processing modifiable HBH and DO TLVs; that
>> > would include adding and removing TLVs in EH. One obvious problem this
>>
>> As you already know from our last meeting, IOAM may be configured on a node such
>> that a specific IOAM namespace should be removed. Therefore, this patch
>> provides support for the deletion of a TLV (or consecutive TLVs), without
>> removing the entire EH (if it's empty, there will be padding). Note that there
>> is a similar "problem" with the Incremental Trace where you'd need to expand
>> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
>> against modification of in-flight EHs, but there are several reasons that, I
>> believe, mitigate this statement.
>>
>> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely
>> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
>> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
>> domain, i.e. from an IOAM node inside the domain to another one (no need for
>> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
>> so we modify "our" header and (ii) we already own the traffic.
>>
>> And if someone is still angry about this, well, the good news is that such
>> modification can be avoided most of the time. Indeed, operators are advised to
>> remove an IOAM namespace only on egress nodes. This way, the destination
>> (either the tunnel destination or the real destination, depending on the
>> scenario) will receive EHs and take care of them without the need to remove
>> anything. But, again, operators can do what they want and I'd tend to adhere to
>> David's philosophy [1] and give them the possibility to choose what to do.
>>
>
> Justin,
>
> The 6man WG has had a _long_ and sometimes bitter discussion around this,
> particularly with regard to insertion of SRH. The current consensus
> of the IETF is that it is a violation of RFC8200. We've heard all the
> arguments that it's only for limited domains and narrow use cases;
> nevertheless, there are several problems that the header
> insertion/deletion advocates never answered: it breaks AH, it breaks
> PMTU discovery, it breaks ICMP. There is also a risk that a
> non-standard modification could cause a packet to be dropped
> downstream from the node that modifies it. There is no attribution of
> who created the problem, and hence this can lead to systematic
> blackholes, which are the most miserable sort of problem to debug.
Yes, I know the whole story, and from what I understand it's been stormy.
> Fundamentally, it is not robust per Postel's law (I actually wrote a
> draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> you're interested).
Interesting, I'll take a look.
> IMO, we shouldn't be using Linux as a backdoor to implement a protocol
> that the IETF is saying isn't robust. Can you point out where in the IOAM
> drafts this requirement is specified? Then I can take it up in the IOAM WG
> or 6man if needed...
Well, I wouldn't say that the IETF considers IPv6-IOAM not robust, since [1] (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be published.
Justin
[1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
[2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> Tom
>
>> > creates is that it breaks AH if the TLVs are removed in HBH before AH
>> > is processed (AH is processed after HBH).
>>
>> Correct. But I don't think it should prevent us from having IOAM in the kernel.
>> Again, operators could simply apply IOAM on a subset of the traffic that does
>> not include AHs, for example.
>>
>> Justin
>>
>> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>>
>> > Tom
>> >> By default, an 8-octet boundary is automatically assumed. This is the
>> >> price to pay (at most a useless 4-octet padding) to make sure everything
>> >> is still aligned after the removal.
>> >>
>> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> >> header.
>> >>
>> >> Example 1:
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Next header | Hdr Ext Len | X | X |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | X | X | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | |
>> >> ~ Option to be removed (8 octets) ~
>> >> | |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Y | Y | Y | Y |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Padding | Padding | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> >> boundary (same result in both cases).
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Next header | Hdr Ext Len | X | X |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | X | X | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Y | Y | Y | Y |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Padding | Padding | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Example 2:
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Next header | Hdr Ext Len | X | X |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | X | X | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Option to be removed (4 octets) |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Y | Y | Y | Y |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> >> of 8 anymore.
>> >>
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Next header | Hdr Ext Len | X | X |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | X | X | Padding | Padding |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Y | Y | Y | Y |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> | Z | Z | Z | Z |
>> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >>
>> >> Therefore, the largest (8-octet) boundary is assumed by default and for
>> >> all, which means that blocks are only moved in multiples of 8. This
>> >> assertion guarantees good alignment.
>> >>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>> >> 1 file changed, 108 insertions(+), 26 deletions(-)
>> >>
>> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> index e9b366994475..f27ab3bf2e0c 100644
>> >> --- a/net/ipv6/exthdrs.c
>> >> +++ b/net/ipv6/exthdrs.c
>> >> @@ -52,17 +52,27 @@
>> >>
>> >> #include <linux/uaccess.h>
>> >>
>> >> -/*
>> >> - * Parsing tlv encoded headers.
>> >> +/* States for TLV parsing functions. */
>> >> +
>> >> +enum {
>> >> + TLV_ACCEPT,
>> >> + TLV_REJECT,
>> >> + TLV_REMOVE,
>> >> + __TLV_MAX
>> >> +};
>> >> +
>> >> +/* Parsing TLV encoded headers.
>> >> *
>> >> - * Parsing function "func" returns true, if parsing succeed
>> >> - * and false, if it failed.
>> >> - * It MUST NOT touch skb->h.
>> >> + * Parsing function "func" returns either:
>> >> + * - TLV_ACCEPT if parsing succeeds
>> >> + * - TLV_REJECT if parsing fails
>> >> + * - TLV_REMOVE if TLV must be removed
>> >> + * It MUST NOT touch skb->h.
>> >> */
>> >>
>> >> struct tlvtype_proc {
>> >> int type;
>> >> - bool (*func)(struct sk_buff *skb, int offset);
>> >> + int (*func)(struct sk_buff *skb, int offset);
>> >> };
>> >>
>> >> /*********************
>> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
>> >> optoff,
>> >> return false;
>> >> }
>> >>
>> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> >> +
>> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> >> +{
>> >> +	int len = end - start;
>> >> +	int padlen = len % 8;
>> >> +	unsigned char *h;
>> >> +	int rlen, off;
>> >> +	u16 pl_len;
>> >> +
>> >> +	rlen = len - padlen;
>> >> +	if (rlen) {
>> >> +		skb_pull(skb, rlen);
>> >> +		memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> >> +			start);
>> >> +		skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> >> +
>> >> +		skb_reset_network_header(skb);
>> >> +		skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> >> +
>> >> +		pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> >> +		ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> >> +
>> >> +		skb_transport_header(skb)[1] -= rlen >> 3;
>> >> +		end -= rlen;
>> >> +	}
>> >> +
>> >> +	if (padlen) {
>> >> +		off = end - padlen;
>> >> +		h = skb_network_header(skb);
>> >> +
>> >> +		if (padlen == 1) {
>> >> +			h[off] = IPV6_TLV_PAD1;
>> >> +		} else {
>> >> +			padlen -= 2;
>> >> +
>> >> +			h[off] = IPV6_TLV_PADN;
>> >> +			h[off + 1] = padlen;
>> >> +			memset(&h[off + 2], 0, padlen);
>> >> +		}
>> >> +	}
>> >> +
>> >> +	return end;
>> >> +}
>> >> +
>> >> /* Parse tlv encoded option header (hop-by-hop or destination) */
>> >>
>> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> struct sk_buff *skb,
>> >> - int max_count)
>> >> + int max_count,
>> >> + bool removable)
>> >> {
>> >> int len = (skb_transport_header(skb)[1] + 1) << 3;
>> >> - const unsigned char *nh = skb_network_header(skb);
>> >> + unsigned char *nh = skb_network_header(skb);
>> >> int off = skb_network_header_len(skb);
>> >> const struct tlvtype_proc *curr;
>> >> bool disallow_unknowns = false;
>> >> + int off_remove = 0;
>> >> int tlv_count = 0;
>> >> int padlen = 0;
>> >> + int ret;
>> >>
>> >> if (unlikely(max_count < 0)) {
>> >> disallow_unknowns = true;
>> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
>> >> *procs,
>> >> if (tlv_count > max_count)
>> >> goto bad;
>> >>
>> >> + ret = -1;
>> >> for (curr = procs; curr->type >= 0; curr++) {
>> >> if (curr->type == nh[off]) {
>> >> /* type specific length/alignment
>> >> checks will be performed in the
>> >> func(). */
>> >> - if (curr->func(skb, off) == false)
>> >> + ret = curr->func(skb, off);
>> >> + if (ret == TLV_REJECT)
>> >> return false;
>> >> break;
>> >> }
>> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>> >> return false;
>> >>
>> >> + if (removable) {
>> >> + if (ret == TLV_REMOVE) {
>> >> + if (!off_remove)
>> >> + off_remove = off - padlen;
>> >> + } else if (off_remove) {
>> >> + off = remove_tlv(off_remove, off, skb);
>> >> + nh = skb_network_header(skb);
>> >> + off_remove = 0;
>> >> + }
>> >> + }
>> >> +
>> >> padlen = 0;
>> >> break;
>> >> }
>> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> len -= optlen;
>> >> }
>> >>
>> >> - if (len == 0)
>> >> + if (len == 0) {
>> >> + /* Don't forget last TLV if it must be removed */
>> >> + if (off_remove)
>> >> + remove_tlv(off_remove, off, skb);
>> >> +
>> >> return true;
>> >> + }
>> >> bad:
>> >> kfree_skb(skb);
>> >> return false;
>> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> *****************************/
>> >>
>> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> {
>> >> struct ipv6_destopt_hao *hao;
>> >> struct inet6_skb_parm *opt = IP6CB(skb);
>> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> if (skb->tstamp == 0)
>> >> __net_timestamp(skb);
>> >>
>> >> - return true;
>> >> + return TLV_ACCEPT;
>> >>
>> >> discard:
>> >> kfree_skb(skb);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >> #endif
>> >>
>> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>> >> #endif
>> >>
>> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> >> + init_net.ipv6.sysctl.max_dst_opts_cnt,
>> >> + false)) {
>> >> skb->transport_header += extlen;
>> >> opt = IP6CB(skb);
>> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
>> >> *skb)
>> >>
>> >> /* Router Alert as of RFC 2711 */
>> >>
>> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> {
>> >> const unsigned char *nh = skb_network_header(skb);
>> >>
>> >> if (nh[optoff + 1] == 2) {
>> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> >> - return true;
>> >> + return TLV_ACCEPT;
>> >> }
>> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>> >> nh[optoff + 1]);
>> >> kfree_skb(skb);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >>
>> >> /* Jumbo payload */
>> >>
>> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> {
>> >> const unsigned char *nh = skb_network_header(skb);
>> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> optoff)
>> >> if (pkt_len <= IPV6_MAXPLEN) {
>> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >> if (ipv6_hdr(skb)->payload_len) {
>> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >>
>> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> optoff)
>> >> goto drop;
>> >>
>> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> >> - return true;
>> >> + return TLV_ACCEPT;
>> >>
>> >> drop:
>> >> kfree_skb(skb);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >>
>> >> /* CALIPSO RFC 5570 */
>> >>
>> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> {
>> >> const unsigned char *nh = skb_network_header(skb);
>> >>
>> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
>> >> optoff)
>> >> if (!calipso_validate(skb, nh + optoff))
>> >> goto drop;
>> >>
>> >> - return true;
>> >> + return TLV_ACCEPT;
>> >>
>> >> drop:
>> >> kfree_skb(skb);
>> >> - return false;
>> >> + return TLV_REJECT;
>> >> }
>> >>
>> >> static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>> >>
>> >> opt->flags |= IP6SKB_HOPBYHOP;
>> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> >> + true)) {
>> >> + /* we need to refresh the length in case
>> >> + * at least one TLV was removed
>> >> + */
>> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
>> >> skb->transport_header += extlen;
>> >> opt = IP6CB(skb);
>> >> opt->nhoff = sizeof(struct ipv6hdr);
>> >> --
>> >> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-26 8:22 ` Justin Iurman
@ 2020-06-26 15:39 ` Tom Herbert
2020-06-26 17:14 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 15:39 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Fri, Jun 26, 2020 at 1:22 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> Hi Tom,
> >>
> >> >> Add the possibility to remove one or more consecutive TLVs without
> >> >> messing up the alignment of others. For now, only IOAM requires this
> >> >> behavior.
> >> >>
> >> > Hi Justin,
> >> >
> >> > Can you explain the motivation for this? Per RFC8200, extension
> >> > headers in flight are not to be added, removed, or modified outside of
> >> > the standard rules for processing modifiable HBH and DO TLVs; that
> >> > would include adding and removing TLVs in EH. One obvious problem this
> >>
> >> As you already know from our last meeting, IOAM may be configured on a node such
> >> that a specific IOAM namespace should be removed. Therefore, this patch
> >> provides support for the deletion of a TLV (or consecutive TLVs), without
> >> removing the entire EH (if it's empty, there will be padding). Note that there
> >> is a similar "problem" with the Incremental Trace where you'd need to expand
> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
> >> against modification of in-flight EHs, but there are several reasons that, I
> >> believe, mitigate this statement.
> >>
> >> Let's keep in mind that IOAM purpose is "private" (= IOAM domain), ie not widely
> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
> >> domain, ie from an IOAM node inside the domain to another one (no need for
> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
> >> so we modify "our" header and (ii) we already own the traffic.
> >>
> >> And if someone is still angry about this, well, the good news is that such
> >> modification can be avoided most of the time. Indeed, operators are advised to
> >> remove an IOAM namespace only on egress nodes. This way, the destination
> >> (either the tunnel destination or the real destination, depending on the
> >> scenario) will receive EHs and take care of them without the need to remove
> >> anything. But, again, operators can do what they want and I'd tend to adhere to
> >> David's philosophy [1] and give them the possibility to choose what to do.
> >>
> >
> > Justin,
> >
> > 6man WG has had a _long_ and sometimes bitter discussion around this
> > particularly with regards to insertion of SRH. The current consensus
> > of IETF is that it is a violation of RFC8200. We've heard all the
> > arguments that it's only for limited domains and narrow use cases,
> > nevertheless there are several problems that the header
> > insertion/deletion advocates never answered-- it breaks AH, it breaks
> > PMTU discovery, it breaks ICMP. There is also a risk that a
> > non-standard modification could cause a packet to be dropped
> > downstream from the node that modifies it. There is no attribution on
> > who created the problem, and hence this can lead to systematic
> > blackholes which are the most miserable sort of problem to debug.
>
> Yes, I know the whole story and it's been stormy from what I understood.
>
> > Fundamentally, it is not robust per Postel's law (I actually wrote a
> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> > you're interested).
>
> Interesting, I'll take a look.
>
> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
> > where this requirement is specified, then I can take it up in IOAM WG
> > or 6man if needed...
>
> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1] (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be published.
I was specifically referring to the requirements around removing the
IOAM TLV from packets in flight. I don't readily see that in the IOAM
drafts.

Also, be careful about saying that drafts are about to be published by
IETF. Until a draft reaches the RFC editor we really can't say that. I
don't believe the drafts you're referring to have even made it through
WGLC.

Tom
>
> Justin
>
> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>
> > Tom
> >
> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> >> > is processed (AH is processed after HBH).
> >>
> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
> >> Again, operators could simply apply IOAM on a subset of the traffic that does
> >> not include AHs, for example.
> >>
> >> Justin
> >>
> >> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
> >>
> >> > Tom
> >> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> >> is still aligned after the removal.
> >> >>
> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> >> header.
> >> >>
> >> >> Example 1:
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | X | X | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | |
> >> >> ~ Option to be removed (8 octets) ~
> >> >> | |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Y | Y | Y | Y |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Padding | Padding | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> >> boundary (same result in both cases).
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | X | X | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Y | Y | Y | Y |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Padding | Padding | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Example 2:
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | X | X | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Option to be removed (4 octets) |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Y | Y | Y | Y |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> >> of 8 anymore.
> >> >>
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | X | X | Padding | Padding |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Y | Y | Y | Y |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> | Z | Z | Z | Z |
> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >>
> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
> >> >> all, which means that blocks are only moved in multiples of 8. This
> >> >> assertion guarantees good alignment.
> >> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >> >> 1 file changed, 108 insertions(+), 26 deletions(-)
> >> >>
> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> index e9b366994475..f27ab3bf2e0c 100644
> >> >> --- a/net/ipv6/exthdrs.c
> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> @@ -52,17 +52,27 @@
> >> >>
> >> >> #include <linux/uaccess.h>
> >> >>
> >> >> -/*
> >> >> - * Parsing tlv encoded headers.
> >> >> +/* States for TLV parsing functions. */
> >> >> +
> >> >> +enum {
> >> >> + TLV_ACCEPT,
> >> >> + TLV_REJECT,
> >> >> + TLV_REMOVE,
> >> >> + __TLV_MAX
> >> >> +};
> >> >> +
> >> >> +/* Parsing TLV encoded headers.
> >> >> *
> >> >> - * Parsing function "func" returns true, if parsing succeed
> >> >> - * and false, if it failed.
> >> >> - * It MUST NOT touch skb->h.
> >> >> + * Parsing function "func" returns either:
> >> >> + * - TLV_ACCEPT if parsing succeeds
> >> >> + * - TLV_REJECT if parsing fails
> >> >> + * - TLV_REMOVE if TLV must be removed
> >> >> + * It MUST NOT touch skb->h.
> >> >> */
> >> >>
> >> >> struct tlvtype_proc {
> >> >> int type;
> >> >> - bool (*func)(struct sk_buff *skb, int offset);
> >> >> + int (*func)(struct sk_buff *skb, int offset);
> >> >> };
> >> >>
> >> >> /*********************
> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
> >> >> optoff,
> >> >> return false;
> >> >> }
> >> >>
> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> >> +
> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> >> +{
> >> >> + int len = end - start;
> >> >> + int padlen = len % 8;
> >> >> + unsigned char *h;
> >> >> + int rlen, off;
> >> >> + u16 pl_len;
> >> >> +
> >> >> + rlen = len - padlen;
> >> >> + if (rlen) {
> >> >> + skb_pull(skb, rlen);
> >> >> + memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> >> + start);
> >> >> + skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> >> +
> >> >> + skb_reset_network_header(skb);
> >> >> + skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> >> +
> >> >> + pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> >> + ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> >> +
> >> >> + skb_transport_header(skb)[1] -= rlen >> 3;
> >> >> + end -= rlen;
> >> >> + }
> >> >> +
> >> >> + if (padlen) {
> >> >> + off = end - padlen;
> >> >> + h = skb_network_header(skb);
> >> >> +
> >> >> + if (padlen == 1) {
> >> >> + h[off] = IPV6_TLV_PAD1;
> >> >> + } else {
> >> >> + padlen -= 2;
> >> >> +
> >> >> + h[off] = IPV6_TLV_PADN;
> >> >> + h[off + 1] = padlen;
> >> >> + memset(&h[off + 2], 0, padlen);
> >> >> + }
> >> >> + }
> >> >> +
> >> >> + return end;
> >> >> +}
> >> >> +
> >> >> /* Parse tlv encoded option header (hop-by-hop or destination) */
> >> >>
> >> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> struct sk_buff *skb,
> >> >> - int max_count)
> >> >> + int max_count,
> >> >> + bool removable)
> >> >> {
> >> >> int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> - const unsigned char *nh = skb_network_header(skb);
> >> >> + unsigned char *nh = skb_network_header(skb);
> >> >> int off = skb_network_header_len(skb);
> >> >> const struct tlvtype_proc *curr;
> >> >> bool disallow_unknowns = false;
> >> >> + int off_remove = 0;
> >> >> int tlv_count = 0;
> >> >> int padlen = 0;
> >> >> + int ret;
> >> >>
> >> >> if (unlikely(max_count < 0)) {
> >> >> disallow_unknowns = true;
> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
> >> >> *procs,
> >> >> if (tlv_count > max_count)
> >> >> goto bad;
> >> >>
> >> >> + ret = -1;
> >> >> for (curr = procs; curr->type >= 0; curr++) {
> >> >> if (curr->type == nh[off]) {
> >> >> /* type specific length/alignment
> >> >> checks will be performed in the
> >> >> func(). */
> >> >> - if (curr->func(skb, off) == false)
> >> >> + ret = curr->func(skb, off);
> >> >> + if (ret == TLV_REJECT)
> >> >> return false;
> >> >> break;
> >> >> }
> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >> >> return false;
> >> >>
> >> >> + if (removable) {
> >> >> + if (ret == TLV_REMOVE) {
> >> >> + if (!off_remove)
> >> >> + off_remove = off - padlen;
> >> >> + } else if (off_remove) {
> >> >> + off = remove_tlv(off_remove, off, skb);
> >> >> + nh = skb_network_header(skb);
> >> >> + off_remove = 0;
> >> >> + }
> >> >> + }
> >> >> +
> >> >> padlen = 0;
> >> >> break;
> >> >> }
> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> len -= optlen;
> >> >> }
> >> >>
> >> >> - if (len == 0)
> >> >> + if (len == 0) {
> >> >> + /* Don't forget last TLV if it must be removed */
> >> >> + if (off_remove)
> >> >> + remove_tlv(off_remove, off, skb);
> >> >> +
> >> >> return true;
> >> >> + }
> >> >> bad:
> >> >> kfree_skb(skb);
> >> >> return false;
> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> *****************************/
> >> >>
> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> {
> >> >> struct ipv6_destopt_hao *hao;
> >> >> struct inet6_skb_parm *opt = IP6CB(skb);
> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> if (skb->tstamp == 0)
> >> >> __net_timestamp(skb);
> >> >>
> >> >> - return true;
> >> >> + return TLV_ACCEPT;
> >> >>
> >> >> discard:
> >> >> kfree_skb(skb);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >> #endif
> >> >>
> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >> >> #endif
> >> >>
> >> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> >> + init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> >> + false)) {
> >> >> skb->transport_header += extlen;
> >> >> opt = IP6CB(skb);
> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
> >> >> *skb)
> >> >>
> >> >> /* Router Alert as of RFC 2711 */
> >> >>
> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> {
> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >>
> >> >> if (nh[optoff + 1] == 2) {
> >> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> >> - return true;
> >> >> + return TLV_ACCEPT;
> >> >> }
> >> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >> >> nh[optoff + 1]);
> >> >> kfree_skb(skb);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >>
> >> >> /* Jumbo payload */
> >> >>
> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> {
> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> optoff)
> >> >> if (pkt_len <= IPV6_MAXPLEN) {
> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >> if (ipv6_hdr(skb)->payload_len) {
> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >>
> >> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> optoff)
> >> >> goto drop;
> >> >>
> >> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> >> - return true;
> >> >> + return TLV_ACCEPT;
> >> >>
> >> >> drop:
> >> >> kfree_skb(skb);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >>
> >> >> /* CALIPSO RFC 5570 */
> >> >>
> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> {
> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >>
> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
> >> >> optoff)
> >> >> if (!calipso_validate(skb, nh + optoff))
> >> >> goto drop;
> >> >>
> >> >> - return true;
> >> >> + return TLV_ACCEPT;
> >> >>
> >> >> drop:
> >> >> kfree_skb(skb);
> >> >> - return false;
> >> >> + return TLV_REJECT;
> >> >> }
> >> >>
> >> >> static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >> >>
> >> >> opt->flags |= IP6SKB_HOPBYHOP;
> >> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> >> + true)) {
> >> >> + /* we need to refresh the length in case
> >> >> + * at least one TLV was removed
> >> >> + */
> >> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> skb->transport_header += extlen;
> >> >> opt = IP6CB(skb);
> >> >> opt->nhoff = sizeof(struct ipv6hdr);
> >> >> --
> >> >> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-26 15:39 ` Tom Herbert
@ 2020-06-26 17:14 ` Justin Iurman
2020-06-26 18:35 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 17:14 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
>> Tom,
>>
>> >> Hi Tom,
>> >>
>> >> >> Add the possibility to remove one or more consecutive TLVs without
>> >> >> messing up the alignment of others. For now, only IOAM requires this
>> >> >> behavior.
>> >> >>
>> >> > Hi Justin,
>> >> >
>> >> > Can you explain the motivation for this? Per RFC8200, extension
>> >> > headers in flight are not to be added, removed, or modified outside of
>> >> > the standard rules for processing modifiable HBH and DO TLVs; that
>> >> > would include adding and removing TLVs in EH. One obvious problem this
>> >>
>> >> As you already know from our last meeting, IOAM may be configured on a node such
>> >> that a specific IOAM namespace should be removed. Therefore, this patch
>> >> provides support for the deletion of a TLV (or consecutive TLVs), without
>> >> removing the entire EH (if it's empty, there will be padding). Note that there
>> >> is a similar "problem" with the Incremental Trace where you'd need to expand
>> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
>> >> against modification of in-flight EHs, but there are several reasons that, I
>> >> believe, mitigate this statement.
>> >>
>> >> Let's keep in mind that IOAM purpose is "private" (= IOAM domain), ie not widely
>> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
>> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
>> >> domain, ie from an IOAM node inside the domain to another one (no need for
>> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
>> >> so we modify "our" header and (ii) we already own the traffic.
>> >>
>> >> And if someone is still angry about this, well, the good news is that such
>> >> modification can be avoided most of the time. Indeed, operators are advised to
>> >> remove an IOAM namespace only on egress nodes. This way, the destination
>> >> (either the tunnel destination or the real destination, depending on the
>> >> scenario) will receive EHs and take care of them without the need to remove
>> >> anything. But, again, operators can do what they want and I'd tend to adhere to
>> >> David's philosophy [1] and give them the possibility to choose what to do.
>> >>
>> >
>> > Justin,
>> >
>> > 6man WG has had a _long_ and sometimes bitter discussion around this
>> > particularly with regards to insertion of SRH. The current consensus
>> > of IETF is that it is a violation of RFC8200. We've heard all the
>> > arguments that it's only for limited domains and narrow use cases,
>> > nevertheless there are several problems that the header
>> > insertion/deletion advocates never answered-- it breaks AH, it breaks
>> > PMTU discovery, it breaks ICMP. There is also a risk that a
>> > non-standard modification could cause a packet to be dropped
>> > downstream from the node that modifies it. There is no attribution on
>> > who created the problem, and hence this can lead to systematic
>> > blackholes which are the most miserable sort of problem to debug.
>>
>> Yes, I know the whole story and it's been stormy from what I understood.
>>
>> > Fundamentally, it is not robust per Postel's law (I actually wrote a
>> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
>> > you're interested).
>>
>> Interesting, I'll take a look.
>>
>> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
>> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
>> > where this requirement is specified, then I can take it up in IOAM WG
>> > or 6man if needed...
>>
>> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1]
>> (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be
>> published.
>
> I was specifically referring to the requirements around removing the
> IOAM TLV from packets in-flight. I don't readily see that in the IOAM
> drafts.
Actually, this is not in the draft. The authors wanted to give operators
some freedom, and such a requirement would restrict their choices, even
though it is the better, arguably the most logical, option. Maybe we
could also discuss on the IPPM mailing list whether it should be added?
I have two pieces of advice for operators: one about encapsulation and
this one about the removal of an IOAM option.
> Also, be careful about saying that drafts are about to be published by
> IETF. Until a draft reaches the RFC editor we really can't say that. I
> don't believe drafts you're referring to have even made it through
> WGLC.
Indeed, but draft-ietf-ippm-ioam-data is already in its second WGLC; did
you miss it on the IPPM mailing list? As for
draft-ietf-ippm-ioam-ipv6-options, it is only my prediction, but I expect
it to follow soon, given the talks in the WG about an IANA early
allocation.

Justin
> Tom
>
>>
>> Justin
>>
>> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>>
>> > Tom
>> >
>> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
>> >> > is processed (AH is processed after HBH).
>> >>
>> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
>> >> Again, operators could simply apply IOAM on a subset of the traffic that does
>> >> not include AHs, for example.
>> >>
>> >> Justin
>> >>
>> >> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
>> >>
>> >> > Tom
>> >> >> By default, an 8-octet boundary is automatically assumed. This is the
>> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
>> >> >> is still aligned after the removal.
>> >> >>
>> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
>> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
>> >> >> header.
>> >> >>
>> >> >> Example 1:
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Next header | Hdr Ext Len | X | X |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | X | X | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | |
>> >> >> ~ Option to be removed (8 octets) ~
>> >> >> | |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Y | Y | Y | Y |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Padding | Padding | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
>> >> >> boundary (same result in both cases).
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Next header | Hdr Ext Len | X | X |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | X | X | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Y | Y | Y | Y |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Padding | Padding | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Example 2:
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Next header | Hdr Ext Len | X | X |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | X | X | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Option to be removed (4 octets) |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Y | Y | Y | Y |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
>> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
>> >> >> of 8 anymore.
>> >> >>
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Next header | Hdr Ext Len | X | X |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | X | X | Padding | Padding |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Y | Y | Y | Y |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >> | Z | Z | Z | Z |
>> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> >> >>
>> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
>> >> >> all, which means that blocks are only moved in multiples of 8. This
>> >> >> assertion guarantees good alignment.
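
The 8-octet rule above comes down to a little arithmetic: the part of the removed span that is a multiple of 8 is really cut out, and the remainder is overwritten with padding. What follows is a minimal standalone sketch, not kernel code. The names split_removal, write_padding and the SKETCH_* constants are made up for this illustration (the patch does the equivalent inside remove_tlv() on the skb); only the Pad1/PadN option type values 0 and 1 are taken from RFC 8200.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Option type values from RFC 8200; the names are local to this sketch. */
#define SKETCH_TLV_PAD1 0
#define SKETCH_TLV_PADN 1

/* How a removed span of `len` octets is split: `rlen` octets (a multiple
 * of 8) are really cut out, `padlen` octets (0..7) stay as padding.
 */
struct removal_split {
	int rlen;
	int padlen;
};

static struct removal_split split_removal(int len)
{
	struct removal_split s;

	assert(len >= 0);
	s.padlen = len % 8;
	s.rlen = len - s.padlen;
	return s;
}

/* Overwrite `padlen` octets at `buf` with a valid padding option:
 * Pad1 for a single octet, PadN (type, length, zeros) otherwise.
 */
static void write_padding(unsigned char *buf, int padlen)
{
	assert(padlen >= 0 && padlen < 8);
	if (padlen == 0)
		return;
	if (padlen == 1) {
		buf[0] = SKETCH_TLV_PAD1;
		return;
	}
	buf[0] = SKETCH_TLV_PADN;
	buf[1] = (unsigned char)(padlen - 2);	/* PadN data length */
	memset(buf + 2, 0, (size_t)(padlen - 2));
}
```

Taken on its own, the 8-octet option of Example 1 gives rlen = 8 and no padding, while the 4-octet option of Example 2 gives rlen = 0 and is replaced in place by a 4-octet PadN, so Y and Z keep their offsets.
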
>> >> >>
>> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> >> ---
>> >> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
>> >> >> 1 file changed, 108 insertions(+), 26 deletions(-)
>> >> >>
>> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> >> index e9b366994475..f27ab3bf2e0c 100644
>> >> >> --- a/net/ipv6/exthdrs.c
>> >> >> +++ b/net/ipv6/exthdrs.c
>> >> >> @@ -52,17 +52,27 @@
>> >> >>
>> >> >> #include <linux/uaccess.h>
>> >> >>
>> >> >> -/*
>> >> >> - * Parsing tlv encoded headers.
>> >> >> +/* States for TLV parsing functions. */
>> >> >> +
>> >> >> +enum {
>> >> >> + TLV_ACCEPT,
>> >> >> + TLV_REJECT,
>> >> >> + TLV_REMOVE,
>> >> >> + __TLV_MAX
>> >> >> +};
>> >> >> +
>> >> >> +/* Parsing TLV encoded headers.
>> >> >> *
>> >> >> - * Parsing function "func" returns true, if parsing succeed
>> >> >> - * and false, if it failed.
>> >> >> - * It MUST NOT touch skb->h.
>> >> >> + * Parsing function "func" returns either:
>> >> >> + * - TLV_ACCEPT if parsing succeeds
>> >> >> + * - TLV_REJECT if parsing fails
>> >> >> + * - TLV_REMOVE if TLV must be removed
>> >> >> + * It MUST NOT touch skb->h.
>> >> >> */
>> >> >>
>> >> >> struct tlvtype_proc {
>> >> >> int type;
>> >> >> - bool (*func)(struct sk_buff *skb, int offset);
>> >> >> + int (*func)(struct sk_buff *skb, int offset);
>> >> >> };
>> >> >>
>> >> >> /*********************
>> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
>> >> >> optoff,
>> >> >> return false;
>> >> >> }
>> >> >>
>> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
>> >> >> +
>> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
>> >> >> +{
>> >> >> + int len = end - start;
>> >> >> + int padlen = len % 8;
>> >> >> + unsigned char *h;
>> >> >> + int rlen, off;
>> >> >> + u16 pl_len;
>> >> >> +
>> >> >> + rlen = len - padlen;
>> >> >> + if (rlen) {
>> >> >> + skb_pull(skb, rlen);
>> >> >> + memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
>> >> >> + start);
>> >> >> + skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
>> >> >> +
>> >> >> + skb_reset_network_header(skb);
>> >> >> + skb_set_transport_header(skb, sizeof(struct ipv6hdr));
>> >> >> +
>> >> >> + pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
>> >> >> + ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
>> >> >> +
>> >> >> + skb_transport_header(skb)[1] -= rlen >> 3;
>> >> >> + end -= rlen;
>> >> >> + }
>> >> >> +
>> >> >> + if (padlen) {
>> >> >> + off = end - padlen;
>> >> >> + h = skb_network_header(skb);
>> >> >> +
>> >> >> + if (padlen == 1) {
>> >> >> + h[off] = IPV6_TLV_PAD1;
>> >> >> + } else {
>> >> >> + padlen -= 2;
>> >> >> +
>> >> >> + h[off] = IPV6_TLV_PADN;
>> >> >> + h[off + 1] = padlen;
>> >> >> + memset(&h[off + 2], 0, padlen);
>> >> >> + }
>> >> >> + }
>> >> >> +
>> >> >> + return end;
>> >> >> +}
>> >> >> +
>> >> >> /* Parse tlv encoded option header (hop-by-hop or destination) */
>> >> >>
>> >> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >> struct sk_buff *skb,
>> >> >> - int max_count)
>> >> >> + int max_count,
>> >> >> + bool removable)
>> >> >> {
>> >> >> int len = (skb_transport_header(skb)[1] + 1) << 3;
>> >> >> - const unsigned char *nh = skb_network_header(skb);
>> >> >> + unsigned char *nh = skb_network_header(skb);
>> >> >> int off = skb_network_header_len(skb);
>> >> >> const struct tlvtype_proc *curr;
>> >> >> bool disallow_unknowns = false;
>> >> >> + int off_remove = 0;
>> >> >> int tlv_count = 0;
>> >> >> int padlen = 0;
>> >> >> + int ret;
>> >> >>
>> >> >> if (unlikely(max_count < 0)) {
>> >> >> disallow_unknowns = true;
>> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
>> >> >> *procs,
>> >> >> if (tlv_count > max_count)
>> >> >> goto bad;
>> >> >>
>> >> >> + ret = -1;
>> >> >> for (curr = procs; curr->type >= 0; curr++) {
>> >> >> if (curr->type == nh[off]) {
>> >> >> /* type specific length/alignment
>> >> >> checks will be performed in the
>> >> >> func(). */
>> >> >> - if (curr->func(skb, off) == false)
>> >> >> + ret = curr->func(skb, off);
>> >> >> + if (ret == TLV_REJECT)
>> >> >> return false;
>> >> >> break;
>> >> >> }
>> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
>> >> >> return false;
>> >> >>
>> >> >> + if (removable) {
>> >> >> + if (ret == TLV_REMOVE) {
>> >> >> + if (!off_remove)
>> >> >> + off_remove = off - padlen;
>> >> >> + } else if (off_remove) {
>> >> >> + off = remove_tlv(off_remove, off, skb);
>> >> >> + nh = skb_network_header(skb);
>> >> >> + off_remove = 0;
>> >> >> + }
>> >> >> + }
>> >> >> +
>> >> >> padlen = 0;
>> >> >> break;
>> >> >> }
>> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >> len -= optlen;
>> >> >> }
>> >> >>
>> >> >> - if (len == 0)
>> >> >> + if (len == 0) {
>> >> >> + /* Don't forget last TLV if it must be removed */
>> >> >> + if (off_remove)
>> >> >> + remove_tlv(off_remove, off, skb);
>> >> >> +
>> >> >> return true;
>> >> >> + }
>> >> >> bad:
>> >> >> kfree_skb(skb);
>> >> >> return false;
>> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
>> >> >> *****************************/
>> >> >>
>> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >> {
>> >> >> struct ipv6_destopt_hao *hao;
>> >> >> struct inet6_skb_parm *opt = IP6CB(skb);
>> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
>> >> >> if (skb->tstamp == 0)
>> >> >> __net_timestamp(skb);
>> >> >>
>> >> >> - return true;
>> >> >> + return TLV_ACCEPT;
>> >> >>
>> >> >> discard:
>> >> >> kfree_skb(skb);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >> #endif
>> >> >>
>> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
>> >> >> #endif
>> >> >>
>> >> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
>> >> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
>> >> >> + init_net.ipv6.sysctl.max_dst_opts_cnt,
>> >> >> + false)) {
>> >> >> skb->transport_header += extlen;
>> >> >> opt = IP6CB(skb);
>> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
>> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
>> >> >> *skb)
>> >> >>
>> >> >> /* Router Alert as of RFC 2711 */
>> >> >>
>> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> >> {
>> >> >> const unsigned char *nh = skb_network_header(skb);
>> >> >>
>> >> >> if (nh[optoff + 1] == 2) {
>> >> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
>> >> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
>> >> >> - return true;
>> >> >> + return TLV_ACCEPT;
>> >> >> }
>> >> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
>> >> >> nh[optoff + 1]);
>> >> >> kfree_skb(skb);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >>
>> >> >> /* Jumbo payload */
>> >> >>
>> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> >> {
>> >> >> const unsigned char *nh = skb_network_header(skb);
>> >> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
>> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >> if (pkt_len <= IPV6_MAXPLEN) {
>> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >> if (ipv6_hdr(skb)->payload_len) {
>> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >>
>> >> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
>> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >> goto drop;
>> >> >>
>> >> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
>> >> >> - return true;
>> >> >> + return TLV_ACCEPT;
>> >> >>
>> >> >> drop:
>> >> >> kfree_skb(skb);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >>
>> >> >> /* CALIPSO RFC 5570 */
>> >> >>
>> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
>> >> >> {
>> >> >> const unsigned char *nh = skb_network_header(skb);
>> >> >>
>> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
>> >> >> optoff)
>> >> >> if (!calipso_validate(skb, nh + optoff))
>> >> >> goto drop;
>> >> >>
>> >> >> - return true;
>> >> >> + return TLV_ACCEPT;
>> >> >>
>> >> >> drop:
>> >> >> kfree_skb(skb);
>> >> >> - return false;
>> >> >> + return TLV_REJECT;
>> >> >> }
>> >> >>
>> >> >> static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
>> >> >>
>> >> >> opt->flags |= IP6SKB_HOPBYHOP;
>> >> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
>> >> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
>> >> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
>> >> >> + true)) {
>> >> >> + /* we need to refresh the length in case
>> >> >> + * at least one TLV was removed
>> >> >> + */
>> >> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
>> >> >> skb->transport_header += extlen;
>> >> >> opt = IP6CB(skb);
>> >> >> opt->nhoff = sizeof(struct ipv6hdr);
>> >> >> --
>> >> >> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs
2020-06-26 17:14 ` Justin Iurman
@ 2020-06-26 18:35 ` Tom Herbert
0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 18:35 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Fri, Jun 26, 2020 at 10:14 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Tom,
> >>
> >> >> Hi Tom,
> >> >>
> >> >> >> Add the possibility to remove one or more consecutive TLVs without
> >> >> >> messing up the alignment of others. For now, only IOAM requires this
> >> >> >> behavior.
> >> >> >>
> >> >> > Hi Justin,
> >> >> >
> >> >> > Can you explain the motivation for this? Per RFC8200, extension
> >> >> > headers in flight are not to be added, removed, or modified outside of
> >> >> > the standard rules for processing modifiable HBH and DO TLVs; that
> >> >> > would include adding and removing TLVs in EH. One obvious problem this
> >> >>
> >> >> As you already know from our last meeting, IOAM may be configured on a node such
> >> >> that a specific IOAM namespace should be removed. Therefore, this patch
> >> >> provides support for the deletion of a TLV (or consecutive TLVs), without
> >> >> removing the entire EH (if it's empty, there will be padding). Note that there
> >> >> is a similar "problem" with the Incremental Trace where you'd need to expand
> >> >> the Hop-by-Hop (not included in this patchset). I agree that RFC 8200 is
> >> >> against modification of in-flight EHs, but there are several reasons that, I
> >> >> believe, mitigate this statement.
> >> >>
> >> >> Let's keep in mind that IOAM's purpose is "private" (= IOAM domain), i.e. not widely
> >> >> deployed on the Internet. We can distinguish two big scenarios: (i) in-transit
> >> >> traffic where it is encapsulated (IPv6-in-IPv6) and (ii) traffic inside the
> >> >> domain, i.e. from an IOAM node inside the domain to another one (no need for
> >> >> encapsulation). In both cases, we kind of own the traffic: (i) encapsulation,
> >> >> so we modify "our" header and (ii) we already own the traffic.
> >> >>
> >> >> And if someone is still angry about this, well, the good news is that such
> >> >> modification can be avoided most of the time. Indeed, operators are advised to
> >> >> remove an IOAM namespace only on egress nodes. This way, the destination
> >> >> (either the tunnel destination or the real destination, depending on the
> >> >> scenario) will receive EHs and take care of them without the need to remove
> >> >> anything. But, again, operators can do what they want and I'd tend to adhere to
> >> >> David's philosophy [1] and give them the possibility to choose what to do.
> >> >>
> >> >
> >> > Justin,
> >> >
> >> > 6man WG has had a _long_ and sometimes bitter discussion around this
> >> > particularly with regards to insertion of SRH. The current consensus
> >> > of IETF is that it is a violation of RFC8200. We've heard all the
> >> > arguments that it's only for limited domains and narrow use cases,
> >> > nevertheless there are several problems that the header
> >> > insertion/deletion advocates never answered-- it breaks AH, it breaks
> >> > PMTU discovery, it breaks ICMP. There is also a risk that a
> >> > non-standard modification could cause a packet to be dropped
> >> > downstream from the node that modifies it. There is no attribution on
> >> > who created the problem, and hence this can lead to systematic
> >> > blackholes which are the most miserable sort of problem to debug.
> >>
> >> Yes, I know the whole story and it's been stormy from what I understood.
> >>
> >> > Fundamentally, it is not robust per Postel's law (I actually wrote a
> >> > draft to try to make it robust in draft-herbert-6man-eh-attrib-00 if
> >> > you're interested).
> >>
> >> Interesting, I'll take a look.
> >>
> >> > IMO, we shouldn't be using Linux as a backdoor to implement protocol
> >> > that IETF is saying isn't robust. Can you point out in the IOAM drafts
> >> > where this requirement is specified, then I can take it up in IOAM WG
> >> > or 6man if needed...
> >>
> >> Well, I wouldn't say that IETF is considering IPv6-IOAM as not robust since [1]
> >> (IPv6 encapsulation for IOAM) and [2] (IOAM data fields) are about to be
> >> published.
> >
> > I was specifically referring to the requirements around removing the
> > IOAM TLV from packets in-flight. I don't readily see that in the IOAM
> > drafts.
>
> Actually, this is not in the draft. The authors wanted to give operators
> some freedom, and such a requirement would restrict their choices, even
> if it is the better, or even the most logical, option we can think of.
> Maybe we could discuss on the IPPM mailing list whether we should add it
> or not? I have two pieces of advice for operators: one about
> encapsulation and this one about the removal of an IOAM option.
>
Justin,
You're welcome to take it up on the IPPM list, but beware there is
going to be pushback. Make sure you can show a clear justification and
how any potential issues it causes are mitigated. If
draft-herbert-6man-eh-attrib-00 facilitates that, we can take a look
at implementing it.
Until we have clarity on the protocol requirements and the need for
this, I don't think this patch should be accepted.
Tom
> > Also, be careful about saying that drafts are about to be published by
> > IETF. Until a draft reaches the RFC editor we really can't say that. I
> > don't believe drafts you're referring to have even made it through
> > WGLC.
>
> Indeed, but draft-ietf-ippm-ioam-data is already at its second WGLC; did
> you miss it on the IPPM mailing list? As for
> draft-ietf-ippm-ioam-ipv6-options, it is just my prediction, but I guess
> it should follow soon as well, given the IANA early allocation (there
> were talks about that in the WG).
>
> Justin
>
> > Tom
> >
> >>
> >> Justin
> >>
> >> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >>
> >> > Tom
> >> >
> >> >> > creates is that it breaks AH if the TLVs are removed in HBH before AH
> >> >> > is processed (AH is processed after HBH).
> >> >>
> >> >> Correct. But I don't think it should prevent us from having IOAM in the kernel.
> >> >> Again, operators could simply apply IOAM on a subset of the traffic that does
> >> >> not include AHs, for example.
> >> >>
> >> >> Justin
> >> >>
> >> >> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg136797.html
> >> >>
> >> >> > Tom
> >> >> >> By default, an 8-octet boundary is automatically assumed. This is the
> >> >> >> price to pay (at most a useless 4-octet padding) to make sure everything
> >> >> >> is still aligned after the removal.
> >> >> >>
> >> >> >> Proof: let's assume for instance the following alignments 2n, 4n and 8n
> >> >> >> respectively for options X, Y and Z, inside a Hop-by-Hop extension
> >> >> >> header.
> >> >> >>
> >> >> >> Example 1:
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | X | X | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | |
> >> >> >> ~ Option to be removed (8 octets) ~
> >> >> >> | |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Y | Y | Y | Y |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Padding | Padding | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Result 1: assuming a 4-octet boundary would work, as well as an 8-octet
> >> >> >> boundary (same result in both cases).
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | X | X | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Y | Y | Y | Y |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Padding | Padding | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Example 2:
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | X | X | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Option to be removed (4 octets) |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Y | Y | Y | Y |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Result 2: assuming a 4-octet boundary WOULD NOT WORK. Indeed, option Z
> >> >> >> would not be 8n-aligned and the Hop-by-Hop size would not be a multiple
> >> >> >> of 8 anymore.
> >> >> >>
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Next header | Hdr Ext Len | X | X |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | X | X | Padding | Padding |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Y | Y | Y | Y |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >> | Z | Z | Z | Z |
> >> >> >> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> >> >> >>
> >> >> >> Therefore, the largest (8-octet) boundary is assumed by default and for
> >> >> >> all, which means that blocks are only moved in multiples of 8. This
> >> >> >> assertion guarantees good alignment.
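
Moving blocks only in multiples of 8 also keeps the header's own length field exact, since Hdr Ext Len counts 8-octet units beyond the first 8 octets (RFC 8200). Below is a small sketch of that bookkeeping. The helper names ext_hdr_octets and hdr_ext_len_after_removal are hypothetical and not part of the patch, which performs the equivalent update directly on the skb.

```c
#include <assert.h>
#include <stdint.h>

/* Total size in octets of an extension header whose Hdr Ext Len field
 * is `hdr_ext_len` (8-octet units, not counting the first 8 octets).
 */
static int ext_hdr_octets(uint8_t hdr_ext_len)
{
	return (hdr_ext_len + 1) << 3;
}

/* New Hdr Ext Len after cutting `rlen` octets out of the header.
 * `rlen` must be a multiple of 8 and must leave at least the first
 * 8 octets of the header in place.
 */
static uint8_t hdr_ext_len_after_removal(uint8_t hdr_ext_len, int rlen)
{
	assert(rlen >= 0 && rlen % 8 == 0);
	assert(rlen < ext_hdr_octets(hdr_ext_len));
	return (uint8_t)(hdr_ext_len - (rlen >> 3));
}
```

Counting the rows of Example 1 gives a 32-octet header (Hdr Ext Len 3); cutting the 8-octet option leaves 24 octets (Hdr Ext Len 2). A 4-octet cut, as in Example 2, has no valid Hdr Ext Len at all, which is the second failure named in Result 2.
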
> >> >> >>
> >> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> >> ---
> >> >> >> net/ipv6/exthdrs.c | 134 ++++++++++++++++++++++++++++++++++++---------
> >> >> >> 1 file changed, 108 insertions(+), 26 deletions(-)
> >> >> >>
> >> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> >> index e9b366994475..f27ab3bf2e0c 100644
> >> >> >> --- a/net/ipv6/exthdrs.c
> >> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> >> @@ -52,17 +52,27 @@
> >> >> >>
> >> >> >> #include <linux/uaccess.h>
> >> >> >>
> >> >> >> -/*
> >> >> >> - * Parsing tlv encoded headers.
> >> >> >> +/* States for TLV parsing functions. */
> >> >> >> +
> >> >> >> +enum {
> >> >> >> + TLV_ACCEPT,
> >> >> >> + TLV_REJECT,
> >> >> >> + TLV_REMOVE,
> >> >> >> + __TLV_MAX
> >> >> >> +};
> >> >> >> +
> >> >> >> +/* Parsing TLV encoded headers.
> >> >> >> *
> >> >> >> - * Parsing function "func" returns true, if parsing succeed
> >> >> >> - * and false, if it failed.
> >> >> >> - * It MUST NOT touch skb->h.
> >> >> >> + * Parsing function "func" returns either:
> >> >> >> + * - TLV_ACCEPT if parsing succeeds
> >> >> >> + * - TLV_REJECT if parsing fails
> >> >> >> + * - TLV_REMOVE if TLV must be removed
> >> >> >> + * It MUST NOT touch skb->h.
> >> >> >> */
> >> >> >>
> >> >> >> struct tlvtype_proc {
> >> >> >> int type;
> >> >> >> - bool (*func)(struct sk_buff *skb, int offset);
> >> >> >> + int (*func)(struct sk_buff *skb, int offset);
> >> >> >> };
> >> >> >>
> >> >> >> /*********************
> >> >> >> @@ -109,19 +119,67 @@ static bool ip6_tlvopt_unknown(struct sk_buff *skb, int
> >> >> >> optoff,
> >> >> >> return false;
> >> >> >> }
> >> >> >>
> >> >> >> +/* Remove one or several consecutive TLVs and recompute offsets, lengths */
> >> >> >> +
> >> >> >> +static int remove_tlv(int start, int end, struct sk_buff *skb)
> >> >> >> +{
> >> >> >> + int len = end - start;
> >> >> >> + int padlen = len % 8;
> >> >> >> + unsigned char *h;
> >> >> >> + int rlen, off;
> >> >> >> + u16 pl_len;
> >> >> >> +
> >> >> >> + rlen = len - padlen;
> >> >> >> + if (rlen) {
> >> >> >> + skb_pull(skb, rlen);
> >> >> >> + memmove(skb_network_header(skb) + rlen, skb_network_header(skb),
> >> >> >> + start);
> >> >> >> + skb_postpull_rcsum(skb, skb_network_header(skb), rlen);
> >> >> >> +
> >> >> >> + skb_reset_network_header(skb);
> >> >> >> + skb_set_transport_header(skb, sizeof(struct ipv6hdr));
> >> >> >> +
> >> >> >> + pl_len = be16_to_cpu(ipv6_hdr(skb)->payload_len) - rlen;
> >> >> >> + ipv6_hdr(skb)->payload_len = cpu_to_be16(pl_len);
> >> >> >> +
> >> >> >> + skb_transport_header(skb)[1] -= rlen >> 3;
> >> >> >> + end -= rlen;
> >> >> >> + }
> >> >> >> +
> >> >> >> + if (padlen) {
> >> >> >> + off = end - padlen;
> >> >> >> + h = skb_network_header(skb);
> >> >> >> +
> >> >> >> + if (padlen == 1) {
> >> >> >> + h[off] = IPV6_TLV_PAD1;
> >> >> >> + } else {
> >> >> >> + padlen -= 2;
> >> >> >> +
> >> >> >> + h[off] = IPV6_TLV_PADN;
> >> >> >> + h[off + 1] = padlen;
> >> >> >> + memset(&h[off + 2], 0, padlen);
> >> >> >> + }
> >> >> >> + }
> >> >> >> +
> >> >> >> + return end;
> >> >> >> +}
> >> >> >> +
> >> >> >> /* Parse tlv encoded option header (hop-by-hop or destination) */
> >> >> >>
> >> >> >> static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >> struct sk_buff *skb,
> >> >> >> - int max_count)
> >> >> >> + int max_count,
> >> >> >> + bool removable)
> >> >> >> {
> >> >> >> int len = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> >> - const unsigned char *nh = skb_network_header(skb);
> >> >> >> + unsigned char *nh = skb_network_header(skb);
> >> >> >> int off = skb_network_header_len(skb);
> >> >> >> const struct tlvtype_proc *curr;
> >> >> >> bool disallow_unknowns = false;
> >> >> >> + int off_remove = 0;
> >> >> >> int tlv_count = 0;
> >> >> >> int padlen = 0;
> >> >> >> + int ret;
> >> >> >>
> >> >> >> if (unlikely(max_count < 0)) {
> >> >> >> disallow_unknowns = true;
> >> >> >> @@ -173,12 +231,14 @@ static bool ip6_parse_tlv(const struct tlvtype_proc
> >> >> >> *procs,
> >> >> >> if (tlv_count > max_count)
> >> >> >> goto bad;
> >> >> >>
> >> >> >> + ret = -1;
> >> >> >> for (curr = procs; curr->type >= 0; curr++) {
> >> >> >> if (curr->type == nh[off]) {
> >> >> >> /* type specific length/alignment
> >> >> >> checks will be performed in the
> >> >> >> func(). */
> >> >> >> - if (curr->func(skb, off) == false)
> >> >> >> + ret = curr->func(skb, off);
> >> >> >> + if (ret == TLV_REJECT)
> >> >> >> return false;
> >> >> >> break;
> >> >> >> }
> >> >> >> @@ -187,6 +247,17 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >> !ip6_tlvopt_unknown(skb, off, disallow_unknowns))
> >> >> >> return false;
> >> >> >>
> >> >> >> + if (removable) {
> >> >> >> + if (ret == TLV_REMOVE) {
> >> >> >> + if (!off_remove)
> >> >> >> + off_remove = off - padlen;
> >> >> >> + } else if (off_remove) {
> >> >> >> + off = remove_tlv(off_remove, off, skb);
> >> >> >> + nh = skb_network_header(skb);
> >> >> >> + off_remove = 0;
> >> >> >> + }
> >> >> >> + }
> >> >> >> +
> >> >> >> padlen = 0;
> >> >> >> break;
> >> >> >> }
> >> >> >> @@ -194,8 +265,13 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >> len -= optlen;
> >> >> >> }
> >> >> >>
> >> >> >> - if (len == 0)
> >> >> >> + if (len == 0) {
> >> >> >> + /* Don't forget last TLV if it must be removed */
> >> >> >> + if (off_remove)
> >> >> >> + remove_tlv(off_remove, off, skb);
> >> >> >> +
> >> >> >> return true;
> >> >> >> + }
> >> >> >> bad:
> >> >> >> kfree_skb(skb);
> >> >> >> return false;
> >> >> >> @@ -206,7 +282,7 @@ static bool ip6_parse_tlv(const struct tlvtype_proc *procs,
> >> >> >> *****************************/
> >> >> >>
> >> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> >> -static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >> {
> >> >> >> struct ipv6_destopt_hao *hao;
> >> >> >> struct inet6_skb_parm *opt = IP6CB(skb);
> >> >> >> @@ -257,11 +333,11 @@ static bool ipv6_dest_hao(struct sk_buff *skb, int optoff)
> >> >> >> if (skb->tstamp == 0)
> >> >> >> __net_timestamp(skb);
> >> >> >>
> >> >> >> - return true;
> >> >> >> + return TLV_ACCEPT;
> >> >> >>
> >> >> >> discard:
> >> >> >> kfree_skb(skb);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >> #endif
> >> >> >>
> >> >> >> @@ -306,7 +382,8 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
> >> >> >> #endif
> >> >> >>
> >> >> >> if (ip6_parse_tlv(tlvprocdestopt_lst, skb,
> >> >> >> - init_net.ipv6.sysctl.max_dst_opts_cnt)) {
> >> >> >> + init_net.ipv6.sysctl.max_dst_opts_cnt,
> >> >> >> + false)) {
> >> >> >> skb->transport_header += extlen;
> >> >> >> opt = IP6CB(skb);
> >> >> >> #if IS_ENABLED(CONFIG_IPV6_MIP6)
> >> >> >> @@ -918,24 +995,24 @@ static inline struct net *ipv6_skb_net(struct sk_buff
> >> >> >> *skb)
> >> >> >>
> >> >> >> /* Router Alert as of RFC 2711 */
> >> >> >>
> >> >> >> -static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> >> {
> >> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >> >>
> >> >> >> if (nh[optoff + 1] == 2) {
> >> >> >> IP6CB(skb)->flags |= IP6SKB_ROUTERALERT;
> >> >> >> memcpy(&IP6CB(skb)->ra, nh + optoff + 2, sizeof(IP6CB(skb)->ra));
> >> >> >> - return true;
> >> >> >> + return TLV_ACCEPT;
> >> >> >> }
> >> >> >> net_dbg_ratelimited("ipv6_hop_ra: wrong RA length %d\n",
> >> >> >> nh[optoff + 1]);
> >> >> >> kfree_skb(skb);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >>
> >> >> >> /* Jumbo payload */
> >> >> >>
> >> >> >> -static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> >> {
> >> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >> >> struct inet6_dev *idev = __in6_dev_get_safely(skb->dev);
> >> >> >> @@ -953,12 +1030,12 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >> if (pkt_len <= IPV6_MAXPLEN) {
> >> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff+2);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >> if (ipv6_hdr(skb)->payload_len) {
> >> >> >> __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
> >> >> >> icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, optoff);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >>
> >> >> >> if (pkt_len > skb->len - sizeof(struct ipv6hdr)) {
> >> >> >> @@ -970,16 +1047,16 @@ static bool ipv6_hop_jumbo(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >> goto drop;
> >> >> >>
> >> >> >> IP6CB(skb)->flags |= IP6SKB_JUMBOGRAM;
> >> >> >> - return true;
> >> >> >> + return TLV_ACCEPT;
> >> >> >>
> >> >> >> drop:
> >> >> >> kfree_skb(skb);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >>
> >> >> >> /* CALIPSO RFC 5570 */
> >> >> >>
> >> >> >> -static bool ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> >> +static int ipv6_hop_calipso(struct sk_buff *skb, int optoff)
> >> >> >> {
> >> >> >> const unsigned char *nh = skb_network_header(skb);
> >> >> >>
> >> >> >> @@ -992,11 +1069,11 @@ static bool ipv6_hop_calipso(struct sk_buff *skb, int
> >> >> >> optoff)
> >> >> >> if (!calipso_validate(skb, nh + optoff))
> >> >> >> goto drop;
> >> >> >>
> >> >> >> - return true;
> >> >> >> + return TLV_ACCEPT;
> >> >> >>
> >> >> >> drop:
> >> >> >> kfree_skb(skb);
> >> >> >> - return false;
> >> >> >> + return TLV_REJECT;
> >> >> >> }
> >> >> >>
> >> >> >> static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> >> @@ -1041,7 +1118,12 @@ int ipv6_parse_hopopts(struct sk_buff *skb)
> >> >> >>
> >> >> >> opt->flags |= IP6SKB_HOPBYHOP;
> >> >> >> if (ip6_parse_tlv(tlvprochopopt_lst, skb,
> >> >> >> - init_net.ipv6.sysctl.max_hbh_opts_cnt)) {
> >> >> >> + init_net.ipv6.sysctl.max_hbh_opts_cnt,
> >> >> >> + true)) {
> >> >> >> + /* we need to refresh the length in case
> >> >> >> + * at least one TLV was removed
> >> >> >> + */
> >> >> >> + extlen = (skb_transport_header(skb)[1] + 1) << 3;
> >> >> >> skb->transport_header += extlen;
> >> >> >> opt = IP6CB(skb);
> >> >> >> opt->nhoff = sizeof(struct ipv6hdr);
> >> >> >> --
> >> >> >> 2.17.1
* [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
2020-06-25 2:32 ` Tom Herbert
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
` (2 subsequent siblings)
4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
To: netdev; +Cc: davem, justin.iurman
Implement the IOAM egress behavior.
According to RFC 8200:
"Extension headers (except for the Hop-by-Hop Options header) are not
processed, inserted, or deleted by any node along a packet's delivery
path, until the packet reaches the node (or each of the set of nodes,
in the case of multicast) identified in the Destination Address field
of the IPv6 header."
Therefore, an ingress node (an IOAM domain border) must encapsulate an
incoming IPv6 packet with another similar IPv6 header that will contain
IOAM data while it traverses the domain. When leaving, the egress node,
another IOAM domain border which is also the tunnel destination, must
decapsulate the packet.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
include/linux/ipv6.h | 1 +
net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 2cb445a8fc9e..5312a718bc7a 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -138,6 +138,7 @@ struct inet6_skb_parm {
#define IP6SKB_HOPBYHOP 32
#define IP6SKB_L3SLAVE 64
#define IP6SKB_JUMBOGRAM 128
+#define IP6SKB_IOAM 256
};
#if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index e96304d8a4a7..8cf75cc5e806 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
bool have_final)
{
+ struct inet6_skb_parm *opt = IP6CB(skb);
const struct inet6_protocol *ipprot;
struct inet6_dev *idev;
unsigned int nhoff;
+ u8 hop_limit;
bool raw;
/*
@@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
} else {
if (!raw) {
if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+ /* IOAM Tunnel Decapsulation
+ * Packet is going to re-enter the stack
+ */
+ if (nexthdr == NEXTHDR_IPV6 &&
+ (opt->flags & IP6SKB_IOAM)) {
+ hop_limit = ipv6_hdr(skb)->hop_limit;
+
+ skb_reset_network_header(skb);
+ skb_reset_transport_header(skb);
+ skb->encapsulation = 0;
+
+ ipv6_hdr(skb)->hop_limit = hop_limit;
+ __skb_tunnel_rx(skb, skb->dev,
+ dev_net(skb->dev));
+
+ netif_rx(skb);
+ goto out;
+ }
+
__IP6_INC_STATS(net, idev,
IPSTATS_MIB_INUNKNOWNPROTOS);
icmpv6_send(skb, ICMPV6_PARAMPROB,
@@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
consume_skb(skb);
}
}
+out:
return;
discard:
--
2.17.1
* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
@ 2020-06-25 2:32 ` Tom Herbert
2020-06-25 17:56 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 2:32 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement the IOAM egress behavior.
>
> According to RFC 8200:
> "Extension headers (except for the Hop-by-Hop Options header) are not
> processed, inserted, or deleted by any node along a packet's delivery
> path, until the packet reaches the node (or each of the set of nodes,
> in the case of multicast) identified in the Destination Address field
> of the IPv6 header."
>
> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> incoming IPv6 packet with another similar IPv6 header that will contain
> IOAM data while it traverses the domain. When leaving, the egress node,
> another IOAM domain border which is also the tunnel destination, must
> decapsulate the packet.
This is just IP in IP encapsulation that happens to be terminated at
an egress node of the IOAM domain. The fact that it's IOAM isn't
germane; this IP in IP is done in a variety of ways. We should be
using the normal protocol handler for NEXTHDR_IPV6 instead of special
case code.
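
The idea of dispatching on the next-header value instead of adding a special case can be pictured with a small user-space sketch. Everything below is hypothetical and only loosely modeled on a per-next-header handler table: the names `register_handler`, `deliver`, and `ip6ip6_rcv` are illustrative assumptions, not kernel APIs, and the handler does nothing beyond a sanity check of the inner header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_IPV6 41   /* IPv6-in-IPv6 next-header value */
#define MAX_PROTOS   256

/* Hypothetical handler type: returns 0 when the payload is accepted. */
typedef int (*proto_handler_t)(const uint8_t *pkt, size_t len);

/* One slot per next-header value; NULL means "no handler registered". */
static proto_handler_t handlers[MAX_PROTOS];

/* Register a handler; fails with -1 if the slot is already taken. */
static int register_handler(uint8_t nexthdr, proto_handler_t h)
{
	assert(h != NULL);
	if (handlers[nexthdr])
		return -1;
	handlers[nexthdr] = h;
	return 0;
}

/* Generic delivery: no per-feature special case, only a table lookup. */
static int deliver(uint8_t nexthdr, const uint8_t *pkt, size_t len)
{
	proto_handler_t h = handlers[nexthdr];

	if (!h)
		return -1;	/* unknown protocol */
	return h(pkt, len);
}

/* Illustrative IPv6-in-IPv6 handler: checks that the payload looks like
 * a fixed 40-byte IPv6 header with version 6, nothing more.
 */
static int ip6ip6_rcv(const uint8_t *pkt, size_t len)
{
	if (len < 40 || (pkt[0] >> 4) != 6)
		return -1;
	return 0;
}
```

With such a table, the decapsulation logic lives in the handler registered for NEXTHDR_IPV6, and the delivery path stays the same whether or not the outer header carried IOAM data.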
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> include/linux/ipv6.h | 1 +
> net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 2cb445a8fc9e..5312a718bc7a 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
> #define IP6SKB_HOPBYHOP 32
> #define IP6SKB_L3SLAVE 64
> #define IP6SKB_JUMBOGRAM 128
> +#define IP6SKB_IOAM 256
> };
>
> #if defined(CONFIG_NET_L3_MASTER_DEV)
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index e96304d8a4a7..8cf75cc5e806 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
> void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> bool have_final)
> {
> + struct inet6_skb_parm *opt = IP6CB(skb);
> const struct inet6_protocol *ipprot;
> struct inet6_dev *idev;
> unsigned int nhoff;
> + u8 hop_limit;
> bool raw;
>
> /*
> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> } else {
> if (!raw) {
> if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> + /* IOAM Tunnel Decapsulation
> + * Packet is going to re-enter the stack
> + */
> + if (nexthdr == NEXTHDR_IPV6 &&
> + (opt->flags & IP6SKB_IOAM)) {
> + hop_limit = ipv6_hdr(skb)->hop_limit;
> +
> + skb_reset_network_header(skb);
> + skb_reset_transport_header(skb);
> + skb->encapsulation = 0;
> +
> + ipv6_hdr(skb)->hop_limit = hop_limit;
> + __skb_tunnel_rx(skb, skb->dev,
> + dev_net(skb->dev));
> +
> + netif_rx(skb);
> + goto out;
> + }
> +
> __IP6_INC_STATS(net, idev,
> IPSTATS_MIB_INUNKNOWNPROTOS);
> icmpv6_send(skb, ICMPV6_PARAMPROB,
> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> consume_skb(skb);
> }
> }
> +out:
> return;
>
> discard:
> --
> 2.17.1
>
* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-25 2:32 ` Tom Herbert
@ 2020-06-25 17:56 ` Justin Iurman
2020-06-26 0:48 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 17:56 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
>> Implement the IOAM egress behavior.
>>
>> According to RFC 8200:
>> "Extension headers (except for the Hop-by-Hop Options header) are not
>> processed, inserted, or deleted by any node along a packet's delivery
>> path, until the packet reaches the node (or each of the set of nodes,
>> in the case of multicast) identified in the Destination Address field
>> of the IPv6 header."
>>
>> Therefore, an ingress node (an IOAM domain border) must encapsulate an
>> incoming IPv6 packet with another similar IPv6 header that will contain
>> IOAM data while it traverses the domain. When leaving, the egress node,
>> another IOAM domain border which is also the tunnel destination, must
>> decapsulate the packet.
>
> This is just IP in IP encapsulation that happens to be terminated at
> an egress node of the IOAM domain. The fact that it's IOAM isn't
> germane; this IP in IP is done in a variety of ways. We should be
> using the normal protocol handler for NEXTHDR_IPV6 instead of special
> case code.
Agree. The reason for this special case code is that I was not aware of a more elegant solution.
Justin
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>> include/linux/ipv6.h | 1 +
>> net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> index 2cb445a8fc9e..5312a718bc7a 100644
>> --- a/include/linux/ipv6.h
>> +++ b/include/linux/ipv6.h
>> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
>> #define IP6SKB_HOPBYHOP 32
>> #define IP6SKB_L3SLAVE 64
>> #define IP6SKB_JUMBOGRAM 128
>> +#define IP6SKB_IOAM 256
>> };
>>
>> #if defined(CONFIG_NET_L3_MASTER_DEV)
>> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> index e96304d8a4a7..8cf75cc5e806 100644
>> --- a/net/ipv6/ip6_input.c
>> +++ b/net/ipv6/ip6_input.c
>> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
>> *));
>> void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>> bool have_final)
>> {
>> + struct inet6_skb_parm *opt = IP6CB(skb);
>> const struct inet6_protocol *ipprot;
>> struct inet6_dev *idev;
>> unsigned int nhoff;
>> + u8 hop_limit;
>> bool raw;
>>
>> /*
>> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> sk_buff *skb, int nexthdr,
>> } else {
>> if (!raw) {
>> if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
>> + /* IOAM Tunnel Decapsulation
>> + * Packet is going to re-enter the stack
>> + */
>> + if (nexthdr == NEXTHDR_IPV6 &&
>> + (opt->flags & IP6SKB_IOAM)) {
>> + hop_limit = ipv6_hdr(skb)->hop_limit;
>> +
>> + skb_reset_network_header(skb);
>> + skb_reset_transport_header(skb);
>> + skb->encapsulation = 0;
>> +
>> + ipv6_hdr(skb)->hop_limit = hop_limit;
>> + __skb_tunnel_rx(skb, skb->dev,
>> + dev_net(skb->dev));
>> +
>> + netif_rx(skb);
>> + goto out;
>> + }
>> +
>> __IP6_INC_STATS(net, idev,
>> IPSTATS_MIB_INUNKNOWNPROTOS);
>> icmpv6_send(skb, ICMPV6_PARAMPROB,
>> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> sk_buff *skb, int nexthdr,
>> consume_skb(skb);
>> }
>> }
>> +out:
>> return;
>>
>> discard:
>> --
>> 2.17.1
* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-25 17:56 ` Justin Iurman
@ 2020-06-26 0:48 ` Tom Herbert
2020-06-26 8:31 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 0:48 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Thu, Jun 25, 2020 at 10:56 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Implement the IOAM egress behavior.
> >>
> >> According to RFC 8200:
> >> "Extension headers (except for the Hop-by-Hop Options header) are not
> >> processed, inserted, or deleted by any node along a packet's delivery
> >> path, until the packet reaches the node (or each of the set of nodes,
> >> in the case of multicast) identified in the Destination Address field
> >> of the IPv6 header."
> >>
> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> >> incoming IPv6 packet with another similar IPv6 header that will contain
> >> IOAM data while it traverses the domain. When leaving, the egress node,
> >> another IOAM domain border which is also the tunnel destination, must
> >> decapsulate the packet.
> >
> > This is just IP in IP encapsulation that happens to be terminated at
> > an egress node of the IOAM domain. The fact that it's IOAM isn't
> > germane; this IP in IP is done in a variety of ways. We should be
> > using the normal protocol handler for NEXTHDR_IPV6 instead of special
> > case code.
>
> Agree. The reason for this special case code is that I was not aware of a more elegant solution.
>
The current implementation might not be what you're looking for since
ip6ip6 wants a tunnel configured. What we really want is more like
anonymous decapsulation, that is just decap the ip6ip6 packet and
resubmit the packet into the stack (this is what your patch is doing).
The idea has been kicked around before, especially in the use case
where we're tunneling across a domain and there could be hundreds of
such tunnels to some device. I think it's generally okay to do this,
although someone might raise security concerns since it sort of
obfuscates the "real packet". Probably makes sense to have a sysctl to
enable this and probably could default to on. Of course, if we do this
the next question is should we also implement anonymous decapsulation
for 44,64,46 tunnels.
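
A minimal user-space sketch of what "anonymous decapsulation" gated by a switch could look like, assuming a boolean `anon_decap_enabled` that stands in for the proposed sysctl. The function name, the flag, and the buffer-based interface are hypothetical and chosen for illustration; jumbograms and inner extension headers are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40
#define NEXTHDR_IPV6 41

/* Hypothetical knob standing in for the sysctl discussed above. */
static bool anon_decap_enabled = true;

/*
 * payload points just past the outer header chain, nexthdr is the last
 * Next Header value seen there. Returns a pointer to the inner IPv6
 * packet (to be resubmitted to the stack) and stores its total length
 * in *inner_len, or returns NULL if the packet must not be decapsulated.
 */
static const uint8_t *anon_decap(uint8_t nexthdr, const uint8_t *payload,
				 size_t len, size_t *inner_len)
{
	size_t plen;

	assert(payload != NULL && inner_len != NULL);

	if (!anon_decap_enabled)
		return NULL;		/* feature switched off */
	if (nexthdr != NEXTHDR_IPV6)
		return NULL;		/* not IPv6-in-IPv6 */
	if (len < IPV6_HDR_LEN || (payload[0] >> 4) != 6)
		return NULL;		/* no valid inner fixed header */

	/* Payload Length is a big-endian 16-bit field at bytes 4-5. */
	plen = ((size_t)payload[4] << 8) | payload[5];
	if (IPV6_HDR_LEN + plen > len)
		return NULL;		/* inner packet is truncated */

	*inner_len = IPV6_HDR_LEN + plen;
	return payload;
}
```

No tunnel device is looked up here: the decision depends only on the switch and on the packet itself, which is what distinguishes this from a configured ip6ip6 tunnel.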
Tom
> Justin
>
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >> include/linux/ipv6.h | 1 +
> >> net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
> >> 2 files changed, 23 insertions(+)
> >>
> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> index 2cb445a8fc9e..5312a718bc7a 100644
> >> --- a/include/linux/ipv6.h
> >> +++ b/include/linux/ipv6.h
> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
> >> #define IP6SKB_HOPBYHOP 32
> >> #define IP6SKB_L3SLAVE 64
> >> #define IP6SKB_JUMBOGRAM 128
> >> +#define IP6SKB_IOAM 256
> >> };
> >>
> >> #if defined(CONFIG_NET_L3_MASTER_DEV)
> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> >> index e96304d8a4a7..8cf75cc5e806 100644
> >> --- a/net/ipv6/ip6_input.c
> >> +++ b/net/ipv6/ip6_input.c
> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
> >> *));
> >> void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >> bool have_final)
> >> {
> >> + struct inet6_skb_parm *opt = IP6CB(skb);
> >> const struct inet6_protocol *ipprot;
> >> struct inet6_dev *idev;
> >> unsigned int nhoff;
> >> + u8 hop_limit;
> >> bool raw;
> >>
> >> /*
> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> sk_buff *skb, int nexthdr,
> >> } else {
> >> if (!raw) {
> >> if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> >> + /* IOAM Tunnel Decapsulation
> >> + * Packet is going to re-enter the stack
> >> + */
> >> + if (nexthdr == NEXTHDR_IPV6 &&
> >> + (opt->flags & IP6SKB_IOAM)) {
> >> + hop_limit = ipv6_hdr(skb)->hop_limit;
> >> +
> >> + skb_reset_network_header(skb);
> >> + skb_reset_transport_header(skb);
> >> + skb->encapsulation = 0;
> >> +
> >> + ipv6_hdr(skb)->hop_limit = hop_limit;
> >> + __skb_tunnel_rx(skb, skb->dev,
> >> + dev_net(skb->dev));
> >> +
> >> + netif_rx(skb);
> >> + goto out;
> >> + }
> >> +
> >> __IP6_INC_STATS(net, idev,
> >> IPSTATS_MIB_INUNKNOWNPROTOS);
> >> icmpv6_send(skb, ICMPV6_PARAMPROB,
> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> sk_buff *skb, int nexthdr,
> >> consume_skb(skb);
> >> }
> >> }
> >> +out:
> >> return;
> >>
> >> discard:
> >> --
> >> 2.17.1
* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-26 0:48 ` Tom Herbert
@ 2020-06-26 8:31 ` Justin Iurman
2020-06-26 15:52 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 8:31 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
Tom,
>> >> Implement the IOAM egress behavior.
>> >>
>> >> According to RFC 8200:
>> >> "Extension headers (except for the Hop-by-Hop Options header) are not
>> >> processed, inserted, or deleted by any node along a packet's delivery
>> >> path, until the packet reaches the node (or each of the set of nodes,
>> >> in the case of multicast) identified in the Destination Address field
>> >> of the IPv6 header."
>> >>
>> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
>> >> incoming IPv6 packet with another similar IPv6 header that will contain
>> >> IOAM data while it traverses the domain. When leaving, the egress node,
>> >> another IOAM domain border which is also the tunnel destination, must
>> >> decapsulate the packet.
>> >
>> > This is just IP in IP encapsulation that happens to be terminated at
>> > an egress node of the IOAM domain. The fact that it's IOAM isn't
>> > germane; this IP in IP is done in a variety of ways. We should be
>> > using the normal protocol handler for NEXTHDR_IPV6 instead of special
>> > case code.
>>
>> Agree. The reason for this special case code is that I was not aware of a more
>> elegant solution.
>>
> The current implementation might not be what you're looking for since
> ip6ip6 wants a tunnel configured. What we really want is more like
> anonymous decapsulation, that is just decap the ip6ip6 packet and
> resubmit the packet into the stack (this is what your patch is doing).
> The idea has been kicked around before, especially in the use case
> where we're tunneling across a domain and there could be hundreds of
> such tunnels to some device. I think it's generally okay to do this,
> although someone might raise security concerns since it sort of
> obfuscates the "real packet". Probably makes sense to have a sysctl to
Indeed. However, in this precise case for IOAM, you don't have security
issues since you would only decap if an IOAM HBH is found in the outer
header, which is only valid if the node is part of the IOAM domain (IOAM
is enabled on its ingress interface). But, for a more generic case, I
agree with the sysctl solution.
> enable this and probably could default to on. Of course, if we do this
> the next question is should we also implement anonymous decapsulation
> for 44,64,46 tunnels.
Interesting question. I'd say that we should only do it if there is at least a use case that is (or will be) part of the kernel.
Justin
> Tom
>
>> Justin
>>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >> include/linux/ipv6.h | 1 +
>> >> net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
>> >> 2 files changed, 23 insertions(+)
>> >>
>> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> >> index 2cb445a8fc9e..5312a718bc7a 100644
>> >> --- a/include/linux/ipv6.h
>> >> +++ b/include/linux/ipv6.h
>> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
>> >> #define IP6SKB_HOPBYHOP 32
>> >> #define IP6SKB_L3SLAVE 64
>> >> #define IP6SKB_JUMBOGRAM 128
>> >> +#define IP6SKB_IOAM 256
>> >> };
>> >>
>> >> #if defined(CONFIG_NET_L3_MASTER_DEV)
>> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> >> index e96304d8a4a7..8cf75cc5e806 100644
>> >> --- a/net/ipv6/ip6_input.c
>> >> +++ b/net/ipv6/ip6_input.c
>> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
>> >> *));
>> >> void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>> >> bool have_final)
>> >> {
>> >> + struct inet6_skb_parm *opt = IP6CB(skb);
>> >> const struct inet6_protocol *ipprot;
>> >> struct inet6_dev *idev;
>> >> unsigned int nhoff;
>> >> + u8 hop_limit;
>> >> bool raw;
>> >>
>> >> /*
>> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> >> sk_buff *skb, int nexthdr,
>> >> } else {
>> >> if (!raw) {
>> >> if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
>> >> + /* IOAM Tunnel Decapsulation
>> >> + * Packet is going to re-enter the stack
>> >> + */
>> >> + if (nexthdr == NEXTHDR_IPV6 &&
>> >> + (opt->flags & IP6SKB_IOAM)) {
>> >> + hop_limit = ipv6_hdr(skb)->hop_limit;
>> >> +
>> >> + skb_reset_network_header(skb);
>> >> + skb_reset_transport_header(skb);
>> >> + skb->encapsulation = 0;
>> >> +
>> >> + ipv6_hdr(skb)->hop_limit = hop_limit;
>> >> + __skb_tunnel_rx(skb, skb->dev,
>> >> + dev_net(skb->dev));
>> >> +
>> >> + netif_rx(skb);
>> >> + goto out;
>> >> + }
>> >> +
>> >> __IP6_INC_STATS(net, idev,
>> >> IPSTATS_MIB_INUNKNOWNPROTOS);
>> >> icmpv6_send(skb, ICMPV6_PARAMPROB,
>> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
>> >> sk_buff *skb, int nexthdr,
>> >> consume_skb(skb);
>> >> }
>> >> }
>> >> +out:
>> >> return;
>> >>
>> >> discard:
>> >> --
>> >> 2.17.1
* Re: [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation
2020-06-26 8:31 ` Justin Iurman
@ 2020-06-26 15:52 ` Tom Herbert
0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 15:52 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Fri, Jun 26, 2020 at 1:31 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> >> Implement the IOAM egress behavior.
> >> >>
> >> >> According to RFC 8200:
> >> >> "Extension headers (except for the Hop-by-Hop Options header) are not
> >> >> processed, inserted, or deleted by any node along a packet's delivery
> >> >> path, until the packet reaches the node (or each of the set of nodes,
> >> >> in the case of multicast) identified in the Destination Address field
> >> >> of the IPv6 header."
> >> >>
> >> >> Therefore, an ingress node (an IOAM domain border) must encapsulate an
> >> >> incoming IPv6 packet with another similar IPv6 header that will contain
> >> >> IOAM data while it traverses the domain. When leaving, the egress node,
> >> >> another IOAM domain border which is also the tunnel destination, must
> >> >> decapsulate the packet.
> >> >
> >> > This is just IP in IP encapsulation that happens to be terminated at
> >> > an egress node of the IOAM domain. The fact that it's IOAM isn't
> >> > germane; this IP in IP is done in a variety of ways. We should be
> >> > using the normal protocol handler for NEXTHDR_IPV6 instead of special
> >> > case code.
> >>
> >> Agree. The reason for this special case code is that I was not aware of a more
> >> elegant solution.
> >>
> > The current implementation might not be what you're looking for since
> > ip6ip6 wants a tunnel configured. What we really want is more like
> > anonymous decapsulation, that is just decap the ip6ip6 packet and
> > resubmit the packet into the stack (this is what your patch is doing).
> > The idea has been kicked around before, especially in the use case
> > where we're tunneling across a domain and there could be hundreds of
> > such tunnels to some device. I think it's generally okay to do this,
> > although someone might raise security concerns since it sort of
> > obfuscates the "real packet". Probably makes sense to have a sysctl to
>
> Indeed. However, in this precise case for IOAM, you don't have security
> issues since you would only decap if an IOAM HBH is found in the outer
> header, which is only valid if the node is part of the IOAM domain (IOAM
> is enabled on its ingress interface). But, for a more generic case, I
> agree with the sysctl solution.
But again there's no such thing as IOAM packets. There are IPv6
packets that have IOAM TLVs in their Hop-by-Hop or Destination
Options. In this case there are IP6IP6 packets that contain an IOAM
TLV in the outer headers, but from a protocol and implementation
perspective there's nothing special about that. The outer headers
could just as easily include an SRH (probably more deployed at this
point) or other options and EHs, or maybe no options. So we need a
generic solution and not one tied to a particular use case of IP6IP6
tunneling.
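
To make the "an IOAM TLV is just one option among others" point concrete, here is a minimal user-space sketch of walking the TLVs of a Hop-by-Hop (or Destination) Options header and looking for one option type. The function name `find_tlv` and the buffer-based interface are assumptions made for illustration; the header length is derived as (Hdr Ext Len + 1) * 8, the same computation the first patch in this series uses to refresh `extlen`. The wanted type is a parameter, so nothing here is specific to IOAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0	/* one-byte padding option, has no length field */

/*
 * hdr points at an options extension header: byte 0 is Next Header,
 * byte 1 is Hdr Ext Len, options start at byte 2.
 * Returns the offset of the first option of type `wanted`, or -1 if it
 * is absent or the header is malformed. Pad1 itself cannot be searched
 * for, since it is skipped before the type comparison.
 */
static int find_tlv(const uint8_t *hdr, size_t buflen, uint8_t wanted)
{
	size_t extlen, off = 2;

	assert(hdr != NULL);

	if (buflen < 2)
		return -1;
	extlen = ((size_t)hdr[1] + 1) << 3;	/* length in octets */
	if (extlen > buflen)
		return -1;

	while (off < extlen) {
		uint8_t type = hdr[off];
		size_t optlen;

		if (type == TLV_PAD1) {
			off++;
			continue;
		}
		if (off + 2 > extlen)
			return -1;		/* no room for the length byte */
		optlen = (size_t)hdr[off + 1] + 2;	/* type + len + data */
		if (off + optlen > extlen)
			return -1;		/* option overruns the header */
		if (type == wanted)
			return (int)off;
		off += optlen;
	}
	return -1;
}
```

The same walk applies whether the packet is tunneled or not, which is why the presence of a given TLV does not by itself define a distinct kind of packet.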
Tom
>
> > enable this and probably could default to on. Of course, if we do this
> > the next question is should we also implement anonymous decapsulation
> > for 44,64,46 tunnels.
>
> Interesting question. I'd say that we should only do it if there is at least a use case that is (or will be) part of the kernel.
>
> Justin
>
> > Tom
> >
> >> Justin
> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >> include/linux/ipv6.h | 1 +
> >> >> net/ipv6/ip6_input.c | 22 ++++++++++++++++++++++
> >> >> 2 files changed, 23 insertions(+)
> >> >>
> >> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> >> index 2cb445a8fc9e..5312a718bc7a 100644
> >> >> --- a/include/linux/ipv6.h
> >> >> +++ b/include/linux/ipv6.h
> >> >> @@ -138,6 +138,7 @@ struct inet6_skb_parm {
> >> >> #define IP6SKB_HOPBYHOP 32
> >> >> #define IP6SKB_L3SLAVE 64
> >> >> #define IP6SKB_JUMBOGRAM 128
> >> >> +#define IP6SKB_IOAM 256
> >> >> };
> >> >>
> >> >> #if defined(CONFIG_NET_L3_MASTER_DEV)
> >> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> >> >> index e96304d8a4a7..8cf75cc5e806 100644
> >> >> --- a/net/ipv6/ip6_input.c
> >> >> +++ b/net/ipv6/ip6_input.c
> >> >> @@ -361,9 +361,11 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff
> >> >> *));
> >> >> void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >> >> bool have_final)
> >> >> {
> >> >> + struct inet6_skb_parm *opt = IP6CB(skb);
> >> >> const struct inet6_protocol *ipprot;
> >> >> struct inet6_dev *idev;
> >> >> unsigned int nhoff;
> >> >> + u8 hop_limit;
> >> >> bool raw;
> >> >>
> >> >> /*
> >> >> @@ -450,6 +452,25 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> >> sk_buff *skb, int nexthdr,
> >> >> } else {
> >> >> if (!raw) {
> >> >> if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
> >> >> + /* IOAM Tunnel Decapsulation
> >> >> + * Packet is going to re-enter the stack
> >> >> + */
> >> >> + if (nexthdr == NEXTHDR_IPV6 &&
> >> >> + (opt->flags & IP6SKB_IOAM)) {
> >> >> + hop_limit = ipv6_hdr(skb)->hop_limit;
> >> >> +
> >> >> + skb_reset_network_header(skb);
> >> >> + skb_reset_transport_header(skb);
> >> >> + skb->encapsulation = 0;
> >> >> +
> >> >> + ipv6_hdr(skb)->hop_limit = hop_limit;
> >> >> + __skb_tunnel_rx(skb, skb->dev,
> >> >> + dev_net(skb->dev));
> >> >> +
> >> >> + netif_rx(skb);
> >> >> + goto out;
> >> >> + }
> >> >> +
> >> >> __IP6_INC_STATS(net, idev,
> >> >> IPSTATS_MIB_INUNKNOWNPROTOS);
> >> >> icmpv6_send(skb, ICMPV6_PARAMPROB,
> >> >> @@ -461,6 +482,7 @@ void ip6_protocol_deliver_rcu(struct net *net, struct
> >> >> sk_buff *skb, int nexthdr,
> >> >> consume_skb(skb);
> >> >> }
> >> >> }
> >> >> +out:
> >> >> return;
> >> >>
> >> >> discard:
> >> >> --
> >> >> 2.17.1
* [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
2020-06-24 19:23 ` [PATCH net-next 1/5] ipv6: eh: Introduce removable TLVs Justin Iurman
2020-06-24 19:23 ` [PATCH net-next 2/5] ipv6: IOAM tunnel decapsulation Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
2020-06-24 21:37 ` kernel test robot
` (4 more replies)
2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
4 siblings, 5 replies; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
To: netdev; +Cc: davem, justin.iurman
Implement support for processing the IOAM Pre-allocated Trace with IPv6,
see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
packets. Default is drop.
Another per-interface sysctl ioam6_id is provided to define the IOAM
(unique) identifier of the interface.
A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
identifier of the node.
Two relativistic hash tables: one for IOAM namespaces, the other for
IOAM schemas. A namespace can only have a single active schema and a
schema can only be attached to a single namespace (1:1 relationship).
[1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
[2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
[3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
include/linux/ipv6.h | 2 +
include/net/ioam6.h | 98 +++++++++++
include/net/netns/ipv6.h | 2 +
include/uapi/linux/in6.h | 1 +
include/uapi/linux/ipv6.h | 2 +
net/ipv6/Makefile | 2 +-
net/ipv6/addrconf.c | 20 +++
net/ipv6/af_inet6.c | 7 +
net/ipv6/exthdrs.c | 67 ++++++++
net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
net/ipv6/sysctl_net_ipv6.c | 7 +
11 files changed, 533 insertions(+), 1 deletion(-)
create mode 100644 include/net/ioam6.h
create mode 100644 net/ipv6/ioam6.c
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 5312a718bc7a..15732f964c6e 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -75,6 +75,8 @@ struct ipv6_devconf {
__s32 disable_policy;
__s32 ndisc_tclass;
__s32 rpl_seg_enabled;
+ __u32 ioam6_enabled;
+ __u32 ioam6_id;
struct ctl_table_header *sysctl_header;
};
diff --git a/include/net/ioam6.h b/include/net/ioam6.h
new file mode 100644
index 000000000000..2a910bc99947
--- /dev/null
+++ b/include/net/ioam6.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * IOAM IPv6 implementation
+ *
+ * Author:
+ * Justin Iurman <justin.iurman@uliege.be>
+ */
+
+#ifndef _NET_IOAM6_H
+#define _NET_IOAM6_H
+
+#include <linux/net.h>
+#include <linux/ipv6.h>
+#include <linux/rhashtable-types.h>
+
+#define IOAM6_OPT_TRACE_PREALLOC 0
+
+#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
+
+#define IOAM6_TRACE_TYPE0 (1U << 31)
+#define IOAM6_TRACE_TYPE1 (1 << 30)
+#define IOAM6_TRACE_TYPE2 (1 << 29)
+#define IOAM6_TRACE_TYPE3 (1 << 28)
+#define IOAM6_TRACE_TYPE4 (1 << 27)
+#define IOAM6_TRACE_TYPE5 (1 << 26)
+#define IOAM6_TRACE_TYPE6 (1 << 25)
+#define IOAM6_TRACE_TYPE7 (1 << 24)
+#define IOAM6_TRACE_TYPE8 (1 << 23)
+#define IOAM6_TRACE_TYPE9 (1 << 22)
+#define IOAM6_TRACE_TYPE10 (1 << 21)
+#define IOAM6_TRACE_TYPE11 (1 << 20)
+#define IOAM6_TRACE_TYPE22 (1 << 9)
+
+#define IOAM6_EMPTY_FIELD_u16 0xffff
+#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
+#define IOAM6_EMPTY_FIELD_u32 0xffffffff
+#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
+#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
+
+struct ioam6_common_hdr {
+ u8 opt_type;
+ u8 opt_len;
+ u8 res;
+ u8 ioam_type;
+ __be16 namespace_id;
+} __packed;
+
+struct ioam6_trace_hdr {
+ __be16 info;
+ __be32 type;
+} __packed;
+
+struct ioam6_namespace {
+ struct rhash_head head;
+ struct rcu_head rcu;
+
+ __be16 id;
+ __be64 data;
+ bool remove_tlv;
+
+ struct ioam6_schema *schema;
+};
+
+struct ioam6_schema {
+ struct rhash_head head;
+ struct rcu_head rcu;
+
+ u32 id;
+ int len;
+ __be32 hdr;
+ u8 *data;
+
+ struct ioam6_namespace *ns;
+};
+
+struct ioam6_pernet_data {
+ struct mutex lock;
+ struct rhashtable namespaces;
+ struct rhashtable schemas;
+};
+
+static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+ return net->ipv6.ioam6_data;
+#else
+ return NULL;
+#endif
+}
+
+extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
+extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
+ struct ioam6_namespace *ns);
+
+extern int ioam6_init(void);
+extern void ioam6_exit(void);
+
+#endif
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 5ec054473d81..89b27fa721f4 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
int max_hbh_opts_len;
int seg6_flowlabel;
bool skip_notify_on_dev_down;
+ unsigned int ioam6_id;
};
struct netns_ipv6 {
@@ -115,6 +116,7 @@ struct netns_ipv6 {
spinlock_t lock;
u32 seq;
} ip6addrlbl_table;
+ struct ioam6_pernet_data *ioam6_data;
};
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index 9f2273a08356..1c98435220c9 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -145,6 +145,7 @@ struct in6_flowlabel_req {
#define IPV6_TLV_PADN 1
#define IPV6_TLV_ROUTERALERT 5
#define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
+#define IPV6_TLV_IOAM_HOPOPTS 49
#define IPV6_TLV_JUMBO 194
#define IPV6_TLV_HAO 201 /* home address option */
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 13e8751bf24a..eb521b2dd885 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -189,6 +189,8 @@ enum {
DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
DEVCONF_NDISC_TCLASS,
DEVCONF_RPL_SEG_ENABLED,
+ DEVCONF_IOAM6_ENABLED,
+ DEVCONF_IOAM6_ID,
DEVCONF_MAX
};
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index cf7b47bdb9b3..b7ef10d417d6 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
- udp_offload.o seg6.o fib6_notifier.o rpl.o
+ udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 840bfdb3d7bd..6c952a28ade2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
+ .ioam6_enabled = 0,
+ .ioam6_id = 0,
};
static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
+ .ioam6_enabled = 0,
+ .ioam6_id = 0,
};
/* Check if link is ready: is it up and is a valid qdisc available */
@@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
+ array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
+ array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
}
static inline size_t inet6_ifla6_size(void)
@@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+ {
+ .procname = "ioam6_enabled",
+ .data = &ipv6_devconf.ioam6_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "ioam6_id",
+ .data = &ipv6_devconf.ioam6_id,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
{
/* sentinel */
}
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index b304b882e031..63a9ffc4b283 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -62,6 +62,7 @@
#include <net/rpl.h>
#include <net/compat.h>
#include <net/xfrm.h>
+#include <net/ioam6.h>
#include <linux/uaccess.h>
#include <linux/mroute6.h>
@@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
if (err)
goto rpl_fail;
+ err = ioam6_init();
+ if (err)
+ goto ioam6_fail;
+
err = igmp6_late_init();
if (err)
goto igmp6_late_err;
@@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
#endif
igmp6_late_err:
+ ioam6_exit();
+ioam6_fail:
rpl_exit();
rpl_fail:
seg6_exit();
seg6_fail:
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index f27ab3bf2e0c..00aee1358f1c 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -49,6 +49,8 @@
#include <net/seg6_hmac.h>
#endif
#include <net/rpl.h>
+#include <net/ioam6.h>
+#include <net/dst_metadata.h>
#include <linux/uaccess.h>
@@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
return TLV_REJECT;
}
+/* IOAM */
+
+static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
+{
+ struct ioam6_common_hdr *ioamh;
+ struct ioam6_namespace *ns;
+
+ /* Must be 4n-aligned */
+ if (optoff & 3)
+ goto drop;
+
+ if (!skb_valid_dst(skb))
+ ip6_route_input(skb);
+
+ /* IOAM must be enabled on ingress interface */
+ if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
+ goto drop;
+
+ ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
+ ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
+
+ /* Unknown IOAM namespace, either:
+ * - Drop it if IOAM is not enabled on egress interface (if any)
+ * - Ignore it otherwise
+ */
+ if (!ns) {
+ if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
+ !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+ goto drop;
+
+ goto accept;
+ }
+
+ if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+ goto remove;
+
+ /* Known IOAM namespace which must not be removed:
+ * IOAM must be enabled on egress interface
+ */
+ if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
+ !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
+ goto drop;
+
+ switch (ioamh->ioam_type) {
+ case IOAM6_OPT_TRACE_PREALLOC:
+ ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
+ IP6CB(skb)->flags |= IP6SKB_IOAM;
+ break;
+ default:
+ break;
+ }
+
+accept:
+ return TLV_ACCEPT;
+remove:
+ return TLV_REMOVE;
+drop:
+ kfree_skb(skb);
+ return TLV_REJECT;
+}
+
/* Jumbo payload */
static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
@@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
.type = IPV6_TLV_ROUTERALERT,
.func = ipv6_hop_ra,
},
+ {
+ .type = IPV6_TLV_IOAM_HOPOPTS,
+ .func = ipv6_hop_ioam,
+ },
{
.type = IPV6_TLV_JUMBO,
.func = ipv6_hop_jumbo,
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
new file mode 100644
index 000000000000..406aa78eb504
--- /dev/null
+++ b/net/ipv6/ioam6.c
@@ -0,0 +1,326 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * IOAM IPv6 implementation
+ *
+ * Author:
+ * Justin Iurman <justin.iurman@uliege.be>
+ */
+
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/net.h>
+#include <linux/rhashtable.h>
+
+#include <net/addrconf.h>
+#include <net/ioam6.h>
+
+static inline void ioam6_ns_release(struct ioam6_namespace *ns)
+{
+ kfree_rcu(ns, rcu);
+}
+
+static inline void ioam6_sc_release(struct ioam6_schema *sc)
+{
+ kfree_rcu(sc, rcu);
+}
+
+static void ioam6_free_ns(void *ptr, void *arg)
+{
+ struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
+
+ if (ns)
+ ioam6_ns_release(ns);
+}
+
+static void ioam6_free_sc(void *ptr, void *arg)
+{
+ struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
+
+ if (sc)
+ ioam6_sc_release(sc);
+}
+
+static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+ const struct ioam6_namespace *ns = obj;
+
+ return (ns->id != *(__be16 *)arg->key);
+}
+
+static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
+{
+ const struct ioam6_schema *sc = obj;
+
+ return (sc->id != *(u32 *)arg->key);
+}
+
+static const struct rhashtable_params rht_ns_params = {
+ .key_len = sizeof(__be16),
+ .key_offset = offsetof(struct ioam6_namespace, id),
+ .head_offset = offsetof(struct ioam6_namespace, head),
+ .automatic_shrinking = true,
+ .obj_cmpfn = ioam6_ns_cmpfn,
+};
+
+static const struct rhashtable_params rht_sc_params = {
+ .key_len = sizeof(u32),
+ .key_offset = offsetof(struct ioam6_schema, id),
+ .head_offset = offsetof(struct ioam6_schema, head),
+ .automatic_shrinking = true,
+ .obj_cmpfn = ioam6_sc_cmpfn,
+};
+
+struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
+{
+ struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
+
+ return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
+}
+
+void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
+ u32 trace_type, struct ioam6_namespace *ns)
+{
+ u8 *data = skb_network_header(skb) + nodeoff;
+ struct __kernel_sock_timeval ts;
+ u64 raw_u64;
+ u32 raw_u32;
+ u16 raw_u16;
+ u8 byte;
+
+ /* hop_lim and node_id */
+ if (trace_type & IOAM6_TRACE_TYPE0) {
+ byte = ipv6_hdr(skb)->hop_limit - 1;
+ raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
+ if (!raw_u32)
+ raw_u32 = IOAM6_EMPTY_FIELD_u24;
+ else
+ raw_u32 &= IOAM6_EMPTY_FIELD_u24;
+ *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
+ data += sizeof(__be32);
+ }
+
+ /* ingress_if_id and egress_if_id */
+ if (trace_type & IOAM6_TRACE_TYPE1) {
+ raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
+ if (!raw_u16)
+ raw_u16 = IOAM6_EMPTY_FIELD_u16;
+ *(__be16 *)data = cpu_to_be16(raw_u16);
+ data += sizeof(__be16);
+
+ raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
+ if (!raw_u16)
+ raw_u16 = IOAM6_EMPTY_FIELD_u16;
+ *(__be16 *)data = cpu_to_be16(raw_u16);
+ data += sizeof(__be16);
+ }
+
+ /* timestamp seconds */
+ if (trace_type & IOAM6_TRACE_TYPE2) {
+ if (!skb->tstamp) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ } else {
+ skb_get_new_timestamp(skb, &ts);
+ *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
+ }
+ data += sizeof(__be32);
+ }
+
+ /* timestamp subseconds */
+ if (trace_type & IOAM6_TRACE_TYPE3) {
+ if (!skb->tstamp) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ } else {
+ if (!(trace_type & IOAM6_TRACE_TYPE2))
+ skb_get_new_timestamp(skb, &ts);
+ *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
+ }
+ data += sizeof(__be32);
+ }
+
+ /* transit delay */
+ if (trace_type & IOAM6_TRACE_TYPE4) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ data += sizeof(__be32);
+ }
+
+ /* namespace data */
+ if (trace_type & IOAM6_TRACE_TYPE5) {
+ *(__be32 *)data = (__be32)ns->data;
+ data += sizeof(__be32);
+ }
+
+ /* queue depth */
+ if (trace_type & IOAM6_TRACE_TYPE6) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ data += sizeof(__be32);
+ }
+
+ /* hop_lim and node_id (wide) */
+ if (trace_type & IOAM6_TRACE_TYPE7) {
+ byte = ipv6_hdr(skb)->hop_limit - 1;
+ raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
+ if (!raw_u64)
+ raw_u64 = IOAM6_EMPTY_FIELD_u56;
+ else
+ raw_u64 &= IOAM6_EMPTY_FIELD_u56;
+ *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
+ data += sizeof(__be64);
+ }
+
+ /* ingress_if_id and egress_if_id (wide) */
+ if (trace_type & IOAM6_TRACE_TYPE8) {
+ raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
+ if (!raw_u32)
+ raw_u32 = IOAM6_EMPTY_FIELD_u32;
+ *(__be32 *)data = cpu_to_be32(raw_u32);
+ data += sizeof(__be32);
+
+ raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
+ if (!raw_u32)
+ raw_u32 = IOAM6_EMPTY_FIELD_u32;
+ *(__be32 *)data = cpu_to_be32(raw_u32);
+ data += sizeof(__be32);
+ }
+
+ /* namespace data (wide) */
+ if (trace_type & IOAM6_TRACE_TYPE9) {
+ *(__be64 *)data = ns->data;
+ data += sizeof(__be64);
+ }
+
+ /* buffer occupancy */
+ if (trace_type & IOAM6_TRACE_TYPE10) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ data += sizeof(__be32);
+ }
+
+ /* checksum complement */
+ if (trace_type & IOAM6_TRACE_TYPE11) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
+ data += sizeof(__be32);
+ }
+
+ /* opaque state snapshot */
+ if (trace_type & IOAM6_TRACE_TYPE22) {
+ if (!ns->schema) {
+ *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
+ } else {
+ *(__be32 *)data = ns->schema->hdr;
+ data += sizeof(__be32);
+ memcpy(data, ns->schema->data, ns->schema->len);
+ }
+ }
+}
+
+void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
+ struct ioam6_namespace *ns)
+{
+ u8 nodelen, flags, remlen, sclen = 0;
+ struct ioam6_trace_hdr *trh;
+ int nodeoff;
+ u16 info;
+ u32 type;
+
+ trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
+ info = be16_to_cpu(trh->info);
+ type = be32_to_cpu(trh->type);
+
+ nodelen = info >> 11;
+ flags = (info >> 7) & 0xf;
+ remlen = info & 0x7f;
+
+ /* Skip if Overflow bit is set OR
+ * if an unknown type (bit 12-21) is set
+ */
+ if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
+ return;
+
+ /* NodeLen does not include Opaque State Snapshot length. We need to
+ * take it into account if the corresponding bit is set and if current
+ * IOAM namespace has an active schema attached to it
+ */
+ if (type & IOAM6_TRACE_TYPE22) {
+ /* Opaque State Snapshot header size */
+ sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
+
+ if (ns->schema)
+ sclen += ns->schema->len / 4;
+ }
+
+ /* Not enough space remaining: set Overflow bit and skip */
+ if (!remlen || remlen < (nodelen + sclen)) {
+ info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
+ trh->info = cpu_to_be16(info);
+ return;
+ }
+
+ nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
+ ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
+
+ /* Update RemainingLen */
+ remlen -= nodelen + sclen;
+ info = (info & 0xff80) | remlen;
+ trh->info = cpu_to_be16(info);
+}
+
+static int __net_init ioam6_net_init(struct net *net)
+{
+ struct ioam6_pernet_data *nsdata;
+ int err = -ENOMEM;
+
+ nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
+ if (!nsdata)
+ goto out;
+
+ mutex_init(&nsdata->lock);
+ net->ipv6.ioam6_data = nsdata;
+
+ err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
+ if (err)
+ goto free_nsdata;
+
+ err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
+ if (err)
+ goto free_rht_ns;
+
+out:
+ return err;
+free_rht_ns:
+ rhashtable_destroy(&nsdata->namespaces);
+free_nsdata:
+ kfree(nsdata);
+ net->ipv6.ioam6_data = NULL;
+ goto out;
+}
+
+static void __net_exit ioam6_net_exit(struct net *net)
+{
+ struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
+
+ rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
+ rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
+
+ kfree(nsdata);
+}
+
+static struct pernet_operations ioam6_net_ops = {
+ .init = ioam6_net_init,
+ .exit = ioam6_net_exit,
+};
+
+int __init ioam6_init(void)
+{
+ int err = register_pernet_subsys(&ioam6_net_ops);
+
+ if (err)
+ return err;
+
+ pr_info("In-situ OAM (IOAM) with IPv6\n");
+ return 0;
+}
+
+void ioam6_exit(void)
+{
+ unregister_pernet_subsys(&ioam6_net_ops);
+}
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index fac2135aa47b..da49b33ab6fc 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "ioam6_id",
+ .data = &init_net.ipv6.sysctl.ioam6_id,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{ }
};
--
2.17.1
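For readers following the RemainingLen accounting in ioam6_fill_trace_data(), the 16-bit info field is decoded by the patch as NodeLen (top 5 bits), four flag bits, and RemainingLen (low 7 bits), with the lengths counted in 4-octet units. Below is a minimal userspace sketch of that decoding and of the "reserve space or set Overflow" step. The names trace_info, trace_info_unpack() and trace_info_consume() are hypothetical and not part of the patch, and the field is assumed to be in host byte order already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_FLAG_OVERFLOW (1u << 3)	/* mirrors IOAM6_TRACE_FLAG_OVERFLOW */

/* Hypothetical decoded view of the 16-bit trace info field. */
struct trace_info {
	uint8_t nodelen;	/* per-node data length, in 4-octet units */
	uint8_t flags;		/* 4 flag bits */
	uint8_t remlen;		/* remaining space, in 4-octet units */
};

static struct trace_info trace_info_unpack(uint16_t info)
{
	struct trace_info ti = {
		.nodelen = (uint8_t)(info >> 11),
		.flags = (uint8_t)((info >> 7) & 0xf),
		.remlen = (uint8_t)(info & 0x7f),
	};

	return ti;
}

/* Reserve nodelen + sclen units in the pre-allocated area.
 * Returns true and updates RemainingLen on success. If the space is
 * exhausted, sets the Overflow flag and returns false. If Overflow was
 * already set, leaves *info untouched and returns false.
 */
static bool trace_info_consume(uint16_t *info, uint8_t sclen)
{
	struct trace_info ti;
	unsigned int need;

	assert(info != NULL);
	ti = trace_info_unpack(*info);
	need = (unsigned int)ti.nodelen + sclen;

	if (ti.flags & TRACE_FLAG_OVERFLOW)
		return false;

	if (!ti.remlen || ti.remlen < need) {
		*info = (uint16_t)(*info | (TRACE_FLAG_OVERFLOW << 7));
		return false;
	}

	*info = (uint16_t)((*info & 0xff80) | (ti.remlen - need));
	return true;
}
```

Starting from NodeLen = 2 and RemainingLen = 6, three nodes can record their data; the fourth call finds RemainingLen at zero and sets the Overflow bit instead.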
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 21:37 ` kernel test robot
2020-06-24 23:11 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 21:37 UTC (permalink / raw)
To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman
[-- Attachment #1: Type: text/plain, Size: 7932 bytes --]
Hi Justin,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: um-allmodconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=um
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
cc1: warning: arch/um/include/uapi: No such file or directory [-Wmissing-include-dirs]
In file included from include/linux/uaccess.h:11,
from include/linux/sched/task.h:11,
from include/linux/sched/signal.h:9,
from include/linux/rcuwait.h:6,
from include/linux/percpu-rwsem.h:7,
from include/linux/fs.h:33,
from include/linux/net.h:23,
from net/ipv6/ioam6.c:12:
arch/um/include/asm/uaccess.h: In function '__access_ok':
arch/um/include/asm/uaccess.h:17:29: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
17 | (((unsigned long) (addr) >= FIXADDR_USER_START) && \
| ^~
arch/um/include/asm/uaccess.h:45:3: note: in expansion of macro '__access_ok_vsyscall'
45 | __access_ok_vsyscall(addr, size) ||
| ^~~~~~~~~~~~~~~~~~~~
In file included from include/linux/kernel.h:11,
from net/ipv6/ioam6.c:11:
include/asm-generic/fixmap.h: In function 'fix_to_virt':
include/asm-generic/fixmap.h:32:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
32 | BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
| ^~
include/linux/compiler.h:372:9: note: in definition of macro '__compiletime_assert'
372 | if (!(condition)) \
| ^~~~~~~~~
include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
392 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
include/asm-generic/fixmap.h:32:2: note: in expansion of macro 'BUILD_BUG_ON'
32 | BUILD_BUG_ON(idx >= __end_of_fixed_addresses);
| ^~~~~~~~~~~~
net/ipv6/ioam6.c: At top level:
>> net/ipv6/ioam6.c:81:6: warning: no previous prototype for 'ioam6_fill_trace_data_node' [-Wmissing-prototypes]
81 | void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
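A note on the warning above: -Wmissing-prototypes fires when a function with external linkage is defined without a previous declaration. The usual remedies are to declare it in a header or, if it is only used within one file, to make it static. The sketch below illustrates the second option in plain userspace C; demo_fill_trace() and demo_fill_node() are hypothetical stand-ins, not the functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Public entry point: its prototype is visible before the definition,
 * as it would be when declared in a shared header.
 */
size_t demo_fill_trace(uint8_t *buf, size_t len);

/* File-local helper: 'static' gives it internal linkage, so the compiler
 * does not expect a previous prototype and -Wmissing-prototypes is quiet.
 */
static size_t demo_fill_node(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = 0xff;	/* all-ones filler, like the "empty field" values */

	return len;
}

size_t demo_fill_trace(uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	return demo_fill_node(buf, len);
}
```

Building this with "gcc -Wmissing-prototypes" should produce no warning for either function; removing the static keyword from the helper would be expected to bring the warning back.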
vim +/ioam6_fill_trace_data_node +81 net/ipv6/ioam6.c
80
> 81 void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
82 u32 trace_type, struct ioam6_namespace *ns)
83 {
84 u8 *data = skb_network_header(skb) + nodeoff;
85 struct __kernel_sock_timeval ts;
86 u64 raw_u64;
87 u32 raw_u32;
88 u16 raw_u16;
89 u8 byte;
90
91 /* hop_lim and node_id */
92 if (trace_type & IOAM6_TRACE_TYPE0) {
93 byte = ipv6_hdr(skb)->hop_limit - 1;
94 raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
95 if (!raw_u32)
96 raw_u32 = IOAM6_EMPTY_FIELD_u24;
97 else
98 raw_u32 &= IOAM6_EMPTY_FIELD_u24;
99 *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
100 data += sizeof(__be32);
101 }
102
103 /* ingress_if_id and egress_if_id */
104 if (trace_type & IOAM6_TRACE_TYPE1) {
105 raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
106 if (!raw_u16)
107 raw_u16 = IOAM6_EMPTY_FIELD_u16;
108 *(__be16 *)data = cpu_to_be16(raw_u16);
109 data += sizeof(__be16);
110
111 raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
112 if (!raw_u16)
113 raw_u16 = IOAM6_EMPTY_FIELD_u16;
114 *(__be16 *)data = cpu_to_be16(raw_u16);
115 data += sizeof(__be16);
116 }
117
118 /* timestamp seconds */
119 if (trace_type & IOAM6_TRACE_TYPE2) {
120 if (!skb->tstamp) {
121 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
122 } else {
123 skb_get_new_timestamp(skb, &ts);
124 *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
125 }
126 data += sizeof(__be32);
127 }
128
129 /* timestamp subseconds */
130 if (trace_type & IOAM6_TRACE_TYPE3) {
131 if (!skb->tstamp) {
132 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
133 } else {
134 if (!(trace_type & IOAM6_TRACE_TYPE2))
135 skb_get_new_timestamp(skb, &ts);
136 *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
137 }
138 data += sizeof(__be32);
139 }
140
141 /* transit delay */
142 if (trace_type & IOAM6_TRACE_TYPE4) {
143 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
144 data += sizeof(__be32);
145 }
146
147 /* namespace data */
148 if (trace_type & IOAM6_TRACE_TYPE5) {
149 *(__be32 *)data = (__be32)ns->data;
150 data += sizeof(__be32);
151 }
152
153 /* queue depth */
154 if (trace_type & IOAM6_TRACE_TYPE6) {
155 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
156 data += sizeof(__be32);
157 }
158
159 /* hop_lim and node_id (wide) */
160 if (trace_type & IOAM6_TRACE_TYPE7) {
161 byte = ipv6_hdr(skb)->hop_limit - 1;
162 raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
163 if (!raw_u64)
164 raw_u64 = IOAM6_EMPTY_FIELD_u56;
165 else
166 raw_u64 &= IOAM6_EMPTY_FIELD_u56;
167 *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
168 data += sizeof(__be64);
169 }
170
171 /* ingress_if_id and egress_if_id (wide) */
172 if (trace_type & IOAM6_TRACE_TYPE8) {
173 raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
174 if (!raw_u32)
175 raw_u32 = IOAM6_EMPTY_FIELD_u32;
176 *(__be32 *)data = cpu_to_be32(raw_u32);
177 data += sizeof(__be32);
178
179 raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
180 if (!raw_u32)
181 raw_u32 = IOAM6_EMPTY_FIELD_u32;
182 *(__be32 *)data = cpu_to_be32(raw_u32);
183 data += sizeof(__be32);
184 }
185
186 /* namespace data (wide) */
187 if (trace_type & IOAM6_TRACE_TYPE9) {
188 *(__be64 *)data = ns->data;
189 data += sizeof(__be64);
190 }
191
192 /* buffer occupancy */
193 if (trace_type & IOAM6_TRACE_TYPE10) {
194 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
195 data += sizeof(__be32);
196 }
197
198 /* checksum complement */
199 if (trace_type & IOAM6_TRACE_TYPE11) {
200 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
201 data += sizeof(__be32);
202 }
203
204 /* opaque state snapshot */
205 if (trace_type & IOAM6_TRACE_TYPE22) {
206 if (!ns->schema) {
207 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
208 } else {
209 *(__be32 *)data = ns->schema->hdr;
210 data += sizeof(__be32);
211 memcpy(data, ns->schema->data, ns->schema->len);
212 }
213 }
214 }
215
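As an aside on the first node-data word in the listing above (lines 91-101): the hop limit minus one goes into the top 8 bits and a 24-bit node id into the rest, with all-ones standing in for an unset id. A minimal userspace sketch of that packing follows. The names pack_hop_node() and put_be32() are hypothetical; the sketch widens the hop-limit byte to an unsigned 32-bit value before shifting, which is a choice made here so that values of 128 and above never shift into a signed int's sign bit, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMPTY_FIELD_U24 0x00ffffffu	/* mirrors IOAM6_EMPTY_FIELD_u24 */

/* Build the hop_lim/node_id word in host byte order.
 * A node id of 0 is treated as "not configured", as in the patch.
 */
static uint32_t pack_hop_node(uint8_t hop_limit, uint32_t node_id)
{
	uint8_t hops = (uint8_t)(hop_limit - 1);
	uint32_t id = node_id ? (node_id & EMPTY_FIELD_U24) : EMPTY_FIELD_U24;

	return ((uint32_t)hops << 24) | id;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	assert(p != NULL);
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```

With a hop limit of 64 and node id 1, the word is 0x3f000001 and goes on the wire as 3f 00 00 01.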
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22959 bytes --]
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 23:11 ` kernel test robot
2020-06-24 23:11 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 23:11 UTC (permalink / raw)
To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman
[-- Attachment #1: Type: text/plain, Size: 5791 bytes --]
Hi Justin,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: i386-randconfig-s002-20200624 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-dirty
# save the attached .config to linux build tree
make W=1 C=1 ARCH=i386 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> net/ipv6/ioam6.c:149:36: sparse: sparse: cast to restricted __be32
>> net/ipv6/ioam6.c:149:36: sparse: sparse: cast from restricted __be64
>> net/ipv6/ioam6.c:81:6: sparse: sparse: symbol 'ioam6_fill_trace_data_node' was not declared. Should it be static?
Please review and possibly fold the followup patch.
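The first two sparse warnings come from the direct cast of a __be64 to a __be32 on line 149. A raw cast truncates in host byte order, so which four wire bytes survive depends on the CPU's endianness. Below is a minimal userspace sketch of an explicit narrowing, assuming the intent is to keep the low-order 32 bits of the value (the patch does not say which half is wanted). The helper names are hypothetical, and byte arrays stand in for the kernel's __be64/__be32 types.

```c
#include <stdint.h>

/* Read a 64-bit big-endian value from wire bytes, independent of host endianness. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store a 32-bit value as big-endian wire bytes. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Hypothetical helper: narrow a big-endian 64-bit field to a big-endian
 * 32-bit field, keeping the low-order 32 bits of the numeric value
 * (an assumption about the intent, not something the patch states).
 */
static void narrow_be64_to_be32(const uint8_t src[8], uint8_t dst[4])
{
	store_be32(dst, (uint32_t)load_be64(src));
}
```

In kernel terms the corresponding expression would likely be cpu_to_be32((u32)be64_to_cpu(ns->data)), which converts to host order before truncating and keeps the endianness annotations intact.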
vim +149 net/ipv6/ioam6.c
80
> 81 void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
82 u32 trace_type, struct ioam6_namespace *ns)
83 {
84 u8 *data = skb_network_header(skb) + nodeoff;
85 struct __kernel_sock_timeval ts;
86 u64 raw_u64;
87 u32 raw_u32;
88 u16 raw_u16;
89 u8 byte;
90
91 /* hop_lim and node_id */
92 if (trace_type & IOAM6_TRACE_TYPE0) {
93 byte = ipv6_hdr(skb)->hop_limit - 1;
94 raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
95 if (!raw_u32)
96 raw_u32 = IOAM6_EMPTY_FIELD_u24;
97 else
98 raw_u32 &= IOAM6_EMPTY_FIELD_u24;
99 *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
100 data += sizeof(__be32);
101 }
102
103 /* ingress_if_id and egress_if_id */
104 if (trace_type & IOAM6_TRACE_TYPE1) {
105 raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
106 if (!raw_u16)
107 raw_u16 = IOAM6_EMPTY_FIELD_u16;
108 *(__be16 *)data = cpu_to_be16(raw_u16);
109 data += sizeof(__be16);
110
111 raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
112 if (!raw_u16)
113 raw_u16 = IOAM6_EMPTY_FIELD_u16;
114 *(__be16 *)data = cpu_to_be16(raw_u16);
115 data += sizeof(__be16);
116 }
117
118 /* timestamp seconds */
119 if (trace_type & IOAM6_TRACE_TYPE2) {
120 if (!skb->tstamp) {
121 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
122 } else {
123 skb_get_new_timestamp(skb, &ts);
124 *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
125 }
126 data += sizeof(__be32);
127 }
128
129 /* timestamp subseconds */
130 if (trace_type & IOAM6_TRACE_TYPE3) {
131 if (!skb->tstamp) {
132 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
133 } else {
134 if (!(trace_type & IOAM6_TRACE_TYPE2))
135 skb_get_new_timestamp(skb, &ts);
136 *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
137 }
138 data += sizeof(__be32);
139 }
140
141 /* transit delay */
142 if (trace_type & IOAM6_TRACE_TYPE4) {
143 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
144 data += sizeof(__be32);
145 }
146
147 /* namespace data */
148 if (trace_type & IOAM6_TRACE_TYPE5) {
> 149 *(__be32 *)data = (__be32)ns->data;
150 data += sizeof(__be32);
151 }
152
153 /* queue depth */
154 if (trace_type & IOAM6_TRACE_TYPE6) {
155 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
156 data += sizeof(__be32);
157 }
158
159 /* hop_lim and node_id (wide) */
160 if (trace_type & IOAM6_TRACE_TYPE7) {
161 byte = ipv6_hdr(skb)->hop_limit - 1;
162 raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
163 if (!raw_u64)
164 raw_u64 = IOAM6_EMPTY_FIELD_u56;
165 else
166 raw_u64 &= IOAM6_EMPTY_FIELD_u56;
167 *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
168 data += sizeof(__be64);
169 }
170
171 /* ingress_if_id and egress_if_id (wide) */
172 if (trace_type & IOAM6_TRACE_TYPE8) {
173 raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
174 if (!raw_u32)
175 raw_u32 = IOAM6_EMPTY_FIELD_u32;
176 *(__be32 *)data = cpu_to_be32(raw_u32);
177 data += sizeof(__be32);
178
179 raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
180 if (!raw_u32)
181 raw_u32 = IOAM6_EMPTY_FIELD_u32;
182 *(__be32 *)data = cpu_to_be32(raw_u32);
183 data += sizeof(__be32);
184 }
185
186 /* namespace data (wide) */
187 if (trace_type & IOAM6_TRACE_TYPE9) {
188 *(__be64 *)data = ns->data;
189 data += sizeof(__be64);
190 }
191
192 /* buffer occupancy */
193 if (trace_type & IOAM6_TRACE_TYPE10) {
194 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
195 data += sizeof(__be32);
196 }
197
198 /* checksum complement */
199 if (trace_type & IOAM6_TRACE_TYPE11) {
200 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
201 data += sizeof(__be32);
202 }
203
204 /* opaque state snapshot */
205 if (trace_type & IOAM6_TRACE_TYPE22) {
206 if (!ns->schema) {
207 *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
208 } else {
209 *(__be32 *)data = ns->schema->hdr;
210 data += sizeof(__be32);
211 memcpy(data, ns->schema->data, ns->schema->len);
212 }
213 }
214 }
215
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32299 bytes --]
^ permalink raw reply [flat|nested] 42+ messages in thread
* [RFC PATCH] ipv6: ioam: ioam6_fill_trace_data_node() can be static
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 23:11 ` kernel test robot
2020-06-24 23:11 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2020-06-24 23:11 UTC (permalink / raw)
To: Justin Iurman, netdev; +Cc: kbuild-all, davem, justin.iurman
Signed-off-by: kernel test robot <lkp@intel.com>
---
ioam6.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 406aa78eb504c..4a4e72bb54cc5 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -78,8 +78,8 @@ struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
}
-void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
- u32 trace_type, struct ioam6_namespace *ns)
+static void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
+ u32 trace_type, struct ioam6_namespace *ns)
{
u8 *data = skb_network_header(skb) + nodeoff;
struct __kernel_sock_timeval ts;
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
` (2 preceding siblings ...)
2020-06-24 23:11 ` kernel test robot
@ 2020-06-25 2:42 ` Tom Herbert
2020-06-25 14:29 ` Tom Herbert
4 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 2:42 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>
> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> packets. Default is drop.
>
> Another per-interface sysctl ioam6_id is provided to define the IOAM
> (unique) identifier of the interface.
>
> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> identifier of the node.
>
> Two relativistic hash tables are used: one for IOAM namespaces, the other for
> IOAM schemas. A namespace can only have a single active schema and a
> schema can only be attached to a single namespace (1:1 relationship).
>
> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> include/linux/ipv6.h | 2 +
> include/net/ioam6.h | 98 +++++++++++
> include/net/netns/ipv6.h | 2 +
> include/uapi/linux/in6.h | 1 +
> include/uapi/linux/ipv6.h | 2 +
> net/ipv6/Makefile | 2 +-
> net/ipv6/addrconf.c | 20 +++
> net/ipv6/af_inet6.c | 7 +
> net/ipv6/exthdrs.c | 67 ++++++++
> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
> net/ipv6/sysctl_net_ipv6.c | 7 +
> 11 files changed, 533 insertions(+), 1 deletion(-)
> create mode 100644 include/net/ioam6.h
> create mode 100644 net/ipv6/ioam6.c
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 5312a718bc7a..15732f964c6e 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> __s32 disable_policy;
> __s32 ndisc_tclass;
> __s32 rpl_seg_enabled;
> + __u32 ioam6_enabled;
> + __u32 ioam6_id;
>
> struct ctl_table_header *sysctl_header;
> };
> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> new file mode 100644
> index 000000000000..2a910bc99947
> --- /dev/null
> +++ b/include/net/ioam6.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * IOAM IPv6 implementation
> + *
> + * Author:
> + * Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#ifndef _NET_IOAM6_H
> +#define _NET_IOAM6_H
> +
> +#include <linux/net.h>
> +#include <linux/ipv6.h>
> +#include <linux/rhashtable-types.h>
> +
> +#define IOAM6_OPT_TRACE_PREALLOC 0
> +
> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> +
> +#define IOAM6_TRACE_TYPE0 (1 << 31)
> +#define IOAM6_TRACE_TYPE1 (1 << 30)
> +#define IOAM6_TRACE_TYPE2 (1 << 29)
> +#define IOAM6_TRACE_TYPE3 (1 << 28)
> +#define IOAM6_TRACE_TYPE4 (1 << 27)
> +#define IOAM6_TRACE_TYPE5 (1 << 26)
> +#define IOAM6_TRACE_TYPE6 (1 << 25)
> +#define IOAM6_TRACE_TYPE7 (1 << 24)
> +#define IOAM6_TRACE_TYPE8 (1 << 23)
> +#define IOAM6_TRACE_TYPE9 (1 << 22)
> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> +
> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> +
> +struct ioam6_common_hdr {
> + u8 opt_type;
> + u8 opt_len;
> + u8 res;
> + u8 ioam_type;
> + __be16 namespace_id;
> +} __packed;
> +
> +struct ioam6_trace_hdr {
> + __be16 info;
> + __be32 type;
> +} __packed;
> +
> +struct ioam6_namespace {
> + struct rhash_head head;
> + struct rcu_head rcu;
> +
> + __be16 id;
> + __be64 data;
> + bool remove_tlv;
> +
> + struct ioam6_schema *schema;
> +};
> +
> +struct ioam6_schema {
> + struct rhash_head head;
> + struct rcu_head rcu;
> +
> + u32 id;
> + int len;
> + __be32 hdr;
> + u8 *data;
> +
> + struct ioam6_namespace *ns;
> +};
> +
> +struct ioam6_pernet_data {
> + struct mutex lock;
> + struct rhashtable namespaces;
> + struct rhashtable schemas;
> +};
> +
> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> + return net->ipv6.ioam6_data;
> +#else
> + return NULL;
> +#endif
> +}
> +
> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> + struct ioam6_namespace *ns);
> +
> +extern int ioam6_init(void);
> +extern void ioam6_exit(void);
> +
> +#endif
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 5ec054473d81..89b27fa721f4 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> int max_hbh_opts_len;
> int seg6_flowlabel;
> bool skip_notify_on_dev_down;
> + unsigned int ioam6_id;
> };
>
> struct netns_ipv6 {
> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> spinlock_t lock;
> u32 seq;
> } ip6addrlbl_table;
> + struct ioam6_pernet_data *ioam6_data;
> };
>
> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 9f2273a08356..1c98435220c9 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> #define IPV6_TLV_PADN 1
> #define IPV6_TLV_ROUTERALERT 5
> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
> +#define IPV6_TLV_IOAM_HOPOPTS 49
> #define IPV6_TLV_JUMBO 194
> #define IPV6_TLV_HAO 201 /* home address option */
>
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 13e8751bf24a..eb521b2dd885 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -189,6 +189,8 @@ enum {
> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> DEVCONF_NDISC_TCLASS,
> DEVCONF_RPL_SEG_ENABLED,
> + DEVCONF_IOAM6_ENABLED,
> + DEVCONF_IOAM6_ID,
> DEVCONF_MAX
> };
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index cf7b47bdb9b3..b7ef10d417d6 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> - udp_offload.o seg6.o fib6_notifier.o rpl.o
> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>
> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 840bfdb3d7bd..6c952a28ade2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> .disable_policy = 0,
> .rpl_seg_enabled = 0,
> + .ioam6_enabled = 0,
> + .ioam6_id = 0,
> };
>
> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> .disable_policy = 0,
> .rpl_seg_enabled = 0,
> + .ioam6_enabled = 0,
> + .ioam6_id = 0,
> };
>
> /* Check if link is ready: is it up and is a valid qdisc available */
> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> }
>
> static inline size_t inet6_ifla6_size(void)
> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "ioam6_enabled",
> + .data = &ipv6_devconf.ioam6_enabled,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> + {
> + .procname = "ioam6_id",
> + .data = &ipv6_devconf.ioam6_id,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> {
> /* sentinel */
> }
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index b304b882e031..63a9ffc4b283 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -62,6 +62,7 @@
> #include <net/rpl.h>
> #include <net/compat.h>
> #include <net/xfrm.h>
> +#include <net/ioam6.h>
>
> #include <linux/uaccess.h>
> #include <linux/mroute6.h>
> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> if (err)
> goto rpl_fail;
>
> + err = ioam6_init();
> + if (err)
> + goto ioam6_fail;
> +
> err = igmp6_late_init();
> if (err)
> goto igmp6_late_err;
> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> #endif
> igmp6_late_err:
> rpl_exit();
> +ioam6_fail:
> + ioam6_exit();
> rpl_fail:
> seg6_exit();
> seg6_fail:
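In the hunk above, ioam6_init() sits between rpl_init() and igmp6_late_init(). Under the usual reverse-order unwind convention, each label undoes the step just before the one that failed, which would put ioam6_exit() under igmp6_late_err and rpl_exit() under ioam6_fail. Below is a minimal userspace sketch of that convention. The stub step() function, the undo() logger, and unwind_log are all hypothetical and only stand in for the real init/exit calls.

```c
#include <string.h>

/* Records which teardown steps ran, in order (illustration only). */
static char unwind_log[64];

static void undo(const char *name)
{
	strcat(unwind_log, name);
	strcat(unwind_log, ";");
}

/* Stub init step: fails only when its id matches fail_at. */
static int step(int id, int fail_at)
{
	return id == fail_at ? -1 : 0;
}

static int init_chain(int fail_at)
{
	int err;

	unwind_log[0] = '\0';

	err = step(1, fail_at);		/* stands in for seg6_init() */
	if (err)
		goto seg6_fail;
	err = step(2, fail_at);		/* stands in for rpl_init() */
	if (err)
		goto rpl_fail;
	err = step(3, fail_at);		/* stands in for ioam6_init() */
	if (err)
		goto ioam6_fail;
	err = step(4, fail_at);		/* stands in for igmp6_late_init() */
	if (err)
		goto igmp6_late_err;
	return 0;

	/* Each label undoes everything that succeeded before the failure. */
igmp6_late_err:
	undo("ioam6");
ioam6_fail:
	undo("rpl");
rpl_fail:
	undo("seg6");
seg6_fail:
	return err;
}
```

With this ordering, a failure in the third step tears down only the first two, and nothing is unregistered that was never registered.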
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index f27ab3bf2e0c..00aee1358f1c 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -49,6 +49,8 @@
> #include <net/seg6_hmac.h>
> #endif
> #include <net/rpl.h>
> +#include <net/ioam6.h>
> +#include <net/dst_metadata.h>
>
> #include <linux/uaccess.h>
>
> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> return TLV_REJECT;
> }
>
> +/* IOAM */
> +
> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> +{
> + struct ioam6_common_hdr *ioamh;
> + struct ioam6_namespace *ns;
> +
> + /* Must be 4n-aligned */
> + if (optoff & 3)
> + goto drop;
> +
> + if (!skb_valid_dst(skb))
> + ip6_route_input(skb);
> +
> + /* IOAM must be enabled on ingress interface */
> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> + goto drop;
> +
> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> +
> + /* Unknown IOAM namespace, either:
> + * - Drop it if IOAM is not enabled on egress interface (if any)
> + * - Ignore it otherwise
> + */
> + if (!ns) {
> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto drop;
> +
> + goto accept;
> + }
> +
> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto remove;
> +
> + /* Known IOAM namespace which must not be removed:
> + * IOAM must be enabled on egress interface
> + */
> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto drop;
> +
> + switch (ioamh->ioam_type) {
> + case IOAM6_OPT_TRACE_PREALLOC:
> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> + IP6CB(skb)->flags |= IP6SKB_IOAM;
> + break;
> + default:
> + break;
> + }
> +
> +accept:
> + return TLV_ACCEPT;
> +remove:
> + return TLV_REMOVE;
> +drop:
> + kfree_skb(skb);
> + return TLV_REJECT;
> +}
Hardcoding another TLV in exthdrs.c. I still hope we can eventually
allow TLVs to be registered from modules, like any other protocol does...
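To make the idea concrete, here is a rough userspace sketch of what module-registered TLV handlers could look like, assuming a 256-entry table indexed by option type. The names tlv_register(), tlv_unregister(), tlv_dispatch(), and the return codes are hypothetical, not existing kernel APIs. The sketch also ignores the locking or RCU protection a real implementation would need, and it simply rejects unknown types rather than consulting the option's action bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch only. */
#define TLV_ACCEPT 1
#define TLV_REJECT 0

typedef int (*tlv_handler_t)(const uint8_t *opt, size_t len);

/* One slot per 8-bit option type; NULL means "no handler registered". */
static tlv_handler_t tlv_handlers[256];

/* Register a handler; fails if the slot is taken or fn is NULL. */
static int tlv_register(uint8_t type, tlv_handler_t fn)
{
	if (!fn || tlv_handlers[type])
		return -1;
	tlv_handlers[type] = fn;
	return 0;
}

static void tlv_unregister(uint8_t type)
{
	tlv_handlers[type] = NULL;
}

/* Look up and call the handler for an option type. */
static int tlv_dispatch(uint8_t type, const uint8_t *opt, size_t len)
{
	tlv_handler_t fn = tlv_handlers[type];

	return fn ? fn(opt, len) : TLV_REJECT;
}

/* Example handler a module might register for option type 49
 * (IPV6_TLV_IOAM_HOPOPTS in the patch); it only checks the length.
 */
static int demo_ioam_handler(const uint8_t *opt, size_t len)
{
	(void)opt;
	return len >= 2 ? TLV_ACCEPT : TLV_REJECT;
}
```

A module would call tlv_register() at load time and tlv_unregister() at unload, so exthdrs.c would not need a new hardcoded table entry per option.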
> +
> /* Jumbo payload */
>
> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> .type = IPV6_TLV_ROUTERALERT,
> .func = ipv6_hop_ra,
> },
> + {
> + .type = IPV6_TLV_IOAM_HOPOPTS,
> + .func = ipv6_hop_ioam,
> + },
> {
> .type = IPV6_TLV_JUMBO,
> .func = ipv6_hop_jumbo,
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> new file mode 100644
> index 000000000000..406aa78eb504
> --- /dev/null
> +++ b/net/ipv6/ioam6.c
> @@ -0,0 +1,326 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * IOAM IPv6 implementation
> + *
> + * Author:
> + * Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/net.h>
> +#include <linux/rhashtable.h>
> +
> +#include <net/addrconf.h>
> +#include <net/ioam6.h>
> +
> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> +{
> + kfree_rcu(ns, rcu);
> +}
> +
> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> +{
> + kfree_rcu(sc, rcu);
> +}
> +
> +static void ioam6_free_ns(void *ptr, void *arg)
> +{
> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> +
> + if (ns)
> + ioam6_ns_release(ns);
> +}
> +
> +static void ioam6_free_sc(void *ptr, void *arg)
> +{
> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> +
> + if (sc)
> + ioam6_sc_release(sc);
> +}
> +
> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> + const struct ioam6_namespace *ns = obj;
> +
> + return (ns->id != *(__be16 *)arg->key);
> +}
> +
> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> + const struct ioam6_schema *sc = obj;
> +
> + return (sc->id != *(u32 *)arg->key);
> +}
> +
> +static const struct rhashtable_params rht_ns_params = {
> + .key_len = sizeof(__be16),
> + .key_offset = offsetof(struct ioam6_namespace, id),
> + .head_offset = offsetof(struct ioam6_namespace, head),
> + .automatic_shrinking = true,
> + .obj_cmpfn = ioam6_ns_cmpfn,
> +};
> +
> +static const struct rhashtable_params rht_sc_params = {
> + .key_len = sizeof(u32),
> + .key_offset = offsetof(struct ioam6_schema, id),
> + .head_offset = offsetof(struct ioam6_schema, head),
> + .automatic_shrinking = true,
> + .obj_cmpfn = ioam6_sc_cmpfn,
> +};
> +
> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> +{
> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> +}
> +
> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> + u32 trace_type, struct ioam6_namespace *ns)
> +{
> + u8 *data = skb_network_header(skb) + nodeoff;
> + struct __kernel_sock_timeval ts;
> + u64 raw_u64;
> + u32 raw_u32;
> + u16 raw_u16;
> + u8 byte;
> +
> + /* hop_lim and node_id */
> + if (trace_type & IOAM6_TRACE_TYPE0) {
> + byte = ipv6_hdr(skb)->hop_limit - 1;
> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
> + else
> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* ingress_if_id and egress_if_id */
> + if (trace_type & IOAM6_TRACE_TYPE1) {
> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> + if (!raw_u16)
> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> + *(__be16 *)data = cpu_to_be16(raw_u16);
> + data += sizeof(__be16);
> +
> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> + if (!raw_u16)
> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> + *(__be16 *)data = cpu_to_be16(raw_u16);
> + data += sizeof(__be16);
> + }
> +
> + /* timestamp seconds */
> + if (trace_type & IOAM6_TRACE_TYPE2) {
> + if (!skb->tstamp) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + } else {
> + skb_get_new_timestamp(skb, &ts);
> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> + }
> + data += sizeof(__be32);
> + }
> +
> + /* timestamp subseconds */
> + if (trace_type & IOAM6_TRACE_TYPE3) {
> + if (!skb->tstamp) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + } else {
> + if (!(trace_type & IOAM6_TRACE_TYPE2))
> + skb_get_new_timestamp(skb, &ts);
> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> + }
> + data += sizeof(__be32);
> + }
> +
> + /* transit delay */
> + if (trace_type & IOAM6_TRACE_TYPE4) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* namespace data */
> + if (trace_type & IOAM6_TRACE_TYPE5) {
> + *(__be32 *)data = (__be32)ns->data;
> + data += sizeof(__be32);
> + }
> +
> + /* queue depth */
> + if (trace_type & IOAM6_TRACE_TYPE6) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* hop_lim and node_id (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE7) {
> + byte = ipv6_hdr(skb)->hop_limit - 1;
> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> + if (!raw_u64)
> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
> + else
> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> + data += sizeof(__be64);
> + }
> +
> + /* ingress_if_id and egress_if_id (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE8) {
> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> + *(__be32 *)data = cpu_to_be32(raw_u32);
Hmm, I wonder if the compiler is implementing this as:
*(__be32 *)data = raw_u32 ? cpu_to_be32(raw_u32) :
cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
That is, whether it realizes that cpu_to_be32(IOAM6_EMPTY_FIELD_u32) is a constant expression.
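The two forms store the same value either way; the open question is only what code the compiler emits. Below is a small userspace sketch comparing them, assuming a little-endian host so that the conversion is a byte swap. swap32() is a hypothetical stand-in for cpu_to_be32(), and EMPTY_FIELD_U32 mirrors IOAM6_EMPTY_FIELD_u32.

```c
#include <stdint.h>

#define EMPTY_FIELD_U32 0xffffffffu	/* mirrors IOAM6_EMPTY_FIELD_u32 */

/* Stand-in for cpu_to_be32() on a little-endian host: a plain byte swap. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Form used in the patch: substitute the empty value, then convert once. */
static uint32_t convert_after_select(uint32_t raw)
{
	if (!raw)
		raw = EMPTY_FIELD_U32;
	return swap32(raw);
}

/* Form from the comment: the empty-field arm is a compile-time constant. */
static uint32_t select_converted(uint32_t raw)
{
	return raw ? swap32(raw) : swap32(EMPTY_FIELD_U32);
}
```

Since an all-ones value reads the same in either byte order, the constant arm needs no swap at all once it is folded.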
> + data += sizeof(__be32);
> +
> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> + *(__be32 *)data = cpu_to_be32(raw_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* namespace data (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE9) {
> + *(__be64 *)data = ns->data;
> + data += sizeof(__be64);
> + }
> +
> + /* buffer occupancy */
> + if (trace_type & IOAM6_TRACE_TYPE10) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* checksum complement */
> + if (trace_type & IOAM6_TRACE_TYPE11) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* opaque state snapshot */
> + if (trace_type & IOAM6_TRACE_TYPE22) {
> + if (!ns->schema) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> + } else {
> + *(__be32 *)data = ns->schema->hdr;
> + data += sizeof(__be32);
> + memcpy(data, ns->schema->data, ns->schema->len);
> + }
> + }
> +}
> +
> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> + struct ioam6_namespace *ns)
> +{
> + u8 nodelen, flags, remlen, sclen = 0;
> + struct ioam6_trace_hdr *trh;
> + int nodeoff;
> + u16 info;
> + u32 type;
> +
> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> + info = be16_to_cpu(trh->info);
> + type = be32_to_cpu(trh->type);
> +
> + nodelen = info >> 11;
> + flags = (info >> 7) & 0xf;
> + remlen = info & 0x7f;
> +
> + /* Skip if Overflow bit is set OR
> + * if an unknown type (bit 12-21) is set
> + */
> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> + return;
> +
> + /* NodeLen does not include Opaque State Snapshot length. We need to
> + * take it into account if the corresponding bit is set and if current
> + * IOAM namespace has an active schema attached to it
> + */
> + if (type & IOAM6_TRACE_TYPE22) {
> + /* Opaque State Snapshot header size */
> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> +
> + if (ns->schema)
> + sclen += ns->schema->len / 4;
> + }
> +
> + /* Not enough space remaining: set Overflow bit and skip */
> + if (!remlen || remlen < (nodelen + sclen)) {
> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> + trh->info = cpu_to_be16(info);
> + return;
> + }
> +
> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> +
> + /* Update RemainingLen */
> + remlen -= nodelen + sclen;
> + info = (info & 0xff80) | remlen;
> + trh->info = cpu_to_be16(info);
> +}
> +
> +static int __net_init ioam6_net_init(struct net *net)
> +{
> + struct ioam6_pernet_data *nsdata;
> + int err = -ENOMEM;
> +
> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> + if (!nsdata)
> + goto out;
> +
> + mutex_init(&nsdata->lock);
> + net->ipv6.ioam6_data = nsdata;
> +
> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> + if (err)
> + goto free_nsdata;
> +
> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> + if (err)
> + goto free_rht_ns;
> +
> +out:
> + return err;
> +free_rht_ns:
> + rhashtable_destroy(&nsdata->namespaces);
> +free_nsdata:
> + kfree(nsdata);
> + net->ipv6.ioam6_data = NULL;
> + goto out;
> +}
> +
> +static void __net_exit ioam6_net_exit(struct net *net)
> +{
> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> +
> + kfree(nsdata);
> +}
> +
> +static struct pernet_operations ioam6_net_ops = {
> + .init = ioam6_net_init,
> + .exit = ioam6_net_exit,
> +};
> +
> +int __init ioam6_init(void)
> +{
> + int err = register_pernet_subsys(&ioam6_net_ops);
> +
> + if (err)
> + return err;
> +
> + pr_info("In-situ OAM (IOAM) with IPv6\n");
> + return 0;
> +}
> +
> +void ioam6_exit(void)
> +{
> + unregister_pernet_subsys(&ioam6_net_ops);
> +}
> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index fac2135aa47b..da49b33ab6fc 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec
> },
> + {
> + .procname = "ioam6_id",
> + .data = &init_net.ipv6.sysctl.ioam6_id,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec
> + },
> { }
> };
>
> --
> 2.17.1
>
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
` (3 preceding siblings ...)
2020-06-25 2:42 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Tom Herbert
@ 2020-06-25 14:29 ` Tom Herbert
2020-06-25 18:23 ` Justin Iurman
4 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 14:29 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>
The IANA allocation is TEMPORARY, with an expiration date of
4/16/2021. Note from RFC7120:
"Implementers and deployers need to be aware that deprecation and
de-allocation could take place at any time after expiry; therefore, an
expired early allocation is best considered as deprecated."
Please add a comment in the code and in the Documentation to this effect.
> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> packets. Default is drop.
I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
packet containing the IOAM HBH option. Note that the act bits of the
option type are 00, which means the TLV is skipped if the option isn't
processed, so I don't think it's correct to drop these packets by
default.
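For reference, a minimal sketch of how the option type value 49 breaks
down under the RFC 8200 encoding (two high-order action bits, then the
change-en-route bit). The helper names tlv_act_bits() and tlv_chg_bit()
are hypothetical and are not taken from the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Option type value proposed by the patch (temporary IANA allocation). */
#define IPV6_TLV_IOAM_HOPOPTS 49

/* Hypothetical helpers, for illustration only. */

/* Two highest-order bits: what a node does when it does not recognize
 * the option. 0 = skip over the option, 1 = discard silently,
 * 2 and 3 = discard and, depending on the destination, send an ICMP
 * Parameter Problem.
 */
static inline unsigned int tlv_act_bits(uint8_t opt_type)
{
	return (opt_type >> 6) & 0x3;
}

/* Third highest-order bit: 1 if the option data may change en route. */
static inline unsigned int tlv_chg_bit(uint8_t opt_type)
{
	return (opt_type >> 5) & 0x1;
}
```

With 49 (0x31) the action bits come out as 00 and the change-en-route
bit as 1, which is what one would expect for an option that transit
nodes update in place.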
>
> Another per-interface sysctl ioam6_id is provided to define the IOAM
> (unique) identifier of the interface.
>
> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> identifier of the node.
>
> Two relativistic hash tables: one for IOAM namespaces, the other for
> IOAM schemas. A namespace can only have a single active schema and a
> schema can only be attached to a single namespace (1:1 relationship).
>
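The 1:1 relationship can be pictured as two back-pointers that are
always updated together. A minimal user-space sketch, assuming that
attaching a schema first detaches whatever was linked on either side.
The patch description does not say how a conflicting attach is handled,
so that policy is an assumption here, as are the struct and function
names:

```c
#include <assert.h>
#include <stddef.h>

struct schema;

/* Hypothetical, simplified stand-ins for the namespace/schema objects. */
struct ns {
	struct schema *schema;	/* active schema, or NULL */
};

struct schema {
	struct ns *ns;		/* namespace it is attached to, or NULL */
};

/* Break the link on both sides, if there is one. */
static void ns_detach(struct ns *ns)
{
	if (ns->schema) {
		ns->schema->ns = NULL;
		ns->schema = NULL;
	}
}

/* Attach sc to ns, dropping any previous link of either object
 * (assumed policy, see above).
 */
static void ns_attach(struct ns *ns, struct schema *sc)
{
	ns_detach(ns);
	if (sc->ns)
		ns_detach(sc->ns);
	ns->schema = sc;
	sc->ns = ns;
}
```

Keeping both pointers in one helper means no caller can leave a schema
pointing at a namespace that no longer points back.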
> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> include/linux/ipv6.h | 2 +
> include/net/ioam6.h | 98 +++++++++++
> include/net/netns/ipv6.h | 2 +
> include/uapi/linux/in6.h | 1 +
> include/uapi/linux/ipv6.h | 2 +
> net/ipv6/Makefile | 2 +-
> net/ipv6/addrconf.c | 20 +++
> net/ipv6/af_inet6.c | 7 +
> net/ipv6/exthdrs.c | 67 ++++++++
> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
> net/ipv6/sysctl_net_ipv6.c | 7 +
> 11 files changed, 533 insertions(+), 1 deletion(-)
> create mode 100644 include/net/ioam6.h
> create mode 100644 net/ipv6/ioam6.c
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 5312a718bc7a..15732f964c6e 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> __s32 disable_policy;
> __s32 ndisc_tclass;
> __s32 rpl_seg_enabled;
> + __u32 ioam6_enabled;
> + __u32 ioam6_id;
>
> struct ctl_table_header *sysctl_header;
> };
> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> new file mode 100644
> index 000000000000..2a910bc99947
> --- /dev/null
> +++ b/include/net/ioam6.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * IOAM IPv6 implementation
> + *
> + * Author:
> + * Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#ifndef _NET_IOAM6_H
> +#define _NET_IOAM6_H
> +
> +#include <linux/net.h>
> +#include <linux/ipv6.h>
> +#include <linux/rhashtable-types.h>
> +
> +#define IOAM6_OPT_TRACE_PREALLOC 0
> +
> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> +
> +#define IOAM6_TRACE_TYPE0 (1 << 31)
> +#define IOAM6_TRACE_TYPE1 (1 << 30)
> +#define IOAM6_TRACE_TYPE2 (1 << 29)
> +#define IOAM6_TRACE_TYPE3 (1 << 28)
> +#define IOAM6_TRACE_TYPE4 (1 << 27)
> +#define IOAM6_TRACE_TYPE5 (1 << 26)
> +#define IOAM6_TRACE_TYPE6 (1 << 25)
> +#define IOAM6_TRACE_TYPE7 (1 << 24)
> +#define IOAM6_TRACE_TYPE8 (1 << 23)
> +#define IOAM6_TRACE_TYPE9 (1 << 22)
> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> +
> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> +
> +struct ioam6_common_hdr {
> + u8 opt_type;
> + u8 opt_len;
> + u8 res;
> + u8 ioam_type;
> + __be16 namespace_id;
> +} __packed;
> +
> +struct ioam6_trace_hdr {
> + __be16 info;
> + __be32 type;
> +} __packed;
> +
> +struct ioam6_namespace {
> + struct rhash_head head;
> + struct rcu_head rcu;
> +
> + __be16 id;
> + __be64 data;
> + bool remove_tlv;
> +
> + struct ioam6_schema *schema;
> +};
> +
> +struct ioam6_schema {
> + struct rhash_head head;
> + struct rcu_head rcu;
> +
> + u32 id;
> + int len;
> + __be32 hdr;
> + u8 *data;
> +
> + struct ioam6_namespace *ns;
> +};
> +
> +struct ioam6_pernet_data {
> + struct mutex lock;
> + struct rhashtable namespaces;
> + struct rhashtable schemas;
> +};
> +
> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> +{
> +#if IS_ENABLED(CONFIG_IPV6)
> + return net->ipv6.ioam6_data;
> +#else
> + return NULL;
> +#endif
> +}
> +
> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> + struct ioam6_namespace *ns);
> +
> +extern int ioam6_init(void);
> +extern void ioam6_exit(void);
> +
> +#endif
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index 5ec054473d81..89b27fa721f4 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> int max_hbh_opts_len;
> int seg6_flowlabel;
> bool skip_notify_on_dev_down;
> + unsigned int ioam6_id;
> };
>
> struct netns_ipv6 {
> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> spinlock_t lock;
> u32 seq;
> } ip6addrlbl_table;
> + struct ioam6_pernet_data *ioam6_data;
> };
>
> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> index 9f2273a08356..1c98435220c9 100644
> --- a/include/uapi/linux/in6.h
> +++ b/include/uapi/linux/in6.h
> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> #define IPV6_TLV_PADN 1
> #define IPV6_TLV_ROUTERALERT 5
> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
> +#define IPV6_TLV_IOAM_HOPOPTS 49
The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
Note from RFC7120:
"Implementers and deployers need to be aware that deprecation and
de-allocation could take place at any time after expiry; therefore, an
expired early allocation is best considered as deprecated. It is not
IANA's responsibility to track the status of allocations, their
expirations, or when they may be re-allocated."
Please add a comment here and in the
Documentation to this effect.
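One possible shape for such a comment, as a sketch only. The wording is
illustrative and not an established kernel convention; the expiry date
and the RFC reference are the ones given above:

```c
/* IOAM Hop-by-Hop option.
 *
 * NOTE: code point 49 is a TEMPORARY (early) IANA allocation with an
 * expiration date of 4/16/2021. Per RFC 7120, an expired early
 * allocation is best considered deprecated, and de-allocation could
 * take place at any time after expiry.
 */
#define IPV6_TLV_IOAM_HOPOPTS	49
```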
> #define IPV6_TLV_JUMBO 194
> #define IPV6_TLV_HAO 201 /* home address option */
>
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 13e8751bf24a..eb521b2dd885 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -189,6 +189,8 @@ enum {
> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> DEVCONF_NDISC_TCLASS,
> DEVCONF_RPL_SEG_ENABLED,
> + DEVCONF_IOAM6_ENABLED,
> + DEVCONF_IOAM6_ID,
> DEVCONF_MAX
> };
>
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index cf7b47bdb9b3..b7ef10d417d6 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> - udp_offload.o seg6.o fib6_notifier.o rpl.o
> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>
> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 840bfdb3d7bd..6c952a28ade2 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> .disable_policy = 0,
> .rpl_seg_enabled = 0,
> + .ioam6_enabled = 0,
> + .ioam6_id = 0,
> };
>
> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> .disable_policy = 0,
> .rpl_seg_enabled = 0,
> + .ioam6_enabled = 0,
> + .ioam6_id = 0,
> };
>
> /* Check if link is ready: is it up and is a valid qdisc available */
> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> }
>
> static inline size_t inet6_ifla6_size(void)
> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "ioam6_enabled",
> + .data = &ipv6_devconf.ioam6_enabled,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> + {
> + .procname = "ioam6_id",
> + .data = &ipv6_devconf.ioam6_id,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> {
> /* sentinel */
> }
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index b304b882e031..63a9ffc4b283 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -62,6 +62,7 @@
> #include <net/rpl.h>
> #include <net/compat.h>
> #include <net/xfrm.h>
> +#include <net/ioam6.h>
>
> #include <linux/uaccess.h>
> #include <linux/mroute6.h>
> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> if (err)
> goto rpl_fail;
>
> + err = ioam6_init();
> + if (err)
> + goto ioam6_fail;
> +
> err = igmp6_late_init();
> if (err)
> goto igmp6_late_err;
> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> #endif
> igmp6_late_err:
> rpl_exit();
> +ioam6_fail:
> + ioam6_exit();
> rpl_fail:
> seg6_exit();
> seg6_fail:
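A note on the unwind order in the hunk above: with goto-based cleanup,
each label normally undoes the step that succeeded just before the
failing one, in reverse order of initialization. As written, a failure
in ioam6_init() jumps to ioam6_fail, which calls ioam6_exit() and never
reaches rpl_exit(). A minimal user-space sketch of the usual ordering,
with hypothetical helpers (step_init(), step_exit(), subsys_init())
standing in for rpl_init(), ioam6_init() and igmp6_late_init():

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: an "init" step can be forced to fail, and
 * each "exit" step appends its name to a log so the order is visible.
 */
static char unwind_log[64];
static int fail_step;	/* 0 = no failure, 1..3 = step that fails */

static int step_init(int n)
{
	return fail_step == n ? -1 : 0;
}

static void step_exit(const char *name)
{
	strcat(unwind_log, name);
}

static int subsys_init(int fail_at)
{
	int err;

	fail_step = fail_at;
	unwind_log[0] = '\0';

	err = step_init(1);		/* stands in for rpl_init() */
	if (err)
		goto rpl_fail;

	err = step_init(2);		/* stands in for ioam6_init() */
	if (err)
		goto ioam6_fail;

	err = step_init(3);		/* stands in for igmp6_late_init() */
	if (err)
		goto igmp6_late_err;

	return 0;

igmp6_late_err:
	step_exit("ioam6 ");		/* undo step 2 */
ioam6_fail:
	step_exit("rpl ");		/* undo step 1 */
rpl_fail:
	return err;
}
```

In this layout the label named after a failing step only undoes the
steps that completed before it, so a failed step is never "exited".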
> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> index f27ab3bf2e0c..00aee1358f1c 100644
> --- a/net/ipv6/exthdrs.c
> +++ b/net/ipv6/exthdrs.c
> @@ -49,6 +49,8 @@
> #include <net/seg6_hmac.h>
> #endif
> #include <net/rpl.h>
> +#include <net/ioam6.h>
> +#include <net/dst_metadata.h>
>
> #include <linux/uaccess.h>
>
> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> return TLV_REJECT;
> }
>
> +/* IOAM */
> +
> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> +{
> + struct ioam6_common_hdr *ioamh;
> + struct ioam6_namespace *ns;
> +
> + /* Must be 4n-aligned */
> + if (optoff & 3)
> + goto drop;
> +
> + if (!skb_valid_dst(skb))
> + ip6_route_input(skb);
> +
> + /* IOAM must be enabled on ingress interface */
> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> + goto drop;
> +
> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> +
> + /* Unknown IOAM namespace, either:
> + * - Drop it if IOAM is not enabled on egress interface (if any)
> + * - Ignore it otherwise
> + */
> + if (!ns) {
> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto drop;
> +
> + goto accept;
> + }
> +
> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto remove;
> +
> + /* Known IOAM namespace which must not be removed:
> + * IOAM must be enabled on egress interface
> + */
> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> + goto drop;
> +
> + switch (ioamh->ioam_type) {
> + case IOAM6_OPT_TRACE_PREALLOC:
> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> + IP6CB(skb)->flags |= IP6SKB_IOAM;
> + break;
> + default:
> + break;
> + }
> +
> +accept:
> + return TLV_ACCEPT;
> +remove:
> + return TLV_REMOVE;
> +drop:
> + kfree_skb(skb);
> + return TLV_REJECT;
> +}
> +
> /* Jumbo payload */
>
> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> .type = IPV6_TLV_ROUTERALERT,
> .func = ipv6_hop_ra,
> },
> + {
> + .type = IPV6_TLV_IOAM_HOPOPTS,
> + .func = ipv6_hop_ioam,
> + },
> {
> .type = IPV6_TLV_JUMBO,
> .func = ipv6_hop_jumbo,
> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> new file mode 100644
> index 000000000000..406aa78eb504
> --- /dev/null
> +++ b/net/ipv6/ioam6.c
> @@ -0,0 +1,326 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * IOAM IPv6 implementation
> + *
> + * Author:
> + * Justin Iurman <justin.iurman@uliege.be>
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/net.h>
> +#include <linux/rhashtable.h>
> +
> +#include <net/addrconf.h>
> +#include <net/ioam6.h>
> +
> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> +{
> + kfree_rcu(ns, rcu);
> +}
> +
> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> +{
> + kfree_rcu(sc, rcu);
> +}
> +
> +static void ioam6_free_ns(void *ptr, void *arg)
> +{
> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> +
> + if (ns)
> + ioam6_ns_release(ns);
> +}
> +
> +static void ioam6_free_sc(void *ptr, void *arg)
> +{
> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> +
> + if (sc)
> + ioam6_sc_release(sc);
> +}
> +
> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> + const struct ioam6_namespace *ns = obj;
> +
> + return (ns->id != *(__be16 *)arg->key);
> +}
> +
> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> +{
> + const struct ioam6_schema *sc = obj;
> +
> + return (sc->id != *(u32 *)arg->key);
> +}
> +
> +static const struct rhashtable_params rht_ns_params = {
> + .key_len = sizeof(__be16),
> + .key_offset = offsetof(struct ioam6_namespace, id),
> + .head_offset = offsetof(struct ioam6_namespace, head),
> + .automatic_shrinking = true,
> + .obj_cmpfn = ioam6_ns_cmpfn,
> +};
> +
> +static const struct rhashtable_params rht_sc_params = {
> + .key_len = sizeof(u32),
> + .key_offset = offsetof(struct ioam6_schema, id),
> + .head_offset = offsetof(struct ioam6_schema, head),
> + .automatic_shrinking = true,
> + .obj_cmpfn = ioam6_sc_cmpfn,
> +};
> +
> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> +{
> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> +}
> +
> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> + u32 trace_type, struct ioam6_namespace *ns)
> +{
> + u8 *data = skb_network_header(skb) + nodeoff;
> + struct __kernel_sock_timeval ts;
> + u64 raw_u64;
> + u32 raw_u32;
> + u16 raw_u16;
> + u8 byte;
> +
> + /* hop_lim and node_id */
> + if (trace_type & IOAM6_TRACE_TYPE0) {
> + byte = ipv6_hdr(skb)->hop_limit - 1;
> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
> + else
> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* ingress_if_id and egress_if_id */
> + if (trace_type & IOAM6_TRACE_TYPE1) {
> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> + if (!raw_u16)
> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> + *(__be16 *)data = cpu_to_be16(raw_u16);
> + data += sizeof(__be16);
> +
> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> + if (!raw_u16)
> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> + *(__be16 *)data = cpu_to_be16(raw_u16);
> + data += sizeof(__be16);
> + }
> +
> + /* timestamp seconds */
> + if (trace_type & IOAM6_TRACE_TYPE2) {
> + if (!skb->tstamp) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + } else {
> + skb_get_new_timestamp(skb, &ts);
> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> + }
> + data += sizeof(__be32);
> + }
> +
> + /* timestamp subseconds */
> + if (trace_type & IOAM6_TRACE_TYPE3) {
> + if (!skb->tstamp) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + } else {
> + if (!(trace_type & IOAM6_TRACE_TYPE2))
> + skb_get_new_timestamp(skb, &ts);
> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> + }
> + data += sizeof(__be32);
> + }
> +
> + /* transit delay */
> + if (trace_type & IOAM6_TRACE_TYPE4) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* namespace data */
> + if (trace_type & IOAM6_TRACE_TYPE5) {
> + *(__be32 *)data = (__be32)ns->data;
> + data += sizeof(__be32);
> + }
> +
> + /* queue depth */
> + if (trace_type & IOAM6_TRACE_TYPE6) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* hop_lim and node_id (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE7) {
> + byte = ipv6_hdr(skb)->hop_limit - 1;
> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> + if (!raw_u64)
> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
> + else
> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> + data += sizeof(__be64);
> + }
> +
> + /* ingress_if_id and egress_if_id (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE8) {
> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> + *(__be32 *)data = cpu_to_be32(raw_u32);
> + data += sizeof(__be32);
> +
> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> + if (!raw_u32)
> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> + *(__be32 *)data = cpu_to_be32(raw_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* namespace data (wide) */
> + if (trace_type & IOAM6_TRACE_TYPE9) {
> + *(__be64 *)data = ns->data;
> + data += sizeof(__be64);
> + }
> +
> + /* buffer occupancy */
> + if (trace_type & IOAM6_TRACE_TYPE10) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* checksum complement */
> + if (trace_type & IOAM6_TRACE_TYPE11) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> + data += sizeof(__be32);
> + }
> +
> + /* opaque state snapshot */
> + if (trace_type & IOAM6_TRACE_TYPE22) {
> + if (!ns->schema) {
> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> + } else {
> + *(__be32 *)data = ns->schema->hdr;
> + data += sizeof(__be32);
> + memcpy(data, ns->schema->data, ns->schema->len);
> + }
> + }
> +}
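The hop_lim/node_id packing used for trace type bit 0 can be exercised
in user space. A minimal sketch, assuming the same 24-bit "empty" marker
as IOAM6_EMPTY_FIELD_u24. The name pack_hop_node() is hypothetical, and
the cast to a 32-bit unsigned type before shifting is an addition made
here so that the shift stays well-defined for values of 128 and above:

```c
#include <assert.h>
#include <stdint.h>

#define EMPTY_FIELD_U24 0x00ffffffu	/* mirrors IOAM6_EMPTY_FIELD_u24 */

/* Hypothetical helper: build the host-order value of the 4-octet
 * "hop_lim and node_id" field. The caller would still convert it to
 * network byte order before storing it in the packet.
 */
static uint32_t pack_hop_node(uint8_t hop_limit, uint32_t node_id)
{
	uint8_t hop = hop_limit - 1;	/* same decrement as in the patch */

	if (!node_id)
		node_id = EMPTY_FIELD_U24;	/* unset id: all ones */
	else
		node_id &= EMPTY_FIELD_U24;	/* keep the low 24 bits only */

	return ((uint32_t)hop << 24) | node_id;
}
```

For example, a hop limit of 64 with no node id configured yields
0x3fffffff, and a node id wider than 24 bits is truncated to its low
24 bits.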
> +
> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> + struct ioam6_namespace *ns)
> +{
> + u8 nodelen, flags, remlen, sclen = 0;
> + struct ioam6_trace_hdr *trh;
> + int nodeoff;
> + u16 info;
> + u32 type;
> +
> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> + info = be16_to_cpu(trh->info);
> + type = be32_to_cpu(trh->type);
> +
> + nodelen = info >> 11;
> + flags = (info >> 7) & 0xf;
> + remlen = info & 0x7f;
> +
> + /* Skip if Overflow bit is set OR
> + * if an unknown type (bit 12-21) is set
> + */
> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> + return;
> +
> + /* NodeLen does not include Opaque State Snapshot length. We need to
> + * take it into account if the corresponding bit is set and if current
> + * IOAM namespace has an active schema attached to it
> + */
> + if (type & IOAM6_TRACE_TYPE22) {
> + /* Opaque State Snapshot header size */
> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> +
> + if (ns->schema)
> + sclen += ns->schema->len / 4;
> + }
> +
> + /* Not enough space remaining: set Overflow bit and skip */
> + if (!remlen || remlen < (nodelen + sclen)) {
> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> + trh->info = cpu_to_be16(info);
> + return;
> + }
> +
> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> +
> + /* Update RemainingLen */
> + remlen -= nodelen + sclen;
> + info = (info & 0xff80) | remlen;
> + trh->info = cpu_to_be16(info);
> +}
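To make the bit layout of the 16-bit info field explicit: the top 5 bits
carry NodeLen, the next 4 bits the flags, and the low 7 bits
RemainingLen, with both lengths in 4-octet units. A minimal user-space
sketch of the same decoding and of the RemainingLen update, working on
host-order values and using hypothetical names (struct trace_info,
trace_info_decode(), trace_info_consume()):

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_FLAG_OVERFLOW (1u << 3)	/* mirrors IOAM6_TRACE_FLAG_OVERFLOW */

struct trace_info {
	uint8_t nodelen;	/* 5 bits, in 4-octet units */
	uint8_t flags;		/* 4 bits */
	uint8_t remlen;		/* 7 bits, in 4-octet units */
};

/* Split a host-order info value into its three fields. */
static struct trace_info trace_info_decode(uint16_t info)
{
	struct trace_info ti = {
		.nodelen = info >> 11,
		.flags   = (info >> 7) & 0xf,
		.remlen  = info & 0x7f,
	};
	return ti;
}

/* Consume `need` units of trace space and return the updated info
 * value. If there is not enough room, only the Overflow flag is set
 * and RemainingLen is left untouched.
 */
static uint16_t trace_info_consume(uint16_t info, uint8_t need)
{
	struct trace_info ti = trace_info_decode(info);

	if (!ti.remlen || ti.remlen < need)
		return info | (TRACE_FLAG_OVERFLOW << 7);

	return (info & 0xff80) | (uint8_t)(ti.remlen - need);
}
```

With NodeLen 3 and RemainingLen 12 (info 0x180c), consuming 3 units
gives 0x1809; with RemainingLen 2 the same request only sets the
Overflow bit.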
> +
> +static int __net_init ioam6_net_init(struct net *net)
> +{
> + struct ioam6_pernet_data *nsdata;
> + int err = -ENOMEM;
> +
> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> + if (!nsdata)
> + goto out;
> +
> + mutex_init(&nsdata->lock);
> + net->ipv6.ioam6_data = nsdata;
> +
> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> + if (err)
> + goto free_nsdata;
> +
> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> + if (err)
> + goto free_rht_ns;
> +
> +out:
> + return err;
> +free_rht_ns:
> + rhashtable_destroy(&nsdata->namespaces);
> +free_nsdata:
> + kfree(nsdata);
> + net->ipv6.ioam6_data = NULL;
> + goto out;
> +}
> +
> +static void __net_exit ioam6_net_exit(struct net *net)
> +{
> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> +
> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> +
> + kfree(nsdata);
> +}
> +
> +static struct pernet_operations ioam6_net_ops = {
> + .init = ioam6_net_init,
> + .exit = ioam6_net_exit,
> +};
> +
> +int __init ioam6_init(void)
> +{
> + int err = register_pernet_subsys(&ioam6_net_ops);
> +
> + if (err)
> + return err;
> +
> + pr_info("In-situ OAM (IOAM) with IPv6\n");
> + return 0;
> +}
> +
> +void ioam6_exit(void)
> +{
> + unregister_pernet_subsys(&ioam6_net_ops);
> +}
> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index fac2135aa47b..da49b33ab6fc 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec
> },
> + {
> + .procname = "ioam6_id",
> + .data = &init_net.ipv6.sysctl.ioam6_id,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec
> + },
> { }
> };
>
> --
> 2.17.1
>
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-25 14:29 ` Tom Herbert
@ 2020-06-25 18:23 ` Justin Iurman
2020-06-25 20:32 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 18:23 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
>> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
>> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
>> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>>
>
> The IANA allocation is TEMPORARY, with an expiration date of
> 4/16/2021. Note from RFC7120:
>
> "Implementers and deployers need to be aware that deprecation and
> de-allocation could take place at any time after expiry; therefore, an
> expired early allocation is best considered as deprecated."
>
> Please add a comment in the code and in the Documentation to this effect.
I'll do that, thanks. What kind of comment should it be (is there an official pattern?), and where in the Documentation should I add it?
>> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
>> packets. Default is drop.
>
> I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> packet containing the IOAM HBH option. Note that the act bits of the
Correct, "IOAM packets" is shorthand for IPv6 packets containing the IOAM HBH option.
> option type are 00, which means the TLV is skipped if the option isn't
> processed, so I don't think it's correct to drop these packets by
> default.
Mmmh, I'd tend to disagree here. Even though the act bits are 00 for this option, I do believe it should be disabled (dropped) by default on nodes that "speak IOAM". You don't want every kernel that includes IOAM to accept IOAM packets by default: anyone running such a kernel would then be creating an IOAM domain, potentially without being aware of it. Dropping by default also avoids spreading leaks.
Justin
>> Another per-interface sysctl ioam6_id is provided to define the IOAM
>> (unique) identifier of the interface.
>>
>> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
>> identifier of the node.
>>
>> Two relativistic hash tables: one for IOAM namespaces, the other for
>> IOAM schemas. A namespace can only have a single active schema and a
>> schema can only be attached to a single namespace (1:1 relationship).
>>
>> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>> [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>> include/linux/ipv6.h | 2 +
>> include/net/ioam6.h | 98 +++++++++++
>> include/net/netns/ipv6.h | 2 +
>> include/uapi/linux/in6.h | 1 +
>> include/uapi/linux/ipv6.h | 2 +
>> net/ipv6/Makefile | 2 +-
>> net/ipv6/addrconf.c | 20 +++
>> net/ipv6/af_inet6.c | 7 +
>> net/ipv6/exthdrs.c | 67 ++++++++
>> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
>> net/ipv6/sysctl_net_ipv6.c | 7 +
>> 11 files changed, 533 insertions(+), 1 deletion(-)
>> create mode 100644 include/net/ioam6.h
>> create mode 100644 net/ipv6/ioam6.c
>>
>> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> index 5312a718bc7a..15732f964c6e 100644
>> --- a/include/linux/ipv6.h
>> +++ b/include/linux/ipv6.h
>> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>> __s32 disable_policy;
>> __s32 ndisc_tclass;
>> __s32 rpl_seg_enabled;
>> + __u32 ioam6_enabled;
>> + __u32 ioam6_id;
>>
>> struct ctl_table_header *sysctl_header;
>> };
>> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
>> new file mode 100644
>> index 000000000000..2a910bc99947
>> --- /dev/null
>> +++ b/include/net/ioam6.h
>> @@ -0,0 +1,98 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * IOAM IPv6 implementation
>> + *
>> + * Author:
>> + * Justin Iurman <justin.iurman@uliege.be>
>> + */
>> +
>> +#ifndef _NET_IOAM6_H
>> +#define _NET_IOAM6_H
>> +
>> +#include <linux/net.h>
>> +#include <linux/ipv6.h>
>> +#include <linux/rhashtable-types.h>
>> +
>> +#define IOAM6_OPT_TRACE_PREALLOC 0
>> +
>> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
>> +
>> +#define IOAM6_TRACE_TYPE0 (1 << 31)
>> +#define IOAM6_TRACE_TYPE1 (1 << 30)
>> +#define IOAM6_TRACE_TYPE2 (1 << 29)
>> +#define IOAM6_TRACE_TYPE3 (1 << 28)
>> +#define IOAM6_TRACE_TYPE4 (1 << 27)
>> +#define IOAM6_TRACE_TYPE5 (1 << 26)
>> +#define IOAM6_TRACE_TYPE6 (1 << 25)
>> +#define IOAM6_TRACE_TYPE7 (1 << 24)
>> +#define IOAM6_TRACE_TYPE8 (1 << 23)
>> +#define IOAM6_TRACE_TYPE9 (1 << 22)
>> +#define IOAM6_TRACE_TYPE10 (1 << 21)
>> +#define IOAM6_TRACE_TYPE11 (1 << 20)
>> +#define IOAM6_TRACE_TYPE22 (1 << 9)
>> +
>> +#define IOAM6_EMPTY_FIELD_u16 0xffff
>> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
>> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
>> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
>> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
>> +
>> +struct ioam6_common_hdr {
>> + u8 opt_type;
>> + u8 opt_len;
>> + u8 res;
>> + u8 ioam_type;
>> + __be16 namespace_id;
>> +} __packed;
>> +
>> +struct ioam6_trace_hdr {
>> + __be16 info;
>> + __be32 type;
>> +} __packed;
>> +
>> +struct ioam6_namespace {
>> + struct rhash_head head;
>> + struct rcu_head rcu;
>> +
>> + __be16 id;
>> + __be64 data;
>> + bool remove_tlv;
>> +
>> + struct ioam6_schema *schema;
>> +};
>> +
>> +struct ioam6_schema {
>> + struct rhash_head head;
>> + struct rcu_head rcu;
>> +
>> + u32 id;
>> + int len;
>> + __be32 hdr;
>> + u8 *data;
>> +
>> + struct ioam6_namespace *ns;
>> +};
>> +
>> +struct ioam6_pernet_data {
>> + struct mutex lock;
>> + struct rhashtable namespaces;
>> + struct rhashtable schemas;
>> +};
>> +
>> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6)
>> + return net->ipv6.ioam6_data;
>> +#else
>> + return NULL;
>> +#endif
>> +}
>> +
>> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
>> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> + struct ioam6_namespace *ns);
>> +
>> +extern int ioam6_init(void);
>> +extern void ioam6_exit(void);
>> +
>> +#endif
>> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
>> index 5ec054473d81..89b27fa721f4 100644
>> --- a/include/net/netns/ipv6.h
>> +++ b/include/net/netns/ipv6.h
>> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>> int max_hbh_opts_len;
>> int seg6_flowlabel;
>> bool skip_notify_on_dev_down;
>> + unsigned int ioam6_id;
>> };
>>
>> struct netns_ipv6 {
>> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>> spinlock_t lock;
>> u32 seq;
>> } ip6addrlbl_table;
>> + struct ioam6_pernet_data *ioam6_data;
>> };
>>
>> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
>> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
>> index 9f2273a08356..1c98435220c9 100644
>> --- a/include/uapi/linux/in6.h
>> +++ b/include/uapi/linux/in6.h
>> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>> #define IPV6_TLV_PADN 1
>> #define IPV6_TLV_ROUTERALERT 5
>> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
>> +#define IPV6_TLV_IOAM_HOPOPTS 49
>
> The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> Note from RFC7120:
>
> "Implementers and deployers need to be aware that deprecation and
> de-allocation could take place at any time after expiry; therefore, an
> expired early allocation is best considered as deprecated. It is not
> IANA's responsibility to track the status of allocations, their
> expirations, or when they may be re-allocated."
>
> Please add a comment here and in the
> Documentation to this effect.
>
>> #define IPV6_TLV_JUMBO 194
>> #define IPV6_TLV_HAO 201 /* home address option */
>>
>> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> index 13e8751bf24a..eb521b2dd885 100644
>> --- a/include/uapi/linux/ipv6.h
>> +++ b/include/uapi/linux/ipv6.h
>> @@ -189,6 +189,8 @@ enum {
>> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>> DEVCONF_NDISC_TCLASS,
>> DEVCONF_RPL_SEG_ENABLED,
>> + DEVCONF_IOAM6_ENABLED,
>> + DEVCONF_IOAM6_ID,
>> DEVCONF_MAX
>> };
>>
>> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
>> index cf7b47bdb9b3..b7ef10d417d6 100644
>> --- a/net/ipv6/Makefile
>> +++ b/net/ipv6/Makefile
>> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o
>> addrconf.o \
>> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
>> - udp_offload.o seg6.o fib6_notifier.o rpl.o
>> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>>
>> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 840bfdb3d7bd..6c952a28ade2 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
>> .disable_policy = 0,
>> .rpl_seg_enabled = 0,
>> + .ioam6_enabled = 0,
>> + .ioam6_id = 0,
>> };
>>
>> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
>> {
>> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
>> .disable_policy = 0,
>> .rpl_seg_enabled = 0,
>> + .ioam6_enabled = 0,
>> + .ioam6_id = 0,
>> };
>>
>> /* Check if link is ready: is it up and is a valid qdisc available */
>> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
>> *cnf,
>> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
>> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
>> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>> }
>>
>> static inline size_t inet6_ifla6_size(void)
>> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>> .mode = 0644,
>> .proc_handler = proc_dointvec,
>> },
>> + {
>> + .procname = "ioam6_enabled",
>> + .data = &ipv6_devconf.ioam6_enabled,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec,
>> + },
>> + {
>> + .procname = "ioam6_id",
>> + .data = &ipv6_devconf.ioam6_id,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec,
>> + },
>> {
>> /* sentinel */
>> }
>> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
>> index b304b882e031..63a9ffc4b283 100644
>> --- a/net/ipv6/af_inet6.c
>> +++ b/net/ipv6/af_inet6.c
>> @@ -62,6 +62,7 @@
>> #include <net/rpl.h>
>> #include <net/compat.h>
>> #include <net/xfrm.h>
>> +#include <net/ioam6.h>
>>
>> #include <linux/uaccess.h>
>> #include <linux/mroute6.h>
>> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>> if (err)
>> goto rpl_fail;
>>
>> + err = ioam6_init();
>> + if (err)
>> + goto ioam6_fail;
>> +
>> err = igmp6_late_init();
>> if (err)
>> goto igmp6_late_err;
>> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>> #endif
>> igmp6_late_err:
>> rpl_exit();
>> +ioam6_fail:
>> + ioam6_exit();
>> rpl_fail:
>> seg6_exit();
>> seg6_fail:
>> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> index f27ab3bf2e0c..00aee1358f1c 100644
>> --- a/net/ipv6/exthdrs.c
>> +++ b/net/ipv6/exthdrs.c
>> @@ -49,6 +49,8 @@
>> #include <net/seg6_hmac.h>
>> #endif
>> #include <net/rpl.h>
>> +#include <net/ioam6.h>
>> +#include <net/dst_metadata.h>
>>
>> #include <linux/uaccess.h>
>>
>> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> return TLV_REJECT;
>> }
>>
>> +/* IOAM */
>> +
>> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
>> +{
>> + struct ioam6_common_hdr *ioamh;
>> + struct ioam6_namespace *ns;
>> +
>> + /* Must be 4n-aligned */
>> + if (optoff & 3)
>> + goto drop;
>> +
>> + if (!skb_valid_dst(skb))
>> + ip6_route_input(skb);
>> +
>> + /* IOAM must be enabled on ingress interface */
>> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
>> + goto drop;
>> +
>> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
>> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
>> +
>> + /* Unknown IOAM namespace, either:
>> + * - Drop it if IOAM is not enabled on egress interface (if any)
>> + * - Ignore it otherwise
>> + */
>> + if (!ns) {
>> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> + goto drop;
>> +
>> + goto accept;
>> + }
>> +
>> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> + goto remove;
>> +
>> + /* Known IOAM namespace which must not be removed:
>> + * IOAM must be enabled on egress interface
>> + */
>> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> + goto drop;
>> +
>> + switch (ioamh->ioam_type) {
>> + case IOAM6_OPT_TRACE_PREALLOC:
>> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
>> + IP6CB(skb)->flags |= IP6SKB_IOAM;
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> +accept:
>> + return TLV_ACCEPT;
>> +remove:
>> + return TLV_REMOVE;
>> +drop:
>> + kfree_skb(skb);
>> + return TLV_REJECT;
>> +}
>> +
>> /* Jumbo payload */
>>
>> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> .type = IPV6_TLV_ROUTERALERT,
>> .func = ipv6_hop_ra,
>> },
>> + {
>> + .type = IPV6_TLV_IOAM_HOPOPTS,
>> + .func = ipv6_hop_ioam,
>> + },
>> {
>> .type = IPV6_TLV_JUMBO,
>> .func = ipv6_hop_jumbo,
>> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
>> new file mode 100644
>> index 000000000000..406aa78eb504
>> --- /dev/null
>> +++ b/net/ipv6/ioam6.c
>> @@ -0,0 +1,326 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * IOAM IPv6 implementation
>> + *
>> + * Author:
>> + * Justin Iurman <justin.iurman@uliege.be>
>> + */
>> +
>> +#include <linux/errno.h>
>> +#include <linux/types.h>
>> +#include <linux/kernel.h>
>> +#include <linux/net.h>
>> +#include <linux/rhashtable.h>
>> +
>> +#include <net/addrconf.h>
>> +#include <net/ioam6.h>
>> +
>> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
>> +{
>> + kfree_rcu(ns, rcu);
>> +}
>> +
>> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
>> +{
>> + kfree_rcu(sc, rcu);
>> +}
>> +
>> +static void ioam6_free_ns(void *ptr, void *arg)
>> +{
>> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
>> +
>> + if (ns)
>> + ioam6_ns_release(ns);
>> +}
>> +
>> +static void ioam6_free_sc(void *ptr, void *arg)
>> +{
>> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
>> +
>> + if (sc)
>> + ioam6_sc_release(sc);
>> +}
>> +
>> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> +{
>> + const struct ioam6_namespace *ns = obj;
>> +
>> + return (ns->id != *(__be16 *)arg->key);
>> +}
>> +
>> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> +{
>> + const struct ioam6_schema *sc = obj;
>> +
>> + return (sc->id != *(u32 *)arg->key);
>> +}
>> +
>> +static const struct rhashtable_params rht_ns_params = {
>> + .key_len = sizeof(__be16),
>> + .key_offset = offsetof(struct ioam6_namespace, id),
>> + .head_offset = offsetof(struct ioam6_namespace, head),
>> + .automatic_shrinking = true,
>> + .obj_cmpfn = ioam6_ns_cmpfn,
>> +};
>> +
>> +static const struct rhashtable_params rht_sc_params = {
>> + .key_len = sizeof(u32),
>> + .key_offset = offsetof(struct ioam6_schema, id),
>> + .head_offset = offsetof(struct ioam6_schema, head),
>> + .automatic_shrinking = true,
>> + .obj_cmpfn = ioam6_sc_cmpfn,
>> +};
>> +
>> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
>> +{
>> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> +
>> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
>> +}
>> +
>> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
>> + u32 trace_type, struct ioam6_namespace *ns)
>> +{
>> + u8 *data = skb_network_header(skb) + nodeoff;
>> + struct __kernel_sock_timeval ts;
>> + u64 raw_u64;
>> + u32 raw_u32;
>> + u16 raw_u16;
>> + u8 byte;
>> +
>> + /* hop_lim and node_id */
>> + if (trace_type & IOAM6_TRACE_TYPE0) {
>> + byte = ipv6_hdr(skb)->hop_limit - 1;
>> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> + if (!raw_u32)
>> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
>> + else
>> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
>> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* ingress_if_id and egress_if_id */
>> + if (trace_type & IOAM6_TRACE_TYPE1) {
>> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> + if (!raw_u16)
>> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> + *(__be16 *)data = cpu_to_be16(raw_u16);
>> + data += sizeof(__be16);
>> +
>> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> + if (!raw_u16)
>> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> + *(__be16 *)data = cpu_to_be16(raw_u16);
>> + data += sizeof(__be16);
>> + }
>> +
>> + /* timestamp seconds */
>> + if (trace_type & IOAM6_TRACE_TYPE2) {
>> + if (!skb->tstamp) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + } else {
>> + skb_get_new_timestamp(skb, &ts);
>> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
>> + }
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* timestamp subseconds */
>> + if (trace_type & IOAM6_TRACE_TYPE3) {
>> + if (!skb->tstamp) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + } else {
>> + if (!(trace_type & IOAM6_TRACE_TYPE2))
>> + skb_get_new_timestamp(skb, &ts);
>> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
>> + }
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* transit delay */
>> + if (trace_type & IOAM6_TRACE_TYPE4) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* namespace data */
>> + if (trace_type & IOAM6_TRACE_TYPE5) {
>> + *(__be32 *)data = (__be32)ns->data;
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* queue depth */
>> + if (trace_type & IOAM6_TRACE_TYPE6) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* hop_lim and node_id (wide) */
>> + if (trace_type & IOAM6_TRACE_TYPE7) {
>> + byte = ipv6_hdr(skb)->hop_limit - 1;
>> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> + if (!raw_u64)
>> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
>> + else
>> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
>> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
>> + data += sizeof(__be64);
>> + }
>> +
>> + /* ingress_if_id and egress_if_id (wide) */
>> + if (trace_type & IOAM6_TRACE_TYPE8) {
>> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> + if (!raw_u32)
>> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> + *(__be32 *)data = cpu_to_be32(raw_u32);
>> + data += sizeof(__be32);
>> +
>> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> + if (!raw_u32)
>> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> + *(__be32 *)data = cpu_to_be32(raw_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* namespace data (wide) */
>> + if (trace_type & IOAM6_TRACE_TYPE9) {
>> + *(__be64 *)data = ns->data;
>> + data += sizeof(__be64);
>> + }
>> +
>> + /* buffer occupancy */
>> + if (trace_type & IOAM6_TRACE_TYPE10) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* checksum complement */
>> + if (trace_type & IOAM6_TRACE_TYPE11) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> + data += sizeof(__be32);
>> + }
>> +
>> + /* opaque state snapshot */
>> + if (trace_type & IOAM6_TRACE_TYPE22) {
>> + if (!ns->schema) {
>> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
>> + } else {
>> + *(__be32 *)data = ns->schema->hdr;
>> + data += sizeof(__be32);
>> + memcpy(data, ns->schema->data, ns->schema->len);
>> + }
>> + }
>> +}
>> +
>> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> + struct ioam6_namespace *ns)
>> +{
>> + u8 nodelen, flags, remlen, sclen = 0;
>> + struct ioam6_trace_hdr *trh;
>> + int nodeoff;
>> + u16 info;
>> + u32 type;
>> +
>> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
>> + info = be16_to_cpu(trh->info);
>> + type = be32_to_cpu(trh->type);
>> +
>> + nodelen = info >> 11;
>> + flags = (info >> 7) & 0xf;
>> + remlen = info & 0x7f;
>> +
>> + /* Skip if Overflow bit is set OR
>> + * if an unknown type (bit 12-21) is set
>> + */
>> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
>> + return;
>> +
>> + /* NodeLen does not include Opaque State Snapshot length. We need to
>> + * take it into account if the corresponding bit is set and if current
>> + * IOAM namespace has an active schema attached to it
>> + */
>> + if (type & IOAM6_TRACE_TYPE22) {
>> + /* Opaque State Snapshot header size */
>> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
>> +
>> + if (ns->schema)
>> + sclen += ns->schema->len / 4;
>> + }
>> +
>> + /* Not enough space remaining: set Overflow bit and skip */
>> + if (!remlen || remlen < (nodelen + sclen)) {
>> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
>> + trh->info = cpu_to_be16(info);
>> + return;
>> + }
>> +
>> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
>> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
>> +
>> + /* Update RemainingLen */
>> + remlen -= nodelen + sclen;
>> + info = (info & 0xff80) | remlen;
>> + trh->info = cpu_to_be16(info);
>> +}
>> +
>> +static int __net_init ioam6_net_init(struct net *net)
>> +{
>> + struct ioam6_pernet_data *nsdata;
>> + int err = -ENOMEM;
>> +
>> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
>> + if (!nsdata)
>> + goto out;
>> +
>> + mutex_init(&nsdata->lock);
>> + net->ipv6.ioam6_data = nsdata;
>> +
>> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
>> + if (err)
>> + goto free_nsdata;
>> +
>> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
>> + if (err)
>> + goto free_rht_ns;
>> +
>> +out:
>> + return err;
>> +free_rht_ns:
>> + rhashtable_destroy(&nsdata->namespaces);
>> +free_nsdata:
>> + kfree(nsdata);
>> + net->ipv6.ioam6_data = NULL;
>> + goto out;
>> +}
>> +
>> +static void __net_exit ioam6_net_exit(struct net *net)
>> +{
>> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> +
>> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
>> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
>> +
>> + kfree(nsdata);
>> +}
>> +
>> +static struct pernet_operations ioam6_net_ops = {
>> + .init = ioam6_net_init,
>> + .exit = ioam6_net_exit,
>> +};
>> +
>> +int __init ioam6_init(void)
>> +{
>> + int err = register_pernet_subsys(&ioam6_net_ops);
>> +
>> + if (err)
>> + return err;
>> +
>> + pr_info("In-situ OAM (IOAM) with IPv6\n");
>> + return 0;
>> +}
>> +
>> +void ioam6_exit(void)
>> +{
>> + unregister_pernet_subsys(&ioam6_net_ops);
>> +}
>> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>> index fac2135aa47b..da49b33ab6fc 100644
>> --- a/net/ipv6/sysctl_net_ipv6.c
>> +++ b/net/ipv6/sysctl_net_ipv6.c
>> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>> .mode = 0644,
>> .proc_handler = proc_dointvec
>> },
>> + {
>> + .procname = "ioam6_id",
>> + .data = &init_net.ipv6.sysctl.ioam6_id,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec
>> + },
>> { }
>> };
>>
>> --
>> 2.17.1
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
From: Tom Herbert @ 2020-06-25 20:32 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Thu, Jun 25, 2020 at 11:23 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
> >>
> >
> > The IANA allocation is TEMPORARY, with an expiration date of
> > 4/16/2021. Note from RFC7120:
> >
> > "Implementers and deployers need to be aware that deprecation and
> > de-allocation could take place at any time after expiry; therefore, an
> > expired early allocation is best considered as deprecated."
> >
> > Please add a comment in the code and in the Documentation to this effect.
>
> I'll do that, thanks. What kind of comment should it be (is there an official pattern?), and where in the Documentation should I add it?
>
> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> >> packets. Default is drop.
> >
> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> > packet containing the IOAM HBH option. Note that the act bits of the
>
> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets containing the IOAM HBH option.
>
> > option type are 00 which means the TLV is skipped if the option isn't
> > processed, so I don't think it's correct to drop these packets by
> > default.
>
> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for this option, I do believe it should be disabled (dropped) by default for nodes that "speak IOAM". Indeed, you don't want anyone with a kernel that includes IOAM to accept IOAM packets by default, which would mean that anyone would create (potentially without being aware) an IOAM domain. And, also, to avoid spreading leaks.
>
I think you're conflating whether a node processes the IOAM option with
whether it needs to drop the packet because it doesn't process it. Yes,
on an IOAM system it makes sense to allow configuration of whether to
process the TLV. However, even when it doesn't, the TLV should be
skipped and the packet not dropped. We know this is the correct behavior
since on a system that isn't IOAM aware, i.e. all deployed nodes right
now, the TLV is skipped per the act bits. If we want to change the
default behavior, the only way to do that is to change the act bits to
non-zero.
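To make the act bits point concrete, here is a minimal user-space sketch
(not kernel code) of how the two high-order bits of an option type select
the action for an unrecognized option, following the encoding in RFC 8200
section 4.2. The enum and helper names are made up for this illustration.

```c
#include <stdint.h>

/* Action for an option type the node does not recognize
 * (two high-order bits of the Option Type, RFC 8200 section 4.2).
 * Names are hypothetical, chosen for this sketch only.
 */
enum opt_unknown_action {
	OPT_ACT_SKIP = 0,               /* 00: skip the option, keep going */
	OPT_ACT_DISCARD = 1,            /* 01: discard the packet */
	OPT_ACT_DISCARD_ICMP = 2,       /* 10: discard, send ICMP Parameter Problem */
	OPT_ACT_DISCARD_ICMP_UCAST = 3, /* 11: same, unless destination is multicast */
};

/* Extract the act bits from an option type. */
static enum opt_unknown_action opt_unknown_action(uint8_t opt_type)
{
	return (enum opt_unknown_action)(opt_type >> 6);
}

/* Third-highest bit: 1 if the option data may change en route. */
static int opt_data_may_change(uint8_t opt_type)
{
	return (opt_type >> 5) & 1;
}
```

With the temporary value 49 (0x31) the act bits are 00, so a node that
does not process the option skips it, whereas 194 (Jumbo) has act bits 11.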
For the leakage problem, that is a firewall issue. The expectation is
that border devices will have rules that prevent leaking packets out
of their domain. This is an orthogonal mechanism that is needed for
other protocols too, SRH for instance. The filtering is simple: just
drop the packet when the TLV matches (although I suspect most sites
probably just drop packets with EH at this point). This doesn't
require any changes to the implementation and doesn't require that
border devices even implement IOAM; they just drop on pattern
matching.
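As a rough illustration of that kind of pattern match, a minimal
user-space sketch (not a netfilter rule and not kernel code) that walks
the TLVs of a Hop-by-Hop header and reports whether a given option type
is present. The function name and return convention are assumptions made
for this example; a border device would drop the packet on a match.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0 /* Pad1 is a single byte with no length field */

/* Scan a Hop-by-Hop header starting at its Next Header byte.
 * Returns 1 if an option of type `match` is present, 0 if not,
 * and -1 if the header is truncated or malformed.
 */
static int hbh_has_option(const uint8_t *buf, size_t len, uint8_t match)
{
	size_t hdrlen, off = 2; /* options start after Next Header and Hdr Ext Len */

	if (len < 8)
		return -1;
	hdrlen = ((size_t)buf[1] + 1) * 8; /* Hdr Ext Len is in 8-octet units */
	if (hdrlen > len)
		return -1;

	while (off < hdrlen) {
		uint8_t type = buf[off];

		if (type == match)
			return 1;
		if (type == TLV_PAD1) {
			off++;
			continue;
		}
		if (off + 2 > hdrlen)
			return -1;
		off += 2 + (size_t)buf[off + 1]; /* type + length + data */
	}
	return off == hdrlen ? 0 : -1;
}
```

Note that the scan needs no knowledge of the IOAM option layout: it only
relies on the generic type/length framing of Hop-by-Hop options.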
Tom
> Justin
>
> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
> >> (unique) identifier of the interface.
> >>
> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> >> identifier of the node.
> >>
> >> Two relativistic hash tables: one for IOAM namespaces, the other for
> >> IOAM schemas. A namespace can only have a single active schema and a
> >> schema can only be attached to a single namespace (1:1 relationship).
> >>
> >> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >> [3]
> >> https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
> >>
> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> ---
> >> include/linux/ipv6.h | 2 +
> >> include/net/ioam6.h | 98 +++++++++++
> >> include/net/netns/ipv6.h | 2 +
> >> include/uapi/linux/in6.h | 1 +
> >> include/uapi/linux/ipv6.h | 2 +
> >> net/ipv6/Makefile | 2 +-
> >> net/ipv6/addrconf.c | 20 +++
> >> net/ipv6/af_inet6.c | 7 +
> >> net/ipv6/exthdrs.c | 67 ++++++++
> >> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
> >> net/ipv6/sysctl_net_ipv6.c | 7 +
> >> 11 files changed, 533 insertions(+), 1 deletion(-)
> >> create mode 100644 include/net/ioam6.h
> >> create mode 100644 net/ipv6/ioam6.c
> >>
> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> index 5312a718bc7a..15732f964c6e 100644
> >> --- a/include/linux/ipv6.h
> >> +++ b/include/linux/ipv6.h
> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> >> __s32 disable_policy;
> >> __s32 ndisc_tclass;
> >> __s32 rpl_seg_enabled;
> >> + __u32 ioam6_enabled;
> >> + __u32 ioam6_id;
> >>
> >> struct ctl_table_header *sysctl_header;
> >> };
> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> >> new file mode 100644
> >> index 000000000000..2a910bc99947
> >> --- /dev/null
> >> +++ b/include/net/ioam6.h
> >> @@ -0,0 +1,98 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> +/*
> >> + * IOAM IPv6 implementation
> >> + *
> >> + * Author:
> >> + * Justin Iurman <justin.iurman@uliege.be>
> >> + */
> >> +
> >> +#ifndef _NET_IOAM6_H
> >> +#define _NET_IOAM6_H
> >> +
> >> +#include <linux/net.h>
> >> +#include <linux/ipv6.h>
> >> +#include <linux/rhashtable-types.h>
> >> +
> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
> >> +
> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> >> +
> >> +#define IOAM6_TRACE_TYPE0 (1 << 31)
> >> +#define IOAM6_TRACE_TYPE1 (1 << 30)
> >> +#define IOAM6_TRACE_TYPE2 (1 << 29)
> >> +#define IOAM6_TRACE_TYPE3 (1 << 28)
> >> +#define IOAM6_TRACE_TYPE4 (1 << 27)
> >> +#define IOAM6_TRACE_TYPE5 (1 << 26)
> >> +#define IOAM6_TRACE_TYPE6 (1 << 25)
> >> +#define IOAM6_TRACE_TYPE7 (1 << 24)
> >> +#define IOAM6_TRACE_TYPE8 (1 << 23)
> >> +#define IOAM6_TRACE_TYPE9 (1 << 22)
> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> >> +
> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> >> +
> >> +struct ioam6_common_hdr {
> >> + u8 opt_type;
> >> + u8 opt_len;
> >> + u8 res;
> >> + u8 ioam_type;
> >> + __be16 namespace_id;
> >> +} __packed;
> >> +
> >> +struct ioam6_trace_hdr {
> >> + __be16 info;
> >> + __be32 type;
> >> +} __packed;
> >> +
> >> +struct ioam6_namespace {
> >> + struct rhash_head head;
> >> + struct rcu_head rcu;
> >> +
> >> + __be16 id;
> >> + __be64 data;
> >> + bool remove_tlv;
> >> +
> >> + struct ioam6_schema *schema;
> >> +};
> >> +
> >> +struct ioam6_schema {
> >> + struct rhash_head head;
> >> + struct rcu_head rcu;
> >> +
> >> + u32 id;
> >> + int len;
> >> + __be32 hdr;
> >> + u8 *data;
> >> +
> >> + struct ioam6_namespace *ns;
> >> +};
> >> +
> >> +struct ioam6_pernet_data {
> >> + struct mutex lock;
> >> + struct rhashtable namespaces;
> >> + struct rhashtable schemas;
> >> +};
> >> +
> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> >> +{
> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> + return net->ipv6.ioam6_data;
> >> +#else
> >> + return NULL;
> >> +#endif
> >> +}
> >> +
> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> + struct ioam6_namespace *ns);
> >> +
> >> +extern int ioam6_init(void);
> >> +extern void ioam6_exit(void);
> >> +
> >> +#endif
> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> >> index 5ec054473d81..89b27fa721f4 100644
> >> --- a/include/net/netns/ipv6.h
> >> +++ b/include/net/netns/ipv6.h
> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> >> int max_hbh_opts_len;
> >> int seg6_flowlabel;
> >> bool skip_notify_on_dev_down;
> >> + unsigned int ioam6_id;
> >> };
> >>
> >> struct netns_ipv6 {
> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> >> spinlock_t lock;
> >> u32 seq;
> >> } ip6addrlbl_table;
> >> + struct ioam6_pernet_data *ioam6_data;
> >> };
> >>
> >> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> >> index 9f2273a08356..1c98435220c9 100644
> >> --- a/include/uapi/linux/in6.h
> >> +++ b/include/uapi/linux/in6.h
> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> >> #define IPV6_TLV_PADN 1
> >> #define IPV6_TLV_ROUTERALERT 5
> >> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
> >> +#define IPV6_TLV_IOAM_HOPOPTS 49
> >
> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> > Note from RFC7120:
> >
> > "Implementers and deployers need to be aware that deprecation and
> > de-allocation could take place at any time after expiry; therefore, an
> > expired early allocation is best considered as deprecated. It is not
> > IANA's responsibility to track the status of allocations, their
> > expirations, or when they may be re-allocated."
> >
> > Please add a comment here and in the
> > Documentation to this effect.
> >
> >> #define IPV6_TLV_JUMBO 194
> >> #define IPV6_TLV_HAO 201 /* home address option */
> >>
> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> index 13e8751bf24a..eb521b2dd885 100644
> >> --- a/include/uapi/linux/ipv6.h
> >> +++ b/include/uapi/linux/ipv6.h
> >> @@ -189,6 +189,8 @@ enum {
> >> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> >> DEVCONF_NDISC_TCLASS,
> >> DEVCONF_RPL_SEG_ENABLED,
> >> + DEVCONF_IOAM6_ENABLED,
> >> + DEVCONF_IOAM6_ID,
> >> DEVCONF_MAX
> >> };
> >>
> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> >> index cf7b47bdb9b3..b7ef10d417d6 100644
> >> --- a/net/ipv6/Makefile
> >> +++ b/net/ipv6/Makefile
> >> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o
> >> addrconf.o \
> >> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> >> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> >> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> >> - udp_offload.o seg6.o fib6_notifier.o rpl.o
> >> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
> >>
> >> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
> >>
> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> index 840bfdb3d7bd..6c952a28ade2 100644
> >> --- a/net/ipv6/addrconf.c
> >> +++ b/net/ipv6/addrconf.c
> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> .disable_policy = 0,
> >> .rpl_seg_enabled = 0,
> >> + .ioam6_enabled = 0,
> >> + .ioam6_id = 0,
> >> };
> >>
> >> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
> >> {
> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> .disable_policy = 0,
> >> .rpl_seg_enabled = 0,
> >> + .ioam6_enabled = 0,
> >> + .ioam6_id = 0,
> >> };
> >>
> >> /* Check if link is ready: is it up and is a valid qdisc available */
> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
> >> *cnf,
> >> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> >> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> >> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> >> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> >> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> >> }
> >>
> >> static inline size_t inet6_ifla6_size(void)
> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> >> .mode = 0644,
> >> .proc_handler = proc_dointvec,
> >> },
> >> + {
> >> + .procname = "ioam6_enabled",
> >> + .data = &ipv6_devconf.ioam6_enabled,
> >> + .maxlen = sizeof(int),
> >> + .mode = 0644,
> >> + .proc_handler = proc_dointvec,
> >> + },
> >> + {
> >> + .procname = "ioam6_id",
> >> + .data = &ipv6_devconf.ioam6_id,
> >> + .maxlen = sizeof(int),
> >> + .mode = 0644,
> >> + .proc_handler = proc_dointvec,
> >> + },
> >> {
> >> /* sentinel */
> >> }
> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> >> index b304b882e031..63a9ffc4b283 100644
> >> --- a/net/ipv6/af_inet6.c
> >> +++ b/net/ipv6/af_inet6.c
> >> @@ -62,6 +62,7 @@
> >> #include <net/rpl.h>
> >> #include <net/compat.h>
> >> #include <net/xfrm.h>
> >> +#include <net/ioam6.h>
> >>
> >> #include <linux/uaccess.h>
> >> #include <linux/mroute6.h>
> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> >> if (err)
> >> goto rpl_fail;
> >>
> >> + err = ioam6_init();
> >> + if (err)
> >> + goto ioam6_fail;
> >> +
> >> err = igmp6_late_init();
> >> if (err)
> >> goto igmp6_late_err;
> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> >> #endif
> >> igmp6_late_err:
> >> rpl_exit();
> >> +ioam6_fail:
> >> + ioam6_exit();
> >> rpl_fail:
> >> seg6_exit();
> >> seg6_fail:
> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> index f27ab3bf2e0c..00aee1358f1c 100644
> >> --- a/net/ipv6/exthdrs.c
> >> +++ b/net/ipv6/exthdrs.c
> >> @@ -49,6 +49,8 @@
> >> #include <net/seg6_hmac.h>
> >> #endif
> >> #include <net/rpl.h>
> >> +#include <net/ioam6.h>
> >> +#include <net/dst_metadata.h>
> >>
> >> #include <linux/uaccess.h>
> >>
> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> return TLV_REJECT;
> >> }
> >>
> >> +/* IOAM */
> >> +
> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> >> +{
> >> + struct ioam6_common_hdr *ioamh;
> >> + struct ioam6_namespace *ns;
> >> +
> >> + /* Must be 4n-aligned */
> >> + if (optoff & 3)
> >> + goto drop;
> >> +
> >> + if (!skb_valid_dst(skb))
> >> + ip6_route_input(skb);
> >> +
> >> + /* IOAM must be enabled on ingress interface */
> >> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> >> + goto drop;
> >> +
> >> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> >> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> >> +
> >> + /* Unknown IOAM namespace, either:
> >> + * - Drop it if IOAM is not enabled on egress interface (if any)
> >> + * - Ignore it otherwise
> >> + */
> >> + if (!ns) {
> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> + goto drop;
> >> +
> >> + goto accept;
> >> + }
> >> +
> >> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> + goto remove;
> >> +
> >> + /* Known IOAM namespace which must not be removed:
> >> + * IOAM must be enabled on egress interface
> >> + */
> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> + goto drop;
> >> +
> >> + switch (ioamh->ioam_type) {
> >> + case IOAM6_OPT_TRACE_PREALLOC:
> >> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> >> + IP6CB(skb)->flags |= IP6SKB_IOAM;
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> +
> >> +accept:
> >> + return TLV_ACCEPT;
> >> +remove:
> >> + return TLV_REMOVE;
> >> +drop:
> >> + kfree_skb(skb);
> >> + return TLV_REJECT;
> >> +}
> >> +
> >> /* Jumbo payload */
> >>
> >> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> .type = IPV6_TLV_ROUTERALERT,
> >> .func = ipv6_hop_ra,
> >> },
> >> + {
> >> + .type = IPV6_TLV_IOAM_HOPOPTS,
> >> + .func = ipv6_hop_ioam,
> >> + },
> >> {
> >> .type = IPV6_TLV_JUMBO,
> >> .func = ipv6_hop_jumbo,
> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> >> new file mode 100644
> >> index 000000000000..406aa78eb504
> >> --- /dev/null
> >> +++ b/net/ipv6/ioam6.c
> >> @@ -0,0 +1,326 @@
> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> +/*
> >> + * IOAM IPv6 implementation
> >> + *
> >> + * Author:
> >> + * Justin Iurman <justin.iurman@uliege.be>
> >> + */
> >> +
> >> +#include <linux/errno.h>
> >> +#include <linux/types.h>
> >> +#include <linux/kernel.h>
> >> +#include <linux/net.h>
> >> +#include <linux/rhashtable.h>
> >> +
> >> +#include <net/addrconf.h>
> >> +#include <net/ioam6.h>
> >> +
> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> >> +{
> >> + kfree_rcu(ns, rcu);
> >> +}
> >> +
> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> >> +{
> >> + kfree_rcu(sc, rcu);
> >> +}
> >> +
> >> +static void ioam6_free_ns(void *ptr, void *arg)
> >> +{
> >> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> >> +
> >> + if (ns)
> >> + ioam6_ns_release(ns);
> >> +}
> >> +
> >> +static void ioam6_free_sc(void *ptr, void *arg)
> >> +{
> >> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> >> +
> >> + if (sc)
> >> + ioam6_sc_release(sc);
> >> +}
> >> +
> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> +{
> >> + const struct ioam6_namespace *ns = obj;
> >> +
> >> + return (ns->id != *(__be16 *)arg->key);
> >> +}
> >> +
> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> +{
> >> + const struct ioam6_schema *sc = obj;
> >> +
> >> + return (sc->id != *(u32 *)arg->key);
> >> +}
> >> +
> >> +static const struct rhashtable_params rht_ns_params = {
> >> + .key_len = sizeof(__be16),
> >> + .key_offset = offsetof(struct ioam6_namespace, id),
> >> + .head_offset = offsetof(struct ioam6_namespace, head),
> >> + .automatic_shrinking = true,
> >> + .obj_cmpfn = ioam6_ns_cmpfn,
> >> +};
> >> +
> >> +static const struct rhashtable_params rht_sc_params = {
> >> + .key_len = sizeof(u32),
> >> + .key_offset = offsetof(struct ioam6_schema, id),
> >> + .head_offset = offsetof(struct ioam6_schema, head),
> >> + .automatic_shrinking = true,
> >> + .obj_cmpfn = ioam6_sc_cmpfn,
> >> +};
> >> +
> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> >> +{
> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> +
> >> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> >> +}
> >> +
> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> >> + u32 trace_type, struct ioam6_namespace *ns)
> >> +{
> >> + u8 *data = skb_network_header(skb) + nodeoff;
> >> + struct __kernel_sock_timeval ts;
> >> + u64 raw_u64;
> >> + u32 raw_u32;
> >> + u16 raw_u16;
> >> + u8 byte;
> >> +
> >> + /* hop_lim and node_id */
> >> + if (trace_type & IOAM6_TRACE_TYPE0) {
> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> + if (!raw_u32)
> >> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
> >> + else
> >> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> >> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* ingress_if_id and egress_if_id */
> >> + if (trace_type & IOAM6_TRACE_TYPE1) {
> >> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> + if (!raw_u16)
> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> + data += sizeof(__be16);
> >> +
> >> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> + if (!raw_u16)
> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> + data += sizeof(__be16);
> >> + }
> >> +
> >> + /* timestamp seconds */
> >> + if (trace_type & IOAM6_TRACE_TYPE2) {
> >> + if (!skb->tstamp) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + } else {
> >> + skb_get_new_timestamp(skb, &ts);
> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> >> + }
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* timestamp subseconds */
> >> + if (trace_type & IOAM6_TRACE_TYPE3) {
> >> + if (!skb->tstamp) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + } else {
> >> + if (!(trace_type & IOAM6_TRACE_TYPE2))
> >> + skb_get_new_timestamp(skb, &ts);
> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> >> + }
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* transit delay */
> >> + if (trace_type & IOAM6_TRACE_TYPE4) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* namespace data */
> >> + if (trace_type & IOAM6_TRACE_TYPE5) {
> >> + *(__be32 *)data = (__be32)ns->data;
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* queue depth */
> >> + if (trace_type & IOAM6_TRACE_TYPE6) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* hop_lim and node_id (wide) */
> >> + if (trace_type & IOAM6_TRACE_TYPE7) {
> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> + if (!raw_u64)
> >> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
> >> + else
> >> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> >> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> >> + data += sizeof(__be64);
> >> + }
> >> +
> >> + /* ingress_if_id and egress_if_id (wide) */
> >> + if (trace_type & IOAM6_TRACE_TYPE8) {
> >> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> + if (!raw_u32)
> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> + data += sizeof(__be32);
> >> +
> >> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> + if (!raw_u32)
> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* namespace data (wide) */
> >> + if (trace_type & IOAM6_TRACE_TYPE9) {
> >> + *(__be64 *)data = ns->data;
> >> + data += sizeof(__be64);
> >> + }
> >> +
> >> + /* buffer occupancy */
> >> + if (trace_type & IOAM6_TRACE_TYPE10) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* checksum complement */
> >> + if (trace_type & IOAM6_TRACE_TYPE11) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> + data += sizeof(__be32);
> >> + }
> >> +
> >> + /* opaque state snapshot */
> >> + if (trace_type & IOAM6_TRACE_TYPE22) {
> >> + if (!ns->schema) {
> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> >> + } else {
> >> + *(__be32 *)data = ns->schema->hdr;
> >> + data += sizeof(__be32);
> >> + memcpy(data, ns->schema->data, ns->schema->len);
> >> + }
> >> + }
> >> +}
> >> +
> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> + struct ioam6_namespace *ns)
> >> +{
> >> + u8 nodelen, flags, remlen, sclen = 0;
> >> + struct ioam6_trace_hdr *trh;
> >> + int nodeoff;
> >> + u16 info;
> >> + u32 type;
> >> +
> >> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> >> + info = be16_to_cpu(trh->info);
> >> + type = be32_to_cpu(trh->type);
> >> +
> >> + nodelen = info >> 11;
> >> + flags = (info >> 7) & 0xf;
> >> + remlen = info & 0x7f;
> >> +
> >> + /* Skip if Overflow bit is set OR
> >> + * if an unknown type (bit 12-21) is set
> >> + */
> >> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> >> + return;
> >> +
> >> + /* NodeLen does not include Opaque State Snapshot length. We need to
> >> + * take it into account if the corresponding bit is set and if current
> >> + * IOAM namespace has an active schema attached to it
> >> + */
> >> + if (type & IOAM6_TRACE_TYPE22) {
> >> + /* Opaque State Snapshot header size */
> >> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> >> +
> >> + if (ns->schema)
> >> + sclen += ns->schema->len / 4;
> >> + }
> >> +
> >> + /* Not enough space remaining: set Overflow bit and skip */
> >> + if (!remlen || remlen < (nodelen + sclen)) {
> >> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> >> + trh->info = cpu_to_be16(info);
> >> + return;
> >> + }
> >> +
> >> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> >> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> >> +
> >> + /* Update RemainingLen */
> >> + remlen -= nodelen + sclen;
> >> + info = (info & 0xff80) | remlen;
> >> + trh->info = cpu_to_be16(info);
> >> +}
> >> +
> >> +static int __net_init ioam6_net_init(struct net *net)
> >> +{
> >> + struct ioam6_pernet_data *nsdata;
> >> + int err = -ENOMEM;
> >> +
> >> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> >> + if (!nsdata)
> >> + goto out;
> >> +
> >> + mutex_init(&nsdata->lock);
> >> + net->ipv6.ioam6_data = nsdata;
> >> +
> >> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> >> + if (err)
> >> + goto free_nsdata;
> >> +
> >> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> >> + if (err)
> >> + goto free_rht_ns;
> >> +
> >> +out:
> >> + return err;
> >> +free_rht_ns:
> >> + rhashtable_destroy(&nsdata->namespaces);
> >> +free_nsdata:
> >> + kfree(nsdata);
> >> + net->ipv6.ioam6_data = NULL;
> >> + goto out;
> >> +}
> >> +
> >> +static void __net_exit ioam6_net_exit(struct net *net)
> >> +{
> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> +
> >> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> >> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> >> +
> >> + kfree(nsdata);
> >> +}
> >> +
> >> +static struct pernet_operations ioam6_net_ops = {
> >> + .init = ioam6_net_init,
> >> + .exit = ioam6_net_exit,
> >> +};
> >> +
> >> +int __init ioam6_init(void)
> >> +{
> >> + int err = register_pernet_subsys(&ioam6_net_ops);
> >> +
> >> + if (err)
> >> + return err;
> >> +
> >> + pr_info("In-situ OAM (IOAM) with IPv6\n");
> >> + return 0;
> >> +}
> >> +
> >> +void ioam6_exit(void)
> >> +{
> >> + unregister_pernet_subsys(&ioam6_net_ops);
> >> +}
> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> index fac2135aa47b..da49b33ab6fc 100644
> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> >> .mode = 0644,
> >> .proc_handler = proc_dointvec
> >> },
> >> + {
> >> + .procname = "ioam6_id",
> >> + .data = &init_net.ipv6.sysctl.ioam6_id,
> >> + .maxlen = sizeof(int),
> >> + .mode = 0644,
> >> + .proc_handler = proc_dointvec
> >> + },
> >> { }
> >> };
> >>
> >> --
> >> 2.17.1
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-25 20:32 ` Tom Herbert
@ 2020-06-26 8:13 ` Justin Iurman
2020-06-26 14:53 ` Tom Herbert
0 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 8:13 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
Tom,
>> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
>> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
>> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
>> >>
>> >
>> > The IANA allocation is TEMPORARY, with an expiration date of
>> > 4/16/2021. Note from RFC7120:
>> >
>> > "Implementers and deployers need to be aware that deprecation and
>> > de-allocation could take place at any time after expiry; therefore, an
>> > expired early allocation is best considered as deprecated."
>> >
>> > Please add a comment in the code and in the Documentation to this effect.
>>
>> I'll do that, thanks. What kind of comment (is there an official pattern?) and,
>> where in the Documentation should I add it?
>>
>> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
>> >> packets. Default is drop.
>> >
>> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
>> > packet containing the IOAM HBH option. Note that the act bits of the
>>
>> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets
>> containing the IOAM HBH option.
>>
>> > option type are 00 which means the TLV is skipped if the option isn't
>> > processed so I don't think it's correct to drop these packets by
>> > default.
>>
>> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for
>> this option, I do believe it should be disabled (dropped) by default for nodes
>> that "speak IOAM". Indeed, you don't want anyone with a kernel that includes
>> IOAM to accept IOAM packets by default, which would mean that anyone would
>> create (potentially without being aware) an IOAM domain. And, also, to avoid
>> spreading leaks.
>>
> I think you're conflating whether a node processes an IOAM option with
> whether it needs to drop because it doesn't process it. Yes, on an IOAM
> system it makes sense to allow configuration of whether to process the TLV.
> However, even when it doesn't then the TLV should be skipped and the
> packet not dropped. We know this is the correct behavior since on a
> system that isn't IOAM aware, i.e. all deployed nodes right now, they
> will skip the TLV per the act bits. If we want to change the default
> behavior, the only way to do that is to change the act bits to
> non-zero.
Makes sense, you're right indeed. But still, I'm a bit worried about enabling it by default. That would open the door to things we don't want: we'd end up in a situation where IOAM is not "privately" deployed. Think about someone running a kernel with IOAM support that they know nothing about. Of course, they would not have a firewall rule to drop IOAM. Therefore, anyone could simply "create" an IOAM domain with that node by sending it IPv6 packets carrying the IOAM HBH option, and steal data. This is similar to the leak problem.
So, I think there are two ways to address the above: (i) the current one, i.e. drop by default, or (ii) use 01 for the act bits. This topic has been widely discussed in the WG and is still open, though the trend seems to be "00" with the drop-by-default compromise.
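To make the act-bits point concrete, here is a small standalone sketch (not part of the patch; the enum and helper names are made up for illustration) of how the two high-order bits of an option type are read, following RFC 8200, section 4.2:

```c
#include <stdint.h>

/* RFC 8200, section 4.2: the two high-order bits of the Option Type tell
 * a node what to do when it does not recognize the option.
 * These names are hypothetical, chosen for this example only.
 */
enum opt_act {
	OPT_ACT_SKIP = 0,                   /* 00: skip the option, keep processing */
	OPT_ACT_DROP = 1,                   /* 01: discard the packet */
	OPT_ACT_DROP_ICMP = 2,              /* 10: discard, always send ICMP Parameter Problem */
	OPT_ACT_DROP_ICMP_UNLESS_MCAST = 3, /* 11: discard, send ICMP only if dst is not multicast */
};

/* Extract the "act" bits (bits 7-6) of an option type. */
static enum opt_act opt_type_act(uint8_t type)
{
	return (enum opt_act)(type >> 6);
}

/* Third-highest bit (bit 5): 1 if the option data may change en route. */
static int opt_type_may_change(uint8_t type)
{
	return (type >> 5) & 1;
}
```

With 49 (IPV6_TLV_IOAM_HOPOPTS), opt_type_act() gives OPT_ACT_SKIP and opt_type_may_change() gives 1, which is why nodes unaware of IOAM skip the TLV today. Possibility (ii) would need an option type whose top two bits are 01.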
> For the leakage problem, that is a firewall issue. The expectation is
> that border devices will have rules that prevent leaking packets out
> of their domain. This is an orthogonal mechanism that needs to be done
> for other protocols-- SRH for instance. The filtering is simple, just
> drop the packet when TLV matches (although I suspect most sites
> probably just drop packets with EH at this point). This doesn't
> require any changes to the implementation and doesn't require that
> border devices even implement IOAM-- they just drop on pattern
> matching.
+1
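For illustration only, the kind of pattern matching such a border filter does could look like the userspace sketch below. It is an assumption about how one might write it, not code from this series: hbh_has_option() is a hypothetical helper, and the buffer is assumed to start at the first byte of the Hop-by-Hop header.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0  /* one-byte padding option, has no length field */
#define TLV_IOAM 49 /* IPV6_TLV_IOAM_HOPOPTS in the patch */

/* Walk the TLVs of a Hop-by-Hop header and look for one option type.
 * hbh points at the Next Header byte, len is the number of bytes available.
 * Returns 1 if the option is present, 0 if not, -1 if the header is malformed.
 */
static int hbh_has_option(const uint8_t *hbh, size_t len, uint8_t wanted)
{
	size_t hdrlen, off = 2; /* skip Next Header and Hdr Ext Len */

	if (len < 8)
		return -1;

	/* Hdr Ext Len counts 8-octet units, not including the first 8 octets */
	hdrlen = ((size_t)hbh[1] + 1) * 8;
	if (hdrlen > len)
		return -1;

	while (off < hdrlen) {
		uint8_t type = hbh[off];

		if (type == wanted)
			return 1;

		if (type == TLV_PAD1) {
			off++;
			continue;
		}

		/* Every other option carries a one-byte length */
		if (off + 2 > hdrlen)
			return -1;
		off += 2 + (size_t)hbh[off + 1];
	}

	return off == hdrlen ? 0 : -1;
}
```

A border device would drop the packet whenever hbh_has_option(hbh, len, TLV_IOAM) returns 1, without needing to understand IOAM itself.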
Justin
> Tom
>> Justin
>>
>> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
>> >> (unique) identifier of the interface.
>> >>
>> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
>> >> identifier of the node.
>> >>
>> >> Two relativistic hash tables: one for IOAM namespaces, the other for
>> >> IOAM schemas. A namespace can only have a single active schema and a
>> >> schema can only be attached to a single namespace (1:1 relationship).
>> >>
>> >> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
>> >> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
>> >> [3]
>> >> https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
>> >>
>> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> >> ---
>> >> include/linux/ipv6.h | 2 +
>> >> include/net/ioam6.h | 98 +++++++++++
>> >> include/net/netns/ipv6.h | 2 +
>> >> include/uapi/linux/in6.h | 1 +
>> >> include/uapi/linux/ipv6.h | 2 +
>> >> net/ipv6/Makefile | 2 +-
>> >> net/ipv6/addrconf.c | 20 +++
>> >> net/ipv6/af_inet6.c | 7 +
>> >> net/ipv6/exthdrs.c | 67 ++++++++
>> >> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
>> >> net/ipv6/sysctl_net_ipv6.c | 7 +
>> >> 11 files changed, 533 insertions(+), 1 deletion(-)
>> >> create mode 100644 include/net/ioam6.h
>> >> create mode 100644 net/ipv6/ioam6.c
>> >>
>> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>> >> index 5312a718bc7a..15732f964c6e 100644
>> >> --- a/include/linux/ipv6.h
>> >> +++ b/include/linux/ipv6.h
>> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
>> >> __s32 disable_policy;
>> >> __s32 ndisc_tclass;
>> >> __s32 rpl_seg_enabled;
>> >> + __u32 ioam6_enabled;
>> >> + __u32 ioam6_id;
>> >>
>> >> struct ctl_table_header *sysctl_header;
>> >> };
>> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
>> >> new file mode 100644
>> >> index 000000000000..2a910bc99947
>> >> --- /dev/null
>> >> +++ b/include/net/ioam6.h
>> >> @@ -0,0 +1,98 @@
>> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> >> +/*
>> >> + * IOAM IPv6 implementation
>> >> + *
>> >> + * Author:
>> >> + * Justin Iurman <justin.iurman@uliege.be>
>> >> + */
>> >> +
>> >> +#ifndef _NET_IOAM6_H
>> >> +#define _NET_IOAM6_H
>> >> +
>> >> +#include <linux/net.h>
>> >> +#include <linux/ipv6.h>
>> >> +#include <linux/rhashtable-types.h>
>> >> +
>> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
>> >> +
>> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
>> >> +
>> >> +#define IOAM6_TRACE_TYPE0 (1 << 31)
>> >> +#define IOAM6_TRACE_TYPE1 (1 << 30)
>> >> +#define IOAM6_TRACE_TYPE2 (1 << 29)
>> >> +#define IOAM6_TRACE_TYPE3 (1 << 28)
>> >> +#define IOAM6_TRACE_TYPE4 (1 << 27)
>> >> +#define IOAM6_TRACE_TYPE5 (1 << 26)
>> >> +#define IOAM6_TRACE_TYPE6 (1 << 25)
>> >> +#define IOAM6_TRACE_TYPE7 (1 << 24)
>> >> +#define IOAM6_TRACE_TYPE8 (1 << 23)
>> >> +#define IOAM6_TRACE_TYPE9 (1 << 22)
>> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
>> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
>> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
>> >> +
>> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
>> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
>> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
>> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
>> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
>> >> +
>> >> +struct ioam6_common_hdr {
>> >> + u8 opt_type;
>> >> + u8 opt_len;
>> >> + u8 res;
>> >> + u8 ioam_type;
>> >> + __be16 namespace_id;
>> >> +} __packed;
>> >> +
>> >> +struct ioam6_trace_hdr {
>> >> + __be16 info;
>> >> + __be32 type;
>> >> +} __packed;
>> >> +
>> >> +struct ioam6_namespace {
>> >> + struct rhash_head head;
>> >> + struct rcu_head rcu;
>> >> +
>> >> + __be16 id;
>> >> + __be64 data;
>> >> + bool remove_tlv;
>> >> +
>> >> + struct ioam6_schema *schema;
>> >> +};
>> >> +
>> >> +struct ioam6_schema {
>> >> + struct rhash_head head;
>> >> + struct rcu_head rcu;
>> >> +
>> >> + u32 id;
>> >> + int len;
>> >> + __be32 hdr;
>> >> + u8 *data;
>> >> +
>> >> + struct ioam6_namespace *ns;
>> >> +};
>> >> +
>> >> +struct ioam6_pernet_data {
>> >> + struct mutex lock;
>> >> + struct rhashtable namespaces;
>> >> + struct rhashtable schemas;
>> >> +};
>> >> +
>> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
>> >> +{
>> >> +#if IS_ENABLED(CONFIG_IPV6)
>> >> + return net->ipv6.ioam6_data;
>> >> +#else
>> >> + return NULL;
>> >> +#endif
>> >> +}
>> >> +
>> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
>> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> >> + struct ioam6_namespace *ns);
>> >> +
>> >> +extern int ioam6_init(void);
>> >> +extern void ioam6_exit(void);
>> >> +
>> >> +#endif
>> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
>> >> index 5ec054473d81..89b27fa721f4 100644
>> >> --- a/include/net/netns/ipv6.h
>> >> +++ b/include/net/netns/ipv6.h
>> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
>> >> int max_hbh_opts_len;
>> >> int seg6_flowlabel;
>> >> bool skip_notify_on_dev_down;
>> >> + unsigned int ioam6_id;
>> >> };
>> >>
>> >> struct netns_ipv6 {
>> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
>> >> spinlock_t lock;
>> >> u32 seq;
>> >> } ip6addrlbl_table;
>> >> + struct ioam6_pernet_data *ioam6_data;
>> >> };
>> >>
>> >> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
>> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
>> >> index 9f2273a08356..1c98435220c9 100644
>> >> --- a/include/uapi/linux/in6.h
>> >> +++ b/include/uapi/linux/in6.h
>> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
>> >> #define IPV6_TLV_PADN 1
>> >> #define IPV6_TLV_ROUTERALERT 5
>> >> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
>> >> +#define IPV6_TLV_IOAM_HOPOPTS 49
>> >
>> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
>> > Note from RFC7120:
>> >
>> > "Implementers and deployers need to be aware that deprecation and
>> > de-allocation could take place at any time after expiry; therefore, an
>> > expired early allocation is best considered as deprecated. It is not
>> > IANA's responsibility to track the status of allocations, their
>> > expirations, or when they may be re-allocated."
>> >
>> > Please add a comment here and in the
>> > Documentation to this effect.
>> >
>> >> #define IPV6_TLV_JUMBO 194
>> >> #define IPV6_TLV_HAO 201 /* home address option */
>> >>
>> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
>> >> index 13e8751bf24a..eb521b2dd885 100644
>> >> --- a/include/uapi/linux/ipv6.h
>> >> +++ b/include/uapi/linux/ipv6.h
>> >> @@ -189,6 +189,8 @@ enum {
>> >> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>> >> DEVCONF_NDISC_TCLASS,
>> >> DEVCONF_RPL_SEG_ENABLED,
>> >> + DEVCONF_IOAM6_ENABLED,
>> >> + DEVCONF_IOAM6_ID,
>> >> DEVCONF_MAX
>> >> };
>> >>
>> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
>> >> index cf7b47bdb9b3..b7ef10d417d6 100644
>> >> --- a/net/ipv6/Makefile
>> >> +++ b/net/ipv6/Makefile
>> >> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
>> >> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
>> >> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
>> >> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
>> >> - udp_offload.o seg6.o fib6_notifier.o rpl.o
>> >> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
>> >>
>> >> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
>> >>
>> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> >> index 840bfdb3d7bd..6c952a28ade2 100644
>> >> --- a/net/ipv6/addrconf.c
>> >> +++ b/net/ipv6/addrconf.c
>> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
>> >> .disable_policy = 0,
>> >> .rpl_seg_enabled = 0,
>> >> + .ioam6_enabled = 0,
>> >> + .ioam6_id = 0,
>> >> };
>> >>
>> >> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
>> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
>> >> .disable_policy = 0,
>> >> .rpl_seg_enabled = 0,
>> >> + .ioam6_enabled = 0,
>> >> + .ioam6_id = 0,
>> >> };
>> >>
>> >> /* Check if link is ready: is it up and is a valid qdisc available */
>> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
>> >> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
>> >> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
>> >> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
>> >> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
>> >> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
>> >> }
>> >>
>> >> static inline size_t inet6_ifla6_size(void)
>> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
>> >> .mode = 0644,
>> >> .proc_handler = proc_dointvec,
>> >> },
>> >> + {
>> >> + .procname = "ioam6_enabled",
>> >> + .data = &ipv6_devconf.ioam6_enabled,
>> >> + .maxlen = sizeof(int),
>> >> + .mode = 0644,
>> >> + .proc_handler = proc_dointvec,
>> >> + },
>> >> + {
>> >> + .procname = "ioam6_id",
>> >> + .data = &ipv6_devconf.ioam6_id,
>> >> + .maxlen = sizeof(int),
>> >> + .mode = 0644,
>> >> + .proc_handler = proc_dointvec,
>> >> + },
>> >> {
>> >> /* sentinel */
>> >> }
>> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
>> >> index b304b882e031..63a9ffc4b283 100644
>> >> --- a/net/ipv6/af_inet6.c
>> >> +++ b/net/ipv6/af_inet6.c
>> >> @@ -62,6 +62,7 @@
>> >> #include <net/rpl.h>
>> >> #include <net/compat.h>
>> >> #include <net/xfrm.h>
>> >> +#include <net/ioam6.h>
>> >>
>> >> #include <linux/uaccess.h>
>> >> #include <linux/mroute6.h>
>> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
>> >> if (err)
>> >> goto rpl_fail;
>> >>
>> >> + err = ioam6_init();
>> >> + if (err)
>> >> + goto ioam6_fail;
>> >> +
>> >> err = igmp6_late_init();
>> >> if (err)
>> >> goto igmp6_late_err;
>> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
>> >> #endif
>> >> igmp6_late_err:
>> >> rpl_exit();
>> >> +ioam6_fail:
>> >> + ioam6_exit();
>> >> rpl_fail:
>> >> seg6_exit();
>> >> seg6_fail:
>> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
>> >> index f27ab3bf2e0c..00aee1358f1c 100644
>> >> --- a/net/ipv6/exthdrs.c
>> >> +++ b/net/ipv6/exthdrs.c
>> >> @@ -49,6 +49,8 @@
>> >> #include <net/seg6_hmac.h>
>> >> #endif
>> >> #include <net/rpl.h>
>> >> +#include <net/ioam6.h>
>> >> +#include <net/dst_metadata.h>
>> >>
>> >> #include <linux/uaccess.h>
>> >>
>> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
>> >> return TLV_REJECT;
>> >> }
>> >>
>> >> +/* IOAM */
>> >> +
>> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
>> >> +{
>> >> + struct ioam6_common_hdr *ioamh;
>> >> + struct ioam6_namespace *ns;
>> >> +
>> >> + /* Must be 4n-aligned */
>> >> + if (optoff & 3)
>> >> + goto drop;
>> >> +
>> >> + if (!skb_valid_dst(skb))
>> >> + ip6_route_input(skb);
>> >> +
>> >> + /* IOAM must be enabled on ingress interface */
>> >> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
>> >> + goto drop;
>> >> +
>> >> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
>> >> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
>> >> +
>> >> + /* Unknown IOAM namespace, either:
>> >> + * - Drop it if IOAM is not enabled on egress interface (if any)
>> >> + * - Ignore it otherwise
>> >> + */
>> >> + if (!ns) {
>> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> + goto drop;
>> >> +
>> >> + goto accept;
>> >> + }
>> >> +
>> >> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> + goto remove;
>> >> +
>> >> + /* Known IOAM namespace which must not be removed:
>> >> + * IOAM must be enabled on egress interface
>> >> + */
>> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
>> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
>> >> + goto drop;
>> >> +
>> >> + switch (ioamh->ioam_type) {
>> >> + case IOAM6_OPT_TRACE_PREALLOC:
>> >> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
>> >> + IP6CB(skb)->flags |= IP6SKB_IOAM;
>> >> + break;
>> >> + default:
>> >> + break;
>> >> + }
>> >> +
>> >> +accept:
>> >> + return TLV_ACCEPT;
>> >> +remove:
>> >> + return TLV_REMOVE;
>> >> +drop:
>> >> + kfree_skb(skb);
>> >> + return TLV_REJECT;
>> >> +}
>> >> +
>> >> /* Jumbo payload */
>> >>
>> >> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
>> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
>> >> .type = IPV6_TLV_ROUTERALERT,
>> >> .func = ipv6_hop_ra,
>> >> },
>> >> + {
>> >> + .type = IPV6_TLV_IOAM_HOPOPTS,
>> >> + .func = ipv6_hop_ioam,
>> >> + },
>> >> {
>> >> .type = IPV6_TLV_JUMBO,
>> >> .func = ipv6_hop_jumbo,
>> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
>> >> new file mode 100644
>> >> index 000000000000..406aa78eb504
>> >> --- /dev/null
>> >> +++ b/net/ipv6/ioam6.c
>> >> @@ -0,0 +1,326 @@
>> >> +// SPDX-License-Identifier: GPL-2.0-or-later
>> >> +/*
>> >> + * IOAM IPv6 implementation
>> >> + *
>> >> + * Author:
>> >> + * Justin Iurman <justin.iurman@uliege.be>
>> >> + */
>> >> +
>> >> +#include <linux/errno.h>
>> >> +#include <linux/types.h>
>> >> +#include <linux/kernel.h>
>> >> +#include <linux/net.h>
>> >> +#include <linux/rhashtable.h>
>> >> +
>> >> +#include <net/addrconf.h>
>> >> +#include <net/ioam6.h>
>> >> +
>> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
>> >> +{
>> >> + kfree_rcu(ns, rcu);
>> >> +}
>> >> +
>> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
>> >> +{
>> >> + kfree_rcu(sc, rcu);
>> >> +}
>> >> +
>> >> +static void ioam6_free_ns(void *ptr, void *arg)
>> >> +{
>> >> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
>> >> +
>> >> + if (ns)
>> >> + ioam6_ns_release(ns);
>> >> +}
>> >> +
>> >> +static void ioam6_free_sc(void *ptr, void *arg)
>> >> +{
>> >> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
>> >> +
>> >> + if (sc)
>> >> + ioam6_sc_release(sc);
>> >> +}
>> >> +
>> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> >> +{
>> >> + const struct ioam6_namespace *ns = obj;
>> >> +
>> >> + return (ns->id != *(__be16 *)arg->key);
>> >> +}
>> >> +
>> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
>> >> +{
>> >> + const struct ioam6_schema *sc = obj;
>> >> +
>> >> + return (sc->id != *(u32 *)arg->key);
>> >> +}
>> >> +
>> >> +static const struct rhashtable_params rht_ns_params = {
>> >> + .key_len = sizeof(__be16),
>> >> + .key_offset = offsetof(struct ioam6_namespace, id),
>> >> + .head_offset = offsetof(struct ioam6_namespace, head),
>> >> + .automatic_shrinking = true,
>> >> + .obj_cmpfn = ioam6_ns_cmpfn,
>> >> +};
>> >> +
>> >> +static const struct rhashtable_params rht_sc_params = {
>> >> + .key_len = sizeof(u32),
>> >> + .key_offset = offsetof(struct ioam6_schema, id),
>> >> + .head_offset = offsetof(struct ioam6_schema, head),
>> >> + .automatic_shrinking = true,
>> >> + .obj_cmpfn = ioam6_sc_cmpfn,
>> >> +};
>> >> +
>> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
>> >> +{
>> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> >> +
>> >> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
>> >> +}
>> >> +
>> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
>> >> + u32 trace_type, struct ioam6_namespace *ns)
>> >> +{
>> >> + u8 *data = skb_network_header(skb) + nodeoff;
>> >> + struct __kernel_sock_timeval ts;
>> >> + u64 raw_u64;
>> >> + u32 raw_u32;
>> >> + u16 raw_u16;
>> >> + u8 byte;
>> >> +
>> >> + /* hop_lim and node_id */
>> >> + if (trace_type & IOAM6_TRACE_TYPE0) {
>> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
>> >> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> >> + if (!raw_u32)
>> >> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
>> >> + else
>> >> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
>> >> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* ingress_if_id and egress_if_id */
>> >> + if (trace_type & IOAM6_TRACE_TYPE1) {
>> >> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> >> + if (!raw_u16)
>> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
>> >> + data += sizeof(__be16);
>> >> +
>> >> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> >> + if (!raw_u16)
>> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
>> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
>> >> + data += sizeof(__be16);
>> >> + }
>> >> +
>> >> + /* timestamp seconds */
>> >> + if (trace_type & IOAM6_TRACE_TYPE2) {
>> >> + if (!skb->tstamp) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + } else {
>> >> + skb_get_new_timestamp(skb, &ts);
>> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
>> >> + }
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* timestamp subseconds */
>> >> + if (trace_type & IOAM6_TRACE_TYPE3) {
>> >> + if (!skb->tstamp) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + } else {
>> >> + if (!(trace_type & IOAM6_TRACE_TYPE2))
>> >> + skb_get_new_timestamp(skb, &ts);
>> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
>> >> + }
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* transit delay */
>> >> + if (trace_type & IOAM6_TRACE_TYPE4) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* namespace data */
>> >> + if (trace_type & IOAM6_TRACE_TYPE5) {
>> >> + *(__be32 *)data = (__be32)ns->data;
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* queue depth */
>> >> + if (trace_type & IOAM6_TRACE_TYPE6) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* hop_lim and node_id (wide) */
>> >> + if (trace_type & IOAM6_TRACE_TYPE7) {
>> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
>> >> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
>> >> + if (!raw_u64)
>> >> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
>> >> + else
>> >> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
>> >> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
>> >> + data += sizeof(__be64);
>> >> + }
>> >> +
>> >> + /* ingress_if_id and egress_if_id (wide) */
>> >> + if (trace_type & IOAM6_TRACE_TYPE8) {
>> >> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
>> >> + if (!raw_u32)
>> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
>> >> + data += sizeof(__be32);
>> >> +
>> >> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
>> >> + if (!raw_u32)
>> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
>> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* namespace data (wide) */
>> >> + if (trace_type & IOAM6_TRACE_TYPE9) {
>> >> + *(__be64 *)data = ns->data;
>> >> + data += sizeof(__be64);
>> >> + }
>> >> +
>> >> + /* buffer occupancy */
>> >> + if (trace_type & IOAM6_TRACE_TYPE10) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* checksum complement */
>> >> + if (trace_type & IOAM6_TRACE_TYPE11) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
>> >> + data += sizeof(__be32);
>> >> + }
>> >> +
>> >> + /* opaque state snapshot */
>> >> + if (trace_type & IOAM6_TRACE_TYPE22) {
>> >> + if (!ns->schema) {
>> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
>> >> + } else {
>> >> + *(__be32 *)data = ns->schema->hdr;
>> >> + data += sizeof(__be32);
>> >> + memcpy(data, ns->schema->data, ns->schema->len);
>> >> + }
>> >> + }
>> >> +}
>> >> +
>> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
>> >> + struct ioam6_namespace *ns)
>> >> +{
>> >> + u8 nodelen, flags, remlen, sclen = 0;
>> >> + struct ioam6_trace_hdr *trh;
>> >> + int nodeoff;
>> >> + u16 info;
>> >> + u32 type;
>> >> +
>> >> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
>> >> + info = be16_to_cpu(trh->info);
>> >> + type = be32_to_cpu(trh->type);
>> >> +
>> >> + nodelen = info >> 11;
>> >> + flags = (info >> 7) & 0xf;
>> >> + remlen = info & 0x7f;
>> >> +
>> >> + /* Skip if Overflow bit is set OR
>> >> + * if an unknown type (bit 12-21) is set
>> >> + */
>> >> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
>> >> + return;
>> >> +
>> >> + /* NodeLen does not include Opaque State Snapshot length. We need to
>> >> + * take it into account if the corresponding bit is set and if current
>> >> + * IOAM namespace has an active schema attached to it
>> >> + */
>> >> + if (type & IOAM6_TRACE_TYPE22) {
>> >> + /* Opaque State Snapshot header size */
>> >> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
>> >> +
>> >> + if (ns->schema)
>> >> + sclen += ns->schema->len / 4;
>> >> + }
>> >> +
>> >> + /* Not enough space remaining: set Overflow bit and skip */
>> >> + if (!remlen || remlen < (nodelen + sclen)) {
>> >> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
>> >> + trh->info = cpu_to_be16(info);
>> >> + return;
>> >> + }
>> >> +
>> >> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
>> >> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
>> >> +
>> >> + /* Update RemainingLen */
>> >> + remlen -= nodelen + sclen;
>> >> + info = (info & 0xff80) | remlen;
>> >> + trh->info = cpu_to_be16(info);
>> >> +}
>> >> +
>> >> +static int __net_init ioam6_net_init(struct net *net)
>> >> +{
>> >> + struct ioam6_pernet_data *nsdata;
>> >> + int err = -ENOMEM;
>> >> +
>> >> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
>> >> + if (!nsdata)
>> >> + goto out;
>> >> +
>> >> + mutex_init(&nsdata->lock);
>> >> + net->ipv6.ioam6_data = nsdata;
>> >> +
>> >> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
>> >> + if (err)
>> >> + goto free_nsdata;
>> >> +
>> >> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
>> >> + if (err)
>> >> + goto free_rht_ns;
>> >> +
>> >> +out:
>> >> + return err;
>> >> +free_rht_ns:
>> >> + rhashtable_destroy(&nsdata->namespaces);
>> >> +free_nsdata:
>> >> + kfree(nsdata);
>> >> + net->ipv6.ioam6_data = NULL;
>> >> + goto out;
>> >> +}
>> >> +
>> >> +static void __net_exit ioam6_net_exit(struct net *net)
>> >> +{
>> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
>> >> +
>> >> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
>> >> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
>> >> +
>> >> + kfree(nsdata);
>> >> +}
>> >> +
>> >> +static struct pernet_operations ioam6_net_ops = {
>> >> + .init = ioam6_net_init,
>> >> + .exit = ioam6_net_exit,
>> >> +};
>> >> +
>> >> +int __init ioam6_init(void)
>> >> +{
>> >> + int err = register_pernet_subsys(&ioam6_net_ops);
>> >> +
>> >> + if (err)
>> >> + return err;
>> >> +
>> >> + pr_info("In-situ OAM (IOAM) with IPv6\n");
>> >> + return 0;
>> >> +}
>> >> +
>> >> +void ioam6_exit(void)
>> >> +{
>> >> + unregister_pernet_subsys(&ioam6_net_ops);
>> >> +}
>> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>> >> index fac2135aa47b..da49b33ab6fc 100644
>> >> --- a/net/ipv6/sysctl_net_ipv6.c
>> >> +++ b/net/ipv6/sysctl_net_ipv6.c
>> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
>> >> .mode = 0644,
>> >> .proc_handler = proc_dointvec
>> >> },
>> >> + {
>> >> + .procname = "ioam6_id",
>> >> + .data = &init_net.ipv6.sysctl.ioam6_id,
>> >> + .maxlen = sizeof(int),
>> >> + .mode = 0644,
>> >> + .proc_handler = proc_dointvec
>> >> + },
>> >> { }
>> >> };
>> >>
>> >> --
> > >> 2.17.1
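For readers following the bookkeeping in the quoted ioam6_fill_trace_data(), here is a minimal userspace sketch of how the 16-bit info word is split and updated. It assumes the layout implied by the shifts and masks in the patch (NodeLen in the top 5 bits, Flags in the next 4, RemainingLen in the low 7) and works on a host-order value. The names trace_info, trace_info_unpack and trace_info_consume are hypothetical and not part of the patch, and the unknown-type check is left out.

```c
#include <stdint.h>

/* Hypothetical helper types; the field layout is inferred from the
 * shifts and masks in the quoted patch, not from a kernel header.
 */
#define TRACE_FLAG_OVERFLOW (1 << 3)

struct trace_info {
	uint8_t nodelen;	/* per-node data length, in 4-octet units */
	uint8_t flags;		/* 4 flag bits, Overflow is the top one */
	uint8_t remlen;		/* remaining free space, in 4-octet units */
};

/* Split a host-order info word into its three fields. */
static struct trace_info trace_info_unpack(uint16_t info)
{
	struct trace_info ti;

	ti.nodelen = info >> 11;
	ti.flags = (info >> 7) & 0xf;
	ti.remlen = info & 0x7f;
	return ti;
}

/* Account for one node writing its data (plus sclen extra 4-octet
 * units for an opaque state snapshot). Returns the updated info word:
 * unchanged if Overflow was already set, Overflow set if the node
 * does not fit, RemainingLen decreased otherwise.
 */
static uint16_t trace_info_consume(uint16_t info, uint8_t sclen)
{
	struct trace_info ti = trace_info_unpack(info);

	if (ti.flags & TRACE_FLAG_OVERFLOW)
		return info;

	if (!ti.remlen || ti.remlen < ti.nodelen + sclen)
		return (uint16_t)(info | (TRACE_FLAG_OVERFLOW << 7));

	ti.remlen -= ti.nodelen + sclen;
	return (uint16_t)((info & 0xff80) | ti.remlen);
}
```

With NodeLen = 3 and RemainingLen = 12, four nodes can record data before the next one sets the Overflow bit. A caller would convert with be16_to_cpu()/cpu_to_be16() around these helpers, as the patch does.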
* Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace
2020-06-26 8:13 ` Justin Iurman
@ 2020-06-26 14:53 ` Tom Herbert
0 siblings, 0 replies; 42+ messages in thread
From: Tom Herbert @ 2020-06-26 14:53 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Fri, Jun 26, 2020 at 1:13 AM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Tom,
>
> >> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> >> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> >> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
> >> >>
> >> >
> >> > The IANA allocation is TEMPORARY, with an expiration date of
> >> > 4/16/2021. Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated."
> >> >
> >> > Please add a comment in the code and in the Documentation to this effect.
> >>
> >> I'll do that, thanks. What kind of comment (is there an official pattern?) and,
> >> where in the Documentation should I add it?
> >>
> >> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> >> >> packets. Default is drop.
> >> >
> >> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> >> > packet containing the IOAM HBH option. Note that the act bits of the
> >>
> >> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets
> >> containing the IOAM HBH option.
> >>
> >> > option type are 00 which means the TLV is skipped if the option isn't
> >> > processed, so I don't think it's correct to drop these packets by
> >> > default.
> >>
> >> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for
> >> this option, I do believe it should be disabled (dropped) by default for nodes
> >> that "speak IOAM". Indeed, you don't want anyone with a kernel that includes
> >> IOAM to accept IOAM packets by default, which would mean that anyone would
> >> create (potentially without being aware) an IOAM domain. And, also, to avoid
> >> spreading leaks.
> >>
> > I think you're conflating whether a node processes an IOAM option with
> > whether it needs to drop because it doesn't process it. Yes, on an IOAM
> > system it makes sense to allow configuration of whether to process the TLV.
> > However, even when it doesn't then the TLV should be skipped and the
> > packet not dropped. We know this is the correct behavior since on a
> > system that isn't IOAM aware, i.e. all deployed nodes right now, they
> > will skip the TLV per the act bits. If we want to change the default
> > behavior, the only way to do that is to change the act bits to
> > non-zero.
>
> Makes sense, you're right indeed. But still, I'm a bit worried to enable it by default. That would open the door to things we don't want. We'd end up in a situation where IOAM is not "privately" deployed. And, think about the guy that runs a kernel with IOAM (that he does not know anything about). Of course, he would not have a FW to drop IOAM. Therefore, someone could simply "create" an IOAM domain with him by sending IPv6 packets with IOAM HBH and steal data. This is something similar to the leak problem.
>
Indeed, draft-ioametal-ippm-6man-ioam-ipv6-options-02 states: "Unless
a particular interface is explicitly enabled (i.e. explicitly
configured) for IOAM, a router MUST drop packets which contain
extension headers carrying IOAM data-fields." I believe this
requirement contradicts the option type act bits being zero. I've
posted to the IOAM list about this.
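To make the act-bits point concrete, here is a minimal sketch of how the option type octet is decoded per RFC 8200, section 4.2: the two highest-order bits select the action for an unrecognized option, and the third-highest bit says whether the option data may change en route. The enum and function names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Action a node takes when it does not recognize an option type
 * (RFC 8200, section 4.2). Names are illustrative only.
 */
enum opt_action {
	OPT_SKIP = 0,			/* 00: skip over the option */
	OPT_DISCARD = 1,		/* 01: discard the packet */
	OPT_DISCARD_ICMP = 2,		/* 10: discard, send ICMP */
	OPT_DISCARD_ICMP_UNICAST = 3,	/* 11: discard, ICMP if not multicast */
};

/* Two highest-order bits of the option type. */
static enum opt_action opt_unrecognized_action(uint8_t opt_type)
{
	return (enum opt_action)(opt_type >> 6);
}

/* Third-highest-order bit: 1 if option data may change en route. */
static int opt_may_change_en_route(uint8_t opt_type)
{
	return (opt_type >> 5) & 1;
}
```

For the value 49 (0x31) used by IPV6_TLV_IOAM_HOPOPTS in the patch, the action decodes to "skip" and the change bit is set, which is why a node that does not know the option forwards the packet untouched. The same value with act bits 01 would be 0x71; that is only an illustration of the alternative discussed here, not an allocated code point.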
> So, I think there are 2 possibilities against the above: (i) the current one, i.e. drop by default or (ii) use 01 for act bits. This topic has been widely discussed in the WG and is still open, though the trend seems to be "00" with the drop-by-default compromise.
>
> > For the leakage problem, that is a firewall issue. The expectation is
> > that border devices will have rules that prevent leaking packets out
> > of their domain. This is an orthogonal mechanism that needs to be done
> > for other protocols-- SRH for instance. The filtering is simple, just
> > drop the packet when TLV matches (although I suspect most sites
> > probably just drop packets with EH at this point). This doesn't
> > require any changes to the implementation and doesn't require that
> > border devices even implement IOAM-- they just drop on pattern
> > matching.
>
> +1
Mentioned that also.
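As a rough illustration of the "drop on pattern matching" idea for border devices, the sketch below walks the TLVs of a Hop-by-Hop header and reports whether a given option type is present. It is a userspace sketch under stated assumptions, not kernel or netfilter code: hbh_has_option is a hypothetical name, the buffer is assumed to start at the first octet of the Hop-by-Hop header, and only the Pad1 special case from RFC 8200 is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1		0	/* single-octet padding, no length */
#define TLV_IOAM_HOPOPTS	49	/* value used in the quoted patch */

/* Returns 1 if an option of type 'wanted' is present, 0 if not,
 * and -1 if the header is truncated or malformed.
 */
static int hbh_has_option(const uint8_t *hbh, size_t buflen, uint8_t wanted)
{
	size_t total, off = 2;	/* skip Next Header and Hdr Ext Len */

	if (buflen < 8)
		return -1;

	/* Hdr Ext Len counts 8-octet units beyond the first 8 octets */
	total = ((size_t)hbh[1] + 1) * 8;
	if (total > buflen)
		return -1;

	while (off < total) {
		uint8_t type = hbh[off];
		size_t optlen;

		if (type == TLV_PAD1) {
			off++;
			continue;
		}

		/* Every other option carries a length octet */
		if (off + 2 > total)
			return -1;
		optlen = hbh[off + 1];
		if (off + 2 + optlen > total)
			return -1;

		if (type == wanted)
			return 1;

		off += 2 + optlen;
	}

	return 0;
}
```

A border filter built this way would drop when the function returns 1 for TLV_IOAM_HOPOPTS, and most likely also on -1, without needing to understand the IOAM data itself.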
>
> Justin
>
> > Tom
> >> Justin
> >>
> >> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
> >> >> (unique) identifier of the interface.
> >> >>
> >> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> >> >> identifier of the node.
> >> >>
> >> >> Two relativistic hash tables: one for IOAM namespaces, the other for
> >> >> IOAM schemas. A namespace can only have a single active schema and a
> >> >> schema can only be attached to a single namespace (1:1 relationship).
> >> >>
> >> >> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >> >> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >> >> [3]
> >> >> https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
> >> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> >> >> ---
> >> >> include/linux/ipv6.h | 2 +
> >> >> include/net/ioam6.h | 98 +++++++++++
> >> >> include/net/netns/ipv6.h | 2 +
> >> >> include/uapi/linux/in6.h | 1 +
> >> >> include/uapi/linux/ipv6.h | 2 +
> >> >> net/ipv6/Makefile | 2 +-
> >> >> net/ipv6/addrconf.c | 20 +++
> >> >> net/ipv6/af_inet6.c | 7 +
> >> >> net/ipv6/exthdrs.c | 67 ++++++++
> >> >> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
> >> >> net/ipv6/sysctl_net_ipv6.c | 7 +
> >> >> 11 files changed, 533 insertions(+), 1 deletion(-)
> >> >> create mode 100644 include/net/ioam6.h
> >> >> create mode 100644 net/ipv6/ioam6.c
> >> >>
> >> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> >> index 5312a718bc7a..15732f964c6e 100644
> >> >> --- a/include/linux/ipv6.h
> >> >> +++ b/include/linux/ipv6.h
> >> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> >> >> __s32 disable_policy;
> >> >> __s32 ndisc_tclass;
> >> >> __s32 rpl_seg_enabled;
> >> >> + __u32 ioam6_enabled;
> >> >> + __u32 ioam6_id;
> >> >>
> >> >> struct ctl_table_header *sysctl_header;
> >> >> };
> >> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> >> >> new file mode 100644
> >> >> index 000000000000..2a910bc99947
> >> >> --- /dev/null
> >> >> +++ b/include/net/ioam6.h
> >> >> @@ -0,0 +1,98 @@
> >> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> >> +/*
> >> >> + * IOAM IPv6 implementation
> >> >> + *
> >> >> + * Author:
> >> >> + * Justin Iurman <justin.iurman@uliege.be>
> >> >> + */
> >> >> +
> >> >> +#ifndef _NET_IOAM6_H
> >> >> +#define _NET_IOAM6_H
> >> >> +
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/ipv6.h>
> >> >> +#include <linux/rhashtable-types.h>
> >> >> +
> >> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
> >> >> +
> >> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> >> >> +
> >> >> +#define IOAM6_TRACE_TYPE0 (1 << 31)
> >> >> +#define IOAM6_TRACE_TYPE1 (1 << 30)
> >> >> +#define IOAM6_TRACE_TYPE2 (1 << 29)
> >> >> +#define IOAM6_TRACE_TYPE3 (1 << 28)
> >> >> +#define IOAM6_TRACE_TYPE4 (1 << 27)
> >> >> +#define IOAM6_TRACE_TYPE5 (1 << 26)
> >> >> +#define IOAM6_TRACE_TYPE6 (1 << 25)
> >> >> +#define IOAM6_TRACE_TYPE7 (1 << 24)
> >> >> +#define IOAM6_TRACE_TYPE8 (1 << 23)
> >> >> +#define IOAM6_TRACE_TYPE9 (1 << 22)
> >> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> >> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> >> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> >> >> +
> >> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> >> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> >> >> +
> >> >> +struct ioam6_common_hdr {
> >> >> + u8 opt_type;
> >> >> + u8 opt_len;
> >> >> + u8 res;
> >> >> + u8 ioam_type;
> >> >> + __be16 namespace_id;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_trace_hdr {
> >> >> + __be16 info;
> >> >> + __be32 type;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_namespace {
> >> >> + struct rhash_head head;
> >> >> + struct rcu_head rcu;
> >> >> +
> >> >> + __be16 id;
> >> >> + __be64 data;
> >> >> + bool remove_tlv;
> >> >> +
> >> >> + struct ioam6_schema *schema;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_schema {
> >> >> + struct rhash_head head;
> >> >> + struct rcu_head rcu;
> >> >> +
> >> >> + u32 id;
> >> >> + int len;
> >> >> + __be32 hdr;
> >> >> + u8 *data;
> >> >> +
> >> >> + struct ioam6_namespace *ns;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_pernet_data {
> >> >> + struct mutex lock;
> >> >> + struct rhashtable namespaces;
> >> >> + struct rhashtable schemas;
> >> >> +};
> >> >> +
> >> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> >> >> +{
> >> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> >> + return net->ipv6.ioam6_data;
> >> >> +#else
> >> >> + return NULL;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> >> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> + struct ioam6_namespace *ns);
> >> >> +
> >> >> +extern int ioam6_init(void);
> >> >> +extern void ioam6_exit(void);
> >> >> +
> >> >> +#endif
> >> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> >> >> index 5ec054473d81..89b27fa721f4 100644
> >> >> --- a/include/net/netns/ipv6.h
> >> >> +++ b/include/net/netns/ipv6.h
> >> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> >> >> int max_hbh_opts_len;
> >> >> int seg6_flowlabel;
> >> >> bool skip_notify_on_dev_down;
> >> >> + unsigned int ioam6_id;
> >> >> };
> >> >>
> >> >> struct netns_ipv6 {
> >> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> >> >> spinlock_t lock;
> >> >> u32 seq;
> >> >> } ip6addrlbl_table;
> >> >> + struct ioam6_pernet_data *ioam6_data;
> >> >> };
> >> >>
> >> >> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> >> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> >> >> index 9f2273a08356..1c98435220c9 100644
> >> >> --- a/include/uapi/linux/in6.h
> >> >> +++ b/include/uapi/linux/in6.h
> >> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> >> >> #define IPV6_TLV_PADN 1
> >> >> #define IPV6_TLV_ROUTERALERT 5
> >> >> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
> >> >> +#define IPV6_TLV_IOAM_HOPOPTS 49
> >> >
> >> > The IANA allocation is TEMPORARY; the expiration date is 4/16/2021.
> >> > Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated. It is not
> >> > IANA's responsibility to track the status of allocations, their
> >> > expirations, or when they may be re-allocated."
> >> >
> >> > Please add a comment here and in the Documentation to this effect.
> >> >
> >> >> #define IPV6_TLV_JUMBO 194
> >> >> #define IPV6_TLV_HAO 201 /* home address option */
> >> >>
> >> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> >> index 13e8751bf24a..eb521b2dd885 100644
> >> >> --- a/include/uapi/linux/ipv6.h
> >> >> +++ b/include/uapi/linux/ipv6.h
> >> >> @@ -189,6 +189,8 @@ enum {
> >> >> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> >> >> DEVCONF_NDISC_TCLASS,
> >> >> DEVCONF_RPL_SEG_ENABLED,
> >> >> + DEVCONF_IOAM6_ENABLED,
> >> >> + DEVCONF_IOAM6_ID,
> >> >> DEVCONF_MAX
> >> >> };
> >> >>
> >> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> >> >> index cf7b47bdb9b3..b7ef10d417d6 100644
> >> >> --- a/net/ipv6/Makefile
> >> >> +++ b/net/ipv6/Makefile
> >> >> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o
> >> >> addrconf.o \
> >> >> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> >> >> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> >> >> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> >> >> - udp_offload.o seg6.o fib6_notifier.o rpl.o
> >> >> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
> >> >>
> >> >> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
> >> >>
> >> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> >> index 840bfdb3d7bd..6c952a28ade2 100644
> >> >> --- a/net/ipv6/addrconf.c
> >> >> +++ b/net/ipv6/addrconf.c
> >> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> >> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> >> .disable_policy = 0,
> >> >> .rpl_seg_enabled = 0,
> >> >> + .ioam6_enabled = 0,
> >> >> + .ioam6_id = 0,
> >> >> };
> >> >>
> >> >> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> >> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
> >> >> {
> >> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> >> .disable_policy = 0,
> >> >> .rpl_seg_enabled = 0,
> >> >> + .ioam6_enabled = 0,
> >> >> + .ioam6_id = 0,
> >> >> };
> >> >>
> >> >> /* Check if link is ready: is it up and is a valid qdisc available */
> >> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
> >> >> *cnf,
> >> >> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> >> >> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> >> >> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> >> >> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> >> >> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> >> >> }
> >> >>
> >> >> static inline size_t inet6_ifla6_size(void)
> >> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> >> >> .mode = 0644,
> >> >> .proc_handler = proc_dointvec,
> >> >> },
> >> >> + {
> >> >> + .procname = "ioam6_enabled",
> >> >> + .data = &ipv6_devconf.ioam6_enabled,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec,
> >> >> + },
> >> >> + {
> >> >> + .procname = "ioam6_id",
> >> >> + .data = &ipv6_devconf.ioam6_id,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec,
> >> >> + },
> >> >> {
> >> >> /* sentinel */
> >> >> }
> >> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> >> >> index b304b882e031..63a9ffc4b283 100644
> >> >> --- a/net/ipv6/af_inet6.c
> >> >> +++ b/net/ipv6/af_inet6.c
> >> >> @@ -62,6 +62,7 @@
> >> >> #include <net/rpl.h>
> >> >> #include <net/compat.h>
> >> >> #include <net/xfrm.h>
> >> >> +#include <net/ioam6.h>
> >> >>
> >> >> #include <linux/uaccess.h>
> >> >> #include <linux/mroute6.h>
> >> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> >> >> if (err)
> >> >> goto rpl_fail;
> >> >>
> >> >> + err = ioam6_init();
> >> >> + if (err)
> >> >> + goto ioam6_fail;
> >> >> +
> >> >> err = igmp6_late_init();
> >> >> if (err)
> >> >> goto igmp6_late_err;
> >> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> >> >> #endif
> >> >> igmp6_late_err:
> >> >> rpl_exit();
> >> >> +ioam6_fail:
> >> >> + ioam6_exit();
> >> >> rpl_fail:
> >> >> seg6_exit();
> >> >> seg6_fail:
> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> index f27ab3bf2e0c..00aee1358f1c 100644
> >> >> --- a/net/ipv6/exthdrs.c
> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> @@ -49,6 +49,8 @@
> >> >> #include <net/seg6_hmac.h>
> >> >> #endif
> >> >> #include <net/rpl.h>
> >> >> +#include <net/ioam6.h>
> >> >> +#include <net/dst_metadata.h>
> >> >>
> >> >> #include <linux/uaccess.h>
> >> >>
> >> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> return TLV_REJECT;
> >> >> }
> >> >>
> >> >> +/* IOAM */
> >> >> +
> >> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> >> >> +{
> >> >> + struct ioam6_common_hdr *ioamh;
> >> >> + struct ioam6_namespace *ns;
> >> >> +
> >> >> + /* Must be 4n-aligned */
> >> >> + if (optoff & 3)
> >> >> + goto drop;
> >> >> +
> >> >> + if (!skb_valid_dst(skb))
> >> >> + ip6_route_input(skb);
> >> >> +
> >> >> + /* IOAM must be enabled on ingress interface */
> >> >> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> >> >> + goto drop;
> >> >> +
> >> >> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> >> >> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> >> >> +
> >> >> + /* Unknown IOAM namespace, either:
> >> >> + * - Drop it if IOAM is not enabled on egress interface (if any)
> >> >> + * - Ignore it otherwise
> >> >> + */
> >> >> + if (!ns) {
> >> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto drop;
> >> >> +
> >> >> + goto accept;
> >> >> + }
> >> >> +
> >> >> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto remove;
> >> >> +
> >> >> + /* Known IOAM namespace which must not be removed:
> >> >> + * IOAM must be enabled on egress interface
> >> >> + */
> >> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto drop;
> >> >> +
> >> >> + switch (ioamh->ioam_type) {
> >> >> + case IOAM6_OPT_TRACE_PREALLOC:
> >> >> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> >> >> + IP6CB(skb)->flags |= IP6SKB_IOAM;
> >> >> + break;
> >> >> + default:
> >> >> + break;
> >> >> + }
> >> >> +
> >> >> +accept:
> >> >> + return TLV_ACCEPT;
> >> >> +remove:
> >> >> + return TLV_REMOVE;
> >> >> +drop:
> >> >> + kfree_skb(skb);
> >> >> + return TLV_REJECT;
> >> >> +}
> >> >> +
> >> >> /* Jumbo payload */
> >> >>
> >> >> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> .type = IPV6_TLV_ROUTERALERT,
> >> >> .func = ipv6_hop_ra,
> >> >> },
> >> >> + {
> >> >> + .type = IPV6_TLV_IOAM_HOPOPTS,
> >> >> + .func = ipv6_hop_ioam,
> >> >> + },
> >> >> {
> >> >> .type = IPV6_TLV_JUMBO,
> >> >> .func = ipv6_hop_jumbo,
> >> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> >> >> new file mode 100644
> >> >> index 000000000000..406aa78eb504
> >> >> --- /dev/null
> >> >> +++ b/net/ipv6/ioam6.c
> >> >> @@ -0,0 +1,326 @@
> >> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> >> +/*
> >> >> + * IOAM IPv6 implementation
> >> >> + *
> >> >> + * Author:
> >> >> + * Justin Iurman <justin.iurman@uliege.be>
> >> >> + */
> >> >> +
> >> >> +#include <linux/errno.h>
> >> >> +#include <linux/types.h>
> >> >> +#include <linux/kernel.h>
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/rhashtable.h>
> >> >> +
> >> >> +#include <net/addrconf.h>
> >> >> +#include <net/ioam6.h>
> >> >> +
> >> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> >> >> +{
> >> >> + kfree_rcu(ns, rcu);
> >> >> +}
> >> >> +
> >> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> >> >> +{
> >> >> + kfree_rcu(sc, rcu);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_ns(void *ptr, void *arg)
> >> >> +{
> >> >> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> >> >> +
> >> >> + if (ns)
> >> >> + ioam6_ns_release(ns);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_sc(void *ptr, void *arg)
> >> >> +{
> >> >> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> >> >> +
> >> >> + if (sc)
> >> >> + ioam6_sc_release(sc);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> + const struct ioam6_namespace *ns = obj;
> >> >> +
> >> >> + return (ns->id != *(__be16 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> + const struct ioam6_schema *sc = obj;
> >> >> +
> >> >> + return (sc->id != *(u32 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static const struct rhashtable_params rht_ns_params = {
> >> >> + .key_len = sizeof(__be16),
> >> >> + .key_offset = offsetof(struct ioam6_namespace, id),
> >> >> + .head_offset = offsetof(struct ioam6_namespace, head),
> >> >> + .automatic_shrinking = true,
> >> >> + .obj_cmpfn = ioam6_ns_cmpfn,
> >> >> +};
> >> >> +
> >> >> +static const struct rhashtable_params rht_sc_params = {
> >> >> + .key_len = sizeof(u32),
> >> >> + .key_offset = offsetof(struct ioam6_schema, id),
> >> >> + .head_offset = offsetof(struct ioam6_schema, head),
> >> >> + .automatic_shrinking = true,
> >> >> + .obj_cmpfn = ioam6_sc_cmpfn,
> >> >> +};
> >> >> +
> >> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> >> >> + u32 trace_type, struct ioam6_namespace *ns)
> >> >> +{
> >> >> + u8 *data = skb_network_header(skb) + nodeoff;
> >> >> + struct __kernel_sock_timeval ts;
> >> >> + u64 raw_u64;
> >> >> + u32 raw_u32;
> >> >> + u16 raw_u16;
> >> >> + u8 byte;
> >> >> +
> >> >> + /* hop_lim and node_id */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE0) {
> >> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
> >> >> + else
> >> >> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> >> >> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* ingress_if_id and egress_if_id */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE1) {
> >> >> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u16)
> >> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> + data += sizeof(__be16);
> >> >> +
> >> >> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u16)
> >> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> + data += sizeof(__be16);
> >> >> + }
> >> >> +
> >> >> + /* timestamp seconds */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE2) {
> >> >> + if (!skb->tstamp) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + } else {
> >> >> + skb_get_new_timestamp(skb, &ts);
> >> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> >> >> + }
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* timestamp subseconds */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE3) {
> >> >> + if (!skb->tstamp) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + } else {
> >> >> + if (!(trace_type & IOAM6_TRACE_TYPE2))
> >> >> + skb_get_new_timestamp(skb, &ts);
> >> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> >> >> + }
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* transit delay */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE4) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* namespace data */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE5) {
> >> >> + *(__be32 *)data = (__be32)ns->data;
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* queue depth */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE6) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* hop_lim and node_id (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE7) {
> >> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> + if (!raw_u64)
> >> >> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
> >> >> + else
> >> >> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> >> >> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> >> >> + data += sizeof(__be64);
> >> >> + }
> >> >> +
> >> >> + /* ingress_if_id and egress_if_id (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE8) {
> >> >> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> +
> >> >> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* namespace data (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE9) {
> >> >> + *(__be64 *)data = ns->data;
> >> >> + data += sizeof(__be64);
> >> >> + }
> >> >> +
> >> >> + /* buffer occupancy */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE10) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* checksum complement */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE11) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* opaque state snapshot */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE22) {
> >> >> + if (!ns->schema) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> >> >> + } else {
> >> >> + *(__be32 *)data = ns->schema->hdr;
> >> >> + data += sizeof(__be32);
> >> >> + memcpy(data, ns->schema->data, ns->schema->len);
> >> >> + }
> >> >> + }
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> + struct ioam6_namespace *ns)
> >> >> +{
> >> >> + u8 nodelen, flags, remlen, sclen = 0;
> >> >> + struct ioam6_trace_hdr *trh;
> >> >> + int nodeoff;
> >> >> + u16 info;
> >> >> + u32 type;
> >> >> +
> >> >> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> >> >> + info = be16_to_cpu(trh->info);
> >> >> + type = be32_to_cpu(trh->type);
> >> >> +
> >> >> + nodelen = info >> 11;
> >> >> + flags = (info >> 7) & 0xf;
> >> >> + remlen = info & 0x7f;
> >> >> +
> >> >> + /* Skip if Overflow bit is set OR
> >> >> + * if an unknown type (bit 12-21) is set
> >> >> + */
> >> >> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> >> >> + return;
> >> >> +
> >> >> + /* NodeLen does not include Opaque State Snapshot length. We need to
> >> >> + * take it into account if the corresponding bit is set and if current
> >> >> + * IOAM namespace has an active schema attached to it
> >> >> + */
> >> >> + if (type & IOAM6_TRACE_TYPE22) {
> >> >> + /* Opaque State Snapshot header size */
> >> >> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> >> >> +
> >> >> + if (ns->schema)
> >> >> + sclen += ns->schema->len / 4;
> >> >> + }
> >> >> +
> >> >> + /* Not enough space remaining: set Overflow bit and skip */
> >> >> + if (!remlen || remlen < (nodelen + sclen)) {
> >> >> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> >> >> + trh->info = cpu_to_be16(info);
> >> >> + return;
> >> >> + }
> >> >> +
> >> >> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> >> >> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> >> >> +
> >> >> + /* Update RemainingLen */
> >> >> + remlen -= nodelen + sclen;
> >> >> + info = (info & 0xff80) | remlen;
> >> >> + trh->info = cpu_to_be16(info);
> >> >> +}
> >> >> +
> >> >> +static int __net_init ioam6_net_init(struct net *net)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata;
> >> >> + int err = -ENOMEM;
> >> >> +
> >> >> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> >> >> + if (!nsdata)
> >> >> + goto out;
> >> >> +
> >> >> + mutex_init(&nsdata->lock);
> >> >> + net->ipv6.ioam6_data = nsdata;
> >> >> +
> >> >> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> >> >> + if (err)
> >> >> + goto free_nsdata;
> >> >> +
> >> >> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> >> >> + if (err)
> >> >> + goto free_rht_ns;
> >> >> +
> >> >> +out:
> >> >> + return err;
> >> >> +free_rht_ns:
> >> >> + rhashtable_destroy(&nsdata->namespaces);
> >> >> +free_nsdata:
> >> >> + kfree(nsdata);
> >> >> + net->ipv6.ioam6_data = NULL;
> >> >> + goto out;
> >> >> +}
> >> >> +
> >> >> +static void __net_exit ioam6_net_exit(struct net *net)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> >> >> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> >> >> +
> >> >> + kfree(nsdata);
> >> >> +}
> >> >> +
> >> >> +static struct pernet_operations ioam6_net_ops = {
> >> >> + .init = ioam6_net_init,
> >> >> + .exit = ioam6_net_exit,
> >> >> +};
> >> >> +
> >> >> +int __init ioam6_init(void)
> >> >> +{
> >> >> + int err = register_pernet_subsys(&ioam6_net_ops);
> >> >> +
> >> >> + if (err)
> >> >> + return err;
> >> >> +
> >> >> + pr_info("In-situ OAM (IOAM) with IPv6\n");
> >> >> + return 0;
> >> >> +}
> >> >> +
> >> >> +void ioam6_exit(void)
> >> >> +{
> >> >> + unregister_pernet_subsys(&ioam6_net_ops);
> >> >> +}
> >> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> >> index fac2135aa47b..da49b33ab6fc 100644
> >> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> >> >> .mode = 0644,
> >> >> .proc_handler = proc_dointvec
> >> >> },
> >> >> + {
> >> >> + .procname = "ioam6_id",
> >> >> + .data = &init_net.ipv6.sysctl.ioam6_id,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec
> >> >> + },
> >> >> { }
> >> >> };
> >> >>
> >> >> --
> >> >> 2.17.1
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
` (2 preceding siblings ...)
2020-06-24 19:23 ` [PATCH net-next 3/5] ipv6: ioam: Data plane support for Pre-allocated Trace Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
2020-06-25 10:52 ` Dan Carpenter
2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
To: netdev; +Cc: davem, justin.iurman
Add Generic Netlink commands to allow userspace to configure IOAM
namespaces and schemas. The intended client is iproute2; the matching
patch is ready and will be posted once this patchset is merged. Here is
a preview:
$ sudo ip ioam
Usage: ip ioam { namespace | schema } { show | del ID }
schema add ID DATA
namespace add ID [ DATA ] [ POP ]
namespace set ID schema { ID | none }
POP := { true | false }
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
include/linux/ioam6.h | 7 +
include/uapi/linux/ioam6.h | 43 +++
net/ipv6/ioam6.c | 519 ++++++++++++++++++++++++++++++++++++-
3 files changed, 566 insertions(+), 3 deletions(-)
create mode 100644 include/linux/ioam6.h
create mode 100644 include/uapi/linux/ioam6.h
diff --git a/include/linux/ioam6.h b/include/linux/ioam6.h
new file mode 100644
index 000000000000..156223095e57
--- /dev/null
+++ b/include/linux/ioam6.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IOAM6_H
+#define _LINUX_IOAM6_H
+
+#include <uapi/linux/ioam6.h>
+
+#endif
diff --git a/include/uapi/linux/ioam6.h b/include/uapi/linux/ioam6.h
new file mode 100644
index 000000000000..d2be5f820dc5
--- /dev/null
+++ b/include/uapi/linux/ioam6.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_IOAM6_H
+#define _UAPI_LINUX_IOAM6_H
+
+#define IOAM6_GENL_NAME "IOAM6"
+#define IOAM6_GENL_VERSION 0x1
+
+enum {
+ IOAM6_ATTR_UNSPEC,
+
+ IOAM6_ATTR_NS_ID, /* u16 */
+ IOAM6_ATTR_NS_DATA, /* u64 */
+ IOAM6_ATTR_NS_POP, /* Flag */
+
+#define IOAM6_MAX_SCHEMA_DATA_LEN (255 * 4)
+ IOAM6_ATTR_SC_ID, /* u32 */
+ IOAM6_ATTR_SC_DATA, /* Binary */
+ IOAM6_ATTR_SC_NONE, /* Flag */
+
+ IOAM6_ATTR_PAD,
+
+ __IOAM6_ATTR_MAX,
+};
+#define IOAM6_ATTR_MAX (__IOAM6_ATTR_MAX - 1)
+
+enum {
+ IOAM6_CMD_UNSPEC,
+
+ IOAM6_CMD_ADD_NAMESPACE,
+ IOAM6_CMD_DEL_NAMESPACE,
+ IOAM6_CMD_DUMP_NAMESPACES,
+
+ IOAM6_CMD_ADD_SCHEMA,
+ IOAM6_CMD_DEL_SCHEMA,
+ IOAM6_CMD_DUMP_SCHEMAS,
+
+ IOAM6_CMD_NS_SET_SCHEMA,
+
+ __IOAM6_CMD_MAX,
+};
+#define IOAM6_CMD_MAX (__IOAM6_CMD_MAX - 1)
+
+#endif
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index 406aa78eb504..e414e915bf1e 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -11,8 +11,10 @@
#include <linux/kernel.h>
#include <linux/net.h>
#include <linux/rhashtable.h>
+#include <linux/ioam6.h>
#include <net/addrconf.h>
+#include <net/genetlink.h>
#include <net/ioam6.h>
static inline void ioam6_ns_release(struct ioam6_namespace *ns)
@@ -71,6 +73,507 @@ static const struct rhashtable_params rht_sc_params = {
.obj_cmpfn = ioam6_sc_cmpfn,
};
+static struct genl_family ioam6_genl_family;
+
+static const struct nla_policy ioam6_genl_policy[IOAM6_ATTR_MAX + 1] = {
+ [IOAM6_ATTR_NS_ID] = { .type = NLA_U16 },
+ [IOAM6_ATTR_NS_DATA] = { .type = NLA_U64 },
+ [IOAM6_ATTR_NS_POP] = { .type = NLA_FLAG },
+ [IOAM6_ATTR_SC_ID] = { .type = NLA_U32 },
+ [IOAM6_ATTR_SC_DATA] = { .type = NLA_BINARY,
+ .len = IOAM6_MAX_SCHEMA_DATA_LEN },
+ [IOAM6_ATTR_SC_NONE] = { .type = NLA_FLAG },
+};
+
+static int ioam6_genl_addns(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ioam6_pernet_data *nsdata;
+ struct ioam6_namespace *ns;
+ __be16 ns_id;
+ int err;
+
+ if (!info->attrs[IOAM6_ATTR_NS_ID])
+ return -EINVAL;
+
+ ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+ nsdata = ioam6_pernet(net);
+
+ mutex_lock(&nsdata->lock);
+
+ ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+ if (ns) {
+ err = -EEXIST;
+ goto out_unlock;
+ }
+
+ ns = kzalloc(sizeof(*ns), GFP_KERNEL);
+ if (!ns) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+
+ ns->id = ns_id;
+ ns->remove_tlv = info->attrs[IOAM6_ATTR_NS_POP] ? true : false;
+
+ if (!info->attrs[IOAM6_ATTR_NS_DATA]) {
+ ns->data = cpu_to_be64(IOAM6_EMPTY_FIELD_u64);
+ } else {
+ ns->data = cpu_to_be64(
+ nla_get_u64(info->attrs[IOAM6_ATTR_NS_DATA]));
+ }
+
+ err = rhashtable_lookup_insert_fast(&nsdata->namespaces, &ns->head,
+ rht_ns_params);
+ if (err)
+ kfree(ns);
+
+out_unlock:
+ mutex_unlock(&nsdata->lock);
+ return err;
+}
+
+static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ioam6_pernet_data *nsdata;
+ struct ioam6_namespace *ns;
+ __be16 ns_id;
+ int err;
+
+ if (!info->attrs[IOAM6_ATTR_NS_ID])
+ return -EINVAL;
+
+ ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+ nsdata = ioam6_pernet(net);
+
+ mutex_lock(&nsdata->lock);
+
+ ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+ if (!ns) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+
+ if (ns->schema)
+ ns->schema->ns = NULL;
+
+ err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
+ rht_ns_params);
+ if (err) {
+ ns->schema->ns = ns;
+ goto out_unlock;
+ }
+
+ ioam6_ns_release(ns);
+
+out_unlock:
+ mutex_unlock(&nsdata->lock);
+ return err;
+}
+
+static int __ioam6_genl_dumpns_element(struct ioam6_namespace *ns,
+ u32 portid, u32 seq, u32 flags,
+ struct sk_buff *skb, u8 cmd)
+{
+ void *hdr;
+ u64 data;
+
+ hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
+ if (!hdr)
+ return -ENOMEM;
+
+ data = be64_to_cpu(ns->data);
+
+ if (nla_put_u16(skb, IOAM6_ATTR_NS_ID, be16_to_cpu(ns->id)) ||
+ (data != IOAM6_EMPTY_FIELD_u64 &&
+ nla_put_u64_64bit(skb, IOAM6_ATTR_NS_DATA, data, IOAM6_ATTR_PAD)) ||
+ (ns->remove_tlv && nla_put_flag(skb, IOAM6_ATTR_NS_POP)) ||
+ (ns->schema && nla_put_u32(skb, IOAM6_ATTR_SC_ID, ns->schema->id)))
+ goto nla_put_failure;
+
+ genlmsg_end(skb, hdr);
+ return 0;
+
+nla_put_failure:
+ genlmsg_cancel(skb, hdr);
+ return -EMSGSIZE;
+}
+
+static int ioam6_genl_dumpns_start(struct netlink_callback *cb)
+{
+ struct net *net = sock_net(cb->skb->sk);
+ struct ioam6_pernet_data *nsdata;
+ struct rhashtable_iter *iter;
+
+ nsdata = ioam6_pernet(net);
+ iter = (struct rhashtable_iter *)cb->args[0];
+
+ if (!iter) {
+ iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ cb->args[0] = (long)iter;
+ }
+
+ rhashtable_walk_enter(&nsdata->namespaces, iter);
+
+ return 0;
+}
+
+static int ioam6_genl_dumpns_done(struct netlink_callback *cb)
+{
+ struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+
+ rhashtable_walk_exit(iter);
+ kfree(iter);
+
+ return 0;
+}
+
+static int ioam6_genl_dumpns(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+ struct ioam6_namespace *ns;
+ int err;
+
+ rhashtable_walk_start(iter);
+
+ for (;;) {
+ ns = rhashtable_walk_next(iter);
+
+ if (IS_ERR(ns)) {
+ if (PTR_ERR(ns) == -EAGAIN)
+ continue;
+ err = PTR_ERR(ns);
+ goto done;
+ } else if (!ns) {
+ break;
+ }
+
+ err = __ioam6_genl_dumpns_element(ns,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ NLM_F_MULTI,
+ skb,
+ IOAM6_CMD_DUMP_NAMESPACES);
+ if (err)
+ goto done;
+ }
+
+ err = skb->len;
+
+done:
+ rhashtable_walk_stop(iter);
+ return err;
+}
+
+static int ioam6_genl_addsc(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ioam6_pernet_data *nsdata;
+ struct ioam6_schema *sc;
+ int len, pad, err;
+ u32 sc_id;
+
+ if (!info->attrs[IOAM6_ATTR_SC_ID] || !info->attrs[IOAM6_ATTR_SC_DATA])
+ return -EINVAL;
+
+ sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+ nsdata = ioam6_pernet(net);
+
+ mutex_lock(&nsdata->lock);
+
+ sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
+ if (sc) {
+ err = -EEXIST;
+ goto out_unlock;
+ }
+
+ sc = kzalloc(sizeof(*sc), GFP_KERNEL);
+ if (!sc) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+
+ len = nla_len(info->attrs[IOAM6_ATTR_SC_DATA]);
+ pad = (4 - (len % 4)) % 4;
+
+ sc->data = kzalloc(len + pad, GFP_KERNEL);
+ if (!sc->data) {
+ err = -ENOMEM;
+ goto free_sc;
+ }
+
+ sc->id = sc_id;
+ sc->len = len + pad;
+ sc->hdr = cpu_to_be32(sc->id | ((u8)(sc->len / 4) << 24));
+
+ nla_memcpy(sc->data, info->attrs[IOAM6_ATTR_SC_DATA], len);
+
+ err = rhashtable_lookup_insert_fast(&nsdata->schemas, &sc->head,
+ rht_sc_params);
+ if (err)
+ goto free_data;
+
+out_unlock:
+ mutex_unlock(&nsdata->lock);
+ return err;
+free_data:
+ kfree(sc->data);
+free_sc:
+ kfree(sc);
+ goto out_unlock;
+}
+
+static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ioam6_pernet_data *nsdata;
+ struct ioam6_schema *sc;
+ u32 sc_id;
+ int err;
+
+ if (!info->attrs[IOAM6_ATTR_SC_ID])
+ return -EINVAL;
+
+ sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+ nsdata = ioam6_pernet(net);
+
+ mutex_lock(&nsdata->lock);
+
+ sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
+ if (!sc) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+
+ if (sc->ns)
+ sc->ns->schema = NULL;
+
+ err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
+ rht_sc_params);
+ if (err) {
+ sc->ns->schema = sc;
+ goto out_unlock;
+ }
+
+ ioam6_sc_release(sc);
+
+out_unlock:
+ mutex_unlock(&nsdata->lock);
+ return err;
+}
+
+static int __ioam6_genl_dumpsc_element(struct ioam6_schema *sc,
+ u32 portid, u32 seq, u32 flags,
+ struct sk_buff *skb, u8 cmd)
+{
+ void *hdr;
+
+ hdr = genlmsg_put(skb, portid, seq, &ioam6_genl_family, flags, cmd);
+ if (!hdr)
+ return -ENOMEM;
+
+ if (nla_put_u32(skb, IOAM6_ATTR_SC_ID, sc->id) ||
+ nla_put(skb, IOAM6_ATTR_SC_DATA, sc->len, sc->data) ||
+ (sc->ns && nla_put_u16(skb, IOAM6_ATTR_NS_ID,
+ be16_to_cpu(sc->ns->id))))
+ goto nla_put_failure;
+
+ genlmsg_end(skb, hdr);
+ return 0;
+
+nla_put_failure:
+ genlmsg_cancel(skb, hdr);
+ return -EMSGSIZE;
+}
+
+static int ioam6_genl_dumpsc_start(struct netlink_callback *cb)
+{
+ struct net *net = sock_net(cb->skb->sk);
+ struct ioam6_pernet_data *nsdata;
+ struct rhashtable_iter *iter;
+
+ nsdata = ioam6_pernet(net);
+ iter = (struct rhashtable_iter *)cb->args[0];
+
+ if (!iter) {
+ iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ cb->args[0] = (long)iter;
+ }
+
+ rhashtable_walk_enter(&nsdata->schemas, iter);
+
+ return 0;
+}
+
+static int ioam6_genl_dumpsc_done(struct netlink_callback *cb)
+{
+ struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+
+ rhashtable_walk_exit(iter);
+ kfree(iter);
+
+ return 0;
+}
+
+static int ioam6_genl_dumpsc(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct rhashtable_iter *iter = (struct rhashtable_iter *)cb->args[0];
+ struct ioam6_schema *sc;
+ int err;
+
+ rhashtable_walk_start(iter);
+
+ for (;;) {
+ sc = rhashtable_walk_next(iter);
+
+ if (IS_ERR(sc)) {
+ if (PTR_ERR(sc) == -EAGAIN)
+ continue;
+ err = PTR_ERR(sc);
+ goto done;
+ } else if (!sc) {
+ break;
+ }
+
+ err = __ioam6_genl_dumpsc_element(sc,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ NLM_F_MULTI,
+ skb,
+ IOAM6_CMD_DUMP_SCHEMAS);
+ if (err)
+ goto done;
+ }
+
+ err = skb->len;
+
+done:
+ rhashtable_walk_stop(iter);
+ return err;
+}
+
+static int ioam6_genl_ns_set_schema(struct sk_buff *skb, struct genl_info *info)
+{
+ struct net *net = genl_info_net(info);
+ struct ioam6_pernet_data *nsdata;
+ struct ioam6_namespace *ns;
+ struct ioam6_schema *sc;
+ __be16 ns_id;
+ int err = 0;
+ u32 sc_id;
+
+ if (!info->attrs[IOAM6_ATTR_NS_ID] ||
+ (!info->attrs[IOAM6_ATTR_SC_ID] &&
+ !info->attrs[IOAM6_ATTR_SC_NONE]))
+ return -EINVAL;
+
+ ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
+ nsdata = ioam6_pernet(net);
+
+ mutex_lock(&nsdata->lock);
+
+ ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
+ if (!ns) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+
+ if (info->attrs[IOAM6_ATTR_SC_NONE]) {
+ sc = NULL;
+ } else {
+ sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
+ sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id,
+ rht_sc_params);
+ if (!sc) {
+ err = -ENOENT;
+ goto out_unlock;
+ }
+ }
+
+ if (ns->schema)
+ ns->schema->ns = NULL;
+ ns->schema = sc;
+
+ if (sc) {
+ if (sc->ns)
+ sc->ns->schema = NULL;
+ sc->ns = ns;
+ }
+
+out_unlock:
+ mutex_unlock(&nsdata->lock);
+ return err;
+}
+
+static const struct genl_ops ioam6_genl_ops[] = {
+ {
+ .cmd = IOAM6_CMD_ADD_NAMESPACE,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = ioam6_genl_addns,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_DEL_NAMESPACE,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = ioam6_genl_delns,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_DUMP_NAMESPACES,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .start = ioam6_genl_dumpns_start,
+ .dumpit = ioam6_genl_dumpns,
+ .done = ioam6_genl_dumpns_done,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_ADD_SCHEMA,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = ioam6_genl_addsc,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_DEL_SCHEMA,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = ioam6_genl_delsc,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_DUMP_SCHEMAS,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .start = ioam6_genl_dumpsc_start,
+ .dumpit = ioam6_genl_dumpsc,
+ .done = ioam6_genl_dumpsc_done,
+ .flags = GENL_ADMIN_PERM,
+ },
+ {
+ .cmd = IOAM6_CMD_NS_SET_SCHEMA,
+ .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
+ .doit = ioam6_genl_ns_set_schema,
+ .flags = GENL_ADMIN_PERM,
+ },
+};
+
+static struct genl_family ioam6_genl_family __ro_after_init = {
+ .hdrsize = 0,
+ .name = IOAM6_GENL_NAME,
+ .version = IOAM6_GENL_VERSION,
+ .maxattr = IOAM6_ATTR_MAX,
+ .policy = ioam6_genl_policy,
+ .netnsok = true,
+ .parallel_ops = true,
+ .ops = ioam6_genl_ops,
+ .n_ops = ARRAY_SIZE(ioam6_genl_ops),
+ .module = THIS_MODULE,
+};
+
struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
{
struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
@@ -311,16 +814,26 @@ static struct pernet_operations ioam6_net_ops = {
int __init ioam6_init(void)
{
- int err = register_pernet_subsys(&ioam6_net_ops);
+ int err = genl_register_family(&ioam6_genl_family);
+
+ if (err)
+ goto out;
+ err = register_pernet_subsys(&ioam6_net_ops);
if (err)
- return err;
+ goto out_unregister_genl;
pr_info("In-situ OAM (IOAM) with IPv6\n");
- return 0;
+
+out:
+ return err;
+out_unregister_genl:
+ genl_unregister_family(&ioam6_genl_family);
+ goto out;
}
void ioam6_exit(void)
{
unregister_pernet_subsys(&ioam6_net_ops);
+ genl_unregister_family(&ioam6_genl_family);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM
2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
2020-06-25 10:52 ` Dan Carpenter
@ 2020-06-25 10:52 ` Dan Carpenter
0 siblings, 0 replies; 42+ messages in thread
From: Dan Carpenter @ 2020-06-25 10:52 UTC (permalink / raw)
To: kbuild, Justin Iurman, netdev; +Cc: lkp, kbuild-all, davem, justin.iurman
[-- Attachment #1: Type: text/plain, Size: 6579 bytes --]
Hi Justin,
url: https://github.com/0day-ci/linux/commits/Justin-Iurman/Data-plane-support-for-IOAM-Pre-allocated-Trace-with-IPv6/20200625-033536
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 0558c396040734bc1d361919566a581fd41aa539
config: microblaze-randconfig-m031-20200624 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
If you fix the issue, kindly add the following tags as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
net/ipv6/ioam6.c:164 ioam6_genl_delns() error: we previously assumed 'ns->schema' could be null (see line 158)
net/ipv6/ioam6.c:358 ioam6_genl_delsc() error: we previously assumed 'sc->ns' could be null (see line 352)
# https://github.com/0day-ci/linux/commit/ce303f2d7c40f84739505f1daa7dac53daa6c4c5
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout ce303f2d7c40f84739505f1daa7dac53daa6c4c5
vim +164 net/ipv6/ioam6.c
ce303f2d7c40f8 Justin Iurman 2020-06-24 135
ce303f2d7c40f8 Justin Iurman 2020-06-24 136 static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24 137 {
ce303f2d7c40f8 Justin Iurman 2020-06-24 138 struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24 139 struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24 140 struct ioam6_namespace *ns;
ce303f2d7c40f8 Justin Iurman 2020-06-24 141 __be16 ns_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24 142 int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24 143
ce303f2d7c40f8 Justin Iurman 2020-06-24 144 if (!info->attrs[IOAM6_ATTR_NS_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24 145 return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24 146
ce303f2d7c40f8 Justin Iurman 2020-06-24 147 ns_id = cpu_to_be16(nla_get_u16(info->attrs[IOAM6_ATTR_NS_ID]));
ce303f2d7c40f8 Justin Iurman 2020-06-24 148 nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24 149
ce303f2d7c40f8 Justin Iurman 2020-06-24 150 mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24 151
ce303f2d7c40f8 Justin Iurman 2020-06-24 152 ns = rhashtable_lookup_fast(&nsdata->namespaces, &ns_id, rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24 153 if (!ns) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 154 err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24 155 goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24 156 }
ce303f2d7c40f8 Justin Iurman 2020-06-24 157
ce303f2d7c40f8 Justin Iurman 2020-06-24 @158 if (ns->schema)
^^^^^^^^^^
Check for NULL
ce303f2d7c40f8 Justin Iurman 2020-06-24 159 ns->schema->ns = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24 160
ce303f2d7c40f8 Justin Iurman 2020-06-24 161 err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24 162 rht_ns_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24 163 if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @164 ns->schema->ns = ns;
^^^^^^^^^^^^^^
Unchecked dereference.
ce303f2d7c40f8 Justin Iurman 2020-06-24 165 goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24 166 }
ce303f2d7c40f8 Justin Iurman 2020-06-24 167
ce303f2d7c40f8 Justin Iurman 2020-06-24 168 ioam6_ns_release(ns);
ce303f2d7c40f8 Justin Iurman 2020-06-24 169
ce303f2d7c40f8 Justin Iurman 2020-06-24 170 out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24 171 mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24 172 return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24 173 }
[ snip ]
ce303f2d7c40f8 Justin Iurman 2020-06-24 330 static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
ce303f2d7c40f8 Justin Iurman 2020-06-24 331 {
ce303f2d7c40f8 Justin Iurman 2020-06-24 332 struct net *net = genl_info_net(info);
ce303f2d7c40f8 Justin Iurman 2020-06-24 333 struct ioam6_pernet_data *nsdata;
ce303f2d7c40f8 Justin Iurman 2020-06-24 334 struct ioam6_schema *sc;
ce303f2d7c40f8 Justin Iurman 2020-06-24 335 u32 sc_id;
ce303f2d7c40f8 Justin Iurman 2020-06-24 336 int err;
ce303f2d7c40f8 Justin Iurman 2020-06-24 337
ce303f2d7c40f8 Justin Iurman 2020-06-24 338 if (!info->attrs[IOAM6_ATTR_SC_ID])
ce303f2d7c40f8 Justin Iurman 2020-06-24 339 return -EINVAL;
ce303f2d7c40f8 Justin Iurman 2020-06-24 340
ce303f2d7c40f8 Justin Iurman 2020-06-24 341 sc_id = nla_get_u32(info->attrs[IOAM6_ATTR_SC_ID]);
ce303f2d7c40f8 Justin Iurman 2020-06-24 342 nsdata = ioam6_pernet(net);
ce303f2d7c40f8 Justin Iurman 2020-06-24 343
ce303f2d7c40f8 Justin Iurman 2020-06-24 344 mutex_lock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24 345
ce303f2d7c40f8 Justin Iurman 2020-06-24 346 sc = rhashtable_lookup_fast(&nsdata->schemas, &sc_id, rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24 347 if (!sc) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 348 err = -ENOENT;
ce303f2d7c40f8 Justin Iurman 2020-06-24 349 goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24 350 }
ce303f2d7c40f8 Justin Iurman 2020-06-24 351
ce303f2d7c40f8 Justin Iurman 2020-06-24 @352 if (sc->ns)
^^^^^^
Check for NULL
ce303f2d7c40f8 Justin Iurman 2020-06-24 353 sc->ns->schema = NULL;
ce303f2d7c40f8 Justin Iurman 2020-06-24 354
ce303f2d7c40f8 Justin Iurman 2020-06-24 355 err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
ce303f2d7c40f8 Justin Iurman 2020-06-24 356 rht_sc_params);
ce303f2d7c40f8 Justin Iurman 2020-06-24 357 if (err) {
ce303f2d7c40f8 Justin Iurman 2020-06-24 @358 sc->ns->schema = sc;
^^^^^^^^^^^^^^
Unchecked dereference
ce303f2d7c40f8 Justin Iurman 2020-06-24 359 goto out_unlock;
ce303f2d7c40f8 Justin Iurman 2020-06-24 360 }
ce303f2d7c40f8 Justin Iurman 2020-06-24 361
ce303f2d7c40f8 Justin Iurman 2020-06-24 362 ioam6_sc_release(sc);
ce303f2d7c40f8 Justin Iurman 2020-06-24 363
ce303f2d7c40f8 Justin Iurman 2020-06-24 364 out_unlock:
ce303f2d7c40f8 Justin Iurman 2020-06-24 365 mutex_unlock(&nsdata->lock);
ce303f2d7c40f8 Justin Iurman 2020-06-24 366 return err;
ce303f2d7c40f8 Justin Iurman 2020-06-24 367 }
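For illustration only, here is one way the two warnings above could be
addressed: remember the peer pointer before unlinking, and restore the
back-link on the error path only if a peer was actually attached. This is a
userspace sketch, not the author's fix. The types `ns_entry` and `schema`,
the helper `fake_remove()` (a stand-in for rhashtable_remove_fast()) and
`del_ns_selftest()` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the
 * namespace <-> schema cross-links matter for this sketch.
 */
struct schema;
struct ns_entry { struct schema *schema; };
struct schema { struct ns_entry *ns; };

/* Stand-in for rhashtable_remove_fast(): 0 on success, -ENOENT on failure. */
static int fake_remove(int fail)
{
	return fail ? -ENOENT : 0;
}

/* Unlink ns from its schema, attempt the removal, and roll the link back
 * on failure. The saved pointer 'sc' is checked before every dereference,
 * so a namespace without a schema never triggers a NULL dereference.
 */
static int del_ns(struct ns_entry *ns, int fail)
{
	struct schema *sc = ns->schema;	/* may be NULL */
	int err;

	if (sc)
		sc->ns = NULL;

	err = fake_remove(fail);
	if (err) {
		if (sc)
			sc->ns = ns;	/* restore only if there was a link */
		return err;
	}

	return 0;
}

/* Exercises the three cases: no schema + failure, schema + failure,
 * schema + success.
 */
static void del_ns_selftest(void)
{
	struct ns_entry lone = { NULL };
	struct schema sc;
	struct ns_entry linked = { &sc };

	sc.ns = &linked;

	assert(del_ns(&lone, 1) == -ENOENT);	/* no crash without schema */
	assert(del_ns(&linked, 1) == -ENOENT);
	assert(sc.ns == &linked);		/* link restored on failure */
	assert(del_ns(&linked, 0) == 0);
	assert(sc.ns == NULL);			/* link dropped on success */
}
```

The same shape would apply to ioam6_genl_delsc() with the roles of the two
structures swapped. Whether the rollback is needed at all depends on whether
the removal can fail after a successful lookup under the same mutex, which
this sketch does not try to answer.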
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26285 bytes --]
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH net-next] Fix unchecked dereference
2020-06-25 10:52 ` Dan Carpenter
(?)
(?)
@ 2020-06-26 8:54 ` Justin Iurman
2020-06-26 16:01 ` Jakub Kicinski
-1 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 8:54 UTC (permalink / raw)
To: dan.carpenter; +Cc: kbuild, justin.iurman, netdev, lkp, kbuild-all, davem
If rhashtable_remove_fast() returns an error, the deletion is rolled
back. Guard that rollback with a NULL check to fix an unchecked
dereference.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
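For illustration only, here is a minimal userspace sketch of the rollback
pattern this fixup touches. All demo_* names are hypothetical stand-ins and
do not exist in the kernel: demo_remove() takes the place of
rhashtable_remove_fast() and simply returns the error it is given, so both
paths can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for struct ioam6_namespace / struct ioam6_schema. */
struct demo_schema;

struct demo_ns {
	struct demo_schema *schema;	/* may be NULL: no schema attached */
};

struct demo_schema {
	struct demo_ns *ns;		/* back-pointer, may be NULL */
};

/*
 * Hypothetical replacement for rhashtable_remove_fast(): it removes
 * nothing and just returns remove_err.
 */
static int demo_remove(struct demo_ns *ns, int remove_err)
{
	(void)ns;
	return remove_err;
}

static int demo_delns(struct demo_ns *ns, int remove_err)
{
	int err;

	assert(ns != NULL);

	/* Detach the schema first, if there is one. */
	if (ns->schema)
		ns->schema->ns = NULL;

	err = demo_remove(ns, remove_err);
	if (err) {
		/* Roll back, repeating the NULL check made above. */
		if (ns->schema)
			ns->schema->ns = ns;
		return err;
	}

	/* On success the real code would release ns here. */
	return 0;
}
```

Calling demo_delns() on a namespace with no schema and a failing removal
reaches the rollback branch with ns->schema == NULL, which is the case
smatch flagged; the guard keeps that path from dereferencing it.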
net/ipv6/ioam6.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
index e414e915bf1e..f1347940245e 100644
--- a/net/ipv6/ioam6.c
+++ b/net/ipv6/ioam6.c
@@ -161,7 +161,8 @@ static int ioam6_genl_delns(struct sk_buff *skb, struct genl_info *info)
err = rhashtable_remove_fast(&nsdata->namespaces, &ns->head,
rht_ns_params);
if (err) {
- ns->schema->ns = ns;
+ if (ns->schema)
+ ns->schema->ns = ns;
goto out_unlock;
}
@@ -355,7 +356,8 @@ static int ioam6_genl_delsc(struct sk_buff *skb, struct genl_info *info)
err = rhashtable_remove_fast(&nsdata->schemas, &sc->head,
rht_sc_params);
if (err) {
- sc->ns->schema = sc;
+ if (sc->ns)
+ sc->ns->schema = sc;
goto out_unlock;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH net-next] Fix unchecked dereference
2020-06-26 8:54 ` [PATCH net-next] Fix unchecked dereference Justin Iurman
@ 2020-06-26 16:01 ` Jakub Kicinski
0 siblings, 0 replies; 42+ messages in thread
From: Jakub Kicinski @ 2020-06-26 16:01 UTC (permalink / raw)
To: Justin Iurman; +Cc: dan.carpenter, kbuild, netdev, lkp, kbuild-all, davem
On Fri, 26 Jun 2020 10:54:35 +0200 Justin Iurman wrote:
> If rhashtable_remove_fast() returns an error, the deletion is rolled
> back. Guard that rollback with a NULL check to fix an unchecked
> dereference.
>
> Reported-by: kernel test robot <lkp@intel.com>
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
My bot says this doesn't apply to net-next, could you double-check?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next] Fix unchecked dereference
2020-06-26 16:01 ` Jakub Kicinski
(?)
@ 2020-06-26 17:23 ` Justin Iurman
2020-06-27 4:04 ` Jakub Kicinski
-1 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-26 17:23 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: dan carpenter, kbuild, netdev, lkp, kbuild-all, davem
Hi Jakub,
It is an inline modification of patch 4 of this series. The modification in itself cannot be the problem. Maybe I sent it the wrong way?
Justin
>> If rhashtable_remove_fast() returns an error, the deletion is rolled
>> back. Guard that rollback with a NULL check to fix an unchecked
>> dereference.
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>
> My bot says this doesn't apply to net-next, could you double-check?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next] Fix unchecked dereference
2020-06-26 17:23 ` Justin Iurman
@ 2020-06-27 4:04 ` Jakub Kicinski
0 siblings, 0 replies; 42+ messages in thread
From: Jakub Kicinski @ 2020-06-27 4:04 UTC (permalink / raw)
To: Justin Iurman; +Cc: dan carpenter, kbuild, netdev, lkp, kbuild-all, davem
On Fri, 26 Jun 2020 19:23:21 +0200 (CEST) Justin Iurman wrote:
> Hi Jakub,
>
> It is an inline modification of patch 4 of this series. The
> modification in itself cannot be the problem. Maybe I sent it the
> wrong way?
Ah, sorry I didn't notice the threading. Please don't tag fixups like
this with [PATCH $tree], the series would need a revision.
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
2020-06-24 19:23 [PATCH net-next 0/5] Data plane support for IOAM Pre-allocated Trace with IPv6 Justin Iurman
` (3 preceding siblings ...)
2020-06-24 19:23 ` [PATCH net-next 4/5] ipv6: ioam: Generic Netlink to configure IOAM Justin Iurman
@ 2020-06-24 19:23 ` Justin Iurman
2020-06-25 2:53 ` Tom Herbert
4 siblings, 1 reply; 42+ messages in thread
From: Justin Iurman @ 2020-06-24 19:23 UTC (permalink / raw)
To: netdev; +Cc: davem, justin.iurman
Add documentation for new IOAM sysctls:
- ioam6_id: a namespace sysctl
- ioam6_enabled and ioam6_id: two per-interface sysctls
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
---
Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
Documentation/networking/ip-sysctl.rst | 5 +++++
2 files changed, 25 insertions(+)
create mode 100644 Documentation/networking/ioam6-sysctl.rst
diff --git a/Documentation/networking/ioam6-sysctl.rst b/Documentation/networking/ioam6-sysctl.rst
new file mode 100644
index 000000000000..bad6c64907bc
--- /dev/null
+++ b/Documentation/networking/ioam6-sysctl.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+IOAM6 Sysfs variables
+=====================
+
+
+/proc/sys/net/conf/<iface>/ioam6_* variables:
+============================================
+
+ioam6_enabled - BOOL
+ Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
+
+ * 0 - disabled (default)
+ * not 0 - enabled
+
+ioam6_id - INTEGER
+ Define the IOAM id of this interface.
+
+ Default is 0.
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index b72f89d5694c..5ba11f2766bd 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
and extraneous notifications.
Default: true (backward compat mode)
+ioam6_id - INTEGER
+ Define the IOAM id of this node.
+
+ Default: 0
+
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
--
2.17.1
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
2020-06-24 19:23 ` [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls Justin Iurman
@ 2020-06-25 2:53 ` Tom Herbert
2020-06-25 18:00 ` Justin Iurman
0 siblings, 1 reply; 42+ messages in thread
From: Tom Herbert @ 2020-06-25 2:53 UTC (permalink / raw)
To: Justin Iurman; +Cc: Linux Kernel Network Developers, David S. Miller
On Wed, Jun 24, 2020 at 12:33 PM Justin Iurman <justin.iurman@uliege.be> wrote:
>
> Add documentation for new IOAM sysctls:
> - ioam6_id: a namespace sysctl
> - ioam6_enabled and ioam6_id: two per-interface sysctls
>
Are you planning to add a more detailed description of the feature and
how to use it? (It would be nice, I think :-) )
> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
> ---
> Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
> Documentation/networking/ip-sysctl.rst | 5 +++++
> 2 files changed, 25 insertions(+)
> create mode 100644 Documentation/networking/ioam6-sysctl.rst
>
> diff --git a/Documentation/networking/ioam6-sysctl.rst b/Documentation/networking/ioam6-sysctl.rst
> new file mode 100644
> index 000000000000..bad6c64907bc
> --- /dev/null
> +++ b/Documentation/networking/ioam6-sysctl.rst
> @@ -0,0 +1,20 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +IOAM6 Sysfs variables
> +=====================
> +
> +
> +/proc/sys/net/conf/<iface>/ioam6_* variables:
> +============================================
> +
> +ioam6_enabled - BOOL
> + Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
> +
> + * 0 - disabled (default)
> + * not 0 - enabled
> +
> +ioam6_id - INTEGER
> + Define the IOAM id of this interface.
> +
> + Default is 0.
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index b72f89d5694c..5ba11f2766bd 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
> and extraneous notifications.
> Default: true (backward compat mode)
>
> +ioam6_id - INTEGER
> + Define the IOAM id of this node.
> +
> + Default: 0
> +
> IPv6 Fragmentation:
>
> ip6frag_high_thresh - INTEGER
> --
> 2.17.1
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH net-next 5/5] ipv6: ioam: Documentation for new IOAM sysctls
2020-06-25 2:53 ` Tom Herbert
@ 2020-06-25 18:00 ` Justin Iurman
0 siblings, 0 replies; 42+ messages in thread
From: Justin Iurman @ 2020-06-25 18:00 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers, David S. Miller
>> Add documentation for new IOAM sysctls:
>> - ioam6_id: a namespace sysctl
>> - ioam6_enabled and ioam6_id: two per-interface sysctls
>>
> Are you planning to add a more detailed description of the feature and
> how to use it? (It would be nice, I think :-) )
Of course, will do that ASAP!
Justin
>> Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
>> ---
>> Documentation/networking/ioam6-sysctl.rst | 20 ++++++++++++++++++++
>> Documentation/networking/ip-sysctl.rst | 5 +++++
>> 2 files changed, 25 insertions(+)
>> create mode 100644 Documentation/networking/ioam6-sysctl.rst
>>
>> diff --git a/Documentation/networking/ioam6-sysctl.rst
>> b/Documentation/networking/ioam6-sysctl.rst
>> new file mode 100644
>> index 000000000000..bad6c64907bc
>> --- /dev/null
>> +++ b/Documentation/networking/ioam6-sysctl.rst
>> @@ -0,0 +1,20 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +=====================
>> +IOAM6 Sysfs variables
>> +=====================
>> +
>> +
>> +/proc/sys/net/conf/<iface>/ioam6_* variables:
>> +============================================
>> +
>> +ioam6_enabled - BOOL
>> + Enable (accept) or disable (drop) IPv6 IOAM packets on this interface.
>> +
>> + * 0 - disabled (default)
>> + * not 0 - enabled
>> +
>> +ioam6_id - INTEGER
>> + Define the IOAM id of this interface.
>> +
>> + Default is 0.
>> diff --git a/Documentation/networking/ip-sysctl.rst
>> b/Documentation/networking/ip-sysctl.rst
>> index b72f89d5694c..5ba11f2766bd 100644
>> --- a/Documentation/networking/ip-sysctl.rst
>> +++ b/Documentation/networking/ip-sysctl.rst
>> @@ -1770,6 +1770,11 @@ nexthop_compat_mode - BOOLEAN
>> and extraneous notifications.
>> Default: true (backward compat mode)
>>
>> +ioam6_id - INTEGER
>> + Define the IOAM id of this node.
>> +
>> + Default: 0
>> +
>> IPv6 Fragmentation:
>>
>> ip6frag_high_thresh - INTEGER
>> --
>> 2.17.1
--
Justin Iurman
Université de Liège (ULg)
Bât. B28 Algorithmique des Grands Systèmes
Quartier Polytech 1
Allée de la Découverte 10
4000 Liège
Phone: +32 4 366 28 09
^ permalink raw reply [flat|nested] 42+ messages in thread