* [patch RFC net-next 0/3] Fix traceroute in the presence of SRv6
@ 2021-12-01 16:32 Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 1/3] seg6: export get_srh() for ICMP handling Andrew Lunn
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 16:32 UTC (permalink / raw)
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	Willem de Bruijn, James Prestwood, Justin Iurman,
	Praveen Chaudhary, Jason A . Donenfeld, Eric Dumazet, netdev,
	Andrew Lunn

RFC: This is my first time working on SRv6 and ICMP. Comments are very welcome.

When using SRv6, the destination IP address in the IPv6 header is not
always the true destination; it can be a router along the path that
SRv6 is using.

When ICMP reports an error, e.g. time exceeded, which is what
traceroute uses, it includes the packet which invoked the error in
the ICMP message body. Upon receiving such an ICMP packet, the
invoking packet is examined and an attempt is made to find the socket
which sent the packet, so the error can be reported. Lookup is
performed using the source and destination address. If the
intermediary router IP address from the IPv6 header is used, the
lookup fails. It is necessary to dig into the invoking packet and find
the true destination address in the Segment Routing Header, SRH.

Patch 1 exports a helper which can find the SRH in a packet
Patch 2 does the actual examination of the invoking packet
Patch 3 makes use of the results when trying to find the socket.

Andrew Lunn (3):
  seg6: export get_srh() for ICMP handling
  icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  udp6: Use Segment Routing Header for dest address if present

 include/linux/ipv6.h  |  2 ++
 include/net/seg6.h    |  1 +
 net/ipv6/icmp.c       | 36 +++++++++++++++++++++++++++++++++++-
 net/ipv6/seg6.c       | 29 +++++++++++++++++++++++++++++
 net/ipv6/seg6_local.c | 33 ++-------------------------------
 net/ipv6/udp.c        |  7 +++++++
 6 files changed, 76 insertions(+), 32 deletions(-)

-- 
2.33.1



* [patch RFC net-next 1/3] seg6: export get_srh() for ICMP handling
  2021-12-01 16:32 [patch RFC net-next 0/3] Fix traceroute in the presence of SRv6 Andrew Lunn
@ 2021-12-01 16:32 ` Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 3/3] udp6: Use Segment Routing Header for dest address if present Andrew Lunn
  2 siblings, 0 replies; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 16:32 UTC (permalink / raw)
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	Willem de Bruijn, James Prestwood, Justin Iurman,
	Praveen Chaudhary, Jason A . Donenfeld, Eric Dumazet, netdev,
	Andrew Lunn

An ICMP error message can contain in its message body part of an IPv6
packet which invoked the error. Such a packet might contain a segment
routing header. Export get_srh() so the ICMP code can make use of it.

Since this changes the scope of the function from local to global, add
the seg6_ prefix to keep the namespace clean. Also move it into seg6.c
so it is always available, not just when IPV6_SEG6_LWTUNNEL is
enabled.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 include/net/seg6.h    |  1 +
 net/ipv6/seg6.c       | 29 +++++++++++++++++++++++++++++
 net/ipv6/seg6_local.c | 33 ++-------------------------------
 3 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 9d19c15e8545..da85ebc5ae99 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -58,6 +58,7 @@ extern int seg6_local_init(void);
 extern void seg6_local_exit(void);
 
 extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced);
+struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb, int flags);
 extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
 			     int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index a8b5784afb1a..5bc9bf892199 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -75,6 +75,35 @@ bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced)
 	return true;
 }
 
+struct ipv6_sr_hdr *seg6_get_srh(struct sk_buff *skb, int flags)
+{
+	struct ipv6_sr_hdr *srh;
+	int len, srhoff = 0;
+
+	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, &flags) < 0)
+		return NULL;
+
+	if (!pskb_may_pull(skb, srhoff + sizeof(*srh)))
+		return NULL;
+
+	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+	len = (srh->hdrlen + 1) << 3;
+
+	if (!pskb_may_pull(skb, srhoff + len))
+		return NULL;
+
+	/* note that pskb_may_pull may change pointers in header;
+	 * for this reason it is necessary to reload them when needed.
+	 */
+	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+
+	if (!seg6_validate_srh(srh, len, true))
+		return NULL;
+
+	return srh;
+}
+
 static struct genl_family seg6_genl_family;
 
 static const struct nla_policy seg6_genl_policy[SEG6_ATTR_MAX + 1] = {
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 2dc40b3f373e..ef88489c71f5 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -150,40 +150,11 @@ static struct seg6_local_lwt *seg6_local_lwtunnel(struct lwtunnel_state *lwt)
 	return (struct seg6_local_lwt *)lwt->data;
 }
 
-static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb, int flags)
-{
-	struct ipv6_sr_hdr *srh;
-	int len, srhoff = 0;
-
-	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, &flags) < 0)
-		return NULL;
-
-	if (!pskb_may_pull(skb, srhoff + sizeof(*srh)))
-		return NULL;
-
-	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
-
-	len = (srh->hdrlen + 1) << 3;
-
-	if (!pskb_may_pull(skb, srhoff + len))
-		return NULL;
-
-	/* note that pskb_may_pull may change pointers in header;
-	 * for this reason it is necessary to reload them when needed.
-	 */
-	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
-
-	if (!seg6_validate_srh(srh, len, true))
-		return NULL;
-
-	return srh;
-}
-
 static struct ipv6_sr_hdr *get_and_validate_srh(struct sk_buff *skb)
 {
 	struct ipv6_sr_hdr *srh;
 
-	srh = get_srh(skb, IP6_FH_F_SKIP_RH);
+	srh = seg6_get_srh(skb, IP6_FH_F_SKIP_RH);
 	if (!srh)
 		return NULL;
 
@@ -200,7 +171,7 @@ static bool decap_and_validate(struct sk_buff *skb, int proto)
 	struct ipv6_sr_hdr *srh;
 	unsigned int off = 0;
 
-	srh = get_srh(skb, 0);
+	srh = seg6_get_srh(skb, 0);
 	if (srh && srh->segments_left > 0)
 		return false;
 
-- 
2.33.1



* [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 16:32 [patch RFC net-next 0/3] Fix traceroute in the presence of SRv6 Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 1/3] seg6: export get_srh() for ICMP handling Andrew Lunn
@ 2021-12-01 16:32 ` Andrew Lunn
  2021-12-01 17:33   ` Willem de Bruijn
  2021-12-01 16:32 ` [patch RFC net-next 3/3] udp6: Use Segment Routing Header for dest address if present Andrew Lunn
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 16:32 UTC (permalink / raw)
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	Willem de Bruijn, James Prestwood, Justin Iurman,
	Praveen Chaudhary, Jason A . Donenfeld, Eric Dumazet, netdev,
	Andrew Lunn

RFC 8754 says:

ICMP error packets generated within the SR domain are sent to source
nodes within the SR domain.  The invoking packet in the ICMP error
message may contain an SRH.  Since the destination address of a packet
with an SRH changes as each segment is processed, it may not be the
destination used by the socket or application that generated the
invoking packet.

For the source of an invoking packet to process the ICMP error
message, the ultimate destination address of the IPv6 header may be
required.  The following logic is used to determine the destination
address for use by protocol-error handlers.

*  Walk all extension headers of the invoking IPv6 packet to the
   routing extension header preceding the upper-layer header.

   -  If routing header is type 4 Segment Routing Header (SRH)

      o  The SID at Segment List[0] may be used as the destination
         address of the invoking packet.

Clone the skb and modify the header offset to give a new skb which
contains the invoking packet. The seg6 helpers can then be used on the
skb to find any segment routing headers. If found, mark this fact in
the IPv6 control block of the skb, and store the offset into the
packet of the SRH.
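
For illustration only, the RFC 8754 logic quoted above can be sketched
in userspace C over a raw byte buffer. This is not the kernel code in
this patch: invoking_dst() is a hypothetical helper, it stops at the
first type 4 routing header it finds instead of insisting on the one
directly preceding the upper-layer header, and it does not handle
fragment or other extension headers beyond the three listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEXTHDR_HOP       0	/* hop-by-hop options header */
#define NEXTHDR_ROUTING  43	/* routing header */
#define NEXTHDR_DEST     60	/* destination options header */
#define SRH_ROUTING_TYPE  4	/* Segment Routing Header, RFC 8754 */
#define IPV6_HDR_LEN     40	/* fixed IPv6 header */
#define SRH_FIXED_LEN     8	/* SRH bytes before Segment List[0] */

/* Hypothetical helper: copy the destination a protocol-error handler
 * should use for the invoking packet into dst.
 * Returns 1 if Segment List[0] of an SRH was used, 0 if the plain IPv6
 * destination was used, -1 if the packet is truncated (dst is then not
 * meaningful).
 */
int invoking_dst(const uint8_t *pkt, size_t len, uint8_t dst[16])
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nexthdr;

	assert(pkt && dst);

	if (len < IPV6_HDR_LEN)
		return -1;

	nexthdr = pkt[6];		/* Next Header field */
	memcpy(dst, pkt + 24, 16);	/* Destination Address field */

	while (nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_DEST) {
		size_t hdrlen;

		if (len - off < 8)
			return -1;
		/* Hdr Ext Len counts 8-byte units beyond the first 8 bytes */
		hdrlen = ((size_t)pkt[off + 1] + 1) << 3;
		if (len - off < hdrlen)
			return -1;

		if (nexthdr == NEXTHDR_ROUTING &&
		    pkt[off + 2] == SRH_ROUTING_TYPE &&
		    hdrlen >= SRH_FIXED_LEN + 16) {
			/* Segment List[0] is the last segment of the path */
			memcpy(dst, pkt + off + SRH_FIXED_LEN, 16);
			return 1;
		}

		nexthdr = pkt[off];	/* Next Header of this extension */
		off += hdrlen;
	}

	return 0;
}
```

The patch itself reuses seg6_get_srh() on a clone of the skb rather
than open-coding a walk like this one.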

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 include/linux/ipv6.h |  2 ++
 net/ipv6/icmp.c      | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 20c1f968da7c..d8ab5022d397 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -133,6 +133,7 @@ struct inet6_skb_parm {
 	__u16			dsthao;
 #endif
 	__u16			frag_max_size;
+	__u16			srhoff;
 
 #define IP6SKB_XFRM_TRANSFORMED	1
 #define IP6SKB_FORWARDED	2
@@ -142,6 +143,7 @@ struct inet6_skb_parm {
 #define IP6SKB_HOPBYHOP        32
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
+#define IP6SKB_SEG6	      512
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index a7c31ab67c5d..315787b79f29 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -57,6 +57,7 @@
 #include <net/protocol.h>
 #include <net/raw.h>
 #include <net/rawv6.h>
+#include <net/seg6.h>
 #include <net/transp_v6.h>
 #include <net/ip6_route.h>
 #include <net/addrconf.h>
@@ -818,9 +819,40 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 	local_bh_enable();
 }
 
+/* Determine if the invoking packet contains a segment routing header.
+ * If it does, extract the true destination address, which is the
+ * first segment address
+ */
+static void icmpv6_notify_srh(struct sk_buff *skb, struct inet6_skb_parm *opt)
+{
+	struct sk_buff *skb_orig;
+	struct ipv6_sr_hdr *srh;
+
+	skb_orig = skb_clone(skb, GFP_ATOMIC);
+	if (!skb_orig)
+		return;
+
+	skb_dst_drop(skb_orig);
+	skb_reset_network_header(skb_orig);
+
+	srh = seg6_get_srh(skb_orig, 0);
+	if (!srh)
+		goto out;
+
+	if (srh->type != IPV6_SRCRT_TYPE_4)
+		goto out;
+
+	opt->flags |= IP6SKB_SEG6;
+	opt->srhoff = (unsigned char *)srh - skb->data;
+
+out:
+	kfree_skb(skb_orig);
+}
+
 void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
 {
 	const struct inet6_protocol *ipprot;
+	struct inet6_skb_parm *opt = IP6CB(skb);
 	int inner_offset;
 	__be16 frag_off;
 	u8 nexthdr;
@@ -829,6 +861,8 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
 	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
 		goto out;
 
+	icmpv6_notify_srh(skb, opt);
+
 	nexthdr = ((struct ipv6hdr *)skb->data)->nexthdr;
 	if (ipv6_ext_hdr(nexthdr)) {
 		/* now skip over extension headers */
@@ -853,7 +887,7 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
 
 	ipprot = rcu_dereference(inet6_protos[nexthdr]);
 	if (ipprot && ipprot->err_handler)
-		ipprot->err_handler(skb, NULL, type, code, inner_offset, info);
+		ipprot->err_handler(skb, opt, type, code, inner_offset, info);
 
 	raw6_icmp_error(skb, nexthdr, type, code, inner_offset, info);
 	return;
-- 
2.33.1



* [patch RFC net-next 3/3] udp6: Use Segment Routing Header for dest address if present
  2021-12-01 16:32 [patch RFC net-next 0/3] Fix traceroute in the presence of SRv6 Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 1/3] seg6: export get_srh() for ICMP handling Andrew Lunn
  2021-12-01 16:32 ` [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers Andrew Lunn
@ 2021-12-01 16:32 ` Andrew Lunn
  2 siblings, 0 replies; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 16:32 UTC (permalink / raw)
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	Willem de Bruijn, James Prestwood, Justin Iurman,
	Praveen Chaudhary, Jason A . Donenfeld, Eric Dumazet, netdev,
	Andrew Lunn

When finding the socket to report an error on, if the invoking packet
is using Segment Routing, the IPv6 destination address is that of an
intermediate router, not the end destination. Extract the ultimate
destination address from the first entry of the segment list in the
Segment Routing Header.

This change allows traceroute to function in the presence of Segment
Routing.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 net/ipv6/udp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6a0e569f0bb8..6a2288e7ddda 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -40,6 +40,7 @@
 #include <net/transp_v6.h>
 #include <net/ip6_route.h>
 #include <net/raw.h>
+#include <net/seg6.h>
 #include <net/tcp_states.h>
 #include <net/ip6_checksum.h>
 #include <net/ip6_tunnel.h>
@@ -563,12 +564,18 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	const struct in6_addr *saddr = &hdr->saddr;
 	const struct in6_addr *daddr = &hdr->daddr;
 	struct udphdr *uh = (struct udphdr *)(skb->data+offset);
+	struct ipv6_sr_hdr *srh;
 	bool tunnel = false;
 	struct sock *sk;
 	int harderr;
 	int err;
 	struct net *net = dev_net(skb->dev);
 
+	if (opt->flags & IP6SKB_SEG6) {
+		srh = (struct ipv6_sr_hdr *)(skb->data + opt->srhoff);
+		daddr = &srh->segments[0];
+	}
+
 	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
 			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
 
-- 
2.33.1



* Re: [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 16:32 ` [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers Andrew Lunn
@ 2021-12-01 17:33   ` Willem de Bruijn
  2021-12-01 18:10     ` Andrew Lunn
  0 siblings, 1 reply; 9+ messages in thread
From: Willem de Bruijn @ 2021-12-01 17:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	James Prestwood, Justin Iurman, Praveen Chaudhary,
	Jason A . Donenfeld, Eric Dumazet, netdev

>  include/linux/ipv6.h |  2 ++
>  net/ipv6/icmp.c      | 36 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 20c1f968da7c..d8ab5022d397 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -133,6 +133,7 @@ struct inet6_skb_parm {
>         __u16                   dsthao;
>  #endif
>         __u16                   frag_max_size;
> +       __u16                   srhoff;

Out of scope for this patch, but I guess we could use a

BUILD_BUG_ON(sizeof(struct inet6_skb_parm) > sizeof_field(struct sk_buff, cb));

>
>  #define IP6SKB_XFRM_TRANSFORMED        1
>  #define IP6SKB_FORWARDED       2
> @@ -142,6 +143,7 @@ struct inet6_skb_parm {
>  #define IP6SKB_HOPBYHOP        32
>  #define IP6SKB_L3SLAVE         64
>  #define IP6SKB_JUMBOGRAM      128
> +#define IP6SKB_SEG6          512

256?

>  };
>
>  #if defined(CONFIG_NET_L3_MASTER_DEV)
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index a7c31ab67c5d..315787b79f29 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -57,6 +57,7 @@
>  #include <net/protocol.h>
>  #include <net/raw.h>
>  #include <net/rawv6.h>
> +#include <net/seg6.h>
>  #include <net/transp_v6.h>
>  #include <net/ip6_route.h>
>  #include <net/addrconf.h>
> @@ -818,9 +819,40 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
>         local_bh_enable();
>  }
>
> +/* Determine if the invoking packet contains a segment routing header.
> + * If it does, extract the true destination address, which is the
> + * first segment address
> + */
> +static void icmpv6_notify_srh(struct sk_buff *skb, struct inet6_skb_parm *opt)
> +{
> +       struct sk_buff *skb_orig;
> +       struct ipv6_sr_hdr *srh;
> +
> +       skb_orig = skb_clone(skb, GFP_ATOMIC);
> +       if (!skb_orig)
> +               return;

Is this to be allowed to write to skb->cb? Or because seg6_get_srh
calls pskb_may_pull to parse the headers?

It is unlikely (though not impossible) in this path for the packet to
be shared or cloned. Avoid the clone when it isn't? Most packets will
not actually have segment routing, so this imposes significant cost on
the common case (albeit within the uncommon ICMP processing path).

nit: I found the name skb_orig confusing, as it does not mean the
original skb preserved as it was at function entry.

> +       skb_dst_drop(skb_orig);
> +       skb_reset_network_header(skb_orig);
> +
> +       srh = seg6_get_srh(skb_orig, 0);
> +       if (!srh)
> +               goto out;
> +
> +       if (srh->type != IPV6_SRCRT_TYPE_4)
> +               goto out;
> +
> +       opt->flags |= IP6SKB_SEG6;
> +       opt->srhoff = (unsigned char *)srh - skb->data;

Should this offset be against skb->head, in case other data move
operations could occur?

Also, what happens if the header was in a frag that was pulled by
pskb_may_pull in seg6_get_srh?

If we can expect headers to exist in the linear segment, then perhaps
the whole code can be simplified and the clone can be avoided.

> +
> +out:
> +       kfree_skb(skb_orig);
> +}
> +
>  void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
>  {
>         const struct inet6_protocol *ipprot;
> +       struct inet6_skb_parm *opt = IP6CB(skb);
>         int inner_offset;
>         __be16 frag_off;
>         u8 nexthdr;
> @@ -829,6 +861,8 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
>         if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
>                 goto out;
>
> +       icmpv6_notify_srh(skb, opt);
> +
>         nexthdr = ((struct ipv6hdr *)skb->data)->nexthdr;
>         if (ipv6_ext_hdr(nexthdr)) {
>                 /* now skip over extension headers */
> @@ -853,7 +887,7 @@ void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
>
>         ipprot = rcu_dereference(inet6_protos[nexthdr]);
>         if (ipprot && ipprot->err_handler)
> -               ipprot->err_handler(skb, NULL, type, code, inner_offset, info);
> +               ipprot->err_handler(skb, opt, type, code, inner_offset, info);
>
>         raw6_icmp_error(skb, nexthdr, type, code, inner_offset, info);
>         return;
> --
> 2.33.1
>


* Re: [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 17:33   ` Willem de Bruijn
@ 2021-12-01 18:10     ` Andrew Lunn
  2021-12-01 18:22       ` Willem de Bruijn
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 18:10 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	James Prestwood, Justin Iurman, Praveen Chaudhary,
	Jason A . Donenfeld, Eric Dumazet, netdev

On Wed, Dec 01, 2021 at 09:33:32AM -0800, Willem de Bruijn wrote:
> >  include/linux/ipv6.h |  2 ++
> >  net/ipv6/icmp.c      | 36 +++++++++++++++++++++++++++++++++++-
> >  2 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> > index 20c1f968da7c..d8ab5022d397 100644
> > --- a/include/linux/ipv6.h
> > +++ b/include/linux/ipv6.h
> > @@ -133,6 +133,7 @@ struct inet6_skb_parm {
> >         __u16                   dsthao;
> >  #endif
> >         __u16                   frag_max_size;
> > +       __u16                   srhoff;
> 
> Out of scope for this patch, but I guess we could use a
> 
> BUILD_BUG_ON(sizeof(struct inet6_skb_parm) > sizeof_field(struct sk_buff, cb));
 
There is something like that already. I triggered a BUILD_BUG_ON
failure when i put the actual IPv6 destination address here, rather
than an offset to it.

> >  #define IP6SKB_XFRM_TRANSFORMED        1
> >  #define IP6SKB_FORWARDED       2
> > @@ -142,6 +143,7 @@ struct inet6_skb_parm {
> >  #define IP6SKB_HOPBYHOP        32
> >  #define IP6SKB_L3SLAVE         64
> >  #define IP6SKB_JUMBOGRAM      128
> > +#define IP6SKB_SEG6          512
> 
> 256?

Doh!
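
A minimal userspace sketch of the numbering point, for illustration:
the flags in the quoted hunk are single bits, so the next free value
after 128 is 256. next_flag_bit() is a hypothetical helper, not a
kernel function, a 16-bit flags field is assumed, and only the three
flag values visible in the hunk are reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Values visible in the quoted hunk; the other flags are elided. */
#define IP6SKB_HOPBYHOP    32
#define IP6SKB_L3SLAVE     64
#define IP6SKB_JUMBOGRAM  128

/* Hypothetical helper: given the highest single-bit flag in use,
 * return the next single-bit value above it.
 */
uint16_t next_flag_bit(uint16_t highest_in_use)
{
	/* must be exactly one bit, and not already the top bit */
	assert(highest_in_use != 0);
	assert((highest_in_use & (highest_in_use - 1)) == 0);
	assert(highest_in_use < 0x8000);

	return (uint16_t)(highest_in_use << 1);
}
```

Under the 16-bit assumption 512 would still fit, it would just leave
bit 8 unused.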

> > +static void icmpv6_notify_srh(struct sk_buff *skb, struct inet6_skb_parm *opt)
> > +{
> > +       struct sk_buff *skb_orig;
> > +       struct ipv6_sr_hdr *srh;
> > +
> > +       skb_orig = skb_clone(skb, GFP_ATOMIC);
> > +       if (!skb_orig)
> > +               return;
> 
> Is this to be allowed to write to skb->cb? Or because seg6_get_srh
> calls pskb_may_pull to parse the headers?

This is an ICMP error message. So we have an IP packet, skb, which
contains in the message body the IP packet which invoked the error. If
we pass skb to seg6_get_srh() it will look in the received ICMP
packet. But we actually want to find the SRH in the packet which
invoked the error, the one which is in the message body. So the code
makes a clone of the skb, and then updates the pointers so that it
points to the invoking packet within the ICMP packet. Then we can use
seg6_get_srh() on this inner packet, since it just looks like an
ordinary IP packet.

> It is unlikely (though not impossible) in this path for the packet to
> be shared or cloned. Avoid the clone when it isn't? Most packets will
> not actually have segment routing, so this imposes significant cost on
> the common case (albeit within the uncommon ICMP processing path).
> 
> nit: I found the name skb_orig confusing, as it does not mean the
> original skb preserved as it was at function entry.

skb_invoking? That seems to be the ICMP terminology?

> > +       skb_dst_drop(skb_orig);
> > +       skb_reset_network_header(skb_orig);
> > +
> > +       srh = seg6_get_srh(skb_orig, 0);
> > +       if (!srh)
> > +               goto out;
> > +
> > +       if (srh->type != IPV6_SRCRT_TYPE_4)
> > +               goto out;
> > +
> > +       opt->flags |= IP6SKB_SEG6;
> > +       opt->srhoff = (unsigned char *)srh - skb->data;
> 
> Should this offset be against skb->head, in case other data move
> operations could occur?

I copied the idea from get_srh(). It does:

srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);

So i'm just undoing it.
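
To make the pointer/offset round trip concrete, here is a userspace
sketch under stated assumptions. struct buf is a hypothetical stand-in
for an skb with a single heap buffer; hdr_to_off(), off_to_hdr() and
buf_grow() are illustrative names, not kernel APIs. It shows that an
offset from the data pointer stays usable after the storage moves,
whereas a saved raw pointer would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: one heap buffer, with data
 * pointing at the current start of the packet.
 */
struct buf {
	unsigned char *data;
	size_t len;
};

/* Record where hdr lives as an offset from b->data. */
size_t hdr_to_off(const struct buf *b, const unsigned char *hdr)
{
	assert(hdr >= b->data && hdr < b->data + b->len);
	return (size_t)(hdr - b->data);
}

/* Undo hdr_to_off(): recover the header pointer from the offset. */
unsigned char *off_to_hdr(const struct buf *b, size_t off)
{
	assert(off < b->len);
	return b->data + off;
}

/* Grow the buffer. The storage may move, which invalidates any raw
 * pointer taken earlier but leaves offsets from b->data valid.
 */
int buf_grow(struct buf *b, size_t newlen)
{
	unsigned char *p = realloc(b->data, newlen);

	if (!p)
		return -1;
	b->data = p;
	b->len = newlen;
	return 0;
}
```

The offset is only meaningful while the data pointer still refers to
the same position in the packet. If the start of the data were
advanced between storing and using the offset, the result would be
off by that amount, which is the concern behind the skb->head
question.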

> Also, what happens if the header was in a frags that was pulled by
> pskb_may_pull in seg6_get_srh.

Yes, i checked that. Because the skb has been cloned, if it needs to
rearrange the packet because it goes over a fragment boundary,
pskb_may_pull() will return false. And then we won't find the
SRH. Nothing bad happens, traceroute is still broken as before.  What
is a typical fragment size? We basically need a MAC header, IPv6
header, ICMP Header and another IP header. 14 + 40 + 8 + 40. Plus the
SRH headers. So if 128 byte fragments are being used, then yes, it
could be an issue. But is that realistic? It seems more likely 1K, 2K
or 4K fragments are used?
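
The arithmetic above can be written out as a small sketch. It assumes
the SRH directly follows the invoking packet's IPv6 header and carries
no TLVs, so its length is 8 bytes plus 16 bytes per segment as laid
out in RFC 8754. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
	ETH_HDR_LEN    = 14,	/* MAC header */
	IPV6_HDR_LEN   = 40,	/* outer and invoking IPv6 headers */
	ICMPV6_HDR_LEN = 8,	/* ICMPv6 error header */
	SRH_FIXED_LEN  = 8,	/* SRH fields before the segment list */
	SRH_SEG_LEN    = 16,	/* one IPv6 address per segment */
	SRH_MAX_SEGS   = 127,	/* limit from the one-byte Hdr Ext Len */
};

/* Length of an SRH with nsegs segments and no TLVs. */
size_t srh_len(unsigned int nsegs)
{
	assert(nsegs <= SRH_MAX_SEGS);
	return SRH_FIXED_LEN + (size_t)SRH_SEG_LEN * nsegs;
}

/* Bytes from the start of the frame to the end of the invoking
 * packet's SRH: MAC + IPv6 + ICMPv6 + invoking IPv6 + SRH.
 */
size_t bytes_through_srh(unsigned int nsegs)
{
	return ETH_HDR_LEN + IPV6_HDR_LEN + ICMPV6_HDR_LEN +
	       IPV6_HDR_LEN + srh_len(nsegs);
}
```

Under these assumptions the headers alone take 102 bytes, one segment
brings the total to 126 bytes and two segments to 142 bytes, so 128
bytes is about where a second segment would no longer fit.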

> If we can expect headers to exist in the linear segment, then perhaps
> the whole code can be simplified and the clone can be avoided.

It will require seg6_get_srh() to be re-written so that you can tell
it to look at a nested IP header. Which actually means ipv6_find_hdr()
needs re-writing. Things like the helper ipv6_hdr(skb) point to the
ICMP packet IP header, not the invoking IP packet header inside the
ICMP packet. I didn't like the idea of such a rewrite.

	Andrew


* Re: [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 18:10     ` Andrew Lunn
@ 2021-12-01 18:22       ` Willem de Bruijn
  2021-12-01 19:03         ` Andrew Lunn
  0 siblings, 1 reply; 9+ messages in thread
From: Willem de Bruijn @ 2021-12-01 18:22 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	James Prestwood, Justin Iurman, Praveen Chaudhary,
	Jason A . Donenfeld, Eric Dumazet, netdev

> > > +static void icmpv6_notify_srh(struct sk_buff *skb, struct inet6_skb_parm *opt)
> > > +{
> > > +       struct sk_buff *skb_orig;
> > > +       struct ipv6_sr_hdr *srh;
> > > +
> > > +       skb_orig = skb_clone(skb, GFP_ATOMIC);
> > > +       if (!skb_orig)
> > > +               return;
> >
> > Is this to be allowed to write to skb->cb? Or because seg6_get_srh
> > calls pskb_may_pull to parse the headers?
>
> This is an ICMP error message. So we have an IP packet, skb, which
> contains in the message body the IP packet which invoked the error. If
> we pass skb to seg6_get_srh() it will look in the received ICMP
> packet. But we actually want to find the SRH in the packet which
> invoked the error, the one which is in the message body. So the code
> makes a clone of the skb, and then updates the pointers so that it
> points to the invoking packet within the ICMP packet. Then we can use
> seg6_get_srh() on this inner packet, since it just looks like an
> ordinary IP packet.

Ah of course. I clearly did not appreciate the importance of that
skb_reset_network_header.

> > It is unlikely (though not impossible) in this path for the packet to
> > be shared or cloned. Avoid the clone when it isn't? Most packets will
> > not actually have segment routing, so this imposes significant cost on
> > the common case (albeit within the uncommon ICMP processing path).
> >
> > nit: I found the name skb_orig confusing, as it does not mean the
> > original skb preserved as it was at function entry.
>
> skb_invoking? That seems to be the ICMP terminology?

Sounds good, thanks.

> > > +       skb_dst_drop(skb_orig);
> > > +       skb_reset_network_header(skb_orig);
> > > +
> > > +       srh = seg6_get_srh(skb_orig, 0);
> > > +       if (!srh)
> > > +               goto out;
> > > +
> > > +       if (srh->type != IPV6_SRCRT_TYPE_4)
> > > +               goto out;
> > > +
> > > +       opt->flags |= IP6SKB_SEG6;
> > > +       opt->srhoff = (unsigned char *)srh - skb->data;
> >
> > Should this offset be against skb->head, in case other data move
> > operations could occur?
>
> I copied the idea from get_srh(). It does:
>
> srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>
> So i'm just undoing it.
>
> > Also, what happens if the header was in a frags that was pulled by
> > pskb_may_pull in seg6_get_srh.
>
> Yes, i checked that. Because the skb has been cloned, if it needs to
> rearrange the packet because it goes over a fragment boundary,
> pskb_may_pull() will return false. And then we won't find the
> SRH.

Great. So the feature only works if the SRH is in the linear header.

Then if the packet is not shared, you can just temporarily reset the
network header and revert it after?

> Nothing bad happens, traceroute is still broken as before.  What
> is a typical fragment size?

The question here is not the size in frags[], but that of the linear
section. This is really device driver and mtu specific. For many
devices and 1500 B mtu, the entire packet in linear seems quite
likely.


* Re: [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 18:22       ` Willem de Bruijn
@ 2021-12-01 19:03         ` Andrew Lunn
  2021-12-01 19:19           ` Willem de Bruijn
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Lunn @ 2021-12-01 19:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Jakub Kicinski, Hideaki YOSHIFUJI, David Ahern,
	James Prestwood, Justin Iurman, Praveen Chaudhary,
	Jason A . Donenfeld, Eric Dumazet, netdev

On Wed, Dec 01, 2021 at 10:22:38AM -0800, Willem de Bruijn wrote:
> > > > +static void icmpv6_notify_srh(struct sk_buff *skb, struct inet6_skb_parm *opt)
> > > > +{
> > > > +       struct sk_buff *skb_orig;
> > > > +       struct ipv6_sr_hdr *srh;
> > > > +
> > > > +       skb_orig = skb_clone(skb, GFP_ATOMIC);
> > > > +       if (!skb_orig)
> > > > +               return;
> > >
> > > Is this to be allowed to write to skb->cb? Or because seg6_get_srh
> > > calls pskb_may_pull to parse the headers?
> >
> > This is an ICMP error message. So we have an IP packet, skb, which
> > contains in the message body the IP packet which invoked the error. If
> > we pass skb to seg6_get_srh() it will look in the received ICMP
> > packet. But we actually want to find the SRH in the packet which
> > invoked the error, the one which is in the message body. So the code
> > makes a clone of the skb, and then updates the pointers so that it
> > points to the invoking packet within the ICMP packet. Then we can use
> > seg6_get_srh() on this inner packet, since it just looks like an
> > ordinary IP packet.
> 
> Ah of course. I clearly did not appreciate the importance of that
> skb_reset_network_header.

So i should probably add a comment here. If we stick with this design.

> > Yes, i checked that. Because the skb has been cloned, if it needs to
> > rearrange the packet because it goes over a fragment boundary,
> > pskb_may_pull() will return false. And then we won't find the
> > SRH.
> 
> Great. So the feature only works if the SRH is in the linear header.

Yes, traceroute will remain broken if the invoking SRH header is not
in the linear header.

> Then if the packet is not shared, you can just temporarily reset the
> network header and revert it after?

Maybe. I was worried about any side effects of such an
operation. Working on a clone seemed a lot less risky.

Is it safe to do such games with the network header?

	Andrew


* Re: [patch RFC net-next 2/3] icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  2021-12-01 19:03         ` Andrew Lunn
@ 2021-12-01 19:19           ` Willem de Bruijn
  0 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2021-12-01 19:19 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Willem de Bruijn, David Miller, Jakub Kicinski,
	Hideaki YOSHIFUJI, David Ahern, James Prestwood, Justin Iurman,
	Praveen Chaudhary, Jason A . Donenfeld, Eric Dumazet, netdev

> > Then if the packet is not shared, you can just temporarily reset the
> > network header and revert it after?
>
> Maybe. I was worried about any side effects of such an
> operation. Working on a clone seemed a lot less risky.
>
> Is it safe to do such games with the network header?

As long as nothing else is accessing the skb, so only if it is not shared.

Packet sockets do similar temporary modifications, for one example.
See drop_n_restore in packet_rcv.
