netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
@ 2022-04-20 23:21 Guillaume Nault
  2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp

RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
It's stored in a normally unused bit of the ->flowi4_tos field, in
struct flowi4. However it has several problems:

 * This bit is also used by ECN. Although ECN bits are supposed to be
   cleared before doing a route lookup, it happened that some code
   paths didn't properly sanitise their ->flowi4_tos. So this mechanism
   is fragile and we had bugs in the past where ECN bits slipped in and
   could end up being erroneously interpreted as RTO_ONLINK.

 * A dscp_t type was recently introduced to ensure ECN bits are cleared
   during route lookups. ->flowi4_tos is the most important structure
   field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
   mandates that ECN bits (where RTO_ONLINK is stored) be zero.

Therefore we need to stop using RTO_ONLINK altogether. Fortunately
RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
to tell the route lookup function to restrict the scope, we can simply
initialise the scope correctly.

Patch 1 does some preparatory work: it stops resetting ->flowi4_scope
automatically before a route lookup, thus allowing callers to set their
desired scope without having to rely on the RTO_ONLINK flag.

Patch 2-3 convert a few code paths to avoid relying on RTO_ONLINK.

More conversions will have to take place before we can eventually
remove this flag.

Guillaume Nault (3):
  ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  ipv4: Avoid using RTO_ONLINK with ip_route_connect().
  ipv4: Initialise ->flowi4_scope properly in ICMP handlers.

 include/net/route.h | 36 ++++++++++++++++++++++++------------
 net/dccp/ipv4.c     |  5 ++---
 net/ipv4/af_inet.c  |  6 +++---
 net/ipv4/datagram.c |  7 +++----
 net/ipv4/route.c    | 41 +++++++++++++++++++----------------------
 net/ipv4/tcp_ipv4.c |  5 ++---
 6 files changed, 53 insertions(+), 47 deletions(-)

-- 
2.21.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
  2022-04-22  2:30   ` David Ahern
  2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp

All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
either by manual field assignment, memset(0) of the whole structure or
implicit structure initialisation of on-stack variables
(RT_SCOPE_UNIVERSE actually equals 0).

Therefore, we don't need to always initialise ->flowi4_scope in
ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
the special RTO_ONLINK flag is present in the tos.

This will allow some code simplification, like removing
ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
entirely by properly initialising ->flowi4_scope, instead of
overloading ->flowi4_tos with a special flag. Eventually, this will
allow to convert ->flowi4_tos to dscp_t.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
It's important for the correctness of this patch that all callers
initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
them is long, although each case is pretty trivial.

If it helps, I can send a patch series that converts implicit
initialisation of ->flowi4_scope with an explicit assignment to
RT_SCOPE_UNIVERSE. This would also have the advantage of making it
clear to future readers that ->flowi4_scope _has_ to be initialised. I
haven't sent such patch series to not overwhelm reviewers with trivial
and not technically-required changes (there are 40+ places to modify,
scattered over 30+ different files). But if anyone prefers explicit
initialisation everywhere, then just let me know and I'll send such
patches.
---
 net/ipv4/route.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e839d424b861..d8f82c0ac132 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
 	__u8 tos = RT_FL_TOS(fl4);
 
 	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
-	fl4->flowi4_scope = tos & RTO_ONLINK ?
-			    RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
+	if (tos & RTO_ONLINK)
+		fl4->flowi4_scope = RT_SCOPE_LINK;
 }
 
 static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
  2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
  2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
  2022-04-22  2:32   ` David Ahern
  2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp

Now that ip_rt_fix_tos() doesn't reset ->flowi4_scope unconditionally,
we don't have to rely on the RTO_ONLINK bit to properly set the scope
of a flowi4 structure. We can just set ->flowi4_scope explicitly and
avoid using RTO_ONLINK in ->flowi4_tos.

This patch converts callers of ip_route_connect(). Instead of setting
the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:

  1- Drop the tos parameter from ip_route_connect(): its value was
     entirely based on sk, which is also passed as parameter.

  2- Set ->flowi4_scope depending on the SOCK_LOCALROUTE socket option
     instead of always initialising it with RT_SCOPE_UNIVERSE (let's
     define ip_sock_rt_scope() for this purpose).

  3- Avoid overloading ->flowi4_tos with RTO_ONLINK: since the scope is
     now properly initialised, we don't need to tell ip_rt_fix_tos() to
     adjust ->flowi4_scope for us. So let's define ip_sock_rt_tos(),
     which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
     bit overload.

Note:
  In the original ip_route_connect() code, __ip_route_output_key()
  might clear the RTO_ONLINK bit of fl4->flowi4_tos (because of
  ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
  original tos variable. Now that we don't set RTO_ONLINK any more,
  this is not a problem and we can use fl4->flowi4_tos in
  flowi4_update_output().

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 include/net/route.h | 36 ++++++++++++++++++++++++------------
 net/dccp/ipv4.c     |  5 ++---
 net/ipv4/af_inet.c  |  6 +++---
 net/ipv4/datagram.c |  7 +++----
 net/ipv4/tcp_ipv4.c |  5 ++---
 5 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 25404fc2b483..991a3985712d 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -43,6 +43,19 @@
 #define RT_CONN_FLAGS(sk)   (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
 #define RT_CONN_FLAGS_TOS(sk,tos)   (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
 
+static inline __u8 ip_sock_rt_scope(const struct sock *sk)
+{
+	if (sock_flag(sk, SOCK_LOCALROUTE))
+		return RT_SCOPE_LINK;
+
+	return RT_SCOPE_UNIVERSE;
+}
+
+static inline __u8 ip_sock_rt_tos(const struct sock *sk)
+{
+	return RT_TOS(inet_sk(sk)->tos);
+}
+
 struct ip_tunnel_info;
 struct fib_nh;
 struct fib_info;
@@ -289,39 +302,38 @@ static inline char rt_tos2priority(u8 tos)
  * ip_route_newports() calls.
  */
 
-static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 src,
-					 u32 tos, int oif, u8 protocol,
+static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst,
+					 __be32 src, int oif, u8 protocol,
 					 __be16 sport, __be16 dport,
-					 struct sock *sk)
+					 const struct sock *sk)
 {
 	__u8 flow_flags = 0;
 
 	if (inet_sk(sk)->transparent)
 		flow_flags |= FLOWI_FLAG_ANYSRC;
 
-	flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
-			   protocol, flow_flags, dst, src, dport, sport,
-			   sk->sk_uid);
+	flowi4_init_output(fl4, oif, sk->sk_mark, ip_sock_rt_tos(sk),
+			   ip_sock_rt_scope(sk), protocol, flow_flags, dst,
+			   src, dport, sport, sk->sk_uid);
 }
 
-static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
-					      __be32 dst, __be32 src, u32 tos,
-					      int oif, u8 protocol,
+static inline struct rtable *ip_route_connect(struct flowi4 *fl4, __be32 dst,
+					      __be32 src, int oif, u8 protocol,
 					      __be16 sport, __be16 dport,
 					      struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 	struct rtable *rt;
 
-	ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
-			      sport, dport, sk);
+	ip_route_connect_init(fl4, dst, src, oif, protocol, sport, dport, sk);
 
 	if (!dst || !src) {
 		rt = __ip_route_output_key(net, fl4);
 		if (IS_ERR(rt))
 			return rt;
 		ip_rt_put(rt);
-		flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
+		flowi4_update_output(fl4, oif, fl4->flowi4_tos, fl4->daddr,
+				     fl4->saddr);
 	}
 	security_sk_classify_flow(sk, flowi4_to_flowi_common(fl4));
 	return ip_route_output_flow(net, fl4, sk);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ae662567a6cb..82696ab86f74 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -76,9 +76,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	orig_dport = usin->sin_port;
 	fl4 = &inet->cork.fl.u.ip4;
 	rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
-			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
-			      IPPROTO_DCCP,
-			      orig_sport, orig_dport, sk);
+			      sk->sk_bound_dev_if, IPPROTO_DCCP, orig_sport,
+			      orig_dport, sk);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 195ecfa2f000..93da9f783bec 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1233,9 +1233,9 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 
 	/* Query new route. */
 	fl4 = &inet->cork.fl.u.ip4;
-	rt = ip_route_connect(fl4, daddr, 0, RT_CONN_FLAGS(sk),
-			      sk->sk_bound_dev_if, sk->sk_protocol,
-			      inet->inet_sport, inet->inet_dport, sk);
+	rt = ip_route_connect(fl4, daddr, 0, sk->sk_bound_dev_if,
+			      sk->sk_protocol, inet->inet_sport,
+			      inet->inet_dport, sk);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 48f337ccf949..ffd57523331f 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -44,10 +44,9 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 			saddr = inet->mc_addr;
 	}
 	fl4 = &inet->cork.fl.u.ip4;
-	rt = ip_route_connect(fl4, usin->sin_addr.s_addr, saddr,
-			      RT_CONN_FLAGS(sk), oif,
-			      sk->sk_protocol,
-			      inet->inet_sport, usin->sin_port, sk);
+	rt = ip_route_connect(fl4, usin->sin_addr.s_addr, saddr, oif,
+			      sk->sk_protocol, inet->inet_sport,
+			      usin->sin_port, sk);
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 157265aecbed..2c2d42142555 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -229,9 +229,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	orig_dport = usin->sin_port;
 	fl4 = &inet->cork.fl.u.ip4;
 	rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
-			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
-			      IPPROTO_TCP,
-			      orig_sport, orig_dport, sk);
+			      sk->sk_bound_dev_if, IPPROTO_TCP, orig_sport,
+			      orig_dport, sk);
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
  2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
  2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
  2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
  2022-04-22  3:08   ` David Ahern
  2022-04-22  3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
  2022-04-22 12:50 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, David Ahern

All the *_redirect() and *_update_pmtu() functions initialise their
struct flowi4 variable with either __build_flow_key() or
build_sk_flow_key(). When sk is provided, these functions use
RT_CONN_FLAGS() to set ->flowi4_tos and always use RT_SCOPE_UNIVERSE
for ->flowi4_scope. Then they rely on ip_rt_fix_tos() to adjust the
scope based on the RTO_ONLINK bit and to mask the tos with
IPTOS_RT_MASK.

This patch modifies __build_flow_key() and build_sk_flow_key() to
properly initialise ->flowi4_tos and ->flowi4_scope, so that the
ICMP redirects and PMTU handlers don't need an extra call to
ip_rt_fix_tos() before doing a fib lookup. That is, we:

  * Drop RT_CONN_FLAGS(): use ip_sock_rt_tos() and ip_sock_rt_scope()
    instead, so that we don't have to rely on ip_rt_fix_tos() to adjust
    the scope anymore.

  * Apply IPTOS_RT_MASK to the tos, so that we don't need
    ip_rt_fix_tos() to do it for us.

  * Drop the ip_rt_fix_tos() calls that now become useless.

The only remaining ip_rt_fix_tos() caller is ip_route_output_key_hash()
which needs it as long as external callers still use the RTO_ONLINK
flag.

Note:
  This patch also drops some useless RT_TOS() calls as IPTOS_RT_MASK is
  a stronger mask.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 net/ipv4/route.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d8f82c0ac132..ffbe2e4f8c89 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -508,23 +508,24 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
 }
 
 static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
-			     const struct sock *sk,
-			     const struct iphdr *iph,
-			     int oif, u8 tos,
-			     u8 prot, u32 mark, int flow_flags)
+			     const struct sock *sk, const struct iphdr *iph,
+			     int oif, __u8 tos, u8 prot, u32 mark,
+			     int flow_flags)
 {
+	__u8 scope = RT_SCOPE_UNIVERSE;
+
 	if (sk) {
 		const struct inet_sock *inet = inet_sk(sk);
 
 		oif = sk->sk_bound_dev_if;
 		mark = sk->sk_mark;
-		tos = RT_CONN_FLAGS(sk);
+		tos = ip_sock_rt_tos(sk);
+		scope = ip_sock_rt_scope(sk);
 		prot = inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol;
 	}
-	flowi4_init_output(fl4, oif, mark, tos,
-			   RT_SCOPE_UNIVERSE, prot,
-			   flow_flags,
-			   iph->daddr, iph->saddr, 0, 0,
+
+	flowi4_init_output(fl4, oif, mark, tos & IPTOS_RT_MASK, scope,
+			   prot, flow_flags, iph->daddr, iph->saddr, 0, 0,
 			   sock_net_uid(net, sk));
 }
 
@@ -534,9 +535,9 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
 	const struct net *net = dev_net(skb->dev);
 	const struct iphdr *iph = ip_hdr(skb);
 	int oif = skb->dev->ifindex;
-	u8 tos = RT_TOS(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
+	__u8 tos = iph->tos;
 
 	__build_flow_key(net, fl4, sk, iph, oif, tos, prot, mark, 0);
 }
@@ -552,7 +553,8 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk)
 	if (inet_opt && inet_opt->opt.srr)
 		daddr = inet_opt->opt.faddr;
 	flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark,
-			   RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE,
+			   ip_sock_rt_tos(sk) & IPTOS_RT_MASK,
+			   ip_sock_rt_scope(sk),
 			   inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
 			   inet_sk_flowi_flags(sk),
 			   daddr, inet->inet_saddr, 0, 0, sk->sk_uid);
@@ -825,14 +827,13 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf
 	const struct iphdr *iph = (const struct iphdr *) skb->data;
 	struct net *net = dev_net(skb->dev);
 	int oif = skb->dev->ifindex;
-	u8 tos = RT_TOS(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;
+	__u8 tos = iph->tos;
 
 	rt = (struct rtable *) dst;
 
 	__build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0);
-	ip_rt_fix_tos(&fl4);
 	__ip_do_redirect(rt, skb, &fl4, true);
 }
 
@@ -1061,7 +1062,6 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 	struct flowi4 fl4;
 
 	ip_rt_build_flow_key(&fl4, sk, skb);
-	ip_rt_fix_tos(&fl4);
 
 	/* Don't make lookup fail for bridged encapsulations */
 	if (skb && netif_is_any_bridge_port(skb->dev))
@@ -1078,8 +1078,8 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
 	struct rtable *rt;
 	u32 mark = IP4_REPLY_MARK(net, skb->mark);
 
-	__build_flow_key(net, &fl4, NULL, iph, oif,
-			 RT_TOS(iph->tos), protocol, mark, 0);
+	__build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, mark,
+			 0);
 	rt = __ip_route_output_key(net, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_rt_update_pmtu(rt, &fl4, mtu);
@@ -1136,8 +1136,6 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 			goto out;
 
 		new = true;
-	} else {
-		ip_rt_fix_tos(&fl4);
 	}
 
 	__ip_rt_update_pmtu((struct rtable *)xfrm_dst_path(&rt->dst), &fl4, mtu);
@@ -1169,8 +1167,7 @@ void ipv4_redirect(struct sk_buff *skb, struct net *net,
 	struct flowi4 fl4;
 	struct rtable *rt;
 
-	__build_flow_key(net, &fl4, NULL, iph, oif,
-			 RT_TOS(iph->tos), protocol, 0, 0);
+	__build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, 0, 0);
 	rt = __ip_route_output_key(net, &fl4);
 	if (!IS_ERR(rt)) {
 		__ip_do_redirect(rt, skb, &fl4, false);
-- 
2.21.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
@ 2022-04-22  2:30   ` David Ahern
  2022-04-22 10:53     ` Guillaume Nault
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22  2:30 UTC (permalink / raw)
  To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, dccp

On 4/20/22 5:21 PM, Guillaume Nault wrote:
> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> either by manual field assignment, memset(0) of the whole structure or
> implicit structure initialisation of on-stack variables
> (RT_SCOPE_UNIVERSE actually equals 0).
> 
> Therefore, we don't need to always initialise ->flowi4_scope in
> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> the special RTO_ONLINK flag is present in the tos.
> 
> This will allow some code simplification, like removing
> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> entirely by properly initialising ->flowi4_scope, instead of
> overloading ->flowi4_tos with a special flag. Eventually, this will
> allow to convert ->flowi4_tos to dscp_t.
> 
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> It's important for the correctness of this patch that all callers
> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> them is long, although each case is pretty trivial.
> 
> If it helps, I can send a patch series that converts implicit
> initialisation of ->flowi4_scope with an explicit assignment to
> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> clear to future readers that ->flowi4_scope _has_ to be initialised. I
> haven't sent such patch series to not overwhelm reviewers with trivial
> and not technically-required changes (there are 40+ places to modify,
> scattered over 30+ different files). But if anyone prefers explicit
> initialisation everywhere, then just let me know and I'll send such
> patches.

There are a handful of places that open code the initialization of the
flow struct. I *think* I found all of them in 40867d74c374.

> ---
>  net/ipv4/route.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index e839d424b861..d8f82c0ac132 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
>  	__u8 tos = RT_FL_TOS(fl4);
>  
>  	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> -	fl4->flowi4_scope = tos & RTO_ONLINK ?
> -			    RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
> +	if (tos & RTO_ONLINK)
> +		fl4->flowi4_scope = RT_SCOPE_LINK;
>  }
>  
>  static void __build_flow_key(const struct net *net, struct flowi4 *fl4,

Reviewed-by: David Ahern <dsahern@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
  2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
@ 2022-04-22  2:32   ` David Ahern
  0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2022-04-22  2:32 UTC (permalink / raw)
  To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, dccp

On 4/20/22 5:21 PM, Guillaume Nault wrote:
> Now that ip_rt_fix_tos() doesn't reset ->flowi4_scope unconditionally,
> we don't have to rely on the RTO_ONLINK bit to properly set the scope
> of a flowi4 structure. We can just set ->flowi4_scope explicitly and
> avoid using RTO_ONLINK in ->flowi4_tos.
> 
> This patch converts callers of ip_route_connect(). Instead of setting
> the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:
> 
>   1- Drop the tos parameter from ip_route_connect(): its value was
>      entirely based on sk, which is also passed as parameter.
> 
>   2- Set ->flowi4_scope depending on the SOCK_LOCALROUTE socket option
>      instead of always initialising it with RT_SCOPE_UNIVERSE (let's
>      define ip_sock_rt_scope() for this purpose).
> 
>   3- Avoid overloading ->flowi4_tos with RTO_ONLINK: since the scope is
>      now properly initialised, we don't need to tell ip_rt_fix_tos() to
>      adjust ->flowi4_scope for us. So let's define ip_sock_rt_tos(),
>      which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
>      bit overload.
> 
> Note:
>   In the original ip_route_connect() code, __ip_route_output_key()
>   might clear the RTO_ONLINK bit of fl4->flowi4_tos (because of
>   ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
>   original tos variable. Now that we don't set RTO_ONLINK any more,
>   this is not a problem and we can use fl4->flowi4_tos in
>   flowi4_update_output().
> 
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>  include/net/route.h | 36 ++++++++++++++++++++++++------------
>  net/dccp/ipv4.c     |  5 ++---
>  net/ipv4/af_inet.c  |  6 +++---
>  net/ipv4/datagram.c |  7 +++----
>  net/ipv4/tcp_ipv4.c |  5 ++---
>  5 files changed, 34 insertions(+), 25 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
  2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
@ 2022-04-22  3:08   ` David Ahern
  0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2022-04-22  3:08 UTC (permalink / raw)
  To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI

On 4/20/22 5:21 PM, Guillaume Nault wrote:
> All the *_redirect() and *_update_pmtu() functions initialise their
> struct flowi4 variable with either __build_flow_key() or
> build_sk_flow_key(). When sk is provided, these functions use
> RT_CONN_FLAGS() to set ->flowi4_tos and always use RT_SCOPE_UNIVERSE
> for ->flowi4_scope. Then they rely on ip_rt_fix_tos() to adjust the
> scope based on the RTO_ONLINK bit and to mask the tos with
> IPTOS_RT_MASK.
> 
> This patch modifies __build_flow_key() and build_sk_flow_key() to
> properly initialise ->flowi4_tos and ->flowi4_scope, so that the
> ICMP redirects and PMTU handlers don't need an extra call to
> ip_rt_fix_tos() before doing a fib lookup. That is, we:
> 
>   * Drop RT_CONN_FLAGS(): use ip_sock_rt_tos() and ip_sock_rt_scope()
>     instead, so that we don't have to rely on ip_rt_fix_tos() to adjust
>     the scope anymore.
> 
>   * Apply IPTOS_RT_MASK to the tos, so that we don't need
>     ip_rt_fix_tos() to do it for us.
> 
>   * Drop the ip_rt_fix_tos() calls that now become useless.
> 
> The only remaining ip_rt_fix_tos() caller is ip_route_output_key_hash()
> which needs it as long as external callers still use the RTO_ONLINK
> flag.
> 
> Note:
>   This patch also drops some useless RT_TOS() calls as IPTOS_RT_MASK is
>   a stronger mask.
> 
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
>  net/ipv4/route.c | 37 +++++++++++++++++--------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
  2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
                   ` (2 preceding siblings ...)
  2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
@ 2022-04-22  3:10 ` David Ahern
  2022-04-22 11:02   ` Guillaume Nault
  2022-04-22 12:50 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22  3:10 UTC (permalink / raw)
  To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Hideaki YOSHIFUJI, dccp

On 4/20/22 5:21 PM, Guillaume Nault wrote:
> RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> It's stored in a normally unused bit of the ->flowi4_tos field, in
> struct flowi4. However it has several problems:
> 
>  * This bit is also used by ECN. Although ECN bits are supposed to be
>    cleared before doing a route lookup, it happened that some code
>    paths didn't properly sanitise their ->flowi4_tos. So this mechanism
>    is fragile and we had bugs in the past where ECN bits slipped in and
>    could end up being erroneously interpreted as RTO_ONLINK.
> 
>  * A dscp_t type was recently introduced to ensure ECN bits are cleared
>    during route lookups. ->flowi4_tos is the most important structure
>    field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
>    mandates that ECN bits (where RTO_ONLINK is stored) be zero.
> 
> Therefore we need to stop using RTO_ONLINK altogether. Fortunately
> RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
> to tell the route lookup function to restrict the scope, we can simply
> initialise the scope correctly.
> 

I believe the set looks ok. I think the fib test coverage in selftests
could use more tests to cover tos.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  2022-04-22  2:30   ` David Ahern
@ 2022-04-22 10:53     ` Guillaume Nault
  2022-04-22 14:40       ` David Ahern
  0 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-22 10:53 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Hideaki YOSHIFUJI, dccp

On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> > All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> > either by manual field assignment, memset(0) of the whole structure or
> > implicit structure initialisation of on-stack variables
> > (RT_SCOPE_UNIVERSE actually equals 0).
> > 
> > Therefore, we don't need to always initialise ->flowi4_scope in
> > ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> > the special RTO_ONLINK flag is present in the tos.
> > 
> > This will allow some code simplification, like removing
> > ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> > entirely by properly initialising ->flowi4_scope, instead of
> > overloading ->flowi4_tos with a special flag. Eventually, this will
> > allow to convert ->flowi4_tos to dscp_t.
> > 
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > ---
> > It's important for the correctness of this patch that all callers
> > initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> > them is long, although each case is pretty trivial.
> > 
> > If it helps, I can send a patch series that converts implicit
> > initialisation of ->flowi4_scope with an explicit assignment to
> > RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> > clear to future readers that ->flowi4_scope _has_ to be initialised. I
> > haven't sent such patch series to not overwhelm reviewers with trivial
> > and not technically-required changes (there are 40+ places to modify,
> > scattered over 30+ different files). But if anyone prefers explicit
> > initialisation everywhere, then just let me know and I'll send such
> > patches.
> 
> There are a handful of places that open code the initialization of the
> flow struct. I *think* I found all of them in 40867d74c374.

By open code, do you mean "doesn't use flowi4_init_output() nor
ip_tunnel_init_flow()"? If so, I think there are many more.

To be sure we're on the same page, here's a small part of the diff for
my "explicit flowi4_scope initialisation" patch series:

 drivers/infiniband/core/addr.c                          | 1 +
 drivers/infiniband/sw/rxe/rxe_net.c                     | 1 +
 drivers/net/amt.c                                       | 3 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c            | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c     | 3 +++
 drivers/net/ethernet/netronome/nfp/flower/action.c      | 1 +
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 7 +++++--
 drivers/net/geneve.c                                    | 9 +++++++--
 drivers/net/gtp.c                                       | 1 +
 drivers/net/ipvlan/ipvlan_core.c                        | 1 +
 drivers/net/vrf.c                                       | 1 +
 drivers/net/vxlan/vxlan_core.c                          | 1 +
 drivers/net/wireguard/socket.c                          | 1 +
 include/net/ip_tunnels.h                                | 1 +
 include/net/route.h                                     | 2 ++
 net/core/filter.c                                       | 1 +
 net/core/lwt_bpf.c                                      | 1 +
 net/dccp/ipv4.c                                         | 1 +
 net/ipv4/icmp.c                                         | 3 +++
 net/ipv4/netfilter.c                                    | 1 +
 net/ipv4/netfilter/nf_reject_ipv4.c                     | 1 +
 net/ipv4/route.c                                        | 1 +
 net/ipv4/xfrm4_policy.c                                 | 1 +
 net/netfilter/ipvs/ip_vs_xmit.c                         | 1 +
 net/netfilter/nf_conntrack_h323_main.c                  | 3 +++
 net/netfilter/nf_conntrack_sip.c                        | 1 +
 net/netfilter/nft_flow_offload.c                        | 1 +
 net/netfilter/nft_rt.c                                  | 1 +
 net/netfilter/xt_TCPMSS.c                               | 2 ++
 net/sctp/protocol.c                                     | 1 +
 net/smc/smc_ib.c                                        | 1 +
 net/tipc/udp_media.c                                    | 1 +
 net/xfrm/xfrm_policy.c                                  | 1 +
 33 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..5b6e0003eead 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -399,6 +399,7 @@ static int addr4_resolve(struct sockaddr *src_sock,
        memset(&fl4, 0, sizeof(fl4));
        fl4.daddr = dst_ip;
        fl4.saddr = src_ip;
+       fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
        fl4.flowi4_oif = addr->bound_dev_if;
        rt = ip_route_output_key(addr->net, &fl4);
        ret = PTR_ERR_OR_ZERO(rt);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index c53f4529f098..cf6dc89a8785 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -31,6 +31,7 @@ static struct dst_entry *rxe_find_route4(struct net_device *ndev,
        fl.flowi4_oif = ndev->ifindex;
        memcpy(&fl.saddr, saddr, sizeof(*saddr));
        memcpy(&fl.daddr, daddr, sizeof(*daddr));
+       fl.flowi4_scope = RT_SCOPE_UNIVERSE;
        fl.flowi4_proto = IPPROTO_UDP;
 
        rt = ip_route_output_key(&init_net, &fl);
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 10455c9b9da0..3e957ff64715 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -990,6 +990,7 @@ static bool amt_send_membership_update(struct amt_dev *amt,
        fl4.daddr              = amt->remote_ip;
        fl4.saddr              = amt->local_ip;
        fl4.flowi4_tos         = AMT_TOS;
+       fl4.flowi4_scope       = RT_SCOPE_UNIVERSE;
        fl4.flowi4_proto       = IPPROTO_UDP;
        rt = ip_route_output_key(amt->net, &fl4);
        if (IS_ERR(rt)) {
@@ -1048,6 +1049,7 @@ static void amt_send_multicast_data(struct amt_dev *amt,
        fl4.flowi4_oif         = amt->stream_dev->ifindex;
        fl4.daddr              = tunnel->ip4;
        fl4.saddr              = amt->local_ip;
+       fl4.flowi4_scope       = RT_SCOPE_UNIVERSE;
        fl4.flowi4_proto       = IPPROTO_UDP;
        rt = ip_route_output_key(amt->net, &fl4);
        if (IS_ERR(rt)) {
@@ -1103,6 +1105,7 @@ static bool amt_send_membership_query(struct amt_dev *amt,
        fl4.daddr              = tunnel->ip4;
        fl4.saddr              = amt->local_ip;
        fl4.flowi4_tos         = AMT_TOS;
+       fl4.flowi4_scope       = RT_SCOPE_UNIVERSE;
        fl4.flowi4_proto       = IPPROTO_UDP;
        rt = ip_route_output_key(amt->net, &fl4);
        if (IS_ERR(rt)) {
...

> > ---
> >  net/ipv4/route.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index e839d424b861..d8f82c0ac132 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
> >  	__u8 tos = RT_FL_TOS(fl4);
> >  
> >  	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> > -	fl4->flowi4_scope = tos & RTO_ONLINK ?
> > -			    RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
> > +	if (tos & RTO_ONLINK)
> > +		fl4->flowi4_scope = RT_SCOPE_LINK;
> >  }
> >  
> >  static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
> 
> Reviewed-by: David Ahern <dsahern@kernel.org>
> 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
  2022-04-22  3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
@ 2022-04-22 11:02   ` Guillaume Nault
  0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-22 11:02 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Hideaki YOSHIFUJI, dccp

On Thu, Apr 21, 2022 at 09:10:21PM -0600, David Ahern wrote:
> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> > RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> > It's stored in a normally unused bit of the ->flowi4_tos field, in
> > struct flowi4. However it has several problems:
> > 
> >  * This bit is also used by ECN. Although ECN bits are supposed to be
> >    cleared before doing a route lookup, it happened that some code
> >    paths didn't properly sanitise their ->flowi4_tos. So this mechanism
> >    is fragile and we had bugs in the past where ECN bits slipped in and
> >    could end up being erroneously interpreted as RTO_ONLINK.
> > 
> >  * A dscp_t type was recently introduced to ensure ECN bits are cleared
> >    during route lookups. ->flowi4_tos is the most important structure
> >    field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
> >    mandates that ECN bits (where RTO_ONLINK is stored) be zero.
> > 
> > Therefore we need to stop using RTO_ONLINK altogether. Fortunately
> > RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
> > to tell the route lookup function to restrict the scope, we can simply
> > initialise the scope correctly.
> > 
> 
> I believe the set looks ok. I think the fib test coverage in selftests
> could use more tests to cover tos.

Yes, this is on my todo list. I also plan to review existing tests that
cover route lookups with link scope, and extend them if necessary.

Thanks for the review.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
  2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
                   ` (3 preceding siblings ...)
  2022-04-22  3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
@ 2022-04-22 12:50 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-04-22 12:50 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: davem, kuba, pabeni, netdev, yoshfuji, dsahern, dccp

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 21 Apr 2022 01:21:19 +0200 you wrote:
> RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> It's stored in a normally unused bit of the ->flowi4_tos field, in
> struct flowi4. However it has several problems:
> 
>  * This bit is also used by ECN. Although ECN bits are supposed to be
>    cleared before doing a route lookup, it happened that some code
>    paths didn't properly sanitise their ->flowi4_tos. So this mechanism
>    is fragile and we had bugs in the past where ECN bits slipped in and
>    could end up being erroneously interpreted as RTO_ONLINK.
> 
> [...]

Here is the summary with links:
  - [net-next,1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
    https://git.kernel.org/netdev/net-next/c/16a28267774c
  - [net-next,2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
    https://git.kernel.org/netdev/net-next/c/67e1e2f4854b
  - [net-next,3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
    https://git.kernel.org/netdev/net-next/c/b1ad41384866

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  2022-04-22 10:53     ` Guillaume Nault
@ 2022-04-22 14:40       ` David Ahern
  2022-04-25 10:04         ` Guillaume Nault
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22 14:40 UTC (permalink / raw)
  To: Guillaume Nault
  Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Hideaki YOSHIFUJI, dccp

On 4/22/22 4:53 AM, Guillaume Nault wrote:
> On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
>> On 4/20/22 5:21 PM, Guillaume Nault wrote:
>>> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
>>> either by manual field assignment, memset(0) of the whole structure or
>>> implicit structure initialisation of on-stack variables
>>> (RT_SCOPE_UNIVERSE actually equals 0).
>>>
>>> Therefore, we don't need to always initialise ->flowi4_scope in
>>> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
>>> the special RTO_ONLINK flag is present in the tos.
>>>
>>> This will allow some code simplification, like removing
>>> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
>>> entirely by properly initialising ->flowi4_scope, instead of
>>> overloading ->flowi4_tos with a special flag. Eventually, this will
>>> allow to convert ->flowi4_tos to dscp_t.
>>>
>>> Signed-off-by: Guillaume Nault <gnault@redhat.com>
>>> ---
>>> It's important for the correctness of this patch that all callers
>>> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
>>> them is long, although each case is pretty trivial.
>>>
>>> If it helps, I can send a patch series that converts implicit
>>> initialisation of ->flowi4_scope with an explicit assignment to
>>> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
>>> clear to future readers that ->flowi4_scope _has_ to be initialised. I
>>> haven't sent such patch series to not overwhelm reviewers with trivial
>>> and not technically-required changes (there are 40+ places to modify,
>>> scattered over 30+ different files). But if anyone prefers explicit
>>> initialisation everywhere, then just let me know and I'll send such
>>> patches.
>>
>> There are a handful of places that open code the initialization of the
>> flow struct. I *think* I found all of them in 40867d74c374.
> 
> By open code, do you mean "doesn't use flowi4_init_output() nor
> ip_tunnel_init_flow()"? If so, I think there are many more.
> 

no, you made a comment about flow struct being initialized to 0 which
implicitly initializes scope. My comment is that there are only a few
places that do not use either `memset(flow, 0, sizeof())` or `struct
flowi4 fl4 = {}` to fully initialize the struct.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
  2022-04-22 14:40       ` David Ahern
@ 2022-04-25 10:04         ` Guillaume Nault
  0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-25 10:04 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Hideaki YOSHIFUJI, dccp

On Fri, Apr 22, 2022 at 08:40:01AM -0600, David Ahern wrote:
> On 4/22/22 4:53 AM, Guillaume Nault wrote:
> > On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
> >> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> >>> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> >>> either by manual field assignment, memset(0) of the whole structure or
> >>> implicit structure initialisation of on-stack variables
> >>> (RT_SCOPE_UNIVERSE actually equals 0).
> >>>
> >>> Therefore, we don't need to always initialise ->flowi4_scope in
> >>> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> >>> the special RTO_ONLINK flag is present in the tos.
> >>>
> >>> This will allow some code simplification, like removing
> >>> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> >>> entirely by properly initialising ->flowi4_scope, instead of
> >>> overloading ->flowi4_tos with a special flag. Eventually, this will
> >>> allow to convert ->flowi4_tos to dscp_t.
> >>>
> >>> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> >>> ---
> >>> It's important for the correctness of this patch that all callers
> >>> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> >>> them is long, although each case is pretty trivial.
> >>>
> >>> If it helps, I can send a patch series that converts implicit
> >>> initialisation of ->flowi4_scope with an explicit assignment to
> >>> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> >>> clear to future readers that ->flowi4_scope _has_ to be initialised. I
> >>> haven't sent such patch series to not overwhelm reviewers with trivial
> >>> and not technically-required changes (there are 40+ places to modify,
> >>> scattered over 30+ different files). But if anyone prefers explicit
> >>> initialisation everywhere, then just let me know and I'll send such
> >>> patches.
> >>
> >> There are a handful of places that open code the initialization of the
> >> flow struct. I *think* I found all of them in 40867d74c374.
> > 
> > By open code, do you mean "doesn't use flowi4_init_output() nor
> > ip_tunnel_init_flow()"? If so, I think there are many more.
> > 
> 
> no, you made a comment about flow struct being initialized to 0 which
> implicitly initializes scope. My comment is that there are only a few
> places that do not use either `memset(flow, 0, sizeof())` or `struct
> flowi4 fl4 = {}` to fully initialize the struct.

Yes, that's right. But I've only audited the call paths that lead to
ip_route_output_key_hash() (plus the ICMP error handlers), as these are
the ones that were relevant for this patch series. So I haven't looked
at flow initialisation in the ip_route_input*() or IPv6 call paths.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-04-25 10:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
2022-04-22  2:30   ` David Ahern
2022-04-22 10:53     ` Guillaume Nault
2022-04-22 14:40       ` David Ahern
2022-04-25 10:04         ` Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
2022-04-22  2:32   ` David Ahern
2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
2022-04-22  3:08   ` David Ahern
2022-04-22  3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
2022-04-22 11:02   ` Guillaume Nault
2022-04-22 12:50 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).