* [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
@ 2022-04-20 23:21 Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp
RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
It's stored in a normally unused bit of the ->flowi4_tos field, in
struct flowi4. However it has several problems:
* This bit is also used by ECN. Although ECN bits are supposed to be
cleared before doing a route lookup, it happened that some code
paths didn't properly sanitise their ->flowi4_tos. So this mechanism
is fragile and we had bugs in the past where ECN bits slipped in and
could end up being erroneously interpreted as RTO_ONLINK.
* A dscp_t type was recently introduced to ensure ECN bits are cleared
during route lookups. ->flowi4_tos is the most important structure
field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
mandates that ECN bits (where RTO_ONLINK is stored) be zero.
Therefore we need to stop using RTO_ONLINK altogether. Fortunately
RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
to tell the route lookup function to restrict the scope, we can simply
initialise the scope correctly.
Patch 1 does some preparatory work: it stops resetting ->flowi4_scope
automatically before a route lookup, thus allowing callers to set their
desired scope without having to rely on the RTO_ONLINK flag.
Patch 2-3 convert a few code paths to avoid relying on RTO_ONLINK.
More conversions will have to take place before we can eventually
remove this flag.
Guillaume Nault (3):
ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
ipv4: Avoid using RTO_ONLINK with ip_route_connect().
ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
include/net/route.h | 36 ++++++++++++++++++++++++------------
net/dccp/ipv4.c | 5 ++---
net/ipv4/af_inet.c | 6 +++---
net/ipv4/datagram.c | 7 +++----
net/ipv4/route.c | 41 +++++++++++++++++++----------------------
net/ipv4/tcp_ipv4.c | 5 ++---
6 files changed, 53 insertions(+), 47 deletions(-)
--
2.21.3
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
2022-04-22 2:30 ` David Ahern
2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp
All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
either by manual field assignment, memset(0) of the whole structure or
implicit structure initialisation of on-stack variables
(RT_SCOPE_UNIVERSE actually equals 0).
Therefore, we don't need to always initialise ->flowi4_scope in
ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
the special RTO_ONLINK flag is present in the tos.
This will allow some code simplification, like removing
ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
entirely by properly initialising ->flowi4_scope, instead of
overloading ->flowi4_tos with a special flag. Eventually, this will
allow to convert ->flowi4_tos to dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
It's important for the correctness of this patch that all callers
initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
them is long, although each case is pretty trivial.
If it helps, I can send a patch series that converts implicit
initialisation of ->flowi4_scope with an explicit assignment to
RT_SCOPE_UNIVERSE. This would also have the advantage of making it
clear to future readers that ->flowi4_scope _has_ to be initialised. I
haven't sent such patch series to not overwhelm reviewers with trivial
and not technically-required changes (there are 40+ places to modify,
scattered over 30+ different files). But if anyone prefers explicit
initialisation everywhere, then just let me know and I'll send such
patches.
---
net/ipv4/route.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e839d424b861..d8f82c0ac132 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
__u8 tos = RT_FL_TOS(fl4);
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
- fl4->flowi4_scope = tos & RTO_ONLINK ?
- RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
+ if (tos & RTO_ONLINK)
+ fl4->flowi4_scope = RT_SCOPE_LINK;
}
static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
--
2.21.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
2022-04-22 2:32 ` David Ahern
2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, David Ahern, dccp
Now that ip_rt_fix_tos() doesn't reset ->flowi4_scope unconditionally,
we don't have to rely on the RTO_ONLINK bit to properly set the scope
of a flowi4 structure. We can just set ->flowi4_scope explicitly and
avoid using RTO_ONLINK in ->flowi4_tos.
This patch converts callers of ip_route_connect(). Instead of setting
the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:
1- Drop the tos parameter from ip_route_connect(): its value was
entirely based on sk, which is also passed as parameter.
2- Set ->flowi4_scope depending on the SOCK_LOCALROUTE socket option
instead of always initialising it with RT_SCOPE_UNIVERSE (let's
define ip_sock_rt_scope() for this purpose).
3- Avoid overloading ->flowi4_tos with RTO_ONLINK: since the scope is
now properly initialised, we don't need to tell ip_rt_fix_tos() to
adjust ->flowi4_scope for us. So let's define ip_sock_rt_tos(),
which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
bit overload.
Note:
In the original ip_route_connect() code, __ip_route_output_key()
might clear the RTO_ONLINK bit of fl4->flowi4_tos (because of
ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
original tos variable. Now that we don't set RTO_ONLINK any more,
this is not a problem and we can use fl4->flowi4_tos in
flowi4_update_output().
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
include/net/route.h | 36 ++++++++++++++++++++++++------------
net/dccp/ipv4.c | 5 ++---
net/ipv4/af_inet.c | 6 +++---
net/ipv4/datagram.c | 7 +++----
net/ipv4/tcp_ipv4.c | 5 ++---
5 files changed, 34 insertions(+), 25 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 25404fc2b483..991a3985712d 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -43,6 +43,19 @@
#define RT_CONN_FLAGS(sk) (RT_TOS(inet_sk(sk)->tos) | sock_flag(sk, SOCK_LOCALROUTE))
#define RT_CONN_FLAGS_TOS(sk,tos) (RT_TOS(tos) | sock_flag(sk, SOCK_LOCALROUTE))
+static inline __u8 ip_sock_rt_scope(const struct sock *sk)
+{
+ if (sock_flag(sk, SOCK_LOCALROUTE))
+ return RT_SCOPE_LINK;
+
+ return RT_SCOPE_UNIVERSE;
+}
+
+static inline __u8 ip_sock_rt_tos(const struct sock *sk)
+{
+ return RT_TOS(inet_sk(sk)->tos);
+}
+
struct ip_tunnel_info;
struct fib_nh;
struct fib_info;
@@ -289,39 +302,38 @@ static inline char rt_tos2priority(u8 tos)
* ip_route_newports() calls.
*/
-static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 src,
- u32 tos, int oif, u8 protocol,
+static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst,
+ __be32 src, int oif, u8 protocol,
__be16 sport, __be16 dport,
- struct sock *sk)
+ const struct sock *sk)
{
__u8 flow_flags = 0;
if (inet_sk(sk)->transparent)
flow_flags |= FLOWI_FLAG_ANYSRC;
- flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
- protocol, flow_flags, dst, src, dport, sport,
- sk->sk_uid);
+ flowi4_init_output(fl4, oif, sk->sk_mark, ip_sock_rt_tos(sk),
+ ip_sock_rt_scope(sk), protocol, flow_flags, dst,
+ src, dport, sport, sk->sk_uid);
}
-static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
- __be32 dst, __be32 src, u32 tos,
- int oif, u8 protocol,
+static inline struct rtable *ip_route_connect(struct flowi4 *fl4, __be32 dst,
+ __be32 src, int oif, u8 protocol,
__be16 sport, __be16 dport,
struct sock *sk)
{
struct net *net = sock_net(sk);
struct rtable *rt;
- ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
- sport, dport, sk);
+ ip_route_connect_init(fl4, dst, src, oif, protocol, sport, dport, sk);
if (!dst || !src) {
rt = __ip_route_output_key(net, fl4);
if (IS_ERR(rt))
return rt;
ip_rt_put(rt);
- flowi4_update_output(fl4, oif, tos, fl4->daddr, fl4->saddr);
+ flowi4_update_output(fl4, oif, fl4->flowi4_tos, fl4->daddr,
+ fl4->saddr);
}
security_sk_classify_flow(sk, flowi4_to_flowi_common(fl4));
return ip_route_output_flow(net, fl4, sk);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ae662567a6cb..82696ab86f74 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -76,9 +76,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
orig_dport = usin->sin_port;
fl4 = &inet->cork.fl.u.ip4;
rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
- RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
- IPPROTO_DCCP,
- orig_sport, orig_dport, sk);
+ sk->sk_bound_dev_if, IPPROTO_DCCP, orig_sport,
+ orig_dport, sk);
if (IS_ERR(rt))
return PTR_ERR(rt);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 195ecfa2f000..93da9f783bec 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1233,9 +1233,9 @@ static int inet_sk_reselect_saddr(struct sock *sk)
/* Query new route. */
fl4 = &inet->cork.fl.u.ip4;
- rt = ip_route_connect(fl4, daddr, 0, RT_CONN_FLAGS(sk),
- sk->sk_bound_dev_if, sk->sk_protocol,
- inet->inet_sport, inet->inet_dport, sk);
+ rt = ip_route_connect(fl4, daddr, 0, sk->sk_bound_dev_if,
+ sk->sk_protocol, inet->inet_sport,
+ inet->inet_dport, sk);
if (IS_ERR(rt))
return PTR_ERR(rt);
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 48f337ccf949..ffd57523331f 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -44,10 +44,9 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
saddr = inet->mc_addr;
}
fl4 = &inet->cork.fl.u.ip4;
- rt = ip_route_connect(fl4, usin->sin_addr.s_addr, saddr,
- RT_CONN_FLAGS(sk), oif,
- sk->sk_protocol,
- inet->inet_sport, usin->sin_port, sk);
+ rt = ip_route_connect(fl4, usin->sin_addr.s_addr, saddr, oif,
+ sk->sk_protocol, inet->inet_sport,
+ usin->sin_port, sk);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
if (err == -ENETUNREACH)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 157265aecbed..2c2d42142555 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -229,9 +229,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
orig_dport = usin->sin_port;
fl4 = &inet->cork.fl.u.ip4;
rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
- RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
- IPPROTO_TCP,
- orig_sport, orig_dport, sk);
+ sk->sk_bound_dev_if, IPPROTO_TCP, orig_sport,
+ orig_dport, sk);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
if (err == -ENETUNREACH)
--
2.21.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
@ 2022-04-20 23:21 ` Guillaume Nault
2022-04-22 3:08 ` David Ahern
2022-04-22 3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
2022-04-22 12:50 ` patchwork-bot+netdevbpf
4 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-20 23:21 UTC (permalink / raw)
To: David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, David Ahern
All the *_redirect() and *_update_pmtu() functions initialise their
struct flowi4 variable with either __build_flow_key() or
build_sk_flow_key(). When sk is provided, these functions use
RT_CONN_FLAGS() to set ->flowi4_tos and always use RT_SCOPE_UNIVERSE
for ->flowi4_scope. Then they rely on ip_rt_fix_tos() to adjust the
scope based on the RTO_ONLINK bit and to mask the tos with
IPTOS_RT_MASK.
This patch modifies __build_flow_key() and build_sk_flow_key() to
properly initialise ->flowi4_tos and ->flowi4_scope, so that the
ICMP redirects and PMTU handlers don't need an extra call to
ip_rt_fix_tos() before doing a fib lookup. That is, we:
* Drop RT_CONN_FLAGS(): use ip_sock_rt_tos() and ip_sock_rt_scope()
instead, so that we don't have to rely on ip_rt_fix_tos() to adjust
the scope anymore.
* Apply IPTOS_RT_MASK to the tos, so that we don't need
ip_rt_fix_tos() to do it for us.
* Drop the ip_rt_fix_tos() calls that now become useless.
The only remaining ip_rt_fix_tos() caller is ip_route_output_key_hash()
which needs it as long as external callers still use the RTO_ONLINK
flag.
Note:
This patch also drops some useless RT_TOS() calls as IPTOS_RT_MASK is
a stronger mask.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
net/ipv4/route.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d8f82c0ac132..ffbe2e4f8c89 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -508,23 +508,24 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
}
static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
- const struct sock *sk,
- const struct iphdr *iph,
- int oif, u8 tos,
- u8 prot, u32 mark, int flow_flags)
+ const struct sock *sk, const struct iphdr *iph,
+ int oif, __u8 tos, u8 prot, u32 mark,
+ int flow_flags)
{
+ __u8 scope = RT_SCOPE_UNIVERSE;
+
if (sk) {
const struct inet_sock *inet = inet_sk(sk);
oif = sk->sk_bound_dev_if;
mark = sk->sk_mark;
- tos = RT_CONN_FLAGS(sk);
+ tos = ip_sock_rt_tos(sk);
+ scope = ip_sock_rt_scope(sk);
prot = inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol;
}
- flowi4_init_output(fl4, oif, mark, tos,
- RT_SCOPE_UNIVERSE, prot,
- flow_flags,
- iph->daddr, iph->saddr, 0, 0,
+
+ flowi4_init_output(fl4, oif, mark, tos & IPTOS_RT_MASK, scope,
+ prot, flow_flags, iph->daddr, iph->saddr, 0, 0,
sock_net_uid(net, sk));
}
@@ -534,9 +535,9 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
const struct net *net = dev_net(skb->dev);
const struct iphdr *iph = ip_hdr(skb);
int oif = skb->dev->ifindex;
- u8 tos = RT_TOS(iph->tos);
u8 prot = iph->protocol;
u32 mark = skb->mark;
+ __u8 tos = iph->tos;
__build_flow_key(net, fl4, sk, iph, oif, tos, prot, mark, 0);
}
@@ -552,7 +553,8 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk)
if (inet_opt && inet_opt->opt.srr)
daddr = inet_opt->opt.faddr;
flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark,
- RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE,
+ ip_sock_rt_tos(sk) & IPTOS_RT_MASK,
+ ip_sock_rt_scope(sk),
inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
inet_sk_flowi_flags(sk),
daddr, inet->inet_saddr, 0, 0, sk->sk_uid);
@@ -825,14 +827,13 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf
const struct iphdr *iph = (const struct iphdr *) skb->data;
struct net *net = dev_net(skb->dev);
int oif = skb->dev->ifindex;
- u8 tos = RT_TOS(iph->tos);
u8 prot = iph->protocol;
u32 mark = skb->mark;
+ __u8 tos = iph->tos;
rt = (struct rtable *) dst;
__build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0);
- ip_rt_fix_tos(&fl4);
__ip_do_redirect(rt, skb, &fl4, true);
}
@@ -1061,7 +1062,6 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
struct flowi4 fl4;
ip_rt_build_flow_key(&fl4, sk, skb);
- ip_rt_fix_tos(&fl4);
/* Don't make lookup fail for bridged encapsulations */
if (skb && netif_is_any_bridge_port(skb->dev))
@@ -1078,8 +1078,8 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
struct rtable *rt;
u32 mark = IP4_REPLY_MARK(net, skb->mark);
- __build_flow_key(net, &fl4, NULL, iph, oif,
- RT_TOS(iph->tos), protocol, mark, 0);
+ __build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, mark,
+ 0);
rt = __ip_route_output_key(net, &fl4);
if (!IS_ERR(rt)) {
__ip_rt_update_pmtu(rt, &fl4, mtu);
@@ -1136,8 +1136,6 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
goto out;
new = true;
- } else {
- ip_rt_fix_tos(&fl4);
}
__ip_rt_update_pmtu((struct rtable *)xfrm_dst_path(&rt->dst), &fl4, mtu);
@@ -1169,8 +1167,7 @@ void ipv4_redirect(struct sk_buff *skb, struct net *net,
struct flowi4 fl4;
struct rtable *rt;
- __build_flow_key(net, &fl4, NULL, iph, oif,
- RT_TOS(iph->tos), protocol, 0, 0);
+ __build_flow_key(net, &fl4, NULL, iph, oif, iph->tos, protocol, 0, 0);
rt = __ip_route_output_key(net, &fl4);
if (!IS_ERR(rt)) {
__ip_do_redirect(rt, skb, &fl4, false);
--
2.21.3
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
@ 2022-04-22 2:30 ` David Ahern
2022-04-22 10:53 ` Guillaume Nault
0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22 2:30 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, dccp
On 4/20/22 5:21 PM, Guillaume Nault wrote:
> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> either by manual field assignment, memset(0) of the whole structure or
> implicit structure initialisation of on-stack variables
> (RT_SCOPE_UNIVERSE actually equals 0).
>
> Therefore, we don't need to always initialise ->flowi4_scope in
> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> the special RTO_ONLINK flag is present in the tos.
>
> This will allow some code simplification, like removing
> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> entirely by properly initialising ->flowi4_scope, instead of
> overloading ->flowi4_tos with a special flag. Eventually, this will
> allow to convert ->flowi4_tos to dscp_t.
>
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> It's important for the correctness of this patch that all callers
> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> them is long, although each case is pretty trivial.
>
> If it helps, I can send a patch series that converts implicit
> initialisation of ->flowi4_scope with an explicit assignment to
> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> clear to future readers that ->flowi4_scope _has_ to be initialised. I
> haven't sent such patch series to not overwhelm reviewers with trivial
> and not technically-required changes (there are 40+ places to modify,
> scattered over 30+ different files). But if anyone prefers explicit
> initialisation everywhere, then just let me know and I'll send such
> patches.
There are a handful of places that open code the initialization of the
flow struct. I *think* I found all of them in 40867d74c374.
> ---
> net/ipv4/route.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index e839d424b861..d8f82c0ac132 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
> __u8 tos = RT_FL_TOS(fl4);
>
> fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> - fl4->flowi4_scope = tos & RTO_ONLINK ?
> - RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
> + if (tos & RTO_ONLINK)
> + fl4->flowi4_scope = RT_SCOPE_LINK;
> }
>
> static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
@ 2022-04-22 2:32 ` David Ahern
0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2022-04-22 2:32 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, dccp
On 4/20/22 5:21 PM, Guillaume Nault wrote:
> Now that ip_rt_fix_tos() doesn't reset ->flowi4_scope unconditionally,
> we don't have to rely on the RTO_ONLINK bit to properly set the scope
> of a flowi4 structure. We can just set ->flowi4_scope explicitly and
> avoid using RTO_ONLINK in ->flowi4_tos.
>
> This patch converts callers of ip_route_connect(). Instead of setting
> the tos parameter with RT_CONN_FLAGS(sk), as all callers do, we can:
>
> 1- Drop the tos parameter from ip_route_connect(): its value was
> entirely based on sk, which is also passed as parameter.
>
> 2- Set ->flowi4_scope depending on the SOCK_LOCALROUTE socket option
> instead of always initialising it with RT_SCOPE_UNIVERSE (let's
> define ip_sock_rt_scope() for this purpose).
>
> 3- Avoid overloading ->flowi4_tos with RTO_ONLINK: since the scope is
> now properly initialised, we don't need to tell ip_rt_fix_tos() to
> adjust ->flowi4_scope for us. So let's define ip_sock_rt_tos(),
> which is the same as RT_CONN_FLAGS() but without the RTO_ONLINK
> bit overload.
>
> Note:
> In the original ip_route_connect() code, __ip_route_output_key()
> might clear the RTO_ONLINK bit of fl4->flowi4_tos (because of
> ip_rt_fix_tos()). Therefore flowi4_update_output() had to reuse the
> original tos variable. Now that we don't set RTO_ONLINK any more,
> this is not a problem and we can use fl4->flowi4_tos in
> flowi4_update_output().
>
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> include/net/route.h | 36 ++++++++++++++++++++++++------------
> net/dccp/ipv4.c | 5 ++---
> net/ipv4/af_inet.c | 6 +++---
> net/ipv4/datagram.c | 7 +++----
> net/ipv4/tcp_ipv4.c | 5 ++---
> 5 files changed, 34 insertions(+), 25 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
@ 2022-04-22 3:08 ` David Ahern
0 siblings, 0 replies; 13+ messages in thread
From: David Ahern @ 2022-04-22 3:08 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI
On 4/20/22 5:21 PM, Guillaume Nault wrote:
> All the *_redirect() and *_update_pmtu() functions initialise their
> struct flowi4 variable with either __build_flow_key() or
> build_sk_flow_key(). When sk is provided, these functions use
> RT_CONN_FLAGS() to set ->flowi4_tos and always use RT_SCOPE_UNIVERSE
> for ->flowi4_scope. Then they rely on ip_rt_fix_tos() to adjust the
> scope based on the RTO_ONLINK bit and to mask the tos with
> IPTOS_RT_MASK.
>
> This patch modifies __build_flow_key() and build_sk_flow_key() to
> properly initialise ->flowi4_tos and ->flowi4_scope, so that the
> ICMP redirects and PMTU handlers don't need an extra call to
> ip_rt_fix_tos() before doing a fib lookup. That is, we:
>
> * Drop RT_CONN_FLAGS(): use ip_sock_rt_tos() and ip_sock_rt_scope()
> instead, so that we don't have to rely on ip_rt_fix_tos() to adjust
> the scope anymore.
>
> * Apply IPTOS_RT_MASK to the tos, so that we don't need
> ip_rt_fix_tos() to do it for us.
>
> * Drop the ip_rt_fix_tos() calls that now become useless.
>
> The only remaining ip_rt_fix_tos() caller is ip_route_output_key_hash()
> which needs it as long as external callers still use the RTO_ONLINK
> flag.
>
> Note:
> This patch also drops some useless RT_TOS() calls as IPTOS_RT_MASK is
> a stronger mask.
>
> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> ---
> net/ipv4/route.c | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
` (2 preceding siblings ...)
2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
@ 2022-04-22 3:10 ` David Ahern
2022-04-22 11:02 ` Guillaume Nault
2022-04-22 12:50 ` patchwork-bot+netdevbpf
4 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22 3:10 UTC (permalink / raw)
To: Guillaume Nault, David Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, Hideaki YOSHIFUJI, dccp
On 4/20/22 5:21 PM, Guillaume Nault wrote:
> RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> It's stored in a normally unused bit of the ->flowi4_tos field, in
> struct flowi4. However it has several problems:
>
> * This bit is also used by ECN. Although ECN bits are supposed to be
> cleared before doing a route lookup, it happened that some code
> paths didn't properly sanitise their ->flowi4_tos. So this mechanism
> is fragile and we had bugs in the past where ECN bits slipped in and
> could end up being erroneously interpreted as RTO_ONLINK.
>
> * A dscp_t type was recently introduced to ensure ECN bits are cleared
> during route lookups. ->flowi4_tos is the most important structure
> field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
> mandates that ECN bits (where RTO_ONLINK is stored) be zero.
>
> Therefore we need to stop using RTO_ONLINK altogether. Fortunately
> RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
> to tell the route lookup function to restrict the scope, we can simply
> initialise the scope correctly.
>
I believe the set looks ok. I think the fib test coverage in selftests
could use more tests to cover tos.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
2022-04-22 2:30 ` David Ahern
@ 2022-04-22 10:53 ` Guillaume Nault
2022-04-22 14:40 ` David Ahern
0 siblings, 1 reply; 13+ messages in thread
From: Guillaume Nault @ 2022-04-22 10:53 UTC (permalink / raw)
To: David Ahern
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
Hideaki YOSHIFUJI, dccp
On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> > All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> > either by manual field assignment, memset(0) of the whole structure or
> > implicit structure initialisation of on-stack variables
> > (RT_SCOPE_UNIVERSE actually equals 0).
> >
> > Therefore, we don't need to always initialise ->flowi4_scope in
> > ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> > the special RTO_ONLINK flag is present in the tos.
> >
> > This will allow some code simplification, like removing
> > ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> > entirely by properly initialising ->flowi4_scope, instead of
> > overloading ->flowi4_tos with a special flag. Eventually, this will
> > allow to convert ->flowi4_tos to dscp_t.
> >
> > Signed-off-by: Guillaume Nault <gnault@redhat.com>
> > ---
> > It's important for the correctness of this patch that all callers
> > initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> > them is long, although each case is pretty trivial.
> >
> > If it helps, I can send a patch series that converts implicit
> > initialisation of ->flowi4_scope with an explicit assignment to
> > RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> > clear to future readers that ->flowi4_scope _has_ to be initialised. I
> > haven't sent such patch series to not overwhelm reviewers with trivial
> > and not technically-required changes (there are 40+ places to modify,
> > scattered over 30+ different files). But if anyone prefers explicit
> > initialisation everywhere, then just let me know and I'll send such
> > patches.
>
> There are a handful of places that open code the initialization of the
> flow struct. I *think* I found all of them in 40867d74c374.
By open code, do you mean "doesn't use flowi4_init_output() nor
ip_tunnel_init_flow()"? If so, I think there are many more.
To be sure we're on the same page, here's a small part of the diff for
my "explicit flowi4_scope initialisation" patch series:
drivers/infiniband/core/addr.c | 1 +
drivers/infiniband/sw/rxe/rxe_net.c | 1 +
drivers/net/amt.c | 3 +++
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun.c | 3 +++
drivers/net/ethernet/netronome/nfp/flower/action.c | 1 +
drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 7 +++++--
drivers/net/geneve.c | 9 +++++++--
drivers/net/gtp.c | 1 +
drivers/net/ipvlan/ipvlan_core.c | 1 +
drivers/net/vrf.c | 1 +
drivers/net/vxlan/vxlan_core.c | 1 +
drivers/net/wireguard/socket.c | 1 +
include/net/ip_tunnels.h | 1 +
include/net/route.h | 2 ++
net/core/filter.c | 1 +
net/core/lwt_bpf.c | 1 +
net/dccp/ipv4.c | 1 +
net/ipv4/icmp.c | 3 +++
net/ipv4/netfilter.c | 1 +
net/ipv4/netfilter/nf_reject_ipv4.c | 1 +
net/ipv4/route.c | 1 +
net/ipv4/xfrm4_policy.c | 1 +
net/netfilter/ipvs/ip_vs_xmit.c | 1 +
net/netfilter/nf_conntrack_h323_main.c | 3 +++
net/netfilter/nf_conntrack_sip.c | 1 +
net/netfilter/nft_flow_offload.c | 1 +
net/netfilter/nft_rt.c | 1 +
net/netfilter/xt_TCPMSS.c | 2 ++
net/sctp/protocol.c | 1 +
net/smc/smc_ib.c | 1 +
net/tipc/udp_media.c | 1 +
net/xfrm/xfrm_policy.c | 1 +
33 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..5b6e0003eead 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -399,6 +399,7 @@ static int addr4_resolve(struct sockaddr *src_sock,
memset(&fl4, 0, sizeof(fl4));
fl4.daddr = dst_ip;
fl4.saddr = src_ip;
+ fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.flowi4_oif = addr->bound_dev_if;
rt = ip_route_output_key(addr->net, &fl4);
ret = PTR_ERR_OR_ZERO(rt);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index c53f4529f098..cf6dc89a8785 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -31,6 +31,7 @@ static struct dst_entry *rxe_find_route4(struct net_device *ndev,
fl.flowi4_oif = ndev->ifindex;
memcpy(&fl.saddr, saddr, sizeof(*saddr));
memcpy(&fl.daddr, daddr, sizeof(*daddr));
+ fl.flowi4_scope = RT_SCOPE_UNIVERSE;
fl.flowi4_proto = IPPROTO_UDP;
rt = ip_route_output_key(&init_net, &fl);
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index 10455c9b9da0..3e957ff64715 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -990,6 +990,7 @@ static bool amt_send_membership_update(struct amt_dev *amt,
fl4.daddr = amt->remote_ip;
fl4.saddr = amt->local_ip;
fl4.flowi4_tos = AMT_TOS;
+ fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.flowi4_proto = IPPROTO_UDP;
rt = ip_route_output_key(amt->net, &fl4);
if (IS_ERR(rt)) {
@@ -1048,6 +1049,7 @@ static void amt_send_multicast_data(struct amt_dev *amt,
fl4.flowi4_oif = amt->stream_dev->ifindex;
fl4.daddr = tunnel->ip4;
fl4.saddr = amt->local_ip;
+ fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.flowi4_proto = IPPROTO_UDP;
rt = ip_route_output_key(amt->net, &fl4);
if (IS_ERR(rt)) {
@@ -1103,6 +1105,7 @@ static bool amt_send_membership_query(struct amt_dev *amt,
fl4.daddr = tunnel->ip4;
fl4.saddr = amt->local_ip;
fl4.flowi4_tos = AMT_TOS;
+ fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.flowi4_proto = IPPROTO_UDP;
rt = ip_route_output_key(amt->net, &fl4);
if (IS_ERR(rt)) {
...
> > ---
> > net/ipv4/route.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index e839d424b861..d8f82c0ac132 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -503,8 +503,8 @@ static void ip_rt_fix_tos(struct flowi4 *fl4)
> > __u8 tos = RT_FL_TOS(fl4);
> >
> > fl4->flowi4_tos = tos & IPTOS_RT_MASK;
> > - fl4->flowi4_scope = tos & RTO_ONLINK ?
> > - RT_SCOPE_LINK : RT_SCOPE_UNIVERSE;
> > + if (tos & RTO_ONLINK)
> > + fl4->flowi4_scope = RT_SCOPE_LINK;
> > }
> >
> > static void __build_flow_key(const struct net *net, struct flowi4 *fl4,
>
> Reviewed-by: David Ahern <dsahern@kernel.org>
>
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
2022-04-22 3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
@ 2022-04-22 11:02 ` Guillaume Nault
0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-22 11:02 UTC (permalink / raw)
To: David Ahern
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
Hideaki YOSHIFUJI, dccp
On Thu, Apr 21, 2022 at 09:10:21PM -0600, David Ahern wrote:
> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> > RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> > It's stored in a normally unused bit of the ->flowi4_tos field, in
> > struct flowi4. However it has several problems:
> >
> > * This bit is also used by ECN. Although ECN bits are supposed to be
> > cleared before doing a route lookup, it happened that some code
> > paths didn't properly sanitise their ->flowi4_tos. So this mechanism
> > is fragile and we had bugs in the past where ECN bits slipped in and
> > could end up being erroneously interpreted as RTO_ONLINK.
> >
> > * A dscp_t type was recently introduced to ensure ECN bits are cleared
> > during route lookups. ->flowi4_tos is the most important structure
> > field to convert, but RTO_ONLINK prevents such conversion, as dscp_t
> > mandates that ECN bits (where RTO_ONLINK is stored) be zero.
> >
> > Therefore we need to stop using RTO_ONLINK altogether. Fortunately
> > RTO_ONLINK isn't a necessity. Instead of passing a flag in ->flowi4_tos
> > to tell the route lookup function to restrict the scope, we can simply
> > initialise the scope correctly.
> >
>
> I believe the set looks ok. I think the fib test coverage in selftests
> could use more tests to cover tos.
Yes, this is on my todo list. I also plan to review existing tests that
cover route lookups with link scope, and extend them if necessary.
Thanks for the review.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
` (3 preceding siblings ...)
2022-04-22 3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
@ 2022-04-22 12:50 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-04-22 12:50 UTC (permalink / raw)
To: Guillaume Nault; +Cc: davem, kuba, pabeni, netdev, yoshfuji, dsahern, dccp
Hello:
This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:
On Thu, 21 Apr 2022 01:21:19 +0200 you wrote:
> RTO_ONLINK is a flag that allows to reduce the scope of route lookups.
> It's stored in a normally unused bit of the ->flowi4_tos field, in
> struct flowi4. However it has several problems:
>
> * This bit is also used by ECN. Although ECN bits are supposed to be
> cleared before doing a route lookup, it happened that some code
> paths didn't properly sanitise their ->flowi4_tos. So this mechanism
> is fragile and we had bugs in the past where ECN bits slipped in and
> could end up being erroneously interpreted as RTO_ONLINK.
>
> [...]
Here is the summary with links:
- [net-next,1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
https://git.kernel.org/netdev/net-next/c/16a28267774c
- [net-next,2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect().
https://git.kernel.org/netdev/net-next/c/67e1e2f4854b
- [net-next,3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers.
https://git.kernel.org/netdev/net-next/c/b1ad41384866
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
2022-04-22 10:53 ` Guillaume Nault
@ 2022-04-22 14:40 ` David Ahern
2022-04-25 10:04 ` Guillaume Nault
0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2022-04-22 14:40 UTC (permalink / raw)
To: Guillaume Nault
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
Hideaki YOSHIFUJI, dccp
On 4/22/22 4:53 AM, Guillaume Nault wrote:
> On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
>> On 4/20/22 5:21 PM, Guillaume Nault wrote:
>>> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
>>> either by manual field assignment, memset(0) of the whole structure or
>>> implicit structure initialisation of on-stack variables
>>> (RT_SCOPE_UNIVERSE actually equals 0).
>>>
>>> Therefore, we don't need to always initialise ->flowi4_scope in
>>> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
>>> the special RTO_ONLINK flag is present in the tos.
>>>
>>> This will allow some code simplification, like removing
>>> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
>>> entirely by properly initialising ->flowi4_scope, instead of
>>> overloading ->flowi4_tos with a special flag. Eventually, this will
>>> allow to convert ->flowi4_tos to dscp_t.
>>>
>>> Signed-off-by: Guillaume Nault <gnault@redhat.com>
>>> ---
>>> It's important for the correctness of this patch that all callers
>>> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
>>> them is long, although each case is pretty trivial.
>>>
>>> If it helps, I can send a patch series that converts implicit
>>> initialisation of ->flowi4_scope with an explicit assignment to
>>> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
>>> clear to future readers that ->flowi4_scope _has_ to be initialised. I
>>> haven't sent such patch series to not overwhelm reviewers with trivial
>>> and not technically-required changes (there are 40+ places to modify,
>>> scattered over 30+ different files). But if anyone prefers explicit
>>> initialisation everywhere, then just let me know and I'll send such
>>> patches.
>>
>> There are a handful of places that open code the initialization of the
>> flow struct. I *think* I found all of them in 40867d74c374.
>
> By open code, do you mean "doesn't use flowi4_init_output() nor
> ip_tunnel_init_flow()"? If so, I think there are many more.
>
no, you made a comment about flow struct being initialized to 0 which
implicitly initializes scope. My comment is that there are only a few
places that do not use either `memset(flow, 0, sizeof())` or `struct
flowi4 fl4 = {}` to fully initialize the struct.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos().
2022-04-22 14:40 ` David Ahern
@ 2022-04-25 10:04 ` Guillaume Nault
0 siblings, 0 replies; 13+ messages in thread
From: Guillaume Nault @ 2022-04-25 10:04 UTC (permalink / raw)
To: David Ahern
Cc: David Miller, Jakub Kicinski, Paolo Abeni, netdev,
Hideaki YOSHIFUJI, dccp
On Fri, Apr 22, 2022 at 08:40:01AM -0600, David Ahern wrote:
> On 4/22/22 4:53 AM, Guillaume Nault wrote:
> > On Thu, Apr 21, 2022 at 08:30:52PM -0600, David Ahern wrote:
> >> On 4/20/22 5:21 PM, Guillaume Nault wrote:
> >>> All callers already initialise ->flowi4_scope with RT_SCOPE_UNIVERSE,
> >>> either by manual field assignment, memset(0) of the whole structure or
> >>> implicit structure initialisation of on-stack variables
> >>> (RT_SCOPE_UNIVERSE actually equals 0).
> >>>
> >>> Therefore, we don't need to always initialise ->flowi4_scope in
> >>> ip_rt_fix_tos(). We only need to reduce the scope to RT_SCOPE_LINK when
> >>> the special RTO_ONLINK flag is present in the tos.
> >>>
> >>> This will allow some code simplification, like removing
> >>> ip_rt_fix_tos(). Also, the long term idea is to remove RTO_ONLINK
> >>> entirely by properly initialising ->flowi4_scope, instead of
> >>> overloading ->flowi4_tos with a special flag. Eventually, this will
> >>> allow to convert ->flowi4_tos to dscp_t.
> >>>
> >>> Signed-off-by: Guillaume Nault <gnault@redhat.com>
> >>> ---
> >>> It's important for the correctness of this patch that all callers
> >>> initialise ->flowi4_scope to 0 (in one way or another). Auditing all of
> >>> them is long, although each case is pretty trivial.
> >>>
> >>> If it helps, I can send a patch series that converts implicit
> >>> initialisation of ->flowi4_scope with an explicit assignment to
> >>> RT_SCOPE_UNIVERSE. This would also have the advantage of making it
> >>> clear to future readers that ->flowi4_scope _has_ to be initialised. I
> >>> haven't sent such patch series to not overwhelm reviewers with trivial
> >>> and not technically-required changes (there are 40+ places to modify,
> >>> scattered over 30+ different files). But if anyone prefers explicit
> >>> initialisation everywhere, then just let me know and I'll send such
> >>> patches.
> >>
> >> There are a handful of places that open code the initialization of the
> >> flow struct. I *think* I found all of them in 40867d74c374.
> >
> > By open code, do you mean "doesn't use flowi4_init_output() nor
> > ip_tunnel_init_flow()"? If so, I think there are many more.
> >
>
> no, you made a comment about flow struct being initialized to 0 which
> implicitly initializes scope. My comment is that there are only a few
> places that do not use either `memset(flow, 0, sizeof())` or `struct
> flowi4 fl4 = {}` to fully initialize the struct.
Yes, that's right. But I've only audited the call paths that lead to
ip_route_output_key_hash() (plus the ICMP error handlers), as these are
the ones that were relevant for this patch series. So I haven't looked
at flow initialisation in the ip_route_input*() or IPv6 call paths.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-04-25 10:04 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20 23:21 [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 1/3] ipv4: Don't reset ->flowi4_scope in ip_rt_fix_tos() Guillaume Nault
2022-04-22 2:30 ` David Ahern
2022-04-22 10:53 ` Guillaume Nault
2022-04-22 14:40 ` David Ahern
2022-04-25 10:04 ` Guillaume Nault
2022-04-20 23:21 ` [PATCH net-next 2/3] ipv4: Avoid using RTO_ONLINK with ip_route_connect() Guillaume Nault
2022-04-22 2:32 ` David Ahern
2022-04-20 23:21 ` [PATCH net-next 3/3] ipv4: Initialise ->flowi4_scope properly in ICMP handlers Guillaume Nault
2022-04-22 3:08 ` David Ahern
2022-04-22 3:10 ` [PATCH net-next 0/3] ipv4: First steps toward removing RTO_ONLINK David Ahern
2022-04-22 11:02 ` Guillaume Nault
2022-04-22 12:50 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).