Re: [PATCH net] ipv4: fix route lookups when handling ICMP redirects and PMTU updates

From: David Ahern <dsahern@kernel.org>
To: Guillaume Nault <gnault@redhat.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Subject: Re: [PATCH net] ipv4: fix route lookups when handling ICMP redirects and PMTU updates
Date: Mon, 28 Feb 2022 21:31:09 -0700	[thread overview]
Message-ID: <faca5750-911d-151f-d5fa-7a8ed3b43b08@kernel.org> (raw)
In-Reply-To: <20220228205440.GA24680@debian.home>

On 2/28/22 1:54 PM, Guillaume Nault wrote:
> On Mon, Feb 28, 2022 at 10:31:58AM -0700, David Ahern wrote:
>> On 2/28/22 10:16 AM, Guillaume Nault wrote:
>>> Fixes: d3a25c980fc2 ("ipv4: Fix nexthop exception hash computation.")
>>
>> That does not seem related to tos in the flow struct at all.
> 
> Ouch, copy/paste mistake.
> I meant 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), which is
> the next commit with 'git log -- net/ipv4/route.c'.
> Really sorry :/, and thanks a lot for catching that!
> 
>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>> index f33ad1f383b6..d5d058de3664 100644
>>> --- a/net/ipv4/route.c
>>> +++ b/net/ipv4/route.c
>>> @@ -499,6 +499,15 @@ void __ip_select_ident(struct net *net, struct iphdr *iph, int segs)
>>>  }
>>>  EXPORT_SYMBOL(__ip_select_ident);
>>>  
>>> +static void ip_rt_fix_tos(struct flowi4 *fl4)
>>
>> make this a static inline in include/net/flow.h and update
>> flowi4_init_output and flowi4_update_output to use it. That should cover
>> a few of the cases below leaving just  ...
> 
> Hum, I didn't think about this option, but it looks risky to me. As I
> put it in note 1, ip_route_output_key_hash() unconditionally sets
> ->flowi4_scope, assuming it can infer the scope from the RTO_ONLINK bit
> of ->flowi4_tos. If we santise these fields in flowi4_init_output()
> (and flowi4_update_output()), then ip_route_output_key_hash() would
> sometimes work on already santised values and sometimes not. So it
> wouldn't know if it should initialise ->flowi4_scope.
> 
> We could decide to let ip_route_output_key_hash() initialise
> ->flowi4_scope only when the RTO_ONLINK bit is set, which
> guarantees that we don't have sanitised values. But before that, we'd
> need to audit all other callers, to verify that they correctly
> initialise the ->flowi4_scope with RT_SCOPE_UNIVERSE, since
> ip_route_output_key_hash() isn't going do it for them anymore.
> I'll audit all these callers, but that should be something for
> net-next.

I'm not following the response. You are moving the tos logic from
ip_route_output_key_hash to a helper and calling the new helper for
other fib lookups. My suggestion was to correctly set / fixup the tos
and scope when flowi4 is initialized (reducing the number of places the
fixup is needed) and recognizing below that ip_route_output_key_hash
still needs the call to the new ip_rt_fix_tos.

> 
>>> @@ -2613,9 +2625,7 @@ struct rtable *ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>>>  	struct rtable *rth;
>>>  
>>>  	fl4->flowi4_iif = LOOPBACK_IFINDEX;
>>> -	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
>>> -	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
>>> -			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
>>> +	ip_rt_fix_tos(fl4);
>>
>> ... this one to call the new helper.
> 
> BTW, here's a bit more about the context around this patch.
> I found the problem while working on removing the use of RTO_ONLINK, so
> that ->flowi4_tos could be converted to dscp_t.
> 
> The objective is to modify callers so that they'd set ->flowi4_scope
> directly, instead using RTO_ONLINK to mark their intention (and that's
> why I said I'd have to audit them anyway).
> 
> Once that will be done, ip_rt_fix_tos() won't have to touch the scope
> anymore. And once ->flowi4_tos will be converted to dscp_t, we'll can
> remove that function entirely since dscp_t ensures ECN bits are cleared
> (IPTOS_RT_MASK also ensures that high order bits are cleared too, but
> that's redundant with the RT_TOS() calls already done by callers, and
> which somewhat aren't really desirable anyway).
>