netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Julian Anastasov <ja@ssi.bg>
To: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: David Ahern <dsahern@kernel.org>, netdev@vger.kernel.org
Subject: Re: [PATCH net] ip: fix triggering of 'icmp redirect'
Date: Tue, 27 Sep 2022 15:56:15 +0300 (EEST)	[thread overview]
Message-ID: <6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg> (raw)
In-Reply-To: <20220829100121.3821-1-nicolas.dichtel@6wind.com>


	Hello,

On Mon, 29 Aug 2022, Nicolas Dichtel wrote:

> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> Since the below patch, fib_validate_source() didn't return anymore 1 when
> both addresses are on the same network, because the route lookup returns
> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
> Let's adapat the test to return 1 again when both addresses are on the same
> link.
> 
> CC: stable@vger.kernel.org
> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Reported-by: Heng Qi <hengqi@linux.alibaba.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> This code exists since more than two decades:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
> 
> Please, feel free to comment if my analysis seems wrong.
> 
> Regards,
> Nicolas
> 
>  net/ipv4/fib_frontend.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f361d3d56be2..943edf4ad4db 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	dev_match = dev_match || (res.type == RTN_LOCAL &&
>  				  dev == net->loopback_dev);
>  	if (dev_match) {
> -		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;

	Looks like I'm late here. nhc_scope is related to the
nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW 
(nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP 
and as result, rt_gw4 is directly connected. IIRC, this should look
like this:

nhc_gw	nhc_scope		rt_gw4		fib_scope (route)
---------------------------------------------------------------------------
0	RT_SCOPE_NOWHERE	Host		RT_SCOPE_HOST (local)
0	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
LOCAL1	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
REM_GW1	RT_SCOPE_LINK		Universe	RT_SCOPE_UNIVERSE (indirect)

	For the code above: we do not check res->scope,
we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE).
It means, reverse route points back to same device and sender is not
reached via gateway REM_GW1.

	By changing it to nhc_scope >= RT_SCOPE_LINK, ret always
will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253).
Note that RT_SCOPE_HOST is 254.

	Looks like calling fib_info_update_nhc_saddr() in
nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem,
it should be fib_nh->fib_nh_scope - 1 or something like that,
see below. 127.0.0.1 is selected because it is
ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is 
RT_SCOPE_HOST when GW is not provided while creating the nexthop.

	About commit 747c14307214:

- if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then
the assumption is that all added gateways are forced to be used
for routes to universe? If GW is not provided, we should use
nhc_scope = RT_SCOPE_HOST to allow link routes, right?
Now, the question is, can I use same nexthop in routes
with different scope ? What if later I add local IP that matches
the IP used in nexthop? This nexthop's GW becomes local one.
But these are corner cases.

	What I see as things to change:

- fib_check_nexthop(): "scope == RT_SCOPE_HOST",
"Route with host scope can not have multiple nexthops".
Why not? We may need to spread traffic to multiple
local IPs. But this is old problem and needs more
investigation.

- as fib_check_nh() is called (and sets nhc_scope) in
nh_create_ipv4(), i.e. when a nexthop is added, we should
tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope
based on the present GW and then to provide it to fib_check_nh()
and fib_info_update_nhc_saddr. As result, this nexthop will get
valid nhc_scope based on the current IPs and link routes and
also valid nh_saddr (not 127.0.0.1). We can do it as follows:

	.fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE :
				   RT_SCOPE_LINK,

	As result, we will also fix the scope provided to
fib_info_update_nhc_saddr which looks like the main
problem here.

- later, if created route refers to this nexthop,
fib_check_nexthop() should make sure the provided
route's scope is not >= the nhc_scope, may be a
check in nexthop_check_scope() is needed. This
will ensure that new link route is not using nexthop
with provided gateway.

- __fib_validate_source() should match for RT_SCOPE_HOST
as before.

- fib_check_nh_nongw: no GW => RT_SCOPE_HOST

	So, we return to all state but with fixed
argument to fib_info_update_nhc_saddr().

>  		return ret;
>  	}
>  	if (no_addr)
> @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	ret = 0;
>  	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
>  		if (res.type == RTN_UNICAST)
> -			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  	}
>  	return ret;
>  
> -- 
> 2.33.0

Regards

--
Julian Anastasov <ja@ssi.bg>


  parent reply	other threads:[~2022-09-27 12:56 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-29 10:01 [PATCH net] ip: fix triggering of 'icmp redirect' Nicolas Dichtel
2022-09-01  2:30 ` David Ahern
2022-09-01  3:00 ` patchwork-bot+netdevbpf
2022-09-27 12:56 ` Julian Anastasov [this message]
2022-09-29 14:27   ` Nicolas Dichtel
2022-09-29 18:43     ` Julian Anastasov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg \
    --to=ja@ssi.bg \
    --cc=dsahern@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).