All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] ip: fix triggering of 'icmp redirect'
@ 2022-08-29 10:01 ` Nicolas Dichtel
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolas Dichtel @ 2022-08-29 10:01 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David Ahern
  Cc: netdev, Heng Qi, Edwin Brossette, kernel test robot, lkp,
	Nicolas Dichtel, stable, kernel test robot

__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.

Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.

CC: stable@vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

This code exists since more than two decades:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f

Please, feel free to comment if my analysis seems wrong.

Regards,
Nicolas

 net/ipv4/fib_frontend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index f361d3d56be2..943edf4ad4db 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	dev_match = dev_match || (res.type == RTN_LOCAL &&
 				  dev == net->loopback_dev);
 	if (dev_match) {
-		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
+		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
 		return ret;
 	}
 	if (no_addr)
@@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	ret = 0;
 	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
 		if (res.type == RTN_UNICAST)
-			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
+			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
 	}
 	return ret;
 
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net] ip: fix triggering of 'icmp redirect'
@ 2022-08-29 10:01 ` Nicolas Dichtel
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolas Dichtel @ 2022-08-29 10:01 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 2204 bytes --]

__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.

Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.

CC: stable(a)vger.kernel.org
Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

This code exists since more than two decades:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f

Please, feel free to comment if my analysis seems wrong.

Regards,
Nicolas

 net/ipv4/fib_frontend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index f361d3d56be2..943edf4ad4db 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	dev_match = dev_match || (res.type == RTN_LOCAL &&
 				  dev == net->loopback_dev);
 	if (dev_match) {
-		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
+		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
 		return ret;
 	}
 	if (no_addr)
@@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	ret = 0;
 	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
 		if (res.type == RTN_UNICAST)
-			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
+			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
 	}
 	return ret;
 
-- 
2.33.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-08-29 10:01 ` Nicolas Dichtel
@ 2022-09-01  2:30   ` David Ahern
  -1 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2022-09-01  2:30 UTC (permalink / raw)
  To: Nicolas Dichtel, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet
  Cc: netdev, Heng Qi, Edwin Brossette, kernel test robot, lkp, stable,
	kernel test robot

On 8/29/22 4:01 AM, Nicolas Dichtel wrote:
> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> Since the below patch, fib_validate_source() didn't return anymore 1 when
> both addresses are on the same network, because the route lookup returns
> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
> Let's adapat the test to return 1 again when both addresses are on the same
> link.
> 
> CC: stable@vger.kernel.org
> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Reported-by: Heng Qi <hengqi@linux.alibaba.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> This code exists since more than two decades:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
> 
> Please, feel free to comment if my analysis seems wrong.
> 
> Regards,
> Nicolas
> 
>  net/ipv4/fib_frontend.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f361d3d56be2..943edf4ad4db 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	dev_match = dev_match || (res.type == RTN_LOCAL &&
>  				  dev == net->loopback_dev);
>  	if (dev_match) {
> -		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  		return ret;
>  	}
>  	if (no_addr)
> @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	ret = 0;
>  	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
>  		if (res.type == RTN_UNICAST)
> -			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  	}
>  	return ret;
>  

Looks ok to me.

Reviewed-by: David Ahern <dsahern@kernel.org>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
@ 2022-09-01  2:30   ` David Ahern
  0 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2022-09-01  2:30 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 2409 bytes --]

On 8/29/22 4:01 AM, Nicolas Dichtel wrote:
> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> Since the below patch, fib_validate_source() didn't return anymore 1 when
> both addresses are on the same network, because the route lookup returns
> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
> Let's adapat the test to return 1 again when both addresses are on the same
> link.
> 
> CC: stable(a)vger.kernel.org
> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Reported-by: Heng Qi <hengqi@linux.alibaba.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> This code exists since more than two decades:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
> 
> Please, feel free to comment if my analysis seems wrong.
> 
> Regards,
> Nicolas
> 
>  net/ipv4/fib_frontend.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f361d3d56be2..943edf4ad4db 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	dev_match = dev_match || (res.type == RTN_LOCAL &&
>  				  dev == net->loopback_dev);
>  	if (dev_match) {
> -		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  		return ret;
>  	}
>  	if (no_addr)
> @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	ret = 0;
>  	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
>  		if (res.type == RTN_UNICAST)
> -			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  	}
>  	return ret;
>  

Looks ok to me.

Reviewed-by: David Ahern <dsahern@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-08-29 10:01 ` Nicolas Dichtel
                   ` (2 preceding siblings ...)
  (?)
@ 2022-09-01  3:00 ` patchwork-bot+netdevbpf
  -1 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-01  3:00 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: davem, kuba, pabeni, edumazet, dsahern, netdev, hengqi,
	edwin.brossette, lkp, lkp, stable, yujie.liu

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Aug 2022 12:01:21 +0200 you wrote:
> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> [...]

Here is the summary with links:
  - [net] ip: fix triggering of 'icmp redirect'
    https://git.kernel.org/netdev/net/c/eb55dc09b5dd

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-08-29 10:01 ` Nicolas Dichtel
  (?)
  (?)
@ 2022-09-01  3:00 ` patchwork-bot+netdevbpf
  -1 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-01  3:00 UTC (permalink / raw)
  To: lkp

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Aug 2022 12:01:21 +0200 you wrote:
> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> [...]

Here is the summary with links:
  - [net] ip: fix triggering of 'icmp redirect'
    https://git.kernel.org/netdev/net/c/eb55dc09b5dd

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-08-29 10:01 ` Nicolas Dichtel
                   ` (3 preceding siblings ...)
  (?)
@ 2022-09-27 12:56 ` Julian Anastasov
  2022-09-29 14:27   ` Nicolas Dichtel
  -1 siblings, 1 reply; 9+ messages in thread
From: Julian Anastasov @ 2022-09-27 12:56 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David Ahern, netdev


	Hello,

On Mon, 29 Aug 2022, Nicolas Dichtel wrote:

> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
> My understanding is that fib_validate_source() is used to know if the src
> address and the gateway address are on the same link. For that,
> fib_validate_source() returns 1 (same link) or 0 (not the same network).
> __mkroute_input() is the only user of these positive values, all other
> callers only look if the returned value is negative.
> 
> Since the below patch, fib_validate_source() didn't return anymore 1 when
> both addresses are on the same network, because the route lookup returns
> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
> Let's adapat the test to return 1 again when both addresses are on the same
> link.
> 
> CC: stable@vger.kernel.org
> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Reported-by: Heng Qi <hengqi@linux.alibaba.com>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> This code exists since more than two decades:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
> 
> Please, feel free to comment if my analysis seems wrong.
> 
> Regards,
> Nicolas
> 
>  net/ipv4/fib_frontend.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f361d3d56be2..943edf4ad4db 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	dev_match = dev_match || (res.type == RTN_LOCAL &&
>  				  dev == net->loopback_dev);
>  	if (dev_match) {
> -		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;

	Looks like I'm late here. nhc_scope is related to the
nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW 
(nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP 
and as result, rt_gw4 is directly connected. IIRC, this should look
like this:

nhc_gw	nhc_scope		rt_gw4		fib_scope (route)
---------------------------------------------------------------------------
0	RT_SCOPE_NOWHERE	Host		RT_SCOPE_HOST (local)
0	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
LOCAL1	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
REM_GW1	RT_SCOPE_LINK		Universe	RT_SCOPE_UNIVERSE (indirect)

	For the code above: we do not check res->scope,
we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE).
It means, reverse route points back to same device and sender is not
reached via gateway REM_GW1.

	By changing it to nhc_scope >= RT_SCOPE_LINK, ret always
will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253).
Note that RT_SCOPE_HOST is 254.

	Looks like calling fib_info_update_nhc_saddr() in
nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem,
it should be fib_nh->fib_nh_scope - 1 or something like that,
see below. 127.0.0.1 is selected because it is
ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is 
RT_SCOPE_HOST when GW is not provided while creating the nexthop.

	About commit 747c14307214:

- if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then
the assumption is that all added gateways are forced to be used
for routes to universe? If GW is not provided, we should use
nhc_scope = RT_SCOPE_HOST to allow link routes, right?
Now, the question is, can I use same nexthop in routes
with different scope ? What if later I add local IP that matches
the IP used in nexthop? This nexthop's GW becomes local one.
But these are corner cases.

	What I see as things to change:

- fib_check_nexthop(): "scope == RT_SCOPE_HOST",
"Route with host scope can not have multiple nexthops".
Why not? We may need to spread traffic to multiple
local IPs. But this is old problem and needs more
investigation.

- as fib_check_nh() is called (and sets nhc_scope) in
nh_create_ipv4(), i.e. when a nexthop is added, we should
tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope
based on the present GW and then to provide it to fib_check_nh()
and fib_info_update_nhc_saddr. As result, this nexthop will get
valid nhc_scope based on the current IPs and link routes and
also valid nh_saddr (not 127.0.0.1). We can do it as follows:

	.fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE :
				   RT_SCOPE_LINK,

	As result, we will also fix the scope provided to
fib_info_update_nhc_saddr which looks like the main
problem here.

- later, if created route refers to this nexthop,
fib_check_nexthop() should make sure the provided
route's scope is not >= the nhc_scope, may be a
check in nexthop_check_scope() is needed. This
will ensure that new link route is not using nexthop
with provided gateway.

- __fib_validate_source() should match for RT_SCOPE_HOST
as before.

- fib_check_nh_nongw: no GW => RT_SCOPE_HOST

	So, we return to all state but with fixed
argument to fib_info_update_nhc_saddr().

>  		return ret;
>  	}
>  	if (no_addr)
> @@ -401,7 +401,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>  	ret = 0;
>  	if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
>  		if (res.type == RTN_UNICAST)
> -			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
> +			ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>  	}
>  	return ret;
>  
> -- 
> 2.33.0

Regards

--
Julian Anastasov <ja@ssi.bg>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-09-27 12:56 ` Julian Anastasov
@ 2022-09-29 14:27   ` Nicolas Dichtel
  2022-09-29 18:43     ` Julian Anastasov
  0 siblings, 1 reply; 9+ messages in thread
From: Nicolas Dichtel @ 2022-09-29 14:27 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Ahern, netdev

Le 27/09/2022 à 14:56, Julian Anastasov a écrit :
> 
> 	Hello,
Hello,

thanks for the detailed explanation.

> 
> On Mon, 29 Aug 2022, Nicolas Dichtel wrote:
> 
>> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
>> My understanding is that fib_validate_source() is used to know if the src
>> address and the gateway address are on the same link. For that,
>> fib_validate_source() returns 1 (same link) or 0 (not the same network).
>> __mkroute_input() is the only user of these positive values, all other
>> callers only look if the returned value is negative.
>>
>> Since the below patch, fib_validate_source() didn't return anymore 1 when
>> both addresses are on the same network, because the route lookup returns
>> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
>> Let's adapat the test to return 1 again when both addresses are on the same
>> link.
>>
>> CC: stable@vger.kernel.org
>> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
>> Reported-by: kernel test robot <yujie.liu@intel.com>
>> Reported-by: Heng Qi <hengqi@linux.alibaba.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>>
>> This code exists since more than two decades:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
>>
>> Please, feel free to comment if my analysis seems wrong.
>>
>> Regards,
>> Nicolas
>>
>>  net/ipv4/fib_frontend.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index f361d3d56be2..943edf4ad4db 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>>  	dev_match = dev_match || (res.type == RTN_LOCAL &&
>>  				  dev == net->loopback_dev);
>>  	if (dev_match) {
>> -		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
>> +		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
> 
> 	Looks like I'm late here. nhc_scope is related to the
> nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW 
> (nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP 
> and as result, rt_gw4 is directly connected. IIRC, this should look
> like this:
> 
> nhc_gw	nhc_scope		rt_gw4		fib_scope (route)
> ---------------------------------------------------------------------------
> 0	RT_SCOPE_NOWHERE	Host		RT_SCOPE_HOST (local)
> 0	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
> LOCAL1	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
> REM_GW1	RT_SCOPE_LINK		Universe	RT_SCOPE_UNIVERSE (indirect)
> 
> 	For the code above: we do not check res->scope,
> we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE).
> It means, reverse route points back to same device and sender is not
> reached via gateway REM_GW1.
iproute2 reject a gw which is not directly connected, am I wrong?

> 
> 	By changing it to nhc_scope >= RT_SCOPE_LINK, ret always
> will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253).
> Note that RT_SCOPE_HOST is 254.
Do you have a setup which shows the problem?

> 
> 	Looks like calling fib_info_update_nhc_saddr() in
> nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem,
> it should be fib_nh->fib_nh_scope - 1 or something like that,
> see below. 127.0.0.1 is selected because it is
> ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is 
> RT_SCOPE_HOST when GW is not provided while creating the nexthop.
> 
> 	About commit 747c14307214:
> 
> - if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then
> the assumption is that all added gateways are forced to be used
> for routes to universe? If GW is not provided, we should use
> nhc_scope = RT_SCOPE_HOST to allow link routes, right?
> Now, the question is, can I use same nexthop in routes
> with different scope ? What if later I add local IP that matches
> the IP used in nexthop? This nexthop's GW becomes local one.
> But these are corner cases.
> 
> 	What I see as things to change:
> 
> - fib_check_nexthop(): "scope == RT_SCOPE_HOST",
> "Route with host scope can not have multiple nexthops".
> Why not? We may need to spread traffic to multiple
> local IPs. But this is old problem and needs more
> investigation.
> 
> - as fib_check_nh() is called (and sets nhc_scope) in
> nh_create_ipv4(), i.e. when a nexthop is added, we should
> tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope
> based on the present GW and then to provide it to fib_check_nh()
> and fib_info_update_nhc_saddr. As result, this nexthop will get
> valid nhc_scope based on the current IPs and link routes and
> also valid nh_saddr (not 127.0.0.1). We can do it as follows:
> 
> 	.fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE :
> 				   RT_SCOPE_LINK,
> 
> 	As result, we will also fix the scope provided to
> fib_info_update_nhc_saddr which looks like the main
> problem here.
> 
> - later, if created route refers to this nexthop,
> fib_check_nexthop() should make sure the provided
> route's scope is not >= the nhc_scope, may be a
> check in nexthop_check_scope() is needed. This
> will ensure that new link route is not using nexthop
> with provided gateway.
> 
> - __fib_validate_source() should match for RT_SCOPE_HOST
> as before.
> 
> - fib_check_nh_nongw: no GW => RT_SCOPE_HOST
After reverting the two commits (747c14307214 and eb55dc09b5dd) and putting the
below patch, the initial problem is fixed. But it's not clear what is broken
with the current code. Before sending these patches formally, it would be nice
to be able to add a selftest to demonstrate what is wrong.

@@ -2534,7 +2534,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh,
 	if (!err) {
 		nh->nh_flags = fib_nh->fib_nh_flags;
 		fib_info_update_nhc_saddr(net, &fib_nh->nh_common,
-					  fib_nh->fib_nh_scope);
+					  !fib_nh->fib_nh_scope ? 0 : fib_nh->fib_nh_scope - 1);
 	} else {
 		fib_nh_release(net, fib_nh);
 	}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net] ip: fix triggering of 'icmp redirect'
  2022-09-29 14:27   ` Nicolas Dichtel
@ 2022-09-29 18:43     ` Julian Anastasov
  0 siblings, 0 replies; 9+ messages in thread
From: Julian Anastasov @ 2022-09-29 18:43 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David Ahern, netdev

[-- Attachment #1: Type: text/plain, Size: 2882 bytes --]


	Hello,

On Thu, 29 Sep 2022, Nicolas Dichtel wrote:

> Le 27/09/2022 à 14:56, Julian Anastasov a écrit :
> > 
> > nhc_gw	nhc_scope		rt_gw4		fib_scope (route)
> > ---------------------------------------------------------------------------
> > 0	RT_SCOPE_NOWHERE	Host		RT_SCOPE_HOST (local)
> > 0	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
> > LOCAL1	RT_SCOPE_HOST		LAN_TARGET	RT_SCOPE_LINK (link)
> > REM_GW1	RT_SCOPE_LINK		Universe	RT_SCOPE_UNIVERSE (indirect)
> > 
> > 	For the code above: we do not check res->scope,
> > we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE).
> > It means, reverse route points back to same device and sender is not
> > reached via gateway REM_GW1.

	In short, to send redirects, sender should be
reachable via link route (with nhc_scope = RT_SCOPE_HOST).

> iproute2 reject a gw which is not directly connected, am I wrong?

	ip tool can not do it. It only provides route's scope
specified by user and the GW's scope is determined by 
fib_check_nh_v4_gw() as route's scope + 1 but at least RT_SCOPE_LINK:

                /* It is not necessary, but requires a bit of thinking */
                if (fl4.flowi4_scope < RT_SCOPE_LINK)
                        fl4.flowi4_scope = RT_SCOPE_LINK;

	The other allowed value for nhc_scope when rt_gw4 is not 0 is
RT_SCOPE_HOST (GW is a local IP, useful when autoselecting source
address from same subnet). It is set by fib_check_nh_v4_gw() when
res.type = RTN_LOCAL.

> > 	By changing it to nhc_scope >= RT_SCOPE_LINK, ret always
> > will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253).
> > Note that RT_SCOPE_HOST is 254.
> Do you have a setup which shows the problem?

	No, just by analyze. RT_SCOPE_LINK indicates sender
is reached via GW.

> After reverting the two commits (747c14307214 and eb55dc09b5dd) and putting the
> below patch, the initial problem is fixed. But it's not clear what is broken
> with the current code. Before sending these patches formally, it would be nice
> to be able to add a selftest to demonstrate what is wrong.

	What is broken? I guess, __fib_validate_source always
returns 1 causing redirects.

	As for nh_create_ipv4(), may be using scope=0 as
arg to fib_check_nh() should work. Now I can not find example
for corner case where this can fail.

> @@ -2534,7 +2534,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh,
>  	if (!err) {
>  		nh->nh_flags = fib_nh->fib_nh_flags;
>  		fib_info_update_nhc_saddr(net, &fib_nh->nh_common,
> -					  fib_nh->fib_nh_scope);
> +					  !fib_nh->fib_nh_scope ? 0 : fib_nh->fib_nh_scope - 1);

	And this fix is needed to not expose scope host
saddr (127.0.0.1) when nexthop is without GW and to
not expose scope link saddr when nexthop is with GW (for
traffic via scope global routes).

>  	} else {
>  		fib_nh_release(net, fib_nh);
>  	}

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-09-29 18:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-29 10:01 [PATCH net] ip: fix triggering of 'icmp redirect' Nicolas Dichtel
2022-08-29 10:01 ` Nicolas Dichtel
2022-09-01  2:30 ` David Ahern
2022-09-01  2:30   ` David Ahern
2022-09-01  3:00 ` patchwork-bot+netdevbpf
2022-09-01  3:00 ` patchwork-bot+netdevbpf
2022-09-27 12:56 ` Julian Anastasov
2022-09-29 14:27   ` Nicolas Dichtel
2022-09-29 18:43     ` Julian Anastasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.