All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
@ 2017-04-20 12:58 Robert Shearman
  2017-04-20 14:21 ` David Ahern
  2017-04-20 22:18 ` [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check David Ahern
  0 siblings, 2 replies; 12+ messages in thread
From: Robert Shearman @ 2017-04-20 12:58 UTC (permalink / raw)
  To: davem; +Cc: netdev, David Ahern, Robert Shearman

David reported that doing the following:

    ip li add red type vrf table 10
    ip link set dev eth1 vrf red
    ip addr add 127.0.0.1/8 dev red
    ip link set dev eth1 up
    ip li set red up
    ping -c1 -w1 -I red 127.0.0.1
    ip li del red

results in a hang with this message:

    unregister_netdevice: waiting for red to become free. Usage count = 1

The problem is caused by caching the dst used for sending the packet
out of the specified interface on the route that the lookup returned
from the local table when the rule for the lookup in the local table
is ordered before the rule for lookups using l3mdevs. Thus the dst
could stay around until the route in the local table is deleted which
may be never.

Address the problem by not allocating a cacheable output dst if
FLOWI_FLAG_SKIP_NH_OIF is set and the nh device differs from the
device used for the dst.

Fixes: ebfc102c566d ("net: vrf: Flip IPv4 output path from FIB lookup hook to out hook")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/ipv4/route.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index acd69cfe2951..f667783ffd19 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2125,6 +2125,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		fi = NULL;
 	}
 
+	/* If the flag to skip the nh oif check is set then the output
+	 * device may not match the nh device, so cannot use or add to
+	 * cache in that case.
+	 */
+	if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF &&
+		     FIB_RES_NH(*res).nh_dev != dev_out))
+		do_cache = false;
+
 	fnhe = NULL;
 	do_cache &= fi != NULL;
 	if (do_cache) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 12:58 [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check Robert Shearman
@ 2017-04-20 14:21 ` David Ahern
  2017-04-20 14:39   ` Robert Shearman
  2017-04-20 22:18 ` [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check David Ahern
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2017-04-20 14:21 UTC (permalink / raw)
  To: Robert Shearman, davem; +Cc: netdev

On 4/20/17 6:58 AM, Robert Shearman wrote:
> David reported that doing the following:
> 
>     ip li add red type vrf table 10
>     ip link set dev eth1 vrf red
>     ip addr add 127.0.0.1/8 dev red
>     ip link set dev eth1 up
>     ip li set red up
>     ping -c1 -w1 -I red 127.0.0.1
>     ip li del red
> 
> results in a hang with this message:
> 
>     unregister_netdevice: waiting for red to become free. Usage count = 1

I think you misunderstood my comment. The above works fine today. There
is no bug with refcnts.

It breaks with your patches wanting to use a VRF device with the main
table (254).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 14:21 ` David Ahern
@ 2017-04-20 14:39   ` Robert Shearman
  2017-04-20 14:59     ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Shearman @ 2017-04-20 14:39 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev

On 20/04/17 15:21, David Ahern wrote:
> On 4/20/17 6:58 AM, Robert Shearman wrote:
>> David reported that doing the following:
>>
>>     ip li add red type vrf table 10
>>     ip link set dev eth1 vrf red
>>     ip addr add 127.0.0.1/8 dev red
>>     ip link set dev eth1 up
>>     ip li set red up
>>     ping -c1 -w1 -I red 127.0.0.1
>>     ip li del red
>>
>> results in a hang with this message:
>>
>>     unregister_netdevice: waiting for red to become free. Usage count = 1
>
> I think you misunderstood my comment. The above works fine today. There
> is no bug with refcnts.
>
> It breaks with your patches wanting to use a VRF device with the main
> table (254).

That doesn't seem to match with my experience. I can reproduce this on 
the net tree with the listed commands and the behaviour matches what I 
see in the code.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 14:39   ` Robert Shearman
@ 2017-04-20 14:59     ` David Ahern
  2017-04-20 15:05       ` Robert Shearman
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2017-04-20 14:59 UTC (permalink / raw)
  To: Robert Shearman, davem; +Cc: netdev

On 4/20/17 8:39 AM, Robert Shearman wrote:
> On 20/04/17 15:21, David Ahern wrote:
>> On 4/20/17 6:58 AM, Robert Shearman wrote:
>>> David reported that doing the following:
>>>
>>>     ip li add red type vrf table 10
>>>     ip link set dev eth1 vrf red
>>>     ip addr add 127.0.0.1/8 dev red
>>>     ip link set dev eth1 up
>>>     ip li set red up
>>>     ping -c1 -w1 -I red 127.0.0.1
>>>     ip li del red
>>>
>>> results in a hang with this message:
>>>
>>>     unregister_netdevice: waiting for red to become free. Usage count
>>> = 1
>>
>> I think you misunderstood my comment. The above works fine today. There
>> is no bug with refcnts.
>>
>> It breaks with your patches wanting to use a VRF device with the main
>> table (254).
> 
> That doesn't seem to match with my experience. I can reproduce this on
> the net tree with the listed commands and the behaviour matches what I
> see in the code.

Our mileage varies:

root@kenny-jessie3:~# ip li add red type vrf table 10
root@kenny-jessie3:~#     ip link set dev eth1 vrf red
root@kenny-jessie3:~#     ip addr add 127.0.0.1/8 dev red
root@kenny-jessie3:~#     ip link set dev eth1 up
root@kenny-jessie3:~#     ip li set red up
root@kenny-jessie3:~#     ping -c1 -w1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.059 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.059/0.059/0.059/0.000 ms
root@kenny-jessie3:~#     ip li del red
root@kenny-jessie3:~# uname -a
Linux kenny-jessie3 4.11.0-rc6+ #8 SMP Wed Apr 19 11:53:48 PDT 2017
x86_64 GNU/Linux

The above is one of many tests I run and never hit a problem deleting a
VRF device. dst's attached to fnhe_rth_output and fnhe_rth_input appear
to be properly flushed and device references released when the device is
deleted (NETDEV_DOWN and then NETDEV_UNREGISTER).

Can you send me your kernel config and "sysctl -a --pattern 'net.ipv4'"?
Perhaps you have something enabled I don't.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 14:59     ` David Ahern
@ 2017-04-20 15:05       ` Robert Shearman
  2017-04-20 15:16         ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Shearman @ 2017-04-20 15:05 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev

On 20/04/17 15:59, David Ahern wrote:
> On 4/20/17 8:39 AM, Robert Shearman wrote:
>> On 20/04/17 15:21, David Ahern wrote:
>>> On 4/20/17 6:58 AM, Robert Shearman wrote:
>>>> David reported that doing the following:
>>>>
>>>>     ip li add red type vrf table 10
>>>>     ip link set dev eth1 vrf red
>>>>     ip addr add 127.0.0.1/8 dev red
>>>>     ip link set dev eth1 up
>>>>     ip li set red up
>>>>     ping -c1 -w1 -I red 127.0.0.1
>>>>     ip li del red
>>>>
>>>> results in a hang with this message:
>>>>
>>>>     unregister_netdevice: waiting for red to become free. Usage count
>>>> = 1
>>>
>>> I think you misunderstood my comment. The above works fine today. There
>>> is no bug with refcnts.
>>>
>>> It breaks with your patches wanting to use a VRF device with the main
>>> table (254).
>>
>> That doesn't seem to match with my experience. I can reproduce this on
>> the net tree with the listed commands and the behaviour matches what I
>> see in the code.
>
> Our mileage varies:
>
> root@kenny-jessie3:~# ip li add red type vrf table 10
> root@kenny-jessie3:~#     ip link set dev eth1 vrf red
> root@kenny-jessie3:~#     ip addr add 127.0.0.1/8 dev red
> root@kenny-jessie3:~#     ip link set dev eth1 up
> root@kenny-jessie3:~#     ip li set red up
> root@kenny-jessie3:~#     ping -c1 -w1 -I red 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.059 ms
>
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.059/0.059/0.059/0.000 ms
> root@kenny-jessie3:~#     ip li del red
> root@kenny-jessie3:~# uname -a
> Linux kenny-jessie3 4.11.0-rc6+ #8 SMP Wed Apr 19 11:53:48 PDT 2017
> x86_64 GNU/Linux
>
> The above is one of many tests I run and never hit a problem deleting a
> VRF device. dst's attached to fnhe_rth_output and fnhe_rth_input appear
> to be properly flushed and device references released when the device is
> deleted (NETDEV_DOWN and then NETDEV_UNREGISTER).
>
> Can you send me your kernel config and "sysctl -a --pattern 'net.ipv4'"?
> Perhaps you have something enabled I don't.

The key thing I think is the ip rules:

$ ip rule
0:	from all lookup local
1000:	from all lookup [l3mdev-table]
32766:	from all lookup main
32767:	from all lookup default

Maybe you have the local rule moved down at startup?

I'll send you the requested information if not.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 15:05       ` Robert Shearman
@ 2017-04-20 15:16         ` David Ahern
  2017-04-20 15:35           ` Robert Shearman
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2017-04-20 15:16 UTC (permalink / raw)
  To: Robert Shearman, davem; +Cc: netdev

On 4/20/17 9:05 AM, Robert Shearman wrote:
> The key thing I think is the ip rules:
> 
> $ ip rule
> 0:    from all lookup local
> 1000:    from all lookup [l3mdev-table]
> 32766:    from all lookup main
> 32767:    from all lookup default
> 
> Maybe you have the local rule moved down at startup?

yes that would be a problem. With this test the 127.0.0.1 lookup is
matching on the wrong table:

perf probe fib_table_lookup%return ret=%ax
perf record -e fib:*,probe:* -a -- ping -c1 -w1 -I red 127.0.0.1
...
perf script

            ping  2803 [000] 70559.086446: fib:fib_table_lookup: table
255 oif 31 iif 1 src 0.0.0.0 dst 127.0.0.1 tos 0 scope 0 flags 4
            ping  2803 [000] 70559.086451: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086453: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0
            ping  2803 [000] 70559.086458: fib:fib_table_lookup: table
255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 4
            ping  2803 [000] 70559.086459: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086460: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0
            ping  2803 [000] 70559.086752: fib:fib_table_lookup: table
255 oif 31 iif 1 src 0.0.0.0 dst 127.0.0.1 tos 0 scope 0 flags 4
            ping  2803 [000] 70559.086754: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086755: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0
            ping  2803 [000] 70559.086768: fib:fib_table_lookup: table
255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 5
            ping  2803 [000] 70559.086769: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086770: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0
            ping  2803 [000] 70559.086781: fib:fib_table_lookup: table
255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 4
            ping  2803 [000] 70559.086782: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086782: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0
            ping  2803 [000] 70559.086787: fib:fib_table_lookup: table
255 oif 31 iif 1 src 127.0.0.1 dst 127.0.0.1 tos 0 scope 0 flags 5
            ping  2803 [000] 70559.086788: fib:fib_table_lookup_nh:
nexthop dev lo oif 1 src 127.0.0.1
            ping  2803 [000] 70559.086789: probe:fib_table_lookup:
(ffffffff8146aaaa <- ffffffff81470734) ret=0x0

Table 255 is the wrong table. That is the ultimate problem here.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 15:16         ` David Ahern
@ 2017-04-20 15:35           ` Robert Shearman
  2017-04-21 20:34             ` [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route Robert Shearman
  0 siblings, 1 reply; 12+ messages in thread
From: Robert Shearman @ 2017-04-20 15:35 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev

On 20/04/17 16:16, David Ahern wrote:
> On 4/20/17 9:05 AM, Robert Shearman wrote:
>> The key thing I think is the ip rules:
>>
>> $ ip rule
>> 0:    from all lookup local
>> 1000:    from all lookup [l3mdev-table]
>> 32766:    from all lookup main
>> 32767:    from all lookup default
>>
>> Maybe you have the local rule moved down at startup?
>
> yes that would be a problem. With this test the 127.0.0.1 lookup is
> matching on the wrong table:
>
> perf probe fib_table_lookup%return ret=%ax
> perf record -e fib:*,probe:* -a -- ping -c1 -w1 -I red 127.0.0.1
> ...
> perf script
>
...
> Table 255 is the wrong table. That is the ultimate problem here.
>

The table used for the lookup could be retrieved from the oif, but the 
check I introduced would still be required to cope with the unusual, but 
not disallowed, case of two VRF master devices referring to the same table.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 12:58 [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check Robert Shearman
  2017-04-20 14:21 ` David Ahern
@ 2017-04-20 22:18 ` David Ahern
  2017-04-21 17:17   ` Robert Shearman
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2017-04-20 22:18 UTC (permalink / raw)
  To: Robert Shearman, davem; +Cc: netdev

On 4/20/17 6:58 AM, Robert Shearman wrote:
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index acd69cfe2951..f667783ffd19 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2125,6 +2125,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
>  		fi = NULL;
>  	}
>  
> +	/* If the flag to skip the nh oif check is set then the output
> +	 * device may not match the nh device, so cannot use or add to
> +	 * cache in that case.
> +	 */
> +	if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF &&
> +		     FIB_RES_NH(*res).nh_dev != dev_out))
> +		do_cache = false;
> +
>  	fnhe = NULL;
>  	do_cache &= fi != NULL;
>  	if (do_cache) {
> 

I believe this is a better fix:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5e1e60546fce..fb74a16958af 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2407,7 +2407,7 @@ struct rtable *__ip_route_output_key_hash(struct
net *net, struct flowi4 *fl4,
                }

                /* L3 master device is the loopback for that domain */
-               dev_out = l3mdev_master_dev_rcu(dev_out) ? :
net->loopback_dev;
+               dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
net->loopback_dev;
                fl4->flowi4_oif = dev_out->ifindex;
                flags |= RTCF_LOCAL;
                goto make_route;

Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")

With your change above, references to vrf devices are still taken
(dev_out is the vrf device based on the flow struct) even though the
route's nexthop is in another domain. And the commit log should
reference the use case which is policy routing overriding the VRF rule.

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check
  2017-04-20 22:18 ` [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check David Ahern
@ 2017-04-21 17:17   ` Robert Shearman
  0 siblings, 0 replies; 12+ messages in thread
From: Robert Shearman @ 2017-04-21 17:17 UTC (permalink / raw)
  To: David Ahern, davem; +Cc: netdev

On 20/04/17 23:18, David Ahern wrote:
> On 4/20/17 6:58 AM, Robert Shearman wrote:
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index acd69cfe2951..f667783ffd19 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2125,6 +2125,14 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
>>  		fi = NULL;
>>  	}
>>
>> +	/* If the flag to skip the nh oif check is set then the output
>> +	 * device may not match the nh device, so cannot use or add to
>> +	 * cache in that case.
>> +	 */
>> +	if (unlikely(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF &&
>> +		     FIB_RES_NH(*res).nh_dev != dev_out))
>> +		do_cache = false;
>> +
>>  	fnhe = NULL;
>>  	do_cache &= fi != NULL;
>>  	if (do_cache) {
>>
>
> I believe this is a better fix:
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 5e1e60546fce..fb74a16958af 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2407,7 +2407,7 @@ struct rtable *__ip_route_output_key_hash(struct
> net *net, struct flowi4 *fl4,
>                 }
>
>                 /* L3 master device is the loopback for that domain */
> -               dev_out = l3mdev_master_dev_rcu(dev_out) ? :
> net->loopback_dev;
> +               dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
> net->loopback_dev;
>                 fl4->flowi4_oif = dev_out->ifindex;
>                 flags |= RTCF_LOCAL;
>                 goto make_route;
>
> Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
>
> With your change above, references to vrf devices are still taken
> (dev_out is the vrf device based on the flow struct) even though the
> route's nexthop is in another domain. And the commit log should
> reference the use case which is policy routing overriding the VRF rule.

That is indeed a nicer fix - it survives all of my local testing. Thanks 
for correcting the fixes tag too.

I had included this text in the commit message to capture the condition 
of the rules ordering: "when the rule for the lookup in the local table
is ordered before the rule for lookups using l3mdevs". However, I'll try 
to make it more prominent and expand it to note the policy routing use 
case too.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route
  2017-04-20 15:35           ` Robert Shearman
@ 2017-04-21 20:34             ` Robert Shearman
  2017-04-21 20:37               ` David Ahern
  2017-04-24 16:52               ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Robert Shearman @ 2017-04-21 20:34 UTC (permalink / raw)
  To: davem; +Cc: netdev, David Ahern, Robert Shearman

David reported that doing the following:

    ip li add red type vrf table 10
    ip link set dev eth1 vrf red
    ip addr add 127.0.0.1/8 dev red
    ip link set dev eth1 up
    ip li set red up
    ping -c1 -w1 -I red 127.0.0.1
    ip li del red

when either policy routing IP rules are present or the local table
lookup ip rule is before the l3mdev lookup results in a hang with
these messages:

    unregister_netdevice: waiting for red to become free. Usage count = 1

The problem is caused by caching the dst used for sending the packet
out of the specified interface on a local route with a different
nexthop interface. Thus the dst could stay around until the route in
the table the lookup was done is deleted which may be never.

Address the problem by not forcing output device to be the l3mdev in
the flow's output interface if the lookup didn't use the l3mdev. This
then results in the dst using the right device according to the route.

Changes in v2:
 - make the dev_out passed in by __ip_route_output_key_hash correct
   instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
   suggested by David.

Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Suggested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index acd69cfe2951..d9724889ff09 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2359,7 +2359,8 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
 		}
 
 		/* L3 master device is the loopback for that domain */
-		dev_out = l3mdev_master_dev_rcu(dev_out) ? : net->loopback_dev;
+		dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
+			net->loopback_dev;
 		fl4->flowi4_oif = dev_out->ifindex;
 		flags |= RTCF_LOCAL;
 		goto make_route;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route
  2017-04-21 20:34             ` [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route Robert Shearman
@ 2017-04-21 20:37               ` David Ahern
  2017-04-24 16:52               ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Ahern @ 2017-04-21 20:37 UTC (permalink / raw)
  To: Robert Shearman, davem; +Cc: netdev

On 4/21/17 2:34 PM, Robert Shearman wrote:
> David reported that doing the following:
> 
>     ip li add red type vrf table 10
>     ip link set dev eth1 vrf red
>     ip addr add 127.0.0.1/8 dev red
>     ip link set dev eth1 up
>     ip li set red up
>     ping -c1 -w1 -I red 127.0.0.1
>     ip li del red
> 
> when either policy routing IP rules are present or the local table
> lookup ip rule is before the l3mdev lookup results in a hang with
> these messages:
> 
>     unregister_netdevice: waiting for red to become free. Usage count = 1
> 
> The problem is caused by caching the dst used for sending the packet
> out of the specified interface on a local route with a different
> nexthop interface. Thus the dst could stay around until the route in
> the table the lookup was done is deleted which may be never.
> 
> Address the problem by not forcing output device to be the l3mdev in
> the flow's output interface if the lookup didn't use the l3mdev. This
> then results in the dst using the right device according to the route.
> 
> Changes in v2:
>  - make the dev_out passed in by __ip_route_output_key_hash correct
>    instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
>    suggested by David.
> 
> Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Suggested-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/ipv4/route.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index acd69cfe2951..d9724889ff09 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2359,7 +2359,8 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4,
>  		}
>  
>  		/* L3 master device is the loopback for that domain */
> -		dev_out = l3mdev_master_dev_rcu(dev_out) ? : net->loopback_dev;
> +		dev_out = l3mdev_master_dev_rcu(FIB_RES_DEV(res)) ? :
> +			net->loopback_dev;
>  		fl4->flowi4_oif = dev_out->ifindex;
>  		flags |= RTCF_LOCAL;
>  		goto make_route;
> 

LGTM

Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route
  2017-04-21 20:34             ` [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route Robert Shearman
  2017-04-21 20:37               ` David Ahern
@ 2017-04-24 16:52               ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2017-04-24 16:52 UTC (permalink / raw)
  To: rshearma; +Cc: netdev, dsa

From: Robert Shearman <rshearma@brocade.com>
Date: Fri, 21 Apr 2017 21:34:59 +0100

> David reported that doing the following:
> 
>     ip li add red type vrf table 10
>     ip link set dev eth1 vrf red
>     ip addr add 127.0.0.1/8 dev red
>     ip link set dev eth1 up
>     ip li set red up
>     ping -c1 -w1 -I red 127.0.0.1
>     ip li del red
> 
> when either policy routing IP rules are present or the local table
> lookup ip rule is before the l3mdev lookup results in a hang with
> these messages:
> 
>     unregister_netdevice: waiting for red to become free. Usage count = 1
> 
> The problem is caused by caching the dst used for sending the packet
> out of the specified interface on a local route with a different
> nexthop interface. Thus the dst could stay around until the route in
> the table the lookup was done is deleted which may be never.
> 
> Address the problem by not forcing output device to be the l3mdev in
> the flow's output interface if the lookup didn't use the l3mdev. This
> then results in the dst using the right device according to the route.
> 
> Changes in v2:
>  - make the dev_out passed in by __ip_route_output_key_hash correct
>    instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
>    suggested by David.
> 
> Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Suggested-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-04-24 16:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-20 12:58 [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check Robert Shearman
2017-04-20 14:21 ` David Ahern
2017-04-20 14:39   ` Robert Shearman
2017-04-20 14:59     ` David Ahern
2017-04-20 15:05       ` Robert Shearman
2017-04-20 15:16         ` David Ahern
2017-04-20 15:35           ` Robert Shearman
2017-04-21 20:34             ` [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route Robert Shearman
2017-04-21 20:37               ` David Ahern
2017-04-24 16:52               ` David Miller
2017-04-20 22:18 ` [PATCH net] ipv4: Avoid caching dsts when lookup skipped nh oif check David Ahern
2017-04-21 17:17   ` Robert Shearman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.