netdev.vger.kernel.org archive mirror
* IPv6 L2TP issues related to 93531c67
@ 2019-07-15 16:18 Paul Donohue
  2019-07-15 18:55 ` David Ahern
  0 siblings, 1 reply; 6+ messages in thread
From: Paul Donohue @ 2019-07-15 16:18 UTC
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev

I have a system that establishes four L2TP over IPv6 tunnels using site-local addresses via the following:
ip l2tp add tunnel tunnel_id 1233 peer_tunnel_id 1233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp1 tunnel_id 1233 session_id 1233 peer_session_id 1233
ip link set dev net_l2tp1 up
ip l2tp add tunnel tunnel_id 1235 peer_tunnel_id 1235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp2 tunnel_id 1235 session_id 1235 peer_session_id 1235
ip link set dev net_l2tp2 up
ip l2tp add tunnel tunnel_id 2233 peer_tunnel_id 2233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp3 tunnel_id 2233 session_id 2233 peer_session_id 2233
ip link set dev net_l2tp3 up
ip l2tp add tunnel tunnel_id 2235 peer_tunnel_id 2235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp4 tunnel_id 2235 session_id 2235 peer_session_id 2235
ip link set dev net_l2tp4 up

These tunnels worked fine on kernel 4.4.  On kernel 4.15, there was a bug that caused intermittent L2TP packet errors, but everything worked fine after applying 4522a70db7aa5e77526a4079628578599821b193.

However, after upgrading to kernel 4.18 with 4522a70d (or upgrading to kernel 5.0 which includes 4522a70d, or upgrading to the current master kernel branch), two of the four tunnels always fail to work properly after a reboot, although it appears random which two work and which two fail.

When I say "fail to work properly", the problem is that packets generated by the l2tp kernel modules (in response to a packet being sent to the associated net_l2tpX interface) are silently dropped.  The l2tp_debugfs kernel module reports that L2TP packets are being transmitted with no errors, and iptables counters and nflog rules confirm that well-formed packets are generated and sent, but tcpdump does not see the packets being sent on any interface on the system.  iptables reports that the destination interface of the lost packets is "lo" (which is clearly incorrect and probably an indicator of the underlying issue), but `tcpdump -nnn -i lo` doesn't show any packets.  Incoming L2TP packets appear to be processed correctly; only outgoing L2TP packets appear affected.

Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.

IPv4 L2TP tunnels do not appear affected by this issue.  Based on a few quick tests, switching to publicly-routable IPv6 addresses instead of site-local addresses seems to prevent this issue, although I haven't tested this sufficiently, and it is not clear to me how the code in 93531c67 might be affected by the type of IPv6 address, so this observation may be a red herring.  Manually deleting and re-creating a broken interface seems to make it work again, although I have not thoroughly experimented with making changes after boot time to see if the problem is entirely random, if it is based on the number of existing interfaces, if it is based on a boot-time timing issue, etc.

It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed.  Could someone take a look and point me in the right direction for further troubleshooting?

Thanks!


* Re: IPv6 L2TP issues related to 93531c67
  2019-07-15 16:18 IPv6 L2TP issues related to 93531c67 Paul Donohue
@ 2019-07-15 18:55 ` David Ahern
  2019-07-16 13:56   ` Paul Donohue
  0 siblings, 1 reply; 6+ messages in thread
From: David Ahern @ 2019-07-15 18:55 UTC
  To: Paul Donohue; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev


Hi Paul:

As an FYI, gmail thinks your emails are spam.

On 7/15/19 10:18 AM, Paul Donohue wrote:
> I have a system that establishes four L2TP over IPv6 tunnels using site-local addresses via the following:
...

> 
> These tunnels worked fine on kernel 4.4.  On kernel 4.15, there was a bug that caused intermittent L2TP packet errors, but everything worked fine after applying 4522a70db7aa5e77526a4079628578599821b193.
> 
> However, after upgrading to kernel 4.18 with 4522a70d (or upgrading to kernel 5.0 which includes 4522a70d, or upgrading to the current master kernel branch), two of the four tunnels always fail to work properly after a reboot, although it appears random which two work and which two fail.
> 
> When I say "fail to work properly", the problem is that packets generated by the l2tp kernel modules (in response to a packet being sent to the associated net_l2tpX interface) are silently dropped.  The l2tp_debugfs kernel module reports that L2TP packets are being transmitted with no errors, iptables counters and nflog rules can be used to confirm that well-formed packets are generated and sent, but tcpdump does not see the packets being sent on any interface on the system.  iptables reports that the destination interface of the lost packets is "lo" (which is clearly incorrect and probably an indicator of the underlying issue), but `tcpdump -nnn -i lo` doesn't show any packets.  Incoming L2TP packets appear to be processed correctly, only outgoing L2TP packets appear affected.
> 
> Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.

That commit cannot be reverted. It is a foundational piece for a lot of
other changes. Did you mean the commit before it works and this commit
fails?

> 
> IPv4 L2TP tunnels do not appear affected by this issue.  Based on a few quick tests, it appears that switching to publicly-routable IPv6 addresses instead of site-local addresses seems to prevent this issue, although I haven't done sufficient testing of this, and it is not clear to me how the code in 93531c67 might be affected by the type of IPv6 address, so this observation may be a red herring.  Manually deleting and re-creating a broken interface seems to make it work again, although I have not thoroughly experimented with making changes after boot time to see if the problem is entirely random, if it is based on the number of existing interfaces, if it is based on a boot-time timing issue, etc.
> 
> It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed.  Could someone take a look and point me in the right direction for further troubleshooting?
> 

Let's get a complete example that demonstrates the problem, and I can go
from there. Can you take the attached script and update it so that it
reflects the problem you are reporting? That script works on latest
kernel as well as 4.14.133. It uses network namespaces for 2 hosts with
a router between them.

Also, check the return of the fib lookups using:
    perf record -e fib6:* -a
    <run test, ctrl-c on the record>
    perf script

Check out the fib lookup parameters and result. Do they look correct to
you for your setup?

[-- Attachment #2: l2tp.sh --]
[-- Type: application/x-sh, Size: 3750 bytes --]


* Re: IPv6 L2TP issues related to 93531c67
  2019-07-15 18:55 ` David Ahern
@ 2019-07-16 13:56   ` Paul Donohue
  2019-07-16 16:46     ` David Ahern
  2019-07-17 11:11     ` David Ahern
  0 siblings, 2 replies; 6+ messages in thread
From: Paul Donohue @ 2019-07-16 13:56 UTC
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev


On Mon, Jul 15, 2019 at 12:55:48PM -0600, David Ahern wrote:
> As an FYI, gmail thinks your emails are spam.
Ugh.  Thanks for letting me know.  I'll look into it.

> On 7/15/19 10:18 AM, Paul Donohue wrote:
> > Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.
> That commit cannot be reverted. It is a foundational piece for a lot of
> other changes. Did you mean the commit before it works and this commit
> fails?
Sorry, yes, I meant the commit before it works, and this one fails.  I did not try reverting this commit on a more recent kernel.

> > It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed.  Could someone take a look and point me in the right direction for further troubleshooting?
> Let's get a complete example that demonstrates the problem, and I can go
> from there. Can you take the attached script and update it so that it
> reflects the problem you are reporting? That script works on latest
> kernel as well as 4.14.133. It uses network namespaces for 2 hosts with
> a router between them.
> 
> Also, check the return of the fib lookups using:
>     perf record -e fib6:* -a
>     <run test, ctrl-c on the record>
>     perf script
> 
> Check out the fib lookup parameters and result. Do they look correct to
> you for your setup?

Unfortunately, I have a fairly complicated setup, so it took me a while to figure out which pieces were relevant ... But I think I've finally got it.  The missing piece was IPsec.

After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel.  Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again.  In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.

I've attached a modified version of the script which demonstrates this issue.

Thank you!
-Paul

[-- Attachment #2: l2tp.sh --]
[-- Type: application/x-sh, Size: 4977 bytes --]


* Re: IPv6 L2TP issues related to 93531c67
  2019-07-16 13:56   ` Paul Donohue
@ 2019-07-16 16:46     ` David Ahern
  2019-07-17 11:11     ` David Ahern
  1 sibling, 0 replies; 6+ messages in thread
From: David Ahern @ 2019-07-16 16:46 UTC
  To: Paul Donohue; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev

On 7/16/19 7:56 AM, Paul Donohue wrote:
> After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel.  Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again.  In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.
> 
> I've attached a modified version of the script which demonstrates this issue.

Thanks. I will take a look and get back to you.


* Re: IPv6 L2TP issues related to 93531c67
  2019-07-16 13:56   ` Paul Donohue
  2019-07-16 16:46     ` David Ahern
@ 2019-07-17 11:11     ` David Ahern
  2019-07-17 15:37       ` Paul Donohue
  1 sibling, 1 reply; 6+ messages in thread
From: David Ahern @ 2019-07-17 11:11 UTC
  To: Paul Donohue; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev

On 7/16/19 7:56 AM, Paul Donohue wrote:
> 
> Unfortunately, I have a fairly complicated setup, so it took me a while to figure out which pieces were relevant ... But I think I've finally got it.  The missing piece was IPsec.
> 
> After establishing an IPsec tunnel to carry the L2TP traffic, the first L2TP packet through the IPsec tunnel permanently breaks the associated L2TP tunnel.  Tearing down the IPsec tunnel does not restore functionality of the L2TP tunnel - I have to tear down and re-create the L2TP tunnel before it will work again.  In my real-world use case, I have two L2TP tunnels running over the same IPsec tunnel, and the first L2TP tunnel to send a packet after IPsec is established gets permanently broken, while the other L2TP tunnel works fine.
> 
> I've attached a modified version of the script which demonstrates this issue.

This fixes the test script (whitespace damaged but simple enough to
manually patch). See if it fixes the problem with your more complex
setup. If so I will send a formal patch.

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d2e6b31a8d6..6fe3097b9ab7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2563,7 +2563,7 @@ static struct dst_entry *rt6_check(struct rt6_info *rt,
 {
        u32 rt_cookie = 0;

-       if ((from && !fib6_get_cookie_safe(from, &rt_cookie)) ||
+       if (!from || !fib6_get_cookie_safe(from, &rt_cookie) ||
            rt_cookie != cookie)
                return NULL;



* Re: IPv6 L2TP issues related to 93531c67
  2019-07-17 11:11     ` David Ahern
@ 2019-07-17 15:37       ` Paul Donohue
  0 siblings, 0 replies; 6+ messages in thread
From: Paul Donohue @ 2019-07-17 15:37 UTC
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev

On Wed, Jul 17, 2019 at 05:11:21AM -0600, David Ahern wrote:
> This fixes the test script (whitespace damaged but simple enough to
> manually patch). See if it fixes the problem with your more complex
> setup. If so I will send a formal patch.

Yes! I applied this on top of f632a8170a6b667ee4e3f552087588f0fe13c4bb (master branch), and it fixes the problem on my systems.

Thank you very much!  I really appreciate all of your work on Linux networking!

