netdev.vger.kernel.org archive mirror
* Re: big ICMP requests get disrupted on IPSec tunnel activation
@ 2019-10-17  4:44 Bartschies, Thomas
  0 siblings, 0 replies; 6+ messages in thread
From: Bartschies, Thomas @ 2019-10-17  4:44 UTC (permalink / raw)
  To: 'Eric Dumazet', 'David Ahern',
	'netdev@vger.kernel.org'

No problem. This is exactly the way I've adapted it myself, so my tests seem to be valid.

Regards,
--
Thomas Bartschies
CVK IT Systeme

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Wednesday, 16 October 2019 21:54
To: Bartschies, Thomas <Thomas.Bartschies@cvk.de>; 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
Subject: Re: RE: big ICMP requests get disrupted on IPSec tunnel activation



On 10/16/19 11:54 AM, Bartschies, Thomas wrote:
> Hello,
> 
> I had to adapt the second half to my test kernel. I just had to make some guesses; I tested, and it works.
> Your conclusions are correct. I also suspected something like that, but with no knowledge of the inner
> workings of the IP stack I had very little chance of finding the right fix by myself.
> 
> Thank you very much. Will now retest for a secondary forwarding problem that's much harder to reproduce.
> 

Sorry for the delay.

The patch backported to 5.2.18 would be something like:

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8c2ec35b6512f1486cf2ea01f4a19444c7422642..96c02146be0af1e66230627b401c35757f9dc702 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -626,6 +626,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
        if (skb_has_frag_list(skb)) {
                struct sk_buff *frag, *frag2;
                unsigned int first_len = skb_pagelen(skb);
+               ktime_t tstamp = skb->tstamp;
 
                if (first_len - hlen > mtu ||
                    ((first_len - hlen) & 7) ||
@@ -687,6 +688,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
                                ip_send_check(iph);
                        }
 
+                       skb->tstamp = tstamp;
                        err = output(net, sk, skb);
 
                        if (!err)



> Regards,
> --
> Thomas Bartschies
> CVK IT-Systeme
> 
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Wednesday, 16 October 2019 17:41
> To: Bartschies, Thomas <Thomas.Bartschies@cvk.de>; 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
> 
> On 10/16/19 8:31 AM, Eric Dumazet wrote:
>>
>>
>> On 10/16/19 5:57 AM, Bartschies, Thomas wrote:
>>> Hello,
>>>
>>> did another test. This time I've changed the order: first triggered the IPSec policy and then tried to ping in parallel with a big packet size.
>>> I could also reproduce the issue, but the trace was completely different. Maybe this time I've got the trace for the problematic connection?
>>>
>>
>> This one was probably a false positive.
>>
>> The other one, I finally understood what was going on.
>>
>> You told us you removed netfilter, but it seems you still have the ip defrag modules there.
>>
>> (For a pure forwarding node, no reassembly-defrag should be needed.)
>>
>> When ip_forward() is used, it correctly clears skb->tstamp
>>
>> But later, ip_do_fragment() might re-use the skbs found attached to 
>> the master skb, and we do not properly initialize their skb->tstamp.
>>
>> The master skb->tstamp should be copied to the children.
>>
>> I will send a patch asap.
>>
>> Thanks.
>>
> 
> Can you try:
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 28fca408812c5576fc4ea957c1c4dec97ec8faf3..c880229a01712ba5a9ed413f8aab2b56dfe93c82 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -808,6 +808,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>         if (skb_has_frag_list(skb)) {
>                 struct sk_buff *frag, *frag2;
>                 unsigned int first_len = skb_pagelen(skb);
> +               ktime_t tstamp = skb->tstamp;
>  
>                 if (first_len - hlen > mtu ||
>                     ((first_len - hlen) & 7) ||
> @@ -846,6 +847,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>                                 ip_fraglist_prepare(skb, &iter);
>                         }
>  
> +                       skb->tstamp = tstamp;
>                         err = output(net, sk, skb);
>  
>                         if (!err)
> 


* Re: big ICMP requests get disrupted on IPSec tunnel activation
  2019-10-16 15:31 ` Eric Dumazet
@ 2019-10-16 15:40   ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2019-10-16 15:40 UTC (permalink / raw)
  To: Bartschies, Thomas, 'David Ahern',
	'netdev@vger.kernel.org'



On 10/16/19 8:31 AM, Eric Dumazet wrote:
> 
> 
> On 10/16/19 5:57 AM, Bartschies, Thomas wrote:
>> Hello,
>>
>> did another test. This time I've changed the order: first triggered the IPSec policy and then tried to ping in parallel with a big packet size.
>> I could also reproduce the issue, but the trace was completely different. Maybe this time I've got the trace for the problematic connection?
>>
> 
> This one was probably a false positive.
> 
> The other one, I finally understood what was going on.
> 
> You told us you removed netfilter, but it seems you still have the ip defrag modules there.
> 
> (For a pure forwarding node, no reassembly-defrag should be needed.)
> 
> When ip_forward() is used, it correctly clears skb->tstamp
> 
> But later, ip_do_fragment() might re-use the skbs found attached to the master skb,
> and we do not properly initialize their skb->tstamp.
> 
> The master skb->tstamp should be copied to the children.
> 
> I will send a patch asap.
> 
> Thanks.
> 

Can you try:

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 28fca408812c5576fc4ea957c1c4dec97ec8faf3..c880229a01712ba5a9ed413f8aab2b56dfe93c82 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -808,6 +808,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
        if (skb_has_frag_list(skb)) {
                struct sk_buff *frag, *frag2;
                unsigned int first_len = skb_pagelen(skb);
+               ktime_t tstamp = skb->tstamp;
 
                if (first_len - hlen > mtu ||
                    ((first_len - hlen) & 7) ||
@@ -846,6 +847,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
                                ip_fraglist_prepare(skb, &iter);
                        }
 
+                       skb->tstamp = tstamp;
                        err = output(net, sk, skb);
 
                        if (!err)


* Re: big ICMP requests get disrupted on IPSec tunnel activation
  2019-10-16 12:57 Bartschies, Thomas
@ 2019-10-16 15:31 ` Eric Dumazet
  2019-10-16 15:40   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-10-16 15:31 UTC (permalink / raw)
  To: Bartschies, Thomas, 'David Ahern',
	'netdev@vger.kernel.org'



On 10/16/19 5:57 AM, Bartschies, Thomas wrote:
> Hello,
> 
> did another test. This time I've changed the order: first triggered the IPSec policy and then tried to ping in parallel with a big packet size.
> I could also reproduce the issue, but the trace was completely different. Maybe this time I've got the trace for the problematic connection?
> 

This one was probably a false positive.

The other one, I finally understood what was going on.

You told us you removed netfilter, but it seems you still have the ip defrag modules there.

(For a pure forwarding node, no reassembly-defrag should be needed.)

When ip_forward() is used, it correctly clears skb->tstamp

But later, ip_do_fragment() might re-use the skbs found attached to the master skb,
and we do not properly initialize their skb->tstamp.

The master skb->tstamp should be copied to the children.

I will send a patch asap.

Thanks.
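
To make the clock mismatch concrete for archive readers: fq paces packets against CLOCK_MONOTONIC (its time_to_send falls back to ktime_get_ns()), while __net_timestamp() stamps received skbs with ktime_get_real(). A minimal user-space sketch (not from the thread) of the offset involved:

/* clock_skew.c - build with: cc -o clock_skew clock_skew.c */
#include <stdio.h>
#include <time.h>

static long long ns_of(clockid_t id)
{
	struct timespec ts;

	clock_gettime(id, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	long long mono = ns_of(CLOCK_MONOTONIC);	/* ns since boot */
	long long real = ns_of(CLOCK_REALTIME);		/* ns since 1970 */

	/* This difference is essentially the "delay" the fq_enqueue()
	 * debug check later in this thread computes: roughly 49 years
	 * in 2019, far beyond any sane pacing horizon, so an affected
	 * packet is scheduled into the distant future and never sent. */
	printf("realtime - monotonic = %.1f years\n",
	       (double)(real - mono) / 1e9 / 3600 / 24 / 365);
	return 0;
}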



* Re: big ICMP requests get disrupted on IPSec tunnel activation
@ 2019-10-16 12:57 Bartschies, Thomas
  2019-10-16 15:31 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Bartschies, Thomas @ 2019-10-16 12:57 UTC (permalink / raw)
  To: 'Eric Dumazet', 'David Ahern',
	'netdev@vger.kernel.org'

Hello,

did another test. This time I've changed the order: first triggered the IPSec policy and then tried to ping in parallel with a big packet size.
I could also reproduce the issue, but the trace was completely different. Maybe this time I've got the trace for the problematic connection?

[Mi Okt 16 14:43:15 2019] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.18-200.fc30.x86_64 #1
[Mi Okt 16 14:43:15 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[Mi Okt 16 14:43:15 2019] RIP: 0010:fq_enqueue+0x531/0x610 [sch_fq]
[Mi Okt 16 14:43:15 2019] Code: c2 48 8b 3d 49 28 00 00 48 89 ee e8 39 0a ec c1 45 85 ed 75 d4 49 8b 2f e9 f1 fb ff ff 48 83 83 30 02 00 00 01 e9 fd fc ff ff <0f> 0b e9 83 fc ff ff 4c 89 ea 31 c0 e9 be fc ff ff 41 8b 46 08 39
[Mi Okt 16 14:43:15 2019] RSP: 0018:ffff948640748ae8 EFLAGS: 00010206
[Mi Okt 16 14:43:15 2019] RAX: 0000000df8475800 RBX: ffff895036e25c00 RCX: 0000000000000018
[Mi Okt 16 14:43:15 2019] RDX: 00000e7ebca89351 RSI: 0014ca96f0789efd RDI: ffffffff83423a60
[Mi Okt 16 14:43:15 2019] RBP: 15ce211ccec6bca9 R08: 000000002a91ed5a R09: 0000000000000001
[Mi Okt 16 14:43:15 2019] R10: ffff948640748a80 R11: 00000000bc3ed800 R12: ffff895034e00900
[Mi Okt 16 14:43:15 2019] R13: ffff89503720f2a0 R14: 00000000000004d3 R15: ffff895038809748
[Mi Okt 16 14:43:15 2019] FS:  0000000000000000(0000) GS:ffff89503bb80000(0000) knlGS:0000000000000000
[Mi Okt 16 14:43:15 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mi Okt 16 14:43:15 2019] CR2: 00007f3235d5d000 CR3: 00000001334a6003 CR4: 00000000000606e0
[Mi Okt 16 14:43:15 2019] Call Trace:
[Mi Okt 16 14:43:15 2019]  <IRQ>
[Mi Okt 16 14:43:15 2019]  ? __qdisc_run+0x151/0x540
[Mi Okt 16 14:43:15 2019]  __dev_queue_xmit+0x45b/0x960
[Mi Okt 16 14:43:15 2019]  ? mod_timer+0x15b/0x300
[Mi Okt 16 14:43:15 2019]  __neigh_update+0x432/0x9f0
[Mi Okt 16 14:43:15 2019]  arp_process+0x255/0x7e0
[Mi Okt 16 14:43:15 2019]  ? ip_forward+0x3c5/0x480
[Mi Okt 16 14:43:15 2019]  arp_rcv+0x18c/0x1d0
[Mi Okt 16 14:43:15 2019]  __netif_receive_skb_one_core+0x87/0x90
[Mi Okt 16 14:43:15 2019]  netif_receive_skb_internal+0x41/0xb0
[Mi Okt 16 14:43:15 2019]  napi_gro_receive+0xf6/0x160
[Mi Okt 16 14:43:15 2019]  e1000_clean_rx_irq+0x2ac/0x570 [e1000]
[Mi Okt 16 14:43:15 2019]  e1000_clean+0x278/0x5f0 [e1000]
[Mi Okt 16 14:43:15 2019]  net_rx_action+0x148/0x3b0
[Mi Okt 16 14:43:15 2019]  __do_softirq+0xed/0x30e
[Mi Okt 16 14:43:15 2019]  irq_exit+0xf1/0x100
[Mi Okt 16 14:43:15 2019]  do_IRQ+0x81/0xe0
[Mi Okt 16 14:43:15 2019]  common_interrupt+0xf/0xf
[Mi Okt 16 14:43:15 2019]  </IRQ>
[Mi Okt 16 14:43:15 2019] RIP: 0010:native_safe_halt+0xe/0x10
[Mi Okt 16 14:43:15 2019] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 36 06 44 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 26 06 44 00 fb f4 <c3> 90 66 66 66 66 90 41 54 55 53 e8 e2 01 79 ff 65 8b 2d cb 7f 64
[Mi Okt 16 14:43:15 2019] RSP: 0018:ffff9486406bbeb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd5
[Mi Okt 16 14:43:15 2019] RAX: ffffffff829c7d70 RBX: 0000000000000003 RCX: 0000000000000000
[Mi Okt 16 14:43:15 2019] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff89503bb9c500
[Mi Okt 16 14:43:15 2019] RBP: 0000000000000003 R08: 00000066a172847b R09: 0000000000000000
[Mi Okt 16 14:43:15 2019] R10: ffff89503bb97888 R11: 0000000000000000 R12: ffff89503a84be80
[Mi Okt 16 14:43:15 2019] R13: 0000000000000000 R14: 0000000000000000 R15: ffff89503a84be80
[Mi Okt 16 14:43:15 2019]  ? __cpuidle_text_start+0x8/0x8
[Mi Okt 16 14:43:15 2019]  default_idle+0x1a/0x140
[Mi Okt 16 14:43:15 2019]  do_idle+0x1fb/0x260
[Mi Okt 16 14:43:15 2019]  ? do_idle+0x17b/0x260
[Mi Okt 16 14:43:15 2019]  cpu_startup_entry+0x19/0x20
[Mi Okt 16 14:43:15 2019]  start_secondary+0x17d/0x1d0
[Mi Okt 16 14:43:15 2019]  secondary_startup_64+0xa4/0xb0
[Mi Okt 16 14:43:15 2019] ---[ end trace 601c6ff4362427bd ]---

Regards,
--
Thomas Bartschies
CVK IT Systeme

-----Original Message-----
From: Bartschies, Thomas
Sent: Tuesday, 15 October 2019 12:12
To: 'Eric Dumazet' <eric.dumazet@gmail.com>; 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
Subject: RE: RE: big ICMP requests get disrupted on IPSec tunnel activation

Hello Eric,

no problem. Just done that. Here are the results; I hope they help. I also did a parallel debug ftrace, but there was no conclusive information in it. I had tried that before.

[Di Okt 15 11:51:06 2019] WARNING: CPU: 1 PID: 0 at net/sched/sch_fq.c:389 fq_enqueue+0x531/0x610 [sch_fq]
[Di Okt 15 11:51:06 2019] Modules linked in: twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_ssse3 sha512_generic af_key xt_CHECKSUM ipt_rpfilter xt_statistic xt_connlimit nf_conncount ip_set_hash_ip br_netfilter bridge stp llc xt_HL xt_hl ipt_REJECT nf_reject_ipv4 xt_REDIRECT xt_recent xt_realm xt_CT xt_NETMAP xt_MASQUERADE xt_esp ipt_ECN xt_ecn xt_comment ipt_CLUSTERIP ipt_ah xt_addrtype act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_set ip_set xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_sane nf_conntrack_netlink nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_conntrack_tftp
[Di Okt 15 11:51:06 2019]  nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp ts_kmp nf_conntrack_amanda xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark xt_CLASSIFY xt_AUDIT iptable_raw iptable_nat nf_nat iptable_mangle iptable_filter vmw_vsock_vmci_transport vsock ip6t_REJECT nf_reject_ipv6 xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_log nfnetlink ip6table_filter ip6_tables sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf joydev vmw_balloon vmw_vmci i2c_piix4 sch_fq tcp_illinois binfmt_misc ip_tables vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi
[Di Okt 15 11:51:06 2019] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.18-200.fc30.x86_64 #1
[Di Okt 15 11:51:06 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[Di Okt 15 11:51:06 2019] RIP: 0010:fq_enqueue+0x531/0x610 [sch_fq]
[Di Okt 15 11:51:06 2019] Code: db 48 8b 3d 49 28 00 00 48 89 ee e8 39 ba 03 db 45 85 ed 75 d4 49 8b 2f e9 f1 fb ff ff 48 83 83 30 02 00 00 01 e9 fd fc ff ff <0f> 0b e9 83 fc ff ff 4c 89 ea 31 c0 e9 be fc ff ff 41 8b 46 08 39
[Di Okt 15 11:51:06 2019] RSP: 0018:ffffa2fe806f0a58 EFLAGS: 00010206
[Di Okt 15 11:51:06 2019] RAX: 0000000df8475800 RBX: ffff973638a32800 RCX: 0000000000000000
[Di Okt 15 11:51:06 2019] RDX: 000003df1924d162 RSI: 0000000000000004 RDI: ffffffffc02845b5
[Di Okt 15 11:51:06 2019] RBP: 15cdc829bb26f866 R08: 0000000000000000 R09: ffff97363afc4e00
[Di Okt 15 11:51:06 2019] R10: 0000000000000078 R11: ffff97363808b088 R12: ffff97363821ce00
[Di Okt 15 11:51:06 2019] R13: ffff973634620cb0 R14: 00000000000003cb R15: ffff973635965c70
[Di Okt 15 11:51:06 2019] FS:  0000000000000000(0000) GS:ffff97363ba80000(0000) knlGS:0000000000000000
[Di Okt 15 11:51:06 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Di Okt 15 11:51:06 2019] CR2: 00005616fa9f90b8 CR3: 0000000135084005 CR4: 00000000000606e0
[Di Okt 15 11:51:06 2019] Call Trace:
[Di Okt 15 11:51:06 2019]  <IRQ>
[Di Okt 15 11:51:06 2019]  ? fq_flow_purge+0x70/0x70 [sch_fq]
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  __dev_queue_xmit+0x45b/0x960
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  ip_finish_output2+0x2c8/0x580
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  ip_do_fragment+0x69d/0x7a0
[Di Okt 15 11:51:06 2019]  ? __ip_flush_pending_frames.isra.0+0x80/0x80
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  ip_output+0x71/0xf0
[Di Okt 15 11:51:06 2019]  ? ip_finish_output2+0x580/0x580
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  ip_forward+0x3c5/0x480
[Di Okt 15 11:51:06 2019]  ? ip_defrag.cold+0x37/0x37
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  ip_rcv+0xbc/0xd0
[Di Okt 15 11:51:06 2019]  ? ip_rcv_finish_core.isra.0+0x390/0x390
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  __netif_receive_skb_one_core+0x80/0x90
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  netif_receive_skb_internal+0x41/0xb0
[Di Okt 15 11:51:06 2019]  ? return_to_handler+0x15/0x36
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  napi_gro_receive+0xf6/0x160
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  e1000_clean_rx_irq+0x2ac/0x570 [e1000]
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  e1000_clean+0x278/0x5f0 [e1000]
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  net_rx_action+0x148/0x3b0
[Di Okt 15 11:51:06 2019]  ? napi_complete_done+0x110/0x110
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  __do_softirq+0xed/0x30e
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  irq_exit+0xf1/0x100
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  do_IRQ+0x81/0xe0
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  common_interrupt+0xf/0xf
[Di Okt 15 11:51:06 2019]  </IRQ>
[Di Okt 15 11:51:06 2019] RIP: 0010:native_safe_halt+0xe/0x10
[Di Okt 15 11:51:06 2019] Code: 90 90 90 90 90 90 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d 46 06 44 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 36 06 44 00 fb f4 <c3> 90 e8 eb 98 03 00 41 54 55 53 e8 f2 01 79 ff 65 8b 2d db 7f 64
[Di Okt 15 11:51:06 2019] RSP: 0018:ffffa2fe806abeb0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffd3
[Di Okt 15 11:51:06 2019] RAX: ffffffff9b9c7d60 RBX: 0000000000000001 RCX: 0000000000000000
[Di Okt 15 11:51:06 2019] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff9b1140d6
[Di Okt 15 11:51:06 2019] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff97363afc4e00
[Di Okt 15 11:51:06 2019] R10: 0000000000000c78 R11: ffff973638092c88 R12: ffff97363a84ddc0
[Di Okt 15 11:51:06 2019] R13: 0000000000000000 R14: 0000000000000000 R15: ffff97363a84ddc0
[Di Okt 15 11:51:06 2019]  ? __cpuidle_text_start+0x8/0x8
[Di Okt 15 11:51:06 2019]  ? do_idle+0x1f6/0x260
[Di Okt 15 11:51:06 2019]  default_idle+0x1a/0x140
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  do_idle+0x1fb/0x260
[Di Okt 15 11:51:06 2019]  ? cpu_startup_entry+0x19/0x20
[Di Okt 15 11:51:06 2019]  ? ftrace_graph_caller+0xa0/0xa0
[Di Okt 15 11:51:06 2019]  cpu_startup_entry+0x19/0x20
[Di Okt 15 11:51:06 2019]  start_secondary+0x17d/0x1d0
[Di Okt 15 11:51:06 2019]  secondary_startup_64+0xa4/0xb0
[Di Okt 15 11:51:06 2019] ---[ end trace 34f95424acff5ddb ]---

Regards,
--
Thomas Bartschies
CVK IT Systeme

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, 14 October 2019 17:32
To: Bartschies, Thomas <Thomas.Bartschies@cvk.de>; 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
Subject: Re: RE: big ICMP requests get disrupted on IPSec tunnel activation



On 10/14/19 7:02 AM, Bartschies, Thomas wrote:
> Hello,
> 
> it took a while to build a test system for bisecting the issue. Finally I've identified the patch that causes my problems.
> BTW, the fq packet scheduler is in use.
> 
> It's
> [PATCH net-next] tcp/fq: move back to CLOCK_MONOTONIC
> 
> In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks 
> from MONOTONIC to TAI, in order to match the choice made earlier for the sch_etf packet scheduler.
> 
> But sure enough, this broke some setups where the TAI clock jumps forward (by almost 50 years...), as reported by Leonard Crestez.
> 
> If we want to converge later, we'll probably need to add an skb field to differentiate the clock bases, or a socket option.
> 
> In the meantime, a UDP application will need to use the CLOCK_MONOTONIC 
> base for its SCM_TXTIME timestamps if using the fq packet scheduler.
> 
> Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
> Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
> Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reported-by: Leonard Crestez <leonard.crestez@xxxxxxx>
> 
> ----
> 
> After reverting it in a current 5.2.18 kernel, the problem disappears. 
> There were some follow-up fixes for other issues caused by this patch. 
> These fixed other similar issues, but not mine. I've already tried to 
> set the tstamp to zero in xfrm4_output.c, but with no luck so far. I'm pretty sure that reverting the clock patch isn't the proper solution for upstream. So in what other way can this be fixed?


Thanks a lot Thomas for this report !

I guess you could add a debug check in fq to let us know the call graph.

Something like the following:

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 98dd87ce15108cfe1c011da44ba32f97763776c8..2aa41a39e81b94f3b7092dc51b91829f5929634d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -380,9 +380,14 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 {
        struct rb_node **p, *parent;
        struct sk_buff *head, *aux;
+       s64 delay;
 
        fq_skb_cb(skb)->time_to_send = skb->tstamp ?: ktime_get_ns();
 
+       /* We should really add a TCA_FQ_MAX_HORIZON  at some point :( */
+       delay = fq_skb_cb(skb)->time_to_send - ktime_get_ns();
+       WARN_ON_ONCE(delay > 60 * NSEC_PER_SEC);
+
        head = flow->head;
        if (!head ||
            fq_skb_cb(skb)->time_to_send >= fq_skb_cb(flow->tail)->time_to_send) {
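
For archive readers, the workaround named in the commit message quoted above — a UDP sender supplying SCM_TXTIME timestamps in CLOCK_MONOTONIC base so fq paces them correctly — looks roughly like the sketch below. The destination address and port are placeholders, and the SO_TXTIME fallback define is only for older userspace headers:

/* txtime_mono.c - hedged sketch, build with: cc -o txtime_mono txtime_mono.c */
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/net_tstamp.h>	/* struct sock_txtime */

#ifndef SO_TXTIME
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	/* Tell the kernel our txtime cmsgs use the same base fq uses. */
	struct sock_txtime cfg = { .clockid = CLOCK_MONOTONIC, .flags = 0 };
	setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg));

	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(9000) };	/* placeholder */
	inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);		/* placeholder */

	char payload[] = "paced";
	struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
	char control[CMSG_SPACE(sizeof(uint64_t))] = { 0 };
	struct msghdr msg = { .msg_name = &dst, .msg_namelen = sizeof(dst),
			      .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = control,
			      .msg_controllen = sizeof(control) };

	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	uint64_t txtime = (uint64_t)ts.tv_sec * 1000000000ULL +
			  ts.tv_nsec + 1000000;	/* depart 1 ms from now */

	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime));
	memcpy(CMSG_DATA(cm), &txtime, sizeof(txtime));

	return sendmsg(fd, &msg, 0) < 0;
}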


> 
> ---
> [PATCH net] net: clear skb->tstamp in bridge forwarding path
> 
> Matteo reported forwarding issues inside the Linux bridge if the enslaved interfaces use the fq qdisc.
> 
> Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding 
> paths"), we need to clear the tstamp field in the bridge forwarding path.
> 
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future 
> transmit time.")
> Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
> Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> and
> 
> net: clear skb->tstamp in forwarding paths
> 
> Sergey reported that forwarding was no longer working if fq packet scheduler was used.
> 
> This is caused by the recent switch to EDT model, since incoming 
> packets might have been timestamped by __net_timestamp()
> 
> __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base.
> 
> The fix is to clear skb->tstamp in forwarding paths.
> 
> Fixes: 80b14dee ("net: Add a new socket option for a future transmit 
> time.")
> Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Sergey Matyukevich <geomatsi@gmail.com>
> Tested-by: Sergey Matyukevich <geomatsi@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
> 
> 
> -----Original Message-----
> From: Bartschies, Thomas
> Sent: Tuesday, 17 September 2019 09:28
> To: 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
> Subject: RE: big ICMP requests get disrupted on IPSec tunnel activation
> 
> Hello,
> 
> thanks for the suggestion. Running pmtu.sh with kernel versions 4.19, 4.20 and even 5.2.13 made no difference. All tests were successful every time.
> 
> My external ping tests are still failing with the newer kernels, though. I ran the script after triggering my problem, to make sure all possible side effects were in play. 
> 
> Please keep in mind that even when the ICMP requests stall, other connections still go through, e.g. ssh or tracepath. I would expect all connection types to be affected if this were an MTU problem. Am I wrong?
> 
> Any suggestions for more tests to isolate the cause? 
> 
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
> 
> -----Original Message-----
> From: David Ahern [mailto:dsahern@gmail.com]
> Sent: Friday, 13 September 2019 19:13
> To: Bartschies, Thomas <Thomas.Bartschies@cvk.de>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
> 
> On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
>> Hello everyone,
>>
>> since kernel 4.20 we're observing a strange behaviour when sending big ICMP packets. An example is a packet size of 3000 bytes.
>> The packets should be forwarded by a Linux gateway (firewall) with multiple interfaces that also acts as a VPN gateway.
>>
>> Test steps:
>> 1. Disabled all iptables rules
>> 2. Enabled the VPN IPSec Policies.
>> 3. Start a ping with packet size (e.g. 3000 bytes) from a client in the DMZ passing the machine targeting another LAN machine
>> 4. Ping works
>> 5. Enable a VPN policy by sending pings from the gateway to a tunnel target. System tries to create the tunnel
>> 6. Ping from 3. immediately stalls. No error messages. Just stops.
>> 7. Stop Ping from 3. Start another without packet size parameter. Stalls also.
>>
>> Result:
>> Connections from the client to other services on the LAN machine 
>> still work. Tracepath works. Only ICMP requests do not pass the 
>> gateway anymore. tcpdump sees them on incoming interface, but not on the outgoing LAN interface. ICMP requests to any other target IP address in LAN still work. Until one uses a bigger packet size. Then these alternative connections stall also.
>>
>> Flushing the policy table has no effect. Flushing the conntrack table has no effect. Setting rp_filter to loose (2) has no effect.
>> Flushing the route cache has no effect.
>>
>> Only a reboot of the gateway restores normal behavior.
>>
>> What can be the cause? Is this a networking bug?
>>
> 
> Some of these will most likely fail for other reasons, but can you run 'tools/testing/selftests/net/pmtu.sh'[1] on 4.19 and then on 4.20 and compare the results? Hopefully it will shed some light on the problem and can be used to bisect to the commit that caused the regression.
> 
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh
> 
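
The shape of the fix described by the two "clear skb->tstamp" commit messages quoted earlier in this message is, conceptually, a one-line clear on each forwarding path. A hedged kernel-side sketch (the helper name is illustrative; these are not the verbatim upstream hunks):

/* Forwarded skbs may carry a CLOCK_REALTIME stamp from __net_timestamp();
 * clearing it keeps fq from treating that value as a CLOCK_MONOTONIC
 * departure time. */
#include <linux/skbuff.h>

static inline void forwarding_clear_tstamp(struct sk_buff *skb)
{
	skb->tstamp = 0;	/* fq then falls back to ktime_get_ns() */
}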


* Re: big ICMP requests get disrupted on IPSec tunnel activation
  2019-09-13  8:59 Bartschies, Thomas
@ 2019-09-13 17:13 ` David Ahern
  0 siblings, 0 replies; 6+ messages in thread
From: David Ahern @ 2019-09-13 17:13 UTC (permalink / raw)
  To: Bartschies, Thomas, 'netdev@vger.kernel.org'

On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
> Hello everyone,
> 
> since kernel 4.20 we're observing a strange behaviour when sending big ICMP packets. An example is a packet size of 3000 bytes.
> The packets should be forwarded by a Linux gateway (firewall) with multiple interfaces that also acts as a VPN gateway.
> 
> Test steps:
> 1. Disabled all iptables rules
> 2. Enabled the VPN IPSec Policies.
> 3. Start a ping with packet size (e.g. 3000 bytes) from a client in the DMZ passing the machine targeting another LAN machine
> 4. Ping works
> 5. Enable a VPN policy by sending pings from the gateway to a tunnel target. System tries to create the tunnel
> 6. Ping from 3. immediately stalls. No error messages. Just stops.
> 7. Stop Ping from 3. Start another without packet size parameter. Stalls also.
> 
> Result:
> Connections from the client to other services on the LAN machine still work. Tracepath works. Only ICMP requests do not pass
> the gateway anymore. tcpdump sees them on incoming interface, but not on the outgoing LAN interface. ICMP requests to any
> other target IP address in LAN still work. Until one uses a bigger packet size. Then these alternative connections stall also.
> 
> Flushing the policy table has no effect. Flushing the conntrack table has no effect. Setting rp_filter to loose (2) has no effect.
> Flushing the route cache has no effect.
> 
> Only a reboot of the gateway restores normal behavior.
> 
> What can be the cause? Is this a networking bug?
> 

Some of these will most likely fail for other reasons, but can you
run 'tools/testing/selftests/net/pmtu.sh'[1] on 4.19 and then on 4.20 and
compare the results? Hopefully it will shed some light on the problem and
can be used to bisect to the commit that caused the regression.


[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh



* big ICMP requests get disrupted on IPSec tunnel activation
@ 2019-09-13  8:59 Bartschies, Thomas
  2019-09-13 17:13 ` David Ahern
  0 siblings, 1 reply; 6+ messages in thread
From: Bartschies, Thomas @ 2019-09-13  8:59 UTC (permalink / raw)
  To: 'netdev@vger.kernel.org'

Hello everyone,

since kernel 4.20 we're observing a strange behaviour when sending big ICMP packets. An example is a packet size of 3000 bytes.
The packets should be forwarded by a Linux gateway (firewall) with multiple interfaces that also acts as a VPN gateway.

Test steps:
1. Disabled all iptables rules
2. Enabled the VPN IPSec Policies.
3. Start a ping with packet size (e.g. 3000 bytes) from a client in the DMZ passing the machine targeting another LAN machine
4. Ping works
5. Enable a VPN policy by sending pings from the gateway to a tunnel target. System tries to create the tunnel
6. Ping from 3. immediately stalls. No error messages. Just stops.
7. Stop Ping from 3. Start another without packet size parameter. Stalls also.

Result:
Connections from the client to other services on the LAN machine still work. Tracepath works. Only ICMP requests do not pass
the gateway anymore. tcpdump sees them on incoming interface, but not on the outgoing LAN interface. ICMP requests to any
other target IP address in LAN still work. Until one uses a bigger packet size. Then these alternative connections stall also.

Flushing the policy table has no effect. Flushing the conntrack table has no effect. Setting rp_filter to loose (2) has no effect.
Flushing the route cache has no effect.

Only a reboot of the gateway restores normal behavior.

What can be the cause? Is this a networking bug?

Best regards,
--
pp. Thomas Bartschies 
IT Systeme

Cornelsen Verlagskontor GmbH
Kammerratsheide 66
33609 Bielefeld
Phone: +49 (0)521 9719-310
Fax: +49 (0)521 9719-93310
thomas.bartschies@cvk.de
http://www.cvk.de
AG Bielefeld HRB 39324
Managing Directors: Thomas Fuchs, Patrick Neiss




