From: Or Gerlitz
Subject: some failures with vxlan offloads..
Date: Sun, 26 Oct 2014 15:36:35 +0200
Message-ID: <544CF8E3.8070207@mellanox.com>
To: Tom Herbert
Cc: "netdev@vger.kernel.org", John Fastabend, Jeff Kirsher

Hi all, Tom..

Running VXLAN traffic using a driver/NIC which supports offloads (mlx4
driver, ConnectX-3 Pro NIC), I see some configurations that don't really
work.

The testing was done over the net tree, 3.17.0+, as of commit d10845f
"Merge branch 'gso_encap_fixes'". When I say "breaks", it means that
encapsulated ping works, but encapsulated TCP (netperf) doesn't.

conf  client          server          status
----------------------------------------------
1     offloaded       offloaded       works
2     non-offloaded   non-offloaded   works
3     non-offloaded   offloaded       breaks
4     offloaded       non-offloaded   breaks

In the cases where it breaks I can see

  UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 726

prints from __udp4_lib_rcv() in the kernel log of the node where offloads
are OFF, where the bad packet is sent from the host where offloading is
enabled. I guess the packet is just dropped:

# dmesg -c ; nstat
UDP: bad checksum. From 192.168.31.18:45084 to 192.168.31.17:4789 ulen 78
UDP: bad checksum. From 192.168.31.18:45084 to 192.168.31.17:4789 ulen 78
#kernel
IpInReceives                    18                 0.0
IpInDelivers                    18                 0.0
IpOutRequests                   17                 0.0
TcpInSegs                       15                 0.0
TcpOutSegs                      12                 0.0
TcpRetransSegs                  1                  0.0
UdpInDatagrams                  1                  0.0
UdpInErrors                     2                  0.0
UdpOutDatagrams                 3                  0.0
UdpInCsumErrors                 2                  0.0
TcpExtTCPHPHits                 1                  0.0
TcpExtTCPHPAcks                 12                 0.0
TcpExtTCPAutoCorking            5                  0.0
TcpExtTCPSynRetrans             1                  0.0
TcpExtTCPOrigDataSent           12                 0.0
IpExtInOctets                   1068               0.0
IpExtOutOctets                  3174               0.0
IpExtInNoECTPkts                18                 0.0

The mlx4 driver advertises NETIF_F_GSO_UDP_TUNNEL but not
NETIF_F_GSO_UDP_TUNNEL_CSUM.

I wonder if such or similar configs work for people with other
drivers/NICs that support offloads? Tom, I think you were testing your
changes with bnx2x.

Or.

Setup details:

I use OVS with VXLAN, create a veth pair, plug veth1 into OVS and assign
an IP address on veth0, then run ping and later netperf over the veth
interfaces' IP subnet (192.168.52/24 in this case), which goes through
VXLAN encapsulation over the host subnet (192.168.31/24 in this case).

client: host 192.168.31.17 / inner 192.168.52.17
server: host 192.168.31.18 / inner 192.168.52.18

Output from config #3

The client side has these messages printed from __udp4_lib_rcv() on the
csum_error label:

UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 70
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 726
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 70
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 726
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 70
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 70
UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 726

Output from config #4

The server side has these messages printed from __udp4_lib_rcv() on the
csum_error label, and the warning below:

UDP: bad checksum. From 192.168.31.17:34521 to 192.168.31.18:4789 ulen 1480
UDP: bad checksum. From 192.168.31.17:34521 to 192.168.31.18:4789 ulen 1480
UDP: bad checksum. From 192.168.31.17:34521 to 192.168.31.18:4789 ulen 1480
UDP: bad checksum. From 192.168.31.17:34521 to 192.168.31.18:4789 ulen 1480
UDP: bad checksum. From 192.168.31.17:60499 to 192.168.31.18:4789 ulen 78
UDP: bad checksum. From 192.168.31.17:36909 to 192.168.31.18:4789 ulen 78
UDP: bad checksum. From 192.168.31.17:36909 to 192.168.31.18:4789 ulen 78
UDP: bad checksum. From 192.168.31.17:36909 to 192.168.31.18:4789 ulen 78
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5427 at net/core/skbuff.c:4006 skb_try_coalesce+0x25e/0x395()
Modules linked in: mlx4_ib mlx4_en mlx4_core veth ib_ipoib ib_cm ib_umad
 ib_sa ib_mad ib_core ib_addr igb dca ptp pps_core hwmon autofs4 sunrpc
 target_core_mod configfs ipmi_devintf ipmi_si ipmi_msghandler ipv6
 openvswitch vxlan geneve udp_tunnel ip6_udp_tunnel gre crc32c_generic
 libcrc32c dm_mirror dm_region_hash dm_log uinput dm_mod microcode sr_mod
 ext3 jbd usb_storage floppy sd_mod ata_piix libata scsi_mod uhci_hcd
 [last unloaded: mlx4_core]
CPU: 0 PID: 5427 Comm: netserver Not tainted 3.17.0+ #172
Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008
 0000000000000fa6 ffff8802156039b8 ffffffff813f6da9 0000000000000fa6
 0000000000000000 ffff8802156039f8 ffffffff8103dc38 ffff8802239b40c0
 ffffffff81352a6a ffff8800c5530e00 ffff880215fd1200 ffff880215603a74
Call Trace:
 [] dump_stack+0x51/0x70
 [] warn_slowpath_common+0x7c/0x96
 [] ? skb_try_coalesce+0x25e/0x395
 [] warn_slowpath_null+0x15/0x17
 [] skb_try_coalesce+0x25e/0x395
 [] tcp_try_coalesce+0x35/0x91
 [] tcp_queue_rcv+0x61/0x101
 [] tcp_rcv_established+0x3b9/0x602
 [] ? release_sock+0x30/0x1b0
 [] tcp_v4_do_rcv+0x105/0x41a
 [] release_sock+0x105/0x1b0
 [] tcp_recvmsg+0x912/0xa5b
 [] ? rcu_irq_exit+0x7d/0x8f
 [] ? retint_restore_args+0xe/0xe
 [] inet_recvmsg+0xd1/0xeb
 [] sock_recvmsg+0x94/0xb2
 [] ? trace_hardirqs_on+0xd/0xf
 [] ? _raw_spin_unlock_irq+0x2b/0x38
 [] ? __fdget+0xe/0x10
 [] SyS_recvfrom+0xbf/0x10f
 [] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [] ? try_to_wake_up+0x2d0/0x317
 [] ? release_sock+0x30/0x1b0
 [] system_call_fastpath+0x12/0x17
---[ end trace d39905841ae018aa ]---
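
For anyone trying to reproduce this on other drivers/NICs, a small helper along these lines might make the "UDP: bad checksum" prints easier to compare between configs. It is only a sketch: it assumes the message format stays exactly as in the logs quoted in this mail, and the function name summarize_bad_checksums is made up for illustration, it is not part of any existing tool.

```python
import re
import sys
from collections import Counter

# Assumed message format, copied from the kernel log lines quoted in this mail:
#   UDP: bad checksum. From 192.168.31.18:54748 to 192.168.31.17:4789 ulen 726
BAD_CSUM = re.compile(
    r"UDP: bad checksum\. From (\d+\.\d+\.\d+\.\d+):(\d+) "
    r"to (\d+\.\d+\.\d+\.\d+):(\d+) ulen (\d+)"
)


def summarize_bad_checksums(log_text):
    """Count bad-checksum messages per (src IP, dst IP, dst port, ulen).

    The source port is ignored on purpose, since it varies per flow.
    """
    counts = Counter()
    for match in BAD_CSUM.finditer(log_text):
        src_ip, _src_port, dst_ip, dst_port, ulen = match.groups()
        counts[(src_ip, dst_ip, int(dst_port), int(ulen))] += 1
    return counts


if __name__ == "__main__":
    # Reads a kernel log from stdin and prints one summary line per group.
    summary = summarize_bad_checksums(sys.stdin.read())
    for (src_ip, dst_ip, dst_port, ulen), n in sorted(summary.items()):
        print(f"{src_ip} -> {dst_ip}:{dst_port} ulen {ulen}: {n} drops")
```

Saved under some name of your choice, it can be fed from the log, e.g. by piping the output of dmesg into it. Grouping by ulen shows at a glance whether only the small frames, only the large ones, or both are arriving with a bad outer UDP checksum.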