From mboxrd@z Thu Jan 1 00:00:00 1970
From: AMG Zollner Robert
Subject: [bug] cxgb4: vrf stopped working with cxgb4 card
Date: Mon, 4 Jun 2018 18:03:32 +0300
Message-ID:
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: netdev@vger.kernel.org, dsa@cumulusnetworks.com
To: ganeshgr@chelsio.com
Return-path:
Received: from web01.accessmedia.ro ([86.107.100.4]:45803 "EHLO
	web01.accessmedia.ro" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753603AbeFDPR3 (ORCPT );
	Mon, 4 Jun 2018 11:17:29 -0400
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

I have noticed that vrf is not working with kernel v4.15.0 but was
working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)

Setup:
Two metal servers with a T520-cr card each, directly connected without a
switch in between.

       SVR1  only ipfwd                        SVR2     with vrf
.----------------------------.         .----------------------------------.
|                            |         |                                  |
|    192.168.8.1 [  ens2f4]--|---------|--[ens1f4] 192.168.8.2            |
|    192.168.9.1 [ens2f4d1]--|---------|--         192.168.9.2 VRF=10     |
`----------------------------'         `----------------------------------'

When vrf is not working there are no error messages (dmesg or iproute
commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
shows packets (arp req/reply) coming in and going out, but outgoing
packets (arp reply) do not reach the other server SVR1.ens2f4d1

Bisect:
Found this commit to be the problem after doing a git bisect between
v4.13..v4.15:

commit ba581f77df23c8ee70b372966e69cf10bc5453d8
Author: Ganesh Goudar
Date:   Sat Sep 23 16:07:28 2017 +0530

    cxgb4: do DCB state reset in couple of places

    reset the driver's DCB state in couple of places
    where it was missing.

A bisect step was considered good when:
 - ping from SVR1 to the SVR2.ens1f4d1 vrf interface succeeded
 - ping from SVR2 global to the SVR2 vrf interface through SVR1
   (l3 forwarding) succeeded
   (this check was redundant; both tests fail or pass simultaneously)

The problem is still present on recent kernels as well; I checked
v4.16.0 and v4.17-rc7.

Disabling DCB support for the card fixes the problem
(compiling the kernel with "CONFIG_CHELSIO_T4_DCB=n").

This is my first time reporting a bug to the linux kernel and I hope I
have included the right amount of information. Please let me know if I
have missed something.

Thank you,
Zollner Robert

--------
Logs:

VRF configured using the following commands:

#!/bin/sh

CHDEV=ens1f4
VRF=vrf-recv

sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
sysctl -w net.ipv4.conf.all.accept_local=1

ifconfig ${CHDEV}   192.168.8.2/24
ifconfig ${CHDEV}d1 192.168.9.2/24

ip link add ${VRF} type vrf table 10
ip link set dev ${VRF} up

ip rule add pref 32765 table local
ip rule del pref 0
ip route add table 10 unreachable default metric 4278198272

ip link set dev ${CHDEV}d1 master ${VRF}

ip route add table 10 default via 192.168.9.1
ip route add 192.168.9.0/24 via 192.168.8.1
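
For anyone trying to reproduce this, the "good" criterion used for each
bisect step and the DCB workaround check could be scripted roughly as
below. This is only a sketch and was not part of the testing described
above: the function names check_vrf and dcb_enabled, the PING override
variable, and the idea of reading the kernel config from a file such as
/boot/config-$(uname -r) are all assumptions. The ping is meant to be run
on SVR1 after the candidate kernel has been booted on SVR2.

```shell
#!/bin/sh
# Sketch only: check_vrf, dcb_enabled and PING are hypothetical names,
# not taken from the report.

# Return 0 ("good") if the address answers ping, 1 ("bad") otherwise.
# PING may be overridden for dry runs; it defaults to the system ping.
check_vrf() {
    if "${PING:-ping}" -c 3 -W 2 "$1" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}

# Return 0 if the given kernel config file has Chelsio DCB support
# enabled, 1 if it is disabled or the option is absent.
dcb_enabled() {
    if grep -q '^CONFIG_CHELSIO_T4_DCB=y' "$1"; then
        return 0
    fi
    return 1
}

# Example (on SVR1, pinging the address enslaved in the vrf on SVR2):
#   check_vrf 192.168.9.2 && echo good || echo bad
# Example (on SVR2; the config path is a distro convention, assumed):
#   dcb_enabled "/boot/config-$(uname -r)" && echo "DCB enabled"
```

Because the kernel under test runs on SVR2 while the ping is sent from
SVR1, this does not plug directly into "git bisect run"; the result still
has to be fed back to git bisect by hand after each reboot.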