netdev.vger.kernel.org archive mirror
* [bug] cxgb4: vrf stopped working with cxgb4 card
@ 2018-06-04 15:03 AMG Zollner Robert
  2018-06-04 18:17 ` David Ahern
  2018-06-10  0:47 ` David Ahern
  0 siblings, 2 replies; 7+ messages in thread
From: AMG Zollner Robert @ 2018-06-04 15:03 UTC
  To: ganeshgr; +Cc: netdev, dsa

I have noticed that VRF is not working with kernel v4.15.0 when using 
the Chelsio cxgb4 driver (T520-CR card), although it was working with 
v4.13.0.

Setup:
Two bare-metal servers with a T520-CR card each, directly connected 
without a switch in between.

        SVR1  only ipfwd                       SVR2     with vrf
.----------------------------.         .----------------------------------.
|                            |         |                                  |
|    192.168.8.1 [  ens2f4]--|---------|--[ens1f4] 192.168.8.2            |
|    192.168.9.1 [ens2f4d1]--|---------|--<ens1f4d1> 192.168.9.2 VRF=10   |
`----------------------------'         `----------------------------------'

When VRF is not working there are no error messages (neither in dmesg 
nor from the iproute commands). tcpdump on the interface enslaved in 
VRF 10 (SVR2.ens1f4d1) shows packets (ARP requests/replies) coming in 
and going out, but the outgoing packets (ARP replies) do not reach the 
other server (SVR1.ens2f4d1).


Bisect:
Found this commit to be the problem after doing a git bisect between 
v4.13..v4.15:

commit ba581f77df23c8ee70b372966e69cf10bc5453d8
Author: Ganesh Goudar <ganeshgr@chelsio.com>
Date:   Sat Sep 23 16:07:28 2017 +0530

     cxgb4: do DCB state reset in couple of places

     reset the driver's DCB state in couple of places
     where it was missing.


A bisect step was considered good when both of these succeeded:
- ping from SVR1 to the SVR2.ens1f4d1 VRF interface
- ping from SVR2 global to the SVR2 VRF interface through SVR1 (L3 
forwarding); this check was redundant, since both tests fail or pass 
together
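
For reference, the good/bad criterion above could be scripted roughly 
as follows. This is only a sketch, not the procedure that was actually 
used for the bisect: the bisect_verdict helper and the PING override 
are hypothetical names, and it assumes it is run on SVR1 after the 
kernel under test has been booted and configured on SVR2.

```shell
#!/bin/sh
# Hypothetical helper: report whether a bisect step looks good or bad.
# PING can be overridden for a dry run; by default the real ping is used.
PING=${PING:-ping}

bisect_verdict() {
    target=$1
    # -c 3: send three echo requests; -W 2: wait up to 2 s for a reply
    if $PING -c 3 -W 2 "$target" >/dev/null 2>&1; then
        echo good
    else
        echo bad
    fi
}

# Example: run on SVR1 against the VRF address of SVR2 from the setup above.
# bisect_verdict 192.168.9.2
```

The printed word can then be passed by hand to "git bisect good" or 
"git bisect bad" for the commit being tested.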

The problem is still present on recent kernels as well; I checked 
v4.16.0 and v4.17-rc7.

Disabling DCB support for the card fixes the problem (compiling the 
kernel with "CONFIG_CHELSIO_T4_DCB=n").
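
To see whether a given kernel was built with this option, something 
like the sketch below could be used. The dcb_config_state helper is a 
hypothetical name, and the location of the kernel config file is an 
assumption that varies between distributions.

```shell
#!/bin/sh
# Hypothetical helper: print the state of CONFIG_CHELSIO_T4_DCB found in a
# kernel config file: "y" when enabled, "n" when unset or absent.
dcb_config_state() {
    cfg=$1
    # Print only the value of an exact CONFIG_CHELSIO_T4_DCB=... line
    val=$(sed -n 's/^CONFIG_CHELSIO_T4_DCB=\(.*\)$/\1/p' "$cfg")
    echo "${val:-n}"
}

# Example (the config path is an assumption; it differs per distribution):
# dcb_config_state "/boot/config-$(uname -r)"
```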



This is my first time reporting a bug to the Linux kernel, and I hope I 
have included the right amount of information. Please let me know if I 
have missed something.



Thank you,
Zollner Robert


--------
Logs:

VRF configured using the following commands:

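#!/bin/sh

CHDEV=ens1f4
VRF=vrf-recv

# Let sockets not bound to a VRF accept traffic arriving in any VRF
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
# Accept packets whose source address is local to this host
sysctl -w net.ipv4.conf.all.accept_local=1

ifconfig ${CHDEV}   192.168.8.2/24
ifconfig ${CHDEV}d1 192.168.9.2/24

# Create the VRF device, bound to routing table 10
ip link add ${VRF} type vrf table 10
ip link set dev ${VRF} up

# Move the local table rule from pref 0 to pref 32765
ip rule add pref 32765 table local
ip rule del pref 0

# Catch-all in table 10 so lookups do not fall through to other tables
ip route add table 10 unreachable default metric 4278198272

# Enslave the second port to the VRF
ip link set dev ${CHDEV}d1 master ${VRF}

ip route add table 10 default via 192.168.9.1
ip route add 192.168.9.0/24 via 192.168.8.1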

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: David Ahern @ 2018-06-04 18:17 UTC
  To: AMG Zollner Robert, ganeshgr; +Cc: netdev

Are you doing the VRF enslave while it is up?

If so, does it work ok if you change the sequence:

ip li set ens1f4d1 down
ip li set ens1f4d1 master <VRF>
ip li set ens1f4d1 up
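
As a sketch only, the sequence above could be wrapped in a small helper 
so it can replace the single "master" line in a setup script. The 
enslave_while_down name and the IP override are hypothetical; the three 
ip invocations are the ones suggested above, written out in long form.

```shell
#!/bin/sh
# Sketch of the down / enslave / up sequence. IP can be overridden
# (for example IP="echo ip") to print the commands instead of running them.
IP=${IP:-ip}

enslave_while_down() {
    dev=$1
    vrf=$2
    # Stop at the first command that fails
    $IP link set dev "$dev" down &&
    $IP link set dev "$dev" master "$vrf" &&
    $IP link set dev "$dev" up
}

# Example with the names from the original report (requires root):
# enslave_while_down ens1f4d1 vrf-recv
```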

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: AMG Zollner Robert @ 2018-06-04 20:14 UTC
  To: David Ahern, ganeshgr; +Cc: netdev

Yes, I was enslaving while the interface was up.

I just tested some of the builds that were not working earlier, and 
they work if I keep the interface down while enslaving, as you suggested.

Is this the expected behavior?

Thank you,
Zollner Robert


* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: David Ahern @ 2018-06-04 20:35 UTC
  To: AMG Zollner Robert, ganeshgr; +Cc: netdev

On 6/4/18 1:14 PM, AMG Zollner Robert wrote:
> Yes, I was enslaving while the interface was up.
> 
> I just tested some of the builds that were not working earlier, and
> they work if I keep the interface down while enslaving, as you suggested.
> 
> Is this the expected behavior?

Not expected from my perspective.

The VRF device cycles interfaces when they are enslaved or unenslaved to
clean up route and neighbor tables. This is a day 1 property of VRF.

I guessed that was the problem based on the commit you bisected the
problem to. If nothing else, it gives you a workaround until it is fixed.

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: David Ahern @ 2018-06-10  0:47 UTC
  To: AMG Zollner Robert, ganeshgr; +Cc: netdev

Ganesh:

On 6/4/18 9:03 AM, AMG Zollner Robert wrote:
> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
> Author: Ganesh Goudar <ganeshgr@chelsio.com>
> Date:   Sat Sep 23 16:07:28 2017 +0530
> 
>     cxgb4: do DCB state reset in couple of places
> 
>     reset the driver's DCB state in couple of places
>     where it was missing.

Are you working on a fix for this or should a revert of the above patch
be sent?


* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: Ganesh Goudar @ 2018-06-11  9:17 UTC
  To: David Ahern; +Cc: AMG Zollner Robert, netdev

On Saturday, June 06/09/18, 2018 at 18:47:55 -0600, David Ahern wrote:
> Ganesh:
> 
> On 6/4/18 9:03 AM, AMG Zollner Robert wrote:
> > commit ba581f77df23c8ee70b372966e69cf10bc5453d8
> > Author: Ganesh Goudar <ganeshgr@chelsio.com>
> > Date:   Sat Sep 23 16:07:28 2017 +0530
> > 
> >     cxgb4: do DCB state reset in couple of places
> > 
> >     reset the driver's DCB state in couple of places
> >     where it was missing.
> 
> Are you working on a fix for this or should a revert of the above patch
> be sent?
Will look into it and fix/revert it soon. Thanks for responding to Robert.

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: AMG Zollner Robert @ 2018-06-19 20:32 UTC
  To: Ganesh Goudar; +Cc: office, dsa, netdev

On 19.06.2018 16:24, Ganesh Goudar wrote:
> -netdev, Please feel free to add if needed.
> 
> Hi Robert,
> 
> My knowledge of VRF is very limited and I am trying to bring
> up a VRF setup. I just wanted to check whether you are doing anything
> related to DCB, and please also let me know how you set up SVR1.
> 
> Thanks
> 

Hello Ganesh,

SVR1 is just doing L3 forwarding between the two physical ports of the 
T520-CR card.

ifconfig ens1f4   192.168.8.1/24
ifconfig ens1f4d1 192.168.9.1/24
sysctl -w net.ipv4.ip_forward=1

- No VRF is configured on this box
- DCB is also not used
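
As a quick sanity check for a forwarding-only box like this one, the 
forwarding state can be read back from procfs. This is only a sketch; 
the forwarding_enabled helper is a hypothetical name, and the optional 
file argument exists only so that it can be tried without touching the 
live setting.

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 forwarding is switched on.
# By default it reads the live value written by "sysctl -w net.ipv4.ip_forward=1".
forwarding_enabled() {
    f=${1:-/proc/sys/net/ipv4/ip_forward}
    [ "$(cat "$f")" = "1" ]
}

# Example:
# forwarding_enabled && echo "IPv4 forwarding is on"
```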


SVR2 is using VRF and is configured with the script inlined in the first 
email.

Thread overview: 7+ messages
2018-06-04 15:03 [bug] cxgb4: vrf stopped working with cxgb4 card AMG Zollner Robert
2018-06-04 18:17 ` David Ahern
2018-06-04 20:14   ` AMG Zollner Robert
2018-06-04 20:35     ` David Ahern
2018-06-10  0:47 ` David Ahern
2018-06-11  9:17   ` Ganesh Goudar
     [not found]     ` <20180619132425.GA6576@chelsio.com>
2018-06-19 20:32       ` AMG Zollner Robert
