netdev.vger.kernel.org archive mirror
* bonding + arp monitoring fails if interface is a vlan
From: Santiago Garcia Mantinan @ 2013-08-01 12:11 UTC (permalink / raw)
  To: netdev

Hi!

I'm trying to set up a bond of a couple of VLANs; these VLANs are different
paths to an upstream switch from a local switch.  I want to do ARP
monitoring of the link so that the bonding interface knows which path
is OK and which one is broken.  If I set it up using ARP monitoring but
without VLANs it works OK; it also works if I set it up using VLANs
but without ARP monitoring, so the broken setup seems to be bonding +
ARP monitoring + VLANs. Here is a diagram:

 -------------
|Remote Switch|
 -------------
   |      |
   P      P
   A      A
   T      T
   H      H
   1      2
   |      |
 ------------
|Local switch|
 ------------
      |
      | VLAN for PATH1
      | VLAN for PATH2
      |
 Linux machine

The broken setup seems to work, but ARP monitoring makes it lose the logical
link from time to time, thus switching to another slave if one is available.
What I saw when monitoring this with tcpdump is that all the ARP requests were
going out and all the replies were coming in, so according to the
traffic seen in tcpdump the link should have been stable, but
/proc/net/bonding/bond0 showed the link failure count increasing, and when
testing with just one VLAN interface I was losing ping whenever the link went
down.

I've tried this on Debian wheezy with its 3.2.46 kernel and also with the
3.10.3 kernel in unstable; the tests were done on a couple of machines running
a 32-bit kernel with different NICs (r8169 and skge).

I created a small lab to replicate the problem. In this setup I avoided all
the switching and directly connected the machine with bonding to another
Linux machine on which I just had eth0.1002 configured with IP 192.168.1.1.
The results were the same as in the full scenario: the link on the bonding
slave was going down from time to time.

This is the configuration of the bonding interface:

auto bond0
iface bond0 inet static
        address 192.168.1.2
        netmask 255.255.255.0
        bond-slaves eth0.1002
        bond-mode active-backup
        bond-arp_validate 0
        bond-arp_interval 5000
        bond-arp_ip_target 192.168.1.1
        pre-up ip link set eth0 up || true
        pre-up ip link add link eth0 name eth0.1002 type vlan id 1002 || true
        down ip link delete eth0.1002 || true

These are the messages I was seeing on the bonding machines:

[  452.436750] bonding: bond0: adding ARP target 192.168.1.1.
[  452.436851] bonding: bond0: Setting ARP monitoring interval to 5000.
[  452.440287] bonding: bond0: setting mode to active-backup (1).
[  452.440429] bonding: bond0: setting arp_validate to none (0).
[  452.458349] bonding: bond0: Adding slave eth0.1002.
[  452.458964] bonding: bond0: making interface eth0.1002 the new active one.
[  452.458983] bonding: bond0: first active interface up!
[  452.458999] bonding: bond0: enslaving eth0.1002 as an active interface with an up link.
[  452.482560] 8021q: adding VLAN 0 to HW filter on device bond0
[  467.500143] bonding: bond0: link status definitely down for interface eth0.1002, disabling it
[  467.500193] bonding: bond0: now running without any active interface !
[  622.748102] bonding: bond0: link status definitely up for interface eth0.1002.
[  622.748122] bonding: bond0: making interface eth0.1002 the new active one.
[  622.748522] bonding: bond0: first active interface up!
[  637.772179] bonding: bond0: link status definitely down for interface eth0.1002, disabling it
[  637.772228] bonding: bond0: now running without any active interface !
[  642.780173] bonding: bond0: link status definitely up for interface eth0.1002.
[  642.780192] bonding: bond0: making interface eth0.1002 the new active one.
[  642.780603] bonding: bond0: first active interface up!
[  657.804154] bonding: bond0: link status definitely down for interface eth0.1002, disabling it
[  657.804209] bonding: bond0: now running without any active interface !
[  662.812165] bonding: bond0: link status definitely up for interface eth0.1002.
[  662.812185] bonding: bond0: making interface eth0.1002 the new active one.
[  662.812592] bonding: bond0: first active interface up!
[  677.836167] bonding: bond0: link status definitely down for interface eth0.1002, disabling it
[  677.836223] bonding: bond0: now running without any active interface !
[  682.844162] bonding: bond0: link status definitely up for interface eth0.1002.
[  682.844181] bonding: bond0: making interface eth0.1002 the new active one.
[  682.844590] bonding: bond0: first active interface up!
[  697.868153] bonding: bond0: link status definitely down for interface eth0.1002, disabling it
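
The transitions in the log above can be turned into up/down durations with a
short script. This is only a sketch: the function names link_transitions and
state_durations are made up for illustration, and the regular expression
assumes the exact message wording shown above, which may differ between kernel
versions.

```python
import re

# Matches the kernel timestamp and the "definitely up/down" messages quoted
# above. The message wording is assumed to be exactly as in that log.
EVENT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+bonding: (?P<bond>\S+): "
    r"link status definitely (?P<state>up|down) "
    r"for interface (?P<slave>\w+(?:\.\w+)*)"
)


def link_transitions(log_text):
    """Return (timestamp, slave, state) tuples in the order they were logged."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if m:
            events.append(
                (float(m.group("ts")), m.group("slave"), m.group("state"))
            )
    return events


def state_durations(events):
    """Return (state, seconds) for each interval between two transitions."""
    return [
        (prev_state, round(ts - prev_ts, 3))
        for (prev_ts, _, prev_state), (ts, _, _) in zip(events, events[1:])
    ]
```

Feeding it the excerpt above (for example from a saved dmesg file) gives up
periods of roughly 15 seconds, about three times the 5000 ms
bond-arp_interval, separated by down periods of roughly 5 seconds, apart from
the first, much longer, down period.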

Like I said, running tcpdump on both Linux machines shows everything is fine:
all ARP requests and replies are there, but the link goes down from time to
time. In this setup the bond is built with just one slave, so the network is
lost when the link goes down.
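
To watch the failure counter while tcpdump is running, the per-slave
"Link Failure Count" can be read from /proc/net/bonding/bond0 with something
like the following. It is a rough sketch: the helper names are hypothetical,
and it assumes the usual "Slave Interface:" / "Link Failure Count:" lines in
that file.

```python
def slave_link_failures(status_text):
    """Map each slave name to its 'Link Failure Count' value."""
    failures = {}
    current = None  # slave whose section is currently being read
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Link Failure Count" and current is not None:
            failures[current] = int(value)
    return failures


def read_bond_failures(bond="bond0"):
    """Read the counters of a live bond (the bond name is an assumption)."""
    with open(f"/proc/net/bonding/{bond}") as f:
        return slave_link_failures(f.read())
```

Calling read_bond_failures() every few seconds and comparing the result with
the tcpdump output shows whether the counter increases while ARP replies are
still arriving.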

Some questions:

Am I doing something wrong here?
Is this setup not supported?
If it should work... can anybody reproduce this?
Is this a bug?

What should I do now?

Regards...
-- 
Manty/BestiaTester -> http://manty.net


Thread overview: 18+ messages
2013-08-01 12:11 bonding + arp monitoring fails if interface is a vlan Santiago Garcia Mantinan
2013-08-01 13:00 ` Erik Hugne
2013-08-02  7:26   ` Santiago Garcia Mantinan
2013-08-02  9:33     ` Santiago Garcia Mantinan
2013-08-01 20:21 ` Veaceslav Falico
2013-08-02  7:30   ` Santiago Garcia Mantinan
2013-08-02 11:58 ` Nikolay Aleksandrov
2013-08-02 15:49   ` Jay Vosburgh
2013-08-02 16:13     ` Nikolay Aleksandrov
2013-08-04 10:45   ` Santiago Garcia Mantinan
2013-08-05 10:26     ` Santiago Garcia Mantinan
2013-08-05 10:26       ` Nikolay Aleksandrov
2013-08-07  7:26         ` Santiago Garcia Mantinan
2013-08-07  7:39           ` Nikolay Aleksandrov
2013-08-07 10:44             ` Santiago Garcia Mantinan
2013-08-20  8:05               ` Santiago Garcia Mantinan
2013-08-20 10:11                 ` Nikolay Aleksandrov
2013-08-21  7:39                   ` Santiago Garcia Mantinan
