All of lore.kernel.org
 help / color / mirror / Atom feed
* bond interface arp, vlan and trunk / network question
@ 2009-04-20 17:50 stefan novak
  2009-04-20 18:16 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: stefan novak @ 2009-04-20 17:50 UTC (permalink / raw)
  To: linux-kernel

Hello list,

I've got a Problem with my bladecenter and bond interfaces.
My Server has 2 interface, each on a seperate switch. Each
serverinterface is connected via a trunk to one of the switches.
Each switch has a trunk to stacked-backbone switch.

nic1 --trunk--> bladesw1 --trunk--> backbone switch
nic2 --trunk--> bladesw2 --trunk--> backbone switch

So far vlan and trunking works as expected. But if one trunk
connection from a bladeswitch to the backbone switch is down the
mii-tool cant recognize this.
I tried to use an arp target on the backbone switch to check the
connection state.

Now i'm running into problems. :(
My bladesw1 is configured with a trunk and a private vlan id of 600.
On the backbone switch is a server in the vlan 600 connected, but i
can't get an arp request.

What can be the problem, or is it possible to add a vlan tag on the arp check?

thx for your help and sorry for my bad english...

Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 17:50 bond interface arp, vlan and trunk / network question stefan novak
@ 2009-04-20 18:16 ` Eric Dumazet
  2009-04-20 18:37   ` Jay Vosburgh
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2009-04-20 18:16 UTC (permalink / raw)
  To: stefan novak; +Cc: linux-kernel, Linux Netdev List

stefan novak a écrit :
> Hello list,
> 
> I've got a Problem with my bladecenter and bond interfaces.
> My Server has 2 interface, each on a seperate switch. Each
> serverinterface is connected via a trunk to one of the switches.
> Each switch has a trunk to stacked-backbone switch.
> 
> nic1 --trunk--> bladesw1 --trunk--> backbone switch
> nic2 --trunk--> bladesw2 --trunk--> backbone switch
> 
> So far vlan and trunking works as expected. But if one trunk
> connection from a bladeswitch to the backbone switch is down the
> mii-tool cant recognize this.

What is the exact problem on this one ?

> I tried to use an arp target on the backbone switch to check the
> connection state.
> 
> Now i'm running into problems. :(
> My bladesw1 is configured with a trunk and a private vlan id of 600.
> On the backbone switch is a server in the vlan 600 connected, but i
> can't get an arp request.
> 
> What can be the problem, or is it possible to add a vlan tag on the arp check?
> 

What driver is in use on your NIC interface ?

Could you post your bonding settings ?

cat /proc/net/bonding/bond0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 18:16 ` Eric Dumazet
@ 2009-04-20 18:37   ` Jay Vosburgh
  2009-04-20 20:00     ` stefan novak
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-20 18:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: stefan novak, linux-kernel, Linux Netdev List

Eric Dumazet <dada1@cosmosbay.com> wrote:

>stefan novak a écrit :
>> Hello list,
>> 
>> I've got a Problem with my bladecenter and bond interfaces.
>> My Server has 2 interface, each on a seperate switch. Each
>> serverinterface is connected via a trunk to one of the switches.
>> Each switch has a trunk to stacked-backbone switch.
>> 
>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>> 
>> So far vlan and trunking works as expected. But if one trunk
>> connection from a bladeswitch to the backbone switch is down the
>> mii-tool cant recognize this.
>
>What is the exact problem on this one ?

	This is the expected behavior, the external switch to
bladecenter switch (ESM) link status does not affect the ESM to blade
link status.

	Unless... your ESM supports "trunk failover."  I believe the
Cisco ESMs do, I'm not sure about others.  With trunk failover enabled,
loss of link on an external switch port will in turn drop link on the
corresponding internal switch ports.  There is a long-ish delay for
this, on the order of 750 ms, as I recall.

>> I tried to use an arp target on the backbone switch to check the
>> connection state.
>> 
>> Now i'm running into problems. :(
>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>> On the backbone switch is a server in the vlan 600 connected, but i
>> can't get an arp request.
>> 
>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>> 
>
>What driver is in use on your NIC interface ?
>
>Could you post your bonding settings ?
>
>cat /proc/net/bonding/bond0

	Sufficiently recent versions of bonding should VLAN tag the ARP
probes, provided you are using a VLAN device configured above the bond.
My recollection is that VLAN tagging of ARP probes was added about three
years ago.

	If the switch port is configured as native to a VLAN, then it
should tag everything coming in.

	As Eric asks, what are you running?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 18:37   ` Jay Vosburgh
@ 2009-04-20 20:00     ` stefan novak
  2009-04-20 20:36       ` Jay Vosburgh
  0 siblings, 1 reply; 19+ messages in thread
From: stefan novak @ 2009-04-20 20:00 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>
>> So far vlan and trunking works as expected. But if one trunk
>> connection from a bladeswitch to the backbone switch is down the
>> mii-tool cant recognize this.
>
> What is the exact problem on this one ?

The exact problem is that the bonding driver don't switch the
interface because the mii-tool don't recognize that the connection
between the two switches is now broken.

nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if)
nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if)

>   Sufficiently recent versions of bonding should VLAN tag the ARP
>probes, provided you are using a VLAN device configured above the bond.
>My recollection is that VLAN tagging of ARP probes was added about three
>years ago.
>
>       If the switch port is configured as native to a VLAN, then it
>should tag everything coming in.
>
>      As Eric asks, what are you running?

I've got a bonding interface over eth0 and eth1 and on that bonding
interface several vlans. The arp probe is in vlan 600 and i can ping
it from the bash.
Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is
not working.
So the arp check is not working for me :(

the nic driver is:
alias eth0 igb
alias eth1 igb

the bond setiings:
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 30
ARP IP target/s (n.n.n.n form): 172.21.0.254

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:94:7d:1a

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:30:48:94:7d:1b


On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>
>>stefan novak a écrit :
>>> Hello list,
>>>
>>> I've got a Problem with my bladecenter and bond interfaces.
>>> My Server has 2 interface, each on a seperate switch. Each
>>> serverinterface is connected via a trunk to one of the switches.
>>> Each switch has a trunk to stacked-backbone switch.
>>>
>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>
>>> So far vlan and trunking works as expected. But if one trunk
>>> connection from a bladeswitch to the backbone switch is down the
>>> mii-tool cant recognize this.
>>
>>What is the exact problem on this one ?
>
>        This is the expected behavior, the external switch to
> bladecenter switch (ESM) link status does not affect the ESM to blade
> link status.
>
>        Unless... your ESM supports "trunk failover."  I believe the
> Cisco ESMs do, I'm not sure about others.  With trunk failover enabled,
> loss of link on an external switch port will in turn drop link on the
> corresponding internal switch ports.  There is a long-ish delay for
> this, on the order of 750 ms, as I recall.
>
>>> I tried to use an arp target on the backbone switch to check the
>>> connection state.
>>>
>>> Now i'm running into problems. :(
>>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>>> On the backbone switch is a server in the vlan 600 connected, but i
>>> can't get an arp request.
>>>
>>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>>>
>>
>>What driver is in use on your NIC interface ?
>>
>>Could you post your bonding settings ?
>>
>>cat /proc/net/bonding/bond0
>
>        Sufficiently recent versions of bonding should VLAN tag the ARP
> probes, provided you are using a VLAN device configured above the bond.
> My recollection is that VLAN tagging of ARP probes was added about three
> years ago.
>
>        If the switch port is configured as native to a VLAN, then it
> should tag everything coming in.
>
>        As Eric asks, what are you running?
>
>        -J
>
> ---
>        -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 20:00     ` stefan novak
@ 2009-04-20 20:36       ` Jay Vosburgh
  2009-04-20 21:03         ` stefan novak
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-20 20:36 UTC (permalink / raw)
  To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

stefan novak <lms.brubaker@gmail.com> wrote:

>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>
>>> So far vlan and trunking works as expected. But if one trunk
>>> connection from a bladeswitch to the backbone switch is down the
>>> mii-tool cant recognize this.
>>
>> What is the exact problem on this one ?
>
>The exact problem is that the bonding driver don't switch the
>interface because the mii-tool don't recognize that the connection
>between the two switches is now broken.

	No, from your configuration information, you're running the ARP
monitor, in which case the actual link state ("mii-tool", although that
isn't really how it works) is not used in the failover decision.

	For the ARP monitor, the decision is based on whether or not
"replies" to the ARP probes come through.  More on that in a bit.

>nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if)
>nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if)
>
>>   Sufficiently recent versions of bonding should VLAN tag the ARP
>>probes, provided you are using a VLAN device configured above the bond.
>>My recollection is that VLAN tagging of ARP probes was added about three
>>years ago.
>>
>>       If the switch port is configured as native to a VLAN, then it
>>should tag everything coming in.
>>
>>      As Eric asks, what are you running?
>
>I've got a bonding interface over eth0 and eth1 and on that bonding
>interface several vlans. The arp probe is in vlan 600 and i can ping
>it from the bash.
>Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is
>not working.
>So the arp check is not working for me :(

	I believe you're seeing the expected behavior from arping here,
and it does not automatically indicate that anything is wrong.

	It's very possible that your network topology is such that
arping -I bond0 won't work while arping -I bond0.600 does.  If the
target you specify is reachable only on the VLAN, it's expected behavior
that arping -I bond0 of that target won't work (because the interface
bond0 is not attached to the VLAN, only bond0.600 is).  That doesn't
mean that the ARPs generated internally by bonding are untagged /
failing, as bonding itself adds VLAN tags to its own ARP probes as
needed.

	On the other hand, if you specify different targets to the
arping -I bond0 and arping -I bond0.600 (so that the "bond0" target
isn't a VLAN destination), then something unusual may be occuring.

	Also, are you running multiple blades with bonding behind the
same set of switches?  If you are, you probably want to set the
arp_validate option to either "active" or "all", as the default setting
(none) relies only on the existance of traffic on the slaves, and
doesn't check the source of that traffic.  The end result of that is the
probes from multiple bonding instances fool one another into thinking
the path is up, when it is not.  With arp_validate enabled, it'll check
that the slaves are actually receiving their own ARP traffic.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com



>the nic driver is:
>alias eth0 igb
>alias eth1 igb
>
>the bond setiings:
>cat /proc/net/bonding/bond0
>Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: None
>Currently Active Slave: eth0
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 30
>ARP IP target/s (n.n.n.n form): 172.21.0.254
>
>Slave Interface: eth0
>MII Status: up
>Link Failure Count: 0
>Permanent HW addr: 00:30:48:94:7d:1a
>
>Slave Interface: eth1
>MII Status: up
>Link Failure Count: 0
>Permanent HW addr: 00:30:48:94:7d:1b
>
>
>On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> Eric Dumazet <dada1@cosmosbay.com> wrote:
>>
>>>stefan novak a écrit :
>>>> Hello list,
>>>>
>>>> I've got a Problem with my bladecenter and bond interfaces.
>>>> My Server has 2 interface, each on a seperate switch. Each
>>>> serverinterface is connected via a trunk to one of the switches.
>>>> Each switch has a trunk to stacked-backbone switch.
>>>>
>>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch
>>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch
>>>>
>>>> So far vlan and trunking works as expected. But if one trunk
>>>> connection from a bladeswitch to the backbone switch is down the
>>>> mii-tool cant recognize this.
>>>
>>>What is the exact problem on this one ?
>>
>>        This is the expected behavior, the external switch to
>> bladecenter switch (ESM) link status does not affect the ESM to blade
>> link status.
>>
>>        Unless... your ESM supports "trunk failover."  I believe the
>> Cisco ESMs do, I'm not sure about others.  With trunk failover enabled,
>> loss of link on an external switch port will in turn drop link on the
>> corresponding internal switch ports.  There is a long-ish delay for
>> this, on the order of 750 ms, as I recall.
>>
>>>> I tried to use an arp target on the backbone switch to check the
>>>> connection state.
>>>>
>>>> Now i'm running into problems. :(
>>>> My bladesw1 is configured with a trunk and a private vlan id of 600.
>>>> On the backbone switch is a server in the vlan 600 connected, but i
>>>> can't get an arp request.
>>>>
>>>> What can be the problem, or is it possible to add a vlan tag on the arp check?
>>>>
>>>
>>>What driver is in use on your NIC interface ?
>>>
>>>Could you post your bonding settings ?
>>>
>>>cat /proc/net/bonding/bond0
>>
>>        Sufficiently recent versions of bonding should VLAN tag the ARP
>> probes, provided you are using a VLAN device configured above the bond.
>> My recollection is that VLAN tagging of ARP probes was added about three
>> years ago.
>>
>>        If the switch port is configured as native to a VLAN, then it
>> should tag everything coming in.
>>
>>        As Eric asks, what are you running?
>>
>>        -J
>>
>> ---
>>        -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 20:36       ` Jay Vosburgh
@ 2009-04-20 21:03         ` stefan novak
  2009-04-20 21:15           ` Eric Dumazet
  2009-04-20 21:23           ` Jay Vosburgh
  0 siblings, 2 replies; 19+ messages in thread
From: stefan novak @ 2009-04-20 21:03 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

>
>        I believe you're seeing the expected behavior from arping here,
> and it does not automatically indicate that anything is wrong.
>
>        It's very possible that your network topology is such that
> arping -I bond0 won't work while arping -I bond0.600 does.  If the
> target you specify is reachable only on the VLAN, it's expected behavior
> that arping -I bond0 of that target won't work (because the interface
> bond0 is not attached to the VLAN, only bond0.600 is).  That doesn't
> mean that the ARPs generated internally by bonding are untagged /
> failing, as bonding itself adds VLAN tags to its own ARP probes as
> needed.

Ok. I've checked the tcpdump's on the machines and I think something is working.

tcpdump  -v -i eth0 arp
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1

tcpdump  -v -i bond0.600 arp
tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
size 96 bytes
22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)

the arp's are sent out on eth0 and recieved via bond0.600. When they
are sent on eth0 then the switch must tag the vlan600 (private vlan).
Then they come in at the right interface. Is it normal that so many
arp's are sent?
Is there a way to check if the arp check is working right in the proc
fs oder something like that?

>        Also, are you running multiple blades with bonding behind the
> same set of switches?

Yes, 14 blades with 2 seperate(not connected) switches.

>  If you are, you probably want to set the
> arp_validate option to either "active" or "all", as the default setting
> (none) relies only on the existance of traffic on the slaves, and
> doesn't check the source of that traffic.  The end result of that is the
> probes from multiple bonding instances fool one another into thinking
> the path is up, when it is not.  With arp_validate enabled, it'll check
> that the slaves are actually receiving their own ARP traffic.

Ok, sounds right for me. I've set the arp_validate option to "all".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 21:03         ` stefan novak
@ 2009-04-20 21:15           ` Eric Dumazet
  2009-04-20 21:23           ` Jay Vosburgh
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2009-04-20 21:15 UTC (permalink / raw)
  To: stefan novak; +Cc: Jay Vosburgh, linux-kernel, Linux Netdev List

stefan novak a écrit :
>>        I believe you're seeing the expected behavior from arping here,
>> and it does not automatically indicate that anything is wrong.
>>
>>        It's very possible that your network topology is such that
>> arping -I bond0 won't work while arping -I bond0.600 does.  If the
>> target you specify is reachable only on the VLAN, it's expected behavior
>> that arping -I bond0 of that target won't work (because the interface
>> bond0 is not attached to the VLAN, only bond0.600 is).  That doesn't
>> mean that the ARPs generated internally by bonding are untagged /
>> failing, as bonding itself adds VLAN tags to its own ARP probes as
>> needed.
> 
> Ok. I've checked the tcpdump's on the machines and I think something is working.
> 
> tcpdump  -v -i eth0 arp
> tcpdump: WARNING: eth0: no IPv4 address assigned
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
> 22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
> 22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1
> 
> tcpdump  -v -i bond0.600 arp
> tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
> size 96 bytes
> 22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
> 
> the arp's are sent out on eth0 and recieved via bond0.600. When they
> are sent on eth0 then the switch must tag the vlan600 (private vlan).

Ah, you setup eth0 or bond0 with an IP ?

bond driver does a route loookup to find out if a vlan tag is necessary or not
when issuing an arp request.

So check result of : "ip route get 172.21.0.254"


> Then they come in at the right interface. Is it normal that so many
> arp's are sent?

you setup a 30 ms interval, so you get what you asked for :)

> Is there a way to check if the arp check is working right in the proc
> fs oder something like that?
> 
>>        Also, are you running multiple blades with bonding behind the
>> same set of switches?
> 
> Yes, 14 blades with 2 seperate(not connected) switches.
> 
>>  If you are, you probably want to set the
>> arp_validate option to either "active" or "all", as the default setting
>> (none) relies only on the existance of traffic on the slaves, and
>> doesn't check the source of that traffic.  The end result of that is the
>> probes from multiple bonding instances fool one another into thinking
>> the path is up, when it is not.  With arp_validate enabled, it'll check
>> that the slaves are actually receiving their own ARP traffic.
> 
> Ok, sounds right for me. I've set the arp_validate option to "all".

Please give us :

ifconfig -a



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 21:03         ` stefan novak
  2009-04-20 21:15           ` Eric Dumazet
@ 2009-04-20 21:23           ` Jay Vosburgh
  2009-04-20 21:39             ` stefan novak
  1 sibling, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-20 21:23 UTC (permalink / raw)
  To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

stefan novak <lms.brubaker@gmail.com> wrote:

>>
>>        I believe you're seeing the expected behavior from arping here,
>> and it does not automatically indicate that anything is wrong.
>>
>>        It's very possible that your network topology is such that
>> arping -I bond0 won't work while arping -I bond0.600 does.  If the
>> target you specify is reachable only on the VLAN, it's expected behavior
>> that arping -I bond0 of that target won't work (because the interface
>> bond0 is not attached to the VLAN, only bond0.600 is).  That doesn't
>> mean that the ARPs generated internally by bonding are untagged /
>> failing, as bonding itself adds VLAN tags to its own ARP probes as
>> needed.
>
>Ok. I've checked the tcpdump's on the machines and I think something is working.
>
>tcpdump  -v -i eth0 arp
>tcpdump: WARNING: eth0: no IPv4 address assigned
>tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
>22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1
>22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1
>
>tcpdump  -v -i bond0.600 arp
>tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture
>size 96 bytes
>22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown)
>
>the arp's are sent out on eth0 and recieved via bond0.600. When they
>are sent on eth0 then the switch must tag the vlan600 (private vlan).
>Then they come in at the right interface. Is it normal that so many
>arp's are sent?

	You set arp_interval to 30, so you get one probe every 30
milliseconds, 33 per second.

	As far as seeing the ARPs on eth0 and bond0.600, I believe
that's normal, as the interfaces are logically stacked, so you can see
the traffic in both places.  If the device supports VLAN acceleration
(and maybe even if it doesn't; I'd have to look), I don't think you'll
see the tag in the tcpdumped traffic, as the tagging is done by the
hardware, not in software.

	It's also normal on bonding to see transmits on the slave and
received traffic on the bond device; that has to do with how the
transmit and receive paths differ.

>Is there a way to check if the arp check is working right in the proc
>fs oder something like that?

	There's no "self test" or anything like that, if that's what you
mean.  If the ARPs work (make the round trip) the link is up, if they
don't, the link is down.  That's subject to some details related to
arp_validate, but is basically it.

>>        Also, are you running multiple blades with bonding behind the
>> same set of switches?
>
>Yes, 14 blades with 2 seperate(not connected) switches.
>
>>  If you are, you probably want to set the
>> arp_validate option to either "active" or "all", as the default setting
>> (none) relies only on the existance of traffic on the slaves, and
>> doesn't check the source of that traffic.  The end result of that is the
>> probes from multiple bonding instances fool one another into thinking
>> the path is up, when it is not.  With arp_validate enabled, it'll check
>> that the slaves are actually receiving their own ARP traffic.
>
>Ok, sounds right for me. I've set the arp_validate option to "all".

	That may make the phantom "up" link go away.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 21:23           ` Jay Vosburgh
@ 2009-04-20 21:39             ` stefan novak
  2009-04-21  0:01               ` Jay Vosburgh
  0 siblings, 1 reply; 19+ messages in thread
From: stefan novak @ 2009-04-20 21:39 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

ip route get 172.21.0.254
172.21.0.254 dev bond0.600  src 172.21.0.1
    cache  mtu 1500 advmss 1460 hoplimit 64

ifconfig -a :
bond0     Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4037489679 (3.7 GiB)  TX bytes:124861822 (119.0 MiB)

bond0.200 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          inet addr:172.22.0.1  Bcast:172.22.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:5 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:250 (250.0 b)  TX bytes:252 (252.0 b)

bond0.500 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          inet addr:172.20.0.1  Bcast:172.20.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:26448 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6278388 (5.9 MiB)  TX bytes:6429404 (6.1 MiB)

bond0.600 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          inet addr:172.21.0.1  Bcast:172.21.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:1911 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:98604 (96.2 KiB)  TX bytes:1208395 (1.1 MiB)

eth0      Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:881723 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:108550619 (103.5 MiB)  TX bytes:115574099 (110.2 MiB)
          Memory:f8220000-f8240000

eth1      Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0
          TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3928939240 (3.6 GiB)  TX bytes:9290795 (8.8 MiB)
          Memory:f8260000-f8280000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:105468 errors:0 dropped:0 overruns:0 frame:0
          TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7016087 (6.6 MiB)  TX bytes:7016087 (6.6 MiB)


>        You set arp_interval to 30, so you get one probe every 30
> milliseconds, 33 per second.

Yes, sure. my mistake....

>        There's no "self test" or anything like that, if that's what you
> mean.  If the ARPs work (make the round trip) the link is up, if they
> don't, the link is down.  That's subject to some details related to
> arp_validate, but is basically it.

Ok. my interfaces are now up with arp_validate set to 0/none.
Everything works as expected, thank you.
But with arp_validate set to all the interfaces are always down. :(
I think that i have to use the all option because i have many servers
in that vlan?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-20 21:39             ` stefan novak
@ 2009-04-21  0:01               ` Jay Vosburgh
  2009-04-22  8:29                 ` stefan novak
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-21  0:01 UTC (permalink / raw)
  To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

stefan novak <lms.brubaker@gmail.com> wrote:

>ip route get 172.21.0.254
>172.21.0.254 dev bond0.600  src 172.21.0.1
>    cache  mtu 1500 advmss 1460 hoplimit 64
>
>ifconfig -a :
>bond0     Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:4037489679 (3.7 GiB)  TX bytes:124861822 (119.0 MiB)
>
>bond0.200 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          inet addr:172.22.0.1  Bcast:172.22.0.255  Mask:255.255.255.0
>          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:5 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:250 (250.0 b)  TX bytes:252 (252.0 b)
>
>bond0.500 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          inet addr:172.20.0.1  Bcast:172.20.0.255  Mask:255.255.255.0
>          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:26448 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:6278388 (5.9 MiB)  TX bytes:6429404 (6.1 MiB)
>
>bond0.600 Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          inet addr:172.21.0.1  Bcast:172.21.0.255  Mask:255.255.255.0
>          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>          RX packets:1911 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:98604 (96.2 KiB)  TX bytes:1208395 (1.1 MiB)
>
>eth0      Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>          RX packets:881723 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:108550619 (103.5 MiB)  TX bytes:115574099 (110.2 MiB)
>          Memory:f8220000-f8240000
>
>eth1      Link encap:Ethernet  HWaddr 00:30:48:94:7D:1A
>          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>          RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:3928939240 (3.6 GiB)  TX bytes:9290795 (8.8 MiB)
>          Memory:f8260000-f8280000
>
>lo        Link encap:Local Loopback
>          inet addr:127.0.0.1  Mask:255.0.0.0
>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
>          RX packets:105468 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:0
>          RX bytes:7016087 (6.6 MiB)  TX bytes:7016087 (6.6 MiB)

	This looks pretty much as expected.

[...]
>>        There's no "self test" or anything like that, if that's what you
>> mean.  If the ARPs work (make the round trip) the link is up, if they
>> don't, the link is down.  That's subject to some details related to
>> arp_validate, but is basically it.
>
>Ok. my interfaces are now up with arp_validate set to 0/none.
>Everything works as expected, thank you.
>But with arp_validate set to all the interfaces are always down. :(
>I think that i have to use the all option because i have many servers
>in that vlan?

	Well, you'll probably want to run with arp_validate, otherwise
the probe traffic can fool the arp monitor into thinking the path is up
when it isn't.

	But...

	I know what your problem with arp_validate is, though, and it's
something I've been working on as time permits (and forgot to mention
previously).  Basically, the VLAN receive path assigns the VLAN device
to the received packet before doing receive processing on it, so the
"slave" identity is lost before the bonding ARP receive function looks
at it, so it never counts the ARP (for validate purposes).

	I've been chewing on the least bad way to fix this, mostly
trying to figure out if its possible to resolve without adding a hook
into bonding that basically replaces and extends skb_bond_should_drop
(which does work, it's just not very elegant).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-21  0:01               ` Jay Vosburgh
@ 2009-04-22  8:29                 ` stefan novak
  2009-04-23  1:12                   ` Jay Vosburgh
  0 siblings, 1 reply; 19+ messages in thread
From: stefan novak @ 2009-04-22  8:29 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

so its a bug in kernel?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-22  8:29                 ` stefan novak
@ 2009-04-23  1:12                   ` Jay Vosburgh
  2009-04-23  5:58                     ` Eric Dumazet
  2009-04-23  7:48                     ` Jiri Pirko
  0 siblings, 2 replies; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-23  1:12 UTC (permalink / raw)
  To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List

stefan novak <lms.brubaker@gmail.com> wrote:

>so its a bug in kernel?

	Yes.  I think the following patch should add support for
arp_validate over VLANs, could you test this?  This is still a work in
progress, so it's pretty rough around the edges, but the core
functionality should be there.

	This works by moving the skb_bond_should_drop logic around, and
replaces the current inline code with a hook into bonding to do all of
that, plus additional logic.  The reason for a hook is to get the
skb_bond_should_drop callers from the VLAN input path before the input
device is assigned to the VLAN device.

	-J

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 99610f3..cc367a3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	read_lock(&bond->lock);
 
 	new_slave->last_arp_rx = jiffies;
+	bond_fix_slave_validate_flag(bond, new_slave);
 
 	if (bond->params.miimon && !bond->params.use_carrier) {
 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
 {
 	struct sk_buff *skb;
 
-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
-	       slave_dev->name, dest_ip, src_ip, vlan_id);
+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
 	       
 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
 			 NULL, slave_dev->dev_addr, NULL);
@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 
 	targets = bond->params.arp_targets;
 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
 		if (sip == targets[i]) {
 			if (bond_has_this_ip(bond, tip))
 				slave->last_arp_rx = jiffies;
@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
 {
 	struct arphdr *arp;
 	struct slave *slave;
+	struct net_device *bond_dev = dev->master;
 	struct bonding *bond;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
 
 	if (dev_net(dev) != &init_net)
-		goto out;
+		return;
 
-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
-		goto out;
+	bond = netdev_priv(bond_dev);
 
-	bond = netdev_priv(dev);
 	read_lock(&bond->lock);
 
-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
-		orig_dev ? orig_dev->name : "NULL");
-
-	slave = bond_get_slave_by_dev(bond, orig_dev);
+	slave = bond_get_slave_by_dev(bond, dev);
 	if (!slave || !slave_do_arp_validate(bond, slave))
 		goto out_unlock;
 
 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
 		goto out_unlock;
 
-	arp = arp_hdr(skb);
+	/* Can't use arp_hdr; it's not initialized yet. */
+	arp = (struct arphdr *)skb->data;
+
+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
+	       ntohs(arp->ar_pro), arp->ar_pln);
+
 	if (arp->ar_hln != dev->addr_len ||
 	    skb->pkt_type == PACKET_OTHERHOST ||
 	    skb->pkt_type == PACKET_LOOPBACK ||
@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
 	arp_ptr += 4 + dev->addr_len;
 	memcpy(&tip, arp_ptr, 4);
 
-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
-		bond->dev->name, slave->dev->name, slave->state,
-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
-		&sip, &tip);
-
-	/*
-	 * Backup slaves won't see the ARP reply, but do come through
-	 * here for each ARP probe (so we swap the sip/tip to validate
-	 * the probe).  In a "redundant switch, common router" type of
-	 * configuration, the ARP probe will (hopefully) travel from
-	 * the active, through one switch, the router, then the other
-	 * switch before reaching the backup.
-	 */
-	if (slave->state == BOND_STATE_ACTIVE)
+	switch (ntohs(arp->ar_op)) {
+	case ARPOP_REPLY:
 		bond_validate_arp(bond, slave, sip, tip);
-	else
+		break;
+	case ARPOP_REQUEST:
 		bond_validate_arp(bond, slave, tip, sip);
+		break;
+	}
 
 out_unlock:
 	read_unlock(&bond->lock);
-out:
-	dev_kfree_skb(skb);
-	return NET_RX_SUCCESS;
+}
+
+/*
+ * Called by core packet processing in netif_receive_skb and in VLAN fast
+ * path to (a) determine if packet should be dropped, and (b) to perform
+ * ARP monitor processing (last_rx, validation).
+ *
+ * For the VLAN case, called before the skb->dev is reassigned to the
+ * VLAN.
+ *
+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
+ *
+ * We want to keep:
+ * - all traffic on active slaves
+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
+ * ALB/TLB mode and LACP frames in 802.3ad mode.
+ *
+ * ARP frames are handled as normal traffic; the ARP monitor handling
+ * takes place here, so they need not be kept on inactive slaves.
+ */
+static int bond_handle_frame(struct sk_buff *skb)
+{
+	struct net_device *dev;
+	struct net_device *master;
+
+	dev = skb->dev;
+	master = dev->master;
+	if (!master || !master->priv_flags & IFF_BONDING)
+		return 0;
+
+	if (master->priv_flags & IFF_MASTER_ARPMON) {
+		dev->last_rx = jiffies;
+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+		    skb->protocol == __constant_htons(ETH_P_ARP))
+			bond_handle_arp(skb, dev);
+	}
+
+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+		if (master->priv_flags & IFF_MASTER_ALB) {
+			if (skb->pkt_type != PACKET_BROADCAST &&
+			    skb->pkt_type != PACKET_MULTICAST)
+				return 0;
+		}
+		if (master->priv_flags & IFF_MASTER_8023AD &&
+		    skb->protocol == __constant_htons(ETH_P_SLOW))
+			return 0;
+
+		return 1;
+	}
+	return 0;
 }
 
 /*
@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
 
 void bond_register_arp(struct bonding *bond)
 {
+#if 0
 	struct packet_type *pt = &bond->arp_mon_pt;
 
 	if (pt->type)
@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
 	pt->dev = bond->dev;
 	pt->func = bond_arp_rcv;
 	dev_add_pack(pt);
+#endif
 }
 
 void bond_unregister_arp(struct bonding *bond)
 {
+#if 0
 	struct packet_type *pt = &bond->arp_mon_pt;
 
 	dev_remove_pack(pt);
 	pt->type = 0;
+#endif
 }
 
 /*---------------------------- Hashing Policies -----------------------------*/
@@ -5188,6 +5230,8 @@ out_rtnl:
 	return res;
 }
 
+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
+
 static int __init bonding_init(void)
 {
 	int i;
@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
 	bond_register_ipv6_notifier();
 
+	bond_handle_frame_hook = bond_handle_frame;
+
 	goto out;
 err:
 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
 	rtnl_lock();
 	bond_free_all();
 	rtnl_unlock();
+
+	bond_handle_frame_hook = NULL;
+
 }
 
 module_init(bonding_init);
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 18cf478..0e9f0e9 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int new_value;
+	int new_value, i;
 	struct bonding *bond = to_bond(d);
+	struct slave *slave;
 
 	new_value = bond_parse_parm(buf, arp_validate_tbl);
 	if (new_value < 0) {
@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
 	       bond->dev->name, arp_validate_tbl[new_value].modename,
 	       new_value);
 
-	if (!bond->params.arp_validate && new_value) {
-		bond_register_arp(bond);
-	} else if (bond->params.arp_validate && !new_value) {
-		bond_unregister_arp(bond);
-	}
+	if (bond->params.arp_validate != new_value) {
+		bond->params.arp_validate = new_value;
 
-	bond->params.arp_validate = new_value;
+		bond_for_each_slave(bond, slave, i)
+			bond_fix_slave_validate_flag(bond, slave);
+	}
 
 	return count;
 }
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ca849d2..275f08d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
 	return slave->dev->last_rx;
 }
 
+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
+					       struct slave *slave)
+{
+	if (slave_do_arp_validate(bond, slave))
+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+	else
+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
+}
+
 static inline void bond_set_slave_inactive_flags(struct slave *slave)
 {
 	struct bonding *bond = netdev_priv(slave->dev->master);
@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
 	    bond->params.mode != BOND_MODE_ALB)
 		slave->state = BOND_STATE_BACKUP;
 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
-	if (slave_do_arp_validate(bond, slave))
-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
+	bond_fix_slave_validate_flag(bond, slave);
 }
 
 static inline void bond_set_slave_active_flags(struct slave *slave)
 {
+	struct bonding *bond = netdev_priv(slave->dev->master);
+
 	slave->state = BOND_STATE_ACTIVE;
-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
+	bond_fix_slave_validate_flag(bond, slave);
 }
 
 static inline void bond_set_master_3ad_flags(struct bonding *bond)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e7783f..3dafd8f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
 	dev->gso_max_size = size;
 }
 
+#if 0
 /* On bonding slaves other than the currently active slave, suppress
  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
  * ARP on active-backup slaves with arp_validate enabled.
@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
 	}
 	return 0;
 }
+#endif
+extern int skb_bond_should_drop(struct sk_buff *skb);
 
 extern struct pernet_operations __net_initdata loopback_net_ops;
 #endif /* __KERNEL__ */
diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..ac5a37e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
 #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
 #endif
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+int (*bond_handle_frame_hook)(struct sk_buff *skb);
+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
+
+/* On bonding slaves other than the currently active slave, suppress
+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
+ * ARP on active-backup slaves with arp_validate enabled.
+ */
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+	struct net_device *dev = skb->dev;
+	struct net_device *master = dev->master;
+
+	if (master)
+		return bond_handle_frame_hook(skb);
+
+	return 0;
+
+#if 0
+		if (master->priv_flags & IFF_MASTER_ARPMON)
+			dev->last_rx = jiffies;
+
+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
+			    skb->protocol == __constant_htons(ETH_P_ARP))
+				return 0;
+
+			if (master->priv_flags & IFF_MASTER_ALB) {
+				if (skb->pkt_type != PACKET_BROADCAST &&
+				    skb->pkt_type != PACKET_MULTICAST)
+					return 0;
+			}
+			if (master->priv_flags & IFF_MASTER_8023AD &&
+			    skb->protocol == __constant_htons(ETH_P_SLOW))
+				return 0;
+
+			return 1;
+		}
+	}
+	return 0;
+#endif /* 0 */
+}
+#else
+int skb_bond_should_drop(struct sk_buff *skb)
+{
+	return 0;
+}
+#endif
+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
+
 #ifdef CONFIG_NET_CLS_ACT
 /* TODO: Maybe we should just force sch_ingress to be compiled in
  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions


---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23  1:12                   ` Jay Vosburgh
@ 2009-04-23  5:58                     ` Eric Dumazet
  2009-04-23 15:38                       ` Jay Vosburgh
  2009-04-23  7:48                     ` Jiri Pirko
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2009-04-23  5:58 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: stefan novak, linux-kernel, Linux Netdev List

Jay Vosburgh a écrit :
> stefan novak <lms.brubaker@gmail.com> wrote:
> 
>> so its a bug in kernel?
> 
> 	Yes.  I think the following patch should add support for
> arp_validate over VLANs, could you test this?  This is still a work in
> progress, so it's pretty rough around the edges, but the core
> functionality should be there.
> 
> 	This works by moving the skb_bond_should_drop logic around, and
> replaces the current inline code with a hook into bonding to do all of
> that, plus additional logic.  The reason for a hook is to get the
> skb_bond_should_drop callers from the VLAN input path before the input
> device is assigned to the VLAN device.
> 
> 	-J

Hi Jay

I wanted to test your patch and setup such VLAN config, and not yet applied your patch.

# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 250
ARP IP target/s (n.n.n.n form): 192.168.20.254

Slave Interface: eth1
MII Status: up
Link Failure Count: 8
Permanent HW addr: 00:1e:0b:ec:d3:d2

Slave Interface: eth2
MII Status: up
Link Failure Count: 11
Permanent HW addr: 00:1e:0b:92:78:50

# grep . /sys/class/net/bond1/bonding/*
/sys/class/net/bond1/bonding/active_slave:eth2
/sys/class/net/bond1/bonding/ad_select:stable 0
/sys/class/net/bond1/bonding/arp_interval:250
/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
/sys/class/net/bond1/bonding/arp_validate:none 0
/sys/class/net/bond1/bonding/downdelay:0
/sys/class/net/bond1/bonding/fail_over_mac:none 0
/sys/class/net/bond1/bonding/lacp_rate:slow 0
/sys/class/net/bond1/bonding/miimon:0
/sys/class/net/bond1/bonding/mii_status:up
/sys/class/net/bond1/bonding/mode:active-backup 1
/sys/class/net/bond1/bonding/num_grat_arp:1
/sys/class/net/bond1/bonding/num_unsol_na:1
/sys/class/net/bond1/bonding/slaves:eth1 eth2
/sys/class/net/bond1/bonding/updelay:0
/sys/class/net/bond1/bonding/use_carrier:1
/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0



So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
# ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
          RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
          RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0

exactly 4 packets per second.

# ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
          RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
          RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0

I also receive arp answers on eth2 (quite normal)

I wanted to tcpdump on eth1 but got nothing :

# tcpdump -p -n -s 0 -i eth1
tcpdump: WARNING: eth1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C

# tcpdump -p -n -s 0 -i eth2
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

# tcpdump -p -n -s 0 -i bond1
tcpdump: WARNING: bond1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

# tcpdump -p -n -s 0 -i bond1.103
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01

Configuration script is 

ip link set eth1 up
ip link set eth2 up

ip addr flush dev eth1
ip addr flush dev eth2

ip link set eth1 up
ip link set eth2 up

modprobe bond1

ifconfig bond1 down

# test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
echo 250 >/sys/class/net/bond1/bonding/arp_interval

# mode actif/passif (active-backup)
echo 1 >/sys/class/net/bond1/bonding/mode

ifconfig bond1 up

ifenslave bond1 eth1 eth2

ip link set eth1 up
ip link set eth2 up

ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
ip addr add 192.168.20.110/24 dev bond1.103
ip link set bond1.103 up

ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
ip addr add 10.170.73.123/25 dev bond1.825
ip link set bond1.825 up



Is this setup OK to test your patch ?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23  1:12                   ` Jay Vosburgh
  2009-04-23  5:58                     ` Eric Dumazet
@ 2009-04-23  7:48                     ` Jiri Pirko
  2009-04-23 14:59                       ` Jay Vosburgh
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Pirko @ 2009-04-23  7:48 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List

Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote:
>stefan novak <lms.brubaker@gmail.com> wrote:
>
>>so its a bug in kernel?
>
>	Yes.  I think the following patch should add support for
>arp_validate over VLANs, could you test this?  This is still a work in
>progress, so it's pretty rough around the edges, but the core
>functionality should be there.
>
>	This works by moving the skb_bond_should_drop logic around, and
>replaces the current inline code with a hook into bonding to do all of
>that, plus additional logic.  The reason for a hook is to get the
>skb_bond_should_drop callers from the VLAN input path before the input
>device is assigned to the VLAN device.
>
>	-J
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 99610f3..cc367a3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 	read_lock(&bond->lock);
> 
> 	new_slave->last_arp_rx = jiffies;
>+	bond_fix_slave_validate_flag(bond, new_slave);
> 
> 	if (bond->params.miimon && !bond->params.use_carrier) {
> 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
> {
> 	struct sk_buff *skb;
> 
>-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>-	       slave_dev->name, dest_ip, src_ip, vlan_id);
>+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
> 	       
> 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
> 			 NULL, slave_dev->dev_addr, NULL);
>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
> 
> 	targets = bond->params.arp_targets;
> 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
> 		if (sip == targets[i]) {
> 			if (bond_has_this_ip(bond, tip))
> 				slave->last_arp_rx = jiffies;
>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
> 	}
> }
> 
>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
> {
> 	struct arphdr *arp;
> 	struct slave *slave;
>+	struct net_device *bond_dev = dev->master;
> 	struct bonding *bond;
> 	unsigned char *arp_ptr;
> 	__be32 sip, tip;
> 
> 	if (dev_net(dev) != &init_net)
>-		goto out;
>+		return;
> 
>-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>-		goto out;
>+	bond = netdev_priv(bond_dev);
> 
>-	bond = netdev_priv(dev);
> 	read_lock(&bond->lock);
> 
>-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>-		orig_dev ? orig_dev->name : "NULL");
>-
>-	slave = bond_get_slave_by_dev(bond, orig_dev);
>+	slave = bond_get_slave_by_dev(bond, dev);
> 	if (!slave || !slave_do_arp_validate(bond, slave))
> 		goto out_unlock;
> 
> 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
> 		goto out_unlock;
> 
>-	arp = arp_hdr(skb);
>+	/* Can't use arp_hdr; it's not initialized yet. */
>+	arp = (struct arphdr *)skb->data;
>+
>+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>+	       ntohs(arp->ar_pro), arp->ar_pln);
>+
> 	if (arp->ar_hln != dev->addr_len ||
> 	    skb->pkt_type == PACKET_OTHERHOST ||
> 	    skb->pkt_type == PACKET_LOOPBACK ||
>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
> 	arp_ptr += 4 + dev->addr_len;
> 	memcpy(&tip, arp_ptr, 4);
> 
>-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>-		bond->dev->name, slave->dev->name, slave->state,
>-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>-		&sip, &tip);
>-
>-	/*
>-	 * Backup slaves won't see the ARP reply, but do come through
>-	 * here for each ARP probe (so we swap the sip/tip to validate
>-	 * the probe).  In a "redundant switch, common router" type of
>-	 * configuration, the ARP probe will (hopefully) travel from
>-	 * the active, through one switch, the router, then the other
>-	 * switch before reaching the backup.
>-	 */
>-	if (slave->state == BOND_STATE_ACTIVE)
>+	switch (ntohs(arp->ar_op)) {
>+	case ARPOP_REPLY:
> 		bond_validate_arp(bond, slave, sip, tip);
>-	else
>+		break;
>+	case ARPOP_REQUEST:
> 		bond_validate_arp(bond, slave, tip, sip);
>+		break;
>+	}
> 
> out_unlock:
> 	read_unlock(&bond->lock);
>-out:
>-	dev_kfree_skb(skb);
>-	return NET_RX_SUCCESS;
>+}
>+
>+/*
>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>+ * path to (a) determine if packet should be dropped, and (b) to perform
>+ * ARP monitor processing (last_rx, validation).
>+ *
>+ * For the VLAN case, called before the skb->dev is reassigned to the
>+ * VLAN.
>+ *
>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>+ *
>+ * We want to keep:
>+ * - all traffic on active slaves
>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>+ *
>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>+ * takes place here, so they need not be kept on inactive slaves.
>+ */
>+static int bond_handle_frame(struct sk_buff *skb)
>+{
>+	struct net_device *dev;
>+	struct net_device *master;
>+
>+	dev = skb->dev;
>+	master = dev->master;
>+	if (!master || !master->priv_flags & IFF_BONDING)
>+		return 0;
>+
>+	if (master->priv_flags & IFF_MASTER_ARPMON) {
>+		dev->last_rx = jiffies;
>+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+		    skb->protocol == __constant_htons(ETH_P_ARP))
>+			bond_handle_arp(skb, dev);
>+	}
>+
>+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+		if (master->priv_flags & IFF_MASTER_ALB) {
>+			if (skb->pkt_type != PACKET_BROADCAST &&
>+			    skb->pkt_type != PACKET_MULTICAST)
>+				return 0;
>+		}
>+		if (master->priv_flags & IFF_MASTER_8023AD &&
>+		    skb->protocol == __constant_htons(ETH_P_SLOW))
>+			return 0;
>+
>+		return 1;
>+	}
>+	return 0;
> }
> 
> /*
>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
> 
> void bond_register_arp(struct bonding *bond)
> {
>+#if 0
> 	struct packet_type *pt = &bond->arp_mon_pt;
> 
> 	if (pt->type)
>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
> 	pt->dev = bond->dev;
> 	pt->func = bond_arp_rcv;
> 	dev_add_pack(pt);
>+#endif
> }
> 
> void bond_unregister_arp(struct bonding *bond)
> {
>+#if 0
> 	struct packet_type *pt = &bond->arp_mon_pt;
> 
> 	dev_remove_pack(pt);
> 	pt->type = 0;
>+#endif
> }
> 
> /*---------------------------- Hashing Policies -----------------------------*/
>@@ -5188,6 +5230,8 @@ out_rtnl:
> 	return res;
> }
> 
>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+
> static int __init bonding_init(void)
> {
> 	int i;
>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
> 	bond_register_ipv6_notifier();
> 
>+	bond_handle_frame_hook = bond_handle_frame;
>+
> 	goto out;
> err:
> 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
> 	rtnl_lock();
> 	bond_free_all();
> 	rtnl_unlock();
>+
>+	bond_handle_frame_hook = NULL;
>+
> }
> 
> module_init(bonding_init);
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 18cf478..0e9f0e9 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> 					  struct device_attribute *attr,
> 					  const char *buf, size_t count)
> {
>-	int new_value;
>+	int new_value, i;
> 	struct bonding *bond = to_bond(d);
>+	struct slave *slave;
> 
> 	new_value = bond_parse_parm(buf, arp_validate_tbl);
> 	if (new_value < 0) {
>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
> 	       bond->dev->name, arp_validate_tbl[new_value].modename,
> 	       new_value);
> 
>-	if (!bond->params.arp_validate && new_value) {
>-		bond_register_arp(bond);
>-	} else if (bond->params.arp_validate && !new_value) {
>-		bond_unregister_arp(bond);
>-	}
>+	if (bond->params.arp_validate != new_value) {
>+		bond->params.arp_validate = new_value;
> 
>-	bond->params.arp_validate = new_value;
>+		bond_for_each_slave(bond, slave, i)
>+			bond_fix_slave_validate_flag(bond, slave);
>+	}
> 
> 	return count;
> }
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index ca849d2..275f08d 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
> 	return slave->dev->last_rx;
> }
> 
>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>+					       struct slave *slave)
>+{
>+	if (slave_do_arp_validate(bond, slave))
>+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+	else
>+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>+}
>+
> static inline void bond_set_slave_inactive_flags(struct slave *slave)
> {
> 	struct bonding *bond = netdev_priv(slave->dev->master);
>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
> 	    bond->params.mode != BOND_MODE_ALB)
> 		slave->state = BOND_STATE_BACKUP;
> 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>-	if (slave_do_arp_validate(bond, slave))
>-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>+	bond_fix_slave_validate_flag(bond, slave);
> }
> 
> static inline void bond_set_slave_active_flags(struct slave *slave)
> {
>+	struct bonding *bond = netdev_priv(slave->dev->master);
>+
> 	slave->state = BOND_STATE_ACTIVE;
>-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>+	bond_fix_slave_validate_flag(bond, slave);
> }
> 
> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 2e7783f..3dafd8f 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
> 	dev->gso_max_size = size;
> }
> 
>+#if 0
> /* On bonding slaves other than the currently active slave, suppress
>  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>  * ARP on active-backup slaves with arp_validate enabled.
>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
> 	}
> 	return 0;
> }
>+#endif
>+extern int skb_bond_should_drop(struct sk_buff *skb);
> 
> extern struct pernet_operations __net_initdata loopback_net_ops;
> #endif /* __KERNEL__ */
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 91d792d..ac5a37e 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
> #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
> #endif
> 
>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>+
>+/* On bonding slaves other than the currently active slave, suppress
>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>+ * ARP on active-backup slaves with arp_validate enabled.
>+ */
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+	struct net_device *dev = skb->dev;
>+	struct net_device *master = dev->master;
>+
>+	if (master)
>+		return bond_handle_frame_hook(skb);

Maybe this hook can be called from netif_receive_skb() directly. You would safe
at least 2 dereferences, 1 if check. You would also need to add
"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>+
>+	return 0;
>+
>+#if 0
>+		if (master->priv_flags & IFF_MASTER_ARPMON)
>+			dev->last_rx = jiffies;
>+
>+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>+			    skb->protocol == __constant_htons(ETH_P_ARP))
>+				return 0;
>+
>+			if (master->priv_flags & IFF_MASTER_ALB) {
>+				if (skb->pkt_type != PACKET_BROADCAST &&
>+				    skb->pkt_type != PACKET_MULTICAST)
>+					return 0;
>+			}
>+			if (master->priv_flags & IFF_MASTER_8023AD &&
>+			    skb->protocol == __constant_htons(ETH_P_SLOW))
>+				return 0;
>+
>+			return 1;
>+		}
>+	}
>+	return 0;
>+#endif /* 0 */
>+}
>+#else
>+int skb_bond_should_drop(struct sk_buff *skb)
>+{
>+	return 0;
>+}
>+#endif
>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>+
> #ifdef CONFIG_NET_CLS_ACT
> /* TODO: Maybe we should just force sch_ingress to be compiled in
>  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23  7:48                     ` Jiri Pirko
@ 2009-04-23 14:59                       ` Jay Vosburgh
  2009-04-23 15:20                         ` Jiri Pirko
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-23 14:59 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List

Jiri Pirko <jpirko@redhat.com> wrote:

>Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote:
>>stefan novak <lms.brubaker@gmail.com> wrote:
>>
>>>so its a bug in kernel?
>>
>>	Yes.  I think the following patch should add support for
>>arp_validate over VLANs, could you test this?  This is still a work in
>>progress, so it's pretty rough around the edges, but the core
>>functionality should be there.
>>
>>	This works by moving the skb_bond_should_drop logic around, and
>>replaces the current inline code with a hook into bonding to do all of
>>that, plus additional logic.  The reason for a hook is to get the
>>skb_bond_should_drop callers from the VLAN input path before the input
>>device is assigned to the VLAN device.
>>
>>	-J
>>
>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>index 99610f3..cc367a3 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>> 	read_lock(&bond->lock);
>> 
>> 	new_slave->last_arp_rx = jiffies;
>>+	bond_fix_slave_validate_flag(bond, new_slave);
>> 
>> 	if (bond->params.miimon && !bond->params.use_carrier) {
>> 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
>> {
>> 	struct sk_buff *skb;
>> 
>>-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>>-	       slave_dev->name, dest_ip, src_ip, vlan_id);
>>+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>>+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>> 	       
>> 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
>> 			 NULL, slave_dev->dev_addr, NULL);
>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>> 
>> 	targets = bond->params.arp_targets;
>> 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>>-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>>-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
>> 		if (sip == targets[i]) {
>> 			if (bond_has_this_ip(bond, tip))
>> 				slave->last_arp_rx = jiffies;
>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>> 	}
>> }
>> 
>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
>> {
>> 	struct arphdr *arp;
>> 	struct slave *slave;
>>+	struct net_device *bond_dev = dev->master;
>> 	struct bonding *bond;
>> 	unsigned char *arp_ptr;
>> 	__be32 sip, tip;
>> 
>> 	if (dev_net(dev) != &init_net)
>>-		goto out;
>>+		return;
>> 
>>-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>>-		goto out;
>>+	bond = netdev_priv(bond_dev);
>> 
>>-	bond = netdev_priv(dev);
>> 	read_lock(&bond->lock);
>> 
>>-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>>-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>>-		orig_dev ? orig_dev->name : "NULL");
>>-
>>-	slave = bond_get_slave_by_dev(bond, orig_dev);
>>+	slave = bond_get_slave_by_dev(bond, dev);
>> 	if (!slave || !slave_do_arp_validate(bond, slave))
>> 		goto out_unlock;
>> 
>> 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
>> 		goto out_unlock;
>> 
>>-	arp = arp_hdr(skb);
>>+	/* Can't use arp_hdr; it's not initialized yet. */
>>+	arp = (struct arphdr *)skb->data;
>>+
>>+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>>+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>>+	       ntohs(arp->ar_pro), arp->ar_pln);
>>+
>> 	if (arp->ar_hln != dev->addr_len ||
>> 	    skb->pkt_type == PACKET_OTHERHOST ||
>> 	    skb->pkt_type == PACKET_LOOPBACK ||
>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
>> 	arp_ptr += 4 + dev->addr_len;
>> 	memcpy(&tip, arp_ptr, 4);
>> 
>>-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>>-		bond->dev->name, slave->dev->name, slave->state,
>>-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>>-		&sip, &tip);
>>-
>>-	/*
>>-	 * Backup slaves won't see the ARP reply, but do come through
>>-	 * here for each ARP probe (so we swap the sip/tip to validate
>>-	 * the probe).  In a "redundant switch, common router" type of
>>-	 * configuration, the ARP probe will (hopefully) travel from
>>-	 * the active, through one switch, the router, then the other
>>-	 * switch before reaching the backup.
>>-	 */
>>-	if (slave->state == BOND_STATE_ACTIVE)
>>+	switch (ntohs(arp->ar_op)) {
>>+	case ARPOP_REPLY:
>> 		bond_validate_arp(bond, slave, sip, tip);
>>-	else
>>+		break;
>>+	case ARPOP_REQUEST:
>> 		bond_validate_arp(bond, slave, tip, sip);
>>+		break;
>>+	}
>> 
>> out_unlock:
>> 	read_unlock(&bond->lock);
>>-out:
>>-	dev_kfree_skb(skb);
>>-	return NET_RX_SUCCESS;
>>+}
>>+
>>+/*
>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>>+ * path to (a) determine if packet should be dropped, and (b) to perform
>>+ * ARP monitor processing (last_rx, validation).
>>+ *
>>+ * For the VLAN case, called before the skb->dev is reassigned to the
>>+ * VLAN.
>>+ *
>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>>+ *
>>+ * We want to keep:
>>+ * - all traffic on active slaves
>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>>+ *
>>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>>+ * takes place here, so they need not be kept on inactive slaves.
>>+ */
>>+static int bond_handle_frame(struct sk_buff *skb)
>>+{
>>+	struct net_device *dev;
>>+	struct net_device *master;
>>+
>>+	dev = skb->dev;
>>+	master = dev->master;
>>+	if (!master || !master->priv_flags & IFF_BONDING)
>>+		return 0;
>>+
>>+	if (master->priv_flags & IFF_MASTER_ARPMON) {
>>+		dev->last_rx = jiffies;
>>+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+		    skb->protocol == __constant_htons(ETH_P_ARP))
>>+			bond_handle_arp(skb, dev);
>>+	}
>>+
>>+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+		if (master->priv_flags & IFF_MASTER_ALB) {
>>+			if (skb->pkt_type != PACKET_BROADCAST &&
>>+			    skb->pkt_type != PACKET_MULTICAST)
>>+				return 0;
>>+		}
>>+		if (master->priv_flags & IFF_MASTER_8023AD &&
>>+		    skb->protocol == __constant_htons(ETH_P_SLOW))
>>+			return 0;
>>+
>>+		return 1;
>>+	}
>>+	return 0;
>> }
>> 
>> /*
>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>> 
>> void bond_register_arp(struct bonding *bond)
>> {
>>+#if 0
>> 	struct packet_type *pt = &bond->arp_mon_pt;
>> 
>> 	if (pt->type)
>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
>> 	pt->dev = bond->dev;
>> 	pt->func = bond_arp_rcv;
>> 	dev_add_pack(pt);
>>+#endif
>> }
>> 
>> void bond_unregister_arp(struct bonding *bond)
>> {
>>+#if 0
>> 	struct packet_type *pt = &bond->arp_mon_pt;
>> 
>> 	dev_remove_pack(pt);
>> 	pt->type = 0;
>>+#endif
>> }
>> 
>> /*---------------------------- Hashing Policies -----------------------------*/
>>@@ -5188,6 +5230,8 @@ out_rtnl:
>> 	return res;
>> }
>> 
>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+
>> static int __init bonding_init(void)
>> {
>> 	int i;
>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
>> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
>> 	bond_register_ipv6_notifier();
>> 
>>+	bond_handle_frame_hook = bond_handle_frame;
>>+
>> 	goto out;
>> err:
>> 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
>> 	rtnl_lock();
>> 	bond_free_all();
>> 	rtnl_unlock();
>>+
>>+	bond_handle_frame_hook = NULL;
>>+
>> }
>> 
>> module_init(bonding_init);
>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>index 18cf478..0e9f0e9 100644
>>--- a/drivers/net/bonding/bond_sysfs.c
>>+++ b/drivers/net/bonding/bond_sysfs.c
>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> 					  struct device_attribute *attr,
>> 					  const char *buf, size_t count)
>> {
>>-	int new_value;
>>+	int new_value, i;
>> 	struct bonding *bond = to_bond(d);
>>+	struct slave *slave;
>> 
>> 	new_value = bond_parse_parm(buf, arp_validate_tbl);
>> 	if (new_value < 0) {
>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>> 	       bond->dev->name, arp_validate_tbl[new_value].modename,
>> 	       new_value);
>> 
>>-	if (!bond->params.arp_validate && new_value) {
>>-		bond_register_arp(bond);
>>-	} else if (bond->params.arp_validate && !new_value) {
>>-		bond_unregister_arp(bond);
>>-	}
>>+	if (bond->params.arp_validate != new_value) {
>>+		bond->params.arp_validate = new_value;
>> 
>>-	bond->params.arp_validate = new_value;
>>+		bond_for_each_slave(bond, slave, i)
>>+			bond_fix_slave_validate_flag(bond, slave);
>>+	}
>> 
>> 	return count;
>> }
>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>index ca849d2..275f08d 100644
>>--- a/drivers/net/bonding/bonding.h
>>+++ b/drivers/net/bonding/bonding.h
>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
>> 	return slave->dev->last_rx;
>> }
>> 
>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>>+					       struct slave *slave)
>>+{
>>+	if (slave_do_arp_validate(bond, slave))
>>+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+	else
>>+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>>+}
>>+
>> static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> {
>> 	struct bonding *bond = netdev_priv(slave->dev->master);
>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
>> 	    bond->params.mode != BOND_MODE_ALB)
>> 		slave->state = BOND_STATE_BACKUP;
>> 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>>-	if (slave_do_arp_validate(bond, slave))
>>-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>+	bond_fix_slave_validate_flag(bond, slave);
>> }
>> 
>> static inline void bond_set_slave_active_flags(struct slave *slave)
>> {
>>+	struct bonding *bond = netdev_priv(slave->dev->master);
>>+
>> 	slave->state = BOND_STATE_ACTIVE;
>>-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>>+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>>+	bond_fix_slave_validate_flag(bond, slave);
>> }
>> 
>> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>index 2e7783f..3dafd8f 100644
>>--- a/include/linux/netdevice.h
>>+++ b/include/linux/netdevice.h
>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
>> 	dev->gso_max_size = size;
>> }
>> 
>>+#if 0
>> /* On bonding slaves other than the currently active slave, suppress
>>  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>  * ARP on active-backup slaves with arp_validate enabled.
>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
>> 	}
>> 	return 0;
>> }
>>+#endif
>>+extern int skb_bond_should_drop(struct sk_buff *skb);
>> 
>> extern struct pernet_operations __net_initdata loopback_net_ops;
>> #endif /* __KERNEL__ */
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 91d792d..ac5a37e 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
>> #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
>> #endif
>> 
>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>>+
>>+/* On bonding slaves other than the currently active slave, suppress
>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>+ * ARP on active-backup slaves with arp_validate enabled.
>>+ */
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+	struct net_device *dev = skb->dev;
>>+	struct net_device *master = dev->master;
>>+
>>+	if (master)
>>+		return bond_handle_frame_hook(skb);
>
>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>at least 2 dereferences, 1 if check. You would also need to add
>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().

	This won't work, because the VLAN code reassigns skb->dev to the
VLAN device before calling netif_receive_skb.

	In the VLAN path, the second call to skb_bond_should_drop (the
first is within the VLAN code itself, __vlan_hwaccel_rx or
vlan_gro_common, the second is netif_receive_skb) won't call the hook,
because the VLAN device has no dev->master.

	This is the whole reason for the hook: to process the ARP before
VLAN reassigns skb->dev.  Once that happens, the actual device the ARP
arrived on is lost.

	-J


>>+
>>+	return 0;
>>+
>>+#if 0
>>+		if (master->priv_flags & IFF_MASTER_ARPMON)
>>+			dev->last_rx = jiffies;
>>+
>>+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>+			    skb->protocol == __constant_htons(ETH_P_ARP))
>>+				return 0;
>>+
>>+			if (master->priv_flags & IFF_MASTER_ALB) {
>>+				if (skb->pkt_type != PACKET_BROADCAST &&
>>+				    skb->pkt_type != PACKET_MULTICAST)
>>+					return 0;
>>+			}
>>+			if (master->priv_flags & IFF_MASTER_8023AD &&
>>+			    skb->protocol == __constant_htons(ETH_P_SLOW))
>>+				return 0;
>>+
>>+			return 1;
>>+		}
>>+	}
>>+	return 0;
>>+#endif /* 0 */
>>+}
>>+#else
>>+int skb_bond_should_drop(struct sk_buff *skb)
>>+{
>>+	return 0;
>>+}
>>+#endif
>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>>+
>> #ifdef CONFIG_NET_CLS_ACT
>> /* TODO: Maybe we should just force sch_ingress to be compiled in
>>  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>>
>>
>>---
>>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23 14:59                       ` Jay Vosburgh
@ 2009-04-23 15:20                         ` Jiri Pirko
  2009-04-23 18:34                           ` Jay Vosburgh
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Pirko @ 2009-04-23 15:20 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List

Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote:
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>>Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote:
>>>stefan novak <lms.brubaker@gmail.com> wrote:
>>>
>>>>so its a bug in kernel?
>>>
>>>	Yes.  I think the following patch should add support for
>>>arp_validate over VLANs, could you test this?  This is still a work in
>>>progress, so it's pretty rough around the edges, but the core
>>>functionality should be there.
>>>
>>>	This works by moving the skb_bond_should_drop logic around, and
>>>replaces the current inline code with a hook into bonding to do all of
>>>that, plus additional logic.  The reason for a hook is to get the
>>>skb_bond_should_drop callers from the VLAN input path before the input
>>>device is assigned to the VLAN device.
>>>
>>>	-J
>>>
>>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>>index 99610f3..cc367a3 100644
>>>--- a/drivers/net/bonding/bond_main.c
>>>+++ b/drivers/net/bonding/bond_main.c
>>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>>> 	read_lock(&bond->lock);
>>> 
>>> 	new_slave->last_arp_rx = jiffies;
>>>+	bond_fix_slave_validate_flag(bond, new_slave);
>>> 
>>> 	if (bond->params.miimon && !bond->params.use_carrier) {
>>> 		link_reporting = bond_check_dev_link(bond, slave_dev, 1);
>>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_
>>> {
>>> 	struct sk_buff *skb;
>>> 
>>>-	pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op,
>>>-	       slave_dev->name, dest_ip, src_ip, vlan_id);
>>>+	pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op,
>>>+		 slave_dev->name, dest_ip, src_ip, vlan_id, jiffies);
>>> 	       
>>> 	skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip,
>>> 			 NULL, slave_dev->dev_addr, NULL);
>>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>>> 
>>> 	targets = bond->params.arp_targets;
>>> 	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
>>>-		pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n",
>>>-			&sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip));
>>> 		if (sip == targets[i]) {
>>> 			if (bond_has_this_ip(bond, tip))
>>> 				slave->last_arp_rx = jiffies;
>>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
>>> 	}
>>> }
>>> 
>>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
>>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev)
>>> {
>>> 	struct arphdr *arp;
>>> 	struct slave *slave;
>>>+	struct net_device *bond_dev = dev->master;
>>> 	struct bonding *bond;
>>> 	unsigned char *arp_ptr;
>>> 	__be32 sip, tip;
>>> 
>>> 	if (dev_net(dev) != &init_net)
>>>-		goto out;
>>>+		return;
>>> 
>>>-	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
>>>-		goto out;
>>>+	bond = netdev_priv(bond_dev);
>>> 
>>>-	bond = netdev_priv(dev);
>>> 	read_lock(&bond->lock);
>>> 
>>>-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
>>>-		bond->dev->name, skb->dev ? skb->dev->name : "NULL",
>>>-		orig_dev ? orig_dev->name : "NULL");
>>>-
>>>-	slave = bond_get_slave_by_dev(bond, orig_dev);
>>>+	slave = bond_get_slave_by_dev(bond, dev);
>>> 	if (!slave || !slave_do_arp_validate(bond, slave))
>>> 		goto out_unlock;
>>> 
>>> 	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
>>> 		goto out_unlock;
>>> 
>>>-	arp = arp_hdr(skb);
>>>+	/* Can't use arp_hdr; it's not initialized yet. */
>>>+	arp = (struct arphdr *)skb->data;
>>>+
>>>+	pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n",
>>>+	       arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd),
>>>+	       ntohs(arp->ar_pro), arp->ar_pln);
>>>+
>>> 	if (arp->ar_hln != dev->addr_len ||
>>> 	    skb->pkt_type == PACKET_OTHERHOST ||
>>> 	    skb->pkt_type == PACKET_LOOPBACK ||
>>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
>>> 	arp_ptr += 4 + dev->addr_len;
>>> 	memcpy(&tip, arp_ptr, 4);
>>> 
>>>-	pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n",
>>>-		bond->dev->name, slave->dev->name, slave->state,
>>>-		bond->params.arp_validate, slave_do_arp_validate(bond, slave),
>>>-		&sip, &tip);
>>>-
>>>-	/*
>>>-	 * Backup slaves won't see the ARP reply, but do come through
>>>-	 * here for each ARP probe (so we swap the sip/tip to validate
>>>-	 * the probe).  In a "redundant switch, common router" type of
>>>-	 * configuration, the ARP probe will (hopefully) travel from
>>>-	 * the active, through one switch, the router, then the other
>>>-	 * switch before reaching the backup.
>>>-	 */
>>>-	if (slave->state == BOND_STATE_ACTIVE)
>>>+	switch (ntohs(arp->ar_op)) {
>>>+	case ARPOP_REPLY:
>>> 		bond_validate_arp(bond, slave, sip, tip);
>>>-	else
>>>+		break;
>>>+	case ARPOP_REQUEST:
>>> 		bond_validate_arp(bond, slave, tip, sip);
>>>+		break;
>>>+	}
>>> 
>>> out_unlock:
>>> 	read_unlock(&bond->lock);
>>>-out:
>>>-	dev_kfree_skb(skb);
>>>-	return NET_RX_SUCCESS;
>>>+}
>>>+
>>>+/*
>>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast
>>>+ * path to (a) determine if packet should be dropped, and (b) to perform
>>>+ * ARP monitor processing (last_rx, validation).
>>>+ *
>>>+ * For the VLAN case, called before the skb->dev is reassigned to the
>>>+ * VLAN.
>>>+ *
>>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept.
>>>+ *
>>>+ * We want to keep:
>>>+ * - all traffic on active slaves
>>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in
>>>+ * ALB/TLB mode and LACP frames in 802.3ad mode.
>>>+ *
>>>+ * ARP frames are handled as normal traffic; the ARP monitor handling
>>>+ * takes place here, so they need not be kept on inactive slaves.
>>>+ */
>>>+static int bond_handle_frame(struct sk_buff *skb)
>>>+{
>>>+	struct net_device *dev;
>>>+	struct net_device *master;
>>>+
>>>+	dev = skb->dev;
>>>+	master = dev->master;
>>>+	if (!master || !master->priv_flags & IFF_BONDING)
>>>+		return 0;
>>>+
>>>+	if (master->priv_flags & IFF_MASTER_ARPMON) {
>>>+		dev->last_rx = jiffies;
>>>+		if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>>+		    skb->protocol == __constant_htons(ETH_P_ARP))
>>>+			bond_handle_arp(skb, dev);
>>>+	}
>>>+
>>>+	if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>>+		if (master->priv_flags & IFF_MASTER_ALB) {
>>>+			if (skb->pkt_type != PACKET_BROADCAST &&
>>>+			    skb->pkt_type != PACKET_MULTICAST)
>>>+				return 0;
>>>+		}
>>>+		if (master->priv_flags & IFF_MASTER_8023AD &&
>>>+		    skb->protocol == __constant_htons(ETH_P_SLOW))
>>>+			return 0;
>>>+
>>>+		return 1;
>>>+	}
>>>+	return 0;
>>> }
>>> 
>>> /*
>>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond)
>>> 
>>> void bond_register_arp(struct bonding *bond)
>>> {
>>>+#if 0
>>> 	struct packet_type *pt = &bond->arp_mon_pt;
>>> 
>>> 	if (pt->type)
>>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond)
>>> 	pt->dev = bond->dev;
>>> 	pt->func = bond_arp_rcv;
>>> 	dev_add_pack(pt);
>>>+#endif
>>> }
>>> 
>>> void bond_unregister_arp(struct bonding *bond)
>>> {
>>>+#if 0
>>> 	struct packet_type *pt = &bond->arp_mon_pt;
>>> 
>>> 	dev_remove_pack(pt);
>>> 	pt->type = 0;
>>>+#endif
>>> }
>>> 
>>> /*---------------------------- Hashing Policies -----------------------------*/
>>>@@ -5188,6 +5230,8 @@ out_rtnl:
>>> 	return res;
>>> }
>>> 
>>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>>+
>>> static int __init bonding_init(void)
>>> {
>>> 	int i;
>>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void)
>>> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
>>> 	bond_register_ipv6_notifier();
>>> 
>>>+	bond_handle_frame_hook = bond_handle_frame;
>>>+
>>> 	goto out;
>>> err:
>>> 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void)
>>> 	rtnl_lock();
>>> 	bond_free_all();
>>> 	rtnl_unlock();
>>>+
>>>+	bond_handle_frame_hook = NULL;
>>>+
>>> }
>>> 
>>> module_init(bonding_init);
>>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>>index 18cf478..0e9f0e9 100644
>>>--- a/drivers/net/bonding/bond_sysfs.c
>>>+++ b/drivers/net/bonding/bond_sysfs.c
>>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>>> 					  struct device_attribute *attr,
>>> 					  const char *buf, size_t count)
>>> {
>>>-	int new_value;
>>>+	int new_value, i;
>>> 	struct bonding *bond = to_bond(d);
>>>+	struct slave *slave;
>>> 
>>> 	new_value = bond_parse_parm(buf, arp_validate_tbl);
>>> 	if (new_value < 0) {
>>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d,
>>> 	       bond->dev->name, arp_validate_tbl[new_value].modename,
>>> 	       new_value);
>>> 
>>>-	if (!bond->params.arp_validate && new_value) {
>>>-		bond_register_arp(bond);
>>>-	} else if (bond->params.arp_validate && !new_value) {
>>>-		bond_unregister_arp(bond);
>>>-	}
>>>+	if (bond->params.arp_validate != new_value) {
>>>+		bond->params.arp_validate = new_value;
>>> 
>>>-	bond->params.arp_validate = new_value;
>>>+		bond_for_each_slave(bond, slave, i)
>>>+			bond_fix_slave_validate_flag(bond, slave);
>>>+	}
>>> 
>>> 	return count;
>>> }
>>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>>>index ca849d2..275f08d 100644
>>>--- a/drivers/net/bonding/bonding.h
>>>+++ b/drivers/net/bonding/bonding.h
>>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond,
>>> 	return slave->dev->last_rx;
>>> }
>>> 
>>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond,
>>>+					       struct slave *slave)
>>>+{
>>>+	if (slave_do_arp_validate(bond, slave))
>>>+		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>>+	else
>>>+		slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP;
>>>+}
>>>+
>>> static inline void bond_set_slave_inactive_flags(struct slave *slave)
>>> {
>>> 	struct bonding *bond = netdev_priv(slave->dev->master);
>>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
>>> 	    bond->params.mode != BOND_MODE_ALB)
>>> 		slave->state = BOND_STATE_BACKUP;
>>> 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
>>>-	if (slave_do_arp_validate(bond, slave))
>>>-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
>>>+	bond_fix_slave_validate_flag(bond, slave);
>>> }
>>> 
>>> static inline void bond_set_slave_active_flags(struct slave *slave)
>>> {
>>>+	struct bonding *bond = netdev_priv(slave->dev->master);
>>>+
>>> 	slave->state = BOND_STATE_ACTIVE;
>>>-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
>>>+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
>>>+	bond_fix_slave_validate_flag(bond, slave);
>>> }
>>> 
>>> static inline void bond_set_master_3ad_flags(struct bonding *bond)
>>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>index 2e7783f..3dafd8f 100644
>>>--- a/include/linux/netdevice.h
>>>+++ b/include/linux/netdevice.h
>>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
>>> 	dev->gso_max_size = size;
>>> }
>>> 
>>>+#if 0
>>> /* On bonding slaves other than the currently active slave, suppress
>>>  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>>  * ARP on active-backup slaves with arp_validate enabled.
>>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb)
>>> 	}
>>> 	return 0;
>>> }
>>>+#endif
>>>+extern int skb_bond_should_drop(struct sk_buff *skb);
>>> 
>>> extern struct pernet_operations __net_initdata loopback_net_ops;
>>> #endif /* __KERNEL__ */
>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>index 91d792d..ac5a37e 100644
>>>--- a/net/core/dev.c
>>>+++ b/net/core/dev.c
>>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
>>> #define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
>>> #endif
>>> 
>>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
>>>+int (*bond_handle_frame_hook)(struct sk_buff *skb);
>>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook);
>>>+
>>>+/* On bonding slaves other than the currently active slave, suppress
>>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
>>>+ * ARP on active-backup slaves with arp_validate enabled.
>>>+ */
>>>+int skb_bond_should_drop(struct sk_buff *skb)
>>>+{
>>>+	struct net_device *dev = skb->dev;
>>>+	struct net_device *master = dev->master;
>>>+
>>>+	if (master)
>>>+		return bond_handle_frame_hook(skb);
>>
>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>at least 2 dereferences, 1 if check. You would also need to add
>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>
>	This won't work, because the VLAN code reassigns skb->dev to the
>VLAN device before calling netif_receive_skb.

Sure, but bond_should_drop is called before it actually reassigns that. So the
check in bond_should_drop tests "original_dev->master".

I had on mind something like following:

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index c67fe6f..87a7334 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 	if (netpoll_rx(skb))
 		return NET_RX_DROP;
 
-	if (skb_bond_should_drop(skb))
+	if (skb->dev->master && bond_handle_frame_hook(skb))
 		goto drop;
 
 	skb->vlan_tci = vlan_tci;
@@ -79,7 +79,7 @@ static int vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
 {
 	struct sk_buff *p;
 
-	if (skb_bond_should_drop(skb))
+	if (skb->dev->master && bond_handle_frame_hook(skb))
 		goto drop;
 
 	skb->vlan_tci = vlan_tci;
diff --git a/net/core/dev.c b/net/core/dev.c
index 0590b2a..280e3de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2282,7 +2282,7 @@ int netif_receive_skb(struct sk_buff *skb)
 	null_or_orig = NULL;
 	orig_dev = skb->dev;
 	if (orig_dev->master) {
-		if (skb_bond_should_drop(skb))
+		if (bond_handle_frame_hook(skb))
 			null_or_orig = orig_dev; /* deliver only exact match */
 		else
 			skb->dev = orig_dev->master;

Note I did not even compile that.

>
>	In the VLAN path, the second call to skb_bond_should_drop (the
>first is within the VLAN code itself, __vlan_hwaccel_rx or
>vlan_gro_common, the second is netif_receive_skb) won't call the hook,
>because the VLAN device has no dev->master.
>
>	This is the whole reason for the hook: to process the ARP before
>VLAN reassigns skb->dev.  Once that happens, the actual device the ARP
>arrived on is lost.
>
>	-J
>
>
>>>+
>>>+	return 0;
>>>+
>>>+#if 0
>>>+		if (master->priv_flags & IFF_MASTER_ARPMON)
>>>+			dev->last_rx = jiffies;
>>>+
>>>+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
>>>+			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
>>>+			    skb->protocol == __constant_htons(ETH_P_ARP))
>>>+				return 0;
>>>+
>>>+			if (master->priv_flags & IFF_MASTER_ALB) {
>>>+				if (skb->pkt_type != PACKET_BROADCAST &&
>>>+				    skb->pkt_type != PACKET_MULTICAST)
>>>+					return 0;
>>>+			}
>>>+			if (master->priv_flags & IFF_MASTER_8023AD &&
>>>+			    skb->protocol == __constant_htons(ETH_P_SLOW))
>>>+				return 0;
>>>+
>>>+			return 1;
>>>+		}
>>>+	}
>>>+	return 0;
>>>+#endif /* 0 */
>>>+}
>>>+#else
>>>+int skb_bond_should_drop(struct sk_buff *skb)
>>>+{
>>>+	return 0;
>>>+}
>>>+#endif
>>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop);
>>>+
>>> #ifdef CONFIG_NET_CLS_ACT
>>> /* TODO: Maybe we should just force sch_ingress to be compiled in
>>>  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
>>>
>>>
>>>---
>>>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
>>>--
>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>the body of a message to majordomo@vger.kernel.org
>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>--
>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23  5:58                     ` Eric Dumazet
@ 2009-04-23 15:38                       ` Jay Vosburgh
  0 siblings, 0 replies; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-23 15:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: stefan novak, linux-kernel, Linux Netdev List

Eric Dumazet <dada1@cosmosbay.com> wrote:

>Jay Vosburgh a écrit :
>> stefan novak <lms.brubaker@gmail.com> wrote:
>> 
>>> so its a bug in kernel?
>> 
>> 	Yes.  I think the following patch should add support for
>> arp_validate over VLANs, could you test this?  This is still a work in
>> progress, so it's pretty rough around the edges, but the core
>> functionality should be there.
>> 
>> 	This works by moving the skb_bond_should_drop logic around, and
>> replaces the current inline code with a hook into bonding to do all of
>> that, plus additional logic.  The reason for a hook is to get the
>> skb_bond_should_drop callers from the VLAN input path before the input
>> device is assigned to the VLAN device.
>> 
>> 	-J
>
>Hi Jay
>
>I wanted to test your patch and setup such VLAN config, and not yet applied your patch.
>
># cat /proc/net/bonding/bond1
>Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
>
>Bonding Mode: fault-tolerance (active-backup)
>Primary Slave: None
>Currently Active Slave: eth2
>MII Status: up
>MII Polling Interval (ms): 0
>Up Delay (ms): 0
>Down Delay (ms): 0
>ARP Polling Interval (ms): 250
>ARP IP target/s (n.n.n.n form): 192.168.20.254
>
>Slave Interface: eth1
>MII Status: up
>Link Failure Count: 8
>Permanent HW addr: 00:1e:0b:ec:d3:d2
>
>Slave Interface: eth2
>MII Status: up
>Link Failure Count: 11
>Permanent HW addr: 00:1e:0b:92:78:50
>
># grep . /sys/class/net/bond1/bonding/*
>/sys/class/net/bond1/bonding/active_slave:eth2
>/sys/class/net/bond1/bonding/ad_select:stable 0
>/sys/class/net/bond1/bonding/arp_interval:250
>/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254
>/sys/class/net/bond1/bonding/arp_validate:none 0

	To test this patch, you'll need to set arp_validate to all.
Well, insuring that nothing breaks without arp_validate is good, too,
but the patch is supposed to make arp_validate function with an
arp_ip_target accessed via a VLAN.

>/sys/class/net/bond1/bonding/downdelay:0
>/sys/class/net/bond1/bonding/fail_over_mac:none 0
>/sys/class/net/bond1/bonding/lacp_rate:slow 0
>/sys/class/net/bond1/bonding/miimon:0
>/sys/class/net/bond1/bonding/mii_status:up
>/sys/class/net/bond1/bonding/mode:active-backup 1
>/sys/class/net/bond1/bonding/num_grat_arp:1
>/sys/class/net/bond1/bonding/num_unsol_na:1
>/sys/class/net/bond1/bonding/slaves:eth1 eth2
>/sys/class/net/bond1/bonding/updelay:0
>/sys/class/net/bond1/bonding/use_carrier:1
>/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0
>
>
>
>So active slave is eth2. Still I receive trafic on eth1, according to ifconfig :
># ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets
>          RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
>          RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0
>
>exactly 4 packets per second.

	This is expected, these are the broadcast ARP_REQUEST packets
the bond issues as probes, every 250 ms as specified by your
arp_interval.  Any broadcasts on the switch or other traffic flooded to
all ports will come into the slave device, and (for the active-backup
inactive slave case) all of it is thrown away.  There's some trickery in
netif_receive_skb for sockets that bind directly to the slave, but that
doesn't seem to affect tcpdump.

># ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets
>          RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0
>          RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0
>
>I also receive arp answers on eth2 (quite normal)
>
>I wanted to tcpdump on eth1 but got nothing :
>
># tcpdump -p -n -s 0 -i eth1
>tcpdump: WARNING: eth1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

	This is normal, as incoming traffic is assigned to the master
before the packet capture gets a look.

># tcpdump -p -n -s 0 -i eth2
>tcpdump: WARNING: eth2: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110
>07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110

	This is also normal (seeing only outbound traffic) for the same
reason as above.

># tcpdump -p -n -s 0 -i bond1
>tcpdump: WARNING: bond1: no IPv4 address assigned
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
># tcpdump -p -n -s 0 -i bond1.103
>tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes
>07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01
>
>Configuration script is 
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip addr flush dev eth1
>ip addr flush dev eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>modprobe bond1
>
>ifconfig bond1 down
>
># test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP
>echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target
>echo 250 >/sys/class/net/bond1/bonding/arp_interval
>
># mode actif/passif (active-backup)
>echo 1 >/sys/class/net/bond1/bonding/mode
>
>ifconfig bond1 up
>
>ifenslave bond1 eth1 eth2
>
>ip link set eth1 up
>ip link set eth2 up
>
>ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103
>ip addr add 192.168.20.110/24 dev bond1.103
>ip link set bond1.103 up
>
>ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825
>ip addr add 10.170.73.123/25 dev bond1.825
>ip link set bond1.825 up
>
>
>
>Is this setup OK to test your patch ?

	I believe so, provided you enable arp_validate as mentioned
above.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23 15:20                         ` Jiri Pirko
@ 2009-04-23 18:34                           ` Jay Vosburgh
  2009-04-23 19:22                             ` Jiri Pirko
  0 siblings, 1 reply; 19+ messages in thread
From: Jay Vosburgh @ 2009-04-23 18:34 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List

Jiri Pirko <jpirko@redhat.com> wrote:
>Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote:
>>Jiri Pirko <jpirko@redhat.com> wrote:
[...]
>>>>+{
>>>>+	struct net_device *dev = skb->dev;
>>>>+	struct net_device *master = dev->master;
>>>>+
>>>>+	if (master)
>>>>+		return bond_handle_frame_hook(skb);
>>>
>>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>>at least 2 dereferences, 1 if check. You would also need to add
>>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>>
>>	This won't work, because the VLAN code reassigns skb->dev to the
>>VLAN device before calling netif_receive_skb.
>
>Sure, but bond_should_drop is called before it actually reassigns that. So the
>check in bond_should_drop tests "original_dev->master".
>
>I had on mind something like following:
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>index c67fe6f..87a7334 100644
>--- a/net/8021q/vlan_core.c
>+++ b/net/8021q/vlan_core.c
>@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
> 	if (netpoll_rx(skb))
> 		return NET_RX_DROP;
>
>-	if (skb_bond_should_drop(skb))
>+	if (skb->dev->master && bond_handle_frame_hook(skb))
[...]

	Yah, ok, I see what you mean.  The same could be accomplished by
turning skb_bond_should_drop back into an inline in the header file, and
hiding the fiddly bits from the calling context.

	It's pretty grotty no matter how it's done; I'd prefer to avoid
the whole hook business, but I haven't thought of a less bad way.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: bond interface arp, vlan and trunk / network question
  2009-04-23 18:34                           ` Jay Vosburgh
@ 2009-04-23 19:22                             ` Jiri Pirko
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Pirko @ 2009-04-23 19:22 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List

Thu, Apr 23, 2009 at 08:34:20PM CEST, fubar@us.ibm.com wrote:
>Jiri Pirko <jpirko@redhat.com> wrote:
>>Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote:
>>>Jiri Pirko <jpirko@redhat.com> wrote:
>[...]
>>>>>+{
>>>>>+	struct net_device *dev = skb->dev;
>>>>>+	struct net_device *master = dev->master;
>>>>>+
>>>>>+	if (master)
>>>>>+		return bond_handle_frame_hook(skb);
>>>>
>>>>Maybe this hook can be called from netif_receive_skb() directly. You would safe
>>>>at least 2 dereferences, 1 if check. You would also need to add
>>>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common().
>>>
>>>	This won't work, because the VLAN code reassigns skb->dev to the
>>>VLAN device before calling netif_receive_skb.
>>
>>Sure, but bond_should_drop is called before it actually reassigns that. So the
>>check in bond_should_drop tests "original_dev->master".
>>
>>I had on mind something like following:
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>index c67fe6f..87a7334 100644
>>--- a/net/8021q/vlan_core.c
>>+++ b/net/8021q/vlan_core.c
>>@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
>> 	if (netpoll_rx(skb))
>> 		return NET_RX_DROP;
>>
>>-	if (skb_bond_should_drop(skb))
>>+	if (skb->dev->master && bond_handle_frame_hook(skb))
>[...]
>
>	Yah, ok, I see what you mean.  The same could be accomplished by
>turning skb_bond_should_drop back into an inline in the header file, and

Well no exactly. In vlan_x certainly yes but in netif_receive_skb() you safe
something by not checking ->master twice and 2 dereferences. Just noting when I
see the waste...

>hiding the fiddly bits from the calling context.
>
>	It's pretty grotty no matter how it's done; I'd prefer to avoid
>the whole hook business, but I haven't thought of a less bad way.

Yes I agree hooking this sucks :/ Anyway, is there any movement to eliminate
all these hooks from netif_receive_skb() ?

	Jirka
>
>	-J
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-04-23 19:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-20 17:50 bond interface arp, vlan and trunk / network question stefan novak
2009-04-20 18:16 ` Eric Dumazet
2009-04-20 18:37   ` Jay Vosburgh
2009-04-20 20:00     ` stefan novak
2009-04-20 20:36       ` Jay Vosburgh
2009-04-20 21:03         ` stefan novak
2009-04-20 21:15           ` Eric Dumazet
2009-04-20 21:23           ` Jay Vosburgh
2009-04-20 21:39             ` stefan novak
2009-04-21  0:01               ` Jay Vosburgh
2009-04-22  8:29                 ` stefan novak
2009-04-23  1:12                   ` Jay Vosburgh
2009-04-23  5:58                     ` Eric Dumazet
2009-04-23 15:38                       ` Jay Vosburgh
2009-04-23  7:48                     ` Jiri Pirko
2009-04-23 14:59                       ` Jay Vosburgh
2009-04-23 15:20                         ` Jiri Pirko
2009-04-23 18:34                           ` Jay Vosburgh
2009-04-23 19:22                             ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.