* bond interface arp, vlan and trunk / network question @ 2009-04-20 17:50 stefan novak 2009-04-20 18:16 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: stefan novak @ 2009-04-20 17:50 UTC (permalink / raw) To: linux-kernel Hello list, I've got a Problem with my bladecenter and bond interfaces. My Server has 2 interface, each on a seperate switch. Each serverinterface is connected via a trunk to one of the switches. Each switch has a trunk to stacked-backbone switch. nic1 --trunk--> bladesw1 --trunk--> backbone switch nic2 --trunk--> bladesw2 --trunk--> backbone switch So far vlan and trunking works as expected. But if one trunk connection from a bladeswitch to the backbone switch is down the mii-tool cant recognize this. I tried to use an arp target on the backbone switch to check the connection state. Now i'm running into problems. :( My bladesw1 is configured with a trunk and a private vlan id of 600. On the backbone switch is a server in the vlan 600 connected, but i can't get an arp request. What can be the problem, or is it possible to add a vlan tag on the arp check? thx for your help and sorry for my bad english... Stefan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 17:50 bond interface arp, vlan and trunk / network question stefan novak @ 2009-04-20 18:16 ` Eric Dumazet 2009-04-20 18:37 ` Jay Vosburgh 0 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2009-04-20 18:16 UTC (permalink / raw) To: stefan novak; +Cc: linux-kernel, Linux Netdev List stefan novak a écrit : > Hello list, > > I've got a Problem with my bladecenter and bond interfaces. > My Server has 2 interface, each on a seperate switch. Each > serverinterface is connected via a trunk to one of the switches. > Each switch has a trunk to stacked-backbone switch. > > nic1 --trunk--> bladesw1 --trunk--> backbone switch > nic2 --trunk--> bladesw2 --trunk--> backbone switch > > So far vlan and trunking works as expected. But if one trunk > connection from a bladeswitch to the backbone switch is down the > mii-tool cant recognize this. What is the exact problem on this one ? > I tried to use an arp target on the backbone switch to check the > connection state. > > Now i'm running into problems. :( > My bladesw1 is configured with a trunk and a private vlan id of 600. > On the backbone switch is a server in the vlan 600 connected, but i > can't get an arp request. > > What can be the problem, or is it possible to add a vlan tag on the arp check? > What driver is in use on your NIC interface ? Could you post your bonding settings ? cat /proc/net/bonding/bond0 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 18:16 ` Eric Dumazet @ 2009-04-20 18:37 ` Jay Vosburgh 2009-04-20 20:00 ` stefan novak 0 siblings, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-20 18:37 UTC (permalink / raw) To: Eric Dumazet; +Cc: stefan novak, linux-kernel, Linux Netdev List Eric Dumazet <dada1@cosmosbay.com> wrote: >stefan novak a écrit : >> Hello list, >> >> I've got a Problem with my bladecenter and bond interfaces. >> My Server has 2 interface, each on a seperate switch. Each >> serverinterface is connected via a trunk to one of the switches. >> Each switch has a trunk to stacked-backbone switch. >> >> nic1 --trunk--> bladesw1 --trunk--> backbone switch >> nic2 --trunk--> bladesw2 --trunk--> backbone switch >> >> So far vlan and trunking works as expected. But if one trunk >> connection from a bladeswitch to the backbone switch is down the >> mii-tool cant recognize this. > >What is the exact problem on this one ? This is the expected behavior, the external switch to bladecenter switch (ESM) link status does not affect the ESM to blade link status. Unless... your ESM supports "trunk failover." I believe the Cisco ESMs do, I'm not sure about others. With trunk failover enabled, loss of link on an external switch port will in turn drop link on the corresponding internal switch ports. There is a long-ish delay for this, on the order of 750 ms, as I recall. >> I tried to use an arp target on the backbone switch to check the >> connection state. >> >> Now i'm running into problems. :( >> My bladesw1 is configured with a trunk and a private vlan id of 600. >> On the backbone switch is a server in the vlan 600 connected, but i >> can't get an arp request. >> >> What can be the problem, or is it possible to add a vlan tag on the arp check? >> > >What driver is in use on your NIC interface ? > >Could you post your bonding settings ? > >cat /proc/net/bonding/bond0 Sufficiently recent versions of bonding should VLAN tag the ARP probes, provided you are using a VLAN device configured above the bond. My recollection is that VLAN tagging of ARP probes was added about three years ago. If the switch port is configured as native to a VLAN, then it should tag everything coming in. As Eric asks, what are you running? -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 18:37 ` Jay Vosburgh @ 2009-04-20 20:00 ` stefan novak 2009-04-20 20:36 ` Jay Vosburgh 0 siblings, 1 reply; 19+ messages in thread From: stefan novak @ 2009-04-20 20:00 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List >> nic1 --trunk--> bladesw1 --trunk--> backbone switch >> nic2 --trunk--> bladesw2 --trunk--> backbone switch >> >> So far vlan and trunking works as expected. But if one trunk >> connection from a bladeswitch to the backbone switch is down the >> mii-tool cant recognize this. > > What is the exact problem on this one ? The exact problem is that the bonding driver don't switch the interface because the mii-tool don't recognize that the connection between the two switches is now broken. nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if) nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if) > Sufficiently recent versions of bonding should VLAN tag the ARP >probes, provided you are using a VLAN device configured above the bond. >My recollection is that VLAN tagging of ARP probes was added about three >years ago. > > If the switch port is configured as native to a VLAN, then it >should tag everything coming in. > > As Eric asks, what are you running? I've got a bonding interface over eth0 and eth1 and on that bonding interface several vlans. The arp probe is in vlan 600 and i can ping it from the bash. Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is not working. So the arp check is not working for me :( the nic driver is: alias eth0 igb alias eth1 igb the bond setiings: cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: eth0 MII Status: up MII Polling Interval (ms): 0 Up Delay (ms): 0 Down Delay (ms): 0 ARP Polling Interval (ms): 30 ARP IP target/s (n.n.n.n form): 172.21.0.254 Slave Interface: eth0 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:30:48:94:7d:1a Slave Interface: eth1 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:30:48:94:7d:1b On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <fubar@us.ibm.com> wrote: > Eric Dumazet <dada1@cosmosbay.com> wrote: > >>stefan novak a écrit : >>> Hello list, >>> >>> I've got a Problem with my bladecenter and bond interfaces. >>> My Server has 2 interface, each on a seperate switch. Each >>> serverinterface is connected via a trunk to one of the switches. >>> Each switch has a trunk to stacked-backbone switch. >>> >>> nic1 --trunk--> bladesw1 --trunk--> backbone switch >>> nic2 --trunk--> bladesw2 --trunk--> backbone switch >>> >>> So far vlan and trunking works as expected. But if one trunk >>> connection from a bladeswitch to the backbone switch is down the >>> mii-tool cant recognize this. >> >>What is the exact problem on this one ? > > This is the expected behavior, the external switch to > bladecenter switch (ESM) link status does not affect the ESM to blade > link status. > > Unless... your ESM supports "trunk failover." I believe the > Cisco ESMs do, I'm not sure about others. With trunk failover enabled, > loss of link on an external switch port will in turn drop link on the > corresponding internal switch ports. There is a long-ish delay for > this, on the order of 750 ms, as I recall. > >>> I tried to use an arp target on the backbone switch to check the >>> connection state. >>> >>> Now i'm running into problems. :( >>> My bladesw1 is configured with a trunk and a private vlan id of 600. >>> On the backbone switch is a server in the vlan 600 connected, but i >>> can't get an arp request. >>> >>> What can be the problem, or is it possible to add a vlan tag on the arp check? >>> >> >>What driver is in use on your NIC interface ? >> >>Could you post your bonding settings ? >> >>cat /proc/net/bonding/bond0 > > Sufficiently recent versions of bonding should VLAN tag the ARP > probes, provided you are using a VLAN device configured above the bond. > My recollection is that VLAN tagging of ARP probes was added about three > years ago. > > If the switch port is configured as native to a VLAN, then it > should tag everything coming in. > > As Eric asks, what are you running? > > -J > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 20:00 ` stefan novak @ 2009-04-20 20:36 ` Jay Vosburgh 2009-04-20 21:03 ` stefan novak 0 siblings, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-20 20:36 UTC (permalink / raw) To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List stefan novak <lms.brubaker@gmail.com> wrote: >>> nic1 --trunk--> bladesw1 --trunk--> backbone switch >>> nic2 --trunk--> bladesw2 --trunk--> backbone switch >>> >>> So far vlan and trunking works as expected. But if one trunk >>> connection from a bladeswitch to the backbone switch is down the >>> mii-tool cant recognize this. >> >> What is the exact problem on this one ? > >The exact problem is that the bonding driver don't switch the >interface because the mii-tool don't recognize that the connection >between the two switches is now broken. No, from your configuration information, you're running the ARP monitor, in which case the actual link state ("mii-tool", although that isn't really how it works) is not used in the failover decision. For the ARP monitor, the decision is based on whether or not "replies" to the ARP probes come through. More on that in a bit. >nic1 --trunk--> bladesw1 --trunk--> backbone switch (passive if) >nic2 --trunk--> bladesw2 --xxx--broken-trunk-xx-> backbone switch (active if) > >> Sufficiently recent versions of bonding should VLAN tag the ARP >>probes, provided you are using a VLAN device configured above the bond. >>My recollection is that VLAN tagging of ARP probes was added about three >>years ago. >> >> If the switch port is configured as native to a VLAN, then it >>should tag everything coming in. >> >> As Eric asks, what are you running? > >I've got a bonding interface over eth0 and eth1 and on that bonding >interface several vlans. The arp probe is in vlan 600 and i can ping >it from the bash. >Also a arpping with -I bond0.600 is working. Arpping with -I bond0 is >not working. >So the arp check is not working for me :( I believe you're seeing the expected behavior from arping here, and it does not automatically indicate that anything is wrong. It's very possible that your network topology is such that arping -I bond0 won't work while arping -I bond0.600 does. If the target you specify is reachable only on the VLAN, it's expected behavior that arping -I bond0 of that target won't work (because the interface bond0 is not attached to the VLAN, only bond0.600 is). That doesn't mean that the ARPs generated internally by bonding are untagged / failing, as bonding itself adds VLAN tags to its own ARP probes as needed. On the other hand, if you specify different targets to the arping -I bond0 and arping -I bond0.600 (so that the "bond0" target isn't a VLAN destination), then something unusual may be occuring. Also, are you running multiple blades with bonding behind the same set of switches? If you are, you probably want to set the arp_validate option to either "active" or "all", as the default setting (none) relies only on the existance of traffic on the slaves, and doesn't check the source of that traffic. The end result of that is the probes from multiple bonding instances fool one another into thinking the path is up, when it is not. With arp_validate enabled, it'll check that the slaves are actually receiving their own ARP traffic. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com >the nic driver is: >alias eth0 igb >alias eth1 igb > >the bond setiings: >cat /proc/net/bonding/bond0 >Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008) > >Bonding Mode: fault-tolerance (active-backup) >Primary Slave: None >Currently Active Slave: eth0 >MII Status: up >MII Polling Interval (ms): 0 >Up Delay (ms): 0 >Down Delay (ms): 0 >ARP Polling Interval (ms): 30 >ARP IP target/s (n.n.n.n form): 172.21.0.254 > >Slave Interface: eth0 >MII Status: up >Link Failure Count: 0 >Permanent HW addr: 00:30:48:94:7d:1a > >Slave Interface: eth1 >MII Status: up >Link Failure Count: 0 >Permanent HW addr: 00:30:48:94:7d:1b > > >On Mon, Apr 20, 2009 at 8:37 PM, Jay Vosburgh <fubar@us.ibm.com> wrote: >> Eric Dumazet <dada1@cosmosbay.com> wrote: >> >>>stefan novak a écrit : >>>> Hello list, >>>> >>>> I've got a Problem with my bladecenter and bond interfaces. >>>> My Server has 2 interface, each on a seperate switch. Each >>>> serverinterface is connected via a trunk to one of the switches. >>>> Each switch has a trunk to stacked-backbone switch. >>>> >>>> nic1 --trunk--> bladesw1 --trunk--> backbone switch >>>> nic2 --trunk--> bladesw2 --trunk--> backbone switch >>>> >>>> So far vlan and trunking works as expected. But if one trunk >>>> connection from a bladeswitch to the backbone switch is down the >>>> mii-tool cant recognize this. >>> >>>What is the exact problem on this one ? >> >> This is the expected behavior, the external switch to >> bladecenter switch (ESM) link status does not affect the ESM to blade >> link status. >> >> Unless... your ESM supports "trunk failover." I believe the >> Cisco ESMs do, I'm not sure about others. With trunk failover enabled, >> loss of link on an external switch port will in turn drop link on the >> corresponding internal switch ports. There is a long-ish delay for >> this, on the order of 750 ms, as I recall. >> >>>> I tried to use an arp target on the backbone switch to check the >>>> connection state. >>>> >>>> Now i'm running into problems. :( >>>> My bladesw1 is configured with a trunk and a private vlan id of 600. >>>> On the backbone switch is a server in the vlan 600 connected, but i >>>> can't get an arp request. >>>> >>>> What can be the problem, or is it possible to add a vlan tag on the arp check? >>>> >>> >>>What driver is in use on your NIC interface ? >>> >>>Could you post your bonding settings ? >>> >>>cat /proc/net/bonding/bond0 >> >> Sufficiently recent versions of bonding should VLAN tag the ARP >> probes, provided you are using a VLAN device configured above the bond. >> My recollection is that VLAN tagging of ARP probes was added about three >> years ago. >> >> If the switch port is configured as native to a VLAN, then it >> should tag everything coming in. >> >> As Eric asks, what are you running? >> >> -J >> >> --- >> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com >> >-- >To unsubscribe from this list: send the line "unsubscribe netdev" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 20:36 ` Jay Vosburgh @ 2009-04-20 21:03 ` stefan novak 2009-04-20 21:15 ` Eric Dumazet 2009-04-20 21:23 ` Jay Vosburgh 0 siblings, 2 replies; 19+ messages in thread From: stefan novak @ 2009-04-20 21:03 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List > > I believe you're seeing the expected behavior from arping here, > and it does not automatically indicate that anything is wrong. > > It's very possible that your network topology is such that > arping -I bond0 won't work while arping -I bond0.600 does. If the > target you specify is reachable only on the VLAN, it's expected behavior > that arping -I bond0 of that target won't work (because the interface > bond0 is not attached to the VLAN, only bond0.600 is). That doesn't > mean that the ARPs generated internally by bonding are untagged / > failing, as bonding itself adds VLAN tags to its own ARP probes as > needed. Ok. I've checked the tcpdump's on the machines and I think something is working. tcpdump -v -i eth0 arp tcpdump: WARNING: eth0: no IPv4 address assigned tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes 22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1 22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1 22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1 22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1 tcpdump -v -i bond0.600 arp tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture size 96 bytes 22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) 22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) 22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) 22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) the arp's are sent out on eth0 and recieved via bond0.600. When they are sent on eth0 then the switch must tag the vlan600 (private vlan). Then they come in at the right interface. Is it normal that so many arp's are sent? Is there a way to check if the arp check is working right in the proc fs oder something like that? > Also, are you running multiple blades with bonding behind the > same set of switches? Yes, 14 blades with 2 seperate(not connected) switches. > If you are, you probably want to set the > arp_validate option to either "active" or "all", as the default setting > (none) relies only on the existance of traffic on the slaves, and > doesn't check the source of that traffic. The end result of that is the > probes from multiple bonding instances fool one another into thinking > the path is up, when it is not. With arp_validate enabled, it'll check > that the slaves are actually receiving their own ARP traffic. Ok, sounds right for me. I've set the arp_validate option to "all". ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 21:03 ` stefan novak @ 2009-04-20 21:15 ` Eric Dumazet 2009-04-20 21:23 ` Jay Vosburgh 1 sibling, 0 replies; 19+ messages in thread From: Eric Dumazet @ 2009-04-20 21:15 UTC (permalink / raw) To: stefan novak; +Cc: Jay Vosburgh, linux-kernel, Linux Netdev List stefan novak a écrit : >> I believe you're seeing the expected behavior from arping here, >> and it does not automatically indicate that anything is wrong. >> >> It's very possible that your network topology is such that >> arping -I bond0 won't work while arping -I bond0.600 does. If the >> target you specify is reachable only on the VLAN, it's expected behavior >> that arping -I bond0 of that target won't work (because the interface >> bond0 is not attached to the VLAN, only bond0.600 is). That doesn't >> mean that the ARPs generated internally by bonding are untagged / >> failing, as bonding itself adds VLAN tags to its own ARP probes as >> needed. > > Ok. I've checked the tcpdump's on the machines and I think something is working. > > tcpdump -v -i eth0 arp > tcpdump: WARNING: eth0: no IPv4 address assigned > tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes > 22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1 > 22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1 > 22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1 > 22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1 > > tcpdump -v -i bond0.600 arp > tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture > size 96 bytes > 22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) > 22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) > 22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) > 22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) > > the arp's are sent out on eth0 and recieved via bond0.600. When they > are sent on eth0 then the switch must tag the vlan600 (private vlan). Ah, you setup eth0 or bond0 with an IP ? bond driver does a route loookup to find out if a vlan tag is necessary or not when issuing an arp request. So check result of : "ip route get 172.21.0.254" > Then they come in at the right interface. Is it normal that so many > arp's are sent? you setup a 30 ms interval, so you get what you asked for :) > Is there a way to check if the arp check is working right in the proc > fs oder something like that? > >> Also, are you running multiple blades with bonding behind the >> same set of switches? > > Yes, 14 blades with 2 seperate(not connected) switches. > >> If you are, you probably want to set the >> arp_validate option to either "active" or "all", as the default setting >> (none) relies only on the existance of traffic on the slaves, and >> doesn't check the source of that traffic. The end result of that is the >> probes from multiple bonding instances fool one another into thinking >> the path is up, when it is not. With arp_validate enabled, it'll check >> that the slaves are actually receiving their own ARP traffic. > > Ok, sounds right for me. I've set the arp_validate option to "all". Please give us : ifconfig -a ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 21:03 ` stefan novak 2009-04-20 21:15 ` Eric Dumazet @ 2009-04-20 21:23 ` Jay Vosburgh 2009-04-20 21:39 ` stefan novak 1 sibling, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-20 21:23 UTC (permalink / raw) To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List stefan novak <lms.brubaker@gmail.com> wrote: >> >> I believe you're seeing the expected behavior from arping here, >> and it does not automatically indicate that anything is wrong. >> >> It's very possible that your network topology is such that >> arping -I bond0 won't work while arping -I bond0.600 does. If the >> target you specify is reachable only on the VLAN, it's expected behavior >> that arping -I bond0 of that target won't work (because the interface >> bond0 is not attached to the VLAN, only bond0.600 is). That doesn't >> mean that the ARPs generated internally by bonding are untagged / >> failing, as bonding itself adds VLAN tags to its own ARP probes as >> needed. > >Ok. I've checked the tcpdump's on the machines and I think something is working. > >tcpdump -v -i eth0 arp >tcpdump: WARNING: eth0: no IPv4 address assigned >tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes >22:56:38.817599 arp who-has 172.21.0.254 tell 172.21.0.1 >22:56:38.847597 arp who-has 172.21.0.254 tell 172.21.0.1 >22:56:38.877598 arp who-has 172.21.0.254 tell 172.21.0.1 >22:56:38.907596 arp who-has 172.21.0.254 tell 172.21.0.1 > >tcpdump -v -i bond0.600 arp >tcpdump: listening on bond0.600, link-type EN10MB (Ethernet), capture >size 96 bytes >22:56:49.167157 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) >22:56:49.197162 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) >22:56:49.227130 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) >22:56:49.257144 arp reply 172.21.0.254 is-at 00:1d:70:d1:ad:83 (oui Unknown) > >the arp's are sent out on eth0 and recieved via bond0.600. When they >are sent on eth0 then the switch must tag the vlan600 (private vlan). >Then they come in at the right interface. Is it normal that so many >arp's are sent? You set arp_interval to 30, so you get one probe every 30 milliseconds, 33 per second. As far as seeing the ARPs on eth0 and bond0.600, I believe that's normal, as the interfaces are logically stacked, so you can see the traffic in both places. If the device supports VLAN acceleration (and maybe even if it doesn't; I'd have to look), I don't think you'll see the tag in the tcpdumped traffic, as the tagging is done by the hardware, not in software. It's also normal on bonding to see transmits on the slave and received traffic on the bond device; that has to do with how the transmit and receive paths differ. >Is there a way to check if the arp check is working right in the proc >fs oder something like that? There's no "self test" or anything like that, if that's what you mean. If the ARPs work (make the round trip) the link is up, if they don't, the link is down. That's subject to some details related to arp_validate, but is basically it. >> Also, are you running multiple blades with bonding behind the >> same set of switches? > >Yes, 14 blades with 2 seperate(not connected) switches. > >> If you are, you probably want to set the >> arp_validate option to either "active" or "all", as the default setting >> (none) relies only on the existance of traffic on the slaves, and >> doesn't check the source of that traffic. The end result of that is the >> probes from multiple bonding instances fool one another into thinking >> the path is up, when it is not. With arp_validate enabled, it'll check >> that the slaves are actually receiving their own ARP traffic. > >Ok, sounds right for me. I've set the arp_validate option to "all". That may make the phantom "up" link go away. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 21:23 ` Jay Vosburgh @ 2009-04-20 21:39 ` stefan novak 2009-04-21 0:01 ` Jay Vosburgh 0 siblings, 1 reply; 19+ messages in thread From: stefan novak @ 2009-04-20 21:39 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List ip route get 172.21.0.254 172.21.0.254 dev bond0.600 src 172.21.0.1 cache mtu 1500 advmss 1460 hoplimit 64 ifconfig -a : bond0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0 TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:4037489679 (3.7 GiB) TX bytes:124861822 (119.0 MiB) bond0.200 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A inet addr:172.22.0.1 Bcast:172.22.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:5 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:250 (250.0 b) TX bytes:252 (252.0 b) bond0.500 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A inet addr:172.20.0.1 Bcast:172.20.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:26448 errors:0 dropped:0 overruns:0 frame:0 TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:6278388 (5.9 MiB) TX bytes:6429404 (6.1 MiB) bond0.600 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A inet addr:172.21.0.1 Bcast:172.21.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:1911 errors:0 dropped:0 overruns:0 frame:0 TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:98604 (96.2 KiB) TX bytes:1208395 (1.1 MiB) eth0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:881723 errors:0 dropped:0 overruns:0 frame:0 TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:108550619 (103.5 MiB) TX bytes:115574099 (110.2 MiB) Memory:f8220000-f8240000 eth1 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0 TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3928939240 (3.6 GiB) TX bytes:9290795 (8.8 MiB) Memory:f8260000-f8280000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:105468 errors:0 dropped:0 overruns:0 frame:0 TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:7016087 (6.6 MiB) TX bytes:7016087 (6.6 MiB) > You set arp_interval to 30, so you get one probe every 30 > milliseconds, 33 per second. Yes, sure. my mistake.... > There's no "self test" or anything like that, if that's what you > mean. If the ARPs work (make the round trip) the link is up, if they > don't, the link is down. That's subject to some details related to > arp_validate, but is basically it. Ok. my interfaces are now up with arp_validate set to 0/none. Everything works as expected, thank you. But with arp_validate set to all the interfaces are always down. :( I think that i have to use the all option because i have many servers in that vlan? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-20 21:39 ` stefan novak @ 2009-04-21 0:01 ` Jay Vosburgh 2009-04-22 8:29 ` stefan novak 0 siblings, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-21 0:01 UTC (permalink / raw) To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List stefan novak <lms.brubaker@gmail.com> wrote: >ip route get 172.21.0.254 >172.21.0.254 dev bond0.600 src 172.21.0.1 > cache mtu 1500 advmss 1460 hoplimit 64 > >ifconfig -a : >bond0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 > RX packets:62221072 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1152153 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:4037489679 (3.7 GiB) TX bytes:124861822 (119.0 MiB) > >bond0.200 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > inet addr:172.22.0.1 Bcast:172.22.0.255 Mask:255.255.255.0 > UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 > RX packets:5 errors:0 dropped:0 overruns:0 frame:0 > TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:250 (250.0 b) TX bytes:252 (252.0 b) > >bond0.500 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > inet addr:172.20.0.1 Bcast:172.20.0.255 Mask:255.255.255.0 > UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 > RX packets:26448 errors:0 dropped:0 overruns:0 frame:0 > TX packets:33570 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:6278388 (5.9 MiB) TX bytes:6429404 (6.1 MiB) > >bond0.600 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > inet addr:172.21.0.1 Bcast:172.21.0.255 Mask:255.255.255.0 > UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 > RX packets:1911 errors:0 dropped:0 overruns:0 frame:0 > TX packets:8665 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:98604 (96.2 KiB) TX bytes:1208395 (1.1 MiB) > >eth0 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 > RX packets:881723 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1074651 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:108550619 (103.5 MiB) TX bytes:115574099 (110.2 MiB) > Memory:f8220000-f8240000 > >eth1 Link encap:Ethernet HWaddr 00:30:48:94:7D:1A > UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 > RX packets:61339352 errors:0 dropped:0 overruns:0 frame:0 > TX packets:77510 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:3928939240 (3.6 GiB) TX bytes:9290795 (8.8 MiB) > Memory:f8260000-f8280000 > >lo Link encap:Local Loopback > inet addr:127.0.0.1 Mask:255.0.0.0 > UP LOOPBACK RUNNING MTU:16436 Metric:1 > RX packets:105468 errors:0 dropped:0 overruns:0 frame:0 > TX packets:105468 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:0 > RX bytes:7016087 (6.6 MiB) TX bytes:7016087 (6.6 MiB) This looks pretty much as expected. [...] >> There's no "self test" or anything like that, if that's what you >> mean. If the ARPs work (make the round trip) the link is up, if they >> don't, the link is down. That's subject to some details related to >> arp_validate, but is basically it. > >Ok. my interfaces are now up with arp_validate set to 0/none. >Everything works as expected, thank you. >But with arp_validate set to all the interfaces are always down. :( >I think that i have to use the all option because i have many servers >in that vlan? Well, you'll probably want to run with arp_validate, otherwise the probe traffic can fool the arp monitor into thinking the path is up when it isn't. But... I know what your problem with arp_validate is, though, and it's something I've been working on as time permits (and forgot to mention previously). Basically, the VLAN receive path assigns the VLAN device to the received packet before doing receive processing on it, so the "slave" identity is lost before the bonding ARP receive function looks at it, so it never counts the ARP (for validate purposes). I've been chewing on the least bad way to fix this, mostly trying to figure out if its possible to resolve without adding a hook into bonding that basically replaces and extends skb_bond_should_drop (which does work, it's just not very elegant). -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-21 0:01 ` Jay Vosburgh @ 2009-04-22 8:29 ` stefan novak 2009-04-23 1:12 ` Jay Vosburgh 0 siblings, 1 reply; 19+ messages in thread From: stefan novak @ 2009-04-22 8:29 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List so its a bug in kernel? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-22 8:29 ` stefan novak @ 2009-04-23 1:12 ` Jay Vosburgh 2009-04-23 5:58 ` Eric Dumazet 2009-04-23 7:48 ` Jiri Pirko 0 siblings, 2 replies; 19+ messages in thread From: Jay Vosburgh @ 2009-04-23 1:12 UTC (permalink / raw) To: stefan novak; +Cc: Eric Dumazet, linux-kernel, Linux Netdev List stefan novak <lms.brubaker@gmail.com> wrote: >so its a bug in kernel? Yes. I think the following patch should add support for arp_validate over VLANs, could you test this? This is still a work in progress, so it's pretty rough around the edges, but the core functionality should be there. This works by moving the skb_bond_should_drop logic around, and replaces the current inline code with a hook into bonding to do all of that, plus additional logic. The reason for a hook is to get the skb_bond_should_drop callers from the VLAN input path before the input device is assigned to the VLAN device. -J diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 99610f3..cc367a3 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) read_lock(&bond->lock); new_slave->last_arp_rx = jiffies; + bond_fix_slave_validate_flag(bond, new_slave); if (bond->params.miimon && !bond->params.use_carrier) { link_reporting = bond_check_dev_link(bond, slave_dev, 1); @@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_ { struct sk_buff *skb; - pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op, - slave_dev->name, dest_ip, src_ip, vlan_id); + pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op, + slave_dev->name, dest_ip, src_ip, vlan_id, jiffies); skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip, NULL, slave_dev->dev_addr, NULL); @@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 targets = bond->params.arp_targets; for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) { - pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n", - &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip)); if (sip == targets[i]) { if (bond_has_this_ip(bond, tip)) slave->last_arp_rx = jiffies; @@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 } } -static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) +static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev) { struct arphdr *arp; struct slave *slave; + struct net_device *bond_dev = dev->master; struct bonding *bond; unsigned char *arp_ptr; __be32 sip, tip; if (dev_net(dev) != &init_net) - goto out; + return; - if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER)) - goto out; + bond = netdev_priv(bond_dev); - bond = netdev_priv(dev); read_lock(&bond->lock); - pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n", - bond->dev->name, skb->dev ? skb->dev->name : "NULL", - orig_dev ? orig_dev->name : "NULL"); - - slave = bond_get_slave_by_dev(bond, orig_dev); + slave = bond_get_slave_by_dev(bond, dev); if (!slave || !slave_do_arp_validate(bond, slave)) goto out_unlock; if (!pskb_may_pull(skb, arp_hdr_len(dev))) goto out_unlock; - arp = arp_hdr(skb); + /* Can't use arp_hdr; it's not initialized yet. */ + arp = (struct arphdr *)skb->data; + + pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n", + arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd), + ntohs(arp->ar_pro), arp->ar_pln); + if (arp->ar_hln != dev->addr_len || skb->pkt_type == PACKET_OTHERHOST || skb->pkt_type == PACKET_LOOPBACK || @@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack arp_ptr += 4 + dev->addr_len; memcpy(&tip, arp_ptr, 4); - pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n", - bond->dev->name, slave->dev->name, slave->state, - bond->params.arp_validate, slave_do_arp_validate(bond, slave), - &sip, &tip); - - /* - * Backup slaves won't see the ARP reply, but do come through - * here for each ARP probe (so we swap the sip/tip to validate - * the probe). In a "redundant switch, common router" type of - * configuration, the ARP probe will (hopefully) travel from - * the active, through one switch, the router, then the other - * switch before reaching the backup. - */ - if (slave->state == BOND_STATE_ACTIVE) + switch (ntohs(arp->ar_op)) { + case ARPOP_REPLY: bond_validate_arp(bond, slave, sip, tip); - else + break; + case ARPOP_REQUEST: bond_validate_arp(bond, slave, tip, sip); + break; + } out_unlock: read_unlock(&bond->lock); -out: - dev_kfree_skb(skb); - return NET_RX_SUCCESS; +} + +/* + * Called by core packet processing in netif_receive_skb and in VLAN fast + * path to (a) determine if packet should be dropped, and (b) to perform + * ARP monitor processing (last_rx, validation). + * + * For the VLAN case, called before the skb->dev is reassigned to the + * VLAN. + * + * Returns 1 if frame should nominally be dropped; 0 if it should be kept. + * + * We want to keep: + * - all traffic on active slaves + * - some traffic on inactive slaves: non-broadcast and non-multicast in + * ALB/TLB mode and LACP frames in 802.3ad mode. + * + * ARP frames are handled as normal traffic; the ARP monitor handling + * takes place here, so they need not be kept on inactive slaves. + */ +static int bond_handle_frame(struct sk_buff *skb) +{ + struct net_device *dev; + struct net_device *master; + + dev = skb->dev; + master = dev->master; + if (!master || !master->priv_flags & IFF_BONDING) + return 0; + + if (master->priv_flags & IFF_MASTER_ARPMON) { + dev->last_rx = jiffies; + if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && + skb->protocol == __constant_htons(ETH_P_ARP)) + bond_handle_arp(skb, dev); + } + + if (dev->priv_flags & IFF_SLAVE_INACTIVE) { + if (master->priv_flags & IFF_MASTER_ALB) { + if (skb->pkt_type != PACKET_BROADCAST && + skb->pkt_type != PACKET_MULTICAST) + return 0; + } + if (master->priv_flags & IFF_MASTER_8023AD && + skb->protocol == __constant_htons(ETH_P_SLOW)) + return 0; + + return 1; + } + return 0; } /* @@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond) void bond_register_arp(struct bonding *bond) { +#if 0 struct packet_type *pt = &bond->arp_mon_pt; if (pt->type) @@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond) pt->dev = bond->dev; pt->func = bond_arp_rcv; dev_add_pack(pt); +#endif } void bond_unregister_arp(struct bonding *bond) { +#if 0 struct packet_type *pt = &bond->arp_mon_pt; dev_remove_pack(pt); pt->type = 0; +#endif } /*---------------------------- Hashing Policies -----------------------------*/ @@ -5188,6 +5230,8 @@ out_rtnl: return res; } +extern int (*bond_handle_frame_hook)(struct sk_buff *skb); + static int __init bonding_init(void) { int i; @@ -5221,6 +5265,8 @@ static int __init bonding_init(void) register_inetaddr_notifier(&bond_inetaddr_notifier); bond_register_ipv6_notifier(); + bond_handle_frame_hook = bond_handle_frame; + goto out; err: list_for_each_entry(bond, &bond_dev_list, bond_list) { @@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void) rtnl_lock(); bond_free_all(); rtnl_unlock(); + + bond_handle_frame_hook = NULL; + } module_init(bonding_init); diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index 18cf478..0e9f0e9 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d, struct device_attribute *attr, const char *buf, size_t count) { - int new_value; + int new_value, i; struct bonding *bond = to_bond(d); + struct slave *slave; new_value = bond_parse_parm(buf, arp_validate_tbl); if (new_value < 0) { @@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d, bond->dev->name, arp_validate_tbl[new_value].modename, new_value); - if (!bond->params.arp_validate && new_value) { - bond_register_arp(bond); - } else if (bond->params.arp_validate && !new_value) { - bond_unregister_arp(bond); - } + if (bond->params.arp_validate != new_value) { + bond->params.arp_validate = new_value; - bond->params.arp_validate = new_value; + bond_for_each_slave(bond, slave, i) + bond_fix_slave_validate_flag(bond, slave); + } return count; } diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index ca849d2..275f08d 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond, return slave->dev->last_rx; } +static inline void bond_fix_slave_validate_flag(struct bonding *bond, + struct slave *slave) +{ + if (slave_do_arp_validate(bond, slave)) + slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; + else + slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP; +} + static inline void bond_set_slave_inactive_flags(struct slave *slave) { struct bonding *bond = netdev_priv(slave->dev->master); @@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave) bond->params.mode != BOND_MODE_ALB) slave->state = BOND_STATE_BACKUP; slave->dev->priv_flags |= IFF_SLAVE_INACTIVE; - if (slave_do_arp_validate(bond, slave)) - slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; + bond_fix_slave_validate_flag(bond, slave); } static inline void bond_set_slave_active_flags(struct slave *slave) { + struct bonding *bond = netdev_priv(slave->dev->master); + slave->state = BOND_STATE_ACTIVE; - slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP); + slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE; + bond_fix_slave_validate_flag(bond, slave); } static inline void bond_set_master_3ad_flags(struct bonding *bond) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2e7783f..3dafd8f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev, dev->gso_max_size = size; } +#if 0 /* On bonding slaves other than the currently active slave, suppress * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and * ARP on active-backup slaves with arp_validate enabled. @@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) } return 0; } +#endif +extern int skb_bond_should_drop(struct sk_buff *skb); extern struct pernet_operations __net_initdata loopback_net_ops; #endif /* __KERNEL__ */ diff --git a/net/core/dev.c b/net/core/dev.c index 91d792d..ac5a37e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb, #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb) #endif +#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) +int (*bond_handle_frame_hook)(struct sk_buff *skb); +EXPORT_SYMBOL_GPL(bond_handle_frame_hook); + +/* On bonding slaves other than the currently active slave, suppress + * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and + * ARP on active-backup slaves with arp_validate enabled. + */ +int skb_bond_should_drop(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct net_device *master = dev->master; + + if (master) + return bond_handle_frame_hook(skb); + + return 0; + +#if 0 + if (master->priv_flags & IFF_MASTER_ARPMON) + dev->last_rx = jiffies; + + if (dev->priv_flags & IFF_SLAVE_INACTIVE) { + if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && + skb->protocol == __constant_htons(ETH_P_ARP)) + return 0; + + if (master->priv_flags & IFF_MASTER_ALB) { + if (skb->pkt_type != PACKET_BROADCAST && + skb->pkt_type != PACKET_MULTICAST) + return 0; + } + if (master->priv_flags & IFF_MASTER_8023AD && + skb->protocol == __constant_htons(ETH_P_SLOW)) + return 0; + + return 1; + } + } + return 0; +#endif /* 0 */ +} +#else +int skb_bond_should_drop(struct sk_buff *skb) +{ + return 0; +} +#endif +EXPORT_SYMBOL_GPL(skb_bond_should_drop); + #ifdef CONFIG_NET_CLS_ACT /* TODO: Maybe we should just force sch_ingress to be compiled in * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 1:12 ` Jay Vosburgh @ 2009-04-23 5:58 ` Eric Dumazet 2009-04-23 15:38 ` Jay Vosburgh 2009-04-23 7:48 ` Jiri Pirko 1 sibling, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2009-04-23 5:58 UTC (permalink / raw) To: Jay Vosburgh; +Cc: stefan novak, linux-kernel, Linux Netdev List Jay Vosburgh a écrit : > stefan novak <lms.brubaker@gmail.com> wrote: > >> so its a bug in kernel? > > Yes. I think the following patch should add support for > arp_validate over VLANs, could you test this? This is still a work in > progress, so it's pretty rough around the edges, but the core > functionality should be there. > > This works by moving the skb_bond_should_drop logic around, and > replaces the current inline code with a hook into bonding to do all of > that, plus additional logic. The reason for a hook is to get the > skb_bond_should_drop callers from the VLAN input path before the input > device is assigned to the VLAN device. > > -J Hi Jay I wanted to test your patch and setup such VLAN config, and not yet applied your patch. # cat /proc/net/bonding/bond1 Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: eth2 MII Status: up MII Polling Interval (ms): 0 Up Delay (ms): 0 Down Delay (ms): 0 ARP Polling Interval (ms): 250 ARP IP target/s (n.n.n.n form): 192.168.20.254 Slave Interface: eth1 MII Status: up Link Failure Count: 8 Permanent HW addr: 00:1e:0b:ec:d3:d2 Slave Interface: eth2 MII Status: up Link Failure Count: 11 Permanent HW addr: 00:1e:0b:92:78:50 # grep . /sys/class/net/bond1/bonding/* /sys/class/net/bond1/bonding/active_slave:eth2 /sys/class/net/bond1/bonding/ad_select:stable 0 /sys/class/net/bond1/bonding/arp_interval:250 /sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254 /sys/class/net/bond1/bonding/arp_validate:none 0 /sys/class/net/bond1/bonding/downdelay:0 /sys/class/net/bond1/bonding/fail_over_mac:none 0 /sys/class/net/bond1/bonding/lacp_rate:slow 0 /sys/class/net/bond1/bonding/miimon:0 /sys/class/net/bond1/bonding/mii_status:up /sys/class/net/bond1/bonding/mode:active-backup 1 /sys/class/net/bond1/bonding/num_grat_arp:1 /sys/class/net/bond1/bonding/num_unsol_na:1 /sys/class/net/bond1/bonding/slaves:eth1 eth2 /sys/class/net/bond1/bonding/updelay:0 /sys/class/net/bond1/bonding/use_carrier:1 /sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0 So active slave is eth2. Still I receive trafic on eth1, according to ifconfig : # ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0 TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0 RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0 TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0 exactly 4 packets per second. # ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0 TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0 RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0 TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0 I also receive arp answers on eth2 (quite normal) I wanted to tcpdump on eth1 but got nothing : # tcpdump -p -n -s 0 -i eth1 tcpdump: WARNING: eth1: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes ^C # tcpdump -p -n -s 0 -i eth2 tcpdump: WARNING: eth2: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes 07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110 07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110 07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110 07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110 07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110 07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110 # tcpdump -p -n -s 0 -i bond1 tcpdump: WARNING: bond1: no IPv4 address assigned tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes 07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 # tcpdump -p -n -s 0 -i bond1.103 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes 07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 Configuration script is ip link set eth1 up ip link set eth2 up ip addr flush dev eth1 ip addr flush dev eth2 ip link set eth1 up ip link set eth2 up modprobe bond1 ifconfig bond1 down # test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target echo 250 >/sys/class/net/bond1/bonding/arp_interval # mode actif/passif (active-backup) echo 1 >/sys/class/net/bond1/bonding/mode ifconfig bond1 up ifenslave bond1 eth1 eth2 ip link set eth1 up ip link set eth2 up ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103 ip addr add 192.168.20.110/24 dev bond1.103 ip link set bond1.103 up ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825 ip addr add 10.170.73.123/25 dev bond1.825 ip link set bond1.825 up Is this setup OK to test your patch ? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 5:58 ` Eric Dumazet @ 2009-04-23 15:38 ` Jay Vosburgh 0 siblings, 0 replies; 19+ messages in thread From: Jay Vosburgh @ 2009-04-23 15:38 UTC (permalink / raw) To: Eric Dumazet; +Cc: stefan novak, linux-kernel, Linux Netdev List Eric Dumazet <dada1@cosmosbay.com> wrote: >Jay Vosburgh a écrit : >> stefan novak <lms.brubaker@gmail.com> wrote: >> >>> so its a bug in kernel? >> >> Yes. I think the following patch should add support for >> arp_validate over VLANs, could you test this? This is still a work in >> progress, so it's pretty rough around the edges, but the core >> functionality should be there. >> >> This works by moving the skb_bond_should_drop logic around, and >> replaces the current inline code with a hook into bonding to do all of >> that, plus additional logic. The reason for a hook is to get the >> skb_bond_should_drop callers from the VLAN input path before the input >> device is assigned to the VLAN device. >> >> -J > >Hi Jay > >I wanted to test your patch and setup such VLAN config, and not yet applied your patch. > ># cat /proc/net/bonding/bond1 >Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008) > >Bonding Mode: fault-tolerance (active-backup) >Primary Slave: None >Currently Active Slave: eth2 >MII Status: up >MII Polling Interval (ms): 0 >Up Delay (ms): 0 >Down Delay (ms): 0 >ARP Polling Interval (ms): 250 >ARP IP target/s (n.n.n.n form): 192.168.20.254 > >Slave Interface: eth1 >MII Status: up >Link Failure Count: 8 >Permanent HW addr: 00:1e:0b:ec:d3:d2 > >Slave Interface: eth2 >MII Status: up >Link Failure Count: 11 >Permanent HW addr: 00:1e:0b:92:78:50 > ># grep . /sys/class/net/bond1/bonding/* >/sys/class/net/bond1/bonding/active_slave:eth2 >/sys/class/net/bond1/bonding/ad_select:stable 0 >/sys/class/net/bond1/bonding/arp_interval:250 >/sys/class/net/bond1/bonding/arp_ip_target:192.168.20.254 >/sys/class/net/bond1/bonding/arp_validate:none 0 To test this patch, you'll need to set arp_validate to all. Well, insuring that nothing breaks without arp_validate is good, too, but the patch is supposed to make arp_validate function with an arp_ip_target accessed via a VLAN. >/sys/class/net/bond1/bonding/downdelay:0 >/sys/class/net/bond1/bonding/fail_over_mac:none 0 >/sys/class/net/bond1/bonding/lacp_rate:slow 0 >/sys/class/net/bond1/bonding/miimon:0 >/sys/class/net/bond1/bonding/mii_status:up >/sys/class/net/bond1/bonding/mode:active-backup 1 >/sys/class/net/bond1/bonding/num_grat_arp:1 >/sys/class/net/bond1/bonding/num_unsol_na:1 >/sys/class/net/bond1/bonding/slaves:eth1 eth2 >/sys/class/net/bond1/bonding/updelay:0 >/sys/class/net/bond1/bonding/use_carrier:1 >/sys/class/net/bond1/bonding/xmit_hash_policy:layer2 0 > > > >So active slave is eth2. Still I receive trafic on eth1, according to ifconfig : ># ifconfig eth1|grep packets;sleep 10;ifconfig eth1|grep packets > RX packets:2098122 errors:0 dropped:0 overruns:0 frame:0 > TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0 > RX packets:2098162 errors:0 dropped:0 overruns:0 frame:0 > TX packets:2053085 errors:0 dropped:0 overruns:0 carrier:0 > >exactly 4 packets per second. This is expected, these are the broadcast ARP_REQUEST packets the bond issues as probes, every 250 ms as specified by your arp_interval. Any broadcasts on the switch or other traffic flooded to all ports will come into the slave device, and (for the active-backup inactive slave case) all of it is thrown away. There's some trickery in netif_receive_skb for sockets that bind directly to the slave, but that doesn't seem to affect tcpdump. ># ifconfig eth2|grep packets;sleep 10;ifconfig eth2|grep packets > RX packets:3695512 errors:0 dropped:0 overruns:0 frame:0 > TX packets:3683799 errors:0 dropped:0 overruns:0 carrier:0 > RX packets:3695554 errors:0 dropped:0 overruns:0 frame:0 > TX packets:3683841 errors:0 dropped:0 overruns:0 carrier:0 > >I also receive arp answers on eth2 (quite normal) > >I wanted to tcpdump on eth1 but got nothing : > ># tcpdump -p -n -s 0 -i eth1 >tcpdump: WARNING: eth1: no IPv4 address assigned >tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes This is normal, as incoming traffic is assigned to the master before the packet capture gets a look. ># tcpdump -p -n -s 0 -i eth2 >tcpdump: WARNING: eth2: no IPv4 address assigned >tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes >07:54:48.430982 arp who-has 192.168.20.254 tell 192.168.20.110 >07:54:48.680980 arp who-has 192.168.20.254 tell 192.168.20.110 >07:54:48.930981 arp who-has 192.168.20.254 tell 192.168.20.110 >07:54:49.180980 arp who-has 192.168.20.254 tell 192.168.20.110 >07:54:49.430980 arp who-has 192.168.20.254 tell 192.168.20.110 >07:54:49.680980 arp who-has 192.168.20.254 tell 192.168.20.110 This is also normal (seeing only outbound traffic) for the same reason as above. ># tcpdump -p -n -s 0 -i bond1 >tcpdump: WARNING: bond1: no IPv4 address assigned >tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >listening on bond1, link-type EN10MB (Ethernet), capture size 65535 bytes >07:55:28.681491 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:28.931493 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:29.181466 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:29.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:29.681487 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:29.931492 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 > ># tcpdump -p -n -s 0 -i bond1.103 >tcpdump: verbose output suppressed, use -v or -vv for full protocol decode >listening on bond1.103, link-type EN10MB (Ethernet), capture size 65535 bytes >07:55:58.681521 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:58.931636 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:59.181495 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:59.431496 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:59.681499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 >07:55:59.931499 arp reply 192.168.20.254 is-at 00:00:0c:07:ac:01 > >Configuration script is > >ip link set eth1 up >ip link set eth2 up > >ip addr flush dev eth1 >ip addr flush dev eth2 > >ip link set eth1 up >ip link set eth2 up > >modprobe bond1 > >ifconfig bond1 down > ># test du arp_check toutes les 250ms, en choissant l'HSRP du vlan 103 comme IP >echo +192.168.20.254 >/sys/class/net/bond1/bonding/arp_ip_target >echo 250 >/sys/class/net/bond1/bonding/arp_interval > ># mode actif/passif (active-backup) >echo 1 >/sys/class/net/bond1/bonding/mode > >ifconfig bond1 up > >ifenslave bond1 eth1 eth2 > >ip link set eth1 up >ip link set eth2 up > >ip link add link bond1 bond1.103 txqueuelen 100 type vlan id 103 >ip addr add 192.168.20.110/24 dev bond1.103 >ip link set bond1.103 up > >ip link add link bond1 bond1.825 txqueuelen 1000 type vlan id 825 >ip addr add 10.170.73.123/25 dev bond1.825 >ip link set bond1.825 up > > > >Is this setup OK to test your patch ? I believe so, provided you enable arp_validate as mentioned above. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 1:12 ` Jay Vosburgh 2009-04-23 5:58 ` Eric Dumazet @ 2009-04-23 7:48 ` Jiri Pirko 2009-04-23 14:59 ` Jay Vosburgh 1 sibling, 1 reply; 19+ messages in thread From: Jiri Pirko @ 2009-04-23 7:48 UTC (permalink / raw) To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote: >stefan novak <lms.brubaker@gmail.com> wrote: > >>so its a bug in kernel? > > Yes. I think the following patch should add support for >arp_validate over VLANs, could you test this? This is still a work in >progress, so it's pretty rough around the edges, but the core >functionality should be there. > > This works by moving the skb_bond_should_drop logic around, and >replaces the current inline code with a hook into bonding to do all of >that, plus additional logic. The reason for a hook is to get the >skb_bond_should_drop callers from the VLAN input path before the input >device is assigned to the VLAN device. > > -J > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 99610f3..cc367a3 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) > read_lock(&bond->lock); > > new_slave->last_arp_rx = jiffies; >+ bond_fix_slave_validate_flag(bond, new_slave); > > if (bond->params.miimon && !bond->params.use_carrier) { > link_reporting = bond_check_dev_link(bond, slave_dev, 1); >@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_ > { > struct sk_buff *skb; > >- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op, >- slave_dev->name, dest_ip, src_ip, vlan_id); >+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op, >+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies); > > skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip, > NULL, slave_dev->dev_addr, NULL); >@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 > > targets = bond->params.arp_targets; > for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) { >- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n", >- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip)); > if (sip == targets[i]) { > if (bond_has_this_ip(bond, tip)) > slave->last_arp_rx = jiffies; >@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 > } > } > >-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) >+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev) > { > struct arphdr *arp; > struct slave *slave; >+ struct net_device *bond_dev = dev->master; > struct bonding *bond; > unsigned char *arp_ptr; > __be32 sip, tip; > > if (dev_net(dev) != &init_net) >- goto out; >+ return; > >- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER)) >- goto out; >+ bond = netdev_priv(bond_dev); > >- bond = netdev_priv(dev); > read_lock(&bond->lock); > >- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n", >- bond->dev->name, skb->dev ? skb->dev->name : "NULL", >- orig_dev ? orig_dev->name : "NULL"); >- >- slave = bond_get_slave_by_dev(bond, orig_dev); >+ slave = bond_get_slave_by_dev(bond, dev); > if (!slave || !slave_do_arp_validate(bond, slave)) > goto out_unlock; > > if (!pskb_may_pull(skb, arp_hdr_len(dev))) > goto out_unlock; > >- arp = arp_hdr(skb); >+ /* Can't use arp_hdr; it's not initialized yet. */ >+ arp = (struct arphdr *)skb->data; >+ >+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n", >+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd), >+ ntohs(arp->ar_pro), arp->ar_pln); >+ > if (arp->ar_hln != dev->addr_len || > skb->pkt_type == PACKET_OTHERHOST || > skb->pkt_type == PACKET_LOOPBACK || >@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack > arp_ptr += 4 + dev->addr_len; > memcpy(&tip, arp_ptr, 4); > >- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n", >- bond->dev->name, slave->dev->name, slave->state, >- bond->params.arp_validate, slave_do_arp_validate(bond, slave), >- &sip, &tip); >- >- /* >- * Backup slaves won't see the ARP reply, but do come through >- * here for each ARP probe (so we swap the sip/tip to validate >- * the probe). In a "redundant switch, common router" type of >- * configuration, the ARP probe will (hopefully) travel from >- * the active, through one switch, the router, then the other >- * switch before reaching the backup. >- */ >- if (slave->state == BOND_STATE_ACTIVE) >+ switch (ntohs(arp->ar_op)) { >+ case ARPOP_REPLY: > bond_validate_arp(bond, slave, sip, tip); >- else >+ break; >+ case ARPOP_REQUEST: > bond_validate_arp(bond, slave, tip, sip); >+ break; >+ } > > out_unlock: > read_unlock(&bond->lock); >-out: >- dev_kfree_skb(skb); >- return NET_RX_SUCCESS; >+} >+ >+/* >+ * Called by core packet processing in netif_receive_skb and in VLAN fast >+ * path to (a) determine if packet should be dropped, and (b) to perform >+ * ARP monitor processing (last_rx, validation). >+ * >+ * For the VLAN case, called before the skb->dev is reassigned to the >+ * VLAN. >+ * >+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept. >+ * >+ * We want to keep: >+ * - all traffic on active slaves >+ * - some traffic on inactive slaves: non-broadcast and non-multicast in >+ * ALB/TLB mode and LACP frames in 802.3ad mode. >+ * >+ * ARP frames are handled as normal traffic; the ARP monitor handling >+ * takes place here, so they need not be kept on inactive slaves. >+ */ >+static int bond_handle_frame(struct sk_buff *skb) >+{ >+ struct net_device *dev; >+ struct net_device *master; >+ >+ dev = skb->dev; >+ master = dev->master; >+ if (!master || !master->priv_flags & IFF_BONDING) >+ return 0; >+ >+ if (master->priv_flags & IFF_MASTER_ARPMON) { >+ dev->last_rx = jiffies; >+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >+ skb->protocol == __constant_htons(ETH_P_ARP)) >+ bond_handle_arp(skb, dev); >+ } >+ >+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >+ if (master->priv_flags & IFF_MASTER_ALB) { >+ if (skb->pkt_type != PACKET_BROADCAST && >+ skb->pkt_type != PACKET_MULTICAST) >+ return 0; >+ } >+ if (master->priv_flags & IFF_MASTER_8023AD && >+ skb->protocol == __constant_htons(ETH_P_SLOW)) >+ return 0; >+ >+ return 1; >+ } >+ return 0; > } > > /* >@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond) > > void bond_register_arp(struct bonding *bond) > { >+#if 0 > struct packet_type *pt = &bond->arp_mon_pt; > > if (pt->type) >@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond) > pt->dev = bond->dev; > pt->func = bond_arp_rcv; > dev_add_pack(pt); >+#endif > } > > void bond_unregister_arp(struct bonding *bond) > { >+#if 0 > struct packet_type *pt = &bond->arp_mon_pt; > > dev_remove_pack(pt); > pt->type = 0; >+#endif > } > > /*---------------------------- Hashing Policies -----------------------------*/ >@@ -5188,6 +5230,8 @@ out_rtnl: > return res; > } > >+extern int (*bond_handle_frame_hook)(struct sk_buff *skb); >+ > static int __init bonding_init(void) > { > int i; >@@ -5221,6 +5265,8 @@ static int __init bonding_init(void) > register_inetaddr_notifier(&bond_inetaddr_notifier); > bond_register_ipv6_notifier(); > >+ bond_handle_frame_hook = bond_handle_frame; >+ > goto out; > err: > list_for_each_entry(bond, &bond_dev_list, bond_list) { >@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void) > rtnl_lock(); > bond_free_all(); > rtnl_unlock(); >+ >+ bond_handle_frame_hook = NULL; >+ > } > > module_init(bonding_init); >diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c >index 18cf478..0e9f0e9 100644 >--- a/drivers/net/bonding/bond_sysfs.c >+++ b/drivers/net/bonding/bond_sysfs.c >@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d, > struct device_attribute *attr, > const char *buf, size_t count) > { >- int new_value; >+ int new_value, i; > struct bonding *bond = to_bond(d); >+ struct slave *slave; > > new_value = bond_parse_parm(buf, arp_validate_tbl); > if (new_value < 0) { >@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d, > bond->dev->name, arp_validate_tbl[new_value].modename, > new_value); > >- if (!bond->params.arp_validate && new_value) { >- bond_register_arp(bond); >- } else if (bond->params.arp_validate && !new_value) { >- bond_unregister_arp(bond); >- } >+ if (bond->params.arp_validate != new_value) { >+ bond->params.arp_validate = new_value; > >- bond->params.arp_validate = new_value; >+ bond_for_each_slave(bond, slave, i) >+ bond_fix_slave_validate_flag(bond, slave); >+ } > > return count; > } >diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h >index ca849d2..275f08d 100644 >--- a/drivers/net/bonding/bonding.h >+++ b/drivers/net/bonding/bonding.h >@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond, > return slave->dev->last_rx; > } > >+static inline void bond_fix_slave_validate_flag(struct bonding *bond, >+ struct slave *slave) >+{ >+ if (slave_do_arp_validate(bond, slave)) >+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >+ else >+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP; >+} >+ > static inline void bond_set_slave_inactive_flags(struct slave *slave) > { > struct bonding *bond = netdev_priv(slave->dev->master); >@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave) > bond->params.mode != BOND_MODE_ALB) > slave->state = BOND_STATE_BACKUP; > slave->dev->priv_flags |= IFF_SLAVE_INACTIVE; >- if (slave_do_arp_validate(bond, slave)) >- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >+ bond_fix_slave_validate_flag(bond, slave); > } > > static inline void bond_set_slave_active_flags(struct slave *slave) > { >+ struct bonding *bond = netdev_priv(slave->dev->master); >+ > slave->state = BOND_STATE_ACTIVE; >- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP); >+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE; >+ bond_fix_slave_validate_flag(bond, slave); > } > > static inline void bond_set_master_3ad_flags(struct bonding *bond) >diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >index 2e7783f..3dafd8f 100644 >--- a/include/linux/netdevice.h >+++ b/include/linux/netdevice.h >@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev, > dev->gso_max_size = size; > } > >+#if 0 > /* On bonding slaves other than the currently active slave, suppress > * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and > * ARP on active-backup slaves with arp_validate enabled. >@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) > } > return 0; > } >+#endif >+extern int skb_bond_should_drop(struct sk_buff *skb); > > extern struct pernet_operations __net_initdata loopback_net_ops; > #endif /* __KERNEL__ */ >diff --git a/net/core/dev.c b/net/core/dev.c >index 91d792d..ac5a37e 100644 >--- a/net/core/dev.c >+++ b/net/core/dev.c >@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb, > #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb) > #endif > >+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) >+int (*bond_handle_frame_hook)(struct sk_buff *skb); >+EXPORT_SYMBOL_GPL(bond_handle_frame_hook); >+ >+/* On bonding slaves other than the currently active slave, suppress >+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and >+ * ARP on active-backup slaves with arp_validate enabled. >+ */ >+int skb_bond_should_drop(struct sk_buff *skb) >+{ >+ struct net_device *dev = skb->dev; >+ struct net_device *master = dev->master; >+ >+ if (master) >+ return bond_handle_frame_hook(skb); Maybe this hook can be called from netif_receive_skb() directly. You would safe at least 2 dereferences, 1 if check. You would also need to add "skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common(). >+ >+ return 0; >+ >+#if 0 >+ if (master->priv_flags & IFF_MASTER_ARPMON) >+ dev->last_rx = jiffies; >+ >+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >+ skb->protocol == __constant_htons(ETH_P_ARP)) >+ return 0; >+ >+ if (master->priv_flags & IFF_MASTER_ALB) { >+ if (skb->pkt_type != PACKET_BROADCAST && >+ skb->pkt_type != PACKET_MULTICAST) >+ return 0; >+ } >+ if (master->priv_flags & IFF_MASTER_8023AD && >+ skb->protocol == __constant_htons(ETH_P_SLOW)) >+ return 0; >+ >+ return 1; >+ } >+ } >+ return 0; >+#endif /* 0 */ >+} >+#else >+int skb_bond_should_drop(struct sk_buff *skb) >+{ >+ return 0; >+} >+#endif >+EXPORT_SYMBOL_GPL(skb_bond_should_drop); >+ > #ifdef CONFIG_NET_CLS_ACT > /* TODO: Maybe we should just force sch_ingress to be compiled in > * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions > > >--- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com >-- >To unsubscribe from this list: send the line "unsubscribe netdev" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 7:48 ` Jiri Pirko @ 2009-04-23 14:59 ` Jay Vosburgh 2009-04-23 15:20 ` Jiri Pirko 0 siblings, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-23 14:59 UTC (permalink / raw) To: Jiri Pirko; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List Jiri Pirko <jpirko@redhat.com> wrote: >Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote: >>stefan novak <lms.brubaker@gmail.com> wrote: >> >>>so its a bug in kernel? >> >> Yes. I think the following patch should add support for >>arp_validate over VLANs, could you test this? This is still a work in >>progress, so it's pretty rough around the edges, but the core >>functionality should be there. >> >> This works by moving the skb_bond_should_drop logic around, and >>replaces the current inline code with a hook into bonding to do all of >>that, plus additional logic. The reason for a hook is to get the >>skb_bond_should_drop callers from the VLAN input path before the input >>device is assigned to the VLAN device. >> >> -J >> >>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >>index 99610f3..cc367a3 100644 >>--- a/drivers/net/bonding/bond_main.c >>+++ b/drivers/net/bonding/bond_main.c >>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) >> read_lock(&bond->lock); >> >> new_slave->last_arp_rx = jiffies; >>+ bond_fix_slave_validate_flag(bond, new_slave); >> >> if (bond->params.miimon && !bond->params.use_carrier) { >> link_reporting = bond_check_dev_link(bond, slave_dev, 1); >>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_ >> { >> struct sk_buff *skb; >> >>- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op, >>- slave_dev->name, dest_ip, src_ip, vlan_id); >>+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op, >>+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies); >> >> skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip, >> NULL, slave_dev->dev_addr, NULL); >>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 >> >> targets = bond->params.arp_targets; >> for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) { >>- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n", >>- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip)); >> if (sip == targets[i]) { >> if (bond_has_this_ip(bond, tip)) >> slave->last_arp_rx = jiffies; >>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 >> } >> } >> >>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) >>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev) >> { >> struct arphdr *arp; >> struct slave *slave; >>+ struct net_device *bond_dev = dev->master; >> struct bonding *bond; >> unsigned char *arp_ptr; >> __be32 sip, tip; >> >> if (dev_net(dev) != &init_net) >>- goto out; >>+ return; >> >>- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER)) >>- goto out; >>+ bond = netdev_priv(bond_dev); >> >>- bond = netdev_priv(dev); >> read_lock(&bond->lock); >> >>- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n", >>- bond->dev->name, skb->dev ? skb->dev->name : "NULL", >>- orig_dev ? orig_dev->name : "NULL"); >>- >>- slave = bond_get_slave_by_dev(bond, orig_dev); >>+ slave = bond_get_slave_by_dev(bond, dev); >> if (!slave || !slave_do_arp_validate(bond, slave)) >> goto out_unlock; >> >> if (!pskb_may_pull(skb, arp_hdr_len(dev))) >> goto out_unlock; >> >>- arp = arp_hdr(skb); >>+ /* Can't use arp_hdr; it's not initialized yet. */ >>+ arp = (struct arphdr *)skb->data; >>+ >>+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n", >>+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd), >>+ ntohs(arp->ar_pro), arp->ar_pln); >>+ >> if (arp->ar_hln != dev->addr_len || >> skb->pkt_type == PACKET_OTHERHOST || >> skb->pkt_type == PACKET_LOOPBACK || >>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack >> arp_ptr += 4 + dev->addr_len; >> memcpy(&tip, arp_ptr, 4); >> >>- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n", >>- bond->dev->name, slave->dev->name, slave->state, >>- bond->params.arp_validate, slave_do_arp_validate(bond, slave), >>- &sip, &tip); >>- >>- /* >>- * Backup slaves won't see the ARP reply, but do come through >>- * here for each ARP probe (so we swap the sip/tip to validate >>- * the probe). In a "redundant switch, common router" type of >>- * configuration, the ARP probe will (hopefully) travel from >>- * the active, through one switch, the router, then the other >>- * switch before reaching the backup. >>- */ >>- if (slave->state == BOND_STATE_ACTIVE) >>+ switch (ntohs(arp->ar_op)) { >>+ case ARPOP_REPLY: >> bond_validate_arp(bond, slave, sip, tip); >>- else >>+ break; >>+ case ARPOP_REQUEST: >> bond_validate_arp(bond, slave, tip, sip); >>+ break; >>+ } >> >> out_unlock: >> read_unlock(&bond->lock); >>-out: >>- dev_kfree_skb(skb); >>- return NET_RX_SUCCESS; >>+} >>+ >>+/* >>+ * Called by core packet processing in netif_receive_skb and in VLAN fast >>+ * path to (a) determine if packet should be dropped, and (b) to perform >>+ * ARP monitor processing (last_rx, validation). >>+ * >>+ * For the VLAN case, called before the skb->dev is reassigned to the >>+ * VLAN. >>+ * >>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept. >>+ * >>+ * We want to keep: >>+ * - all traffic on active slaves >>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in >>+ * ALB/TLB mode and LACP frames in 802.3ad mode. >>+ * >>+ * ARP frames are handled as normal traffic; the ARP monitor handling >>+ * takes place here, so they need not be kept on inactive slaves. >>+ */ >>+static int bond_handle_frame(struct sk_buff *skb) >>+{ >>+ struct net_device *dev; >>+ struct net_device *master; >>+ >>+ dev = skb->dev; >>+ master = dev->master; >>+ if (!master || !master->priv_flags & IFF_BONDING) >>+ return 0; >>+ >>+ if (master->priv_flags & IFF_MASTER_ARPMON) { >>+ dev->last_rx = jiffies; >>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >>+ skb->protocol == __constant_htons(ETH_P_ARP)) >>+ bond_handle_arp(skb, dev); >>+ } >>+ >>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >>+ if (master->priv_flags & IFF_MASTER_ALB) { >>+ if (skb->pkt_type != PACKET_BROADCAST && >>+ skb->pkt_type != PACKET_MULTICAST) >>+ return 0; >>+ } >>+ if (master->priv_flags & IFF_MASTER_8023AD && >>+ skb->protocol == __constant_htons(ETH_P_SLOW)) >>+ return 0; >>+ >>+ return 1; >>+ } >>+ return 0; >> } >> >> /* >>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond) >> >> void bond_register_arp(struct bonding *bond) >> { >>+#if 0 >> struct packet_type *pt = &bond->arp_mon_pt; >> >> if (pt->type) >>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond) >> pt->dev = bond->dev; >> pt->func = bond_arp_rcv; >> dev_add_pack(pt); >>+#endif >> } >> >> void bond_unregister_arp(struct bonding *bond) >> { >>+#if 0 >> struct packet_type *pt = &bond->arp_mon_pt; >> >> dev_remove_pack(pt); >> pt->type = 0; >>+#endif >> } >> >> /*---------------------------- Hashing Policies -----------------------------*/ >>@@ -5188,6 +5230,8 @@ out_rtnl: >> return res; >> } >> >>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb); >>+ >> static int __init bonding_init(void) >> { >> int i; >>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void) >> register_inetaddr_notifier(&bond_inetaddr_notifier); >> bond_register_ipv6_notifier(); >> >>+ bond_handle_frame_hook = bond_handle_frame; >>+ >> goto out; >> err: >> list_for_each_entry(bond, &bond_dev_list, bond_list) { >>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void) >> rtnl_lock(); >> bond_free_all(); >> rtnl_unlock(); >>+ >>+ bond_handle_frame_hook = NULL; >>+ >> } >> >> module_init(bonding_init); >>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c >>index 18cf478..0e9f0e9 100644 >>--- a/drivers/net/bonding/bond_sysfs.c >>+++ b/drivers/net/bonding/bond_sysfs.c >>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d, >> struct device_attribute *attr, >> const char *buf, size_t count) >> { >>- int new_value; >>+ int new_value, i; >> struct bonding *bond = to_bond(d); >>+ struct slave *slave; >> >> new_value = bond_parse_parm(buf, arp_validate_tbl); >> if (new_value < 0) { >>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d, >> bond->dev->name, arp_validate_tbl[new_value].modename, >> new_value); >> >>- if (!bond->params.arp_validate && new_value) { >>- bond_register_arp(bond); >>- } else if (bond->params.arp_validate && !new_value) { >>- bond_unregister_arp(bond); >>- } >>+ if (bond->params.arp_validate != new_value) { >>+ bond->params.arp_validate = new_value; >> >>- bond->params.arp_validate = new_value; >>+ bond_for_each_slave(bond, slave, i) >>+ bond_fix_slave_validate_flag(bond, slave); >>+ } >> >> return count; >> } >>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h >>index ca849d2..275f08d 100644 >>--- a/drivers/net/bonding/bonding.h >>+++ b/drivers/net/bonding/bonding.h >>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond, >> return slave->dev->last_rx; >> } >> >>+static inline void bond_fix_slave_validate_flag(struct bonding *bond, >>+ struct slave *slave) >>+{ >>+ if (slave_do_arp_validate(bond, slave)) >>+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >>+ else >>+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP; >>+} >>+ >> static inline void bond_set_slave_inactive_flags(struct slave *slave) >> { >> struct bonding *bond = netdev_priv(slave->dev->master); >>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave) >> bond->params.mode != BOND_MODE_ALB) >> slave->state = BOND_STATE_BACKUP; >> slave->dev->priv_flags |= IFF_SLAVE_INACTIVE; >>- if (slave_do_arp_validate(bond, slave)) >>- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >>+ bond_fix_slave_validate_flag(bond, slave); >> } >> >> static inline void bond_set_slave_active_flags(struct slave *slave) >> { >>+ struct bonding *bond = netdev_priv(slave->dev->master); >>+ >> slave->state = BOND_STATE_ACTIVE; >>- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP); >>+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE; >>+ bond_fix_slave_validate_flag(bond, slave); >> } >> >> static inline void bond_set_master_3ad_flags(struct bonding *bond) >>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>index 2e7783f..3dafd8f 100644 >>--- a/include/linux/netdevice.h >>+++ b/include/linux/netdevice.h >>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev, >> dev->gso_max_size = size; >> } >> >>+#if 0 >> /* On bonding slaves other than the currently active slave, suppress >> * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and >> * ARP on active-backup slaves with arp_validate enabled. >>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) >> } >> return 0; >> } >>+#endif >>+extern int skb_bond_should_drop(struct sk_buff *skb); >> >> extern struct pernet_operations __net_initdata loopback_net_ops; >> #endif /* __KERNEL__ */ >>diff --git a/net/core/dev.c b/net/core/dev.c >>index 91d792d..ac5a37e 100644 >>--- a/net/core/dev.c >>+++ b/net/core/dev.c >>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb, >> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb) >> #endif >> >>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) >>+int (*bond_handle_frame_hook)(struct sk_buff *skb); >>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook); >>+ >>+/* On bonding slaves other than the currently active slave, suppress >>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and >>+ * ARP on active-backup slaves with arp_validate enabled. >>+ */ >>+int skb_bond_should_drop(struct sk_buff *skb) >>+{ >>+ struct net_device *dev = skb->dev; >>+ struct net_device *master = dev->master; >>+ >>+ if (master) >>+ return bond_handle_frame_hook(skb); > >Maybe this hook can be called from netif_receive_skb() directly. You would safe >at least 2 dereferences, 1 if check. You would also need to add >"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common(). This won't work, because the VLAN code reassigns skb->dev to the VLAN device before calling netif_receive_skb. In the VLAN path, the second call to skb_bond_should_drop (the first is within the VLAN code itself, __vlan_hwaccel_rx or vlan_gro_common, the second is netif_receive_skb) won't call the hook, because the VLAN device has no dev->master. This is the whole reason for the hook: to process the ARP before VLAN reassigns skb->dev. Once that happens, the actual device the ARP arrived on is lost. -J >>+ >>+ return 0; >>+ >>+#if 0 >>+ if (master->priv_flags & IFF_MASTER_ARPMON) >>+ dev->last_rx = jiffies; >>+ >>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >>+ skb->protocol == __constant_htons(ETH_P_ARP)) >>+ return 0; >>+ >>+ if (master->priv_flags & IFF_MASTER_ALB) { >>+ if (skb->pkt_type != PACKET_BROADCAST && >>+ skb->pkt_type != PACKET_MULTICAST) >>+ return 0; >>+ } >>+ if (master->priv_flags & IFF_MASTER_8023AD && >>+ skb->protocol == __constant_htons(ETH_P_SLOW)) >>+ return 0; >>+ >>+ return 1; >>+ } >>+ } >>+ return 0; >>+#endif /* 0 */ >>+} >>+#else >>+int skb_bond_should_drop(struct sk_buff *skb) >>+{ >>+ return 0; >>+} >>+#endif >>+EXPORT_SYMBOL_GPL(skb_bond_should_drop); >>+ >> #ifdef CONFIG_NET_CLS_ACT >> /* TODO: Maybe we should just force sch_ingress to be compiled in >> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions >> >> >>--- >> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com >>-- >>To unsubscribe from this list: send the line "unsubscribe netdev" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >-- >To unsubscribe from this list: send the line "unsubscribe netdev" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 14:59 ` Jay Vosburgh @ 2009-04-23 15:20 ` Jiri Pirko 2009-04-23 18:34 ` Jay Vosburgh 0 siblings, 1 reply; 19+ messages in thread From: Jiri Pirko @ 2009-04-23 15:20 UTC (permalink / raw) To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote: >Jiri Pirko <jpirko@redhat.com> wrote: > >>Thu, Apr 23, 2009 at 03:12:29AM CEST, fubar@us.ibm.com wrote: >>>stefan novak <lms.brubaker@gmail.com> wrote: >>> >>>>so its a bug in kernel? >>> >>> Yes. I think the following patch should add support for >>>arp_validate over VLANs, could you test this? This is still a work in >>>progress, so it's pretty rough around the edges, but the core >>>functionality should be there. >>> >>> This works by moving the skb_bond_should_drop logic around, and >>>replaces the current inline code with a hook into bonding to do all of >>>that, plus additional logic. The reason for a hook is to get the >>>skb_bond_should_drop callers from the VLAN input path before the input >>>device is assigned to the VLAN device. >>> >>> -J >>> >>>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >>>index 99610f3..cc367a3 100644 >>>--- a/drivers/net/bonding/bond_main.c >>>+++ b/drivers/net/bonding/bond_main.c >>>@@ -1607,6 +1607,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) >>> read_lock(&bond->lock); >>> >>> new_slave->last_arp_rx = jiffies; >>>+ bond_fix_slave_validate_flag(bond, new_slave); >>> >>> if (bond->params.miimon && !bond->params.use_carrier) { >>> link_reporting = bond_check_dev_link(bond, slave_dev, 1); >>>@@ -2538,8 +2539,8 @@ static void bond_arp_send(struct net_device *slave_dev, int arp_op, __be32 dest_ >>> { >>> struct sk_buff *skb; >>> >>>- pr_debug("arp %d on slave %s: dst %x src %x vid %d\n", arp_op, >>>- slave_dev->name, dest_ip, src_ip, vlan_id); >>>+ pr_debug("arp %d on slave %s: dst %x src %x vid %d j %lu\n", arp_op, >>>+ slave_dev->name, dest_ip, src_ip, vlan_id, jiffies); >>> >>> skb = arp_create(arp_op, ETH_P_ARP, dest_ip, slave_dev, src_ip, >>> NULL, slave_dev->dev_addr, NULL); >>>@@ -2679,8 +2680,6 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 >>> >>> targets = bond->params.arp_targets; >>> for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) { >>>- pr_debug("bva: sip %pI4 tip %pI4 t[%d] %pI4 bhti(tip) %d\n", >>>- &sip, &tip, i, &targets[i], bond_has_this_ip(bond, tip)); >>> if (sip == targets[i]) { >>> if (bond_has_this_ip(bond, tip)) >>> slave->last_arp_rx = jiffies; >>>@@ -2689,35 +2688,36 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32 >>> } >>> } >>> >>>-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) >>>+static void bond_handle_arp(struct sk_buff *skb, struct net_device *dev) >>> { >>> struct arphdr *arp; >>> struct slave *slave; >>>+ struct net_device *bond_dev = dev->master; >>> struct bonding *bond; >>> unsigned char *arp_ptr; >>> __be32 sip, tip; >>> >>> if (dev_net(dev) != &init_net) >>>- goto out; >>>+ return; >>> >>>- if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER)) >>>- goto out; >>>+ bond = netdev_priv(bond_dev); >>> >>>- bond = netdev_priv(dev); >>> read_lock(&bond->lock); >>> >>>- pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n", >>>- bond->dev->name, skb->dev ? skb->dev->name : "NULL", >>>- orig_dev ? orig_dev->name : "NULL"); >>>- >>>- slave = bond_get_slave_by_dev(bond, orig_dev); >>>+ slave = bond_get_slave_by_dev(bond, dev); >>> if (!slave || !slave_do_arp_validate(bond, slave)) >>> goto out_unlock; >>> >>> if (!pskb_may_pull(skb, arp_hdr_len(dev))) >>> goto out_unlock; >>> >>>- arp = arp_hdr(skb); >>>+ /* Can't use arp_hdr; it's not initialized yet. */ >>>+ arp = (struct arphdr *)skb->data; >>>+ >>>+ pr_debug("arp: hln %d p_t %d hrd %x pro %x pln %d\n", >>>+ arp->ar_hln, skb->pkt_type, ntohs(arp->ar_hrd), >>>+ ntohs(arp->ar_pro), arp->ar_pln); >>>+ >>> if (arp->ar_hln != dev->addr_len || >>> skb->pkt_type == PACKET_OTHERHOST || >>> skb->pkt_type == PACKET_LOOPBACK || >>>@@ -2732,29 +2732,67 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack >>> arp_ptr += 4 + dev->addr_len; >>> memcpy(&tip, arp_ptr, 4); >>> >>>- pr_debug("bond_arp_rcv: %s %s/%d av %d sv %d sip %pI4 tip %pI4\n", >>>- bond->dev->name, slave->dev->name, slave->state, >>>- bond->params.arp_validate, slave_do_arp_validate(bond, slave), >>>- &sip, &tip); >>>- >>>- /* >>>- * Backup slaves won't see the ARP reply, but do come through >>>- * here for each ARP probe (so we swap the sip/tip to validate >>>- * the probe). In a "redundant switch, common router" type of >>>- * configuration, the ARP probe will (hopefully) travel from >>>- * the active, through one switch, the router, then the other >>>- * switch before reaching the backup. >>>- */ >>>- if (slave->state == BOND_STATE_ACTIVE) >>>+ switch (ntohs(arp->ar_op)) { >>>+ case ARPOP_REPLY: >>> bond_validate_arp(bond, slave, sip, tip); >>>- else >>>+ break; >>>+ case ARPOP_REQUEST: >>> bond_validate_arp(bond, slave, tip, sip); >>>+ break; >>>+ } >>> >>> out_unlock: >>> read_unlock(&bond->lock); >>>-out: >>>- dev_kfree_skb(skb); >>>- return NET_RX_SUCCESS; >>>+} >>>+ >>>+/* >>>+ * Called by core packet processing in netif_receive_skb and in VLAN fast >>>+ * path to (a) determine if packet should be dropped, and (b) to perform >>>+ * ARP monitor processing (last_rx, validation). >>>+ * >>>+ * For the VLAN case, called before the skb->dev is reassigned to the >>>+ * VLAN. >>>+ * >>>+ * Returns 1 if frame should nominally be dropped; 0 if it should be kept. >>>+ * >>>+ * We want to keep: >>>+ * - all traffic on active slaves >>>+ * - some traffic on inactive slaves: non-broadcast and non-multicast in >>>+ * ALB/TLB mode and LACP frames in 802.3ad mode. >>>+ * >>>+ * ARP frames are handled as normal traffic; the ARP monitor handling >>>+ * takes place here, so they need not be kept on inactive slaves. >>>+ */ >>>+static int bond_handle_frame(struct sk_buff *skb) >>>+{ >>>+ struct net_device *dev; >>>+ struct net_device *master; >>>+ >>>+ dev = skb->dev; >>>+ master = dev->master; >>>+ if (!master || !master->priv_flags & IFF_BONDING) >>>+ return 0; >>>+ >>>+ if (master->priv_flags & IFF_MASTER_ARPMON) { >>>+ dev->last_rx = jiffies; >>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >>>+ skb->protocol == __constant_htons(ETH_P_ARP)) >>>+ bond_handle_arp(skb, dev); >>>+ } >>>+ >>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >>>+ if (master->priv_flags & IFF_MASTER_ALB) { >>>+ if (skb->pkt_type != PACKET_BROADCAST && >>>+ skb->pkt_type != PACKET_MULTICAST) >>>+ return 0; >>>+ } >>>+ if (master->priv_flags & IFF_MASTER_8023AD && >>>+ skb->protocol == __constant_htons(ETH_P_SLOW)) >>>+ return 0; >>>+ >>>+ return 1; >>>+ } >>>+ return 0; >>> } >>> >>> /* >>>@@ -3688,6 +3726,7 @@ static void bond_unregister_lacpdu(struct bonding *bond) >>> >>> void bond_register_arp(struct bonding *bond) >>> { >>>+#if 0 >>> struct packet_type *pt = &bond->arp_mon_pt; >>> >>> if (pt->type) >>>@@ -3697,14 +3736,17 @@ void bond_register_arp(struct bonding *bond) >>> pt->dev = bond->dev; >>> pt->func = bond_arp_rcv; >>> dev_add_pack(pt); >>>+#endif >>> } >>> >>> void bond_unregister_arp(struct bonding *bond) >>> { >>>+#if 0 >>> struct packet_type *pt = &bond->arp_mon_pt; >>> >>> dev_remove_pack(pt); >>> pt->type = 0; >>>+#endif >>> } >>> >>> /*---------------------------- Hashing Policies -----------------------------*/ >>>@@ -5188,6 +5230,8 @@ out_rtnl: >>> return res; >>> } >>> >>>+extern int (*bond_handle_frame_hook)(struct sk_buff *skb); >>>+ >>> static int __init bonding_init(void) >>> { >>> int i; >>>@@ -5221,6 +5265,8 @@ static int __init bonding_init(void) >>> register_inetaddr_notifier(&bond_inetaddr_notifier); >>> bond_register_ipv6_notifier(); >>> >>>+ bond_handle_frame_hook = bond_handle_frame; >>>+ >>> goto out; >>> err: >>> list_for_each_entry(bond, &bond_dev_list, bond_list) { >>>@@ -5249,6 +5295,9 @@ static void __exit bonding_exit(void) >>> rtnl_lock(); >>> bond_free_all(); >>> rtnl_unlock(); >>>+ >>>+ bond_handle_frame_hook = NULL; >>>+ >>> } >>> >>> module_init(bonding_init); >>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c >>>index 18cf478..0e9f0e9 100644 >>>--- a/drivers/net/bonding/bond_sysfs.c >>>+++ b/drivers/net/bonding/bond_sysfs.c >>>@@ -484,8 +484,9 @@ static ssize_t bonding_store_arp_validate(struct device *d, >>> struct device_attribute *attr, >>> const char *buf, size_t count) >>> { >>>- int new_value; >>>+ int new_value, i; >>> struct bonding *bond = to_bond(d); >>>+ struct slave *slave; >>> >>> new_value = bond_parse_parm(buf, arp_validate_tbl); >>> if (new_value < 0) { >>>@@ -504,13 +505,12 @@ static ssize_t bonding_store_arp_validate(struct device *d, >>> bond->dev->name, arp_validate_tbl[new_value].modename, >>> new_value); >>> >>>- if (!bond->params.arp_validate && new_value) { >>>- bond_register_arp(bond); >>>- } else if (bond->params.arp_validate && !new_value) { >>>- bond_unregister_arp(bond); >>>- } >>>+ if (bond->params.arp_validate != new_value) { >>>+ bond->params.arp_validate = new_value; >>> >>>- bond->params.arp_validate = new_value; >>>+ bond_for_each_slave(bond, slave, i) >>>+ bond_fix_slave_validate_flag(bond, slave); >>>+ } >>> >>> return count; >>> } >>>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h >>>index ca849d2..275f08d 100644 >>>--- a/drivers/net/bonding/bonding.h >>>+++ b/drivers/net/bonding/bonding.h >>>@@ -283,6 +283,15 @@ static inline unsigned long slave_last_rx(struct bonding *bond, >>> return slave->dev->last_rx; >>> } >>> >>>+static inline void bond_fix_slave_validate_flag(struct bonding *bond, >>>+ struct slave *slave) >>>+{ >>>+ if (slave_do_arp_validate(bond, slave)) >>>+ slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >>>+ else >>>+ slave->dev->priv_flags &= ~IFF_SLAVE_NEEDARP; >>>+} >>>+ >>> static inline void bond_set_slave_inactive_flags(struct slave *slave) >>> { >>> struct bonding *bond = netdev_priv(slave->dev->master); >>>@@ -290,14 +299,16 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave) >>> bond->params.mode != BOND_MODE_ALB) >>> slave->state = BOND_STATE_BACKUP; >>> slave->dev->priv_flags |= IFF_SLAVE_INACTIVE; >>>- if (slave_do_arp_validate(bond, slave)) >>>- slave->dev->priv_flags |= IFF_SLAVE_NEEDARP; >>>+ bond_fix_slave_validate_flag(bond, slave); >>> } >>> >>> static inline void bond_set_slave_active_flags(struct slave *slave) >>> { >>>+ struct bonding *bond = netdev_priv(slave->dev->master); >>>+ >>> slave->state = BOND_STATE_ACTIVE; >>>- slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP); >>>+ slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE; >>>+ bond_fix_slave_validate_flag(bond, slave); >>> } >>> >>> static inline void bond_set_master_3ad_flags(struct bonding *bond) >>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>index 2e7783f..3dafd8f 100644 >>>--- a/include/linux/netdevice.h >>>+++ b/include/linux/netdevice.h >>>@@ -1874,6 +1874,7 @@ static inline void netif_set_gso_max_size(struct net_device *dev, >>> dev->gso_max_size = size; >>> } >>> >>>+#if 0 >>> /* On bonding slaves other than the currently active slave, suppress >>> * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and >>> * ARP on active-backup slaves with arp_validate enabled. >>>@@ -1906,6 +1907,8 @@ static inline int skb_bond_should_drop(struct sk_buff *skb) >>> } >>> return 0; >>> } >>>+#endif >>>+extern int skb_bond_should_drop(struct sk_buff *skb); >>> >>> extern struct pernet_operations __net_initdata loopback_net_ops; >>> #endif /* __KERNEL__ */ >>>diff --git a/net/core/dev.c b/net/core/dev.c >>>index 91d792d..ac5a37e 100644 >>>--- a/net/core/dev.c >>>+++ b/net/core/dev.c >>>@@ -2098,6 +2098,56 @@ static inline struct sk_buff *handle_macvlan(struct sk_buff *skb, >>> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb) >>> #endif >>> >>>+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE) >>>+int (*bond_handle_frame_hook)(struct sk_buff *skb); >>>+EXPORT_SYMBOL_GPL(bond_handle_frame_hook); >>>+ >>>+/* On bonding slaves other than the currently active slave, suppress >>>+ * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and >>>+ * ARP on active-backup slaves with arp_validate enabled. >>>+ */ >>>+int skb_bond_should_drop(struct sk_buff *skb) >>>+{ >>>+ struct net_device *dev = skb->dev; >>>+ struct net_device *master = dev->master; >>>+ >>>+ if (master) >>>+ return bond_handle_frame_hook(skb); >> >>Maybe this hook can be called from netif_receive_skb() directly. You would safe >>at least 2 dereferences, 1 if check. You would also need to add >>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common(). > > This won't work, because the VLAN code reassigns skb->dev to the >VLAN device before calling netif_receive_skb. Sure, but bond_should_drop is called before it actually reassigns that. So the check in bond_should_drop tests "original_dev->master". I had on mind something like following: Signed-off-by: Jiri Pirko <jpirko@redhat.com> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c index c67fe6f..87a7334 100644 --- a/net/8021q/vlan_core.c +++ b/net/8021q/vlan_core.c @@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp, if (netpoll_rx(skb)) return NET_RX_DROP; - if (skb_bond_should_drop(skb)) + if (skb->dev->master && bond_handle_frame_hook(skb)) goto drop; skb->vlan_tci = vlan_tci; @@ -79,7 +79,7 @@ static int vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp, { struct sk_buff *p; - if (skb_bond_should_drop(skb)) + if (skb->dev->master && bond_handle_frame_hook(skb)) goto drop; skb->vlan_tci = vlan_tci; diff --git a/net/core/dev.c b/net/core/dev.c index 0590b2a..280e3de 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2282,7 +2282,7 @@ int netif_receive_skb(struct sk_buff *skb) null_or_orig = NULL; orig_dev = skb->dev; if (orig_dev->master) { - if (skb_bond_should_drop(skb)) + if (bond_handle_frame_hook(skb)) null_or_orig = orig_dev; /* deliver only exact match */ else skb->dev = orig_dev->master; Note I did not even compile that. > > In the VLAN path, the second call to skb_bond_should_drop (the >first is within the VLAN code itself, __vlan_hwaccel_rx or >vlan_gro_common, the second is netif_receive_skb) won't call the hook, >because the VLAN device has no dev->master. > > This is the whole reason for the hook: to process the ARP before >VLAN reassigns skb->dev. Once that happens, the actual device the ARP >arrived on is lost. > > -J > > >>>+ >>>+ return 0; >>>+ >>>+#if 0 >>>+ if (master->priv_flags & IFF_MASTER_ARPMON) >>>+ dev->last_rx = jiffies; >>>+ >>>+ if (dev->priv_flags & IFF_SLAVE_INACTIVE) { >>>+ if ((dev->priv_flags & IFF_SLAVE_NEEDARP) && >>>+ skb->protocol == __constant_htons(ETH_P_ARP)) >>>+ return 0; >>>+ >>>+ if (master->priv_flags & IFF_MASTER_ALB) { >>>+ if (skb->pkt_type != PACKET_BROADCAST && >>>+ skb->pkt_type != PACKET_MULTICAST) >>>+ return 0; >>>+ } >>>+ if (master->priv_flags & IFF_MASTER_8023AD && >>>+ skb->protocol == __constant_htons(ETH_P_SLOW)) >>>+ return 0; >>>+ >>>+ return 1; >>>+ } >>>+ } >>>+ return 0; >>>+#endif /* 0 */ >>>+} >>>+#else >>>+int skb_bond_should_drop(struct sk_buff *skb) >>>+{ >>>+ return 0; >>>+} >>>+#endif >>>+EXPORT_SYMBOL_GPL(skb_bond_should_drop); >>>+ >>> #ifdef CONFIG_NET_CLS_ACT >>> /* TODO: Maybe we should just force sch_ingress to be compiled in >>> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions >>> >>> >>>--- >>> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com >>>-- >>>To unsubscribe from this list: send the line "unsubscribe netdev" in >>>the body of a message to majordomo@vger.kernel.org >>>More majordomo info at http://vger.kernel.org/majordomo-info.html >>-- >>To unsubscribe from this list: send the line "unsubscribe netdev" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 15:20 ` Jiri Pirko @ 2009-04-23 18:34 ` Jay Vosburgh 2009-04-23 19:22 ` Jiri Pirko 0 siblings, 1 reply; 19+ messages in thread From: Jay Vosburgh @ 2009-04-23 18:34 UTC (permalink / raw) To: Jiri Pirko; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List Jiri Pirko <jpirko@redhat.com> wrote: >Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote: >>Jiri Pirko <jpirko@redhat.com> wrote: [...] >>>>+{ >>>>+ struct net_device *dev = skb->dev; >>>>+ struct net_device *master = dev->master; >>>>+ >>>>+ if (master) >>>>+ return bond_handle_frame_hook(skb); >>> >>>Maybe this hook can be called from netif_receive_skb() directly. You would safe >>>at least 2 dereferences, 1 if check. You would also need to add >>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common(). >> >> This won't work, because the VLAN code reassigns skb->dev to the >>VLAN device before calling netif_receive_skb. > >Sure, but bond_should_drop is called before it actually reassigns that. So the >check in bond_should_drop tests "original_dev->master". > >I had on mind something like following: > >Signed-off-by: Jiri Pirko <jpirko@redhat.com> > >diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c >index c67fe6f..87a7334 100644 >--- a/net/8021q/vlan_core.c >+++ b/net/8021q/vlan_core.c >@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp, > if (netpoll_rx(skb)) > return NET_RX_DROP; > >- if (skb_bond_should_drop(skb)) >+ if (skb->dev->master && bond_handle_frame_hook(skb)) [...] Yah, ok, I see what you mean. The same could be accomplished by turning skb_bond_should_drop back into an inline in the header file, and hiding the fiddly bits from the calling context. It's pretty grotty no matter how it's done; I'd prefer to avoid the whole hook business, but I haven't thought of a less bad way. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: bond interface arp, vlan and trunk / network question 2009-04-23 18:34 ` Jay Vosburgh @ 2009-04-23 19:22 ` Jiri Pirko 0 siblings, 0 replies; 19+ messages in thread From: Jiri Pirko @ 2009-04-23 19:22 UTC (permalink / raw) To: Jay Vosburgh; +Cc: stefan novak, Eric Dumazet, linux-kernel, Linux Netdev List Thu, Apr 23, 2009 at 08:34:20PM CEST, fubar@us.ibm.com wrote: >Jiri Pirko <jpirko@redhat.com> wrote: >>Thu, Apr 23, 2009 at 04:59:41PM CEST, fubar@us.ibm.com wrote: >>>Jiri Pirko <jpirko@redhat.com> wrote: >[...] >>>>>+{ >>>>>+ struct net_device *dev = skb->dev; >>>>>+ struct net_device *master = dev->master; >>>>>+ >>>>>+ if (master) >>>>>+ return bond_handle_frame_hook(skb); >>>> >>>>Maybe this hook can be called from netif_receive_skb() directly. You would safe >>>>at least 2 dereferences, 1 if check. You would also need to add >>>>"skb->dev->master &&" to if in __vlan_hwaccel_rx() and vlan_gro_common(). >>> >>> This won't work, because the VLAN code reassigns skb->dev to the >>>VLAN device before calling netif_receive_skb. >> >>Sure, but bond_should_drop is called before it actually reassigns that. So the >>check in bond_should_drop tests "original_dev->master". >> >>I had on mind something like following: >> >>Signed-off-by: Jiri Pirko <jpirko@redhat.com> >> >>diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c >>index c67fe6f..87a7334 100644 >>--- a/net/8021q/vlan_core.c >>+++ b/net/8021q/vlan_core.c >>@@ -11,7 +11,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp, >> if (netpoll_rx(skb)) >> return NET_RX_DROP; >> >>- if (skb_bond_should_drop(skb)) >>+ if (skb->dev->master && bond_handle_frame_hook(skb)) >[...] > > Yah, ok, I see what you mean. The same could be accomplished by >turning skb_bond_should_drop back into an inline in the header file, and Well no exactly. In vlan_x certainly yes but in netif_receive_skb() you safe something by not checking ->master twice and 2 dereferences. Just noting when I see the waste... >hiding the fiddly bits from the calling context. > > It's pretty grotty no matter how it's done; I'd prefer to avoid >the whole hook business, but I haven't thought of a less bad way. Yes I agree hooking this sucks :/ Anyway, is there any movement to eliminate all these hooks from netif_receive_skb() ? Jirka > > -J > >--- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-04-23 19:23 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-04-20 17:50 bond interface arp, vlan and trunk / network question stefan novak 2009-04-20 18:16 ` Eric Dumazet 2009-04-20 18:37 ` Jay Vosburgh 2009-04-20 20:00 ` stefan novak 2009-04-20 20:36 ` Jay Vosburgh 2009-04-20 21:03 ` stefan novak 2009-04-20 21:15 ` Eric Dumazet 2009-04-20 21:23 ` Jay Vosburgh 2009-04-20 21:39 ` stefan novak 2009-04-21 0:01 ` Jay Vosburgh 2009-04-22 8:29 ` stefan novak 2009-04-23 1:12 ` Jay Vosburgh 2009-04-23 5:58 ` Eric Dumazet 2009-04-23 15:38 ` Jay Vosburgh 2009-04-23 7:48 ` Jiri Pirko 2009-04-23 14:59 ` Jay Vosburgh 2009-04-23 15:20 ` Jiri Pirko 2009-04-23 18:34 ` Jay Vosburgh 2009-04-23 19:22 ` Jiri Pirko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.