* sky2 bonding problem, 802.3ad
@ 2007-02-15 18:55 Holger Eitzenberger
  2007-02-15 19:26 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-15 18:55 UTC (permalink / raw)
  To: shemminger; +Cc: sk-drivers, netdev

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]

Hi Steven,

I have problems using sky2 v1.10 with the bonding driver (802.3ad)
on a 'Marvell 88E8053 PCI-E Gigabit Ethernet Controller'.  I have attached
the full lspci output.

My test was to set up a bond of two physical links (both links same
hardware) and ping 192.168.11.10, which is the address of the switch
itself.

I have tested v1.10 with kernel 2.6.19 and 2.6.16.36 (own backport);
apart from the bonding problem it runs fine.  Both kernels show the
same behaviour.  The 802.3ad-aware switch is a Dell
PowerConnect 5324.  VLAN is not configured on all switch ports.  Another
test on a host running kernel 2.6.18.2 with two bonded e1000s runs
fine.  Using sk98lin (v8.41 & v10.0.4) also worked.

The host containing the Yukon-II has a total of 8 NICs, 4 PCI and 4
PCI-E, see attached lspci output.  The failed bond was created from two
PCI-E interfaces.

Find attached a short script which I use to set up the bond on both hosts,
also attached is a procfile (/proc/net/bonding/bond0) from
the "e1000 host" with a working bond as well as the procfile from the host
with the Yukon-II cards.

When looking at the working bond I see that both slave interfaces are
IFF_UP and the load is shared over both links.  When looking at the failing
sky2 bond I see that the bond is not IFF_UP, whereas both slave
interfaces are IFF_UP.  The 802.3ad "partner MAC address" is left all zeros,
and the two interfaces have different Aggregator IDs (1 & 2).  One of the
two failing interfaces always has IFF_NOARP set, caused by code
calling bond_main.c:bond_set_slave_inactive_flags().
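
The two symptoms described above (an all-zero partner MAC and slaves sitting
in different aggregators) can be checked mechanically.  What follows is only a
minimal Python sketch: it assumes the procfile uses the "Slave Interface:",
"Aggregator ID:" and "Partner Mac Address:" labels, and the sample text and
the eth2/eth3 names are made up for illustration; they are not the attached
procfiles.

```python
# Hypothetical excerpt in the style of /proc/net/bonding/bond0 (assumed layout).
SAMPLE_FAILING = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Active Aggregator Info:
    Aggregator ID: 1
    Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth2
MII Status: up
Aggregator ID: 1

Slave Interface: eth3
MII Status: up
Aggregator ID: 2
"""


def summarize_bond(proc_text):
    """Return (partner_mac, {slave_name: aggregator_id}) from procfile text."""
    partner_mac = None
    slaves = {}
    current = None  # slave section currently being read, if any
    for raw in proc_text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Partner Mac Address:"):
            partner_mac = line.split(":", 1)[1].strip().lower()
        elif line.startswith("Aggregator ID:") and current is not None:
            # Only per-slave IDs are recorded; the active-aggregator ID
            # appears before the first slave section and is skipped.
            slaves[current] = int(line.split(":", 1)[1])
    return partner_mac, slaves


def looks_unnegotiated(partner_mac, slaves):
    """True if no partner was learned or the slaves did not aggregate."""
    no_partner = partner_mac in (None, "00:00:00:00:00:00")
    split_aggregators = len(set(slaves.values())) > 1
    return no_partner or split_aggregators


if __name__ == "__main__":
    partner, slaves = summarize_bond(SAMPLE_FAILING)
    print(partner, slaves, looks_unnegotiated(partner, slaves))
```

On a live system the text would come from reading /proc/net/bonding/bond0
instead of the sample string.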

I used both use_carrier=1 (default) as well as miimon=50 without luck.

Going through the bonding code, and comparing sky2 source to the e100
code, which I am quite familiar with, I see that sky2 does not use the
generic MII interface, which might point in the right direction.

I am currently going through the bonding code, trying to understand the
master <-> slave <-> sky2 interaction; basically this happens either by
calling the sky2 net_device ops or through the ethtool ops.

If you need further info or further testing from my side, I will gladly
do that.

Besides that, thanks for a great driver!

   /holger


[-- Attachment #2: bonding-prob.tgz --]
[-- Type: application/x-gtar, Size: 5174 bytes --]

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 18:55 sky2 bonding problem, 802.3ad Holger Eitzenberger
@ 2007-02-15 19:26 ` Stephen Hemminger
  2007-02-15 20:13   ` Holger Eitzenberger
  2007-02-15 20:13 ` Jay Vosburgh
  2007-02-15 21:31 ` Andy Gospodarek
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2007-02-15 19:26 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: sk-drivers, netdev


> I used both use_carrier=1 (default) as well as miimon=50 without luck.

use_carrier should work (since the device reports carrier transitions).

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 18:55 sky2 bonding problem, 802.3ad Holger Eitzenberger
  2007-02-15 19:26 ` Stephen Hemminger
@ 2007-02-15 20:13 ` Jay Vosburgh
  2007-02-15 21:09   ` Holger Eitzenberger
  2007-02-16  9:15   ` Holger Eitzenberger
  2007-02-15 21:31 ` Andy Gospodarek
  2 siblings, 2 replies; 12+ messages in thread
From: Jay Vosburgh @ 2007-02-15 20:13 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: shemminger, sk-drivers, netdev

Holger Eitzenberger <holger@my-eitzenberger.de> wrote:

>I have tested v1.10 with kernel 2.6.19 and 2.6.16.36 (own backport),
>which despite the bonding problem runs fine.  Both, kernel 2.6.19 and
>2.6.16.36 show the same behaviour.  The 802.3ad aware switch is a Dell
>PowerConnect 5324.  VLAN is not configured on all switch ports.  Another
>test on a host running kernel 2.6.18.2 with two e1000's bonded runs
>fine.  Using sk98lin (v8.41 & v10.0.4) worked also.

	The log you included (with debug turned on) indicates that
bonding is at least attempting to send LACPDUs, but there are no log
entries for having received any LACPDUs.

	I'm unfamiliar with your particular switch, but usually this
kind of problem with bonding 802.3ad is in the switch interaction.  The
switches I have (Cisco) require that 802.3ad mode be explicitly enabled
on whichever ports it is desired on, so it may be worthwhile to check
your switch and make sure that it really is configured for 802.3ad on
the sky2 ports.

	If the switch is configured, you may want to also check to see
if it has counters for LACPDUs sent and received.  If the switch is not
sending and receiving LACPDUs on the appropriate ports, then it's more
likely to be a communications problem somewhere (vs. an 802.3ad
negotiation problem).

>When looking at the working bond I see that both slave interfaces are
>IFF_UP, the load is shared over both links.  When looking at the failing
>sky2 bond I see that the bond is not IFF_UP, whereas both slave
>interfaces are IFF_UP.  The 802.3ad "partner MAC address" is left all zero's, 
>also both interfaces have different Aggregator IDs (1 & 2).  One of the
>two failing interfaces always has IFF_NOARP set, caused by code
>calling bond_main.c:bond_set_slave_inactive_flags().

	For the version of bonding in your dmesg log, the IFF_NOARP is
expected; 802.3ad will select one aggregator as the active one, the
other aggregators will be marked inactive, and that sets IFF_NOARP.
Since no LACPDUs have been exchanged, bonding is leaving each interface
as a separate aggregator.  Versions of bonding later than February 2006
(your proc-bond0-ok for example) don't set the IFF_NOARP on inactive
slaves (a new mechanism is used that doesn't mess with the flags).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 19:26 ` Stephen Hemminger
@ 2007-02-15 20:13   ` Holger Eitzenberger
  0 siblings, 0 replies; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-15 20:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: sk-drivers, netdev

Stephen Hemminger <shemminger@linux-foundation.org> writes:

>> I used both use_carrier=1 (default) as well as miimon=50 without luck.
>
> use_carrier should work (since the device reports carrier transitions).

As you can see in the script I used both use_carrier and miimon (in
combinations) without success.  In fact use_carrier is the default if no
other options are set.

  /holger

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 20:13 ` Jay Vosburgh
@ 2007-02-15 21:09   ` Holger Eitzenberger
  2007-02-15 21:43     ` Neil Horman
  2007-02-16  9:15   ` Holger Eitzenberger
  1 sibling, 1 reply; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-15 21:09 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: shemminger, sk-drivers, netdev

Jay Vosburgh <fubar@us.ibm.com> writes:

> 	I'm unfamiliar with your particular switch, but usually this
> kind of problem with bonding 802.3ad is in the switch interaction.  The
> switches I have (Cisco) require that 802.3ad mode be explicitly enabled
> on whichever ports it is desired on, so it may be worthwhile to check
> your switch and make sure that it really is configured for 802.3ad on
> the sky2 ports.

I am currently using port 1&2 and port 9&10 for bonding and have
configured all four ports for the same aggregator ID 1, LACP enabled.
I also switched ports, that is, I changed host1 from using port 1&2 to
use port 9&10 and vice versa.  Note that I also used sk98lin which
worked in my setup also.  Do you still think it is a misconfigured
switch?

> 	If the switch is configured, you may want to also check to see
> if it has counters for LACPDUs sent and received.  If the switch is not
> sending and receiving LACPDUs on the appropriate ports, then it's more
> likely to be a communications problem somewhere (vs. an 802.3ad
> negotiation problem).

I will check tomorrow morning whether I see the LACPDUs in the log and
report.

Any more tests which may be helpful?

Thanks.  /holger

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 18:55 sky2 bonding problem, 802.3ad Holger Eitzenberger
  2007-02-15 19:26 ` Stephen Hemminger
  2007-02-15 20:13 ` Jay Vosburgh
@ 2007-02-15 21:31 ` Andy Gospodarek
  2007-02-15 22:07   ` Andy Gospodarek
  2007-02-15 22:27   ` Holger Eitzenberger
  2 siblings, 2 replies; 12+ messages in thread
From: Andy Gospodarek @ 2007-02-15 21:31 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: shemminger, sk-drivers, netdev

On Thu, Feb 15, 2007 at 07:55:42PM +0100, Holger Eitzenberger wrote:
> Hi Steven,
> 
> I have problems using sky2 v1.10 with the bonding driver (802.3ad),
> on 'Marvell 88E8053 PCI-E Gigabit Ethernet Controller'.  I have attached
> the full lspci output.
> 
> My test was to setup a bond of two physical links (both links same
> hardware) and ping 192.168.11.10, which is the address of the switch
> itself.
> 
> I have tested v1.10 with kernel 2.6.19 and 2.6.16.36 (own backport),
> which despite the bonding problem runs fine.  Both, kernel 2.6.19 and
> 2.6.16.36 show the same behaviour.  The 802.3ad aware switch is a Dell
> PowerConnect 5324.  VLAN is not configured on all switch ports.  Another
> test on a host running kernel 2.6.18.2 with two e1000's bonded runs
> fine.  Using sk98lin (v8.41 & v10.0.4) worked also.


I get the impression that sky2 has never worked for you.  Is that
correct?  There was an skge problem I noticed a while ago where on reset
the multicast membership list was cleared.  

commit 758140900a82e3ed3bb2be1d4705dd352fe44825
Author: Stephen Hemminger <shemminger@osdl.org>
Date:   Fri Dec 1 11:41:08 2006 -0800

    [PATCH] skge: don't clear MC state on link down

    I would rather fix Andy's problem by not clearing
    multicast information on link down.

    Also, add code to restore multicast state after ethtool phy reset.

    Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>



Having this list cleared could stop you from receiving 802.3ad
PDUs.  I'll check sky2 and see if it has the same problem (I'm betting
on it).

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 21:09   ` Holger Eitzenberger
@ 2007-02-15 21:43     ` Neil Horman
  0 siblings, 0 replies; 12+ messages in thread
From: Neil Horman @ 2007-02-15 21:43 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: Jay Vosburgh, shemminger, sk-drivers, netdev

On Thu, Feb 15, 2007 at 10:09:40PM +0100, Holger Eitzenberger wrote:
> Jay Vosburgh <fubar@us.ibm.com> writes:
> 
> > 	I'm unfamiliar with your particular switch, but usually this
> > kind of problem with bonding 802.3ad is in the switch interaction.  The
> > switches I have (Cisco) require that 802.3ad mode be explicitly enabled
> > on whichever ports it is desired on, so it may be worthwhile to check
> > your switch and make sure that it really is configured for 802.3ad on
> > the sky2 ports.
> 
> I am currently using port 1&2 and port 9&10 for bonding and have
> configured all four ports for the same aggregator ID 1, LACP enabled.
> I also switched ports, that is, I changed host1 from using port 1&2 to
> use port 9&10 and vice versa.  Note that I also used sk98lin which
> worked in my setup also.  Do you still think it is a misconfigured
> switch?
> 
> > 	If the switch is configured, you may want to also check to see
> > if it has counters for LACPDUs sent and received.  If the switch is not
> > sending and receiving LACPDUs on the appropriate ports, then it's more
> > likely to be a communications problem somewhere (vs. an 802.3ad
> > negotiation problem).
> 
> I will check tomorrow morning whether I see the LACPDUs in the log and
> report.
> 
> Any more tests which may be helpful?
> 
If I had to guess I'd say that sky2 wasn't setting its multicast list properly,
or the bonding driver wasn't telling it to.  IIRC LACPDUs are received on a
reserved multicast MAC address, which the hardware needs to be told to receive.
You should be able to tell whether the bonding driver is receiving those frames
by looking at the sky2 rx_multicast stat with ethtool: if the value isn't going
up then you aren't getting LACPDU frames.  The hardware should have the LACPDU
multicast address added during the enslaving process (via bond enslave).  I'd
start instrumenting that part of the driver, as well as sky2.c's
set_multicast_list method, to see if anything is going awry.
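
The counter check suggested above could be scripted roughly as follows.  This
is a sketch rather than a tested tool: it assumes the statistics text has the
usual "name: value" layout printed by "ethtool -S <iface>", and it takes the
counter name rx_multicast from the paragraph above.  The function names are
made up for illustration.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # no colon on this line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # header lines such as "NIC statistics:" carry no number
    return stats


def multicast_delta(before, after, counter="rx_multicast"):
    """Difference of one counter between two snapshots of the stats text."""
    return parse_ethtool_stats(after)[counter] - parse_ethtool_stats(before)[counter]
```

Take one snapshot, wait for a few LACP intervals, take a second one, and pass
both texts to multicast_delta().  A result of 0 would suggest that no
multicast frames, and therefore no LACPDUs, reached the interface in between.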

Regards
Neil

> Thanks.  /holger

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 21:31 ` Andy Gospodarek
@ 2007-02-15 22:07   ` Andy Gospodarek
  2007-02-15 22:27   ` Holger Eitzenberger
  1 sibling, 0 replies; 12+ messages in thread
From: Andy Gospodarek @ 2007-02-15 22:07 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Holger Eitzenberger, shemminger, sk-drivers, netdev

On Thu, Feb 15, 2007 at 04:31:36PM -0500, Andy Gospodarek wrote:
> On Thu, Feb 15, 2007 at 07:55:42PM +0100, Holger Eitzenberger wrote:
> > Hi Steven,
> > 
> > I have problems using sky2 v1.10 with the bonding driver (802.3ad),
> > on 'Marvell 88E8053 PCI-E Gigabit Ethernet Controller'.  I have attached
> > the full lspci output.
> > 
> > My test was to setup a bond of two physical links (both links same
> > hardware) and ping 192.168.11.10, which is the address of the switch
> > itself.
> > 
> > I have tested v1.10 with kernel 2.6.19 and 2.6.16.36 (own backport),
> > which despite the bonding problem runs fine.  Both, kernel 2.6.19 and
> > 2.6.16.36 show the same behaviour.  The 802.3ad aware switch is a Dell
> > PowerConnect 5324.  VLAN is not configured on all switch ports.  Another
> > test on a host running kernel 2.6.18.2 with two e1000's bonded runs
> > fine.  Using sk98lin (v8.41 & v10.0.4) worked also.
> 
> 
> I get the impression that sky2 has never worked for you.  Is that
> correct?  There was an skge problem I noticed a while ago where on reset
> the multicast membership list was cleared.  
> 
> commit 758140900a82e3ed3bb2be1d4705dd352fe44825
> Author: Stephen Hemminger <shemminger@osdl.org>
> Date:   Fri Dec 1 11:41:08 2006 -0800
> 
>     [PATCH] skge: don't clear MC state on link down
> 
>     I would rather fix Andy's problem by not clearing
>     multicast information on link down.
> 
>     Also, add code to restore multicast state after ethtool phy reset.
> 
>     Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
>     Signed-off-by: Jeff Garzik <jeff@garzik.org>
> 
> 
> 
> Having this list cleared could stop you from receiving 802.3ad
> PDUs.  I'll check sky2 and see if it has the same problem (I'm betting
> on it).
> 
> 

After a quick peek this doesn't look like it's the issue.  The skge
problem was apparent because when you pulled the link the multicast
memberships disappeared....

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 21:31 ` Andy Gospodarek
  2007-02-15 22:07   ` Andy Gospodarek
@ 2007-02-15 22:27   ` Holger Eitzenberger
  1 sibling, 0 replies; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-15 22:27 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: shemminger, sk-drivers, netdev

Andy Gospodarek <andy@greyhouse.net> writes:

> I get the impression that sky2 has never worked for you.  Is that
> correct?  There was an skge problem I noticed a while ago where on reset
> the multicast membership list was cleared.  

Well, when it comes to bonding: yes, almost :).  When I noticed that
IFF_NOARP was set on the other interface I experimented a bit with
these flags, along the lines of

  ip l set dev eth0 arp on|off

and 2-3 times I had a link.

  /holger

* Re: sky2 bonding problem, 802.3ad
  2007-02-15 20:13 ` Jay Vosburgh
  2007-02-15 21:09   ` Holger Eitzenberger
@ 2007-02-16  9:15   ` Holger Eitzenberger
  2007-02-16  9:28     ` [Sk-drivers] " Holger Eitzenberger
  1 sibling, 1 reply; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-16  9:15 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: shemminger, sk-drivers, netdev

Jay Vosburgh <fubar@us.ibm.com> writes:

> 	The log you included (with debug turned on) indicates that
> bonding is at least attempting to send LACPDUs, but there are no log
> entries for having received any LACPDUs.

Yes, the log clearly shows that the LACPDUs are sent, or at least bonding
thinks so.  I just checked for received multicast packets on the switch:
yes, they come in frequently on all four ports.  I can't check for the
Slow_Protocols_Multicast address specifically, though.
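
For telling LACPDUs apart from other multicast traffic, a frame-level check
along these lines may help.  It is a minimal Python sketch that assumes an
untagged Ethernet frame and uses the Slow Protocols constants from 802.3ad:
destination 01:80:c2:00:00:02, EtherType 0x8809, subtype 1 for LACP.  The
function name is_lacpdu is made up for illustration.

```python
SLOW_PROTOCOLS_MCAST = bytes.fromhex("0180c2000002")  # Slow_Protocols_Multicast
ETH_P_SLOW = 0x8809   # Slow Protocols EtherType
LACP_SUBTYPE = 0x01   # subtype 0x02 would be the Marker protocol


def is_lacpdu(frame: bytes) -> bool:
    """True if an untagged Ethernet frame looks like a LACPDU."""
    if len(frame) < 15:
        return False  # too short for the MAC header plus the subtype byte
    dst = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    subtype = frame[14]
    return (dst == SLOW_PROTOCOLS_MCAST
            and ethertype == ETH_P_SLOW
            and subtype == LACP_SUBTYPE)
```

A VLAN-tagged frame would carry 0x8100 at offset 12 and would need the offsets
shifted by four bytes; that case is deliberately left out here.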

> 	If the switch is configured, you may want to also check to see
> if it has counters for LACPDUs sent and received.  If the switch is not
> sending and receiving LACPDUs on the appropriate ports, then it's more
> likely to be a communications problem somewhere (vs. an 802.3ad
> negotiation problem).

I did not find a dedicated stat for received LACPDUs.

I tried some settings on the switch web interface.  Interesting: if I
toggle 'Admin Flow Control' between 'enabled' and
'autonegotiation' I immediately receive the switch LACPDUs, see:

 bonding: ad_tx_machine() 1210: Sent LACPDU on port 1
 bonding: bond_3ad_rx_indication() 2175: Received LACPDU on port 1
 bonding: ad_rx_machine() 1123: Rx Machine: Port=1, Last State=6, Curr
   State=6

If I then disable the bond (including rmmod) and enable it
again, no LACPDUs are received from the switch.  Mmmpff...

> 	For the version of bonding in your dmesg log, the IFF_NOARP is
> expected; 802.3ad will select one aggregator as the active one, the
> other aggregators will be marked inactive, and that sets IFF_NOARP.
> Since no LACPDUs have been exchanged, bonding is leaving each interface
> as a separate aggregator.  Versions of bonding later than February 2006
> (your proc-bond0-ok for example) don't set the IFF_NOARP on inactive
> slaves (a new mechanism is used that doesn't mess with the flags).

Yes, I already saw in the code that IFF_NOARP was expected.

Any ideas?

  /holger

* Re: [Sk-drivers] sky2 bonding problem, 802.3ad
  2007-02-16  9:15   ` Holger Eitzenberger
@ 2007-02-16  9:28     ` Holger Eitzenberger
  2007-02-16 20:34       ` Jay Vosburgh
  0 siblings, 1 reply; 12+ messages in thread
From: Holger Eitzenberger @ 2007-02-16  9:28 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, sk-drivers

Holger Eitzenberger <holger@my-eitzenberger.de> writes:

>  bonding: ad_tx_machine() 1210: Sent LACPDU on port 1
>  bonding: bond_3ad_rx_indication() 2175: Received LACPDU on port 1
>  bonding: ad_rx_machine() 1123: Rx Machine: Port=1, Last State=6, Curr
>    State=6

One important point I forgot about in my previous post: the host
receives LACPDUs, but still the bond does not work.

Also note that the other e1000 bond works happily: it is in fact the
link which I use to configure it currently (via telnet).

  /holger

* Re: [Sk-drivers] sky2 bonding problem, 802.3ad
  2007-02-16  9:28     ` [Sk-drivers] " Holger Eitzenberger
@ 2007-02-16 20:34       ` Jay Vosburgh
  0 siblings, 0 replies; 12+ messages in thread
From: Jay Vosburgh @ 2007-02-16 20:34 UTC (permalink / raw)
  To: Holger Eitzenberger; +Cc: netdev, sk-drivers

Holger Eitzenberger <holger@my-eitzenberger.de> wrote:

>Holger Eitzenberger <holger@my-eitzenberger.de> writes:
>
>>  bonding: ad_tx_machine() 1210: Sent LACPDU on port 1
>>  bonding: bond_3ad_rx_indication() 2175: Received LACPDU on port 1
>>  bonding: ad_rx_machine() 1123: Rx Machine: Port=1, Last State=6, Curr
>>    State=6
>
>One important point I forgot about in my previous post: the host
>receives LACPDUs, but still the bond does not work.
>
>Also note that the other e1000 bond works happily: it is in fact the
>link which I use to configure it currently (via telnet).

	Can you send me (off list, I'm thinking) the full bonding debug
dmesg log as well as a tcpdump trace of the LACP exchange (i.e., the
case that LACPDUs are both sent and received)?  The raw tcpdump
("tcpdump -w somefile") is better than the formatted text output.

	I don't have either of the pieces of hardware you're using (the
switch or the sky2 card), so I'm hoping there will be some gleaming
nuggets of evidence in the logs.
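
	A raw capture like the one requested above can also be given a
quick first look without extra tools.  The following is only a Python sketch,
assuming a classic (non-pcapng) capture file with microsecond timestamps and
an Ethernet link type; the function names are made up for illustration.  It
counts Slow Protocols frames (EtherType 0x8809) per source MAC, so both the
host and the switch should show up if LACPDUs flow in both directions.

```python
import struct
from collections import Counter


def iter_pcap_frames(data: bytes):
    """Yield the raw frames stored in a classic pcap file image."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != 1:
        raise ValueError("expected Ethernet link type (1)")
    offset = 24  # size of the global file header
    while offset + 16 <= len(data):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16  # size of the per-record header
        yield data[offset:offset + incl_len]
        offset += incl_len


def slow_protocol_senders(data: bytes) -> Counter:
    """Count frames with EtherType 0x8809, keyed by source MAC address."""
    counts = Counter()
    for frame in iter_pcap_frames(data):
        if len(frame) >= 14 and frame[12:14] == b"\x88\x09":
            src = ":".join(f"{b:02x}" for b in frame[6:12])
            counts[src] += 1
    return counts
```

For the file written by "tcpdump -w somefile" this would be called as
slow_protocol_senders(open("somefile", "rb").read()).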

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
