* bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
@ 2010-05-16 13:33 Krzysztof Oledzki
  2010-05-16 18:51 ` Michael Chan
  0 siblings, 1 reply; 18+ messages in thread
From: Krzysztof Oledzki @ 2010-05-16 13:33 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev


Hello,

I have a Dell R610 server with BCM5709 NICs. The server has one 4-core 
CPU (X5570) and four BCM5709 NICs onboard. I would like to assign each of 
the NIC's interrupts to a different CPU for better performance. However, as 
I have 5 interrupts to assign and only 4 CPUs available, it is not obvious 
how to do it right:

             CPU0       CPU1       CPU2       CPU3
   61:      85085          0          0          0   PCI-MSI-edge      eth1-0
   62:      23046          0          0          0   PCI-MSI-edge      eth1-1
   63:      24525          0          0          0   PCI-MSI-edge      eth1-2
   64:      77801          0          0          0   PCI-MSI-edge      eth1-3
   65:      24006          0          0          0   PCI-MSI-edge      eth1-4

# uname -r
2.6.33.3

# dmesg |grep  0000:01:00.0
bnx2 0000:01:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
bnx2 0000:01:00.0: setting latency timer to 64
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-mips-09-5.0.0.j3.fw
bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-rv2p-09-5.0.0.j3.fw
bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
bnx2 0000:01:00.0: irq 69 for MSI/MSI-X

Why does the driver register 5 interrupts instead of 4? How can I limit it to 4?
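
(For reference, by "assign" I mean writing a single-CPU mask to
/proc/irq/<n>/smp_affinity for each vector. A minimal userspace sketch,
using the eth1-0 .. eth1-3 IRQ numbers from the listing above; this is only
an illustration, not a tested tool:)

/* Sketch only: pin IRQs 61..64 (eth1-0 .. eth1-3 above) to CPU0..CPU3 by
 * writing a one-bit CPU mask to /proc/irq/<n>/smp_affinity, equivalent to
 * "echo <mask> > /proc/irq/<n>/smp_affinity".  Needs root. */
#include <stdio.h>

int main(void)
{
	const int irqs[] = { 61, 62, 63, 64 };	/* eth1-0 .. eth1-3 */
	char path[64];
	int i;

	for (i = 0; i < 4; i++) {
		FILE *f;

		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irqs[i]);
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			return 1;
		}
		fprintf(f, "%x\n", 1 << i);	/* CPU0 -> 1, CPU1 -> 2, ... CPU3 -> 8 */
		fclose(f);
	}
	return 0;
}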

Best regards,

 				Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 13:33 bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3) Krzysztof Oledzki
@ 2010-05-16 18:51 ` Michael Chan
  2010-05-16 19:24   ` Krzysztof Olędzki
  2010-05-18 15:35   ` Krzysztof Olędzki
  0 siblings, 2 replies; 18+ messages in thread
From: Michael Chan @ 2010-05-16 18:51 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev

Krzysztof Oledzki wrote:

>
> Why the driver registers 5 interrupts instead of 4? How to
> limit it to 4?
>

The first vector (eth0-0) handles link interrupt and other slow
path events.  It also has an RX ring for non-IP packets that are
not hashed by the RSS hash.  The majority of the rx packets should
be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
vectors to different CPUs.
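
(Purely as an illustration of the above, not the actual firmware code: a
rough sketch of how the steering behaves, assuming the usual linux/ip.h and
linux/jhash.h helpers.)

#include <linux/ip.h>
#include <linux/jhash.h>

/* Illustrative only: non-IP frames land on ring 0 (the slow-path vector),
 * IP frames are spread over the RSS rings by a hash of the headers. */
static unsigned int pick_rx_ring(const struct iphdr *ip, u32 ports,
				 unsigned int num_rss_rings)
{
	u32 hash;

	if (!ip)			/* non-IP: no RSS hash */
		return 0;		/* eth0-0 */

	hash = jhash_3words(ip->saddr, ip->daddr, ports, 0);
	return 1 + (hash % num_rss_rings);	/* eth0-1 .. eth0-4 */
}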



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 18:51 ` Michael Chan
@ 2010-05-16 19:24   ` Krzysztof Olędzki
  2010-05-16 19:49     ` Krzysztof Olędzki
  2010-05-16 20:00     ` Michael Chan
  2010-05-18 15:35   ` Krzysztof Olędzki
  1 sibling, 2 replies; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-16 19:24 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
> 
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
> 
> The first vector (eth0-0) handles link interrupt and other slow
> path events.  It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash.  The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.

Thank you for your prompt response.

In my case the first vector seems to be handling something more than that:
 - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
 - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
 - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
 - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2

            CPU0       CPU1       CPU2       CPU3
  67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
  68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
  69:     137905          0          0          0   PCI-MSI-edge      eth1-2
  70:     259246          0          0          0   PCI-MSI-edge      eth1-3
  71:     760252          0          0          0   PCI-MSI-edge      eth1-4

As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

So it seems that either TX or RX is always handled by the first vector.
I'll try to find out whether it is TX or RX.

BTW: I'm using 802.1Q VLANs over bonding; does that change anything?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 19:24   ` Krzysztof Olędzki
@ 2010-05-16 19:49     ` Krzysztof Olędzki
  2010-05-16 20:00     ` Michael Chan
  1 sibling, 0 replies; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-16 19:49 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

On 2010-05-16 21:24, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
>> Krzysztof Oledzki wrote:
>>
>>>
>>> Why the driver registers 5 interrupts instead of 4? How to
>>> limit it to 4?
>>>
>>
>> The first vector (eth0-0) handles link interrupt and other slow
>> path events.  It also has an RX ring for non-IP packets that are
>> not hashed by the RSS hash.  The majority of the rx packets should
>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>> vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0 and eth1-4
>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0 and eth1-3
>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0 and eth1-1
>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0 and eth1-2
>
>              CPU0       CPU1       CPU2       CPU3
>    67:    1563979          0          0          0   PCI-MSI-edge      eth1-0
>    68:    1072869          0          0          0   PCI-MSI-edge      eth1-1
>    69:     137905          0          0          0   PCI-MSI-edge      eth1-2
>    70:     259246          0          0          0   PCI-MSI-edge      eth1-3
>    71:     760252          0          0          0   PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

It looks like TX for locally generated packets is always performed on 
eth1-0. I guess it should look different for forwarded packets?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 19:24   ` Krzysztof Olędzki
  2010-05-16 19:49     ` Krzysztof Olędzki
@ 2010-05-16 20:00     ` Michael Chan
  2010-05-16 20:15       ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Chan @ 2010-05-16 20:00 UTC (permalink / raw)
  To: 'Krzysztof Oledzki'; +Cc: netdev

Krzysztof Oledzki wrote:

> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events.  It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash.  The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
>
> Thank you for your prompt response.
>
> In my case the first vector must be handling something more:
>  - "ping -f 192.168.0.1" increases interrupts on both eth1-0
> and eth1-4
>  - "ping -f 192.168.0.2" increases interrupts on both eth1-0
> and eth1-3
>  - "ping -f 192.168.0.3" increases interrupts on both eth1-0
> and eth1-1
>  - "ping -f 192.168.0.7" increases interrupts on both eth1-0
> and eth1-2
>
>             CPU0       CPU1       CPU2       CPU3
>   67:    1563979          0          0          0
> PCI-MSI-edge      eth1-0
>   68:    1072869          0          0          0
> PCI-MSI-edge      eth1-1
>   69:     137905          0          0          0
> PCI-MSI-edge      eth1-2
>   70:     259246          0          0          0
> PCI-MSI-edge      eth1-3
>   71:     760252          0          0          0
> PCI-MSI-edge      eth1-4
>
> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.

I think that ICMP ping packets will always go to ring 0 (eth1-0)
because they are non-IP packets.  I need to double check tomorrow
on how exactly the hashing works on RX.  Can you try running IP
traffic?  IP packets should theoretically go to rings 1 - 4.

>
> So, it seems that TX or RX is always handled by the first vector.
> I'll try to find if it is TX or RX.
>
> BTW: I'm using .1Q vlans over bonding, does it change anything?

That should not matter, as the VLAN tag is stripped before hashing.




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:00     ` Michael Chan
@ 2010-05-16 20:15       ` Eric Dumazet
  2010-05-16 20:24         ` Michael Chan
  2010-05-16 20:34         ` Krzysztof Olędzki
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2010-05-16 20:15 UTC (permalink / raw)
  To: Michael Chan; +Cc: 'Krzysztof Oledzki', netdev

On Sunday 16 May 2010 at 13:00 -0700, Michael Chan wrote:
> Krzysztof Oledzki wrote:
> 
> > On 2010-05-16 20:51, Michael Chan wrote:
> > > Krzysztof Oledzki wrote:
> > >
> > >>
> > >> Why the driver registers 5 interrupts instead of 4? How to
> > >> limit it to 4?
> > >>
> > >
> > > The first vector (eth0-0) handles link interrupt and other slow
> > > path events.  It also has an RX ring for non-IP packets that are
> > > not hashed by the RSS hash.  The majority of the rx packets should
> > > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > > vectors to different CPUs.
> >
> > Thank you for your prompt response.
> >
> > In my case the first vector must be handling something more:
> >  - "ping -f 192.168.0.1" increases interrupts on both eth1-0
> > and eth1-4
> >  - "ping -f 192.168.0.2" increases interrupts on both eth1-0
> > and eth1-3
> >  - "ping -f 192.168.0.3" increases interrupts on both eth1-0
> > and eth1-1
> >  - "ping -f 192.168.0.7" increases interrupts on both eth1-0
> > and eth1-2
> >
> >             CPU0       CPU1       CPU2       CPU3
> >   67:    1563979          0          0          0
> > PCI-MSI-edge      eth1-0
> >   68:    1072869          0          0          0
> > PCI-MSI-edge      eth1-1
> >   69:     137905          0          0          0
> > PCI-MSI-edge      eth1-2
> >   70:     259246          0          0          0
> > PCI-MSI-edge      eth1-3
> >   71:     760252          0          0          0
> > PCI-MSI-edge      eth1-4
> >
> > As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
> 
> I think that ICMP ping packets will always go to ring 0 (eth1-0)
> because they are non-IP packets.  I need to double check tomorrow
> on how exactly the hashing works on RX.  Can you try running IP
> traffic?  IP packets should theoretically go to rings 1 - 4.
> 

ICMP packets are IP packets (Protocol=1)

> >
> > So, it seems that TX or RX is always handled by the first vector.
> > I'll try to find if it is TX or RX.
> >
> > BTW: I'm using .1Q vlans over bonding, does it change anything?
> 
> That should not matter, as the VLAN tag is stripped before hashing.

Warning: bonding currently is not multiqueue aware.

All tx packets through bonding will use txqueue 0, since bnx2 doesn't
provide an ndo_select_queue() function.
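
(For illustration only, not a proposed patch: a driver opts in to its own tx
queue selection by filling in the ndo_select_queue hook in its
net_device_ops. With the 2.6.33-era signature it would look roughly like
the sketch below; the queue policy shown is a guess, not bnx2's.)

/* Hypothetical sketch; assumes the usual linux/netdevice.h helpers. */
static u16 bnx2_select_queue(struct net_device *dev, struct sk_buff *skb)
{
	/* reuse the recorded rx queue for forwarded traffic ... */
	if (skb_rx_queue_recorded(skb))
		return skb_get_rx_queue(skb) % dev->real_num_tx_queues;

	/* ... otherwise fall back to the generic hash */
	return skb_tx_hash(dev, skb);
}

static const struct net_device_ops bnx2_netdev_ops = {
	/* ... existing bnx2 ops ... */
	.ndo_select_queue	= bnx2_select_queue,
};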







* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:15       ` Eric Dumazet
@ 2010-05-16 20:24         ` Michael Chan
  2010-05-16 20:34         ` Krzysztof Olędzki
  1 sibling, 0 replies; 18+ messages in thread
From: Michael Chan @ 2010-05-16 20:24 UTC (permalink / raw)
  To: 'Eric Dumazet'; +Cc: 'Krzysztof Oledzki', netdev

Eric Dumazet wrote:

> > I think that ICMP ping packets will always go to ring 0 (eth1-0)
> > because they are non-IP packets.  I need to double check tomorrow
> > on how exactly the hashing works on RX.  Can you try running IP
> > traffic?  IP packets should theoretically go to rings 1 - 4.
> >
>
> ICMP packets are IP packets (Protocol=1)
>

Sorry, Eric is right.  Anyway, I'll check on the hashing to see how
it works on UDP, TCP, and other packets.



* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:15       ` Eric Dumazet
  2010-05-16 20:24         ` Michael Chan
@ 2010-05-16 20:34         ` Krzysztof Olędzki
  2010-05-16 20:47           ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-16 20:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev

On 2010-05-16 22:15, Eric Dumazet wrote:
> On Sunday 16 May 2010 at 13:00 -0700, Michael Chan wrote:
>> Krzysztof Oledzki wrote:
>>
>>> On 2010-05-16 20:51, Michael Chan wrote:
>>>> Krzysztof Oledzki wrote:
>>>>
>>>>>
>>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>>> limit it to 4?
>>>>>
>>>>
>>>> The first vector (eth0-0) handles link interrupt and other slow
>>>> path events.  It also has an RX ring for non-IP packets that are
>>>> not hashed by the RSS hash.  The majority of the rx packets should
>>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>>> vectors to different CPUs.
>>>
>>> Thank you for your prompt response.
>>>
>>> In my case the first vector must be handling something more:
>>>   - "ping -f 192.168.0.1" increases interrupts on both eth1-0
>>> and eth1-4
>>>   - "ping -f 192.168.0.2" increases interrupts on both eth1-0
>>> and eth1-3
>>>   - "ping -f 192.168.0.3" increases interrupts on both eth1-0
>>> and eth1-1
>>>   - "ping -f 192.168.0.7" increases interrupts on both eth1-0
>>> and eth1-2
>>>
>>>              CPU0       CPU1       CPU2       CPU3
>>>    67:    1563979          0          0          0
>>> PCI-MSI-edge      eth1-0
>>>    68:    1072869          0          0          0
>>> PCI-MSI-edge      eth1-1
>>>    69:     137905          0          0          0
>>> PCI-MSI-edge      eth1-2
>>>    70:     259246          0          0          0
>>> PCI-MSI-edge      eth1-3
>>>    71:     760252          0          0          0
>>> PCI-MSI-edge      eth1-4
>>>
>>> As you can see, eth1-1 + eth1-2 + eth1-3 + eth1-4 ~= eth1-0.
>>
>> I think that ICMP ping packets will always go to ring 0 (eth1-0)
>> because they are non-IP packets.  I need to double check tomorrow
>> on how exactly the hashing works on RX.  Can you try running IP
>> traffic?  IP packets should theoretically go to rings 1 - 4.
>>
>
> ICMP packets are IP packets (Protocol=1)

Exactly. However, the firmware may handle ICMP and TCP in a different way.

>>> So, it seems that TX or RX is always handled by the first vector.
>>> I'll try to find if it is TX or RX.
>>>
>>> BTW: I'm using .1Q vlans over bonding, does it change anything?
>>
>> That should not matter, as the VLAN tag is stripped before hashing.
>
> warning, bonding currently is not multiqueue aware.
>
> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> provide a ndo_select_queue() function.

OK, that explains everything. Thank you, Eric. I assume it may take some 
time for bonding to become multiqueue aware and/or for bnx2 to provide 
ndo_select_queue?

BTW: With a normal router workload, should I expect a big performance drop 
when receiving and forwarding the same packet using different CPUs? 
Bonding provides very important functionality; I'm not able to drop it. :(

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:34         ` Krzysztof Olędzki
@ 2010-05-16 20:47           ` Eric Dumazet
  2010-05-16 21:06             ` George B.
  2010-05-16 21:12             ` Krzysztof Olędzki
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2010-05-16 20:47 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev

On Sunday 16 May 2010 at 22:34 +0200, Krzysztof Olędzki wrote:
> On 2010-05-16 22:15, Eric Dumazet wrote:

> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> > provide a ndo_select_queue() function.
> 
> OK, that explains everything. Thank you Eric. I assume it may take some 
> time for bonding to become multiqueue aware and/or bnx2x to provide 
> ndo_select_queue?
> 

Bonding might become multiqueue aware; there are several patches
floating around.

But with your ping tests, it won't change the selected txqueue anyway (it
will be the same for any target, because skb_tx_hash() won't hash the
destination address, only the skb->protocol).

> BTW: With a normal router workload, should I expect big performance drop 
> when receiving and forwarding the same packet using different CPUs? 
> Bonding provides very important functionality, I'm not able to drop it. :(
> 

Not sure what you mean by forwarding the same packet using different CPUs.
You probably meant different queues, because in the normal case only one
CPU is involved (the one receiving the packet is also the one
transmitting it, unless you have congestion or traffic shaping).

If you have 4 CPUs, you can use the following patch and get transparent
multiqueue support in bonding. Still, the bonding xmit path hits a global
rwlock, so performance is not what you can get without bonding.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5e12462..2c257f7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
 
 	rtnl_lock();
 
-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				bond_setup, 4);
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:47           ` Eric Dumazet
@ 2010-05-16 21:06             ` George B.
  2010-05-16 21:12             ` Krzysztof Olędzki
  1 sibling, 0 replies; 18+ messages in thread
From: George B. @ 2010-05-16 21:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Krzysztof Olędzki, Michael Chan, netdev

2010/5/16 Eric Dumazet <eric.dumazet@gmail.com>:
> On Sunday 16 May 2010 at 22:34 +0200, Krzysztof Olędzki wrote:
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>> > All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>> > provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.
>
>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)
>
> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue. Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 5e12462..2c257f7 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>
>        rtnl_lock();
>
> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
> -                               bond_setup);
> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
> +                               bond_setup, 4);
>        if (!bond_dev) {
>                pr_err("%s: eek! can't alloc netdev!\n", name);
>                rtnl_unlock();
>
>

FWIW, later this week I will be comparing VLANs on bonded Ethernet
interfaces with bonds built from VLAN interfaces (create a VLAN on two
interfaces and bond them together), to see if I can notice any performance
difference. I am expecting I will when two or more VLANs are
experiencing heavy traffic.  What concerns me is: if one Ethernet link goes
away, will the bond interface see that the Ethernet device underlying the
VLAN interface has gone down?

So in summary, rather than bonding Ethernet interfaces and then
applying VLANs to the bond, I intend to create VLANs on the Ethernet
interfaces and bond them, so one bond interface per VLAN plus one for
the "raw" interfaces.  I am hoping that will allow better throughput
with multiple processors (and less head-of-line blocking for VLANs
with low traffic rates).  Note: that configuration doesn't work with
2.6.32; I haven't tried it with 2.6.33; 2.6.34-rc7 does let me configure
it, though I haven't yet tested it on multiqueue Ethernet with multiple
processors.  I should have some systems to test with later this week.


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 20:47           ` Eric Dumazet
  2010-05-16 21:06             ` George B.
@ 2010-05-16 21:12             ` Krzysztof Olędzki
  2010-05-16 21:26               ` Eric Dumazet
  1 sibling, 1 reply; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-16 21:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev

On 2010-05-16 22:47, Eric Dumazet wrote:
> On Sunday 16 May 2010 at 22:34 +0200, Krzysztof Olędzki wrote:
>> On 2010-05-16 22:15, Eric Dumazet wrote:
>
>>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
>>> provide a ndo_select_queue() function.
>>
>> OK, that explains everything. Thank you Eric. I assume it may take some
>> time for bonding to become multiqueue aware and/or bnx2x to provide
>> ndo_select_queue?
>>
>
> bonding might become multiqueue aware, there are several patches
> floating around.
>
> But with your ping tests, it wont change the selected txqueue anyway (it
> will be the same for any targets, because skb_tx_hash() wont hash the
> destination address, only the skb->protocol.

What do you mean by "won't hash the destination address, only the 
skb->protocol"? Does it skip hashing the destination address only for 
ICMP, or for all IP protocols?

My normal workload is TCP and UDP based, so if it is only ICMP then there 
is no problem. Actually I have noticeably more UDP traffic than an 
average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
good performance for both TCP and UDP.

During my initial tests ICMP ping showed the same behavior as UDP/TCP 
with iperf, so I stuck with it. I'll redo everything with UDP and TCP 
of course. :)

>> BTW: With a normal router workload, should I expect big performance drop
>> when receiving and forwarding the same packet using different CPUs?
>> Bonding provides very important functionality, I'm not able to drop it. :(
>>
>
> Not sure what you mean by forwarding same packet using different CPUs.
> You probably meant different queues, because in normal case, only one
> cpu is involved (the one receiving the packet is also the one
> transmitting it, unless you have congestion or trafic shaping)

I mean receiving it on one CPU and sending it on a different one. I 
would like to assign the different vectors (eth1-0 .. eth1-4) to different 
CPUs, but with bnx2+bonding packets are received on queues 1-4 (eth1-1 
.. eth1-4) and sent from queue 0 (eth1-0). So, for a single packet, two 
different CPUs will be involved (RX on q1-q4, TX on q0).

> If you have 4 cpus, you can use following patch and have a transparent
> bonding against multiqueue.

Thanks! If I get it right: with the patch, packets should be sent using 
the same CPU (queue?) that was used when receiving?

> Still bonding xmit path hits a global
> rwlock, so performance is not what you can get without bonding.

It may not be perfect, but it should be much better than nothing, right?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 21:12             ` Krzysztof Olędzki
@ 2010-05-16 21:26               ` Eric Dumazet
  2010-05-18 14:22                 ` Krzysztof Olędzki
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2010-05-16 21:26 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev

On Sunday 16 May 2010 at 23:12 +0200, Krzysztof Olędzki wrote:
> On 2010-05-16 22:47, Eric Dumazet wrote:
> On Sunday 16 May 2010 at 22:34 +0200, Krzysztof Olędzki wrote:
> >> On 2010-05-16 22:15, Eric Dumazet wrote:
> >
> >>> All tx packets through bonding will use txqueue 0, since bnx2 doesnt
> >>> provide a ndo_select_queue() function.
> >>
> >> OK, that explains everything. Thank you Eric. I assume it may take some
> >> time for bonding to become multiqueue aware and/or bnx2x to provide
> >> ndo_select_queue?
> >>
> >
> > bonding might become multiqueue aware, there are several patches
> > floating around.
> >
> > But with your ping tests, it wont change the selected txqueue anyway (it
> > will be the same for any targets, because skb_tx_hash() wont hash the
> > destination address, only the skb->protocol.
> 
> What do you mean by "wont hash the destination address, only the 
> skb->protocol"? It won't hash the destination address for ICMP or for 
> all IP protocols?

Locally generated ICMP packets all use the same tx queue, because
sk->sk_hash is not set:

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = (__force u16) skb->protocol;

        hash = jhash_1word(hash, hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
 



However, replies will be spread over the four queues, if the hardware is
capable of hashing ICMP packets using the IP addresses (source/destination).

> 
> My normal workload is TCP and UDP based so if it is only ICMP then there 
> is no problem. Actually I have noticeably more UDP traffic than an 
> average network, mainly because of LWAPP/CAPWAP, so I'm interested in 
> good performance for both TCP and UDP.
> 
> During my initial tests ICMP ping showed the same behavior like UDP/TCP 
> with iperf, so I sticked with it. I'll redo everyting with UDP and TCP 
> of course. :)
> 
> >> BTW: With a normal router workload, should I expect big performance drop
> >> when receiving and forwarding the same packet using different CPUs?
> >> Bonding provides very important functionality, I'm not able to drop it. :(
> >>
> >
> > Not sure what you mean by forwarding same packet using different CPUs.
> > You probably meant different queues, because in normal case, only one
> > cpu is involved (the one receiving the packet is also the one
> > transmitting it, unless you have congestion or trafic shaping)
> 
> I mean to receive it on a one CPU and to send it on a different one. I 
> would like to assing different vectors (eth1-0 .. eth1-4) to different 
> CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1 
> .. eth1-4) and sent from queue 0 (eth1-0). So, for a one packet, two 
> different CPUs will be involved (RX on q1-q4, TX on q0).

As I said (unless you use RPS), one forwarded packet only uses one CPU.
How the tx queue is selected is another story; we try to do a 1-1 mapping.
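
(From memory, the relevant part of skb_tx_hash() in net/core/dev.c looks
roughly like the sketch below; when an rx queue was recorded on the skb,
that same index is reused for tx. Quoted from memory, so details may differ.)

u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
	u32 hash;

	if (skb_rx_queue_recorded(skb)) {
		hash = skb_get_rx_queue(skb);
		while (unlikely(hash >= dev->real_num_tx_queues))
			hash -= dev->real_num_tx_queues;
		return hash;
	}

	if (skb->sk && skb->sk->sk_hash)
		hash = skb->sk->sk_hash;
	else
		hash = (__force u16) skb->protocol;

	hash = jhash_1word(hash, hashrnd);

	return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
}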

> 
> > If you have 4 cpus, you can use following patch and have a transparent
> > bonding against multiqueue.
> 
> Thanks! If I get it right: with the patch, packets should be sent using 
> the same CPU (queue?) that was used when receiving?

Yes, for forwarding loads.

(You might use 5 or 8 instead of 4, because it's not clear to me whether
bnx2 has 5 txqueues or 4 in your case.)

> 
> > Still bonding xmit path hits a global
> > rwlock, so performance is not what you can get without bonding.
> 
> It may not be perfect, but it should be much better than nothing, right?
> 

Sure.




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-18 15:35   ` Krzysztof Olędzki
@ 2010-05-18  2:11     ` Michael Chan
  2010-05-18 16:28       ` Krzysztof Olędzki
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Chan @ 2010-05-18  2:11 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: netdev


On Tue, 2010-05-18 at 08:35 -0700, Krzysztof Olędzki wrote:
> On 2010-05-16 20:51, Michael Chan wrote:
> > Krzysztof Oledzki wrote:
> >
> >>
> >> Why the driver registers 5 interrupts instead of 4? How to
> >> limit it to 4?
> >>
> >
> > The first vector (eth0-0) handles link interrupt and other slow
> > path events.  It also has an RX ring for non-IP packets that are
> > not hashed by the RSS hash.  The majority of the rx packets should
> > be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> > vectors to different CPUs.
> 
> Did some more test on a two 4 core CPUs (8 CPUs reported to the system) 
> and on a two 4 core CPUs with HT (16 CPUs reported to the system) and in 
> both cases there are 8 instead of 9 vectors: eth0-0 .. eth0-7 (irqs 61 
> .. 68). However, dmesg shows that 9 interrupts are allocated:
> 
> bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
> bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
> 
> It such case, which ring will be used for slow path and non-IP packets 
> and why there is no additional queue like in a 4CPU case?
> 

eth0-0 is always the one handling the slow path, rx ring 0 (non-IP), and tx
ring 0.  The last vector is not used by bnx2.  It is reserved for iSCSI,
which is handled by the cnic and bnx2i drivers.




* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 21:26               ` Eric Dumazet
@ 2010-05-18 14:22                 ` Krzysztof Olędzki
  2010-05-18 14:26                   ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-18 14:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev

On 2010-05-16 23:26, Eric Dumazet wrote:

<CUT>

>> My normal workload is TCP and UDP based so if it is only ICMP then there
>> is no problem. Actually I have noticeably more UDP traffic than an
>> average network, mainly because of LWAPP/CAPWAP, so I'm interested in
>> good performance for both TCP and UDP.
>>
>> During my initial tests ICMP ping showed the same behavior like UDP/TCP
>> with iperf, so I sticked with it. I'll redo everyting with UDP and TCP
>> of course. :)
>>
>>>> BTW: With a normal router workload, should I expect big performance drop
>>>> when receiving and forwarding the same packet using different CPUs?
>>>> Bonding provides very important functionality, I'm not able to drop it. :(
>>>>
>>>
>>> Not sure what you mean by forwarding same packet using different CPUs.
>>> You probably meant different queues, because in normal case, only one
>>> cpu is involved (the one receiving the packet is also the one
>>> transmitting it, unless you have congestion or trafic shaping)
>>
>> I mean to receive it on a one CPU and to send it on a different one. I
>> would like to assing different vectors (eth1-0 .. eth1-4) to different
>> CPUs, but with bnx2x+bonding packets are received on queues 1-4 (eth1-1
>> .. eth1-4) and sent from queue 0 (eth1-0). So, for a one packet, two
>> different CPUs will be involved (RX on q1-q4, TX on q0).
>
> As I said, (unless you use RPS), one forwarded packet only uses one CPU.
> How tx queue is selected is another story. We try to do a 1-1 mapping.

OK, but with a multi-queue NIC I can assign each queue to a different 
CPU. So, while forwarding packets from a flow, I would like to use 
the same queue on both input and output.

>>> If you have 4 cpus, you can use following patch and have a transparent
>>> bonding against multiqueue.
>>
>> Thanks! If I get it right: with the patch, packets should be sent using
>> the same CPU (queue?) that was used when receiving?
>
> Yes, for forwarding loads.
>
> (You might use 5 or 8 instead of 4, because its not clear to me if bnx2
> has 5 txqueues or 4 in your case)

Thank you. What happens if I set it to a value lower or bigger than the 
number of txqueues available in the NIC?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-18 14:22                 ` Krzysztof Olędzki
@ 2010-05-18 14:26                   ` Eric Dumazet
  2010-05-18 14:55                     ` Krzysztof Olędzki
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2010-05-18 14:26 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Michael Chan, netdev

On Tuesday 18 May 2010 at 16:22 +0200, Krzysztof Olędzki wrote:

> Thank you. What happens if I set it to a lower/bigger value, than 
> avaliable txqueues in a NIC?

lower values -> same situation as today (not all txqueues will be
used)

bigger values -> it will be capped, so it's only a bit more RAM
allocated.





* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-18 14:26                   ` Eric Dumazet
@ 2010-05-18 14:55                     ` Krzysztof Olędzki
  0 siblings, 0 replies; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-18 14:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Chan, netdev

On 2010-05-18 16:26, Eric Dumazet wrote:
> On Tuesday 18 May 2010 at 16:22 +0200, Krzysztof Olędzki wrote:
>
>> Thank you. What happens if I set it to a lower/bigger value, than
>> avaliable txqueues in a NIC?
>
> lower values ->  same situation than today (not all txqueues will be
> used)
>
> bigger values ->  it will be capped, so its only a bit more ram
> allocated.

So it is safe to put a slightly bigger value there than needed. Thanks.

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-16 18:51 ` Michael Chan
  2010-05-16 19:24   ` Krzysztof Olędzki
@ 2010-05-18 15:35   ` Krzysztof Olędzki
  2010-05-18  2:11     ` Michael Chan
  1 sibling, 1 reply; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-18 15:35 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

On 2010-05-16 20:51, Michael Chan wrote:
> Krzysztof Oledzki wrote:
>
>>
>> Why the driver registers 5 interrupts instead of 4? How to
>> limit it to 4?
>>
>
> The first vector (eth0-0) handles link interrupt and other slow
> path events.  It also has an RX ring for non-IP packets that are
> not hashed by the RSS hash.  The majority of the rx packets should
> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
> vectors to different CPUs.

I did some more tests on a system with two 4-core CPUs (8 CPUs reported to 
the system) and on one with two 4-core CPUs with HT (16 CPUs reported to 
the system); in both cases there are 8 vectors instead of 9: eth0-0 .. 
eth0-7 (irqs 61 .. 68). However, dmesg shows that 9 interrupts are allocated:

bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
bnx2 0000:01:00.0: irq 69 for MSI/MSI-X

In such a case, which ring will be used for the slow path and non-IP packets, 
and why is there no additional queue like in the 4-CPU case?

Best regards,

			Krzysztof Olędzki


* Re: bnx2/BCM5709: why 5 interrupts on a 4 core system (2.6.33.3)
  2010-05-18  2:11     ` Michael Chan
@ 2010-05-18 16:28       ` Krzysztof Olędzki
  0 siblings, 0 replies; 18+ messages in thread
From: Krzysztof Olędzki @ 2010-05-18 16:28 UTC (permalink / raw)
  To: Michael Chan; +Cc: netdev

On 2010-05-18 04:11, Michael Chan wrote:
>
> On Tue, 2010-05-18 at 08:35 -0700, Krzysztof Olędzki wrote:
>> On 2010-05-16 20:51, Michael Chan wrote:
>>> Krzysztof Oledzki wrote:
>>>
>>>>
>>>> Why the driver registers 5 interrupts instead of 4? How to
>>>> limit it to 4?
>>>>
>>>
>>> The first vector (eth0-0) handles link interrupt and other slow
>>> path events.  It also has an RX ring for non-IP packets that are
>>> not hashed by the RSS hash.  The majority of the rx packets should
>>> be hashed to the rx rings eth0-1 - eth0-4, so I would assign these
>>> vectors to different CPUs.
>>
>> Did some more test on a two 4 core CPUs (8 CPUs reported to the system)
>> and on a two 4 core CPUs with HT (16 CPUs reported to the system) and in
>> both cases there are 8 instead of 9 vectors: eth0-0 .. eth0-7 (irqs 61
>> .. 68). However, dmesg shows that 9 interrupts are allocated:
>>
>> bnx2 0000:01:00.0: irq 61 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 62 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 63 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 64 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 65 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 66 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 67 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 68 for MSI/MSI-X
>> bnx2 0000:01:00.0: irq 69 for MSI/MSI-X
>>
>> It such case, which ring will be used for slow path and non-IP packets
>> and why there is no additional queue like in a 4CPU case?
>>
>
> eth0-0 is always the one handling slow path, rx ring 0 (non-IP), and tx
> ring 0.  The last vector is not used by bnx2.  It is reserved for iSCSI
> which is handled by the cnic and bnx2i drivers.

Thanks again for the explanation.

Best regards,

			Krzysztof Olędzki

