* Horrid balance-rr bonding udp throughput
From: Jarod Wilson @ 2017-04-08 23:33 UTC
  To: netdev

I'm digging into some bug reports covering performance issues with 
balance-rr, and discovered something even worse than what the reporter 
described. My test setup has a pair of NICs, one e1000e, one e1000 
(though dual e1000e behaves the same). When I do a test run in LNST with 
bonding mode balance-rr and either miimon or arpmon, the throughput of 
the UDP_STREAM netperf test is absolutely horrible:

TCP: 941.19 +-0.88 mbits/sec
UDP: 45.42 +-4.59 mbits/sec

I figured I'd try LNST's packet capture mode: exact same test, just 
with the -p flag added, and I get:

TCP: 941.21 +-0.82 mbits/sec
UDP: 961.54 +-0.01 mbits/sec

Uh. What? So yeah. I can't capture the traffic in the bad case, but I 
guess that gives some potential insight into what's not happening 
correctly in either the bonding driver or the NIC drivers... More 
digging forthcoming, but first I have a flooded basement to deal with, 
so if in the interim, anyone has some insight, I'd be happy to hear it. :)
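
For reference, the bond under test gets set up by LNST along these 
lines (a rough sketch, not the exact LNST recipe; interface names, 
addresses and the miimon value are placeholders):

  modprobe bonding
  ip link add bond0 type bond mode balance-rr miimon 100
  ip link set eth0 down && ip link set eth0 master bond0
  ip link set eth1 down && ip link set eth1 master bond0
  ip link set bond0 up
  ip addr add 192.168.100.1/24 dev bond0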

-- 
Jarod Wilson
jarod@redhat.com


* Re: Horrid balance-rr bonding udp throughput
From: Jarod Wilson @ 2017-04-10 18:50 UTC
  To: netdev

On 2017-04-08 7:33 PM, Jarod Wilson wrote:
> I'm digging into some bug reports covering performance issues with 
> balance-rr, and discovered something even worse than the reporter. My 
> test setup has a pair of NICs, one e1000e, one e1000 (but dual e1000e 
> seems the same). When I do a test run in LNST with bonding mode 
> balance-rr and either miimon or arpmon, the throughput of the UDP_STREAM 
> netperf test is absolutely horrible:
> 
> TCP: 941.19 +-0.88 mbits/sec
> UDP: 45.42 +-4.59 mbits/sec
> 
> I figured I'd try LNST's packet capture mode, so exact same test, add 
> the -p flag and I get:
> 
> TCP: 941.21 +-0.82 mbits/sec
> UDP: 961.54 +-0.01 mbits/sec
> 
> Uh. What? So yeah. I can't capture the traffic in the bad case, but I 
> guess that gives some potential insight into what's not happening 
> correctly in either the bonding driver or the NIC drivers... More 
> digging forthcoming, but first I have a flooded basement to deal with, 
> so if in the interim, anyone has some insight, I'd be happy to hear it. :)

Okay, ignore the bit about bonding; I should have eliminated the bond 
from the picture entirely. I think the traffic simply ended up on the 
e1000 in the non-capture test and on the e1000e in the capture test, as 
those numbers match perfectly with straight NIC-to-NIC testing, no bond 
involved. That said, it's really odd that the e1000 is so severely 
crippled for UDP while TCP is still respectable. Not sure if I have a 
flaky NIC or what...

For reference, e1000 to e1000e netperf:

TCP_STREAM: Measured rate was 849.95 +-1.32 mbits/sec
UDP_STREAM: Measured rate was 44.73 +-5.73 mbits/sec
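
(For the curious, the LNST streams boil down to roughly the following 
netperf runs; the address and test length here are placeholders:)

  netserver                                    # on the receiving host
  netperf -H 192.168.100.2 -t TCP_STREAM -l 60
  netperf -H 192.168.100.2 -t UDP_STREAM -l 60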


-- 
Jarod Wilson
jarod@redhat.com


* Re: Horrid balance-rr bonding udp throughput
From: Ben Greear @ 2017-04-10 19:11 UTC
  To: Jarod Wilson, netdev

On 04/10/2017 11:50 AM, Jarod Wilson wrote:
> On 2017-04-08 7:33 PM, Jarod Wilson wrote:
>> I'm digging into some bug reports covering performance issues with balance-rr, and discovered something even worse than the reporter. My test setup has a pair
>> of NICs, one e1000e, one e1000 (but dual e1000e seems the same). When I do a test run in LNST with bonding mode balance-rr and either miimon or arpmon, the
>> throughput of the UDP_STREAM netperf test is absolutely horrible:
>>
>> TCP: 941.19 +-0.88 mbits/sec
>> UDP: 45.42 +-4.59 mbits/sec
>>
>> I figured I'd try LNST's packet capture mode, so exact same test, add the -p flag and I get:
>>
>> TCP: 941.21 +-0.82 mbits/sec
>> UDP: 961.54 +-0.01 mbits/sec
>>
>> Uh. What? So yeah. I can't capture the traffic in the bad case, but I guess that gives some potential insight into what's not happening correctly in either
>> the bonding driver or the NIC drivers... More digging forthcoming, but first I have a flooded basement to deal with, so if in the interim, anyone has some
>> insight, I'd be happy to hear it. :)
>
> Okay, ignore the bit about bonding, I should have eliminated the bond from the picture entirely. I think the traffic simply ended up on the e1000 on the
> non-capture test and on the e1000e for the capture test, as those numbers match perfectly with straight NIC to NIC testing, no bond involved. That said, really
> odd that the e1000 is so severely crippled for UDP, while TCP is still respectable. Not sure if I have a flaky NIC or what...
>
> For reference, e1000 to e1000e netperf:
>
> TCP_STREAM: Measured rate was 849.95 +-1.32 mbits/sec
> UDP_STREAM: Measured rate was 44.73 +-5.73 mbits/sec

Maybe check whether you have re-ordering issues?  I ran into that with 
igb recently, and it took a while to realize what my problem was!
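
A quick way to get a hint is to watch the out-of-order counters on the 
receiver before and after a run, e.g. (nstat is from iproute2; the grep 
pattern is just a convenience):

  nstat -az | grep -Ei 'reorder|ofo'

Those are TCP-side counters, so they won't catch UDP directly, but on 
balance-rr they tend to climb quickly when frames are being interleaved 
out of order.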

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Horrid balance-rr bonding udp throughput
From: Eric Dumazet @ 2017-04-10 19:31 UTC
  To: Jarod Wilson; +Cc: netdev

On Mon, 2017-04-10 at 14:50 -0400, Jarod Wilson wrote:
> On 2017-04-08 7:33 PM, Jarod Wilson wrote:
> > I'm digging into some bug reports covering performance issues with 
> > balance-rr, and discovered something even worse than the reporter. My 
> > test setup has a pair of NICs, one e1000e, one e1000 (but dual e1000e 
> > seems the same). When I do a test run in LNST with bonding mode 
> > balance-rr and either miimon or arpmon, the throughput of the UDP_STREAM 
> > netperf test is absolutely horrible:
> > 
> > TCP: 941.19 +-0.88 mbits/sec
> > UDP: 45.42 +-4.59 mbits/sec
> > 
> > I figured I'd try LNST's packet capture mode, so exact same test, add 
> > the -p flag and I get:
> > 
> > TCP: 941.21 +-0.82 mbits/sec
> > UDP: 961.54 +-0.01 mbits/sec
> > 
> > Uh. What? So yeah. I can't capture the traffic in the bad case, but I 
> > guess that gives some potential insight into what's not happening 
> > correctly in either the bonding driver or the NIC drivers... More 
> > digging forthcoming, but first I have a flooded basement to deal with, 
> > so if in the interim, anyone has some insight, I'd be happy to hear it. :)
> 
> Okay, ignore the bit about bonding, I should have eliminated the bond 
> from the picture entirely. I think the traffic simply ended up on the 
> e1000 on the non-capture test and on the e1000e for the capture test, as 
> those numbers match perfectly with straight NIC to NIC testing, no bond 
> involved. That said, really odd that the e1000 is so severely crippled 
> for UDP, while TCP is still respectable. Not sure if I have a flaky NIC 
> or what...
> 
> For reference, e1000 to e1000e netperf:
> 
> TCP_STREAM: Measured rate was 849.95 +-1.32 mbits/sec
> UDP_STREAM: Measured rate was 44.73 +-5.73 mbits/sec

In our experiments, we found that e1000e had latency issues with UDP 
packets, not with TCP.

Try e1000e -> e1000e; the problem should persist, right?


* Re: Horrid balance-rr bonding udp throughput
From: Jarod Wilson @ 2017-04-11 14:28 UTC
  To: netdev

On 2017-04-10 2:50 PM, Jarod Wilson wrote:
> On 2017-04-08 7:33 PM, Jarod Wilson wrote:
>> I'm digging into some bug reports covering performance issues with 
>> balance-rr, and discovered something even worse than the reporter. My 
>> test setup has a pair of NICs, one e1000e, one e1000 (but dual e1000e 
>> seems the same). When I do a test run in LNST with bonding mode 
>> balance-rr and either miimon or arpmon, the throughput of the 
>> UDP_STREAM netperf test is absolutely horrible:
>>
>> TCP: 941.19 +-0.88 mbits/sec
>> UDP: 45.42 +-4.59 mbits/sec
>>
>> I figured I'd try LNST's packet capture mode, so exact same test, add 
>> the -p flag and I get:
>>
>> TCP: 941.21 +-0.82 mbits/sec
>> UDP: 961.54 +-0.01 mbits/sec
>>
>> Uh. What? So yeah. I can't capture the traffic in the bad case, but I 
>> guess that gives some potential insight into what's not happening 
>> correctly in either the bonding driver or the NIC drivers... More 
>> digging forthcoming, but first I have a flooded basement to deal with, 
>> so if in the interim, anyone has some insight, I'd be happy to hear 
>> it. :)
> 
> Okay, ignore the bit about bonding, I should have eliminated the bond 
> from the picture entirely. I think the traffic simply ended up on the 
> e1000 on the non-capture test and on the e1000e for the capture test, as 
> those numbers match perfectly with straight NIC to NIC testing, no bond 
> involved. That said, really odd that the e1000 is so severely crippled 
> for UDP, while TCP is still respectable. Not sure if I have a flaky NIC 
> or what...
> 
> For reference, e1000 to e1000e netperf:
> 
> TCP_STREAM: Measured rate was 849.95 +-1.32 mbits/sec
> UDP_STREAM: Measured rate was 44.73 +-5.73 mbits/sec

The rabbit hole went even deeper. The actual problem was the ITE 8893 
PCIe bridge in the host not properly exposing its capabilities, which 
required a PCI quirk identical to the existing ITE 8892 one to work 
around. With that in place, throughput on this venerable old e1000 goes 
back up to a reasonable 900 mbits/sec, give or take.
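
For anyone following along, the change amounts to an extra fixup entry 
in drivers/pci/quirks.c next to the existing ITE 8892 one. Sketching it 
from memory here, so treat the exact flag name and device ID as 
approximate rather than gospel:

  /* Bridge hides its PCIe capability, so tell the core to apply the
   * PCIe-to-PCI bridge DMA alias handling anyway. */
  static void quirk_use_pcie_bridge_dma_alias(struct pci_dev *pdev)
  {
          if (!pci_is_pcie(pdev) && PCI_FUNC(pdev) == 0)
                  pdev->dev_flags |= PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS;
  }
  /* ITE 8892, https://bugzilla.kernel.org/show_bug.cgi?id=73551 */
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ITE, 0x8892,
                           quirk_use_pcie_bridge_dma_alias);
  /* ITE 8893 needs the identical treatment */
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ITE, 0x8893,
                           quirk_use_pcie_bridge_dma_alias);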

-- 
Jarod Wilson
jarod@redhat.com
