* rps testing questions
From: mi wake @ 2011-01-17  9:43 UTC
  To: netdev

I am doing RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
CPU: 8-core Intel.
Ethernet adapter: bnx2x

Problem statement:
I enable RPS with:
echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus

Running 1 instance of netperf TCP_RR: netperf -t TCP_RR -H 192.168.0.1 -c -C
without RPS: 9963.48 (transactions per sec)
with RPS:    9387.59 (transactions per sec)

I also ran ab and tbench tests and found lower tps with RPS enabled,
yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
are balanced across the CPUs.
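
The balance is visible in the NET_RX row of /proc/softirqs, for example:

watch -d 'grep -e CPU -e NET_RX /proc/softirqs'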

Is there something wrong with my test?


* Re: rps testing questions
From: Eric Dumazet @ 2011-01-17  9:53 UTC
  To: mi wake; +Cc: netdev

On Monday, 17 January 2011 at 17:43 +0800, mi wake wrote:
> I am doing RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
> CPU: 8-core Intel.
> Ethernet adapter: bnx2x
> 
> Problem statement:
> I enable RPS with:
> echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus
> 

bnx2x with one queue only?
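
You can check, for example, with:

ls /sys/class/net/eth2/queues/
grep eth2 /proc/interrupts

A multiqueue setup shows several rx-<n> directories and several MSI-X
vectors (exact interrupt names are driver-dependent).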

> Running 1 instance of netperf TCP_RR: netperf -t TCP_RR -H 192.168.0.1 -c -C
> without RPS: 9963.48 (transactions per sec)
> with RPS:    9387.59 (transactions per sec)
> 
> I also ran ab and tbench tests and found lower tps with RPS enabled,
> yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
> are balanced across the CPUs.

Really? That seems unlikely with your one-flow test, unless you _also_
have hardware IRQs hitting all your CPUs. (That would be very bad.)

> 
> Is there something wrong with my test?
> --

If you test with one flow, RPS brings nothing at all. It is better to
handle the packet directly on the CPU handling the hardware IRQ (and NAPI).

You had better make sure hardware IRQs are on one CPU, instead of many CPUs.
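
Something like this, with the IRQ number taken from /proc/interrupts
(and irqbalance stopped, or it may rewrite the affinity; <irq> is a
placeholder):

grep eth2 /proc/interrupts
echo 1 > /proc/irq/<irq>/smp_affinity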





* Re: rps testing questions
From: Ben Hutchings @ 2011-01-17 13:08 UTC
  To: mi wake; +Cc: netdev

On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
> I am doing RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
> CPU: 8-core Intel.
> Ethernet adapter: bnx2x
> 
> Problem statement:
> I enable RPS with:
> echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus
> 
> Running 1 instance of netperf TCP_RR: netperf -t TCP_RR -H 192.168.0.1 -c -C
> without RPS: 9963.48 (transactions per sec)
> with RPS:    9387.59 (transactions per sec)
> 
> I also ran ab and tbench tests and found lower tps with RPS enabled,
> yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
> are balanced across the CPUs.
> 
> Is there something wrong with my test?

In addition to what Eric said, check the interrupt moderation settings
(ethtool -c/-C options).  One-way latency for a single request/response
test will be at least the interrupt moderation value.
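
For example (which coalescing parameters are supported varies by driver):

ethtool -c eth2              # show current interrupt coalescing settings
ethtool -C eth2 rx-usecs 0   # disable rx interrupt moderation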

I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
queues) so I don't know whether it can improve latency.  However, RFS
certainly does when there are many flows.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: rps testing questions
From: mi wake @ 2011-01-18  8:34 UTC
  To: Eric Dumazet; +Cc: netdev

2011/1/17 Eric Dumazet <eric.dumazet@gmail.com>:
> On Monday, 17 January 2011 at 17:43 +0800, mi wake wrote:
>> I am doing RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
>> CPU: 8-core Intel.
>> Ethernet adapter: bnx2x
>>
>> Problem statement:
>> I enable RPS with:
>> echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus
>>
>
> bnx2x with one queue only?
>
>> Running 1 instance of netperf TCP_RR: netperf -t TCP_RR -H 192.168.0.1 -c -C
>> without RPS: 9963.48 (transactions per sec)
>> with RPS:    9387.59 (transactions per sec)
>>
>> I also ran ab and tbench tests and found lower tps with RPS enabled,
>> yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
>> are balanced across the CPUs.
>
> Really? That seems unlikely with your one-flow test, unless you _also_
> have hardware IRQs hitting all your CPUs. (That would be very bad.)
>
>>
>> Is there something wrong with my test?
>> --
>
> If you test with one flow, RPS brings nothing at all. It is better to
> handle the packet directly on the CPU handling the hardware IRQ (and NAPI).
>
> You had better make sure hardware IRQs are on one CPU, instead of many CPUs.
>
>
I have checked: bnx2x has one queue only, and hardware IRQs are on one CPU.
I tested again with more flows, using an IP range from 192.x.x.1
to 192.x.x.200 to send SYN packets.
The server can deal with:
 without rps + rfs: 18M/s
 with    rps + rfs: 21M/s
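
(RFS here means the usual two knobs; the values below are illustrative:

 echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
 echo 32768 > /sys/class/net/eth2/queues/rx-0/rps_flow_cnt

With a single rx queue, rps_flow_cnt is set equal to
rps_sock_flow_entries.)
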
Maybe in the previous tests there were too few flows. I will continue testing.
Thank you!


* Re: rps testing questions
From: Rick Jones @ 2011-01-18 18:23 UTC
  To: Ben Hutchings; +Cc: mi wake, netdev

Ben Hutchings wrote:
> On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
> 
>>I am doing RPS (Receive Packet Steering) testing on CentOS 5.5 with kernel 2.6.37.
>>CPU: 8-core Intel.
>>Ethernet adapter: bnx2x
>>
>>Problem statement:
>>I enable RPS with:
>>echo "ff" > /sys/class/net/eth2/queues/rx-0/rps_cpus
>>
>>Running 1 instance of netperf TCP_RR: netperf -t TCP_RR -H 192.168.0.1 -c -C
>>without RPS: 9963.48 (transactions per sec)
>>with RPS:    9387.59 (transactions per sec)

Presumably there was an increase in service demand corresponding with the drop 
in transactions per second.

Also, an unsolicited benchmarking style tip or two.  I find it helpful to either 
do several discrete runs, or use the confidence intervals (global -i and -I 
options) with the TCP_RR tests when I am looking to compare two settings.  I 
find a bit more "variability" in the _RR tests than the _STREAM tests.

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#index-g_t_002dI_002c-Global-26
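
For example, asking for a 99% confidence interval of +/- 2.5%, with
between 3 and 10 iterations:

netperf -t TCP_RR -H 192.168.0.1 -c -C -i 10,3 -I 99,5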

Pinning netperf/netserver is also something I tend to do, but combining 
that with confidence intervals under RPS is kind of difficult - the 
successive data connections made while running the iterations of the 
confidence intervals will have different port numbers and so different 
hashing.  That would cause RPS to put the connections on different cores 
in turn, which would, in conjunction with netperf/netserver being pinned 
to a core, cause the relationship between where netperf runs and where 
netserver runs to change.  That will likely result in cache-to-cache 
(processor cache) transfers, which will definitely raise the service 
demand and drop the single-stream transactions per second.
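
(Pinning with netperf itself is the global -T option - for instance,
"netperf -t TCP_RR -H 192.168.0.1 -c -C -T 2,2" binds netperf to CPU 2
locally and netserver to CPU 2 on the remote system.)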

In theory :) with RFS that should not be an issue since where netperf/netserver 
are pinned controls where the inbound processing takes place.

We are in a maze of twisty heuristics... :)

>>I also ran ab and tbench tests and found lower tps with RPS enabled,
>>yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
>>are balanced across the CPUs.
>>
>>Is there something wrong with my test?
> 
> 
> In addition to what Eric said, check the interrupt moderation settings
> (ethtool -c/-C options).  One-way latency for a single request/response
> test will be at least the interrupt moderation value.
> 
> I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
> queues) so I don't know whether it can improve latency.  However, RFS
> certainly does when there are many flows.

Is there actually an expectation that either RPS or RFS would improve *latency*? 
  Multiple-stream throughput certainly, but with the additional work done to 
spread things around, I wouldn't expect either to improve latency.

happy benchmarking,

rick jones


* Re: rps testing questions
From: Ben Hutchings @ 2011-01-18 18:34 UTC
  To: Rick Jones; +Cc: mi wake, netdev

On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
> Ben Hutchings wrote:
> > On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
[...]
> >>I also ran ab and tbench tests and found lower tps with RPS enabled,
> >>yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
> >>are balanced across the CPUs.
> >>
> >>Is there something wrong with my test?
> > 
> > 
> > In addition to what Eric said, check the interrupt moderation settings
> > (ethtool -c/-C options).  One-way latency for a single request/response
> > test will be at least the interrupt moderation value.
> > 
> > I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
> > queues) so I don't know whether it can improve latency.  However, RFS
> > certainly does when there are many flows.
> 
> Is there actually an expectation that either RPS or RFS would improve *latency*? 
>   Multiple-stream throughput certainly, but with the additional work done to 
> spread things around, I wouldn't expect either to improve latency.

Yes, it seems to make a big improvement to latency when many flows are
active.  Tom told me that one of his benchmarks was 200 * netperf TCP_RR
in parallel, and I've seen over 40% reduction in latency for that.  That
said, allocating more RX queues might also help (sfc currently defaults
to one per processor package rather than one per processor thread, due
to concerns about CPU efficiency).
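
Roughly, that sort of parallel run looks like this (server address
illustrative):

for i in $(seq 1 200); do
    netperf -t TCP_RR -H 192.168.0.1 -P 0 -l 60 &
done
wait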

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: rps testing questions
From: Rick Jones @ 2011-01-18 19:10 UTC
  To: Ben Hutchings; +Cc: mi wake, netdev

Ben Hutchings wrote:
> On Tue, 2011-01-18 at 10:23 -0800, Rick Jones wrote:
> 
>>Ben Hutchings wrote:
>>
>>>On Mon, 2011-01-17 at 17:43 +0800, mi wake wrote:
> 
> [...]
> 
>>>>I also ran ab and tbench tests and found lower tps with RPS enabled,
>>>>yet CPU usage is higher with RPS enabled. With RPS enabled, softirqs
>>>>are balanced across the CPUs.
>>>>
>>>>Is there something wrong with my test?
>>>
>>>
>>>In addition to what Eric said, check the interrupt moderation settings
>>>(ethtool -c/-C options).  One-way latency for a single request/response
>>>test will be at least the interrupt moderation value.
>>>
>>>I haven't tested RPS by itself (Solarflare NICs have plenty of hardware
>>>queues) so I don't know whether it can improve latency.  However, RFS
>>>certainly does when there are many flows.
>>
>>Is there actually an expectation that either RPS or RFS would improve *latency*? 
>>  Multiple-stream throughput certainly, but with the additional work done to 
>>spread things around, I wouldn't expect either to improve latency.
> 
> 
> Yes, it seems to make a big improvement to latency when many flows are
> active. 

OK, you and I were using different definitions.  I was speaking to single-stream 
latency, but didn't say it explicitly (I may have subconsciously thought it was 
implicit given the OP used a single instance of netperf :).

happy benchmarking,

rick jones

> Tom told me that one of his benchmarks was 200 * netperf TCP_RR
> in parallel, and I've seen over 40% reduction in latency for that. That
> said, allocating more RX queues might also help (sfc currently defaults
> to one per processor package rather than one per processor thread, due
> to concerns about CPU efficiency).
> 
> Ben.
> 


