* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
@ 2016-11-07 16:47 Ray Bellis
  2016-11-07 18:42 ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Bellis @ 2016-11-07 16:47 UTC (permalink / raw)
  To: intel-wired-lan

Does anyone have any ideas on the performance regression I've reported
on the i40e at https://sourceforge.net/p/e1000/bugs/544/ ?

I've got code that with the 1.3.9 driver and kernel 4.3 can achieve 1.3
million packets per second, but if I drop in 1.3.38 or later it drops to
about 500k.

This regression affects off-the-shelf DNS server software packages too.

thanks,

Ray Bellis
ISC Research Fellow


* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-07 16:47 [Intel-wired-lan] i40e performance regression after 1.3.9 ? Ray Bellis
@ 2016-11-07 18:42 ` Alexander Duyck
  2016-11-07 20:16   ` Ray Bellis
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2016-11-07 18:42 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Nov 7, 2016 at 8:47 AM, Ray Bellis <ray@isc.org> wrote:
> Does anyone have any ideas on the performance regression I've reported
> on the i40e at https://sourceforge.net/p/e1000/bugs/544/ ?
>
> I've got code that with the 1.3.9 driver and kernel 4.3 can achieve 1.3
> million packets per second, but if I drop in 1.3.38 or later it drops to
> about 500k.
>
> This regression affects off-the-shelf DNS server software packages too.
>
> thanks,
>
> Ray Bellis
> ISC Research Fellow

Do you know if the packets are being dropped in the Rx path or is the
bottleneck on the Tx side?  It looks like this could be one of a few
things.  Doing a quick git log and git diff between v4.3 and v4.4 the
two things that jump out at me are changes to the Tx tail bumping code
and changes to the interrupt moderation code.

To narrow this down you might try manually configuring both the 1.3.9
and 1.3.38 drivers to the same interrupt moderation values using a
command like:
ethtool -C <iface> adaptive-rx off adaptive-tx off rx-usecs 25 tx-usecs 25

That would default the interrupt moderation to somewhere around 40K
interrupts per second.  If you run this command on both drivers and
they give you the same performance then we would know that the issue
is likely due to changes in the dynamic interrupt moderation.
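
As a rough check on the "around 40K interrupts per second" figure: with
adaptive moderation off, a fixed interval of N microseconds allows at most
1,000,000 / N interrupts per second. The sketch below only does that
arithmetic. The helper name `usecs_to_rate` is made up for illustration, and
the 62 and 122 values are the rx-usecs/tx-usecs settings reported elsewhere
in this thread for the 4.3 driver. Real rates depend on traffic and on how
the hardware rounds the interval.

```python
def usecs_to_rate(usecs):
    """Upper bound on interrupts per second for a fixed moderation interval.

    `usecs` is the interval in microseconds, as passed to
    `ethtool -C <iface> rx-usecs N`. Zero is rejected here because it
    means "no moderation" rather than an infinite rate.
    """
    if usecs <= 0:
        raise ValueError("interval must be a positive number of microseconds")
    return 1_000_000 / usecs


# Intervals mentioned in this thread (illustrative labels).
for label, usecs in [("rx/tx-usecs 25", 25), ("rx-usecs 62", 62), ("tx-usecs 122", 122)]:
    print(f"{label:>15}: at most {usecs_to_rate(usecs):,.0f} interrupts/s")
```

So the 25 microsecond setting caps the rate at 40,000 per second, while the
62 and 122 microsecond settings cap it at roughly 16,000 and 8,000.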

Thanks.

- Alex

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-07 18:42 ` Alexander Duyck
@ 2016-11-07 20:16   ` Ray Bellis
  2016-11-07 20:44     ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Bellis @ 2016-11-07 20:16 UTC (permalink / raw)
  To: intel-wired-lan

On 07/11/2016 18:42, Alexander Duyck wrote:

> Do you know if the packets are being dropped in the Rx path or is the
> bottleneck on the Tx side?  

None were being dropped AFAICR (from looking at the interface stats) and
I can't tell which side is the bottleneck.

> It looks like this could be one of a few
> things.  Doing a quick git log and git diff between v4.3 and v4.4 the
> two things that jump out at me are changes to the Tx tail bumping code
> and changes to the interrupt moderation code.
> 
> To narrow this down you might try manually configuring both the 1.3.9
> and 1.3.38 drivers to the same interrupt moderation values using a
> command like:
> ethtool -C <iface> adaptive-rx off adaptive-tx off rx-usecs 25 tx-usecs 25
> 
> That would default the interrupt moderation to somewhere around 40K
> interrupts per second.  If you run this command on both drivers and
> they give you the same performance then we would know that the issue
> is likely due to changes in the dynamic interrupt moderation.

I had noticed last week that the rx-usecs and tx-usecs values were
different between 4.3 and 4.4, but when I changed the 25/25 that 4.4
uses to the 62/122 that 4.3 uses it made no difference.

However that's not quite what you've asked, so I shall repeat the test
tomorrow, and also see whether your suggestion of changing 4.3 to the
4.4 values causes the same drop.

I also didn't turn off adaptive rx or tx, so I shall try that too.

Also if it helps, I've got flame graphs of "before" and "after":

http://users.isc.org/~ray/graph-fast.svg   (1.3.9)
http://users.isc.org/~ray/graph-slow.svg   (1.3.38+)

These both represent 30 seconds of dnsperf hammering my UDP echo server,
although I suspect that compiler inlining may have interfered with the
stack traces.

thanks,

Ray

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-07 20:16   ` Ray Bellis
@ 2016-11-07 20:44     ` Alexander Duyck
  2016-11-08 10:30       ` Ray Bellis
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2016-11-07 20:44 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Nov 7, 2016 at 12:16 PM, Ray Bellis <ray@isc.org> wrote:
> On 07/11/2016 18:42, Alexander Duyck wrote:
>
>> Do you know if the packets are being dropped in the Rx path or is the
>> bottleneck on the Tx side?
>
> None were being dropped AFAICR (from looking at the interface stats) and
> I can't tell which side is the bottleneck.
>
>> It looks like this could be one of a few
>> things.  Doing a quick git log and git diff between v4.3 and v4.4 the
>> two things that jump out at me are changes to the Tx tail bumping code
>> and changes to the interrupt moderation code.
>>
>> To narrow this down you might try manually configuring both the 1.3.9
>> and 1.3.38 drivers to the same interrupt moderation values using a
>> command like:
>> ethtool -C <iface> adaptive-rx off adaptive-tx off rx-usecs 25 tx-usecs 25
>>
>> That would default the interrupt moderation to somewhere around 40K
>> interrupts per second.  If you run this command on both drivers and
>> they give you the same performance then we would know that the issue
>> is likely due to changes in the dynamic interrupt moderation.
>
> I had noticed last week that the rx-usecs and tx-usecs values were
> different between 4.3 and 4.4, but when I changed the 25/25 that 4.4
> uses to the 62/122 that 4.3 uses it made no difference.
>
> However that's not quite what you've asked, so I shall repeat the test
> tomorrow, and also see whether your suggestion of changing 4.3 to the
> 4.4 values causes the same drop.

I would suggest also changing the values on 4.4 with the same command.
It will say that rx-usecs and tx-usecs didn't change but the simple
fact that we disabled the adaptive moderation can have a huge impact.

> I also didn't turn off adaptive rx or tx, so I shall try that too.

The adaptive Rx and Tx being disabled is the important part.  If you
didn't do that then changing the other values really had no effect.

> Also if it helps, I've got flame graphs of "before" and "after":
>
> http://users.isc.org/~ray/graph-fast.svg   (1.3.9)
> http://users.isc.org/~ray/graph-slow.svg   (1.3.38+)
>
> These both represent 30 seconds of dnsperf hammering my UDP echo server,
> although I suspect that compiler inlining may have interfered with the
> stack traces.

I'm used to seeing the effects of compiler inlining on this sort of
thing.  Just looking at them I suspect the problem is the new driver
isn't firing the interrupts often enough.  You are spending half as
much time in the driver as you were before when handling the Rx
cleanup.

Thanks.

- Alex

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-07 20:44     ` Alexander Duyck
@ 2016-11-08 10:30       ` Ray Bellis
  2016-11-08 16:05         ` Wyborny, Carolyn
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Bellis @ 2016-11-08 10:30 UTC (permalink / raw)
  To: intel-wired-lan

On 07/11/2016 20:44, Alexander Duyck wrote:

> The adaptive Rx and Tx being disabled is the important part.  If you
> didn't do that then changing the other values really had no effect.

I've just confirmed that turning off adaptive Rx and Tx on the "server"
side has the desired effect, bringing my packet rate back up to 1.3+ Mpps.

I'm getting odd effects when I try this on the "client" side though
(with the dnsperf program which generates packets and then waits for the
response).

With the 1.3.9 driver, I get a pretty consistent packet rate:

1478599801.555815: 1346798.548250
1478599802.556818: 1345090.873854
1478599803.557820: 1348886.415811
1478599804.558826: 1347842.070877
1478599805.559823: 1346257.780992
1478599806.560821: 1347437.257617
1478599807.561818: 1346674.365657
1478599808.562824: 1351776.113230
1478599809.563818: 1347379.704574

I then unload the driver and install the new one:

# ifdown enp5s0f1
Device 'enp5s0f1' successfully disconnected.
# rmmod i40e
# insmod ./i40e-1.3.38-4.3.5-300.fc23.ko
# ifup enp5s0f1
# ethtool -C enp5s0f1 adaptive-rx off adaptive-tx off

and re-running the test the packet rates are all over the place, and
almost seem to be trending downwards over time:

1478599891.282086: 1064831.565393
1478599892.282814: 1013365.270083
1478599893.283820: 958262.987435
1478599894.284814: 960207.553692
1478599895.285817: 947161.996517
1478599896.286852: 964507.734495
1478599897.287818: 991703.014888
1478599898.288815: 1010553.478182
1478599899.289822: 963053.205422
1478599900.290860: 933701.817513
1478599901.291816: 988006.465819
1478599902.292821: 957132.082257
1478599903.293822: 1017923.059018
1478599904.294820: 974373.575172
1478599905.295816: 947963.828027
1478599906.296817: 943062.993943
1478599907.297851: 885024.884270
1478599908.298818: 868948.726581
1478599909.299814: 945017.762309
1478599910.300846: 954253.210687
1478599911.301817: 936898.271778
1478599912.302818: 885917.196886

Even more bizarrely, the first time I ran this test it actually gave me
the expected 1.35 Mpps for the first ten seconds or so before
(gradually) dropping down to 1 Mpps or less.

More experimentation required - I'll do so with some reboots rather than
rmmod / insmod.
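
One way to put numbers on "pretty consistent" versus "all over the place" is
to compare the mean and the coefficient of variation (standard deviation
divided by mean) of the two runs above. This is a minimal sketch, assuming
the `timestamp: rate` line format shown in this message; `parse_rates` and
`summarize` are illustrative names and are not part of dnsperf.

```python
import statistics

OLD_DRIVER = """
1478599801.555815: 1346798.548250
1478599802.556818: 1345090.873854
1478599803.557820: 1348886.415811
1478599804.558826: 1347842.070877
1478599805.559823: 1346257.780992
1478599806.560821: 1347437.257617
1478599807.561818: 1346674.365657
1478599808.562824: 1351776.113230
1478599809.563818: 1347379.704574
"""

NEW_DRIVER = """
1478599891.282086: 1064831.565393
1478599892.282814: 1013365.270083
1478599893.283820: 958262.987435
1478599894.284814: 960207.553692
1478599895.285817: 947161.996517
1478599896.286852: 964507.734495
1478599897.287818: 991703.014888
1478599898.288815: 1010553.478182
1478599899.289822: 963053.205422
1478599900.290860: 933701.817513
1478599901.291816: 988006.465819
1478599902.292821: 957132.082257
1478599903.293822: 1017923.059018
1478599904.294820: 974373.575172
1478599905.295816: 947963.828027
1478599906.296817: 943062.993943
1478599907.297851: 885024.884270
1478599908.298818: 868948.726581
1478599909.299814: 945017.762309
1478599910.300846: 954253.210687
1478599911.301817: 936898.271778
1478599912.302818: 885917.196886
"""


def parse_rates(log):
    """Extract the packets-per-second column from 'timestamp: rate' lines."""
    rates = []
    for line in log.strip().splitlines():
        _timestamp, _sep, rate = line.partition(":")
        rates.append(float(rate))
    return rates


def summarize(rates):
    """Return (mean, coefficient of variation) using the sample std deviation."""
    mean = statistics.mean(rates)
    return mean, statistics.stdev(rates) / mean


for name, log in [("1.3.9", OLD_DRIVER), ("1.3.38, adaptive off", NEW_DRIVER)]:
    mean, cv = summarize(parse_rates(log))
    print(f"{name:>22}: mean {mean / 1e6:.3f} Mpps, variation {cv:.2%}")
```

On these samples the 1.3.9 run varies by well under 1% of its mean, while
the 1.3.38 run varies by several percent around a mean below 1 Mpps.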

Ray

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-08 10:30       ` Ray Bellis
@ 2016-11-08 16:05         ` Wyborny, Carolyn
  2016-11-08 16:31           ` Ray Bellis
  0 siblings, 1 reply; 9+ messages in thread
From: Wyborny, Carolyn @ 2016-11-08 16:05 UTC (permalink / raw)
  To: intel-wired-lan

Thanks for this info, Ray.  This narrows it down a lot.  I'll be looking
closely at the adaptive ITR mechanism for some improvements.  In the
1.3.38 version, it was not actually enabled.  The i40e hardware's ITR
feature is different from that of our other parts.  I wanted to confirm
something, though: are you still seeing variability with the
non-adaptive ITR settings on your client system?

Thanks again,

Carolyn

Carolyn Wyborny 
Linux Development 
Networking Division 
Intel Corporation

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-08 16:05         ` Wyborny, Carolyn
@ 2016-11-08 16:31           ` Ray Bellis
  2016-11-08 17:18             ` Wyborny, Carolyn
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Bellis @ 2016-11-08 16:31 UTC (permalink / raw)
  To: intel-wired-lan

On 08/11/2016 16:05, Wyborny, Carolyn wrote:

> Thanks for this info Ray.  This narrows it down a lot.  I'll be
> looking closely at the adaptive-itr mechanism for some improvements.
> In the 1.3.38 version, it was not actually enabled.  The i40e hw's
> ITR feature is different than our other parts.  I wanted to confirm
> something though.  Are you still seeing variability in the
> non-adaptive ITR settings  on your client system?

If I upgrade to anything beyond 1.3.9 in the client system I seem to be
unable to maintain the original packet rates regardless of adapter settings.

With default settings, immediately after loading the 1.3.38 driver:

1478621396.881368: 391141.759023
1478621397.882212: 354922.445456
1478621398.883211: 416478.937541
1478621399.884240: 430534.979506
1478621400.885273: 422226.839675
1478621401.886212: 422115.633420

I then turned off adaptive-rx and -tx (note that this is the same test
as the second set of figures from my previous email)

1478622609.157284: 1328728.765727
1478622610.158216: 1308772.224287
1478622611.159251: 1332014.365132
1478622612.160215: 1282047.106589
1478622613.161251: 1269727.562246
1478622614.162212: 1152071.858944
1478622615.163212: 1291263.736264
1478622616.164220: 1260214.703579
1478622617.165220: 1336847.152847
1478622618.166216: 1087381.967560
1478622619.167211: 1260368.932912
1478622620.168214: 1312834.227270
1478622621.169253: 1000005.993772
1478622622.170286: 1320425.999942

For unknown reasons it's producing higher figures than my run from this
morning (UK time) but the variability is still very high.  This time
it's averaging 1.2 Mpps (still down from 1.35) but earlier it was more
like 930 kpps.

[one possible difference is that the client machine was rebooted in
between.  FWIW, both machines are using the default Haswell intel_pstate
frequency governor in powersave mode.  I can change that to
'performance' with a specified range if necessary]

If there's a specific sequence of tests you'd like me to make, please
let me know.
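
To express these two runs relative to the 1.35 Mpps that the 1.3.9 driver
sustains, the sketch below computes the mean and the worst one-second sample
as a fraction of that baseline. The baseline constant and the function name
`relative_to_baseline` are assumptions for illustration; the samples are the
figures from this message.

```python
BASELINE_PPS = 1.35e6  # approximate steady rate quoted for the 1.3.9 driver

# 1.3.38 driver, default settings (adaptive moderation on)
DEFAULT_SETTINGS = [
    391141.759023, 354922.445456, 416478.937541,
    430534.979506, 422226.839675, 422115.633420,
]

# 1.3.38 driver, adaptive-rx and adaptive-tx off
ADAPTIVE_OFF = [
    1328728.765727, 1308772.224287, 1332014.365132, 1282047.106589,
    1269727.562246, 1152071.858944, 1291263.736264, 1260214.703579,
    1336847.152847, 1087381.967560, 1260368.932912, 1312834.227270,
    1000005.993772, 1320425.999942,
]


def relative_to_baseline(rates, baseline=BASELINE_PPS):
    """Return (mean / baseline, min / baseline) for a list of pps samples."""
    mean = sum(rates) / len(rates)
    return mean / baseline, min(rates) / baseline


for name, rates in [("default", DEFAULT_SETTINGS), ("adaptive off", ADAPTIVE_OFF)]:
    mean_frac, worst_frac = relative_to_baseline(rates)
    print(f"{name:>12}: mean {mean_frac:.0%} of baseline, worst second {worst_frac:.0%}")
```

On these samples the default settings land at roughly 30% of the baseline,
and the adaptive-off run averages a little over 90% with individual seconds
dipping to about 74%.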

Ray

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-08 16:31           ` Ray Bellis
@ 2016-11-08 17:18             ` Wyborny, Carolyn
  2016-11-08 17:30               ` Ray Bellis
  0 siblings, 1 reply; 9+ messages in thread
From: Wyborny, Carolyn @ 2016-11-08 17:18 UTC (permalink / raw)
  To: intel-wired-lan

> -----Original Message-----
> From: Ray Bellis [mailto:ray at isc.org]
> Sent: Tuesday, November 08, 2016 8:31 AM
> To: Wyborny, Carolyn <carolyn.wyborny@intel.com>; Alexander Duyck
> <alexander.duyck@gmail.com>
> Cc: intel-wired-lan at lists.osuosl.org
> Subject: Re: [Intel-wired-lan] i40e performance regression after 1.3.9 ?
> 
[..]
> If I upgrade to anything beyond 1.3.9 in the client system I seem to be
> unable to maintain the original packet rates regardless of adapter settings.
> 
[..]
> For unknown reasons it's producing higher figures than my run from this
> morning (UK time) but the variability is still very high.  This time
> it's averaging 1.2 Mpps (still down from 1.35) but earlier it was more
> like 930 kpps.
> 
> [one possible difference is that the client machine was rebooted in
> between.  FWIW, both machines are using the default Haswell intel_pstate
> frequency governor in powersave mode.  I can change that to
> 'performance' with a specified range if necessary]
> 
> If there's a specific sequence of tests you'd like me to make, please
> let me know.

It's strange that there would be such a difference between them.  Are they the same hardware with the same resources, same slot, BIOS versions, etc., ad infinitum?  The driver settings are the same, apparently, but something else must be configured differently by default.

If you haven't been here already, here is a link to performance tuning generally for Intel Ethernet.  http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005811.html

We have another interrupt moderation feature in the X710 HW called Interrupt Rate Limiting.  It should be described in the README, but can also be found packaged with our out of tree driver at SourceForge.  I'd be interested to know if disabling the adaptive ITR and trying the Rate Limiting has any effect on what you're seeing.  I'd suggest trying this on the server system to avoid whatever is going on with the Client system.

Thanks,

Carolyn

* [Intel-wired-lan] i40e performance regression after 1.3.9 ?
  2016-11-08 17:18             ` Wyborny, Carolyn
@ 2016-11-08 17:30               ` Ray Bellis
  0 siblings, 0 replies; 9+ messages in thread
From: Ray Bellis @ 2016-11-08 17:30 UTC (permalink / raw)
  To: intel-wired-lan

On 08/11/2016 17:18, Wyborny, Carolyn wrote:

> It's strange that there would be such a difference between them. Are
> they the same hw with the same resources, same slot, BIOS versions,
> etc., ad infinitum?  The driver settings are the same, apparently,
> but something else must be configured differently by default.

As far as I can tell they are, although the dnsperf software I'm using
to generate the test traffic (and from which those rates are reported)
is perhaps just more sensitive to whatever's causing this issue.

The simplest test I have is that I can load the 1.3.9 driver on the
client and performance is rock steady, but when I put in any later
driver I either get a massive drop (with adaptive on) or higher but very
unstable values (with adaptive off).

[the systems are Dell R430s which were all built identically on the same
order]

> If you haven't been here already, here is a link to performance
> tuning generally for Intel Ethernet.
> http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005811.html

I'm not sure I've referred to that specific guide, but I have for
example turned off IRQ balancing, and put each RX/TX queue IRQ onto a
separate CPU core, and have hyperthreading disabled.
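
For reference, pinning one IRQ per core on Linux is usually done by writing
a hex CPU bitmask to /proc/irq/<n>/smp_affinity. The sketch below only
builds those masks and prints the commands; it does not write anything. The
IRQ numbers are hypothetical placeholders (the real ones for the i40e queues
would come from /proc/interrupts), and `affinity_mask` and `plan_affinity`
are illustrative names.

```python
def affinity_mask(cpu):
    """Hex bitmask selecting a single CPU.

    Masks wider than 32 bits are written as comma-separated 32-bit groups,
    most significant group first, which is the form smp_affinity uses.
    """
    if cpu < 0:
        raise ValueError("CPU index must be non-negative")
    mask = 1 << cpu
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    groups.reverse()
    head = format(groups[0], "x")
    tail = [format(g, "08x") for g in groups[1:]]
    return ",".join([head] + tail)


def plan_affinity(irqs, cpus):
    """Pair each IRQ with its own CPU; refuse to share a core between IRQs."""
    irqs, cpus = list(irqs), list(cpus)
    if len(irqs) > len(cpus):
        raise ValueError("not enough CPUs for one IRQ per core")
    return [(irq, cpu, affinity_mask(cpu)) for irq, cpu in zip(irqs, cpus)]


# Hypothetical IRQ numbers, for illustration only.
HYPOTHETICAL_IRQS = [64, 65, 66, 67]

for irq, cpu, mask in plan_affinity(HYPOTHETICAL_IRQS, range(4)):
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity   # CPU {cpu}")
```

Pinning only holds if irqbalance is stopped, as described above, since the
daemon would otherwise rewrite the masks.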

> We have another interrupt moderation feature in the X710 HW called
> Interrupt Rate Limiting.  It should be described in the README, but
> can also be found packaged with our out of tree driver at
> SourceForge.  I'd be interested to know if disabling the adaptive ITR
> and trying the Rate Limiting has any effect on what you're seeing.
> I'd suggest trying this on the server system to avoid whatever is
> going on with the Client system.

OK, I'll give that a try (tomorrow now, it's the end of my day here in
the UK).

thanks,

Ray