* NAT performance issue 944mbit -> ~40mbit
@ 2020-07-11 15:53 Ian Kumlien
2020-07-15 20:05 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-11 15:53 UTC (permalink / raw)
To: Linux Kernel Network Developers
Hi,
I first detected this with 5.7.6, but it seems to apply as far back as 5.6.1...
(setup: 5.7.8 client -> NAT box (tried with kernels from 5.6.1 up to
5.8-rc4) -> server running 5.7.7)
It seems to me that the window size doesn't advance, so I reverted
"tcp: grow window for OOO packets only for SACK flows" [1],
but it made no difference...
I have a 384 MB tcpdump of an iperf3 session that starts out slow and
then eventually reaches the full bandwidth...
I use BBR - I have also tried cubic, which didn't help - and the NAT
machine uses fq, but changing the qdisc doesn't seem to yield any
different results.
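(To reproduce that part of the setup, the relevant knobs are roughly the
following - the interface name is just an example:

  sysctl -w net.ipv4.tcp_congestion_control=bbr   # or cubic
  tc qdisc replace dev enp3s0 root fq             # or fq_codel, etc.
)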
Running with -P10 gets the full bandwidth and can sometimes break the
stalemate, but you end up back at the lower transfer speed again.
(It only seems to apply to NAT - the machine is an A2SDi-12C-HLN4F and
has handled this without problems in the past...)
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.7.8&id=bf780119617797b5690e999e59a64ad79a572374
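(The runs below look like plain single-stream TCP tests; something along
these lines reproduces this kind of output - the host name is a placeholder:

  iperf3 -s                          # receiving side
  iperf3 -c <server> -t 10           # sending side
  iperf3 -c <server> -t 10 -P 10     # the parallel variant mentioned above
  iperf3 -c <server> -t 10 -u -b 0   # the UDP variant used further down
)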
First iperf3 as a reference:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 113 MBytes 945 Mbits/sec 0 814 KBytes
[ 5] 1.00-2.00 sec 109 MBytes 912 Mbits/sec 0 806 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 31 792 KBytes
[ 5] 3.00-4.00 sec 101 MBytes 849 Mbits/sec 31 1.18 MBytes
[ 5] 4.00-5.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 31 778 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 93 772 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 778 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 60 778 KBytes
[ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 92 814 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 338 sender
[ 5] 0.00-10.01 sec 1.07 GBytes 919 Mbits/sec receiver
After that:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.77 MBytes 40.0 Mbits/sec 0 42.4 KBytes
[ 5] 1.00-2.00 sec 4.10 MBytes 34.4 Mbits/sec 0 84.8 KBytes
[ 5] 2.00-3.00 sec 4.60 MBytes 38.6 Mbits/sec 0 87.7 KBytes
[ 5] 3.00-4.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
[ 5] 4.00-5.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
[ 5] 5.00-6.00 sec 4.47 MBytes 37.5 Mbits/sec 0 76.4 KBytes
[ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 67.9 KBytes
[ 5] 7.00-8.00 sec 4.66 MBytes 39.1 Mbits/sec 0 67.9 KBytes
[ 5] 8.00-9.00 sec 4.35 MBytes 36.5 Mbits/sec 0 82.0 KBytes
[ 5] 9.00-10.00 sec 4.66 MBytes 39.1 Mbits/sec 0 139 KBytes
- - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 45.5 MBytes 38.2 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 45.0 MBytes 37.8 Mbits/sec receiver
Sometimes a run even recovers part-way through:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 5.38 MBytes 45.2 Mbits/sec 0 42.4 KBytes
[ 5] 1.00-2.00 sec 7.08 MBytes 59.4 Mbits/sec 0 535 KBytes
[ 5] 2.00-3.00 sec 108 MBytes 907 Mbits/sec 0 778 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 814 KBytes
[ 5] 4.00-5.00 sec 91.2 MBytes 765 Mbits/sec 0 829 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 0 783 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 769 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 778 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 809 KBytes
[ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 0 823 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 879 MBytes 738 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 875 MBytes 734 Mbits/sec receiver
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-11 15:53 NAT performance issue 944mbit -> ~40mbit Ian Kumlien
@ 2020-07-15 20:05 ` Ian Kumlien
2020-07-15 20:31 ` Jakub Kicinski
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 20:05 UTC (permalink / raw)
To: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
After a lot of debugging it turns out that the bug is in igb...
driver: igb
version: 5.6.0-k
firmware-version: 0. 6-1
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
Connection (rev 03)
It's interesting that it only seems to happen on longer links... Any clues?
On Sat, Jul 11, 2020 at 5:53 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Hi,
>
> I first detected this with 5.7.6 but it seems to apply as far back as 5.6.1...
> (so, 5.7.8 client -> nat (5.6.1 -> 5.8-rc4 -> server 5.7.7)
>
> It seems to me that the window size doesn't advance, so i did revert
> the tcp: grow window for OOO packets only for SACK flows [1]
> but it did no difference...
>
> I have a 384 MB tcpdump of a iperf3 session that starts low and then
> actually starts to get the bandwidth...
> I do use BBR - I have tried with cubic... it didn't help - the NAT
> machine does use fq but changing it doesn't seem to yield any other
> results.
>
> Doing -P10 gives you the bandwith and can sometimes break the
> stalemate but you'll end up back with the lower transfer speed again.
> (it only seems to apply to NAT - the machine is a: A2SDi-12C-HLN4F and
> has handled this without problems in the past...)
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.7.8&id=bf780119617797b5690e999e59a64ad79a572374
>
> First iperf3 as a reference:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 113 MBytes 945 Mbits/sec 0 814 KBytes
> [ 5] 1.00-2.00 sec 109 MBytes 912 Mbits/sec 0 806 KBytes
> [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 31 792 KBytes
> [ 5] 3.00-4.00 sec 101 MBytes 849 Mbits/sec 31 1.18 MBytes
> [ 5] 4.00-5.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 31 778 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 93 772 KBytes
> [ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 778 KBytes
> [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 60 778 KBytes
> [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 92 814 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 338 sender
> [ 5] 0.00-10.01 sec 1.07 GBytes 919 Mbits/sec receiver
>
> After that:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 4.77 MBytes 40.0 Mbits/sec 0 42.4 KBytes
> [ 5] 1.00-2.00 sec 4.10 MBytes 34.4 Mbits/sec 0 84.8 KBytes
> [ 5] 2.00-3.00 sec 4.60 MBytes 38.6 Mbits/sec 0 87.7 KBytes
> [ 5] 3.00-4.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> [ 5] 4.00-5.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> [ 5] 5.00-6.00 sec 4.47 MBytes 37.5 Mbits/sec 0 76.4 KBytes
> [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 67.9 KBytes
> [ 5] 7.00-8.00 sec 4.66 MBytes 39.1 Mbits/sec 0 67.9 KBytes
> [ 5] 8.00-9.00 sec 4.35 MBytes 36.5 Mbits/sec 0 82.0 KBytes
> [ 5] 9.00-10.00 sec 4.66 MBytes 39.1 Mbits/sec 0 139 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 45.5 MBytes 38.2 Mbits/sec 0 sender
> [ 5] 0.00-10.00 sec 45.0 MBytes 37.8 Mbits/sec receiver
>
> You even get some:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 5.38 MBytes 45.2 Mbits/sec 0 42.4 KBytes
> [ 5] 1.00-2.00 sec 7.08 MBytes 59.4 Mbits/sec 0 535 KBytes
> [ 5] 2.00-3.00 sec 108 MBytes 907 Mbits/sec 0 778 KBytes
> [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 814 KBytes
> [ 5] 4.00-5.00 sec 91.2 MBytes 765 Mbits/sec 0 829 KBytes
> [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 0 783 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 769 KBytes
> [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 778 KBytes
> [ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 809 KBytes
> [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 0 823 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 879 MBytes 738 Mbits/sec 0 sender
> [ 5] 0.00-10.00 sec 875 MBytes 734 Mbits/sec receiver
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-15 20:05 ` Ian Kumlien
@ 2020-07-15 20:31 ` Jakub Kicinski
2020-07-15 21:02 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2020-07-15 20:31 UTC (permalink / raw)
To: Ian Kumlien
Cc: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> After a lot of debugging it turns out that the bug is in igb...
>
> driver: igb
> version: 5.6.0-k
> firmware-version: 0. 6-1
>
> 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> Connection (rev 03)
Unclear to me what you're actually reporting. Is this a regression
after a kernel upgrade? Compared to no NAT?
> It's interesting that it only seems to happen on longer links... Any clues?
Links as in with longer cables?
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-15 20:31 ` Jakub Kicinski
@ 2020-07-15 21:02 ` Ian Kumlien
2020-07-15 21:12 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 21:02 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > After a lot of debugging it turns out that the bug is in igb...
> >
> > driver: igb
> > version: 5.6.0-k
> > firmware-version: 0. 6-1
> >
> > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > Connection (rev 03)
>
> Unclear to me what you're actually reporting. Is this a regression
> after a kernel upgrade? Compared to no NAT?
It only happens on "internet links"
Let's say that A is a client with the igb driver, B is a firewall running
NAT with the ixgbe driver, C is another local node with igb, and
D is a remote node with a bridge backed by a bnx2 interface.
A -> B -> C is ok (B and C are on the same switch)
A -> B -> D -- 32-40mbit
B -> D 944 mbit
C -> D 944 mbit
A' -> D ~933 mbit (A' is A using a Realtek NIC -- also, that link is not idle atm)
Could it be a timing issue? This is on an AMD Ryzen 9 system - I have
tcpdumps but I doubt they'll help...
> > It's interesting that it only seems to happen on longer links... Any clues?
>
> Links as in with longer cables?
Longer links, as in more hops and unknown (in this case Juniper) switches/boxes
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-15 21:02 ` Ian Kumlien
@ 2020-07-15 21:12 ` Ian Kumlien
2020-07-15 21:40 ` Jakub Kicinski
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 21:12 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > After a lot of debugging it turns out that the bug is in igb...
> > >
> > > driver: igb
> > > version: 5.6.0-k
> > > firmware-version: 0. 6-1
> > >
> > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > Connection (rev 03)
> >
> > Unclear to me what you're actually reporting. Is this a regression
> > after a kernel upgrade? Compared to no NAT?
>
> It only happens on "internet links"
>
> Lets say that A is client with ibg driver, B is a firewall running NAT
> with ixgbe drivers, C is another local node with igb and
> D is a remote node with a bridge backed by a bnx2 interface.
>
> A -> B -> C is ok (B and C is on the same switch)
>
> A -> B -> D -- 32-40mbit
>
> B -> D 944 mbit
> C -> D 944 mbit
>
> A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
This should of course be A' -> B -> D
Sorry, I've been scratching my head for about a week...
> Can it be a timing issue? this is on a AMD Ryzen 9 system - I have
> tcpdumps but i doubt that they'll help...
>
> > > It's interesting that it only seems to happen on longer links... Any clues?
> >
> > Links as in with longer cables?
>
> Longer links, as in more hops and unknown (in this case Juniper) switches/boxes
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-15 21:12 ` Ian Kumlien
@ 2020-07-15 21:40 ` Jakub Kicinski
2020-07-15 21:59 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2020-07-15 21:40 UTC (permalink / raw)
To: Ian Kumlien
Cc: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > After a lot of debugging it turns out that the bug is in igb...
> > > >
> > > > driver: igb
> > > > version: 5.6.0-k
> > > > firmware-version: 0. 6-1
> > > >
> > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > Connection (rev 03)
> > >
> > > Unclear to me what you're actually reporting. Is this a regression
> > > after a kernel upgrade? Compared to no NAT?
> >
> > It only happens on "internet links"
> >
> > Lets say that A is client with ibg driver, B is a firewall running NAT
> > with ixgbe drivers, C is another local node with igb and
> > D is a remote node with a bridge backed by a bnx2 interface.
> >
> > A -> B -> C is ok (B and C is on the same switch)
> >
> > A -> B -> D -- 32-40mbit
> >
> > B -> D 944 mbit
> > C -> D 944 mbit
> >
> > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
>
> This should of course be A' -> B -> D
>
> Sorry, I've been scratching my head for about a week...
Hm, only thing that comes to mind if A' works reliably and A doesn't is
that A has somehow broken TCP offloads. Could you try disabling things
via ethtool -K and see if those settings make a difference?
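(Concretely, that would be something along these lines - these are the
generic ethtool offload names, not anything igb-specific:

  ethtool -K enp3s0 tso off gso off gro off
  ethtool -K enp3s0 tx off rx off sg off

and then re-enabling them one at a time to see which one matters.)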
* Re: NAT performance issue 944mbit -> ~40mbit
2020-07-15 21:40 ` Jakub Kicinski
@ 2020-07-15 21:59 ` Ian Kumlien
2020-07-15 22:32 ` [Intel-wired-lan] " Alexander Duyck
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 21:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Linux Kernel Network Developers, jeffrey.t.kirsher, intel-wired-lan
On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > >
> > > > > driver: igb
> > > > > version: 5.6.0-k
> > > > > firmware-version: 0. 6-1
> > > > >
> > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > Connection (rev 03)
> > > >
> > > > Unclear to me what you're actually reporting. Is this a regression
> > > > after a kernel upgrade? Compared to no NAT?
> > >
> > > It only happens on "internet links"
> > >
> > > Lets say that A is client with ibg driver, B is a firewall running NAT
> > > with ixgbe drivers, C is another local node with igb and
> > > D is a remote node with a bridge backed by a bnx2 interface.
> > >
> > > A -> B -> C is ok (B and C is on the same switch)
> > >
> > > A -> B -> D -- 32-40mbit
> > >
> > > B -> D 944 mbit
> > > C -> D 944 mbit
> > >
> > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> >
> > This should of course be A' -> B -> D
> >
> > Sorry, I've been scratching my head for about a week...
>
> Hm, only thing that comes to mind if A' works reliably and A doesn't is
> that A has somehow broken TCP offloads. Could you try disabling things
> via ethtool -K and see if those settings make a difference?
It's a bit hard to test since it behaves like this - with tso turned off:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
[ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
[ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
[ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
[ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
[ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
[ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
[ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
[ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
Continued running tests:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
[ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
[ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
[ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
[ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
[ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
[ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
[ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
[ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
[ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
[ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
[ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
[ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
[ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
[ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
[ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
[ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
[ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
[ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
And the low bandwidth continues - these are the offloads still enabled:
ethtool -k enp3s0 |grep ": on"
rx-vlan-offload: on
tx-vlan-offload: on [requested off]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-udp-segmentation: on
hw-tc-offload: on
I can't quite find how to turn those off, since they aren't listed among
ethtool's option names (the text shown doesn't seem to be what you use
to enable/disable them).
I was hoping you'd have an idea of something that might have introduced
a regression - i.e. specific patches to try reverting.
Btw, the same issue applies to UDP as well
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
[ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
[ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
[ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
[ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
[ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
[ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
[ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
[ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
[ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms 0/32584 (0%) sender
[ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms 0/32573 (0%) receiver
vs:
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
[ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
[ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
[ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
[ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
[ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
[ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
[ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
[ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
[ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms 0/824530 (0%) sender
[ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms 4756/824530 (0.58%) receiver
lspci -s 03:00.0 -vvv
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
Connection (rev 03)
Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 57
IOMMU group: 20
Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at e000 [size=32]
Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
L0s <2us, L1 <16us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-
EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Device Serial Number 34-97-f6-ff-ff-31-88-f4
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: igb
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-15 21:59 ` Ian Kumlien
@ 2020-07-15 22:32 ` Alexander Duyck
2020-07-15 22:51 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2020-07-15 22:32 UTC (permalink / raw)
To: Ian Kumlien
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > >
> > > > > > driver: igb
> > > > > > version: 5.6.0-k
> > > > > > firmware-version: 0. 6-1
> > > > > >
> > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > Connection (rev 03)
> > > > >
> > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > after a kernel upgrade? Compared to no NAT?
> > > >
> > > > It only happens on "internet links"
> > > >
> > > > Lets say that A is client with ibg driver, B is a firewall running NAT
> > > > with ixgbe drivers, C is another local node with igb and
> > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > >
> > > > A -> B -> C is ok (B and C is on the same switch)
> > > >
> > > > A -> B -> D -- 32-40mbit
> > > >
> > > > B -> D 944 mbit
> > > > C -> D 944 mbit
> > > >
> > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > >
> > > This should of course be A' -> B -> D
> > >
> > > Sorry, I've been scratching my head for about a week...
> >
> > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > that A has somehow broken TCP offloads. Could you try disabling things
> > via ethtool -K and see if those settings make a difference?
>
> It's a bit hard since it works like this, turned tso off:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
>
> Continued running tests:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
>
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
>
> And the low bandwidth continues with:
> ethtool -k enp3s0 |grep ": on"
> rx-vlan-offload: on
> tx-vlan-offload: on [requested off]
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: on
> tx-udp-segmentation: on
> hw-tc-offload: on
>
> Can't quite find how to turn those off since they aren't listed in
> ethtool (since the text is not what you use to enable/disable)
To disable them you would just pass the same string that is shown in the
display. So it should just be "ethtool -K enp3s0 tx-gso-partial off",
and that would turn off a large chunk of them, since all the encapsulated
support requires gso partial support.
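(In other words, the feature strings printed by "ethtool -k" are accepted
as-is by "ethtool -K", so - assuming the same interface - something like:

  ethtool -K enp3s0 tx-gso-partial off
  ethtool -K enp3s0 tx-udp-segmentation off
  ethtool -K enp3s0 hw-tc-offload off

would clear the remaining entries from the list above.)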
> I was hoping that you'd have a clue of something that might introduce
> a regression - ie specific patches to try to revert
>
> Btw, the same issue applies to udp as werll
>
> [ ID] Interval Transfer Bitrate Total Datagrams
> [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Jitter
> Lost/Total Datagrams
> [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> 0/32584 (0%) sender
> [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> 0/32573 (0%) receiver
>
> vs:
>
> [ ID] Interval Transfer Bitrate Total Datagrams
> [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Jitter
> Lost/Total Datagrams
> [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> 0/824530 (0%) sender
> [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> 4756/824530 (0.58%) receiver
The fact that it is impacting UDP seems odd. I wonder if we don't have
a qdisc somewhere that is misbehaving and throttling the Tx. Either
that or I wonder if we are getting spammed with flow control frames.
It would be useful to include the output of just calling "ethtool
enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
to output the statistics and dump anything that isn't zero.
> lspci -s 03:00.0 -vvv
> 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> Connection (rev 03)
> Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 57
> IOMMU group: 20
> Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> Region 2: I/O ports at e000 [size=32]
> Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> Address: 0000000000000000 Data: 0000
> Masking: 00000000 Pending: 00000000
> Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> Vector table: BAR=3 offset=00000000
> PBA: BAR=3 offset=00002000
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> MaxPayload 128 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> L0s <2us, L1 <16us
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
PCIe-wise the connection is going to be pretty tight in terms of
bandwidth. It looks like we have 2.5GT/s with only a single lane of
PCIe. In addition we are running with ASPM enabled, which means that if
there isn't enough traffic we shut off the one PCIe lane we have, so
bursty traffic can get ugly.
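(Back-of-the-envelope: 2.5GT/s x1 with 8b/10b encoding is ~2.0Gb/s, i.e.
roughly 250MB/s per direction before TLP/descriptor overhead, while
line-rate gigabit Ethernet needs ~125MB/s per direction - enough, but not
much headroom, and the L1 exit latency of up to 16us listed in LnkCap
above adds a stall every time the link has gone idle.)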
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-15 22:32 ` [Intel-wired-lan] " Alexander Duyck
@ 2020-07-15 22:51 ` Ian Kumlien
2020-07-15 23:41 ` Alexander Duyck
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 22:51 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > >
> > > > > > > driver: igb
> > > > > > > version: 5.6.0-k
> > > > > > > firmware-version: 0. 6-1
> > > > > > >
> > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > Connection (rev 03)
> > > > > >
> > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > after a kernel upgrade? Compared to no NAT?
> > > > >
> > > > > It only happens on "internet links"
> > > > >
> > > > > Lets say that A is client with ibg driver, B is a firewall running NAT
> > > > > with ixgbe drivers, C is another local node with igb and
> > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > >
> > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > >
> > > > > A -> B -> D -- 32-40mbit
> > > > >
> > > > > B -> D 944 mbit
> > > > > C -> D 944 mbit
> > > > >
> > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > >
> > > > This should of course be A' -> B -> D
> > > >
> > > > Sorry, I've been scratching my head for about a week...
> > >
> > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > that A has somehow broken TCP offloads. Could you try disabling things
> > > via ethtool -K and see if those settings make a difference?
> >
> > It's a bit hard since it works like this, turned tso off:
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> >
> > Continued running tests:
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> >
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> >
> > And the low bandwidth continues with:
> > ethtool -k enp3s0 |grep ": on"
> > rx-vlan-offload: on
> > tx-vlan-offload: on [requested off]
> > highdma: on [fixed]
> > rx-vlan-filter: on [fixed]
> > tx-gre-segmentation: on
> > tx-gre-csum-segmentation: on
> > tx-ipxip4-segmentation: on
> > tx-ipxip6-segmentation: on
> > tx-udp_tnl-segmentation: on
> > tx-udp_tnl-csum-segmentation: on
> > tx-gso-partial: on
> > tx-udp-segmentation: on
> > hw-tc-offload: on
> >
> > Can't quite find how to turn those off since they aren't listed in
> > ethtool (since the text is not what you use to enable/disable)
>
> To disable them you would just repeat the same string in the display
> string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> and that would turn off a large chunk of them as all the encapsulated
> support requires gso partial support.
ethtool -k enp3s0 |grep ": on"
highdma: on [fixed]
rx-vlan-filter: on [fixed]
---
And then, back-to-back runs:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
[ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
[ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
[ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
[ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
[ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
[ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
[ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
[ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
[ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
and we're back to the non-working state:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
[ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
[ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
[ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
[ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
[ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
[ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
[ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
[ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
[ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
> > I was hoping that you'd have a clue of something that might introduce
> > a regression - ie specific patches to try to revert
> >
> > Btw, the same issue applies to udp as werll
> >
> > [ ID] Interval Transfer Bitrate Total Datagrams
> > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Jitter
> > Lost/Total Datagrams
> > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > 0/32584 (0%) sender
> > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > 0/32573 (0%) receiver
> >
> > vs:
> >
> > [ ID] Interval Transfer Bitrate Total Datagrams
> > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Jitter
> > Lost/Total Datagrams
> > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > 0/824530 (0%) sender
> > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > 4756/824530 (0.58%) receiver
>
> The fact that it is impacting UDP seems odd. I wonder if we don't have
> a qdisc somewhere that is misbehaving and throttling the Tx. Either
> that or I wonder if we are getting spammed with flow control frames.
It sometimes works - it looks like the congestion window just isn't
being increased, which is where I started...
Example:
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
[ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
[ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
[ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
[ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
[ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
[ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
[ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
[ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
[ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
[ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
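(One way to watch the congestion window directly on the sender while a
test runs - assuming iproute2's ss is available; the address is a
placeholder:

  ss -tin dst <server-ip>

which prints cwnd, rtt and pacing rate for the active connection.)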
> It would be useful to include the output of just calling "ethtool
> enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> to output the statistics and dump anything that isn't zero.
ethtool enp3s0
Settings for enp3s0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
MDI-X: off (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
---
ethtool -a enp3s0
Pause parameters for enp3s0:
Autonegotiate: on
RX: on
TX: off
---
ethtool -S enp3s0 |grep -v :\ 0
NIC statistics:
rx_packets: 15920618
tx_packets: 17846725
rx_bytes: 15676264423
tx_bytes: 19925010639
rx_broadcast: 119553
tx_broadcast: 497
rx_multicast: 330193
tx_multicast: 18190
multicast: 330193
rx_missed_errors: 270102
rx_long_length_errors: 6
tx_tcp_seg_good: 1342561
rx_long_byte_count: 15676264423
rx_errors: 6
rx_length_errors: 6
rx_fifo_errors: 270102
tx_queue_0_packets: 7651168
tx_queue_0_bytes: 7823281566
tx_queue_0_restart: 4920
tx_queue_1_packets: 10195557
tx_queue_1_bytes: 12027522118
tx_queue_1_restart: 12718
rx_queue_0_packets: 15920618
rx_queue_0_bytes: 15612581951
rx_queue_0_csum_err: 76
(I've only done two runs since I re-enabled the interface)
---
> > lspci -s 03:00.0 -vvv
> > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > Connection (rev 03)
> > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > Stepping- SERR- FastB2B- DisINTx+
> > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin A routed to IRQ 57
> > IOMMU group: 20
> > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > Region 2: I/O ports at e000 [size=32]
> > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > Capabilities: [40] Power Management version 3
> > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > Address: 0000000000000000 Data: 0000
> > Masking: 00000000 Pending: 00000000
> > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > Vector table: BAR=3 offset=00000000
> > PBA: BAR=3 offset=00002000
> > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > L0s <2us, L1 <16us
> > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> PCIe wise the connection is going to be pretty tight in terms of
> bandwidth. It looks like we have 2.5GT/s with only a single lane of
> PCIe. In addition we are running with ASPM enabled so that means that
> if we don't have enough traffic we are shutting off the one PCIe lane
> we have so if we are getting bursty traffic that can get ugly.
Hmm... is there a way to force-disable ASPM via sysfs?
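(On kernels recent enough to expose the per-device ASPM attributes,
something like the following should work - the device address is taken
from the lspci output above:

  echo 0 > /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm

or, more globally, booting with pcie_aspm=off, or writing "performance"
to /sys/module/pcie_aspm/parameters/policy.)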
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-15 22:51 ` Ian Kumlien
@ 2020-07-15 23:41 ` Alexander Duyck
2020-07-15 23:59 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2020-07-15 23:41 UTC (permalink / raw)
To: Ian Kumlien
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Wed, Jul 15, 2020 at 3:51 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > > >
> > > > > > > > driver: igb
> > > > > > > > version: 5.6.0-k
> > > > > > > > firmware-version: 0. 6-1
> > > > > > > >
> > > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > > Connection (rev 03)
> > > > > > >
> > > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > > after a kernel upgrade? Compared to no NAT?
> > > > > >
> > > > > > It only happens on "internet links"
> > > > > >
> > > > > > Lets say that A is client with ibg driver, B is a firewall running NAT
> > > > > > with ixgbe drivers, C is another local node with igb and
> > > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > > >
> > > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > > >
> > > > > > A -> B -> D -- 32-40mbit
> > > > > >
> > > > > > B -> D 944 mbit
> > > > > > C -> D 944 mbit
> > > > > >
> > > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > > >
> > > > > This should of course be A' -> B -> D
> > > > >
> > > > > Sorry, I've been scratching my head for about a week...
> > > >
> > > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > > that A has somehow broken TCP offloads. Could you try disabling things
> > > > via ethtool -K and see if those settings make a difference?
> > >
> > > It's a bit hard since it works like this, turned tso off:
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> > >
> > > Continued running tests:
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> > >
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> > >
> > > And the low bandwidth continues with:
> > > ethtool -k enp3s0 |grep ": on"
> > > rx-vlan-offload: on
> > > tx-vlan-offload: on [requested off]
> > > highdma: on [fixed]
> > > rx-vlan-filter: on [fixed]
> > > tx-gre-segmentation: on
> > > tx-gre-csum-segmentation: on
> > > tx-ipxip4-segmentation: on
> > > tx-ipxip6-segmentation: on
> > > tx-udp_tnl-segmentation: on
> > > tx-udp_tnl-csum-segmentation: on
> > > tx-gso-partial: on
> > > tx-udp-segmentation: on
> > > hw-tc-offload: on
> > >
> > > Can't quite find how to turn those off since they aren't listed in
> > > ethtool (since the text is not what you use to enable/disable)
> >
> > To disable them you would just repeat the same string in the display
> > string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> > and that would turn off a large chunk of them as all the encapsulated
> > support requires gso partial support.
>
> ethtool -k enp3s0 |grep ": on"
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> ---
> And then back to back:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
> [ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
> [ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
> [ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
> [ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
> [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
> [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
> [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
> [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
> [ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
>
> and we're back at the not working bit:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
> [ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
> [ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> [ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> [ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> [ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> [ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> [ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
> [ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
> [ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
>
> > > I was hoping that you'd have a clue of something that might introduce
> > > a regression - ie specific patches to try to revert
> > >
> > > Btw, the same issue applies to UDP as well
> > >
> > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Jitter
> > > Lost/Total Datagrams
> > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > > 0/32584 (0%) sender
> > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > > 0/32573 (0%) receiver
> > >
> > > vs:
> > >
> > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Jitter
> > > Lost/Total Datagrams
> > > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > > 0/824530 (0%) sender
> > > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > > 4756/824530 (0.58%) receiver
> >
> > The fact that it is impacting UDP seems odd. I wonder if we don't have
> > a qdisc somewhere that is misbehaving and throttling the Tx. Either
> > that or I wonder if we are getting spammed with flow control frames.
>
> it sometimes works, it looks like the cwindow just isn't increased -
> that's where i started...
>
> Example:
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
> [ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
> [ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
> [ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
> [ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
> [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
> [ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
> [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
> [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
> [ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
>
> > It would be useful to include the output of just calling "ethtool
> > enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> > verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> > to output the statistics and dump anything that isn't zero.
>
> ethtool enp3s0
> Settings for enp3s0:
> Supported ports: [ TP ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Supported pause frame use: Symmetric
> Supports auto-negotiation: Yes
> Supported FEC modes: Not reported
> Advertised link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> 1000baseT/Full
> Advertised pause frame use: Symmetric
> Advertised auto-negotiation: Yes
> Advertised FEC modes: Not reported
> Speed: 1000Mb/s
> Duplex: Full
> Auto-negotiation: on
> Port: Twisted Pair
> PHYAD: 1
> Transceiver: internal
> MDI-X: off (auto)
> Supports Wake-on: pumbg
> Wake-on: g
> Current message level: 0x00000007 (7)
> drv probe link
> Link detected: yes
> ---
> ethtool -a enp3s0
> Pause parameters for enp3s0:
> Autonegotiate: on
> RX: on
> TX: off
> ---
> ethtool -S enp3s0 |grep -v :\ 0
> NIC statistics:
> rx_packets: 15920618
> tx_packets: 17846725
> rx_bytes: 15676264423
> tx_bytes: 19925010639
> rx_broadcast: 119553
> tx_broadcast: 497
> rx_multicast: 330193
> tx_multicast: 18190
> multicast: 330193
> rx_missed_errors: 270102
> rx_long_length_errors: 6
> tx_tcp_seg_good: 1342561
> rx_long_byte_count: 15676264423
> rx_errors: 6
> rx_length_errors: 6
> rx_fifo_errors: 270102
> tx_queue_0_packets: 7651168
> tx_queue_0_bytes: 7823281566
> tx_queue_0_restart: 4920
> tx_queue_1_packets: 10195557
> tx_queue_1_bytes: 12027522118
> tx_queue_1_restart: 12718
> rx_queue_0_packets: 15920618
> rx_queue_0_bytes: 15612581951
> rx_queue_0_csum_err: 76
> (I've only run two runs since i reenabled the interface)
So I am seeing three things here.
The rx_long_length_errors are usually due to an MTU mismatch. Do you
have something on the network that is using jumbo frames, or is the
MTU on the NIC set to something smaller than what is supported on the
network?
You are getting rx_missed_errors, that would seem to imply that the
DMA is not able to keep up. We may want to try disabling the L1 to see
if we get any boost from doing that.
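A minimal way to try that, assuming your kernel exposes the per-device
ASPM attributes (CONFIG_PCIEASPM) and the NIC really is at 0000:03:00.0
as in your lspci dump, would be something along these lines:
# check the current per-device L1 state (1 = enabled)
cat /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
# turn L1 off for just this device; takes effect immediately
echo 0 > /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
That is only a sketch; if the link/ attributes are missing, the kernel
was built without ASPM control and setpci would be the fallback.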
The last bit is that queue 0 is seeing packets with bad checksums. You
might want to run some tests and see where the bad checksums are
coming from. If they are being detected from a specific NIC such as
the ixgbe in your example it might point to some sort of checksum
error being created as a result of the NAT translation.
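One quick, rough way to narrow that down, assuming enp3s0 here and the
corresponding interface name on the NAT box, is to watch the checksum
counters on both ends while an iperf3 run is in flight:
# -d highlights counters that change between refreshes
watch -d "ethtool -S enp3s0 | grep -i csum"
If the errors only tick up while NATed traffic is flowing, that would
point at the translation path rather than the local link.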
> ---
>
> > > lspci -s 03:00.0 -vvv
> > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > Connection (rev 03)
> > > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > Stepping- SERR- FastB2B- DisINTx+
> > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > Latency: 0
> > > Interrupt: pin A routed to IRQ 57
> > > IOMMU group: 20
> > > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > > Region 2: I/O ports at e000 [size=32]
> > > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > > Capabilities: [40] Power Management version 3
> > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > > Address: 0000000000000000 Data: 0000
> > > Masking: 00000000 Pending: 00000000
> > > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > > Vector table: BAR=3 offset=00000000
> > > PBA: BAR=3 offset=00002000
> > > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > > L0s <2us, L1 <16us
> > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >
> > PCIe wise the connection is going to be pretty tight in terms of
> > bandwidth. It looks like we have 2.5GT/s with only a single lane of
> > PCIe. In addition we are running with ASPM enabled so that means that
> > if we don't have enough traffic we are shutting off the one PCIe lane
> > we have so if we are getting bursty traffic that can get ugly.
>
> Humm... is there a way to force disable ASPM in sysfs?
Actually the easiest way to do this is to just use setpci.
You should be able to dump the word containing the setting via:
# setpci -s 3:00.0 0xB0.w
0042
# setpci -s 3:00.0 0xB0.w=0040
Basically what you do is clear the lower 3 bits of the value so in
this case that means replacing the 2 with a 0 based on the output of
the first command.
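If you would rather not hard-code the 0xB0 offset (it is just the
Express capability at [a0] plus 0x10, i.e. the Link Control register),
a rough equivalent using setpci's named capabilities, assuming a
reasonably recent pciutils, would be:
# read Link Control, then write it back with the ASPM Control bits cleared
setpci -s 03:00.0 CAP_EXP+0x10.w
setpci -s 03:00.0 CAP_EXP+0x10.w=0040
The value to write should be whatever the first command printed, with
the low bits cleared, so 0040 is just the example value from above.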
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-15 23:41 ` Alexander Duyck
@ 2020-07-15 23:59 ` Ian Kumlien
2020-07-16 15:18 ` Alexander Duyck
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-15 23:59 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Thu, Jul 16, 2020 at 1:42 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Jul 15, 2020 at 3:51 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > > On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > > > >
> > > > > > > > > driver: igb
> > > > > > > > > version: 5.6.0-k
> > > > > > > > > firmware-version: 0. 6-1
> > > > > > > > >
> > > > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > > > Connection (rev 03)
> > > > > > > >
> > > > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > > > after a kernel upgrade? Compared to no NAT?
> > > > > > >
> > > > > > > It only happens on "internet links"
> > > > > > >
> > > > > > > Let's say that A is a client with the igb driver, B is a firewall running NAT
> > > > > > > with the ixgbe driver, C is another local node with igb, and
> > > > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > > > >
> > > > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > > > >
> > > > > > > A -> B -> D -- 32-40mbit
> > > > > > >
> > > > > > > B -> D 944 mbit
> > > > > > > C -> D 944 mbit
> > > > > > >
> > > > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > > > >
> > > > > > This should of course be A' -> B -> D
> > > > > >
> > > > > > Sorry, I've been scratching my head for about a week...
> > > > >
> > > > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > > > that A has somehow broken TCP offloads. Could you try disabling things
> > > > > via ethtool -K and see if those settings make a difference?
> > > >
> > > > It's a bit hard since it works like this, turned tso off:
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > > > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > > > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > > > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> > > >
> > > > Continued running tests:
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > > > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > > > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > > > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > > > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > > > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > > > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > > > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > > > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > > > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > > > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> > > >
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > > > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > > > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > > > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > > > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > > > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > > > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > > > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > > > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> > > >
> > > > And the low bandwidth continues with:
> > > > ethtool -k enp3s0 |grep ": on"
> > > > rx-vlan-offload: on
> > > > tx-vlan-offload: on [requested off]
> > > > highdma: on [fixed]
> > > > rx-vlan-filter: on [fixed]
> > > > tx-gre-segmentation: on
> > > > tx-gre-csum-segmentation: on
> > > > tx-ipxip4-segmentation: on
> > > > tx-ipxip6-segmentation: on
> > > > tx-udp_tnl-segmentation: on
> > > > tx-udp_tnl-csum-segmentation: on
> > > > tx-gso-partial: on
> > > > tx-udp-segmentation: on
> > > > hw-tc-offload: on
> > > >
> > > > Can't quite find how to turn those off since they aren't listed in
> > > > ethtool (since the text is not what you use to enable/disable)
> > >
> > > To disable them you would just repeat the same string in the display
> > > string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> > > and that would turn off a large chunk of them as all the encapsulated
> > > support requires gso partial support.
> >
> > ethtool -k enp3s0 |grep ": on"
> > highdma: on [fixed]
> > rx-vlan-filter: on [fixed]
> > ---
> > And then back to back:
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
> > [ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
> > [ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
> > [ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
> > [ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
> > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
> > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
> > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
> > [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
> > [ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
> >
> > and we're back at the not working bit:
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
> > [ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
> > [ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > [ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > [ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > [ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > [ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > [ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
> > [ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
> > [ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
> >
> > > > I was hoping that you'd have a clue of something that might introduce
> > > > a regression - ie specific patches to try to revert
> > > >
> > > > Btw, the same issue applies to UDP as well
> > > >
> > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > > > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > > > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > > > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > > > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > > > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > > > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > > > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > > > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > > > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Jitter
> > > > Lost/Total Datagrams
> > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > > > 0/32584 (0%) sender
> > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > > > 0/32573 (0%) receiver
> > > >
> > > > vs:
> > > >
> > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > > > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > > > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > > > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > > > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > > > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > > > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > > > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > > > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > > > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Jitter
> > > > Lost/Total Datagrams
> > > > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > > > 0/824530 (0%) sender
> > > > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > > > 4756/824530 (0.58%) receiver
> > >
> > > The fact that it is impacting UDP seems odd. I wonder if we don't have
> > > a qdisc somewhere that is misbehaving and throttling the Tx. Either
> > > that or I wonder if we are getting spammed with flow control frames.
> >
> > it sometimes works, it looks like the cwindow just isn't increased -
> > that's where i started...
> >
> > Example:
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
> > [ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
> > [ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
> > [ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
> > [ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
> > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
> > [ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
> > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
> > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
> > [ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
> >
> > > It would be useful to include the output of just calling "ethtool
> > > enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> > > verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> > > to output the statistics and dump anything that isn't zero.
> >
> > ethtool enp3s0
> > Settings for enp3s0:
> > Supported ports: [ TP ]
> > Supported link modes: 10baseT/Half 10baseT/Full
> > 100baseT/Half 100baseT/Full
> > 1000baseT/Full
> > Supported pause frame use: Symmetric
> > Supports auto-negotiation: Yes
> > Supported FEC modes: Not reported
> > Advertised link modes: 10baseT/Half 10baseT/Full
> > 100baseT/Half 100baseT/Full
> > 1000baseT/Full
> > Advertised pause frame use: Symmetric
> > Advertised auto-negotiation: Yes
> > Advertised FEC modes: Not reported
> > Speed: 1000Mb/s
> > Duplex: Full
> > Auto-negotiation: on
> > Port: Twisted Pair
> > PHYAD: 1
> > Transceiver: internal
> > MDI-X: off (auto)
> > Supports Wake-on: pumbg
> > Wake-on: g
> > Current message level: 0x00000007 (7)
> > drv probe link
> > Link detected: yes
> > ---
> > ethtool -a enp3s0
> > Pause parameters for enp3s0:
> > Autonegotiate: on
> > RX: on
> > TX: off
> > ---
> > ethtool -S enp3s0 |grep -v :\ 0
> > NIC statistics:
> > rx_packets: 15920618
> > tx_packets: 17846725
> > rx_bytes: 15676264423
> > tx_bytes: 19925010639
> > rx_broadcast: 119553
> > tx_broadcast: 497
> > rx_multicast: 330193
> > tx_multicast: 18190
> > multicast: 330193
> > rx_missed_errors: 270102
> > rx_long_length_errors: 6
> > tx_tcp_seg_good: 1342561
> > rx_long_byte_count: 15676264423
> > rx_errors: 6
> > rx_length_errors: 6
> > rx_fifo_errors: 270102
> > tx_queue_0_packets: 7651168
> > tx_queue_0_bytes: 7823281566
> > tx_queue_0_restart: 4920
> > tx_queue_1_packets: 10195557
> > tx_queue_1_bytes: 12027522118
> > tx_queue_1_restart: 12718
> > rx_queue_0_packets: 15920618
> > rx_queue_0_bytes: 15612581951
> > rx_queue_0_csum_err: 76
> > (I've only run two runs since i reenabled the interface)
>
> So I am seeing three things here.
>
> The rx_long_length_errors are usually due to an MTU mismatch. Do you
> have something on the network that is using jumbo frames, or is the
> MTU on the NIC set to something smaller than what is supported on the
> network?
I'm using jumbo frames on the local network; the internet side is the
normal 1500-byte MTU though
> You are getting rx_missed_errors, that would seem to imply that the
> DMA is not able to keep up. We may want to try disabling the L1 to see
> if we get any boost from doing that.
It used to work, I don't do benchmarks all the time and sometimes the first
benchmarks turn out fine... so it's hard to say when this started happening...
It could also be related to a bios upgrade, but I'm pretty sure I did
successful benchmarks after that...
How do I disable L1? Just echo 0 >
/sys/bus/pci/drivers/igb/0000\:03\:00.0/link/l1_aspm ?
> The last bit is that queue 0 is seeing packets with bad checksums. You
> might want to run some tests and see where the bad checksums are
> coming from. If they are being detected from a specific NIC such as
> the ixgbe in your example it might point to some sort of checksum
> error being created as a result of the NAT translation.
But that should also affect A' and the A -> B -> C case, which it doesn't...
It only seems to happen with higher rtt (6 hops, sub 3 ms in this case
but still high enough somehow)
> > ---
> >
> > > > lspci -s 03:00.0 -vvv
> > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > Connection (rev 03)
> > > > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > > Stepping- SERR- FastB2B- DisINTx+
> > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > Latency: 0
> > > > Interrupt: pin A routed to IRQ 57
> > > > IOMMU group: 20
> > > > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > > > Region 2: I/O ports at e000 [size=32]
> > > > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > > > Capabilities: [40] Power Management version 3
> > > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > > > Address: 0000000000000000 Data: 0000
> > > > Masking: 00000000 Pending: 00000000
> > > > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > > > Vector table: BAR=3 offset=00000000
> > > > PBA: BAR=3 offset=00002000
> > > > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > > > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > > > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > > > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > > > L0s <2us, L1 <16us
> > > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > > > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > > > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > >
> > > PCIe wise the connection is going to be pretty tight in terms of
> > > bandwidth. It looks like we have 2.5GT/s with only a single lane of
> > > PCIe. In addition we are running with ASPM enabled so that means that
> > > if we don't have enough traffic we are shutting off the one PCIe lane
> > > we have so if we are getting bursty traffic that can get ugly.
> >
> > Humm... is there a way to force disable ASPM in sysfs?
>
> Actually the easiest way to do this is to just use setpci.
>
> You should be able to dump the word containing the setting via:
> # setpci -s 3:00.0 0xB0.w
> 0042
> # setpci -s 3:00.0 0xB0.w=0040
>
> Basically what you do is clear the lower 3 bits of the value so in
> this case that means replacing the 2 with a 0 based on the output of
> the first command.
Well... I'll be damned... I used to force enable ASPM... this must be
related to the change in PCIe bus ASPM
Perhaps disable ASPM if there is only one link?
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 113 MBytes 950 Mbits/sec 31 710 KBytes
[ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 135 626 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 18 713 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 798 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 0 721 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 31 800 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 730 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 19 730 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 12 701 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 937 Mbits/sec 246 sender
[ 5] 0.00-10.01 sec 1.09 GBytes 933 Mbits/sec receiver
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 749 KBytes
[ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 30 766 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 7 749 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 8 699 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 1 953 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 26 707 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 2 1.07 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 87 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 953 Mbits/sec 16 908 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 944 Mbits/sec 0 693 KBytes
[ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 0 713 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 687 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 15 687 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 2 888 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 17 696 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 758 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 31 749 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 792 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 81 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 747 KBytes
[ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 0 744 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 12 1.18 MBytes
[ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
[ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 28 699 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 684 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 741 KBytes
[ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 3 687 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 22 699 KBytes
[ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 78 sender
[ 5] 0.00-10.01 sec 1.09 GBytes 934 Mbits/sec receiver
---
ethtool -S enp3s0 |grep -v ": 0"
NIC statistics:
rx_packets: 16303520
tx_packets: 21602840
rx_bytes: 15711958157
tx_bytes: 25599009212
rx_broadcast: 122212
tx_broadcast: 530
rx_multicast: 333489
tx_multicast: 18446
multicast: 333489
rx_missed_errors: 270143
rx_long_length_errors: 6
tx_tcp_seg_good: 1342561
rx_long_byte_count: 15711958157
rx_errors: 6
rx_length_errors: 6
rx_fifo_errors: 270143
tx_queue_0_packets: 8963830
tx_queue_0_bytes: 9803196683
tx_queue_0_restart: 4920
tx_queue_1_packets: 12639010
tx_queue_1_bytes: 15706576814
tx_queue_1_restart: 12718
rx_queue_0_packets: 16303520
rx_queue_0_bytes: 15646744077
rx_queue_0_csum_err: 76
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-15 23:59 ` Ian Kumlien
@ 2020-07-16 15:18 ` Alexander Duyck
2020-07-16 19:47 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2020-07-16 15:18 UTC (permalink / raw)
To: Ian Kumlien
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Wed, Jul 15, 2020 at 5:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Thu, Jul 16, 2020 at 1:42 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On Wed, Jul 15, 2020 at 3:51 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > > On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > > > > >
> > > > > > > > > > driver: igb
> > > > > > > > > > version: 5.6.0-k
> > > > > > > > > > firmware-version: 0. 6-1
> > > > > > > > > >
> > > > > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > > > > Connection (rev 03)
> > > > > > > > >
> > > > > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > > > > after a kernel upgrade? Compared to no NAT?
> > > > > > > >
> > > > > > > > It only happens on "internet links"
> > > > > > > >
> > > > > > > > Let's say that A is a client with the igb driver, B is a firewall running NAT
> > > > > > > > with the ixgbe driver, C is another local node with igb, and
> > > > > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > > > > >
> > > > > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > > > > >
> > > > > > > > A -> B -> D -- 32-40mbit
> > > > > > > >
> > > > > > > > B -> D 944 mbit
> > > > > > > > C -> D 944 mbit
> > > > > > > >
> > > > > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > > > > >
> > > > > > > This should of course be A' -> B -> D
> > > > > > >
> > > > > > > Sorry, I've been scratching my head for about a week...
> > > > > >
> > > > > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > > > > that A has somehow broken TCP offloads. Could you try disabling things
> > > > > > via ethtool -K and see if those settings make a difference?
> > > > >
> > > > > It's a bit hard since it works like this, turned tso off:
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > > > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > > > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > > > > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > > > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > > > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > > > > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> > > > >
> > > > > Continued running tests:
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > > > > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > > > > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > > > > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > > > > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > > > > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > > > > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > > > > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > > > > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > > > > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > > > > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> > > > >
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > > > > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > > > > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > > > > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > > > > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > > > > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > > > > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > > > > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > > > > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> > > > >
> > > > > And the low bandwidth continues with:
> > > > > ethtool -k enp3s0 |grep ": on"
> > > > > rx-vlan-offload: on
> > > > > tx-vlan-offload: on [requested off]
> > > > > highdma: on [fixed]
> > > > > rx-vlan-filter: on [fixed]
> > > > > tx-gre-segmentation: on
> > > > > tx-gre-csum-segmentation: on
> > > > > tx-ipxip4-segmentation: on
> > > > > tx-ipxip6-segmentation: on
> > > > > tx-udp_tnl-segmentation: on
> > > > > tx-udp_tnl-csum-segmentation: on
> > > > > tx-gso-partial: on
> > > > > tx-udp-segmentation: on
> > > > > hw-tc-offload: on
> > > > >
> > > > > Can't quite find how to turn those off since they aren't listed in
> > > > > ethtool (since the text is not what you use to enable/disable)
> > > >
> > > > To disable them you would just repeat the same string in the display
> > > > string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> > > > and that would turn off a large chunk of them as all the encapsulated
> > > > support requires gso partial support.
> > >
> > > ethtool -k enp3s0 |grep ": on"
> > > highdma: on [fixed]
> > > rx-vlan-filter: on [fixed]
> > > ---
> > > And then back to back:
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
> > > [ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
> > > [ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
> > > [ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
> > > [ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
> > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
> > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
> > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
> > > [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
> > > [ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
> > >
> > > and we're back at the not working bit:
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
> > > [ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
> > > [ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > [ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > [ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > [ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > [ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > [ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
> > > [ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
> > > [ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
> > >
> > > > > I was hoping that you'd have a clue of something that might introduce
> > > > > a regression - ie specific patches to try to revert
> > > > >
> > > > > Btw, the same issue applies to UDP as well
> > > > >
> > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > > > > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > > > > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > > > > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > > > > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > > > > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > > > > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > > > > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > > > > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > > > > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > Lost/Total Datagrams
> > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > > > > 0/32584 (0%) sender
> > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > > > > 0/32573 (0%) receiver
> > > > >
> > > > > vs:
> > > > >
> > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > > > > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > > > > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > > > > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > > > > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > > > > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > > > > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > > > > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > > > > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > > > > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > Lost/Total Datagrams
> > > > > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > > > > 0/824530 (0%) sender
> > > > > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > > > > 4756/824530 (0.58%) receiver
> > > >
> > > > The fact that it is impacting UDP seems odd. I wonder if we don't have
> > > > a qdisc somewhere that is misbehaving and throttling the Tx. Either
> > > > that or I wonder if we are getting spammed with flow control frames.
> > >
> > > it sometimes works, it looks like the cwindow just isn't increased -
> > > that's where i started...
> > >
> > > Example:
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
> > > [ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
> > > [ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
> > > [ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
> > > [ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
> > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
> > > [ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
> > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
> > > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
> > > [ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
> > >
> > > > It would be useful to include the output of just calling "ethtool
> > > > enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> > > > verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> > > > to output the statistics and dump anything that isn't zero.
> > >
> > > ethtool enp3s0
> > > Settings for enp3s0:
> > > Supported ports: [ TP ]
> > > Supported link modes: 10baseT/Half 10baseT/Full
> > > 100baseT/Half 100baseT/Full
> > > 1000baseT/Full
> > > Supported pause frame use: Symmetric
> > > Supports auto-negotiation: Yes
> > > Supported FEC modes: Not reported
> > > Advertised link modes: 10baseT/Half 10baseT/Full
> > > 100baseT/Half 100baseT/Full
> > > 1000baseT/Full
> > > Advertised pause frame use: Symmetric
> > > Advertised auto-negotiation: Yes
> > > Advertised FEC modes: Not reported
> > > Speed: 1000Mb/s
> > > Duplex: Full
> > > Auto-negotiation: on
> > > Port: Twisted Pair
> > > PHYAD: 1
> > > Transceiver: internal
> > > MDI-X: off (auto)
> > > Supports Wake-on: pumbg
> > > Wake-on: g
> > > Current message level: 0x00000007 (7)
> > > drv probe link
> > > Link detected: yes
> > > ---
> > > ethtool -a enp3s0
> > > Pause parameters for enp3s0:
> > > Autonegotiate: on
> > > RX: on
> > > TX: off
> > > ---
> > > ethtool -S enp3s0 |grep -v :\ 0
> > > NIC statistics:
> > > rx_packets: 15920618
> > > tx_packets: 17846725
> > > rx_bytes: 15676264423
> > > tx_bytes: 19925010639
> > > rx_broadcast: 119553
> > > tx_broadcast: 497
> > > rx_multicast: 330193
> > > tx_multicast: 18190
> > > multicast: 330193
> > > rx_missed_errors: 270102
> > > rx_long_length_errors: 6
> > > tx_tcp_seg_good: 1342561
> > > rx_long_byte_count: 15676264423
> > > rx_errors: 6
> > > rx_length_errors: 6
> > > rx_fifo_errors: 270102
> > > tx_queue_0_packets: 7651168
> > > tx_queue_0_bytes: 7823281566
> > > tx_queue_0_restart: 4920
> > > tx_queue_1_packets: 10195557
> > > tx_queue_1_bytes: 12027522118
> > > tx_queue_1_restart: 12718
> > > rx_queue_0_packets: 15920618
> > > rx_queue_0_bytes: 15612581951
> > > rx_queue_0_csum_err: 76
> > > (I've only run two runs since i reenabled the interface)
> >
> > So I am seeing three things here.
> >
> > The rx_long_length_errors are usually due to an MTU mismatch. Do you
> > have something on the network that is using jumbo frames, or is the
> > MTU on the NIC set to something smaller than what is supported on the
> > network?
>
> I'm using jumbo frames on the local network; the internet side is the
> normal 1500-byte MTU though
>
> > You are getting rx_missed_errors, that would seem to imply that the
> > DMA is not able to keep up. We may want to try disabling the L1 to see
> > if we get any boost from doing that.
>
> It used to work, I don't do benchmarks all the time and sometimes the first
> benchmarks turn out fine... so it's hard to say when this started happening...
>
> It could also be related to a bios upgrade, but I'm pretty sure I did
> successful benchmarks after that...
>
> How do I disable L1? Just echo 0 >
> /sys/bus/pci/drivers/igb/0000\:03\:00.0/link/l1_aspm ?
>
> > The last bit is that queue 0 is seeing packets with bad checksums. You
> > might want to run some tests and see where the bad checksums are
> > coming from. If they are being detected from a specific NIC such as
> > the ixgbe in your example it might point to some sort of checksum
> > error being created as a result of the NAT translation.
>
> But that should also affect A' and the A -> B -> C case, which it doesn't...
>
> It only seems to happen with higher rtt (6 hops, sub 3 ms in this case
> but still high enough somehow)
>
> > > ---
> > >
> > > > > lspci -s 03:00.0 -vvv
> > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > Connection (rev 03)
> > > > > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > > > Stepping- SERR- FastB2B- DisINTx+
> > > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > Latency: 0
> > > > > Interrupt: pin A routed to IRQ 57
> > > > > IOMMU group: 20
> > > > > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > > > > Region 2: I/O ports at e000 [size=32]
> > > > > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > > > > Capabilities: [40] Power Management version 3
> > > > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > > > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > > > > Address: 0000000000000000 Data: 0000
> > > > > Masking: 00000000 Pending: 00000000
> > > > > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > > > > Vector table: BAR=3 offset=00000000
> > > > > PBA: BAR=3 offset=00002000
> > > > > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > > > > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > > > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > > > > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > > > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > > > > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > > > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > > > > L0s <2us, L1 <16us
> > > > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > > > > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > > > > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > >
> > > > PCIe wise the connection is going to be pretty tight in terms of
> > > > bandwidth. It looks like we have 2.5GT/s with only a single lane of
> > > > PCIe. In addition we are running with ASPM enabled so that means that
> > > > if we don't have enough traffic we are shutting off the one PCIe lane
> > > > we have so if we are getting bursty traffic that can get ugly.
> > >
> > > Humm... is there a way to force disable ASPM in sysfs?
> >
> > Actually the easiest way to do this is to just use setpci.
> >
> > You should be able to dump the word containing the setting via:
> > # setpci -s 3:00.0 0xB0.w
> > 0042
> > # setpci -s 3:00.0 0xB0.w=0040
> >
> > Basically what you do is clear the lower 3 bits of the value so in
> > this case that means replacing the 2 with a 0 based on the output of
> > the first command.
>
> Well... I'll be damned... I used to force enable ASPM... this must be
> related to the change in PCIe bus ASPM
> Perhaps disable ASPM if there is only one link?
Is there any specific reason why you are enabling ASPM? Is this system
a laptop where you are trying to conserve power when on battery? If
not, disabling it probably won't hurt things too much, since the power
consumption for a 2.5GT/s link operating at a width of x1 shouldn't
be too high. Otherwise you are likely going to keep paying the price
of getting the interface out of L1 whenever the traffic goes idle, so
bursty flows will pay a heavy penalty when they start dropping packets.
It is also possible this could be something that changed with the
physical PCIe link. Basically L1 works by powering down the link when
idle, and then powering it back up when there is activity. The problem
is bringing it back up can sometimes be a challenge when the physical
link starts to go faulty. I know I have seen that in some cases it can
even result in the device falling off of the PCIe bus if the link
training fails.
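A crude way to check for that, assuming the platform actually latches
correctable errors, is to re-dump the device and link status after one
of the slow runs:
# look for CorrErr+ in DevSta or anything unexpected in LnkSta
lspci -s 03:00.0 -vvv | grep -E 'DevSta|LnkSta'
If DevSta starts reporting CorrErr+ after traffic, that would point
more at the physical link than at the driver.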
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 113 MBytes 950 Mbits/sec 31 710 KBytes
> [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 135 626 KBytes
> [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 18 713 KBytes
> [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 798 KBytes
> [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 0 721 KBytes
> [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 31 800 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 730 KBytes
> [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 19 730 KBytes
> [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 12 701 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.09 GBytes 937 Mbits/sec 246 sender
> [ 5] 0.00-10.01 sec 1.09 GBytes 933 Mbits/sec receiver
>
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 749 KBytes
> [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 30 766 KBytes
> [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 7 749 KBytes
> [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 8 699 KBytes
> [ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 1 953 KBytes
> [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 26 707 KBytes
> [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 2 1.07 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 87 sender
> [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
>
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 114 MBytes 953 Mbits/sec 16 908 KBytes
> [ 5] 1.00-2.00 sec 112 MBytes 944 Mbits/sec 0 693 KBytes
> [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 0 713 KBytes
> [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 687 KBytes
> [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 15 687 KBytes
> [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 2 888 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 17 696 KBytes
> [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 758 KBytes
> [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 31 749 KBytes
> [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 792 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 81 sender
> [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
>
> [ ID] Interval Transfer Bitrate Retr Cwnd
> [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 747 KBytes
> [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 0 744 KBytes
> [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 12 1.18 MBytes
> [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 28 699 KBytes
> [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 684 KBytes
> [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 741 KBytes
> [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 3 687 KBytes
> [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 22 699 KBytes
> [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 78 sender
> [ 5] 0.00-10.01 sec 1.09 GBytes 934 Mbits/sec receiver
> ---
>
> ethtool -S enp3s0 |grep -v ": 0"
> NIC statistics:
> rx_packets: 16303520
> tx_packets: 21602840
> rx_bytes: 15711958157
> tx_bytes: 25599009212
> rx_broadcast: 122212
> tx_broadcast: 530
> rx_multicast: 333489
> tx_multicast: 18446
> multicast: 333489
> rx_missed_errors: 270143
> rx_long_length_errors: 6
> tx_tcp_seg_good: 1342561
> rx_long_byte_count: 15711958157
> rx_errors: 6
> rx_length_errors: 6
> rx_fifo_errors: 270143
> tx_queue_0_packets: 8963830
> tx_queue_0_bytes: 9803196683
> tx_queue_0_restart: 4920
> tx_queue_1_packets: 12639010
> tx_queue_1_bytes: 15706576814
> tx_queue_1_restart: 12718
> rx_queue_0_packets: 16303520
> rx_queue_0_bytes: 15646744077
> rx_queue_0_csum_err: 76
Okay, so this result still shows the same length and checksum errors.
Were you resetting the system/statistics between runs?
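For what it's worth, ethtool has no "reset statistics" knob as far as I
know; the usual way to start the counters from zero is to reload the
driver, which will drop the link, so do it out-of-band:
# clears the software-accumulated ethtool counters along with driver state
modprobe -r igb && modprobe igb
That is just a sketch, assuming nothing else on the box depends on the
igb module staying loaded.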
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-16 15:18 ` Alexander Duyck
@ 2020-07-16 19:47 ` Ian Kumlien
2020-07-17 0:09 ` Alexander Duyck
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-16 19:47 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
Sorry, I tried to respond via the phone and used the web browser
version, but it still ended up as HTML mail... :/
On Thu, Jul 16, 2020 at 5:18 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Wed, Jul 15, 2020 at 5:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Thu, Jul 16, 2020 at 1:42 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > > On Wed, Jul 15, 2020 at 3:51 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> > > > > On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > > > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > > > > > >
> > > > > > > > > > > driver: igb
> > > > > > > > > > > version: 5.6.0-k
> > > > > > > > > > > firmware-version: 0. 6-1
> > > > > > > > > > >
> > > > > > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > > > > > Connection (rev 03)
> > > > > > > > > >
> > > > > > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > > > > > after a kernel upgrade? Compared to no NAT?
> > > > > > > > >
> > > > > > > > > It only happens on "internet links"
> > > > > > > > >
> > > > > > > > > Lets say that A is client with igb driver, B is a firewall running NAT
> > > > > > > > > with ixgbe drivers, C is another local node with igb and
> > > > > > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > > > > > >
> > > > > > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > > > > > >
> > > > > > > > > A -> B -> D -- 32-40mbit
> > > > > > > > >
> > > > > > > > > B -> D 944 mbit
> > > > > > > > > C -> D 944 mbit
> > > > > > > > >
> > > > > > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > > > > > >
> > > > > > > > This should of course be A' -> B -> D
> > > > > > > >
> > > > > > > > Sorry, I've been scratching my head for about a week...
> > > > > > >
> > > > > > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > > > > > that A has somehow broken TCP offloads. Could you try disabling things
> > > > > > > via ethtool -K and see if those settings make a difference?
> > > > > >
> > > > > > It's a bit hard since it works like this, turned tso off:
> > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > > > > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > > > > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > > > > > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > > > > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > > > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > > > > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > > > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > > > > > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> > > > > >
> > > > > > Continued running tests:
> > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > > > > > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > > > > > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > > > > > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > > > > > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > > > > > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > > > > > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > > > > > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > > > > > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > > > > > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > > > > > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> > > > > >
> > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > > > > > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > > > > > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > > > > > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > > > > > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > > > > > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > > > > > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > > > > > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > > > > > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> > > > > >
> > > > > > And the low bandwidth continues with:
> > > > > > ethtool -k enp3s0 |grep ": on"
> > > > > > rx-vlan-offload: on
> > > > > > tx-vlan-offload: on [requested off]
> > > > > > highdma: on [fixed]
> > > > > > rx-vlan-filter: on [fixed]
> > > > > > tx-gre-segmentation: on
> > > > > > tx-gre-csum-segmentation: on
> > > > > > tx-ipxip4-segmentation: on
> > > > > > tx-ipxip6-segmentation: on
> > > > > > tx-udp_tnl-segmentation: on
> > > > > > tx-udp_tnl-csum-segmentation: on
> > > > > > tx-gso-partial: on
> > > > > > tx-udp-segmentation: on
> > > > > > hw-tc-offload: on
> > > > > >
> > > > > > Can't quite find how to turn those off since they aren't listed in
> > > > > > ethtool (since the text is not what you use to enable/disable)
> > > > >
> > > > > To disable them you would just repeat the same string in the display
> > > > > string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> > > > > and that would turn off a large chunk of them as all the encapsulated
> > > > > support requires gso partial support.
> > > >
> > > > ethtool -k enp3s0 |grep ": on"
> > > > highdma: on [fixed]
> > > > rx-vlan-filter: on [fixed]
> > > > ---
> > > > And then back to back:
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
> > > > [ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
> > > > [ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
> > > > [ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
> > > > [ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
> > > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
> > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
> > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
> > > > [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
> > > > [ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
> > > >
> > > > and we're back at the not working bit:
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > > [ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > > [ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
> > > > [ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
> > > > [ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
> > > >
> > > > > > I was hoping that you'd have a clue of something that might introduce
> > > > > > a regression - ie specific patches to try to revert
> > > > > >
> > > > > > Btw, the same issue applies to udp as well
> > > > > >
> > > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > > > > > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > > > > > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > > > > > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > > > > > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > > > > > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > > > > > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > > > > > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > > > > > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > > > > > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > > Lost/Total Datagrams
> > > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > > > > > 0/32584 (0%) sender
> > > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > > > > > 0/32573 (0%) receiver
> > > > > >
> > > > > > vs:
> > > > > >
> > > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > > > > > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > > > > > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > > > > > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > > > > > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > > > > > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > > > > > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > > > > > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > > > > > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > > > > > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > > Lost/Total Datagrams
> > > > > > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > > > > > 0/824530 (0%) sender
> > > > > > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > > > > > 4756/824530 (0.58%) receiver
> > > > >
> > > > > The fact that it is impacting UDP seems odd. I wonder if we don't have
> > > > > a qdisc somewhere that is misbehaving and throttling the Tx. Either
> > > > > that or I wonder if we are getting spammed with flow control frames.
> > > >
> > > > it sometimes works, it looks like the cwindow just isn't increased -
> > > > that's where i started...
> > > >
> > > > Example:
> > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > [ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
> > > > [ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
> > > > [ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
> > > > [ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
> > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
> > > > [ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
> > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
> > > > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval Transfer Bitrate Retr
> > > > [ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
> > > > [ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
> > > >
> > > > > It would be useful to include the output of just calling "ethtool
> > > > > enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> > > > > verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> > > > > to output the statistics and dump anything that isn't zero.
> > > >
> > > > ethtool enp3s0
> > > > Settings for enp3s0:
> > > > Supported ports: [ TP ]
> > > > Supported link modes: 10baseT/Half 10baseT/Full
> > > > 100baseT/Half 100baseT/Full
> > > > 1000baseT/Full
> > > > Supported pause frame use: Symmetric
> > > > Supports auto-negotiation: Yes
> > > > Supported FEC modes: Not reported
> > > > Advertised link modes: 10baseT/Half 10baseT/Full
> > > > 100baseT/Half 100baseT/Full
> > > > 1000baseT/Full
> > > > Advertised pause frame use: Symmetric
> > > > Advertised auto-negotiation: Yes
> > > > Advertised FEC modes: Not reported
> > > > Speed: 1000Mb/s
> > > > Duplex: Full
> > > > Auto-negotiation: on
> > > > Port: Twisted Pair
> > > > PHYAD: 1
> > > > Transceiver: internal
> > > > MDI-X: off (auto)
> > > > Supports Wake-on: pumbg
> > > > Wake-on: g
> > > > Current message level: 0x00000007 (7)
> > > > drv probe link
> > > > Link detected: yes
> > > > ---
> > > > ethtool -a enp3s0
> > > > Pause parameters for enp3s0:
> > > > Autonegotiate: on
> > > > RX: on
> > > > TX: off
> > > > ---
> > > > ethtool -S enp3s0 |grep -v :\ 0
> > > > NIC statistics:
> > > > rx_packets: 15920618
> > > > tx_packets: 17846725
> > > > rx_bytes: 15676264423
> > > > tx_bytes: 19925010639
> > > > rx_broadcast: 119553
> > > > tx_broadcast: 497
> > > > rx_multicast: 330193
> > > > tx_multicast: 18190
> > > > multicast: 330193
> > > > rx_missed_errors: 270102
> > > > rx_long_length_errors: 6
> > > > tx_tcp_seg_good: 1342561
> > > > rx_long_byte_count: 15676264423
> > > > rx_errors: 6
> > > > rx_length_errors: 6
> > > > rx_fifo_errors: 270102
> > > > tx_queue_0_packets: 7651168
> > > > tx_queue_0_bytes: 7823281566
> > > > tx_queue_0_restart: 4920
> > > > tx_queue_1_packets: 10195557
> > > > tx_queue_1_bytes: 12027522118
> > > > tx_queue_1_restart: 12718
> > > > rx_queue_0_packets: 15920618
> > > > rx_queue_0_bytes: 15612581951
> > > > rx_queue_0_csum_err: 76
> > > > (I've only run two runs since i reenabled the interface)
> > >
> > > So I am seeing three things here.
> > >
> > > The rx_long_length_errors are usually due to an MTU mismatch. Do you
> > > have something on the network that is using jumbo frames, or is the
> > > MTU on the NIC set to something smaller than what is supported on the
> > > network?
> >
> > I'm using jumbo frames on the local network, internet side is the
> > normal 1500 bytes mtu though
> >
> > > You are getting rx_missed_errors, that would seem to imply that the
> > > DMA is not able to keep up. We may want to try disabling the L1 to see
> > > if we get any boost from doing that.
> >
> > It used to work, I don't do benchmarks all the time and sometimes the first
> > benchmarks turn out fine... so it's hard to say when this started happening...
> >
> > It could also be related to a bios upgrade, but I'm pretty sure I did
> > successful benchmarks after that...
> >
> > How do I disable the l1? just echo 0 >
> > /sys/bus/pci/drivers/igb/0000\:03\:00.0/link/l1_aspm ?
> >
> > > The last bit is that queue 0 is seeing packets with bad checksums. You
> > > might want to run some tests and see where the bad checksums are
> > > coming from. If they are being detected from a specific NIC such as
> > > the ixgbe in your example it might point to some sort of checksum
> > > error being created as a result of the NAT translation.
> >
> > But that should also affect A' and the A -> B -> C case, which it doesn't...
> >
> > It only seems to happen with higher rtt (6 hops, sub 3 ms in this case
> > but still high enough somehow)
> >
> > > > ---
> > > >
> > > > > > lspci -s 03:00.0 -vvv
> > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > Connection (rev 03)
> > > > > > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > > > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > > > > Stepping- SERR- FastB2B- DisINTx+
> > > > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > > Latency: 0
> > > > > > Interrupt: pin A routed to IRQ 57
> > > > > > IOMMU group: 20
> > > > > > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > > > > > Region 2: I/O ports at e000 [size=32]
> > > > > > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > > > > > Capabilities: [40] Power Management version 3
> > > > > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > > > > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > > > > > Address: 0000000000000000 Data: 0000
> > > > > > Masking: 00000000 Pending: 00000000
> > > > > > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > > > > > Vector table: BAR=3 offset=00000000
> > > > > > PBA: BAR=3 offset=00002000
> > > > > > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > > > > > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > > > > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > > > > > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > > > > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > > > > > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > > > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > > > > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > > > > > L0s <2us, L1 <16us
> > > > > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > > > > > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > > > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > > > > > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > >
> > > > > PCIe wise the connection is going to be pretty tight in terms of
> > > > > bandwidth. It looks like we have 2.5GT/s with only a single lane of
> > > > > PCIe. In addition we are running with ASPM enabled so that means that
> > > > > if we don't have enough traffic we are shutting off the one PCIe lane
> > > > > we have so if we are getting bursty traffic that can get ugly.
> > > >
> > > > Humm... is there a way to force disable ASPM in sysfs?
> > >
> > > Actually the easiest way to do this is to just use setpci.
> > >
> > > You should be able to dump the word containing the setting via:
> > > # setpci -s 3:00.0 0xB0.w
> > > 0042
> > > # setpci -s 3:00.0 0xB0.w=0040
> > >
> > > Basically what you do is clear the lower 3 bits of the value so in
> > > this case that means replacing the 2 with a 0 based on the output of
> > > the first command.
> >
> > Well... I'll be damned... I used to force enable ASPM... this must be
> > related to the change in PCIe bus ASPM
> > Perhaps disable ASPM if there is only one link?
>
> Is there any specific reason why you are enabling ASPM? Is this system
> a laptop where you are trying to conserve power when on battery? If
> not disabling it probably won't hurt things too much since the power
> consumption for a 2.5GT/s link operating in a width of one shouldn't
> be too high. Otherwise you are likely going to end up paying the
> price for getting the interface out of L1 when the traffic goes idle
> so you are going to see flows that get bursty paying a heavy penalty
> when they start dropping packets.
Ah, you misunderstand, I used to do this and everything worked - now
Linux enables ASPM by default on all pcie controllers,
so imho this should be a quirk, if there is only one lane, don't do
ASPM due to latency and timing issues...
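Something along these lines is roughly what I mean - purely an
illustrative sketch, not a tested patch, and the 0x1539 device id
(I211 copper) is an assumption on my part:

/* sketch: skip ASPM L1 on I211 ports that only train at x1 */
static void quirk_igb_x1_no_aspm(struct pci_dev *pdev)
{
	u16 lnksta;

	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
	if (((lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT) == 1)
		pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1539, quirk_igb_x1_no_aspm);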
> It is also possible this could be something that changed with the
> physical PCIe link. Basically L1 works by powering down the link when
> idle, and then powering it back up when there is activity. The problem
> is bringing it back up can sometimes be a challenge when the physical
> link starts to go faulty. I know I have seen that in some cases it can
> even result in the device falling off of the PCIe bus if the link
> training fails.
It works fine without ASPM (and the machine is pretty new)
I suspect we hit some timing race with aggressive ASPM (assumed as
such since it works on local links but doesn't on ~3 ms Links)
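For reference, disabling L1 on the NIC for these tests boils down to
either of these (the setpci value is the one from your earlier example -
express cap at 0xa0, so Link Control sits at 0xb0; the sysfs attribute
only exists on newer kernels):

setpci -s 03:00.0 0xB0.w=0040
# or, alternatively:
echo 0 > /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm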
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 113 MBytes 950 Mbits/sec 31 710 KBytes
> > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 135 626 KBytes
> > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 18 713 KBytes
> > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 798 KBytes
> > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 0 721 KBytes
> > [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 31 800 KBytes
> > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 730 KBytes
> > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 19 730 KBytes
> > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 12 701 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 1.09 GBytes 937 Mbits/sec 246 sender
> > [ 5] 0.00-10.01 sec 1.09 GBytes 933 Mbits/sec receiver
> >
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 749 KBytes
> > [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 30 766 KBytes
> > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 7 749 KBytes
> > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> > [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 8 699 KBytes
> > [ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 1 953 KBytes
> > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 26 707 KBytes
> > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 2 1.07 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 87 sender
> > [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
> >
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 114 MBytes 953 Mbits/sec 16 908 KBytes
> > [ 5] 1.00-2.00 sec 112 MBytes 944 Mbits/sec 0 693 KBytes
> > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 0 713 KBytes
> > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 687 KBytes
> > [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 15 687 KBytes
> > [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 2 888 KBytes
> > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 17 696 KBytes
> > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 758 KBytes
> > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 31 749 KBytes
> > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 792 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 81 sender
> > [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
> >
> > [ ID] Interval Transfer Bitrate Retr Cwnd
> > [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 747 KBytes
> > [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 0 744 KBytes
> > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 12 1.18 MBytes
> > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 28 699 KBytes
> > [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 684 KBytes
> > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 741 KBytes
> > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 3 687 KBytes
> > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 22 699 KBytes
> > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 78 sender
> > [ 5] 0.00-10.01 sec 1.09 GBytes 934 Mbits/sec receiver
> > ---
> >
> > ethtool -S enp3s0 |grep -v ": 0"
> > NIC statistics:
> > rx_packets: 16303520
> > tx_packets: 21602840
> > rx_bytes: 15711958157
> > tx_bytes: 25599009212
> > rx_broadcast: 122212
> > tx_broadcast: 530
> > rx_multicast: 333489
> > tx_multicast: 18446
> > multicast: 333489
> > rx_missed_errors: 270143
> > rx_long_length_errors: 6
> > tx_tcp_seg_good: 1342561
> > rx_long_byte_count: 15711958157
> > rx_errors: 6
> > rx_length_errors: 6
> > rx_fifo_errors: 270143
> > tx_queue_0_packets: 8963830
> > tx_queue_0_bytes: 9803196683
> > tx_queue_0_restart: 4920
> > tx_queue_1_packets: 12639010
> > tx_queue_1_bytes: 15706576814
> > tx_queue_1_restart: 12718
> > rx_queue_0_packets: 16303520
> > rx_queue_0_bytes: 15646744077
> > rx_queue_0_csum_err: 76
>
> Okay, so this result still has the same length and checksum errors,
> were you resetting the system/statistics between runs?
Ah, no.... Will reset and do more tests when I'm back home
Am I blind or is this part missing from the ethtool man page?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-16 19:47 ` Ian Kumlien
@ 2020-07-17 0:09 ` Alexander Duyck
2020-07-17 13:45 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2020-07-17 0:09 UTC (permalink / raw)
To: Ian Kumlien
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Thu, Jul 16, 2020 at 12:47 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Sorry, tried to respond via the phone, used the web browser version but
> it still sent HTML mails... :/
>
> On Thu, Jul 16, 2020 at 5:18 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On Wed, Jul 15, 2020 at 5:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > On Thu, Jul 16, 2020 at 1:42 AM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > > On Wed, Jul 15, 2020 at 3:51 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > On Thu, Jul 16, 2020 at 12:32 AM Alexander Duyck
> > > > > <alexander.duyck@gmail.com> wrote:
> > > > > > On Wed, Jul 15, 2020 at 3:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > > On Wed, Jul 15, 2020 at 11:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > On Wed, 15 Jul 2020 23:12:23 +0200 Ian Kumlien wrote:
> > > > > > > > > On Wed, Jul 15, 2020 at 11:02 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > > > > > > > On Wed, Jul 15, 2020 at 10:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > > > > > > On Wed, 15 Jul 2020 22:05:58 +0200 Ian Kumlien wrote:
> > > > > > > > > > > > After a lot of debugging it turns out that the bug is in igb...
> > > > > > > > > > > >
> > > > > > > > > > > > driver: igb
> > > > > > > > > > > > version: 5.6.0-k
> > > > > > > > > > > > firmware-version: 0. 6-1
> > > > > > > > > > > >
> > > > > > > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > > > > > > Connection (rev 03)
> > > > > > > > > > >
> > > > > > > > > > > Unclear to me what you're actually reporting. Is this a regression
> > > > > > > > > > > after a kernel upgrade? Compared to no NAT?
> > > > > > > > > >
> > > > > > > > > > It only happens on "internet links"
> > > > > > > > > >
> > > > > > > > > > Lets say that A is client with igb driver, B is a firewall running NAT
> > > > > > > > > > with ixgbe drivers, C is another local node with igb and
> > > > > > > > > > D is a remote node with a bridge backed by a bnx2 interface.
> > > > > > > > > >
> > > > > > > > > > A -> B -> C is ok (B and C is on the same switch)
> > > > > > > > > >
> > > > > > > > > > A -> B -> D -- 32-40mbit
> > > > > > > > > >
> > > > > > > > > > B -> D 944 mbit
> > > > > > > > > > C -> D 944 mbit
> > > > > > > > > >
> > > > > > > > > > A' -> D ~933 mbit (A with realtek nic -- also link is not idle atm)
> > > > > > > > >
> > > > > > > > > This should of course be A' -> B -> D
> > > > > > > > >
> > > > > > > > > Sorry, I've been scratching my head for about a week...
> > > > > > > >
> > > > > > > > Hm, only thing that comes to mind if A' works reliably and A doesn't is
> > > > > > > > that A has somehow broken TCP offloads. Could you try disabling things
> > > > > > > > via ethtool -K and see if those settings make a difference?
> > > > > > >
> > > > > > > It's a bit hard since it works like this, turned tso off:
> > > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > > [ 5] 0.00-1.00 sec 108 MBytes 902 Mbits/sec 0 783 KBytes
> > > > > > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 31 812 KBytes
> > > > > > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 92 772 KBytes
> > > > > > > [ 5] 3.00-4.00 sec 110 MBytes 923 Mbits/sec 0 834 KBytes
> > > > > > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 60 823 KBytes
> > > > > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 31 789 KBytes
> > > > > > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 786 KBytes
> > > > > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 772 KBytes
> > > > > > > [ 5] 9.00-10.00 sec 109 MBytes 912 Mbits/sec 0 868 KBytes
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 923 Mbits/sec 214 sender
> > > > > > > [ 5] 0.00-10.00 sec 1.07 GBytes 920 Mbits/sec receiver
> > > > > > >
> > > > > > > Continued running tests:
> > > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > > [ 5] 0.00-1.00 sec 5.82 MBytes 48.8 Mbits/sec 0 82.0 KBytes
> > > > > > > [ 5] 1.00-2.00 sec 4.97 MBytes 41.7 Mbits/sec 0 130 KBytes
> > > > > > > [ 5] 2.00-3.00 sec 5.28 MBytes 44.3 Mbits/sec 0 99.0 KBytes
> > > > > > > [ 5] 3.00-4.00 sec 5.28 MBytes 44.3 Mbits/sec 0 105 KBytes
> > > > > > > [ 5] 4.00-5.00 sec 5.28 MBytes 44.3 Mbits/sec 0 122 KBytes
> > > > > > > [ 5] 5.00-6.00 sec 5.28 MBytes 44.3 Mbits/sec 0 82.0 KBytes
> > > > > > > [ 5] 6.00-7.00 sec 5.28 MBytes 44.3 Mbits/sec 0 79.2 KBytes
> > > > > > > [ 5] 7.00-8.00 sec 5.28 MBytes 44.3 Mbits/sec 0 110 KBytes
> > > > > > > [ 5] 8.00-9.00 sec 5.28 MBytes 44.3 Mbits/sec 0 156 KBytes
> > > > > > > [ 5] 9.00-10.00 sec 5.28 MBytes 44.3 Mbits/sec 0 87.7 KBytes
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > > [ 5] 0.00-10.00 sec 53.0 MBytes 44.5 Mbits/sec 0 sender
> > > > > > > [ 5] 0.00-10.00 sec 52.5 MBytes 44.1 Mbits/sec receiver
> > > > > > >
> > > > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > > > [ 5] 0.00-1.00 sec 7.08 MBytes 59.4 Mbits/sec 0 156 KBytes
> > > > > > > [ 5] 1.00-2.00 sec 5.97 MBytes 50.0 Mbits/sec 0 110 KBytes
> > > > > > > [ 5] 2.00-3.00 sec 4.97 MBytes 41.7 Mbits/sec 0 124 KBytes
> > > > > > > [ 5] 3.00-4.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > > > [ 5] 4.00-5.00 sec 5.47 MBytes 45.9 Mbits/sec 0 158 KBytes
> > > > > > > [ 5] 5.00-6.00 sec 4.97 MBytes 41.7 Mbits/sec 0 70.7 KBytes
> > > > > > > [ 5] 6.00-7.00 sec 5.47 MBytes 45.9 Mbits/sec 0 113 KBytes
> > > > > > > [ 5] 7.00-8.00 sec 5.47 MBytes 45.9 Mbits/sec 0 96.2 KBytes
> > > > > > > [ 5] 8.00-9.00 sec 4.97 MBytes 41.7 Mbits/sec 0 84.8 KBytes
> > > > > > > [ 5] 9.00-10.00 sec 5.47 MBytes 45.9 Mbits/sec 0 116 KBytes
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > > > [ 5] 0.00-10.00 sec 55.3 MBytes 46.4 Mbits/sec 0 sender
> > > > > > > [ 5] 0.00-10.00 sec 53.9 MBytes 45.2 Mbits/sec receiver
> > > > > > >
> > > > > > > And the low bandwidth continues with:
> > > > > > > ethtool -k enp3s0 |grep ": on"
> > > > > > > rx-vlan-offload: on
> > > > > > > tx-vlan-offload: on [requested off]
> > > > > > > highdma: on [fixed]
> > > > > > > rx-vlan-filter: on [fixed]
> > > > > > > tx-gre-segmentation: on
> > > > > > > tx-gre-csum-segmentation: on
> > > > > > > tx-ipxip4-segmentation: on
> > > > > > > tx-ipxip6-segmentation: on
> > > > > > > tx-udp_tnl-segmentation: on
> > > > > > > tx-udp_tnl-csum-segmentation: on
> > > > > > > tx-gso-partial: on
> > > > > > > tx-udp-segmentation: on
> > > > > > > hw-tc-offload: on
> > > > > > >
> > > > > > > Can't quite find how to turn those off since they aren't listed in
> > > > > > > ethtool (since the text is not what you use to enable/disable)
> > > > > >
> > > > > > To disable them you would just repeat the same string in the display
> > > > > > string. So it should just be "ethtool -K enp3s0 tx-gso-partial off"
> > > > > > and that would turn off a large chunk of them as all the encapsulated
> > > > > > support requires gso partial support.
> > > > >
> > > > > ethtool -k enp3s0 |grep ": on"
> > > > > highdma: on [fixed]
> > > > > rx-vlan-filter: on [fixed]
> > > > > ---
> > > > > And then back to back:
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 4.91 MBytes 41.2 Mbits/sec 0 45.2 KBytes
> > > > > [ 5] 1.00-2.00 sec 4.47 MBytes 37.5 Mbits/sec 0 52.3 KBytes
> > > > > [ 5] 2.00-3.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 141 KBytes
> > > > > [ 5] 4.00-5.00 sec 111 MBytes 928 Mbits/sec 63 764 KBytes
> > > > > [ 5] 5.00-6.00 sec 86.2 MBytes 724 Mbits/sec 0 744 KBytes
> > > > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 61 769 KBytes
> > > > > [ 5] 7.00-8.00 sec 110 MBytes 923 Mbits/sec 0 749 KBytes
> > > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 0 741 KBytes
> > > > > [ 5] 9.00-10.00 sec 110 MBytes 923 Mbits/sec 31 761 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 644 MBytes 540 Mbits/sec 155 sender
> > > > > [ 5] 0.00-10.01 sec 641 MBytes 537 Mbits/sec receiver
> > > > >
> > > > > and we're back at the not working bit:
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 4.84 MBytes 40.6 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 1.00-2.00 sec 4.60 MBytes 38.6 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 2.00-3.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 3.00-4.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > > > [ 5] 4.00-5.00 sec 4.47 MBytes 37.5 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 5.00-6.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 6.00-7.00 sec 4.23 MBytes 35.4 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 7.00-8.00 sec 4.47 MBytes 37.5 Mbits/sec 0 67.9 KBytes
> > > > > [ 5] 8.00-9.00 sec 4.47 MBytes 37.5 Mbits/sec 0 53.7 KBytes
> > > > > [ 5] 9.00-10.00 sec 4.47 MBytes 37.5 Mbits/sec 0 79.2 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 44.5 MBytes 37.3 Mbits/sec 0 sender
> > > > > [ 5] 0.00-10.00 sec 43.9 MBytes 36.8 Mbits/sec receiver
> > > > >
> > > > > > > I was hoping that you'd have a clue of something that might introduce
> > > > > > > a regression - ie specific patches to try to revert
> > > > > > >
> > > > > > > Btw, the same issue applies to udp as well
> > > > > > >
> > > > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > > > [ 5] 0.00-1.00 sec 6.77 MBytes 56.8 Mbits/sec 4900
> > > > > > > [ 5] 1.00-2.00 sec 4.27 MBytes 35.8 Mbits/sec 3089
> > > > > > > [ 5] 2.00-3.00 sec 4.20 MBytes 35.2 Mbits/sec 3041
> > > > > > > [ 5] 3.00-4.00 sec 4.30 MBytes 36.1 Mbits/sec 3116
> > > > > > > [ 5] 4.00-5.00 sec 4.24 MBytes 35.6 Mbits/sec 3070
> > > > > > > [ 5] 5.00-6.00 sec 4.21 MBytes 35.3 Mbits/sec 3047
> > > > > > > [ 5] 6.00-7.00 sec 4.29 MBytes 36.0 Mbits/sec 3110
> > > > > > > [ 5] 7.00-8.00 sec 4.28 MBytes 35.9 Mbits/sec 3097
> > > > > > > [ 5] 8.00-9.00 sec 4.25 MBytes 35.6 Mbits/sec 3075
> > > > > > > [ 5] 9.00-10.00 sec 4.20 MBytes 35.2 Mbits/sec 3039
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > > > Lost/Total Datagrams
> > > > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.000 ms
> > > > > > > 0/32584 (0%) sender
> > > > > > > [ 5] 0.00-10.00 sec 45.0 MBytes 37.7 Mbits/sec 0.037 ms
> > > > > > > 0/32573 (0%) receiver
> > > > > > >
> > > > > > > vs:
> > > > > > >
> > > > > > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > > > > > [ 5] 0.00-1.00 sec 114 MBytes 954 Mbits/sec 82342
> > > > > > > [ 5] 1.00-2.00 sec 114 MBytes 955 Mbits/sec 82439
> > > > > > > [ 5] 2.00-3.00 sec 114 MBytes 956 Mbits/sec 82507
> > > > > > > [ 5] 3.00-4.00 sec 114 MBytes 955 Mbits/sec 82432
> > > > > > > [ 5] 4.00-5.00 sec 114 MBytes 956 Mbits/sec 82535
> > > > > > > [ 5] 5.00-6.00 sec 114 MBytes 953 Mbits/sec 82240
> > > > > > > [ 5] 6.00-7.00 sec 114 MBytes 956 Mbits/sec 82512
> > > > > > > [ 5] 7.00-8.00 sec 114 MBytes 956 Mbits/sec 82503
> > > > > > > [ 5] 8.00-9.00 sec 114 MBytes 956 Mbits/sec 82532
> > > > > > > [ 5] 9.00-10.00 sec 114 MBytes 956 Mbits/sec 82488
> > > > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > > > [ ID] Interval Transfer Bitrate Jitter
> > > > > > > Lost/Total Datagrams
> > > > > > > [ 5] 0.00-10.00 sec 1.11 GBytes 955 Mbits/sec 0.000 ms
> > > > > > > 0/824530 (0%) sender
> > > > > > > [ 5] 0.00-10.01 sec 1.11 GBytes 949 Mbits/sec 0.014 ms
> > > > > > > 4756/824530 (0.58%) receiver
> > > > > >
> > > > > > The fact that it is impacting UDP seems odd. I wonder if we don't have
> > > > > > a qdisc somewhere that is misbehaving and throttling the Tx. Either
> > > > > > that or I wonder if we are getting spammed with flow control frames.
> > > > >
> > > > > it sometimes works, it looks like the cwindow just isn't increased -
> > > > > that's where i started...
> > > > >
> > > > > Example:
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 4.86 MBytes 40.8 Mbits/sec 0 50.9 KBytes
> > > > > [ 5] 1.00-2.00 sec 4.66 MBytes 39.1 Mbits/sec 0 65.0 KBytes
> > > > > [ 5] 2.00-3.00 sec 4.29 MBytes 36.0 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 3.00-4.00 sec 4.66 MBytes 39.1 Mbits/sec 0 42.4 KBytes
> > > > > [ 5] 4.00-5.00 sec 23.1 MBytes 194 Mbits/sec 0 1.07 MBytes
> > > > > [ 5] 5.00-6.00 sec 110 MBytes 923 Mbits/sec 0 761 KBytes
> > > > > [ 5] 6.00-7.00 sec 98.8 MBytes 828 Mbits/sec 60 806 KBytes
> > > > > [ 5] 7.00-8.00 sec 82.5 MBytes 692 Mbits/sec 0 812 KBytes
> > > > > [ 5] 8.00-9.00 sec 110 MBytes 923 Mbits/sec 92 761 KBytes
> > > > > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 0 755 KBytes
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > [ ID] Interval Transfer Bitrate Retr
> > > > > [ 5] 0.00-10.00 sec 554 MBytes 465 Mbits/sec 152 sender
> > > > > [ 5] 0.00-10.00 sec 550 MBytes 461 Mbits/sec receiver
> > > > >
> > > > > > It would be useful to include the output of just calling "ethtool
> > > > > > enp3s0" on the interface to verify the speed, "ethtool -a enp3s0" to
> > > > > > verify flow control settings, and "ethtool -S enp3s0 | grep -v :\ 0"
> > > > > > to output the statistics and dump anything that isn't zero.
> > > > >
> > > > > ethtool enp3s0
> > > > > Settings for enp3s0:
> > > > > Supported ports: [ TP ]
> > > > > Supported link modes: 10baseT/Half 10baseT/Full
> > > > > 100baseT/Half 100baseT/Full
> > > > > 1000baseT/Full
> > > > > Supported pause frame use: Symmetric
> > > > > Supports auto-negotiation: Yes
> > > > > Supported FEC modes: Not reported
> > > > > Advertised link modes: 10baseT/Half 10baseT/Full
> > > > > 100baseT/Half 100baseT/Full
> > > > > 1000baseT/Full
> > > > > Advertised pause frame use: Symmetric
> > > > > Advertised auto-negotiation: Yes
> > > > > Advertised FEC modes: Not reported
> > > > > Speed: 1000Mb/s
> > > > > Duplex: Full
> > > > > Auto-negotiation: on
> > > > > Port: Twisted Pair
> > > > > PHYAD: 1
> > > > > Transceiver: internal
> > > > > MDI-X: off (auto)
> > > > > Supports Wake-on: pumbg
> > > > > Wake-on: g
> > > > > Current message level: 0x00000007 (7)
> > > > > drv probe link
> > > > > Link detected: yes
> > > > > ---
> > > > > ethtool -a enp3s0
> > > > > Pause parameters for enp3s0:
> > > > > Autonegotiate: on
> > > > > RX: on
> > > > > TX: off
> > > > > ---
> > > > > ethtool -S enp3s0 |grep -v :\ 0
> > > > > NIC statistics:
> > > > > rx_packets: 15920618
> > > > > tx_packets: 17846725
> > > > > rx_bytes: 15676264423
> > > > > tx_bytes: 19925010639
> > > > > rx_broadcast: 119553
> > > > > tx_broadcast: 497
> > > > > rx_multicast: 330193
> > > > > tx_multicast: 18190
> > > > > multicast: 330193
> > > > > rx_missed_errors: 270102
> > > > > rx_long_length_errors: 6
> > > > > tx_tcp_seg_good: 1342561
> > > > > rx_long_byte_count: 15676264423
> > > > > rx_errors: 6
> > > > > rx_length_errors: 6
> > > > > rx_fifo_errors: 270102
> > > > > tx_queue_0_packets: 7651168
> > > > > tx_queue_0_bytes: 7823281566
> > > > > tx_queue_0_restart: 4920
> > > > > tx_queue_1_packets: 10195557
> > > > > tx_queue_1_bytes: 12027522118
> > > > > tx_queue_1_restart: 12718
> > > > > rx_queue_0_packets: 15920618
> > > > > rx_queue_0_bytes: 15612581951
> > > > > rx_queue_0_csum_err: 76
> > > > > (I've only run two runs since i reenabled the interface)
> > > >
> > > > So I am seeing three things here.
> > > >
> > > > The rx_long_length_errors are usually due to an MTU mismatch. Do you
> > > > have something on the network that is using jumbo frames, or is the
> > > > MTU on the NIC set to something smaller than what is supported on the
> > > > network?
> > >
> > > I'm using jumbo frames on the local network, internet side is the
> > > normal 1500 bytes mtu though
> > >
> > > > You are getting rx_missed_errors, that would seem to imply that the
> > > > DMA is not able to keep up. We may want to try disabling the L1 to see
> > > > if we get any boost from doing that.
> > >
> > > It used to work, I don't do benchmarks all the time and sometimes the first
> > > benchmarks turn out fine... so it's hard to say when this started happening...
> > >
> > > It could also be related to a bios upgrade, but I'm pretty sure I did
> > > successful benchmarks after that...
> > >
> > > How do I disable the l1? just echo 0 >
> > > /sys/bus/pci/drivers/igb/0000\:03\:00.0/link/l1_aspm ?
> > >
> > > > The last bit is that queue 0 is seeing packets with bad checksums. You
> > > > might want to run some tests and see where the bad checksums are
> > > > coming from. If they are being detected from a specific NIC such as
> > > > the ixgbe in your example it might point to some sort of checksum
> > > > error being created as a result of the NAT translation.
> > >
> > > But that should also affect A' and the A -> B -> C case, which it doesn't...
> > >
> > > It only seems to happen with higher rtt (6 hops, sub 3 ms in this case
> > > but still high enough somehow)
> > >
> > > > > ---
> > > > >
> > > > > > > lspci -s 03:00.0 -vvv
> > > > > > > 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network
> > > > > > > Connection (rev 03)
> > > > > > > Subsystem: ASUSTeK Computer Inc. I211 Gigabit Network Connection
> > > > > > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> > > > > > > Stepping- SERR- FastB2B- DisINTx+
> > > > > > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > > > Latency: 0
> > > > > > > Interrupt: pin A routed to IRQ 57
> > > > > > > IOMMU group: 20
> > > > > > > Region 0: Memory at fc900000 (32-bit, non-prefetchable) [size=128K]
> > > > > > > Region 2: I/O ports at e000 [size=32]
> > > > > > > Region 3: Memory at fc920000 (32-bit, non-prefetchable) [size=16K]
> > > > > > > Capabilities: [40] Power Management version 3
> > > > > > > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > > > > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
> > > > > > > Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> > > > > > > Address: 0000000000000000 Data: 0000
> > > > > > > Masking: 00000000 Pending: 00000000
> > > > > > > Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> > > > > > > Vector table: BAR=3 offset=00000000
> > > > > > > PBA: BAR=3 offset=00002000
> > > > > > > Capabilities: [a0] Express (v2) Endpoint, MSI 00
> > > > > > > DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> > > > > > > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
> > > > > > > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> > > > > > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> > > > > > > MaxPayload 128 bytes, MaxReadReq 512 bytes
> > > > > > > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > > > > > > LnkCap: Port #3, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
> > > > > > > L0s <2us, L1 <16us
> > > > > > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> > > > > > > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > > > > > > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > > > LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > > > > > > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > > > > >
> > > > > > PCIe wise the connection is going to be pretty tight in terms of
> > > > > > bandwidth. It looks like we have 2.5GT/s with only a single lane of
> > > > > > PCIe. In addition we are running with ASPM enabled so that means that
> > > > > > if we don't have enough traffic we are shutting off the one PCIe lane
> > > > > > we have so if we are getting bursty traffic that can get ugly.
> > > > >
> > > > > Humm... is there a way to force disable ASPM in sysfs?
> > > >
> > > > Actually the easiest way to do this is to just use setpci.
> > > >
> > > > You should be able to dump the word containing the setting via:
> > > > # setpci -s 3:00.0 0xB0.w
> > > > 0042
> > > > # setpci -s 3:00.0 0xB0.w=0040
> > > >
> > > > Basically what you do is clear the lower 3 bits of the value so in
> > > > this case that means replacing the 2 with a 0 based on the output of
> > > > the first command.
> > >
> > > Well... I'll be damned... I used to force enable ASPM... this must be
> > > related to the change in PCIe bus ASPM
> > > Perhaps disable ASPM if there is only one link?
> >
> > Is there any specific reason why you are enabling ASPM? Is this system
> > a laptop where you are trying to conserve power when on battery? If
> > not disabling it probably won't hurt things too much since the power
> > consumption for a 2.5GT/s link operating in a width of one shouldn't
> > be too high. Otherwise you are likely going to end up paying the
> > price for getting the interface out of L1 when the traffic goes idle
> > so you are going to see flows that get bursty paying a heavy penalty
> > when they start dropping packets.
>
> Ah, you misunderstand, I used to do this and everything worked - now
> Linux enables ASPM by default on all pcie controllers,
> so imho this should be a quirk, if there is only one lane, don't do
> ASPM due to latency and timing issues...
>
> > It is also possible this could be something that changed with the
> > physical PCIe link. Basically L1 works by powering down the link when
> > idle, and then powering it back up when there is activity. The problem
> > is bringing it back up can sometimes be a challenge when the physical
> > link starts to go faulty. I know I have seen that in some cases it can
> > even result in the device falling off of the PCIe bus if the link
> > training fails.
>
> It works fine without ASPM (and the machine is pretty new)
>
> I suspect we hit some timing race with aggressive ASPM (assumed as
> such since it works on local links but doesn't on ~3 ms Links)
Agreed. What is probably happening if you are using a NAT is that it
may be seeing some burstiness being introduced and as a result the
part is going to sleep and then being overrun when the traffic does
arrive.
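If you want to sanity check that, one option would be to watch the
drop counters while one of the slow runs is in progress, something
like:

watch -n1 'ethtool -S enp3s0 | grep -E "missed|fifo|restart"'

If rx_missed_errors/rx_fifo_errors keep climbing during the ~40Mbit
stretches, that would line up with the device waking from L1 too slowly
to keep up with incoming DMA.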
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 113 MBytes 950 Mbits/sec 31 710 KBytes
> > > [ 5] 1.00-2.00 sec 110 MBytes 923 Mbits/sec 135 626 KBytes
> > > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 18 713 KBytes
> > > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 798 KBytes
> > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 0 721 KBytes
> > > [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 31 800 KBytes
> > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 730 KBytes
> > > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 19 730 KBytes
> > > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> > > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 12 701 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 937 Mbits/sec 246 sender
> > > [ 5] 0.00-10.01 sec 1.09 GBytes 933 Mbits/sec receiver
> > >
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 749 KBytes
> > > [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 30 766 KBytes
> > > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 7 749 KBytes
> > > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> > > [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 8 699 KBytes
> > > [ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 1 953 KBytes
> > > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 701 KBytes
> > > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 26 707 KBytes
> > > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 2 1.07 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec 87 sender
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
> > >
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 114 MBytes 953 Mbits/sec 16 908 KBytes
> > > [ 5] 1.00-2.00 sec 112 MBytes 944 Mbits/sec 0 693 KBytes
> > > [ 5] 2.00-3.00 sec 111 MBytes 933 Mbits/sec 0 713 KBytes
> > > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 0 687 KBytes
> > > [ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 15 687 KBytes
> > > [ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 2 888 KBytes
> > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 17 696 KBytes
> > > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 0 758 KBytes
> > > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 31 749 KBytes
> > > [ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 792 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 81 sender
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 934 Mbits/sec receiver
> > >
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 114 MBytes 956 Mbits/sec 0 747 KBytes
> > > [ 5] 1.00-2.00 sec 111 MBytes 933 Mbits/sec 0 744 KBytes
> > > [ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 12 1.18 MBytes
> > > [ 5] 3.00-4.00 sec 111 MBytes 933 Mbits/sec 2 699 KBytes
> > > [ 5] 4.00-5.00 sec 111 MBytes 933 Mbits/sec 28 699 KBytes
> > > [ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 684 KBytes
> > > [ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 741 KBytes
> > > [ 5] 7.00-8.00 sec 111 MBytes 933 Mbits/sec 3 687 KBytes
> > > [ 5] 8.00-9.00 sec 111 MBytes 933 Mbits/sec 22 699 KBytes
> > > [ 5] 9.00-10.00 sec 111 MBytes 933 Mbits/sec 11 707 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec 78 sender
> > > [ 5] 0.00-10.01 sec 1.09 GBytes 934 Mbits/sec receiver
> > > ---
> > >
> > > ethtool -S enp3s0 |grep -v ": 0"
> > > NIC statistics:
> > > rx_packets: 16303520
> > > tx_packets: 21602840
> > > rx_bytes: 15711958157
> > > tx_bytes: 25599009212
> > > rx_broadcast: 122212
> > > tx_broadcast: 530
> > > rx_multicast: 333489
> > > tx_multicast: 18446
> > > multicast: 333489
> > > rx_missed_errors: 270143
> > > rx_long_length_errors: 6
> > > tx_tcp_seg_good: 1342561
> > > rx_long_byte_count: 15711958157
> > > rx_errors: 6
> > > rx_length_errors: 6
> > > rx_fifo_errors: 270143
> > > tx_queue_0_packets: 8963830
> > > tx_queue_0_bytes: 9803196683
> > > tx_queue_0_restart: 4920
> > > tx_queue_1_packets: 12639010
> > > tx_queue_1_bytes: 15706576814
> > > tx_queue_1_restart: 12718
> > > rx_queue_0_packets: 16303520
> > > rx_queue_0_bytes: 15646744077
> > > rx_queue_0_csum_err: 76
> >
> > Okay, so this result still has the same length and checksum errors,
> > were you resetting the system/statistics between runs?
>
> Ah, no.... Will reset and do more tests when I'm back home
>
> Am I blind or is this part missing from the ethtool man page?
There isn't a reset that will reset the stats via ethtool. The device
stats will be persistent until the driver is unloaded and reloaded or
the system is reset. You can reset the queue stats by changing the
number of queues. So for example using "ethtool -L enp3s0 1; ethtool
-L enp3s0 2".
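So a full clear would be something like this (just a sketch - it
assumes nothing else needs the igb module and that the interface uses
combined channels; you'd also need to bring the interface back up and
re-apply its config after the reload):

# reload the driver to clear all counters
modprobe -r igb && modprobe igb
# or only cycle the queue count to clear the per-queue counters
ethtool -L enp3s0 combined 1; ethtool -L enp3s0 combined 2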
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-17 0:09 ` Alexander Duyck
@ 2020-07-17 13:45 ` Ian Kumlien
2020-07-24 12:01 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-17 13:45 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Fri, Jul 17, 2020 at 2:09 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Thu, Jul 16, 2020 at 12:47 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > Sorry, tried to respond via the phone, used the web browser version but
> > it still sent HTML mails... :/
> > On Thu, Jul 16, 2020 at 5:18 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > > On Wed, Jul 15, 2020 at 5:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
[--8<--]
> > > > Well... I'll be damned... I used to force enable ASPM... this must be
> > > > related to the change in PCIe bus ASPM
> > > > Perhaps disable ASPM if there is only one link?
> > >
> > > Is there any specific reason why you are enabling ASPM? Is this system
> > > a laptop where you are trying to conserve power when on battery? If
> > > not disabling it probably won't hurt things too much since the power
> > > consumption for a 2.5GT/s link operating in a width of one shouldn't
> > > be too high. Otherwise you are likely going to end up paying the
> > > price for getting the interface out of L1 when the traffic goes idle
> > > so you are going to see flows that get bursty paying a heavy penalty
> > > when they start dropping packets.
> >
> > Ah, you misunderstand, I used to do this and everything worked - now
> > Linux enables ASPM by default on all pcie controllers,
> > so imho this should be a quirk, if there is only one lane, don't do
> > ASPM due to latency and timing issues...
> >
> > > It is also possible this could be something that changed with the
> > > physical PCIe link. Basically L1 works by powering down the link when
> > > idle, and then powering it back up when there is activity. The problem
> > > is bringing it back up can sometimes be a challenge when the physical
> > > link starts to go faulty. I know I have seen that in some cases it can
> > > even result in the device falling off of the PCIe bus if the link
> > > training fails.
> >
> > It works fine without ASPM (and the machine is pretty new)
> >
> > I suspect we hit some timing race with aggressive ASPM (assumed as
> > such since it works on local links but doesn't on ~3 ms Links)
>
> Agreed. What is probably happening if you are using a NAT is that it
> may be seeing some burstiness being introduced and as a result the
> part is going to sleep and then being overrun when the traffic does
> arrive.
Weird though, seems to be very aggressive timings =)
[--8<--]
> > > > ethtool -S enp3s0 |grep -v ": 0"
> > > > NIC statistics:
> > > > rx_packets: 16303520
> > > > tx_packets: 21602840
> > > > rx_bytes: 15711958157
> > > > tx_bytes: 25599009212
> > > > rx_broadcast: 122212
> > > > tx_broadcast: 530
> > > > rx_multicast: 333489
> > > > tx_multicast: 18446
> > > > multicast: 333489
> > > > rx_missed_errors: 270143
> > > > rx_long_length_errors: 6
> > > > tx_tcp_seg_good: 1342561
> > > > rx_long_byte_count: 15711958157
> > > > rx_errors: 6
> > > > rx_length_errors: 6
> > > > rx_fifo_errors: 270143
> > > > tx_queue_0_packets: 8963830
> > > > tx_queue_0_bytes: 9803196683
> > > > tx_queue_0_restart: 4920
> > > > tx_queue_1_packets: 12639010
> > > > tx_queue_1_bytes: 15706576814
> > > > tx_queue_1_restart: 12718
> > > > rx_queue_0_packets: 16303520
> > > > rx_queue_0_bytes: 15646744077
> > > > rx_queue_0_csum_err: 76
> > >
> > > Okay, so this result still has the same length and checksum errors,
> > > were you resetting the system/statistics between runs?
> >
> > Ah, no.... Will reset and do more tests when I'm back home
> >
> > Am I blind, or is this part missing from the ethtool man page?
>
> There isn't a reset that will reset the stats via ethtool. The device
> stats will be persistent until the driver is unloaded and reloaded or
> the system is reset. You can reset the queue stats by changing the
> number of queues. So for example using "ethtool -L enp3s0 1; ethtool
> -L enp3s0 2".
It did reset some counters but not all...
NIC statistics:
rx_packets: 37339997
tx_packets: 36066432
rx_bytes: 39226365570
tx_bytes: 37364799188
rx_broadcast: 197736
tx_broadcast: 1187
rx_multicast: 572374
tx_multicast: 30546
multicast: 572374
collisions: 0
rx_crc_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 270844
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 6
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 2663350
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 39226365570
tx_dma_out_of_sync: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_hwtstamp_timeouts: 0
tx_hwtstamp_skipped: 0
rx_hwtstamp_cleared: 0
rx_errors: 6
tx_errors: 0
tx_dropped: 0
rx_length_errors: 6
rx_over_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 270844
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_queue_0_packets: 16069894
tx_queue_0_bytes: 16031462246
tx_queue_0_restart: 4920
tx_queue_1_packets: 19996538
tx_queue_1_bytes: 21169430746
tx_queue_1_restart: 12718
rx_queue_0_packets: 37339997
rx_queue_0_bytes: 39077005582
rx_queue_0_drops: 0
rx_queue_0_csum_err: 76
rx_queue_0_alloc_failed: 0
rx_queue_1_packets: 0
rx_queue_1_bytes: 0
rx_queue_1_drops: 0
rx_queue_1_csum_err: 0
rx_queue_1_alloc_failed: 0
-- vs --
NIC statistics:
rx_packets: 37340720
tx_packets: 36066920
rx_bytes: 39226590275
tx_bytes: 37364899567
rx_broadcast: 197755
tx_broadcast: 1204
rx_multicast: 572582
tx_multicast: 30563
multicast: 572582
collisions: 0
rx_crc_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 270844
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 6
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 2663352
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 39226590275
tx_dma_out_of_sync: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_hwtstamp_timeouts: 0
tx_hwtstamp_skipped: 0
rx_hwtstamp_cleared: 0
rx_errors: 6
tx_errors: 0
tx_dropped: 0
rx_length_errors: 6
rx_over_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 270844
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_queue_0_packets: 59
tx_queue_0_bytes: 11829
tx_queue_0_restart: 0
tx_queue_1_packets: 49
tx_queue_1_bytes: 12058
tx_queue_1_restart: 0
rx_queue_0_packets: 84
rx_queue_0_bytes: 22195
rx_queue_0_drops: 0
rx_queue_0_csum_err: 0
rx_queue_0_alloc_failed: 0
rx_queue_1_packets: 0
rx_queue_1_bytes: 0
rx_queue_1_drops: 0
rx_queue_1_csum_err: 0
rx_queue_1_alloc_failed: 0
---
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-17 13:45 ` Ian Kumlien
@ 2020-07-24 12:01 ` Ian Kumlien
2020-07-24 12:33 ` Ian Kumlien
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-24 12:01 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Fri, Jul 17, 2020 at 3:45 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Fri, Jul 17, 2020 at 2:09 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On Thu, Jul 16, 2020 at 12:47 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> > > Sorry, tried to respond via the phone, used the web browser version but
> > > it still sent HTML mails... :/
>
> > > On Thu, Jul 16, 2020 at 5:18 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > > On Wed, Jul 15, 2020 at 5:00 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> [--8<--]
>
> > > > > Well... I'll be damned... I used to force enable ASPM... this must be
> > > > > related to the change in PCIe bus ASPM
> > > > > Perhaps disable ASPM if there is only one lane?
> > > >
> > > > Is there any specific reason why you are enabling ASPM? Is this system
> > > > a laptop where you are trying to conserve power when on battery? If
> > > > not, disabling it probably won't hurt things too much since the power
> > > > consumption for a 2.5GT/s link operating at a width of one shouldn't
> > > > be too high. Otherwise you are likely going to end up paying the
> > > > price for getting the interface out of L1 when the traffic goes idle
> > > > so you are going to see flows that get bursty paying a heavy penalty
> > > > when they start dropping packets.
> > >
> > > Ah, you misunderstand: I used to do this and everything worked - now
> > > Linux enables ASPM by default on all PCIe controllers,
> > > so IMHO this should be a quirk: if there is only one lane, don't do
> > > ASPM, due to latency and timing issues...
> > >
> > > > It is also possible this could be something that changed with the
> > > > physical PCIe link. Basically L1 works by powering down the link when
> > > > idle, and then powering it back up when there is activity. The problem
> > > > is bringing it back up can sometimes be a challenge when the physical
> > > > link starts to go faulty. I know I have seen that in some cases it can
> > > > even result in the device falling off of the PCIe bus if the link
> > > > training fails.
> > >
> > > It works fine without ASPM (and the machine is pretty new)
> > >
> > > I suspect we hit some timing race with aggressive ASPM (assumed as
> > > such since it works on local links but doesn't on ~3 ms links)
> >
> > Agreed. What is probably happening if you are using a NAT is that it
> > may be seeing some burstiness being introduced and as a result the
> > part is going to sleep and then being overrun when the traffic does
> > arrive.
>
> Weird though, those seem to be very aggressive timings =)
>
> [--8<--]
>
> > > > > ethtool -S enp3s0 |grep -v ": 0"
> > > > > NIC statistics:
> > > > > rx_packets: 16303520
> > > > > tx_packets: 21602840
> > > > > rx_bytes: 15711958157
> > > > > tx_bytes: 25599009212
> > > > > rx_broadcast: 122212
> > > > > tx_broadcast: 530
> > > > > rx_multicast: 333489
> > > > > tx_multicast: 18446
> > > > > multicast: 333489
> > > > > rx_missed_errors: 270143
> > > > > rx_long_length_errors: 6
> > > > > tx_tcp_seg_good: 1342561
> > > > > rx_long_byte_count: 15711958157
> > > > > rx_errors: 6
> > > > > rx_length_errors: 6
> > > > > rx_fifo_errors: 270143
> > > > > tx_queue_0_packets: 8963830
> > > > > tx_queue_0_bytes: 9803196683
> > > > > tx_queue_0_restart: 4920
> > > > > tx_queue_1_packets: 12639010
> > > > > tx_queue_1_bytes: 15706576814
> > > > > tx_queue_1_restart: 12718
> > > > > rx_queue_0_packets: 16303520
> > > > > rx_queue_0_bytes: 15646744077
> > > > > rx_queue_0_csum_err: 76
> > > >
> > > > Okay, so this result still has the same length and checksum errors,
> > > > were you resetting the system/statistics between runs?
> > >
> > > Ah, no.... Will reset and do more tests when I'm back home
> > >
> > > Am I blind, or is this part missing from the ethtool man page?
> >
> > There isn't a reset that will reset the stats via ethtool. The device
> > stats will be persistent until the driver is unloaded and reloaded or
> > the system is reset. You can reset the queue stats by changing the
> > number of queues. So for example using "ethtool -L enp3s0 1; ethtool
> > -L enp3s0 2".
As a side note, would something like this fix it? (Not even compile tested.)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8bb3db2cbd41..1a7240aae85c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3396,6 +3396,13 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
"Width x2" :
(hw->bus.width == e1000_bus_width_pcie_x1) ?
"Width x1" : "unknown"), netdev->dev_addr);
+ /* quirk */
+#ifdef CONFIG_PCIEASPM
+ if (hw->bus.width == e1000_bus_width_pcie_x1) {
+ /* single lane pcie causes problems with ASPM */
+ pdev->pcie_link_state->aspm_enabled = 0;
+ }
+#endif
}
if ((hw->mac.type >= e1000_i210 ||
I don't know where the right place to put a quirk would be...
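Purely as an illustration of where such a quirk could live: generic PCIe quirks
usually go in drivers/pci/quirks.c, hooked up via the DECLARE_PCI_FIXUP_*
macros, and can use pci_disable_link_state() instead of reaching into struct
pcie_link_state. A minimal sketch of that shape - the device ID (0x1539, i211)
and the width check are assumptions for illustration only, not a proposed patch:

#include <linux/pci.h>

/* Hypothetical quirk: keep ASPM L0s/L1 off for a device sitting behind an
 * x1 link.  This only shows the usual fixup shape, nothing more.
 */
static void quirk_x1_disable_aspm(struct pci_dev *pdev)
{
	if (pcie_get_width_cap(pdev) != PCIE_LNK_X1)
		return;

	pci_info(pdev, "x1 link: disabling ASPM L0s/L1\n");
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1539, quirk_x1_disable_aspm);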
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-24 12:01 ` Ian Kumlien
@ 2020-07-24 12:33 ` Ian Kumlien
2020-07-24 14:56 ` Alexander Duyck
0 siblings, 1 reply; 18+ messages in thread
From: Ian Kumlien @ 2020-07-24 12:33 UTC (permalink / raw)
To: Alexander Duyck
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Fri, Jul 24, 2020 at 2:01 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Fri, Jul 17, 2020 at 3:45 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
[--8<--]
> As a side note, would something like this fix it? (Not even compile tested.)
>
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 8bb3db2cbd41..1a7240aae85c 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3396,6 +3396,13 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> "Width x2" :
> (hw->bus.width == e1000_bus_width_pcie_x1) ?
> "Width x1" : "unknown"), netdev->dev_addr);
> + /* quirk */
> +#ifdef CONFIG_PCIEASPM
> + if (hw->bus.width == e1000_bus_width_pcie_x1) {
> + /* single lane pcie causes problems with ASPM */
> + pdev->pcie_link_state->aspm_enabled = 0;
> + }
> +#endif
> }
>
> if ((hw->mac.type >= e1000_i210 ||
>
> I don't know where the right place to put a quirk would be...
OK, so that was a real brainfart... it turns out there is no good way
to get at that field, but the point was more to show where the quirk
should go...
Due to the lack of APIs, I started wondering if this will apply to
more devices than just network cards - potentially we could be a
little more selective and only skip enabling it in one direction, but...
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..96a3c6837124 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -584,15 +584,16 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
* given link unless components on both sides of the link each
* support L0s.
*/
- if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S)
- link->aspm_support |= ASPM_STATE_L0S;
- if (dwreg.enabled & PCIE_LINK_STATE_L0S)
- link->aspm_enabled |= ASPM_STATE_L0S_UP;
- if (upreg.enabled & PCIE_LINK_STATE_L0S)
- link->aspm_enabled |= ASPM_STATE_L0S_DW;
- link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s);
- link->latency_dw.l0s = calc_l0s_latency(dwreg.latency_encoding_l0s);
-
+ if (pcie_get_width_cap(child) != PCIE_LNK_X1) {
+ if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S)
+ link->aspm_support |= ASPM_STATE_L0S;
+ if (dwreg.enabled & PCIE_LINK_STATE_L0S)
+ link->aspm_enabled |= ASPM_STATE_L0S_UP;
+ if (upreg.enabled & PCIE_LINK_STATE_L0S)
+ link->aspm_enabled |= ASPM_STATE_L0S_DW;
+ link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s);
+ link->latency_dw.l0s = calc_l0s_latency(dwreg.latency_encoding_l0s);
+ }
this time it's compile tested...
It could also be if (pcie_get_width_cap(child) > PCIE_LNK_X1) {
I assume that ASPM is not enabled for: PCIE_LNK_WIDTH_RESRV ;)
* Re: [Intel-wired-lan] NAT performance issue 944mbit -> ~40mbit
2020-07-24 12:33 ` Ian Kumlien
@ 2020-07-24 14:56 ` Alexander Duyck
0 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2020-07-24 14:56 UTC (permalink / raw)
To: Ian Kumlien
Cc: Jakub Kicinski, Linux Kernel Network Developers, intel-wired-lan
On Fri, Jul 24, 2020 at 5:33 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Fri, Jul 24, 2020 at 2:01 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Fri, Jul 17, 2020 at 3:45 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> [--8<--]
>
> > As a side note, would something like this fix it? (Not even compile tested.)
> >
> >
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> > index 8bb3db2cbd41..1a7240aae85c 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -3396,6 +3396,13 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > "Width x2" :
> > (hw->bus.width == e1000_bus_width_pcie_x1) ?
> > "Width x1" : "unknown"), netdev->dev_addr);
> > + /* quirk */
> > +#ifdef CONFIG_PCIEASPM
> > + if (hw->bus.width == e1000_bus_width_pcie_x1) {
> > + /* single lane pcie causes problems with ASPM */
> > + pdev->pcie_link_state->aspm_enabled = 0;
> > + }
> > +#endif
> > }
> >
> > if ((hw->mac.type >= e1000_i210 ||
> >
> > I don't know where the right place to put a quirk would be...
>
> OK, so that was a real brainfart... it turns out there is no good way
> to get at that field, but the point was more to show where the quirk
> should go...
>
> Due to the lack of APIs, I started wondering if this will apply to
> more devices than just network cards - potentially we could be a
> little more selective and only skip enabling it in one direction, but...
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..96a3c6837124 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -584,15 +584,16 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> * given link unless components on both sides of the link each
> * support L0s.
> */
> - if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S)
> - link->aspm_support |= ASPM_STATE_L0S;
> - if (dwreg.enabled & PCIE_LINK_STATE_L0S)
> - link->aspm_enabled |= ASPM_STATE_L0S_UP;
> - if (upreg.enabled & PCIE_LINK_STATE_L0S)
> - link->aspm_enabled |= ASPM_STATE_L0S_DW;
> - link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s);
> - link->latency_dw.l0s = calc_l0s_latency(dwreg.latency_encoding_l0s);
> -
> + if (pcie_get_width_cap(child) != PCIE_LNK_X1) {
> + if (dwreg.support & upreg.support & PCIE_LINK_STATE_L0S)
> + link->aspm_support |= ASPM_STATE_L0S;
> + if (dwreg.enabled & PCIE_LINK_STATE_L0S)
> + link->aspm_enabled |= ASPM_STATE_L0S_UP;
> + if (upreg.enabled & PCIE_LINK_STATE_L0S)
> + link->aspm_enabled |= ASPM_STATE_L0S_DW;
> + link->latency_up.l0s = calc_l0s_latency(upreg.latency_encoding_l0s);
> + link->latency_dw.l0s = calc_l0s_latency(dwreg.latency_encoding_l0s);
> + }
>
> this time it's compile tested...
>
> It could also be if (pcie_get_width_cap(child) > PCIE_LNK_X1) {
>
> I assume that ASPM is not enabled for: PCIE_LNK_WIDTH_RESRV ;)
This is probably too broad a scope to be used generically, since it
will disable ASPM for all devices that have an x1 link width.
It might make more sense to look at something such as
e1000e_disable_aspm as an example of how to approach this.
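For reference, that driver-side direction amounts to clearing the ASPM control
bits in the Link Control register of the NIC and of its upstream bridge. A
rough sketch of the idea, loosely modelled on the e1000e approach - the helper
name and any call site are assumptions, this is not the actual e1000e code:

#include <linux/pci.h>

/* Clear the ASPM L0s/L1 enable bits on the device and its upstream bridge. */
static void sketch_disable_aspm(struct pci_dev *pdev)
{
	struct pci_dev *parent = pci_upstream_bridge(pdev);

	pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL,
				   PCI_EXP_LNKCTL_ASPMC);
	if (parent)
		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
					   PCI_EXP_LNKCTL_ASPMC);
}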
As far as what triggers it, we would need more details about the
setup. I'd be curious if we have an "lspci -vvv" for the system
available. The assumption is that the ASPM exit latency is high on
this system and that in turn is causing the bandwidth issues as you
start entering L1. If I am not mistaken, the device should advertise
about 16us for the exit latency. I'd be curious if there is a device
somewhere between the NIC and the root port that is increasing the
delay in exiting L1; if we could identify that, we could add a PCIe
quirk for it.
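Purely illustrative: one way to look for a component on the path that makes L1
exit expensive is to walk from the NIC up to the root port and print each
device's advertised L1 exit latency (the field lspci reports under LnkCap).
A debugging sketch, not existing kernel code:

#include <linux/pci.h>

/* Walk from an endpoint up to the root port, printing the encoded L1 exit
 * latency from each device's Link Capabilities register.  The field is an
 * encoding (0 = <1us ... 6 = 32-64us, 7 = >64us), not microseconds.
 */
static void dump_l1_exit_latencies(struct pci_dev *dev)
{
	while (dev) {
		u32 lnkcap;

		if (!pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap))
			pci_info(dev, "L1 exit latency encoding: %u\n",
				 (lnkcap & PCI_EXP_LNKCAP_L1EL) >> 15);
		dev = pci_upstream_bridge(dev);
	}
}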
Thanks.
- Alex