* Expected performance w/ bonding
From: Ian Kumlien @ 2023-01-18 14:21 UTC
  To: Linux Kernel Network Developers; +Cc: andy, vfalico, j.vosburgh

Hi,

I was doing some tests with some of the bigger AMD machines, both
using PCIe 4.0 Mellanox ConnectX-5 NICs.

They have 2x100Gbit links to the same switches (running in VLT, i.e. as
"one"), but iperf3 seems to top out at 27Gbit (with 10 parallel
streams), and generally lands somewhere around 25Gbit - so my question
is whether there is a ~25Gbit limit for bonding using 802.3ad and
layer2+3 hashing.
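
For reference, the bond and the test look roughly like this (bond0,
<server> and the exact flags are placeholders; the numbers vary per run):

  cat /proc/net/bonding/bond0      # 802.3ad, xmit_hash_policy layer2+3, both slaves up
  ip -d link show bond0            # same information via iproute2
  iperf3 -c <server> -P 10 -t 30   # 10 parallel streams, tops out around 25-27Gbit in total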

It's a little difficult to do proper measurements since most of the
systems are in production - but I'm kind of running out of clues =)

If anyone has any ideas, it would be very interesting to see whether they help.

* Re: Expected performance w/ bonding
From: Jay Vosburgh @ 2023-01-18 16:43 UTC
  To: Ian Kumlien; +Cc: Linux Kernel Network Developers, andy, vfalico

Ian Kumlien <ian.kumlien@gmail.com> wrote:

>Hi,
>
>I was doing some tests with some of the bigger AMD machines, both
>using PCIe 4.0 Mellanox ConnectX-5 NICs.
>
>They have 2x100Gbit links to the same switches (running in VLT, i.e. as
>"one"), but iperf3 seems to top out at 27Gbit (with 10 parallel
>streams), and generally lands somewhere around 25Gbit - so my question
>is whether there is a ~25Gbit limit for bonding using 802.3ad and
>layer2+3 hashing.
>
>It's a little difficult to do proper measurements since most of the
>systems are in production - but I'm kind of running out of clues =)
>
>If anyone has any ideas, it would be very interesting to see whether they help.

	If by "bigger AMD machines" you mean Rome or similar, then what
you may be seeing are the effects of (a) NUMA, and (b) the AMD CCX cache
architecture (in which a small-ish number of CPUs share an L3 cache).
We've seen similar effects on these systems with similar configurations,
particularly on variants with fewer CPUs per CCX (as I recall, one
variant has 4 CPUs per CCX).
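
	As a rough way to check the topology involved (untested here; the
interface name is just an example):

  lscpu | grep -iE 'numa|l3'                      # NUMA node count and L3 cache layout
  numactl --hardware                              # CPUs and memory per node
  cat /sys/class/net/enp65s0f0/device/numa_node   # NUMA node the NIC sits on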

	For testing purposes, you can pin the iperf tasks to CPUs in the
same CCX as one another, and on the same NUMA node as the network
device.  If your bond utilizes interfaces on separate NUMA nodes, there
may be additional randomness in the results, as data may or may not
cross a NUMA boundary depending on the flow hash.  For testing, this can
be worked around by disabling one interface in the bond (i.e., a bond
with just one active interface), and ensuring the iperf tasks are pinned
to the correct NUMA node.
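
	As a sketch of that kind of pinning (the CPU list, node number and
interface name are placeholders; pick CPUs from one CCX on the NIC's
NUMA node):

  cat /sys/class/net/enp65s0f0/device/local_cpulist   # CPUs local to the NIC
  taskset -c 8-11 iperf3 -c <server> -P 10 -t 30      # pin client threads to one CCX
  numactl --cpunodebind=1 --membind=1 iperf3 -s       # or pin the server by NUMA node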

	There is a mechanism in bonding to do flow -> queue -> interface
assignments (described in Documentation/networking/bonding.rst), but
it's nontrivial, and still needs the processes to be resident on the
same NUMA node (and on the AMD systems, also within the same CCX
domain).
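
	Roughly, that mechanism looks like the example in bonding.rst (the
addresses, interface and queue numbers below are only illustrative):

  # map a slave interface to a bond transmit queue id
  echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
  # steer a specific flow to that queue (and hence that slave)
  tc qdisc add dev bond0 handle 1 root multiq
  tc filter add dev bond0 protocol ip parent 1: prio 1 u32 \
      match ip dst 192.168.1.100 action skbedit queue_mapping 2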

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
