From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	nirni-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
Subject: Re: [RFC PATCH 0/4] Sockperf: Initial multi-threaded throughput client
Date: Wed, 20 Dec 2017 10:52:39 +0200
Message-ID: <20171220085239.GP2942@mtr-leonro.local>
In-Reply-To: <cover.1513609601.git.dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, Dec 18, 2017 at 10:26:42AM -0500, Doug Ledford wrote:
> During testing, it has become painfully clear that a single-threaded UDP
> test client cannot exercise a 100Gig link due to issues related to
> single-core maximum throughput.  This patchset implements a
> multi-threaded throughput test in sockperf.  This is just an initial
> implementation; there is still more work to be done.  In particular:
>
> 1) Although the speed improved with this change, it did not improve
> drastically.  As soon as the client send bottleneck was removed, it
> became clear there is another bottleneck on the server.  When sending to
> a server from one client, all data is received on a single queue pair,
> and due to how interrupts are spread in the RDMA stack (namely, each
> queue pair goes to a single interrupt, and we rely on multiple queue
> pairs being in use to balance interrupts across different cores), we
> take all interrupts from a specific host on a single core.  The
> receiving side then becomes the bottleneck, with single-core IPoIB
> receive processing being the limiting factor.  On a slower machine, I
> clocked 30 Gbit/s throughput.  With a faster machine as the server, I
> was able to get up to 70 Gbit/s throughput.
>
> 2) I thought I might try an experiment to get around the issue of the
> queue pair being on one CPU.  We use P_Keys in our internal lab setup,
> so on the specific link in question I actually have a total of three
> different IP interfaces on different P_Keys.  I tried to open tests on
> several of these interfaces to see how that would impact performance
> (so a multithreaded server listening on ports on three different P_Key
> interfaces all on the same physical link, which should use three
> different queue pairs, and a multithreaded client sending to those three
> different P_Key interfaces from three different P_Key interfaces of its
> own).  It tanked performance, to less than gigabit Ethernet speeds.
> This warrants some investigation moving forward, I think.
>
> 3) I thought I might try sending from two clients to the server at once
> and summing their throughput.  That was fun.  With UDP, the clients are
> able to send enough data that flow control on the link kicks in, at
> which point each client starts dropping packets on the floor (they're
> UDP, after all), and so the net result is that one client claimed
> 200 Gbit/s and the other about 175 Gbit/s.  Meanwhile, the server
> thought we were just kidding and didn't actually run a test at all.
>
> 4) I reran the test using TCP instead of UDP.  That's a non-starter.
> Whether due to my changes or just because it is the way it is, the TCP
> tests all failed.  For larger message sizes, they failed instantly.  For
> smaller message sizes the test might run for a few seconds, but it would
> eventually fail too.  In every case, the failure was that the server
> received a message it deemed too large and forcibly closed all of the
> TCP connections, at which point the client just bailed.
>
> I should point out that I don't program in C++.  Any places where these
> patches are not written in typical C++ style are due to that.
>
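
For readers who have not looked at the patches, the shape of the change
described above is roughly this: instead of one sender loop, the client
spawns several threads, each with its own UDP socket, and the per-thread
byte counts are summed at the end.  The sketch below is not code from the
patchset and does not use any of sockperf's internals; it is a minimal,
stand-alone illustration with made-up names (send_loop, g_bytes) built on
plain POSIX sockets and std::thread.

// Illustrative sketch of a multi-threaded UDP throughput sender.
// Not sockperf code; names, address, and port are examples only.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

static std::atomic<unsigned long long> g_bytes{0};  // bytes sent by all threads

static void send_loop(const char *ip, int port, int msg_size, int seconds)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return;

    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    inet_pton(AF_INET, ip, &dst.sin_addr);

    std::vector<char> buf(msg_size, 'x');
    auto end = std::chrono::steady_clock::now() + std::chrono::seconds(seconds);

    unsigned long long sent = 0;
    while (std::chrono::steady_clock::now() < end) {
        ssize_t n = sendto(fd, buf.data(), buf.size(), 0,
                           reinterpret_cast<sockaddr *>(&dst), sizeof(dst));
        if (n > 0)
            sent += n;
    }
    g_bytes += sent;
    close(fd);
}

int main()
{
    const int num_threads = 4;   // analogous to a --num-threads option
    const int msg_size = 1472;   // one datagram per Ethernet MTU
    const int duration = 10;     // seconds

    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i)
        workers.emplace_back(send_loop, "192.0.2.1", 11111, msg_size, duration);
    for (auto &t : workers)
        t.join();

    double gbits = g_bytes * 8.0 / duration / 1e9;
    std::printf("aggregate: %.2f Gbit/s\n", gbits);
    return 0;
}

Compile with g++ -pthread.  Each additional thread removes more of the
single-core send bottleneck until, as noted above, the single receive
queue pair on the server becomes the limiting factor.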

Doug,

I contacted the group responsible for the development of sockperf:
https://github.com/Mellanox/sockperf

Their maintainer is on vacation until the third week of January and,
unfortunately, there is no one else available right now who can take a
look at your proposal.

In the meantime, it would be better to push the code to GitHub, because
their development flow is based on GitHub rather than the mailing list.

Thanks

Thread overview: 6+ messages
2017-12-18 15:26 [RFC PATCH 0/4] Sockperf: Initial multi-threaded throughput client Doug Ledford
2017-12-18 15:26 ` [PATCH 1/4] Rename a few variables Doug Ledford
2017-12-18 15:26 ` [PATCH 2/4] Move num-threads and cpu-affinity to common opts Doug Ledford
2017-12-18 15:26 ` [PATCH 3/4] Move server thread handler to SockPerf.cpp Doug Ledford
2017-12-18 15:26 ` [PATCH 4/4] Initial implementation of threaded throughput client Doug Ledford
2017-12-20  8:52 ` Leon Romanovsky [this message]
