From: Eric Dumazet <eric.dumazet@gmail.com>
To: Maxim Uvarov <maxim.uvarov@linaro.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Ilias Apalodimas <ilias.apalodimas@linaro.org>
Subject: Re: RFC: zero copy recv()
Date: Thu, 25 Apr 2019 10:50:03 -0700
Message-ID: <77665188-27f2-6567-9e0c-62c66d98f436@gmail.com>
In-Reply-To: <CAD8XO3b0m5Qn1Ey3gu3HPmcOanN-yjCYBJZEUEu754X=5jAtOA@mail.gmail.com>

On 4/25/19 1:01 AM, Maxim Uvarov wrote:
> On Wed, 24 Apr 2019 at 18:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
>>> Hello,
>>>
>>> At various conferences I see people trying to accelerate networking
>>> by moving packet processing, including the protocol layers, entirely
>>> to user space. It might be DPDK, ODP or AF_XDP plus some network
>>> stack on top of it. Then people try to test this solution with
>>> existing applications, preferably without modifying the application
>>> binaries, by just using LD_PRELOAD on the socket calls (recv(),
>>> sendto(), etc.). The current recv() expects the application to allocate
>>> memory, and the call "copies" the packet into that memory. A copy per
>>> packet is slow. Can we consider implementing zero-copy-friendly API
>>> calls? Could such a change be accepted into the kernel?
>>
>
> Hello Eric, thanks for responding.
>
>> Generic zero copy is hard.
>>
>
> yes that is true.
>
>> As soon as you have multiple consumers in different domains for the data,
>> you need some kind of multiplexing, typically using hardware capabilities.
>>
>> For TCP, we implemented zero copy last year, which works quite well
>> on x86 if your network uses MTU of 4096+headers.
>>
>> tools/testing/selftests/net/tcp_mmap.c reaches line rate (100Gbit) on
>> a single TCP flow, if using a NIC able to perform header split.
>>
>
> That is great work. But aren't there context switches on
> getsockopt(TCP_ZEROCOPY_RECEIVE) and read() per packet?
No, since in many cases you actually know how many bytes you expect to receive.
The application can use SO_RCVLOWAT to tell the kernel:
- Please send me an EPOLLIN only when you have at least XXXXXX bytes available in the receive queue.
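
The SO_RCVLOWAT pattern described above can be sketched roughly as follows. This is only an illustration, assuming Linux and a TCP socket; the helper names (set_rcvlowat, wait_readable, tcp_loopback_pair, rcvlowat_demo) and RECORD_SIZE are made up for this example and do not come from the thread or from tcp_mmap.c.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define RECORD_SIZE 4096	/* hypothetical size of one application record */

/* Ask the kernel to report the socket readable only once at least
 * `lowat` bytes are queued. Returns 0 on success, -1 on error. */
static int set_rcvlowat(int fd, int lowat)
{
	return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
}

/* Returns 1 if EPOLLIN was reported within timeout_ms, 0 if not, -1 on error. */
static int wait_readable(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -1;
	return (n > 0 && (ev.events & EPOLLIN)) ? 1 : 0;
}

/* Build a connected TCP pair over 127.0.0.1, for demonstration only. */
static int tcp_loopback_pair(int *tx, int *rx)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);	/* port 0: kernel picks one */
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto fail;
	*tx = socket(AF_INET, SOCK_STREAM, 0);
	if (*tx < 0)
		goto fail;
	if (connect(*tx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    (*rx = accept(lfd, NULL, NULL)) < 0) {
		close(*tx);
		goto fail;
	}
	close(lfd);
	return 0;
fail:
	close(lfd);
	return -1;
}

/* Send one record in two halves; EPOLLIN is expected only after the
 * second half arrives. Returns 0 if that is what was observed. */
int rcvlowat_demo(void)
{
	char buf[RECORD_SIZE / 2] = { 0 };
	struct epoll_event ev = { .events = EPOLLIN };
	int tx, rx, epfd, early, late, ret = -1;

	if (tcp_loopback_pair(&tx, &rx) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out;
	ev.data.fd = rx;
	if (set_rcvlowat(rx, RECORD_SIZE) < 0 ||
	    epoll_ctl(epfd, EPOLL_CTL_ADD, rx, &ev) < 0)
		goto out_ep;
	if (write(tx, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out_ep;
	early = wait_readable(epfd, 200);	/* only half a record queued */
	if (write(tx, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out_ep;
	late = wait_readable(epfd, 2000);	/* full record queued */
	ret = (early == 0 && late == 1) ? 0 : -1;
out_ep:
	close(epfd);
out:
	close(tx);
	close(rx);
	return ret;
}
```

With the threshold set to the expected message size, the receiver is woken once per message rather than once per packet, which is the point being made about context switches.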
>
> I played with AF_XDP, where one core can be isolated to poll the
> umem pool memory while another core does the softirq processing.
> Polling the umem is really fast, about 96 ns on a 2.5 GHz x86 laptop,
> with no context switches on the umem polling core.
Sure, but again this is very far from being 'generic', for example if you want to reuse the TCP stack...
>
> But in general, if the getsockopt()+read() pair in the tcp_mmap.c code
> were replaced by a single zero copy call, something like recvmsg_zc(),
> then it could be LD_PRELOADed.
> The mmap() could also be moved into socket creation to simplify the API.
> Does this look reasonable?
Honestly, I prefer not having to play games like that.
There are many subtle issues there, really.
>
>> But the model is not to run a legacy application with some LD_PRELOAD
>> hack/magic, sorry.
>>
> It is more likely that legacy applications will want to use zero copy
> networking. Once the API is stable they will support it, especially
> if the API can be used with minimal changes to the apps.
> Then it will be quite easy to use an LD_PRELOAD hack or change the
> application to use some other IP stack.
>
> Maxim.
>
Thread overview: 4+ messages
2019-04-24 6:23 RFC: zero copy recv() Maxim Uvarov
2019-04-24 15:59 ` Eric Dumazet
2019-04-25 8:01 ` Maxim Uvarov
2019-04-25 17:50 ` Eric Dumazet [this message]