From: Eric Dumazet <eric.dumazet@gmail.com>
To: Maxim Uvarov <maxim.uvarov@linaro.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>
Subject: Re: RFC: zero copy recv()
Date: Thu, 25 Apr 2019 10:50:03 -0700
Message-ID: <77665188-27f2-6567-9e0c-62c66d98f436@gmail.com>
In-Reply-To: <CAD8XO3b0m5Qn1Ey3gu3HPmcOanN-yjCYBJZEUEu754X=5jAtOA@mail.gmail.com>



On 4/25/19 1:01 AM, Maxim Uvarov wrote:
> On Wed, 24 Apr 2019 at 18:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
>>> Hello,
>>>
>>> At various conferences I see people trying to accelerate networking
>>> by moving packet processing, including the protocol layer, completely
>>> into user space. It might be DPDK, ODP or AF_XDP plus some network
>>> stack on top of it. People then try to test these solutions with
>>> existing applications, preferably without modifying the application
>>> binaries, by using LD_PRELOAD to intercept the socket calls (recv(),
>>> sendto(), etc.). The current recv() expects the application to
>>> allocate memory, and the call then copies the packet into that
>>> memory. A copy per packet is slow. Can we consider implementing
>>> zero-copy-friendly API calls? Could such a change be accepted into
>>> the kernel?
>>
> 
> Hello Eric, thanks for responding.
> 
>> Generic zero copy is hard.
>>
> 
> yes that is true.
> 
>> As soon as you have multiple consumers in different domains for the data,
>> you need some kind of multiplexing, typically using hardware capabilities.
>>
>> For TCP, we implemented zero copy last year, which works quite well
>> on x86 if your network uses MTU of 4096+headers.
>>
>> tools/testing/selftests/net/tcp_mmap.c  reaches line rate (100Gbit) on
>> a single TCP flow, if using a NIC able to perform header split.
>>
> 
> That is great work. But aren't there context switches on
> getsockopt(TCP_ZEROCOPY_RECEIVE) and read() per packet?

No, since in many cases you actually know how many bytes are expected to be received.

SO_RCVLOWAT can be used by the application to tell the kernel:

- Please send me an EPOLLIN only when you have at least XXXXXX bytes available in receive queue.
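
A minimal sketch of that SO_RCVLOWAT pattern, added for illustration and not taken from tcp_mmap.c. It assumes Linux, a loopback TCP connection, and a hypothetical expected message size `WANT`; the `demo()` helper is likewise an assumption made for the example. The receiver sets the low-water mark, and epoll should stay quiet until that many bytes are queued, so a single wakeup and a single read cover the whole message.

```python
import select
import socket

WANT = 8192  # hypothetical size of the message the application expects


def demo():
    """Return (events seen below the low-water mark, bytes read once it is reached)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender = receiver = ep = None
    try:
        # Loopback TCP pair; port 0 lets the kernel pick a free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        sender = socket.create_connection(listener.getsockname())
        sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        receiver, _ = listener.accept()

        # Ask the kernel to report the socket readable only once
        # at least WANT bytes sit in the receive queue.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, WANT)

        ep = select.epoll()
        ep.register(receiver.fileno(), select.EPOLLIN)

        # First half: still below the low-water mark, so a non-blocking
        # poll should report nothing.
        sender.sendall(b"a" * (WANT // 2))
        early = ep.poll(0)

        # Second half: the queue reaches WANT bytes and EPOLLIN is raised.
        sender.sendall(b"b" * (WANT // 2))
        ready = ep.poll(5.0)

        # EPOLLIN can also be raised on EOF or error, so real code must
        # still handle a short read here.
        data = receiver.recv(WANT) if ready else b""
        return early, data
    finally:
        for obj in (ep, receiver, sender, listener):
            if obj is not None:
                obj.close()


if __name__ == "__main__":
    early_events, payload = demo()
    print(len(early_events), len(payload))
```

The same idea applies in front of getsockopt(TCP_ZEROCOPY_RECEIVE): when the application knows how much data it is waiting for, it pays for one wakeup per message instead of one per packet.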

> 
> I played with AF_XDP, where one core can be isolated to poll the
> umem pool memory while another core does the softirq processing.
> Polling the umem is really fast, about 96 ns on a 2.5 GHz x86 laptop,
> with no context switches on the umem polling core.

Sure, but again this is very far from being 'generic', let's say if you want to reuse the TCP stack...

> 
> But in general, if the getsockopt()+read() pair in tcp_mmap.c were
> replaced by a single zero-copy call, something like recvmsg_zc(), then
> it could be LD_PRELOADed.
> The mmap() could also be moved into socket creation to simplify the
> API. Does that look reasonable?

Honestly I prefer not having to play games like that.

There are many subtle issues there, really.

> 
>> But the model is not to run a legacy application with some LD_PRELOAD
>> hack/magic, sorry.
>>
> More likely, legacy applications will want to use zero-copy
> networking. Once the API is stable they will support it, especially
> if it can be used with minimal changes to the apps.
> Then it will be quite easy to apply an LD_PRELOAD hack or to change
> the application to use some other IP stack.
> 
> Maxim.
> 
