linux-kernel.vger.kernel.org archive mirror
* RFC: zero copy recv()
@ 2019-04-24  6:23 Maxim Uvarov
  2019-04-24 15:59 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Maxim Uvarov @ 2019-04-24  6:23 UTC (permalink / raw)
  To: netdev, linux-kernel

Hello,

At various conferences I see people trying to accelerate networking
by moving packet processing, protocol level included, entirely into
user space. It might be DPDK, ODP or AF_XDP plus some network stack
on top of it. People then try to test such a solution with existing
applications, ideally without modifying the application binaries, by
LD_PRELOADing wrappers for the socket syscalls (recv(), sendto() and
so on). The current recv() expects the application to allocate
memory, and the call copies the packet into that memory. A copy per
packet is slow. Can we consider making the API calls more zero-copy
friendly? Could such a change be accepted into the kernel?
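
For illustration, a minimal LD_PRELOAD interposer could look roughly
like this (untested sketch; the file name, build line and dispatch
comment are only an example of the idea):

/* preload.c - build with: gcc -shared -fPIC -o preload.so preload.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/socket.h>

static ssize_t (*real_recv)(int, void *, size_t, int);

ssize_t recv(int fd, void *buf, size_t len, int flags)
{
        if (!real_recv)
                real_recv = (ssize_t (*)(int, void *, size_t, int))
                                dlsym(RTLD_NEXT, "recv");

        /* A user-space stack would recognize its own fds here and hand
         * back packet data without the per-packet copy; everything else
         * falls through to the kernel implementation.
         */
        return real_recv(fd, buf, len, flags);
}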

What are your thoughts?

Thank you,
Maxim.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: zero copy recv()
  2019-04-24  6:23 RFC: zero copy recv() Maxim Uvarov
@ 2019-04-24 15:59 ` Eric Dumazet
  2019-04-25  8:01   ` Maxim Uvarov
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2019-04-24 15:59 UTC (permalink / raw)
  To: Maxim Uvarov, netdev, linux-kernel



On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
> Hello,
> 
> At various conferences I see people trying to accelerate networking
> by moving packet processing, protocol level included, entirely into
> user space. It might be DPDK, ODP or AF_XDP plus some network stack
> on top of it. People then try to test such a solution with existing
> applications, ideally without modifying the application binaries, by
> LD_PRELOADing wrappers for the socket syscalls (recv(), sendto() and
> so on). The current recv() expects the application to allocate
> memory, and the call copies the packet into that memory. A copy per
> packet is slow. Can we consider making the API calls more zero-copy
> friendly? Could such a change be accepted into the kernel?

Generic zero copy is hard.

As soon as you have multiple consumers in different domains for the data,
you need some kind of multiplexing, typically using hardware capabilities.

For TCP, we implemented zero copy last year, which works quite well
on x86 if your network uses MTU of 4096+headers.

tools/testing/selftests/net/tcp_mmap.c  reaches line rate (100Gbit) on
a single TCP flow, if using a NIC able to perform header split.
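
The flow is roughly this (simplified sketch, not the actual selftest
code; alignment checks, poll() and error handling are elided):

#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>

static void zerocopy_read(int fd, size_t chunk) /* chunk: multiple of page size */
{
        /* Map a receive window over the socket; received pages land here. */
        void *addr = mmap(NULL, chunk, PROT_READ, MAP_SHARED, fd, 0);
        struct tcp_zerocopy_receive zc;
        socklen_t zc_len;
        char skipbuf[4096];

        if (addr == MAP_FAILED)
                return;

        for (;;) {
                memset(&zc, 0, sizeof(zc));
                zc.address = (__u64)(unsigned long)addr;
                zc.length = chunk;
                zc_len = sizeof(zc);
                if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
                               &zc, &zc_len))
                        break;

                /* zc.length bytes are now mapped read-only at addr:
                 * consume(addr, zc.length);
                 */

                if (zc.recv_skip_hint) {
                        /* Bytes that could not be mapped are read the
                         * normal, copying way (shortened here; the real
                         * code loops until recv_skip_hint is consumed).
                         */
                        size_t n = zc.recv_skip_hint < sizeof(skipbuf) ?
                                   zc.recv_skip_hint : sizeof(skipbuf);
                        if (recv(fd, skipbuf, n, 0) <= 0)
                                break;
                }
                if (!zc.length && !zc.recv_skip_hint)
                        break;  /* drained: wait for data (poll()) or stop */
        }
        munmap(addr, chunk);
}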

But the model is not to run a legacy application with some LD_PRELOAD
hack/magic, sorry.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: zero copy recv()
  2019-04-24 15:59 ` Eric Dumazet
@ 2019-04-25  8:01   ` Maxim Uvarov
  2019-04-25 17:50     ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Maxim Uvarov @ 2019-04-25  8:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, Ilias Apalodimas

On Wed, 24 Apr 2019 at 18:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
> > Hello,
> >
> > At various conferences I see people trying to accelerate networking
> > by moving packet processing, protocol level included, entirely into
> > user space. It might be DPDK, ODP or AF_XDP plus some network stack
> > on top of it. People then try to test such a solution with existing
> > applications, ideally without modifying the application binaries, by
> > LD_PRELOADing wrappers for the socket syscalls (recv(), sendto() and
> > so on). The current recv() expects the application to allocate
> > memory, and the call copies the packet into that memory. A copy per
> > packet is slow. Can we consider making the API calls more zero-copy
> > friendly? Could such a change be accepted into the kernel?
>

Hello Eric, thanks for responding.

> Generic zero copy is hard.
>

Yes, that is true.

> As soon as you have multiple consumers in different domains for the data,
> you need some kind of multiplexing, typically using hardware capabilities.
>
> For TCP, we implemented zero copy last year, which works quite well
> on x86 if your network uses MTU of 4096+headers.
>
> tools/testing/selftests/net/tcp_mmap.c  reaches line rate (100Gbit) on
> a single TCP flow, if using a NIC able to perform header split.
>

That is great work. But aren't there context switches for the
getsockopt(TCP_ZEROCOPY_RECEIVE) and read() calls on every packet?

I played with AF_XDP, where one core can be isolated to poll the
umem pool memory while another core handles the softirq processing.
Polling the umem is really fast - about 96 ns on a 2.5 GHz x86
laptop - with no context switches on the umem polling core.

But in general, if the getsockopt()+read() pair in the tcp_mmap.c
code were folded into a single zero-copy call, something like
recvmsg_zc(), then it could be LD_PRELOADed.
The mmap() could also be moved into socket creation to simplify the
API. Does that look reasonable?
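
To be concrete, I mean something like this purely hypothetical
prototype (recvmsg_zc() does not exist; the name and arguments are
only for illustration):

/* hypothetical: on success *buf points at kernel-mapped received
 * data, up to len bytes, with no copy into a caller-supplied buffer
 */
ssize_t recvmsg_zc(int fd, void **buf, size_t len, int flags);

An LD_PRELOADed recv() could call that internally and fall back to a
normal copying recv() only when mapping is not possible.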

> But the model is not to run a legacy application with some LD_PRELOAD
> hack/magic, sorry.
>
More likely, legacy applications will want to use zero-copy
networking. Once the API is stable they will support it, especially
if it can be adopted with minimal changes to the apps.
Then it will be quite easy to use an LD_PRELOAD hack, or to change
the application to use some other IP stack.

Maxim.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: zero copy recv()
  2019-04-25  8:01   ` Maxim Uvarov
@ 2019-04-25 17:50     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2019-04-25 17:50 UTC (permalink / raw)
  To: Maxim Uvarov; +Cc: netdev, linux-kernel, Ilias Apalodimas



On 4/25/19 1:01 AM, Maxim Uvarov wrote:
> On Wed, 24 Apr 2019 at 18:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
>>> Hello,
>>>
>>> At various conferences I see people trying to accelerate networking
>>> by moving packet processing, protocol level included, entirely into
>>> user space. It might be DPDK, ODP or AF_XDP plus some network stack
>>> on top of it. People then try to test such a solution with existing
>>> applications, ideally without modifying the application binaries, by
>>> LD_PRELOADing wrappers for the socket syscalls (recv(), sendto() and
>>> so on). The current recv() expects the application to allocate
>>> memory, and the call copies the packet into that memory. A copy per
>>> packet is slow. Can we consider making the API calls more zero-copy
>>> friendly? Could such a change be accepted into the kernel?
>>
> 
> Hello Eric, thanks for responding.
> 
>> Generic zero copy is hard.
>>
> 
> Yes, that is true.
> 
>> As soon as you have multiple consumers in different domains for the data,
>> you need some kind of multiplexing, typically using hardware capabilities.
>>
>> For TCP, we implemented zero copy last year, which works quite well
>> on x86 if your network uses MTU of 4096+headers.
>>
>> tools/testing/selftests/net/tcp_mmap.c  reaches line rate (100Gbit) on
>> a single TCP flow, if using a NIC able to perform header split.
>>
> 
> That is great work. But aren't there context switches for the
> getsockopt(TCP_ZEROCOPY_RECEIVE) and read() calls on every packet?

No, since in many cases you actually know how many bytes are expected to be received.

SO_RCVLOWAT can be used by the application to tell the kernel:

- Please send me an EPOLLIN only when you have at least XXXXXX bytes available in receive queue.
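
For example (sketch; the 512 KB threshold is arbitrary):

#include <sys/socket.h>

static int set_rcvlowat(int fd, int bytes)
{
        /* e.g. bytes = 512 * 1024: wake the reader only once 512 KB
         * are sitting in the receive queue
         */
        return setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
}

poll()/epoll_wait() then reports the socket readable only once that
much data is queued, so one getsockopt(TCP_ZEROCOPY_RECEIVE) call can
map a large chunk per wakeup.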

> 
> I played with AF_XDP, where one core can be isolated to poll the
> umem pool memory while another core handles the softirq processing.
> Polling the umem is really fast - about 96 ns on a 2.5 GHz x86
> laptop - with no context switches on the umem polling core.

Sure, but again this is very far from being 'generic' - say, if you want to reuse the TCP stack...

> 
> But in general, if the getsockopt()+read() pair in the tcp_mmap.c
> code were folded into a single zero-copy call, something like
> recvmsg_zc(), then it could be LD_PRELOADed.
> The mmap() could also be moved into socket creation to simplify the
> API. Does that look reasonable?

Honestly I prefer not having to play games like that.

There are many subtle issues there, really.

> 
>> But the model is not to run a legacy application with some LD_PRELOAD
>> hack/magic, sorry.
>>
> More likely, legacy applications will want to use zero-copy
> networking. Once the API is stable they will support it, especially
> if it can be adopted with minimal changes to the apps.
> Then it will be quite easy to use an LD_PRELOAD hack, or to change
> the application to use some other IP stack.
> 
> Maxim.
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread

Thread overview: 4+ messages
2019-04-24  6:23 RFC: zero copy recv() Maxim Uvarov
2019-04-24 15:59 ` Eric Dumazet
2019-04-25  8:01   ` Maxim Uvarov
2019-04-25 17:50     ` Eric Dumazet
