From: John Fastabend <john.fastabend@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	bjorn.topel@gmail.com, jasowang@redhat.com, ast@fb.com,
	alexander.duyck@gmail.com, john.r.fastabend@intel.com,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH 1/2] af_packet: direct dma for packet ineterface
Date: Tue, 31 Jan 2017 21:09:24 -0800
Message-ID: <58916D84.7090706@gmail.com>
In-Reply-To: <CAF=yD-J_2Y4eX_iG40rKm3tgs_xr2dr-Rw=JL_OsV0TnfOKhhQ@mail.gmail.com>

On 17-01-30 05:31 PM, Willem de Bruijn wrote:
>>>> V3 header formats added bulk polling via socket calls and timers
>>>> used in the polling interface to return every n milliseconds. Currently,
>>>> I don't see any way to support this in hardware because we can't
>>>> know if the hardware is in the middle of a DMA operation or not
>>>> on a slot. So when a timer fires I don't know how to advance the
>>>> descriptor ring leaving empty descriptors similar to how the software
>>>> ring works. The easiest (best?) route is to simply not support this.
>>>
>>> From a performance pov bulking is essential. Systems like netmap that
>>> also depend on transferring control between kernel and userspace,
>>> report[1] that they need at least bulking size 8, to amortize the overhead.
> 
> To introduce interrupt moderation, ixgbe_do_ddma only has to elide the
> sk_data_ready, and schedule an hrtimer if one is not scheduled yet.
> 
> If I understand correctly, the difficulty lies in v3 requiring that the
> timer "close" the block when the timer expires. That may not be worth
> implementing, indeed.
> 

Yep, that is where I gave up and decided it wasn't worth it.
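
For reference, the moderation scheme suggested above (skip the per-packet
wakeup and arm a timer only if none is pending) can be modelled in plain
userspace C. This is only a sketch under assumptions: struct rx_moderation,
rx_packet() and timer_fired() are hypothetical names, the boolean stands in
for "an hrtimer is already scheduled", and nothing here is taken from the
ixgbe patch. It deliberately does not attempt the v3 "close the block on
timer expiry" behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the patch. */
struct rx_moderation {
    bool     timer_armed;  /* stands in for "an hrtimer is already scheduled" */
    uint32_t pending;      /* packets received since the last wakeup */
    uint32_t wakeups;      /* number of times the reader was woken */
    uint32_t timer_arms;   /* number of times the timer was armed */
};

/* Per-packet path: no wakeup here, only arm the timer if it is idle. */
static void rx_packet(struct rx_moderation *m)
{
    m->pending++;
    if (!m->timer_armed) {
        m->timer_armed = true;
        m->timer_arms++;
    }
}

/*
 * Timer callback: a single wakeup covers every packet queued since the
 * timer was armed. Returns the number of packets covered by this wakeup.
 */
static uint32_t timer_fired(struct rx_moderation *m)
{
    uint32_t batch = m->pending;

    m->timer_armed = false;
    m->pending = 0;
    if (batch)
        m->wakeups++;
    return batch;
}
```

With eight calls to rx_packet() before the timer fires, the model arms the
timer once and issues one wakeup for the whole batch.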

> Hardware interrupt moderation and napi may already give some
> moderation, even with a sock_def_readable call for each packet. If
> considering a v4 format, I'll again suggest virtio virtqueues. Those
> have interrupt suppression built in with EVENT_IDX.


Agreed. On paper I'm now considering moving to something like this after
getting some feedback here. Of course I'll need to play with the code a
bit to see what it looks like. I'll probably need a couple of weeks to get
this sorted out.
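
As a reminder of how EVENT_IDX suppression works: the consumer publishes
the ring index at which it next wants a notification, and the producer
notifies only when its index has just moved past that point. The sketch
below uses a hypothetical helper name, need_event(); the comparison is
intended to mirror vring_need_event() from the Linux virtio ring header,
with 16-bit wraparound arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the virtio EVENT_IDX check.
 *
 * event_idx: index at which the other side asked to be notified
 * old_idx:   producer index before this batch was added
 * new_idx:   producer index after this batch was added
 *
 * Returns true when event_idx lies in [old_idx, new_idx), i.e. this
 * batch crossed the requested index. Unsigned 16-bit subtraction keeps
 * the test correct across index wraparound.
 */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

A consumer that is busy polling can set event_idx far ahead, so that a
producer adding one packet at a time sees need_event() return false and
skips the notification.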

> 
>>> Likely, but I would like us to take a measurement-based approach.  Let's
>>> benchmark with this V2 header format, and see how far we are from
>>> target, and see what lights up in perf report and if it is something we
>>> can address.
>>
>> Yep I'm hoping to get to this sometime this week.
> 
> Perhaps also without filling in the optional metadata fields
> in tpacket and sockaddr_ll.
> 
>>> E.g. how will you support XDP_TX?  AFAIK you cannot remove/detach a
>>> packet with this solution (and place it on a TX queue and wait for DMA
>>> TX completion).
>>>
>>
>> This is something worth exploring. tpacket_v2 uses a fixed ring with
>> slots so all the pages are allocated and assigned to the ring at init
>> time. To xmit a packet in this case the user space application would
>> be required to leave the packet descriptor on the rx side pinned
>> until the tx side DMA has completed. Then it can unpin the rx side
>> and return it to the driver. This works if the TX/RX processing is
>> fast enough to keep up. For many things this is good enough.
>>
>> For some workloads, though, this may not be sufficient. In that
>> case a tpacket_v4 that can push down a new set of "slots" every
>> n packets would be useful, where n is sufficiently large to keep
>> the workload running.
> 
> Here, too, virtio rings may help.
> 
> The extra level of indirection allows out of order completions,
> reducing the chance of running out of rx descriptors when redirecting
> a subset of packets to a tx ring, as that does not block the entire ring.
> 
> And passing explicit descriptors from userspace enables pointing to
> new memory regions. On the flipside, they now have to be checked for
> safety against region bounds.
> 
>> This is similar in many ways to virtio/vhost interaction.
> 
> Ah, I only saw this after writing the above :)
> 

Yep, but glad to get some validation on this idea.
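
On the safety point above (descriptors passed from userspace have to be
checked against region bounds), a minimal sketch of such a check follows.
The types struct mem_region and struct user_desc and the function
desc_in_region() are hypothetical and only illustrate the idea; a real
implementation would also have to deal with alignment, permissions, and
races with concurrent descriptor updates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registered memory region: [base, base + size). */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Hypothetical descriptor handed in by userspace. */
struct user_desc {
    uint64_t addr;
    uint32_t len;
};

/*
 * Returns true only if [addr, addr + len) lies entirely inside the
 * region. The check works on offsets and subtraction, so a hostile
 * addr/len pair cannot make addr + len wrap around and pass.
 */
static bool desc_in_region(const struct user_desc *d,
                           const struct mem_region *r)
{
    uint64_t off;

    if (d->len == 0 || d->addr < r->base)
        return false;

    off = d->addr - r->base;
    return off <= r->size && d->len <= r->size - off;
}
```

A descriptor covering the whole region is accepted, while one that starts
inside the region but runs one byte past its end is rejected.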

Thread overview: 18+ messages
2017-01-27 21:33 [RFC PATCH 0/2] rx zero copy interface for af_packet John Fastabend
2017-01-27 21:33 ` [RFC PATCH 1/2] af_packet: direct dma for packet ineterface John Fastabend
2017-01-30 18:16   ` Jesper Dangaard Brouer
2017-01-30 21:51     ` John Fastabend
2017-01-31  1:31       ` Willem de Bruijn
2017-02-01  5:09         ` John Fastabend [this message]
2017-03-06 21:28           ` chetan loke
2017-01-31 12:20       ` Jesper Dangaard Brouer
2017-02-01  5:01         ` John Fastabend
2017-02-04  3:10   ` Jason Wang
2017-01-27 21:34 ` [RFC PATCH 2/2] ixgbe: add af_packet direct copy support John Fastabend
2017-01-31  2:53   ` Alexei Starovoitov
2017-02-01  4:58     ` John Fastabend
2017-01-30 22:02 ` [RFC PATCH 0/2] rx zero copy interface for af_packet David Miller
2017-01-31 16:30 ` Sowmini Varadhan
2017-02-01  4:23   ` John Fastabend
2017-01-31 19:39 ` tndave
2017-02-01  5:09   ` John Fastabend
