All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Brenden Blanco <bblanco@plumgrid.com>
Cc: Tom Herbert <tom@herbertland.com>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	gerlitz@mellanox.com, Daniel Borkmann <daniel@iogearbox.net>,
	john fastabend <john.fastabend@gmail.com>,
	brouer@redhat.com, Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [RFC PATCH 0/5] Add driver bpf hook for early packet drop
Date: Mon, 4 Apr 2016 09:48:46 +0200	[thread overview]
Message-ID: <20160404094846.4df8defc@redhat.com> (raw)
In-Reply-To: <20160403054103.GB21980@gmail.com>

On Sat, 2 Apr 2016 22:41:04 -0700
Brenden Blanco <bblanco@plumgrid.com> wrote:

> On Sat, Apr 02, 2016 at 12:47:16PM -0400, Tom Herbert wrote:
>
> > Very nice! Do you think this hook will be sufficient to implement a
> > fast forward patch also?

(DMA experts please verify and correct me!)

One of the gotchas is how DMA sync/unmap works.  For forwarding you
need to modify the headers.  The DMA sync API (DMA_FROM_DEVICE) specify
that the data is to be _considered_ read-only.  AFAIK you can write into
the data, BUT on DMA_unmap the API/DMA-engine is allowed to overwrite
data... note on most archs the DMA_unmap does not overwrite.

This DMA issue should not block the work on a hook for early packet drop.
Maybe we should add a flag option, that can specify to the hook if the
packet read-only? (e.g. if driver use page-fragments and DMA_sync)


We should have another track/thread on how to solve the DMA issue:
I see two solutions.

Solution 1: Simply use a "full" page per packet and do the DMA_unmap.
This result in a slowdown on arch's with expensive DMA-map/unmap.  And
we stress the page allocator more (can be solved with a page-pool-cache).
Eric will not like this due to memory usage, but we can just add a
"copy-break" step for normal stack hand-off.

Solution 2: (Due credit to Alex Duyck, this idea came up while
discussing issue with him).  Remember DMA_sync'ed data is only
considered read-only, because the DMA_unmap can be destructive.  In many
cases DMA_unmap is not.  Thus, we could take advantage of this, and
allow modifying DMA sync'ed data on those DMA setups.


> That is the goal, but more work needs to be done of course. It won't be
> possible with just a single pseudo skb, the driver will need a fast
> way to get batches of pseudo skbs (per core?) through from rx to tx.
> In mlx4 for instance, either the skb needs to be much more complete
> to be handled from the start of mlx4_en_xmit(), or that function
> would need to be split so that the fast tx could start midway through.
> 
> Or, skb allocation just gets much faster. Then it should be pretty
> straightforward.

With the bulking SLUB API, we can reduce the bare kmem_cache_alloc+free
cost per SKB from 90 cycles to 27 cycles.  It is good, but for really
fast forwarding it would be good to avoid allocating any extra data
structures.  We just want to move a RX packet-page to a TX ring queue.

Maybe the 27 cycles kmem_cache/slab cost is considered "fast-enough",
for what we gain in ease of implementation.  The real expensive part of
the SKB process is memset/clearing the SKB.  Which the fast forward
use-case could avoid.  Splitting the SKB alloc and clearing part would
be a needed first step.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

  reply	other threads:[~2016-04-04  7:48 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-02  1:21 [RFC PATCH 0/5] Add driver bpf hook for early packet drop Brenden Blanco
2016-04-02  1:21 ` [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter Brenden Blanco
2016-04-02 16:39   ` Tom Herbert
2016-04-03  7:02     ` Brenden Blanco
2016-04-04 22:07       ` Thomas Graf
2016-04-05  8:19         ` Jesper Dangaard Brouer
2016-04-04  8:49   ` Daniel Borkmann
2016-04-04 13:07     ` Jesper Dangaard Brouer
2016-04-04 13:36       ` Daniel Borkmann
2016-04-04 14:09         ` Tom Herbert
2016-04-04 15:12           ` Jesper Dangaard Brouer
2016-04-04 15:29             ` Brenden Blanco
2016-04-04 16:07               ` John Fastabend
2016-04-04 16:17                 ` Brenden Blanco
2016-04-04 20:00                   ` Alexei Starovoitov
2016-04-04 22:04                     ` Thomas Graf
2016-04-05  2:25                       ` Alexei Starovoitov
2016-04-05  8:11                         ` Jesper Dangaard Brouer
2016-04-05  9:29                     ` Jesper Dangaard Brouer
2016-04-05 22:06                       ` Alexei Starovoitov
2016-04-04 14:33       ` Eric Dumazet
2016-04-04 15:18         ` Edward Cree
2016-04-02  1:21 ` [RFC PATCH 2/5] net: add ndo to set bpf prog in adapter rx Brenden Blanco
2016-04-02  1:21 ` [RFC PATCH 3/5] rtnl: add option for setting link bpf prog Brenden Blanco
2016-04-02  1:21 ` [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program Brenden Blanco
2016-04-02  2:08   ` Eric Dumazet
2016-04-02  2:47     ` Alexei Starovoitov
2016-04-04 14:57       ` Jesper Dangaard Brouer
2016-04-04 15:22         ` Eric Dumazet
2016-04-04 18:50           ` Alexei Starovoitov
2016-04-05 14:15             ` Or Gerlitz
2016-04-06  4:05               ` Brenden Blanco
2016-04-03  6:15     ` Brenden Blanco
2016-04-05  2:20       ` Brenden Blanco
2016-04-05  2:44         ` Eric Dumazet
2016-04-05 18:59         ` Eran Ben Elisha
2016-04-02  8:23   ` Jesper Dangaard Brouer
2016-04-03  6:11     ` Brenden Blanco
2016-04-04 18:27       ` Alexei Starovoitov
2016-04-05  6:04         ` Jesper Dangaard Brouer
2016-04-02 18:40   ` Johannes Berg
2016-04-03  6:38     ` Brenden Blanco
2016-04-04  7:35       ` Johannes Berg
2016-04-04  9:57         ` Daniel Borkmann
2016-04-04 18:46           ` Alexei Starovoitov
2016-04-04 21:01             ` Daniel Borkmann
2016-04-05  1:17               ` Alexei Starovoitov
2016-04-04  8:33   ` Jesper Dangaard Brouer
2016-04-04  9:22   ` Daniel Borkmann
2016-04-02  1:21 ` [RFC PATCH 5/5] Add sample for adding simple drop program to link Brenden Blanco
2016-04-06 19:48   ` Jesper Dangaard Brouer
2016-04-06 20:01     ` Jesper Dangaard Brouer
2016-04-06 23:11       ` Alexei Starovoitov
2016-04-06 20:03     ` Daniel Borkmann
2016-04-02 16:47 ` [RFC PATCH 0/5] Add driver bpf hook for early packet drop Tom Herbert
2016-04-03  5:41   ` Brenden Blanco
2016-04-04  7:48     ` Jesper Dangaard Brouer [this message]
2016-04-04 18:10       ` Alexei Starovoitov
2016-04-02 18:41 ` Johannes Berg
2016-04-02 22:57   ` Tom Herbert
2016-04-03  2:28     ` Lorenzo Colitti
2016-04-04  7:37       ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160404094846.4df8defc@redhat.com \
    --to=brouer@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bblanco@plumgrid.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=gerlitz@mellanox.com \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.