All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Netdev <netdev@vger.kernel.org>,
	Tom Herbert <tom@herbertland.com>,
	Alexei Starovoitov <ast@kernel.org>,
	John Fastabend <john.r.fastabend@intel.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Miller <davem@davemloft.net>
Subject: Re: Questions on XDP
Date: Mon, 20 Feb 2017 19:18:31 -0800	[thread overview]
Message-ID: <20170221031829.GA3960@ast-mbp.thefacebook.com> (raw)
In-Reply-To: <CAKgT0UcpW_MMCgURnB=r5Tezdm9jJSNfOYcNJg8TR1sMKREzPg@mail.gmail.com>

On Sat, Feb 18, 2017 at 06:16:47PM -0800, Alexander Duyck wrote:
>
> I was thinking about the fact that the Mellanox driver is currently
> mapping pages as bidirectional, so I was sticking to the device to
> device case in regards to that discussion.  For virtual interfaces we
> don't even need the DMA mapping, it is just a copy to user space we
> have to deal with in the case of vhost.  In that regard I was thinking
> we need to start looking at taking XDP_TX one step further and
> possibly look at supporting the transmit of an xdp_buf on an unrelated
> netdev.  Although it looks like that means adding a netdev pointer to
> xdp_buf in order to support returning that.

xdp_tx variant (via bpf_xdp_redirect as John proposed) should work.
I don't see why such tx into another netdev cannot be done today.
The only requirement that it shouldn't be driver specific.
Whichever way it's implemented in igxbe/i40e should be applicable
to mlx*, bnx*, nfp at least.

> Anyway I am just running on conjecture at this point.  But it seems
> like if we want to make XDP capable of doing transmit we should
> support something other than bounce on the same port since that seems
> like a "just saturate the bus" use case more than anything.  I suppose
> you can do a one armed router, or have it do encap/decap for a tunnel,
> but that is about the limits of it. 

one armed router is exactly our ILA router use case.
encap/decap is our load balancer use case.

>From your other email:
> 1.  The Tx code is mostly just a toy.  We need support for more
> functional use cases.

this tx toy is serving real traffic.
Adding more use cases to xdp is nice, but we cannot sacrifice
performance of these bread and butter use cases like ddos and lb.

> 2.  1 page per packet is costly, and blocks use on the intel drivers,
> mlx4 (after Eric's patches), and 64K page architectures.

1 page per packet is costly on archs with 64k pagesize. that's it.
I see no reason to waste x86 cycles to improve perf on such archs.
If the argument is truesize socket limits due to 4k vs 2k, then
please show the patch where split page can work just as fast
as page per packet and everyone will be giving two thumbs up.
If we can have 2k truesize with the same xdp_drop/tx performance
then by all means please do it.

But I suspect what is really happening is a premature defense
of likely mediocre ixgbe xdp performance on xdp_drop due to split page...
If so, that's only ixgbe's fault and trying to make other
nics slower to have apple to apples with ixgbe is just wrong.

> 3.  Should we support scatter-gather to support 9K jumbo frames
> instead of allocating order 2 pages?

we can, if main use case of mtu < 4k doesn't suffer.

> If we allow it to do transmit on
> other netdevs then suddenly this has the potential to replace
> significant existing infrastructure.

what existing infrastructure are we talking about?
The clear containers need is clear :)
The xdp_redirect into vhost/virtio would be great to have,
but xdp_tx from one port into another of physical nic
is much less clear. That's 'saturate pci' demo.

> Sorry if I am stirring the hornets nest here.  I just finished the DMA
> API changes to allow DMA page reuse with writable pages on ixgbe, and
> igb/i40e/i40evf should be getting the same treatment shortly.  So now
> I am looking forward at XDP and just noticing a few things that didn't
> seem to make sense given the work I was doing to enable the API.

did I miss the patches that already landed ?
I don't see any recycling done by i40e_clean_tx_irq or by
ixgbe_clean_tx_irq ...

  parent reply	other threads:[~2017-02-21  3:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-18 23:31 Questions on XDP Alexei Starovoitov
2017-02-18 23:48 ` John Fastabend
2017-02-18 23:59   ` Eric Dumazet
2017-02-19  2:16   ` Alexander Duyck
2017-02-19  3:48     ` John Fastabend
2017-02-20 20:06       ` Jakub Kicinski
2017-02-22  5:02         ` John Fastabend
2017-02-21  3:18     ` Alexei Starovoitov [this message]
2017-02-21  3:39       ` John Fastabend
2017-02-21  4:00         ` Alexander Duyck
2017-02-21  7:55           ` Alexei Starovoitov
2017-02-21 17:44             ` Alexander Duyck
2017-02-22 17:08               ` John Fastabend
2017-02-22 21:59                 ` Jesper Dangaard Brouer
  -- strict thread matches above, loose matches on Subject: below --
2017-02-18 23:59 Alexei Starovoitov
2017-02-16 20:41 Alexander Duyck
2017-02-16 22:36 ` John Fastabend
2017-02-18 16:34   ` Jesper Dangaard Brouer
2017-02-18 17:41     ` Eric Dumazet
2017-02-18 18:18       ` Alexander Duyck
2017-02-18 23:28         ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170221031829.GA3960@ast-mbp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=ast@kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=john.r.fastabend@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=tom@herbertland.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.