netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"David Ahern" <dsahern@gmail.com>,
	"David Ahern" <dsahern@kernel.org>,
	netdev@vger.kernel.org, prashantbhole.linux@gmail.com,
	jasowang@redhat.com, davem@davemloft.net, mst@redhat.com,
	toshiaki.makita1@gmail.com, daniel@iogearbox.net,
	john.fastabend@gmail.com, ast@kernel.org, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com, andriin@fb.com,
	"David Ahern" <dahern@digitalocean.com>,
	brouer@redhat.com, "Björn Töpel" <bjorn.topel@intel.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>
Subject: Re: [PATCH bpf-next 03/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path
Date: Tue, 04 Feb 2020 12:00:40 +0100	[thread overview]
Message-ID: <87y2ti4nxj.fsf@toke.dk> (raw)
In-Reply-To: <20200203231503.24eec7f0@carbon>

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Mon, 03 Feb 2020 21:13:24 +0100
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> Oops, I see I forgot to reply to this bit:
>> 
>> >> Yeah, but having the low-level details available to the XDP program
>> >> (such as HW queue occupancy for the egress hook) is one of the benefits
>> >> of XDP, isn't it?  
>> >
>> > I think I glossed over the hope for having access to HW queue occupancy
>> > - what exactly are you after? 
>> >
>> > I don't think one can get anything beyond a BQL type granularity.
>> > Reading over PCIe is out of question, device write back on high
>> > granularity would burn through way too much bus throughput.  
>> 
>> This was Jesper's idea originally, so maybe he can explain better; but
>> as I understood it, he basically wanted to expose the same information
>> that BQL has to eBPF. Making it possible for an eBPF program to either
>> (re-)implement BQL with its own custom policy, or react to HWQ pressure
>> in some other way, such as by load balancing to another interface.
>
> Yes, and I also have plans that goes beyond BQL. But let me start with
> explaining the BQL part, and answer Toke's question below.
>
> On Mon, 03 Feb 2020 20:56:03 +0100 Toke wrote:
>> [...] Hmm, I wonder if a TX driver hook is enough?
>
> Short answer is no, a TX driver hook is not enough.  The queue state
> info the TX driver hook have access to, needs to be updated once the
> hardware have "confirmed" the TX-DMA operation have completed.  For
> BQL/DQL this update happens during TX-DMA completion/cleanup (code
> see call sites for netdev_tx_completed_queue()).  (As Jakub wisely
> point out we cannot query the device directly due to performance
> implications).  It doesn't need to be a new BPF hook, just something
> that update the queue state info (we could piggy back on the
> netdev_tx_completed_queue() call or give TX hook access to
> dev_queue->dql).

The question is whether this can't simply be done through bpf helpers?
bpf_get_txq_occupancy(ifindex, txqno)?

> Regarding "where is the queue": For me the XDP-TX queue is the NIC
> hardware queue, that this BPF hook have some visibility into and can do
> filtering on. (Imagine that my TX queue is bandwidth limited, then I
> can shrink the packet size and still send a "congestion" packet to my
> receiver).

I'm not sure the hardware queues will be enough, though. Unless I'm
misunderstanding something, hardware queues are (1) fairly short and (2)
FIFO. So, say we wanted to implement fq_codel for XDP forwarding: we'd
still need a software queueing layer on top of the hardware queue.

If the hardware is EDT-aware this may change, I suppose, but I'm not
sure if we can design the XDP queueing primitives with this assumption? :)

> The bigger picture is that I envision the XDP-TX/egress hook can
> open-up for taking advantage of NIC hardware TX queue features. This
> also ties into the queue abstraction work by Björn+Magnus. Today NIC
> hardware can do a million TX-queues, and hardware can also do rate
> limiting per queue. Thus, I also envision that the XDP-TX/egress hook
> can choose/change the TX queue the packet is queue/sent on (we can
> likely just overload the XDP_REDIRECT and have a new bpf map type for
> this).

Yes, I think I mentioned in another email that putting all the queueing
smarts into the redirect map was also something I'd considered (well, I
do think we've discussed this in the past, so maybe not so surprising if
we're thinking along the same lines) :)

But the implication of this is also that an actual TX hook in the driver
need not necessarily incorporate a lot of new functionality, as it can
control the queueing through a combination of BPF helpers and map
updates?

-Toke


  reply	other threads:[~2020-02-04 11:00 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-23  1:41 [PATCH bpf-next 00/12] Add support for XDP in egress path David Ahern
2020-01-23  1:41 ` [PATCH bpf-next 01/12] net: Add new XDP setup and query commands David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
2020-01-23 11:34   ` Toke Høiland-Jørgensen
2020-01-23 21:32     ` David Ahern
2020-01-24  9:49       ` Toke Høiland-Jørgensen
2020-01-24  7:33   ` Martin Lau
2020-01-23  1:42 ` [PATCH bpf-next 03/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
2020-01-23 11:35   ` Toke Høiland-Jørgensen
2020-01-23 21:33     ` David Ahern
2020-01-24 15:21       ` Jakub Kicinski
2020-01-24 15:36         ` Toke Høiland-Jørgensen
2020-01-26  1:43           ` David Ahern
2020-01-26  4:54             ` Alexei Starovoitov
2020-02-02 17:59               ` David Ahern
2020-01-26 12:49             ` Jesper Dangaard Brouer
2020-01-26 16:38               ` David Ahern
2020-01-26 22:17               ` Jakub Kicinski
2020-01-28 14:13                 ` Jesper Dangaard Brouer
2020-01-30 14:45                   ` Jakub Kicinski
2020-02-01 16:03                     ` Toke Høiland-Jørgensen
2020-02-02 17:48                       ` David Ahern
2020-01-26 22:11             ` Jakub Kicinski
2020-01-27  4:03               ` David Ahern
2020-01-27 14:16                 ` Jakub Kicinski
2020-01-28  3:43                   ` David Ahern
2020-01-28 13:57                     ` Jakub Kicinski
2020-02-01 16:24                       ` Toke Høiland-Jørgensen
2020-02-01 17:08                         ` Jakub Kicinski
2020-02-01 20:05                           ` Toke Høiland-Jørgensen
2020-02-02  4:15                             ` Jakub Kicinski
2020-02-03 19:56                               ` Toke Høiland-Jørgensen
2020-02-03 20:13                               ` Toke Høiland-Jørgensen
2020-02-03 22:15                                 ` Jesper Dangaard Brouer
2020-02-04 11:00                                   ` Toke Høiland-Jørgensen [this message]
2020-02-04 17:09                                     ` Jakub Kicinski
2020-02-05 15:30                                       ` Toke Høiland-Jørgensen
2020-02-02 17:45                           ` David Ahern
2020-02-02 19:12                             ` Jakub Kicinski
2020-02-02 17:43                       ` David Ahern
2020-02-02 19:31                         ` Jakub Kicinski
2020-02-02 21:51                           ` David Ahern
2020-02-01 15:59             ` Toke Høiland-Jørgensen
2020-02-02 17:54               ` David Ahern
2020-02-03 20:09                 ` Toke Høiland-Jørgensen
2020-01-23  1:42 ` [PATCH bpf-next 04/12] net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core() David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 05/12] tuntap: check tun_msg_ctl type at necessary places David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 06/12] tun: move shared functions to if_tun.h David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 07/12] vhost_net: user tap recvmsg api to access ptr ring David Ahern
2020-01-23  8:26   ` Michael S. Tsirkin
2020-01-23  1:42 ` [PATCH bpf-next 08/12] tuntap: remove usage of ptr ring in vhost_net David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 09/12] tun: set egress XDP program David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 10/12] tun: run XDP program in tx path David Ahern
2020-01-23  8:23   ` Michael S. Tsirkin
2020-01-24 13:36     ` Prashant Bhole
2020-01-24 13:44     ` Prashant Bhole
2020-01-23  1:42 ` [PATCH bpf-next 11/12] libbpf: Add egress XDP support David Ahern
2020-01-23  1:42 ` [PATCH bpf-next 12/12] samples/bpf: xdp1, add " David Ahern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y2ti4nxj.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@intel.com \
    --cc=brouer@redhat.com \
    --cc=dahern@digitalocean.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=dsahern@kernel.org \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=prashantbhole.linux@gmail.com \
    --cc=songliubraving@fb.com \
    --cc=toshiaki.makita1@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).