From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
alardam@gmail.com, magnus.karlsson@intel.com,
bjorn.topel@intel.com, andrii.nakryiko@gmail.com,
kuba@kernel.org, ast@kernel.org, netdev@vger.kernel.org,
davem@davemloft.net, hawk@kernel.org, jonathan.lemon@gmail.com,
bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
maciejromanfijalkowski@gmail.com,
intel-wired-lan@lists.osuosl.org,
"Marek Majtyka" <marekx.majtyka@intel.com>
Subject: Re: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set
Date: Wed, 9 Dec 2020 12:52:23 +0100 [thread overview]
Message-ID: <20201209125223.49096d50@carbon> (raw)
In-Reply-To: <20201209095454.GA36812@ranger.igk.intel.com>
On Wed, 9 Dec 2020 10:54:54 +0100
Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote:
> On Tue, Dec 08, 2020 at 10:03:51PM -0800, John Fastabend wrote:
> > > On Mon, Dec 07, 2020 at 12:52:22PM -0800, John Fastabend wrote:
> > > > Jesper Dangaard Brouer wrote:
> > > > > On Fri, 4 Dec 2020 16:21:08 +0100
> > > > > Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > [...] pruning the thread to answer Jesper.
>
> I think you meant me, but thanks anyway for responding :)
I was about to say that ;-)
> > > > >
> > > > > Use-case(2): Disable XDP_TX on a driver to save hardware TX-queue
> > > > > resources, as the use-case is only DDoS. Today we have this problem
> > > > > with the ixgbe hardware, that cannot load XDP programs on systems with
> > > > > more than 192 CPUs.
> > > >
> > > > The ixgbe issue is just a bug or missing feature, in my opinion.
> > >
> > > Not a bug, rather HW limitation?
> >
> > Well, hardware has some max queue limit. Likely <192, otherwise I would
> > have kept doing queue-per-core up to 192. But, ideally we should
>
> Data sheet states it's 128 Tx queues for ixgbe.
I likely remember wrong; maybe it was only ~96 CPUs. I do remember that
some TX queues were reserved for something else, and QA reported issues
(as I don't have such a high-end system myself).
> > still load and either share queues across multiple cores or restrict
> > down to a subset of CPUs.
>
> And that's the missing piece of logic, I suppose.
>
> > Do you need 192 cores for a 10gbps nic, probably not.
>
> Let's hear from Jesper :p
LOL - of course you don't need 192 cores. With XDP I will claim that
you only need 2 cores (with high GHz) to forward small packets at
10 Gbit/s wirespeed.
The point is that this only works when we avoid atomic lock operations
per packet and bulk the NIC PCIe tail/doorbell updates. It was actually
John's invention/design to have a dedicated TX queue per core, to avoid
the atomic lock operation per packet when queuing packets to the NIC.
10G @ 64B gives a budget of 67.2 ns (241 cycles @ 3.60 GHz)

Atomic lock operations cost: [1]
 - Type:spin_lock_unlock          Per elem: 34 cycles(tsc)  9.485 ns
 - Type:spin_lock_unlock_irqsave  Per elem: 61 cycles(tsc) 17.125 ns

(And atomics can affect instructions per cycle)
But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
(up to 16), so it should not be a problem to solve this by sharing the
TX-queue and taking a lock per 16 packets. I still recommend that, for
the fallback case, you allocate a number of TX-queues and distribute
these across CPUs, to avoid hitting a congested lock (the above
measurements are for the optimal non-congested atomic lock operation).
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
> > Yes, it requires some extra care, but should be doable
> > if someone cares enough. I gather current limitation/bug is because
> > no one has that configuration and/or has complained loud enough.
>
> I would say we're safe for queue per core approach for newer devices where
> we have thousands of queues to play with. Older devices combined with big
> cpu count can cause us some problems.
>
> Wondering if drivers could have a problem when user would do something
> weird as limiting the queue count to a lower value than cpu count and then
> changing the irq affinity?
Not sure what you mean.

But for the XDP RX-side we use the softirq NAPI guarantee to guard
against concurrent access to our (per-CPU) data structures.
> >
> > >
> > > >
> > > > I think we just document that XDP_TX consumes resources, and if users
> > > > care they shouldn't use XDP_TX in programs; in that case hardware
> > > > should, via program discovery, not allocate the resource. This seems
> > > > cleaner in my opinion than more bits for features.
> > >
> > > But what if I'm with some limited HW that actually has a support for XDP
> > > and I would like to utilize XDP_TX?
> > >
> > > Not all drivers that support XDP consume Tx resources. Recently igb got
> > > support and it shares Tx queues between netstack and XDP.
> >
> > Makes sense to me.
> >
> > >
> > > I feel like we should have a sort-of best effort approach in case we
> > > stumble upon the XDP_TX in prog being loaded and query the driver if it
> > > would be able to provide the Tx resources on the current system, given
> > > that normally we tend to have a queue per core.
> >
> > Why do we need to query? I guess you want some indication from the
> > driver that it's not going to be running in the ideal NIC configuration?
> > I guess printing a warning would be the normal way to show that. But
> > maybe your point is that you want something easier to query?
>
> I meant that given Jesper's example, what should we do? You don't have Tx
> resources to pull at all. Should we have a data path for that case that
> would share Tx qs between XDP/netstack? Probably not.
>
I think ixgbe should have a fallback mode, where it allocates e.g. 32
TX-queues for XDP xmit, or even just the same amount as RX-queues (I
think XDP_TX and XDP_REDIRECT can share these TX-queues dedicated to
XDP). When in fallback mode a lock needs to be taken (sharded across
CPUs), but ndo_xdp_xmit will bulk up to 16 packets, so it should not
matter too much.
I do think ixgbe should output a dmesg log message to say it is in XDP
fallback mode with X number of TX-queues. For us, QA usually collects
the dmesg output after a test run.
> > >
> > > In that case igb would say yes, ixgbe would say no and prog would be
> > > rejected.
> >
> > I think the driver should load even if it can't meet the queue per
> > core quota. Refusing to load at all or just dropping packets on the
> > floor is not very friendly. I think we agree on that point.
>
> Agreed on that. But it needs some work. I can dabble on that a bit.
>
I would really appreciate it if Intel could fix this in the ixgbe
driver and implement a fallback method.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer