From: Vladimir Oltean <olteanv@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Alexander Duyck <alexander.duyck@gmail.com>,
Ioana Ciornei <ioana.ciornei@nxp.com>,
Alex Marginean <alexandru.marginean@nxp.com>,
Claudiu Manoil <claudiu.manoil@nxp.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Vladimir Oltean <vladimir.oltean@nxp.com>
Subject: Re: [PATCH net-next 9/9] net: enetc: add support for XDP_REDIRECT
Date: Thu, 1 Apr 2021 19:09:43 +0300 [thread overview]
Message-ID: <20210401160943.frw7l3rio7spr33n@skbuf> (raw)
In-Reply-To: <875z16nsiu.fsf@toke.dk>
On Thu, Apr 01, 2021 at 01:39:05PM +0200, Toke Høiland-Jørgensen wrote:
> Vladimir Oltean <olteanv@gmail.com> writes:
>
> > On Thu, Apr 01, 2021 at 01:26:02PM +0200, Toke Høiland-Jørgensen wrote:
> >> > +int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
> >> > + struct xdp_frame **frames, u32 flags)
> >> > +{
> >> > + struct enetc_tx_swbd xdp_redirect_arr[ENETC_MAX_SKB_FRAGS] = {0};
> >> > + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> >> > + struct enetc_bdr *tx_ring;
> >> > + int xdp_tx_bd_cnt, i, k;
> >> > + int xdp_tx_frm_cnt = 0;
> >> > +
> >> > + tx_ring = priv->tx_ring[smp_processor_id()];
> >>
> >> What mechanism guarantees that this won't overflow the array? :)
> >
> > Which array, the array of TX rings?
>
> Yes.
>
The problem isn't even accessing an out-of-bounds element in the TX ring array.
As it turns out, I had a relatively superficial understanding of how
things are organized, but let me try to explain.
The number of TX rings is a configurable resource (between PFs and VFs)
and we read the capability at probe time:
enetc_get_si_caps:
val = enetc_rd(hw, ENETC_SICAPR0);
si->num_rx_rings = (val >> 16) & 0xff;
si->num_tx_rings = val & 0xff;
enetc_init_si_rings_params:
priv->num_tx_rings = si->num_tx_rings;
In any case, the TX array is declared as:
struct enetc_ndev_priv {
struct enetc_bdr *tx_ring[16];
struct enetc_bdr *rx_ring[16];
};
because that's the maximum hardware capability.
The priv->tx_ring array is populated in:
enetc_alloc_msix:
/* # of tx rings per int vector */
v_tx_rings = priv->num_tx_rings / priv->bdr_int_num;
for (i = 0; i < priv->bdr_int_num; i++) {
for (j = 0; j < v_tx_rings; j++) {
if (priv->bdr_int_num == ENETC_MAX_BDR_INT)
idx = 2 * j + i; /* 2 CPUs */
else
idx = j + i * v_tx_rings; /* default */
priv->tx_ring[idx] = bdr;
}
}
priv->bdr_int_num is set to "num_online_cpus()".
On LS1028A, it can be either 1 or 2 (and the ENETC_MAX_BDR_INT macro is
equal to 2).
Otherwise said, the convoluted logic above does the following:
- It affines an MSI interrupt vector per CPU
- It affines an RX ring per MSI vector, hence per CPU
- It balances the fixed number of TX rings (say 8) among the available
MSI vectors, hence CPUs (say 2). It does this by iterating with i
through the RX MSI interrupt vectors, and with j through the number of
TX rings per MSI vector.
This logic maps:
- the even TX rings to CPU 0 and the odd TX rings to CPU 1, if 2 CPUs
are used
- all TX rings to CPU 0, if 1 CPU is used
This is done because we have this logic in enetc_poll:
for (i = 0; i < v->count_tx_rings; i++)
if (!enetc_clean_tx_ring(&v->tx_ring[i], budget))
complete = false;
for processing the TX completions of a given group of TX rings in the RX
MSI interrupt handler of a certain CPU.
Otherwise said, priv->tx_ring[i] is always BD ring i, and that mapping
never changes. All 8 TX rings are enabled and available for use.
What I knew about tc-taprio and tc-mqprio is that they only enqueue to
TX queues [0, num_tc-1] because of this, as it turns out:
enetc_xmit:
tx_ring = priv->tx_ring[skb->queue_mapping];
where skb->queue_mapping is given by:
err = netif_set_real_num_tx_queues(ndev, priv->num_tx_rings);
and by this, respectively, from the mqprio code path:
netif_set_real_num_tx_queues(ndev, num_tc);
As for why XDP works, and priv->tx_ring[smp_processor_id()] is:
- TX ring 0 for CPU 0 and TX ring 1 for CPU 1, if 2 CPUs are used
- TX ring 0, if 1 CPU is used
The TX completions in the first case are handled by:
- CPU 0 for TX ring 0 (because it is even) and CPU 1 for TX ring 1
(because it is odd), if 2 CPUs are used, due to the mapping I talked
about earlier
- CPU 0 if only 1 CPU is used
> > You mean that it's possible to receive a TC_SETUP_QDISC_MQPRIO or
> > TC_SETUP_QDISC_TAPRIO with num_tc == 1, and we have 2 CPUs?
>
> Not just that, this ndo can be called on arbitrary CPUs after a
> redirect. The code just calls through from the XDP receive path so which
> CPU it ends up on depends on the RSS+IRQ config of the other device,
> which may not even be the same driver; i.e., you have no control over
> that... :)
>
What do you mean by "arbitrary" CPU? You can't plug CPUs in, it's a dual
core system... Why does the source ifindex matter at all? I'm using the
TX ring affined to the CPU that ndo_xdp_xmit is currently running on.
> > Well, yeah, I don't know what's the proper way to deal with that. Ideas?
>
> Well the obvious one is just:
>
> tx_ring = priv->tx_ring[smp_processor_id() % num_ring_ids];
>
> and then some kind of locking to deal with multiple CPUs accessing the
> same TX ring...
By multiple CPUs accessing the same TX ring, you mean locking between
ndo_xdp_xmit and ndo_start_xmit? Can that even happen if the hardware
architecture is to have at least as many TX rings as CPUs?
Because otherwise, I see that ndo_xdp_xmit is only called from
xdp_do_flush, which is in softirq context, which to my very rudimentary
knowledge run with bottom halves, thus preemption, disabled? So I don't
think it's possible for ndo_xdp_xmit and ndo_xmit, or even two
ndo_xdp_xmit instances, to access the same TX ring?
Sorry, I'm sure these are trivial questions, but I would like to really
understand what I need to change and why :D
next prev parent reply other threads:[~2021-04-01 18:28 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-31 20:08 [PATCH net-next 0/9] XDP for NXP ENETC Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 1/9] net: enetc: consume the error RX buffer descriptors in a dedicated function Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 2/9] net: enetc: move skb creation into enetc_build_skb Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 3/9] net: enetc: add a dedicated is_eof bit in the TX software BD Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 4/9] net: enetc: clean the TX software BD on the TX confirmation path Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 5/9] net: enetc: move up enetc_reuse_page and enetc_page_reusable Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 6/9] net: enetc: add support for XDP_DROP and XDP_PASS Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 7/9] net: enetc: add support for XDP_TX Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 8/9] net: enetc: increase RX ring default size Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 9/9] net: enetc: add support for XDP_REDIRECT Vladimir Oltean
2021-04-01 11:26 ` Toke Høiland-Jørgensen
2021-04-01 11:31 ` Vladimir Oltean
2021-04-01 11:39 ` Toke Høiland-Jørgensen
2021-04-01 16:09 ` Vladimir Oltean [this message]
2021-04-01 18:01 ` Toke Høiland-Jørgensen
2021-04-01 19:38 ` Vladimir Oltean
2021-04-02 10:56 ` Vladimir Oltean
2021-04-03 11:07 ` Toke Høiland-Jørgensen
2021-04-03 12:12 ` Vladimir Oltean
2021-04-06 11:23 ` Toke Høiland-Jørgensen
2021-03-31 22:20 ` [PATCH net-next 0/9] XDP for NXP ENETC patchwork-bot+netdevbpf
2021-03-31 22:55 ` Vladimir Oltean
2021-04-01 11:28 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210401160943.frw7l3rio7spr33n@skbuf \
--to=olteanv@gmail.com \
--cc=alexander.duyck@gmail.com \
--cc=alexandru.marginean@nxp.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=claudiu.manoil@nxp.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=ioana.ciornei@nxp.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=songliubraving@fb.com \
--cc=toke@redhat.com \
--cc=vladimir.oltean@nxp.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).