bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Ioana Ciornei <ioana.ciornei@nxp.com>,
	Alex Marginean <alexandru.marginean@nxp.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Vladimir Oltean <vladimir.oltean@nxp.com>
Subject: Re: [PATCH net-next 9/9] net: enetc: add support for XDP_REDIRECT
Date: Thu, 1 Apr 2021 19:09:43 +0300	[thread overview]
Message-ID: <20210401160943.frw7l3rio7spr33n@skbuf> (raw)
In-Reply-To: <875z16nsiu.fsf@toke.dk>

On Thu, Apr 01, 2021 at 01:39:05PM +0200, Toke Høiland-Jørgensen wrote:
> Vladimir Oltean <olteanv@gmail.com> writes:
>
> > On Thu, Apr 01, 2021 at 01:26:02PM +0200, Toke Høiland-Jørgensen wrote:
> >> > +int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
> >> > +		   struct xdp_frame **frames, u32 flags)
> >> > +{
> >> > +	struct enetc_tx_swbd xdp_redirect_arr[ENETC_MAX_SKB_FRAGS] = {0};
> >> > +	struct enetc_ndev_priv *priv = netdev_priv(ndev);
> >> > +	struct enetc_bdr *tx_ring;
> >> > +	int xdp_tx_bd_cnt, i, k;
> >> > +	int xdp_tx_frm_cnt = 0;
> >> > +
> >> > +	tx_ring = priv->tx_ring[smp_processor_id()];
> >>
> >> What mechanism guarantees that this won't overflow the array? :)
> >
> > Which array, the array of TX rings?
>
> Yes.
>

The problem isn't even accessing an out-of-bounds element in the TX ring array.

As it turns out, I had a relatively superficial understanding of how
things are organized, but let me try to explain.

The number of TX rings is a configurable resource (between PFs and VFs)
and we read the capability at probe time:

enetc_get_si_caps:
	val = enetc_rd(hw, ENETC_SICAPR0);
	si->num_rx_rings = (val >> 16) & 0xff;
	si->num_tx_rings = val & 0xff;

enetc_init_si_rings_params:
	priv->num_tx_rings = si->num_tx_rings;

In any case, the TX array is declared as:

struct enetc_ndev_priv {
	struct enetc_bdr *tx_ring[16];
	struct enetc_bdr *rx_ring[16];
};

because that's the maximum hardware capability.

The priv->tx_ring array is populated in:

enetc_alloc_msix:
	/* # of tx rings per int vector */
	v_tx_rings = priv->num_tx_rings / priv->bdr_int_num;

	for (i = 0; i < priv->bdr_int_num; i++) {
		for (j = 0; j < v_tx_rings; j++) {
			if (priv->bdr_int_num == ENETC_MAX_BDR_INT)
				idx = 2 * j + i; /* 2 CPUs */
			else
				idx = j + i * v_tx_rings; /* default */

			priv->tx_ring[idx] = bdr;
		}
	}

priv->bdr_int_num is set to "num_online_cpus()".
On LS1028A, it can be either 1 or 2 (and the ENETC_MAX_BDR_INT macro is
equal to 2).

Otherwise said, the convoluted logic above does the following:
- It affines an MSI interrupt vector per CPU
- It affines an RX ring per MSI vector, hence per CPU
- It balances the fixed number of TX rings (say 8) among the available
  MSI vectors, hence CPUs (say 2). It does this by iterating with i
  through the RX MSI interrupt vectors, and with j through the number of
  TX rings per MSI vector.

This logic maps:
- the even TX rings to CPU 0 and the odd TX rings to CPU 1, if 2 CPUs
  are used
- all TX rings to CPU 0, if 1 CPU is used

This is done because we have this logic in enetc_poll:

	for (i = 0; i < v->count_tx_rings; i++)
		if (!enetc_clean_tx_ring(&v->tx_ring[i], budget))
			complete = false;

for processing the TX completions of a given group of TX rings in the RX
MSI interrupt handler of a certain CPU.

Otherwise said, priv->tx_ring[i] is always BD ring i, and that mapping
never changes. All 8 TX rings are enabled and available for use.

What I knew about tc-taprio and tc-mqprio is that they only enqueue to
TX queues [0, num_tc-1] because of this, as it turns out:

enetc_xmit:
	tx_ring = priv->tx_ring[skb->queue_mapping];

where skb->queue_mapping is given by:
	err = netif_set_real_num_tx_queues(ndev, priv->num_tx_rings);
and by this, respectively, from the mqprio code path:
	netif_set_real_num_tx_queues(ndev, num_tc);

As for why XDP works, and priv->tx_ring[smp_processor_id()] is:
- TX ring 0 for CPU 0 and TX ring 1 for CPU 1, if 2 CPUs are used
- TX ring 0, if 1 CPU is used

The TX completions in the first case are handled by:
- CPU 0 for TX ring 0 (because it is even) and CPU 1 for TX ring 1
  (because it is odd), if 2 CPUs are used, due to the mapping I talked
  about earlier
- CPU 0 if only 1 CPU is used

> > You mean that it's possible to receive a TC_SETUP_QDISC_MQPRIO or
> > TC_SETUP_QDISC_TAPRIO with num_tc == 1, and we have 2 CPUs?
>
> Not just that, this ndo can be called on arbitrary CPUs after a
> redirect. The code just calls through from the XDP receive path so which
> CPU it ends up on depends on the RSS+IRQ config of the other device,
> which may not even be the same driver; i.e., you have no control over
> that... :)
>

What do you mean by "arbitrary" CPU? You can't plug CPUs in, it's a dual
core system... Why does the source ifindex matter at all? I'm using the
TX ring affined to the CPU that ndo_xdp_xmit is currently running on.

> > Well, yeah, I don't know what's the proper way to deal with that. Ideas?
>
> Well the obvious one is just:
>
> tx_ring = priv->tx_ring[smp_processor_id() % num_ring_ids];
>
> and then some kind of locking to deal with multiple CPUs accessing the
> same TX ring...

By multiple CPUs accessing the same TX ring, you mean locking between
ndo_xdp_xmit and ndo_start_xmit? Can that even happen if the hardware
architecture is to have at least as many TX rings as CPUs?

Because otherwise, I see that ndo_xdp_xmit is only called from
xdp_do_flush, which is in softirq context, which to my very rudimentary
knowledge run with bottom halves, thus preemption, disabled? So I don't
think it's possible for ndo_xdp_xmit and ndo_xmit, or even two
ndo_xdp_xmit instances, to access the same TX ring?

Sorry, I'm sure these are trivial questions, but I would like to really
understand what I need to change and why :D

  reply	other threads:[~2021-04-01 18:28 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-31 20:08 [PATCH net-next 0/9] XDP for NXP ENETC Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 1/9] net: enetc: consume the error RX buffer descriptors in a dedicated function Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 2/9] net: enetc: move skb creation into enetc_build_skb Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 3/9] net: enetc: add a dedicated is_eof bit in the TX software BD Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 4/9] net: enetc: clean the TX software BD on the TX confirmation path Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 5/9] net: enetc: move up enetc_reuse_page and enetc_page_reusable Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 6/9] net: enetc: add support for XDP_DROP and XDP_PASS Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 7/9] net: enetc: add support for XDP_TX Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 8/9] net: enetc: increase RX ring default size Vladimir Oltean
2021-03-31 20:08 ` [PATCH net-next 9/9] net: enetc: add support for XDP_REDIRECT Vladimir Oltean
2021-04-01 11:26   ` Toke Høiland-Jørgensen
2021-04-01 11:31     ` Vladimir Oltean
2021-04-01 11:39       ` Toke Høiland-Jørgensen
2021-04-01 16:09         ` Vladimir Oltean [this message]
2021-04-01 18:01           ` Toke Høiland-Jørgensen
2021-04-01 19:38             ` Vladimir Oltean
2021-04-02 10:56               ` Vladimir Oltean
2021-04-03 11:07                 ` Toke Høiland-Jørgensen
2021-04-03 12:12                   ` Vladimir Oltean
2021-04-06 11:23                     ` Toke Høiland-Jørgensen
2021-03-31 22:20 ` [PATCH net-next 0/9] XDP for NXP ENETC patchwork-bot+netdevbpf
2021-03-31 22:55   ` Vladimir Oltean
2021-04-01 11:28     ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210401160943.frw7l3rio7spr33n@skbuf \
    --to=olteanv@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexandru.marginean@nxp.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=claudiu.manoil@nxp.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=ioana.ciornei@nxp.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=vladimir.oltean@nxp.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).