All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: DENG Qingfang <dqfext@gmail.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	bridge@lists.linux-foundation.org,
	Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [RFC PATCH v2 net-next 00/10] Allow forwarding for the software bridge data path to be offloaded to capable devices
Date: Mon, 5 Jul 2021 12:57:27 +0300	[thread overview]
Message-ID: <20210705095727.w7iigb3goaorzef5@skbuf> (raw)
In-Reply-To: <87v95p6u0r.fsf@waldekranz.com>

On Mon, Jul 05, 2021 at 10:32:04AM +0200, Tobias Waldekranz wrote:
> > Many DSA taggers use port bit field in their TX tags, which allows
> > replication in hardware. (multiple bits set = send to multiple ports)
> > I wonder if the tagger API can be updated to support this.
>
> I think you could, but it would be tricky.
>
> The bridge does not operate using vectors/bitfields, rather it is
> procedural code that you have to loop through before knowing the set of
> destination ports.
>
> This series just sends the skb to the first port in the hardware domain
> and trusts the HW to calculate the same port set as the code in
> br_forward.c would have.
>
> To do what you suggest, the bridge would have to translate each nbp into
> a position in a bitfield (or call out to the underlying driver to do it)
> as it is looping through ports, then send the aggregated mask along with
> the skb. Knowing if a port is the first one you have come across for a
> given domain is very easy (just maintain a bitfield), knowing if it is
> the last one is harder. So you would likely end up having to queue up
> the actual transmission until after the loop has been executed, which
> hard to combine with the "lazy cloning" that you really want to get
> decent performance.

In addition to changing the bridge in order to get the entire bit mask,
one also has to somehow propagate that bit mask per skb down to the
driver which might be tricky in itself. There is currently no bridge
specific data structure passed between the bridge and the switchdev
driver, it is just the struct net_device *sb_dev. A hacky solution I
might imagine is for the bridge to kzalloc() a small data structure
like:

struct bridge_fwd_offload_accel_priv {
	struct net_device *sb_dev; /* Must be first! */
	unsigned long port_mask;
};

and call as follows:

int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct bridge_fwd_offload_accel_priv *accel_priv = NULL;

	if (br_switchdev_accels_skb(skb)) {
		accel_priv = kzalloc(sizeof(*accel_priv), GFP_ATOMIC);
		if (!accel_priv)
			return -ENOMEM;

		accel_priv->sb_dev = BR_INPUT_SKB_CB(skb)->brdev;
		accel_priv->port_mask = port_mask;
	}

	dev_queue_xmit_accel(skb, accel_priv);
}

This way, the code in net/core/dev.c can be left unmodified. We give it
an accel_priv pointer but it can think it is only looking at a sb_dev
pointer, since that is the first element in the structure.

But then the switchdev driver must kfree(accel_priv) in the xmit function.

Not really nice, but for a cuter solution, I think we would need to extend struct sk_buff.

WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Oltean <olteanv@gmail.com>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
	bridge@lists.linux-foundation.org, netdev@vger.kernel.org,
	Roopa Prabhu <roopa@nvidia.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	DENG Qingfang <dqfext@gmail.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Vladimir Oltean <vladimir.oltean@nxp.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [Bridge] [RFC PATCH v2 net-next 00/10] Allow forwarding for the software bridge data path to be offloaded to capable devices
Date: Mon, 5 Jul 2021 12:57:27 +0300	[thread overview]
Message-ID: <20210705095727.w7iigb3goaorzef5@skbuf> (raw)
In-Reply-To: <87v95p6u0r.fsf@waldekranz.com>

On Mon, Jul 05, 2021 at 10:32:04AM +0200, Tobias Waldekranz wrote:
> > Many DSA taggers use port bit field in their TX tags, which allows
> > replication in hardware. (multiple bits set = send to multiple ports)
> > I wonder if the tagger API can be updated to support this.
>
> I think you could, but it would be tricky.
>
> The bridge does not operate using vectors/bitfields, rather it is
> procedural code that you have to loop through before knowing the set of
> destination ports.
>
> This series just sends the skb to the first port in the hardware domain
> and trusts the HW to calculate the same port set as the code in
> br_forward.c would have.
>
> To do what you suggest, the bridge would have to translate each nbp into
> a position in a bitfield (or call out to the underlying driver to do it)
> as it is looping through ports, then send the aggregated mask along with
> the skb. Knowing if a port is the first one you have come across for a
> given domain is very easy (just maintain a bitfield), knowing if it is
> the last one is harder. So you would likely end up having to queue up
> the actual transmission until after the loop has been executed, which
> hard to combine with the "lazy cloning" that you really want to get
> decent performance.

In addition to changing the bridge in order to get the entire bit mask,
one also has to somehow propagate that bit mask per skb down to the
driver which might be tricky in itself. There is currently no bridge
specific data structure passed between the bridge and the switchdev
driver, it is just the struct net_device *sb_dev. A hacky solution I
might imagine is for the bridge to kzalloc() a small data structure
like:

struct bridge_fwd_offload_accel_priv {
	struct net_device *sb_dev; /* Must be first! */
	unsigned long port_mask;
};

and call as follows:

int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct bridge_fwd_offload_accel_priv *accel_priv = NULL;

	if (br_switchdev_accels_skb(skb)) {
		accel_priv = kzalloc(sizeof(*accel_priv), GFP_ATOMIC);
		if (!accel_priv)
			return -ENOMEM;

		accel_priv->sb_dev = BR_INPUT_SKB_CB(skb)->brdev;
		accel_priv->port_mask = port_mask;
	}

	dev_queue_xmit_accel(skb, accel_priv);
}

This way, the code in net/core/dev.c can be left unmodified. We give it
an accel_priv pointer but it can think it is only looking at a sb_dev
pointer, since that is the first element in the structure.

But then the switchdev driver must kfree(accel_priv) in the xmit function.

Not really nice, but for a cuter solution, I think we would need to extend struct sk_buff.

  reply	other threads:[~2021-07-05  9:57 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-03 11:56 [RFC PATCH v2 net-next 00/10] Allow forwarding for the software bridge data path to be offloaded to capable devices Vladimir Oltean
2021-07-03 11:56 ` [Bridge] " Vladimir Oltean
2021-07-03 11:56 ` [RFC PATCH v2 net-next 01/10] net: dfwd: constrain existing users to macvlan subordinates Vladimir Oltean
2021-07-03 11:56   ` [Bridge] " Vladimir Oltean
2021-07-03 11:56 ` [RFC PATCH v2 net-next 02/10] net: bridge: disambiguate offload_fwd_mark Vladimir Oltean
2021-07-03 11:56   ` [Bridge] " Vladimir Oltean
2021-07-09 13:23   ` Grygorii Strashko
2021-07-03 11:56 ` [RFC PATCH v2 net-next 03/10] net: bridge: switchdev: recycle unused hwdoms Vladimir Oltean
2021-07-03 11:56   ` [Bridge] " Vladimir Oltean
2021-07-03 11:56 ` [RFC PATCH v2 net-next 04/10] net: bridge: switchdev: allow the data plane forwarding to be offloaded Vladimir Oltean
2021-07-03 11:56   ` [Bridge] " Vladimir Oltean
2021-07-09 13:16   ` Grygorii Strashko
2021-07-09 14:09     ` Vladimir Oltean
2021-07-09 14:09       ` [Bridge] " Vladimir Oltean
2021-07-12 12:28       ` Tobias Waldekranz
2021-07-12 12:28         ` [Bridge] " Tobias Waldekranz
2021-07-12 13:03         ` Vladimir Oltean
2021-07-12 13:03           ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 05/10] net: extract helpers for binding a subordinate device to TX queues Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 06/10] net: allow ndo_select_queue to go beyond dev->num_real_tx_queues Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 07/10] net: dsa: track the number of switches in a tree Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 08/10] net: dsa: add support for bridge forwarding offload Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 09/10] net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 11:57 ` [RFC PATCH v2 net-next 10/10] net: dsa: tag_dsa: offload the bridge forwarding process Vladimir Oltean
2021-07-03 11:57   ` [Bridge] " Vladimir Oltean
2021-07-03 22:04 ` [RFC PATCH v2 net-next 00/10] Allow forwarding for the software bridge data path to be offloaded to capable devices Tobias Waldekranz
2021-07-03 22:04   ` [Bridge] " Tobias Waldekranz
2021-07-04  8:11   ` Vladimir Oltean
2021-07-04  8:11     ` [Bridge] " Vladimir Oltean
2021-07-05  8:09     ` Tobias Waldekranz
2021-07-05  8:09       ` [Bridge] " Tobias Waldekranz
2021-07-05  8:54       ` Vladimir Oltean
2021-07-05  8:54         ` [Bridge] " Vladimir Oltean
2021-07-05  4:20 ` DENG Qingfang
2021-07-05  4:20   ` [Bridge] " DENG Qingfang
2021-07-05  8:32   ` Tobias Waldekranz
2021-07-05  8:32     ` [Bridge] " Tobias Waldekranz
2021-07-05  9:57     ` Vladimir Oltean [this message]
2021-07-05  9:57       ` Vladimir Oltean

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210705095727.w7iigb3goaorzef5@skbuf \
    --to=olteanv@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=bridge@lists.linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=dqfext@gmail.com \
    --cc=f.fainelli@gmail.com \
    --cc=idosch@idosch.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@nvidia.com \
    --cc=roopa@nvidia.com \
    --cc=stephen@networkplumber.org \
    --cc=tobias@waldekranz.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vladimir.oltean@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.