From: Vladimir Oltean <vladimir.oltean@nxp.com>
To: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, Ido Schimmel <idosch@idosch.org>,
	Tobias Waldekranz <tobias@waldekranz.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <nikolay@nvidia.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	"bridge@lists.linux-foundation.org" 
	<bridge@lists.linux-foundation.org>,
	Alexander Duyck <alexander.duyck@gmail.com>
Subject: Re: [RFC PATCH v2 net-next 04/10] net: bridge: switchdev: allow the data plane forwarding to be offloaded
Date: Fri, 9 Jul 2021 14:09:40 +0000
Message-ID: <20210709140940.4ak5vvt5hxay3wus@skbuf>
In-Reply-To: <3686cff1-2a80-687e-7c64-cf070a0f5324@ti.com>

Hi Grygorii,

On Fri, Jul 09, 2021 at 04:16:13PM +0300, Grygorii Strashko wrote:
> On 03/07/2021 14:56, Vladimir Oltean wrote:
> > From: Tobias Waldekranz <tobias@waldekranz.com>
> >
> > Allow switchdevs to forward frames from the CPU in accordance with the
> > bridge configuration in the same way as is done between bridge
> > ports. This means that the bridge will only send a single skb towards
> > one of the ports under the switchdev's control, and expects the driver
> > to deliver the packet to all eligible ports in its domain.
> >
> > Primarily this improves the performance of multicast flows with
> > multiple subscribers, as it allows the hardware to perform the frame
> > replication.
> >
> > The basic flow between the driver and the bridge is as follows:
> >
> > - The switchdev accepts the offload by returning a non-null pointer
> >    from .ndo_dfwd_add_station when the port is added to the bridge.
> >
> > - The bridge sends offloadable skbs to one of the ports under the
> >    switchdev's control using dev_queue_xmit_accel.
> >
> > - The switchdev notices the offload by checking for a non-NULL
> >    "sb_dev" in the core's call to .ndo_select_queue.
>
> Sorry, I could be missing something.
>
> Is there any possibility to just mark the skb itself as "fwd_offload"
> (or similar), so the driver can just check it and decide what to do?
> Following your series:
> - the bridge itself will send the packet only once, to one port, if
>   forwarding offload is possible and supported
> - the switchdev driver can check/negotiate a BR_FWD_OFFLOAD flag
>
> In our case, TI CPSW can send a directed packet (the default now) by
> specifying the port_id in the DMA descriptor, or keep port_id == 0,
> which lets the HW process the packet internally, including MC
> duplication.
>
> Sorry again, but the need to add 3 callbacks and juggle a "virtual"
> queue to achieve MC offload (seemingly one of the primary goals) from
> the bridge itself looks a bit over-complicated :(
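
Just so we are looking at the same thing, this is roughly what the
series asks a driver to implement, if I read it correctly (a sketch
only; the foo_* names and the struct are invented, error handling is
omitted):

#include <linux/netdevice.h>

/* Rough sketch of the driver side of the current proposal, as I read
 * it: the foo_* names and this struct are invented.
 */
struct foo_port {
        struct net_device *br_offload_dev;
        u16 fwd_offload_queue;  /* an index beyond the real TX queues */
};

static void *foo_dfwd_add_station(struct net_device *pdev,
                                  struct net_device *vdev)
{
        struct foo_port *port = netdev_priv(pdev);

        /* vdev is presumably the bridge upper here; returning a
         * non-NULL pointer is the "I accept the offload" signal.
         */
        if (!netif_is_bridge_master(vdev))
                return NULL;

        port->br_offload_dev = vdev;

        return vdev;
}

static u16 foo_select_queue(struct net_device *pdev, struct sk_buff *skb,
                            struct net_device *sb_dev)
{
        struct foo_port *port = netdev_priv(pdev);

        /* A non-NULL sb_dev means the bridge transmitted through
         * dev_queue_xmit_accel() and wants the hardware to
         * forward/replicate this skb.
         */
        if (sb_dev && sb_dev == port->br_offload_dev)
                return port->fwd_offload_queue;

        return netdev_pick_tx(pdev, skb, NULL);
}

plus the .ndo_dfwd_del_station counterpart and the bookkeeping for the
extra "virtual" TX queue.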

After cutting my teeth on Tobias' patches myself, I tend to agree that
the macvlan offload framework is not a great fit for offloading the
software bridge's TX data plane. Some reasons:
- the sb_dev pointer is necessary for macvlan because you can have
  multiple macvlan uppers and you need to know which one this packet
  came from. Whereas in the case of a bridge, any given switchdev net
  device can have only a single bridge upper. So a single bit per skb,
  possibly even skb->offload_fwd_mark, could be used to encode this
  piece of information: please look up your FDB for this packet and
  forward/replicate it accordingly. A rough sketch of the driver side
  is at the end of this mail.
- I am a bit on the fence about the "net: allow ndo_select_queue to go
  beyond dev->num_real_tx_queues" and "net: extract helpers for binding
  a subordinate device to TX queues" patches; they look like the wrong
  approach overall, just to shoehorn our use case into a framework that
  was not meant to cover it.
- most importantly: Ido asked about the possibility for a switchdev to
  accelerate the data plane for a bridge port that is a LAG upper. In
  the current design, where the bridge attempts to call the
  .ndo_dfwd_add_station method of the bond/team driver, this will not
  work. Traditionally, switchdev has migrated away from ndo's towards
  notifiers, because a switchdev driver can intercept the notifier
  emitted by the bridge for the bonding interface and handle it itself.
  So, logically speaking, it would make more sense to introduce a new
  switchdev notifier for TX data plane offloading per port.

  Actually, now that I'm thinking even more about this, it would be
  great not only if we could migrate towards notifiers, but if the
  notification could be emitted by the switchdev driver itself, at
  bridge join time. Once upon a time I had an RFC patch that changed
  all switchdev drivers to inform the bridge that they are capable of
  offloading the RX data plane:
  https://patchwork.kernel.org/project/netdevbpf/patch/20210318231829.3892920-17-olteanv@gmail.com/

  That patch was necessary because the bridge, when it sees a bridge
  port that is a LAG, and the LAG is on top of a switchdev, will assign
  the port's hwdom based on the devlink switch ID of the switchdev.
  This is wrong because it assumes that the switchdev offloads the LAG,
  but in the vast majority of cases this is false: only a handful of
  switchdev drivers have LAG offload right now. So the expectation is
  that the bridge can do software forwarding between such a LAG made up
  of two switchdev interfaces and a third (standalone) switchdev
  interface, but it doesn't, because to the bridge, all ports have the
  same hwdom.

  Now it seems like common sense to pick up this patch again and make
  the switchdev drivers provide 2 pieces of information:
  (a) can I offload the RX data path
  (b) can I offload the TX data path
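
To make (a)/(b) concrete, I imagine the driver-facing side looking
roughly like this; the helper name and its arguments are completely
made up, this is just the shape of the idea and not an API that exists
today:

#include <linux/netdevice.h>

/* Hypothetical bridge helper: declare, at join time, which data paths
 * this bridge port can offload. Nothing like this exists yet.
 */
int br_port_set_offload_caps(struct net_device *brport_dev, bool rx, bool tx);

/* Called from the switchdev driver when its port (or its LAG upper)
 * becomes a bridge port.
 */
static int foo_port_bridge_join(struct net_device *brport_dev,
                                struct net_device *br_dev)
{
        /* ... program the hardware for bridging towards br_dev ... */

        /* (a) we can offload the RX data path,
         * (b) we can offload the TX data path
         */
        return br_port_set_offload_caps(brport_dev, true, true);
}

The bridge would then use (a) to establish the hwdom and (b) to decide
whether it may hand off forwarding/replication to that port.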

I can try to draft another RFC with these changes.
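
As for the single bit per skb from my first point above, the driver
side really could be as simple as this (again just a sketch: whether
the bit ends up being skb->offload_fwd_mark or something new is up for
debate, the foo_* names are invented and the descriptor handling is
hand-waved):

#include <linux/netdevice.h>

struct foo_port {
        u8 id;  /* hardware port number; 0 means "let the switch decide" */
};

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
        struct foo_port *port = netdev_priv(ndev);
        u8 dest_port_id;

        if (skb->offload_fwd_mark)
                /* The bridge made its forwarding decision exactly once;
                 * let the switch consult its own FDB/MDB and replicate
                 * the frame towards all eligible ports.
                 */
                dest_port_id = 0;
        else
                /* Plain directed transmission out of this port only */
                dest_port_id = port->id;

        /* ... fill the DMA descriptor with dest_port_id and kick TX ... */

        return NETDEV_TX_OK;
}

which is essentially the directed packet vs. port_id == 0 distinction
that you describe for CPSW.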
