From: Premkumar Jonnala <pjonnala@broadcom.com>
To: Florian Fainelli <f.fainelli@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"sfeldma@gmail.com" <sfeldma@gmail.com>,
	"jiri@resnulli.us" <jiri@resnulli.us>,
	"nikolay@cumulusnetworks.com" <nikolay@cumulusnetworks.com>,
	"idosch@mellanox.com" <idosch@mellanox.com>,
	"gospo@cumulusnetworks.com" <gospo@cumulusnetworks.com>
Subject: RE: [PATCH] bonding: Offloading bonds to hardware
Date: Mon, 16 Nov 2015 06:10:09 +0000
Message-ID: <77EF4405DD4BB54AACCE7DB593DF6A9AA06FFC@SJEXCHMB14.corp.ad.broadcom.com>
In-Reply-To: <56462E38.5090600@gmail.com>



> -----Original Message-----
> From: Florian Fainelli [mailto:f.fainelli@gmail.com]
> Sent: Saturday, November 14, 2015 12:09 AM
> To: Premkumar Jonnala; netdev@vger.kernel.org; sfeldma@gmail.com;
> jiri@resnulli.us; nikolay@cumulusnetworks.com; idosch@mellanox.com;
> gospo@cumulusnetworks.com
> Subject: Re: [PATCH] bonding: Offloading bonds to hardware
> 
> On 12/11/15 08:02, Premkumar Jonnala wrote:
> > Packet forwarding to/from bond interfaces is done in software.
> >
> > This patch enables certain platforms to bridge traffic to/from
> > bond interfaces in hardware.  Notifications are sent out when
> > the "active" slave set for a bond interface is updated in
> > software.  Platforms use the notifications to program the
> > hardware accordingly.  The changes have been verified to work
> > with configured and 802.3ad bond interfaces.
> 
> This is a good explanation of why you want the changes, and how this is
> implemented in a system utilizing that, but this is not documenting why
> you are making these changes to the bonding code, nor how they are
> supposed to be used by an implementor driver, since there is no such
> user posted (yet?).

Thank you for reading through.  In a system where forwarding happens in
hardware, bonding interfaces need to be handled appropriately.  A bonding
interface should be treated as a single logical forwarding port, and traffic
egressing a bonding interface should be load balanced across its members.
Packets ingressing a slave interface should be associated with the
appropriate bond interface for forwarding purposes.

I will add more comments to the ndo/switchdev interfaces based on feedback.  In
short, the APIs associate/disassociate a slave with a bond interface.  Typically, drivers
program a "bonding table" in hardware that associates/disassociates a physical port
with a bond.  From then on, learning, forwarding, etc. consider the bond interface
and not the physical interface.
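
As a rough illustration of the "bonding table" idea (not taken from the patch),
the sketch below models the table as a per-port array in plain C.  Every name
prefixed with hyp_ is hypothetical, and a real driver would program a hardware
table from its add/discard hooks rather than a software array.

```c
#include <stddef.h>

#define HYP_MAX_PORTS 8		/* assumed number of physical ports */
#define HYP_NO_BOND   (-1)	/* port is not a member of any bond */

/* Hypothetical bonding table: index is the physical port, value is the bond id. */
struct hyp_bond_table {
	int port_to_bond[HYP_MAX_PORTS];
};

static void hyp_bond_table_init(struct hyp_bond_table *t)
{
	size_t i;

	for (i = 0; i < HYP_MAX_PORTS; i++)
		t->port_to_bond[i] = HYP_NO_BOND;
}

/* Associate a physical port with a bond, as a slave-add hook might. */
static int hyp_bond_table_add(struct hyp_bond_table *t, int port, int bond_id)
{
	if (port < 0 || port >= HYP_MAX_PORTS || bond_id < 0)
		return -1;
	t->port_to_bond[port] = bond_id;
	return 0;
}

/* Disassociate a port from a bond; fails if the port is not in that bond. */
static int hyp_bond_table_discard(struct hyp_bond_table *t, int port, int bond_id)
{
	if (port < 0 || port >= HYP_MAX_PORTS)
		return -1;
	if (t->port_to_bond[port] != bond_id)
		return -1;
	t->port_to_bond[port] = HYP_NO_BOND;
	return 0;
}

/* Bond that learning/forwarding should use for this port, or HYP_NO_BOND. */
static int hyp_bond_of_port(const struct hyp_bond_table *t, int port)
{
	if (port < 0 || port >= HYP_MAX_PORTS)
		return HYP_NO_BOND;
	return t->port_to_bond[port];
}
```

Once a port is added, lookups for learning or forwarding resolve to the bond id
instead of the physical port, which is the behaviour described above.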

When a packet needs to egress a bond interface, a load balancing scheme in
hardware selects the slave on which the packet is sent.  Normally, a hash
function over some fields of the packet (MAC SA, MAC DA, ethertype, among others)
is used to determine that slave.
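
To make the hash-based selection concrete, here is a minimal sketch in plain C.
It assumes a simple XOR fold over MAC SA, MAC DA and ethertype, followed by a
modulo over the number of active slaves.  The hyp_ names and the exact hash are
illustrative assumptions; actual hardware may use different fields and a
different function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key built from the L2 header fields mentioned above. */
struct hyp_l2_key {
	uint8_t  mac_sa[6];
	uint8_t  mac_da[6];
	uint16_t ethertype;
};

/* Illustrative hash: XOR-fold every key byte into one value. */
static uint32_t hyp_l2_hash(const struct hyp_l2_key *key)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < 6; i++)
		hash ^= (uint32_t)(key->mac_sa[i] ^ key->mac_da[i]);
	hash ^= (uint32_t)(key->ethertype >> 8);
	hash ^= (uint32_t)(key->ethertype & 0xff);
	return hash;
}

/* Pick the index of the egress slave, or -1 if no slave is active. */
static int hyp_select_slave(const struct hyp_l2_key *key, size_t n_active)
{
	if (n_active == 0)
		return -1;
	return (int)(hyp_l2_hash(key) % n_active);
}
```

Because the index depends only on header fields, packets of one flow keep
using the same slave for as long as the active slave set is unchanged.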

> 
> You introduce two new NDOs which are not documented in the commit
> message which would be nice to explain, in particular, why adding new
> NDOs and not switchdev attributes and methods for instance?

I am open to changing the APIs to use the switchdev interface.  I will send the 
diffs out shortly.  As for commenting, I was following the coding/commenting style 
in the file.  I am open to adding more comments.

> 
> Also, is it possible to move some of the logic into a notifier instead
> of having to maintain an array of slaves and an array of slaves to discard?

Can you please elaborate?  Bonding interfaces maintain an array of active slaves 
already. I've created another array, just to manage cleanup/updates to the slave 
set.  For situations where the slave set does not change, or where some slaves 
stay across the slave-array update, I was trying to avoid a remove-slave-x followed 
by an immediate add-slave-x call.  Avoiding unnecessary remove/add calls will 
help prevent traffic interruptions.
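
The intent can be sketched as a set difference between the old and the new
slave arrays: only slaves that left the set get a discard call, and only slaves
that joined get an add call.  The code below is a simplified, standalone model
in plain C.  The hyp_ types are hypothetical and use integer ids in place of
struct slave pointers.

```c
#include <stddef.h>

#define HYP_MAX_SLAVES 8	/* assumed upper bound for this sketch */

/* Hypothetical stand-in for a slave array: a count plus slave ids. */
struct hyp_slave_set {
	size_t count;
	int ids[HYP_MAX_SLAVES];
};

/* Slaves to discard and slaves to add when moving from old to new. */
struct hyp_delta {
	size_t n_removed;
	size_t n_added;
	int removed[HYP_MAX_SLAVES];
	int added[HYP_MAX_SLAVES];
};

/* Return 1 if 'id' is in 'set', 0 otherwise (a NULL set is empty). */
static int hyp_present(int id, const struct hyp_slave_set *set)
{
	size_t i;

	if (!set)
		return 0;
	for (i = 0; i < set->count; i++) {
		if (set->ids[i] == id)
			return 1;
	}
	return 0;
}

/* Compute the minimal remove/add lists; slaves in both sets are untouched. */
static void hyp_diff(const struct hyp_slave_set *old_set,
		     const struct hyp_slave_set *new_set,
		     struct hyp_delta *delta)
{
	size_t i;

	delta->n_removed = 0;
	delta->n_added = 0;
	for (i = 0; old_set && i < old_set->count; i++) {
		if (!hyp_present(old_set->ids[i], new_set))
			delta->removed[delta->n_removed++] = old_set->ids[i];
	}
	for (i = 0; new_set && i < new_set->count; i++) {
		if (!hyp_present(new_set->ids[i], old_set))
			delta->added[delta->n_added++] = new_set->ids[i];
	}
}
```

For an old set {1, 2, 3} and a new set {2, 3, 4}, this yields one discard
(slave 1) and one add (slave 4), while slaves 2 and 3 see no hardware update.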

Thanks
Prem

> 
> >
> > Signed-off-by: Premkumar Jonnala <pjonnala@broadcom.com>
> >
> > ---
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index b4351ca..4b53733 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -3759,6 +3759,101 @@ err:
> >  	bond_slave_arr_work_rearm(bond, 1);
> >  }
> >
> > +static int slave_present(struct slave *slave, struct bond_up_slave *arr)
> > +{
> > +	int i;
> > +
> > +	if (!arr)
> > +		return 0;
> > +
> > +	for (i = 0; i < arr->count; i++) {
> > +		if (arr->arr[i] == slave)
> > +			return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +/* Send notification to clear/remove slaves for 'bond' in 'arr' except for
> > + * slaves in 'ignore_arr'.
> > + */
> > +static int bond_slave_arr_clear_notify(struct bonding *bond,
> > +				struct bond_up_slave *arr,
> > +				struct bond_up_slave *ignore_arr)
> > +{
> > +	struct slave *slave;
> > +	struct net_device *slave_dev;
> > +	int i, rv;
> > +	const struct net_device_ops *ops;
> > +
> > +	if (!bond->dev || !arr)
> > +		return -EINVAL;
> > +
> > +	rv = 0;
> > +	for (i = 0; i < arr->count; i++) {
> > +		slave = arr->arr[i];
> > +		if (!slave || !slave->dev)
> > +			continue;
> > +
> > +		slave_dev = slave->dev;
> > +		if (slave_present(slave, ignore_arr)) {
> > +			netdev_dbg(bond->dev, "ignoring clear of slave %s\n",
> > +				slave_dev->name);
> > +			continue;
> > +		}
> > +		ops = slave_dev->netdev_ops;
> > +		if (!ops || !ops->ndo_bond_slave_discard) {
> > +			netdev_dbg(bond->dev, "No slave discard ops for %s\n",
> > +				slave_dev->name);
> > +			continue;
> > +		}
> > +		rv = ops->ndo_bond_slave_discard(slave_dev, bond->dev);
> > +		if (rv < 0)
> > +			return rv;
> > +	}
> > +	return rv;
> > +}
> > +
> > +/* Send notification about updated slaves for 'bond' except for slaves in
> > + * 'ignore_arr'.
> > + */
> > +static int bond_slave_arr_set_notify(struct bonding *bond,
> > +				struct bond_up_slave *ignore_arr)
> > +{
> > +	struct slave *slave;
> > +	struct net_device *slave_dev;
> > +	struct bond_up_slave *arr;
> > +	int i, rv;
> > +	const struct net_device_ops *ops;
> > +
> > +	if (!bond || !bond->dev)
> > +		return -EINVAL;
> > +	rv = 0;
> > +
> > +	arr = rtnl_dereference(bond->slave_arr);
> > +	if (!arr)
> > +		return -EINVAL;
> > +
> > +	for (i = 0; i < arr->count; i++) {
> > +		slave = arr->arr[i];
> > +		slave_dev = slave->dev;
> > +		if (slave_present(slave, ignore_arr)) {
> > +			netdev_dbg(bond->dev, "ignoring add of slave %s\n",
> > +				slave->dev->name);
> > +			continue;
> > +		}
> > +		ops = slave_dev->netdev_ops;
> > +		if (!ops || !ops->ndo_bond_slave_add) {
> > +			netdev_dbg(bond->dev, "No slave add ops for %s\n",
> > +				slave_dev->name);
> > +			continue;
> > +		}
> > +		rv = ops->ndo_bond_slave_add(slave_dev, bond->dev);
> > +		if (rv < 0)
> > +			return rv;
> > +	}
> > +	return rv;
> > +}
> > +
> >  /* Build the usable slaves array in control path for modes that use xmit-hash
> >   * to determine the slave interface -
> >   * (a) BOND_MODE_8023AD
> > @@ -3771,7 +3866,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> >  {
> >  	struct slave *slave;
> >  	struct list_head *iter;
> > -	struct bond_up_slave *new_arr, *old_arr;
> > +	struct bond_up_slave *new_arr, *old_arr, *discard_arr = 0;
> >  	int agg_id = 0;
> >  	int ret = 0;
> >
> > @@ -3786,6 +3881,12 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> >  		pr_err("Failed to build slave-array.\n");
> >  		goto out;
> >  	}
> > +	discard_arr = kzalloc(offsetof(struct bond_up_slave, arr[bond->slave_cnt]),
> > +			GFP_KERNEL);
> > +	if (!discard_arr) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> >  	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> >  		struct ad_info ad_info;
> >
> > @@ -3797,6 +3898,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> >  			 */
> >  			old_arr = rtnl_dereference(bond->slave_arr);
> >  			if (old_arr) {
> > +				bond_slave_arr_clear_notify(bond, old_arr, 0);
> >  				RCU_INIT_POINTER(bond->slave_arr, NULL);
> >  				kfree_rcu(old_arr, rcu);
> >  			}
> > @@ -3809,8 +3911,10 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> >  			struct aggregator *agg;
> >
> >  			agg = SLAVE_AD_INFO(slave)->port.aggregator;
> > -			if (!agg || agg->aggregator_identifier != agg_id)
> > +			if (!agg || agg->aggregator_identifier != agg_id) {
> > +				discard_arr->arr[discard_arr->count++] = slave;
> >  				continue;
> > +			}
> >  		}
> >  		if (!bond_slave_can_tx(slave))
> >  			continue;
> > @@ -3820,10 +3924,15 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> >  	}
> >
> >  	old_arr = rtnl_dereference(bond->slave_arr);
> > +	bond_slave_arr_clear_notify(bond, old_arr, new_arr);
> > +	bond_slave_arr_clear_notify(bond, discard_arr, 0);
> >  	rcu_assign_pointer(bond->slave_arr, new_arr);
> > +	bond_slave_arr_set_notify(bond, old_arr);
> >  	if (old_arr)
> >  		kfree_rcu(old_arr, rcu);
> >  out:
> > +	if (discard_arr)
> > +		kfree(discard_arr);
> >  	if (ret != 0 && skipslave) {
> >  		int idx;
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 4ac653b..facc35f 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1236,6 +1236,10 @@ struct net_device_ops {
> >  							 bool proto_down);
> >  	int			(*ndo_fill_metadata_dst)(struct net_device *dev,
> >  						       struct sk_buff *skb);
> > +	int		(*ndo_bond_slave_add)(struct net_device *slave_dev,
> > +				struct net_device *bond);
> > +	int		(*ndo_bond_slave_discard)(struct net_device *slave_dev,
> > +				struct net_device *bond);
> >  };
> >
> >  /**
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> --
> Florian
