All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next v5 0/8] switchdev: change locking
@ 2015-10-14 17:40 Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure Jiri Pirko
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

This is something which I'm currently struggling with.
Callers of attr_set and obj_add/del often hold not only RTNL, but also
spinlock (bridge). So in that case, the driver implementing the op cannot sleep.

The way rocker is dealing with this now is just to invoke driver operation
and go out, without any checking or reporting of the operation status.

Since it would be nice to at least put a warning in case the operation fails,
it makes sense to do this in delayed work directly in switchdev core
instead of implementing this in separate drivers. And that is what this patchset
is introducing.

So from now on, the locking of switchdev mod ops is consistent. Caller either
holds rtnl mutex or in case it does not, caller sets defer flag, telling
switchdev core to process the op later, in deferred queue.

Function to force to process switchdev deferred ops can be called by op
caller in appropriate location, for example after it releases
spin lock, to force switchdev core to process pending ops.

v1->v2:
- rebased on current net-next head (including Scott's ageing patchset)
v2->v3:
- fixed comment s/of/or/ typo suggested by Nik
v3->v4:
- the actual patchset is sent instead of different branch I send in v3 :/
v4->v5:
- added patch to "const" attr param
- reworked deferred ops infrastructure (mainly patch number 1 and
  internal users (patch 3 and 5)) - resolves the issue pointed out
  by John

Jiri Pirko (8):
  switchdev: introduce switchdev deferred ops infrastructure
  switchdev: make struct switchdev_attr parameter const for attr_set
    calls
  switchdev: allow caller to explicitly request attr_set as deferred
  switchdev: remove pointers from switchdev objects
  switchdev: introduce possibility to defer obj_add/del
  bridge: defer switchdev fdb del call in fdb_del_external_learn
  rocker: remove nowait from switchdev callbacks.
  switchdev: assert rtnl mutex when going over lower netdevs

 drivers/net/ethernet/rocker/rocker.c |  15 +-
 include/net/switchdev.h              |  20 ++-
 net/bridge/br_fdb.c                  |   7 +-
 net/bridge/br_if.c                   |   3 +
 net/bridge/br_stp.c                  |   3 +-
 net/dsa/slave.c                      |   4 +-
 net/switchdev/switchdev.c            | 315 ++++++++++++++++++++++++-----------
 7 files changed, 254 insertions(+), 113 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls Jiri Pirko
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Introduce infrastructure which will be used internally to defer ops.
Note that the deferred ops are queued up and either are processed by
scheduled work or explicitly by user calling deferred_process function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |  5 +++
 net/switchdev/switchdev.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1ce7083..31b9038 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -167,6 +167,7 @@ switchdev_notifier_info_to_dev(const struct switchdev_notifier_info *info)
 
 #ifdef CONFIG_NET_SWITCHDEV
 
+void switchdev_deferred_process(void);
 int switchdev_port_attr_get(struct net_device *dev,
 			    struct switchdev_attr *attr);
 int switchdev_port_attr_set(struct net_device *dev,
@@ -208,6 +209,10 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 
 #else
 
+static inline void switchdev_deferred_process(void)
+{
+}
+
 static inline int switchdev_port_attr_get(struct net_device *dev,
 					  struct switchdev_attr *attr)
 {
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index b8aaf820..5e64b59 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -17,6 +17,7 @@
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
 #include <linux/list.h>
+#include <linux/workqueue.h>
 #include <net/ip_fib.h>
 #include <net/switchdev.h>
 
@@ -92,6 +93,85 @@ static void switchdev_trans_items_warn_destroy(struct net_device *dev,
 	switchdev_trans_items_destroy(trans);
 }
 
+static LIST_HEAD(deferred);
+static DEFINE_SPINLOCK(deferred_lock);
+
+typedef void switchdev_deferred_func_t(struct net_device *dev,
+				       const void *data);
+
+struct switchdev_deferred_item {
+	struct list_head list;
+	struct net_device *dev;
+	switchdev_deferred_func_t *func;
+	unsigned long data[0];
+};
+
+static struct switchdev_deferred_item *switchdev_deferred_dequeue(void)
+{
+	struct switchdev_deferred_item *dfitem;
+
+	spin_lock_bh(&deferred_lock);
+	if (list_empty(&deferred)) {
+		dfitem = NULL;
+		goto unlock;
+	}
+	dfitem = list_first_entry(&deferred,
+				  struct switchdev_deferred_item, list);
+	list_del(&dfitem->list);
+unlock:
+	spin_unlock_bh(&deferred_lock);
+	return dfitem;
+}
+
+/**
+ *	switchdev_deferred_process - Process ops in deferred queue
+ *
+ *	Called to flush the ops currently queued in deferred ops queue.
+ *	rtnl_lock must be held.
+ */
+void switchdev_deferred_process(void)
+{
+	struct switchdev_deferred_item *dfitem;
+
+	ASSERT_RTNL();
+
+	while ((dfitem = switchdev_deferred_dequeue())) {
+		dfitem->func(dfitem->dev, dfitem->data);
+		dev_put(dfitem->dev);
+		kfree(dfitem);
+	}
+}
+EXPORT_SYMBOL_GPL(switchdev_deferred_process);
+
+static void switchdev_deferred_process_work(struct work_struct *work)
+{
+	rtnl_lock();
+	switchdev_deferred_process();
+	rtnl_unlock();
+}
+
+static DECLARE_WORK(deferred_process_work, switchdev_deferred_process_work);
+
+static int switchdev_deferred_enqueue(struct net_device *dev,
+				      const void *data, size_t data_len,
+				      switchdev_deferred_func_t *func)
+{
+	struct switchdev_deferred_item *dfitem;
+
+	dfitem = kmalloc(sizeof(*dfitem) + data_len, GFP_ATOMIC);
+	if (!dfitem)
+		return -ENOMEM;
+	dfitem->dev = dev;
+	dfitem->func = func;
+	memcpy(dfitem->data, data, data_len);
+	dev_hold(dev);
+	spin_lock_bh(&deferred_lock);
+	list_add_tail(&dfitem->list, &deferred);
+	spin_unlock_bh(&deferred_lock);
+	schedule_work(&deferred_process_work);
+	return 0;
+}
+
 /**
  *	switchdev_port_attr_get - Get port attribute
  *
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-14 19:01   ` Vivien Didelot
  2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.c | 2 +-
 include/net/switchdev.h              | 6 +++---
 net/dsa/slave.c                      | 2 +-
 net/switchdev/switchdev.c            | 7 ++++---
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index eafa907..f0e820d 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4374,7 +4374,7 @@ static int rocker_port_bridge_ageing_time(struct rocker_port *rocker_port,
 }
 
 static int rocker_port_attr_set(struct net_device *dev,
-				struct switchdev_attr *attr,
+				const struct switchdev_attr *attr,
 				struct switchdev_trans *trans)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 31b9038..d1c7f90 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -132,7 +132,7 @@ struct switchdev_ops {
 	int	(*switchdev_port_attr_get)(struct net_device *dev,
 					   struct switchdev_attr *attr);
 	int	(*switchdev_port_attr_set)(struct net_device *dev,
-					   struct switchdev_attr *attr,
+					   const struct switchdev_attr *attr,
 					   struct switchdev_trans *trans);
 	int	(*switchdev_port_obj_add)(struct net_device *dev,
 					  const struct switchdev_obj *obj,
@@ -171,7 +171,7 @@ void switchdev_deferred_process(void);
 int switchdev_port_attr_get(struct net_device *dev,
 			    struct switchdev_attr *attr);
 int switchdev_port_attr_set(struct net_device *dev,
-			    struct switchdev_attr *attr);
+			    const struct switchdev_attr *attr);
 int switchdev_port_obj_add(struct net_device *dev,
 			   const struct switchdev_obj *obj);
 int switchdev_port_obj_del(struct net_device *dev,
@@ -220,7 +220,7 @@ static inline int switchdev_port_attr_get(struct net_device *dev,
 }
 
 static inline int switchdev_port_attr_set(struct net_device *dev,
-					  struct switchdev_attr *attr)
+					  const struct switchdev_attr *attr)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 43d7342..84cd863 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -453,7 +453,7 @@ static int dsa_slave_stp_update(struct net_device *dev, u8 state)
 }
 
 static int dsa_slave_port_attr_set(struct net_device *dev,
-				   struct switchdev_attr *attr,
+				   const struct switchdev_attr *attr,
 				   struct switchdev_trans *trans)
 {
 	struct dsa_slave_priv *p = netdev_priv(dev);
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5e64b59..23b4e5b 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -215,7 +215,7 @@ int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr)
 EXPORT_SYMBOL_GPL(switchdev_port_attr_get);
 
 static int __switchdev_port_attr_set(struct net_device *dev,
-				     struct switchdev_attr *attr,
+				     const struct switchdev_attr *attr,
 				     struct switchdev_trans *trans)
 {
 	const struct switchdev_ops *ops = dev->switchdev_ops;
@@ -274,7 +274,7 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
 }
 
 static int switchdev_port_attr_set_defer(struct net_device *dev,
-					 struct switchdev_attr *attr)
+					 const struct switchdev_attr *attr)
 {
 	struct switchdev_attr_set_work *asw;
 
@@ -303,7 +303,8 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
  *	system is not left in a partially updated state due to
  *	failure from driver/device.
  */
-int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr)
+int switchdev_port_attr_set(struct net_device *dev,
+			    const struct switchdev_attr *attr)
 {
 	struct switchdev_trans trans;
 	int err;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-15  4:34   ` Scott Feldman
  2015-10-15 15:21   ` John Fastabend
  2015-10-14 17:40 ` [patch net-next v5 4/8] switchdev: remove pointers from switchdev objects Jiri Pirko
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Caller should know if he can call attr_set directly (when holding RTNL)
or if he has to defer the att_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Benefit from newly introduced switchdev deferred ops infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |   1 +
 net/bridge/br_stp.c       |   3 +-
 net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
 3 files changed, 46 insertions(+), 66 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index d1c7f90..f7de6f8 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -17,6 +17,7 @@
 
 #define SWITCHDEV_F_NO_RECURSE		BIT(0)
 #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
+#define SWITCHDEV_F_DEFER		BIT(2)
 
 struct switchdev_trans_item {
 	struct list_head list;
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index db6d243de..80c34d7 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
 {
 	struct switchdev_attr attr = {
 		.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
+		.flags = SWITCHDEV_F_DEFER,
 		.u.stp_state = state,
 	};
 	int err;
 
 	p->state = state;
 	err = switchdev_port_attr_set(p->dev, &attr);
-	if (err && err != -EOPNOTSUPP)
+	if (err)
 		br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
 				(unsigned int) p->port_no, p->dev->name);
 }
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 23b4e5b..007b8f4 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -250,75 +250,12 @@ done:
 	return err;
 }
 
-struct switchdev_attr_set_work {
-	struct work_struct work;
-	struct net_device *dev;
-	struct switchdev_attr attr;
-};
-
-static void switchdev_port_attr_set_work(struct work_struct *work)
-{
-	struct switchdev_attr_set_work *asw =
-		container_of(work, struct switchdev_attr_set_work, work);
-	int err;
-
-	rtnl_lock();
-	err = switchdev_port_attr_set(asw->dev, &asw->attr);
-	if (err && err != -EOPNOTSUPP)
-		netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
-			   err, asw->attr.id);
-	rtnl_unlock();
-
-	dev_put(asw->dev);
-	kfree(work);
-}
-
-static int switchdev_port_attr_set_defer(struct net_device *dev,
-					 const struct switchdev_attr *attr)
-{
-	struct switchdev_attr_set_work *asw;
-
-	asw = kmalloc(sizeof(*asw), GFP_ATOMIC);
-	if (!asw)
-		return -ENOMEM;
-
-	INIT_WORK(&asw->work, switchdev_port_attr_set_work);
-
-	dev_hold(dev);
-	asw->dev = dev;
-	memcpy(&asw->attr, attr, sizeof(asw->attr));
-
-	schedule_work(&asw->work);
-
-	return 0;
-}
-
-/**
- *	switchdev_port_attr_set - Set port attribute
- *
- *	@dev: port device
- *	@attr: attribute to set
- *
- *	Use a 2-phase prepare-commit transaction model to ensure
- *	system is not left in a partially updated state due to
- *	failure from driver/device.
- */
-int switchdev_port_attr_set(struct net_device *dev,
-			    const struct switchdev_attr *attr)
+static int switchdev_port_attr_set_now(struct net_device *dev,
+				       const struct switchdev_attr *attr)
 {
 	struct switchdev_trans trans;
 	int err;
 
-	if (!rtnl_is_locked()) {
-		/* Running prepare-commit transaction across stacked
-		 * devices requires nothing moves, so if rtnl_lock is
-		 * not held, schedule a worker thread to hold rtnl_lock
-		 * while setting attr.
-		 */
-
-		return switchdev_port_attr_set_defer(dev, attr);
-	}
-
 	switchdev_trans_init(&trans);
 
 	/* Phase I: prepare for attr set. Driver/device should fail
@@ -355,6 +292,47 @@ int switchdev_port_attr_set(struct net_device *dev,
 
 	return err;
 }
+
+static void switchdev_port_attr_set_deferred(struct net_device *dev,
+					     const void *data)
+{
+	const struct switchdev_attr *attr = data;
+	int err;
+
+	err = switchdev_port_attr_set_now(dev, attr);
+	if (err && err != -EOPNOTSUPP)
+		netdev_err(dev, "failed (err=%d) to set attribute (id=%d)\n",
+			   err, attr->id);
+}
+
+static int switchdev_port_attr_set_defer(struct net_device *dev,
+					 const struct switchdev_attr *attr)
+{
+	return switchdev_deferred_enqueue(dev, attr, sizeof(*attr),
+					  switchdev_port_attr_set_deferred);
+}
+
+/**
+ *	switchdev_port_attr_set - Set port attribute
+ *
+ *	@dev: port device
+ *	@attr: attribute to set
+ *
+ *	Use a 2-phase prepare-commit transaction model to ensure
+ *	system is not left in a partially updated state due to
+ *	failure from driver/device.
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
+ */
+int switchdev_port_attr_set(struct net_device *dev,
+			    const struct switchdev_attr *attr)
+{
+	if (attr->flags & SWITCHDEV_F_DEFER)
+		return switchdev_port_attr_set_defer(dev, attr);
+	ASSERT_RTNL();
+	return switchdev_port_attr_set_now(dev, attr);
+}
 EXPORT_SYMBOL_GPL(switchdev_port_attr_set);
 
 static int __switchdev_port_obj_add(struct net_device *dev,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 4/8] switchdev: remove pointers from switchdev objects
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (2 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

When object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may be already
used by someone else. So rather do local copy of the value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/rocker/rocker.c |  6 +++---
 include/net/switchdev.h              |  7 +++----
 net/bridge/br_fdb.c                  |  2 +-
 net/dsa/slave.c                      |  2 +-
 net/switchdev/switchdev.c            | 11 +++++++----
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index f0e820d..2cd7435 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4469,7 +4469,7 @@ static int rocker_port_obj_add(struct net_device *dev,
 		fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
 		err = rocker_port_fib_ipv4(rocker_port, trans,
 					   htonl(fib4->dst), fib4->dst_len,
-					   fib4->fi, fib4->tb_id, 0);
+					   &fib4->fi, fib4->tb_id, 0);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_port_fdb_add(rocker_port, trans,
@@ -4541,7 +4541,7 @@ static int rocker_port_obj_del(struct net_device *dev,
 		fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
 		err = rocker_port_fib_ipv4(rocker_port, NULL,
 					   htonl(fib4->dst), fib4->dst_len,
-					   fib4->fi, fib4->tb_id,
+					   &fib4->fi, fib4->tb_id,
 					   ROCKER_OP_FLAG_REMOVE);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
@@ -4571,7 +4571,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
 		if (found->key.rocker_port != rocker_port)
 			continue;
-		fdb->addr = found->key.addr;
+		ether_addr_copy(fdb->addr, found->key.addr);
 		fdb->ndm_state = NUD_REACHABLE;
 		fdb->vid = rocker_port_vlan_to_vid(rocker_port,
 						   found->key.vlan_id);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index f7de6f8..f8672d7 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -14,6 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/notifier.h>
 #include <linux/list.h>
+#include <net/ip_fib.h>
 
 #define SWITCHDEV_F_NO_RECURSE		BIT(0)
 #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
@@ -59,8 +60,6 @@ struct switchdev_attr {
 	} u;
 };
 
-struct fib_info;
-
 enum switchdev_obj_id {
 	SWITCHDEV_OBJ_ID_UNDEFINED,
 	SWITCHDEV_OBJ_ID_PORT_VLAN,
@@ -88,7 +87,7 @@ struct switchdev_obj_ipv4_fib {
 	struct switchdev_obj obj;
 	u32 dst;
 	int dst_len;
-	struct fib_info *fi;
+	struct fib_info fi;
 	u8 tos;
 	u8 type;
 	u32 nlflags;
@@ -101,7 +100,7 @@ struct switchdev_obj_ipv4_fib {
 /* SWITCHDEV_OBJ_ID_PORT_FDB */
 struct switchdev_obj_port_fdb {
 	struct switchdev_obj obj;
-	const unsigned char *addr;
+	unsigned char addr[ETH_ALEN];
 	u16 vid;
 	u16 ndm_state;
 };
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f43ce05..f5e7da0 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -135,10 +135,10 @@ static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = f->addr.addr,
 		.vid = f->vlan_id,
 	};
 
+	ether_addr_copy(fdb.addr, f->addr.addr);
 	switchdev_port_obj_del(f->dst->dev, &fdb.obj);
 }
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 84cd863..b0b8da0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -393,7 +393,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
 		if (ret < 0)
 			break;
 
-		fdb->addr = addr;
+		ether_addr_copy(fdb->addr, addr);
 		fdb->vid = vid;
 		fdb->ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 007b8f4..5963d7a 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/etherdevice.h>
 #include <linux/if_bridge.h>
 #include <linux/list.h>
 #include <linux/workqueue.h>
@@ -891,10 +892,10 @@ int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = addr,
 		.vid = vid,
 	};
 
+	ether_addr_copy(fdb.addr, addr);
 	return switchdev_port_obj_add(dev, &fdb.obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_add);
@@ -916,10 +917,10 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = addr,
 		.vid = vid,
 	};
 
+	ether_addr_copy(fdb.addr, addr);
 	return switchdev_port_obj_del(dev, &fdb.obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
@@ -1081,7 +1082,6 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
 		.dst = dst,
 		.dst_len = dst_len,
-		.fi = fi,
 		.tos = tos,
 		.type = type,
 		.nlflags = nlflags,
@@ -1090,6 +1090,8 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	struct net_device *dev;
 	int err = 0;
 
+	memcpy(&ipv4_fib.fi, fi, sizeof(ipv4_fib.fi));
+
 	/* Don't offload route if using custom ip rules or if
 	 * IPv4 FIB offloading has been disabled completely.
 	 */
@@ -1133,7 +1135,6 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
 		.dst = dst,
 		.dst_len = dst_len,
-		.fi = fi,
 		.tos = tos,
 		.type = type,
 		.nlflags = 0,
@@ -1142,6 +1143,8 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 	struct net_device *dev;
 	int err = 0;
 
+	memcpy(&ipv4_fib.fi, fi, sizeof(ipv4_fib.fi));
+
 	if (!(fi->fib_flags & RTNH_F_OFFLOAD))
 		return 0;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (3 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 4/8] switchdev: remove pointers from switchdev objects Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-19  7:55   ` Scott Feldman
  2015-10-14 17:40 ` [patch net-next v5 6/8] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Similar to the attr usecase, the caller knows if he is holding RTNL and is
in atomic section. So let the called to decide the correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |   1 +
 net/switchdev/switchdev.c | 100 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index f8672d7..bc865e2 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -69,6 +69,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
 	enum switchdev_obj_id id;
+	u32 flags;
 };
 
 /* SWITCHDEV_OBJ_ID_PORT_VLAN */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5963d7a..eac68c4 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -362,21 +362,8 @@ static int __switchdev_port_obj_add(struct net_device *dev,
 	return err;
 }
 
-/**
- *	switchdev_port_obj_add - Add port object
- *
- *	@dev: port device
- *	@id: object ID
- *	@obj: object to add
- *
- *	Use a 2-phase prepare-commit transaction model to ensure
- *	system is not left in a partially updated state due to
- *	failure from driver/device.
- *
- *	rtnl_lock must be held.
- */
-int switchdev_port_obj_add(struct net_device *dev,
-			   const struct switchdev_obj *obj)
+static int switchdev_port_obj_add_now(struct net_device *dev,
+				      const struct switchdev_obj *obj)
 {
 	struct switchdev_trans trans;
 	int err;
@@ -418,18 +405,53 @@ int switchdev_port_obj_add(struct net_device *dev,
 
 	return err;
 }
-EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
+
+static void switchdev_port_obj_add_deferred(struct net_device *dev,
+					    const void *data)
+{
+	const struct switchdev_obj *obj = data;
+	int err;
+
+	err = switchdev_port_obj_add_now(dev, obj);
+	if (err && err != -EOPNOTSUPP)
+		netdev_err(dev, "failed (err=%d) to add object (id=%d)\n",
+			   err, obj->id);
+}
+
+static int switchdev_port_obj_add_defer(struct net_device *dev,
+					const struct switchdev_obj *obj)
+{
+	return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
+					  switchdev_port_obj_add_deferred);
+}
 
 /**
- *	switchdev_port_obj_del - Delete port object
+ *	switchdev_port_obj_add - Add port object
  *
  *	@dev: port device
  *	@id: object ID
- *	@obj: object to delete
+ *	@obj: object to add
+ *
+ *	Use a 2-phase prepare-commit transaction model to ensure
+ *	system is not left in a partially updated state due to
+ *	failure from driver/device.
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
  */
-int switchdev_port_obj_del(struct net_device *dev,
+int switchdev_port_obj_add(struct net_device *dev,
 			   const struct switchdev_obj *obj)
 {
+	if (obj->flags & SWITCHDEV_F_DEFER)
+		return switchdev_port_obj_add_defer(dev, obj);
+	ASSERT_RTNL();
+	return switchdev_port_obj_add_now(dev, obj);
+}
+EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
+
+static int switchdev_port_obj_del_now(struct net_device *dev,
+				      const struct switchdev_obj *obj)
+{
 	const struct switchdev_ops *ops = dev->switchdev_ops;
 	struct net_device *lower_dev;
 	struct list_head *iter;
@@ -444,13 +466,51 @@ int switchdev_port_obj_del(struct net_device *dev,
 	 */
 
 	netdev_for_each_lower_dev(dev, lower_dev, iter) {
-		err = switchdev_port_obj_del(lower_dev, obj);
+		err = switchdev_port_obj_del_now(lower_dev, obj);
 		if (err)
 			break;
 	}
 
 	return err;
 }
+
+static void switchdev_port_obj_del_deferred(struct net_device *dev,
+					    const void *data)
+{
+	const struct switchdev_obj *obj = data;
+	int err;
+
+	err = switchdev_port_obj_del_now(dev, obj);
+	if (err && err != -EOPNOTSUPP)
+		netdev_err(dev, "failed (err=%d) to del object (id=%d)\n",
+			   err, obj->id);
+}
+
+static int switchdev_port_obj_del_defer(struct net_device *dev,
+					const struct switchdev_obj *obj)
+{
+	return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
+					  switchdev_port_obj_del_deferred);
+}
+
+/**
+ *	switchdev_port_obj_del - Delete port object
+ *
+ *	@dev: port device
+ *	@id: object ID
+ *	@obj: object to delete
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
+ */
+int switchdev_port_obj_del(struct net_device *dev,
+			   const struct switchdev_obj *obj)
+{
+	if (obj->flags & SWITCHDEV_F_DEFER)
+		return switchdev_port_obj_del_defer(dev, obj);
+	ASSERT_RTNL();
+	return switchdev_port_obj_del_now(dev, obj);
+}
 EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
 
 /**
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 6/8] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (4 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 7/8] rocker: remove nowait from switchdev callbacks Jiri Pirko
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Since spinlock is held here, defer the switchdev operation. Also, ensure
that defered switchdev ops are processed before port master device
is unlinked.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/bridge/br_fdb.c | 5 ++++-
 net/bridge/br_if.c  | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f5e7da0..c88bd8e 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
 {
 	struct switchdev_obj_port_fdb fdb = {
-		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
+		.obj = {
+			.id = SWITCHDEV_OBJ_ID_PORT_FDB,
+			.flags = SWITCHDEV_F_DEFER,
+		},
 		.vid = f->vlan_id,
 	};
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 45e4757..ec02f58 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <net/sock.h>
 #include <linux/if_vlan.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 
@@ -250,6 +251,8 @@ static void del_nbp(struct net_bridge_port *p)
 
 	nbp_vlan_flush(p);
 	br_fdb_delete_by_port(br, p, 0, 1);
+	switchdev_deferred_process();
+
 	nbp_update_port_count(br);
 
 	netdev_upper_dev_unlink(dev, br->dev);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 7/8] rocker: remove nowait from switchdev callbacks.
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (5 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 6/8] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-14 17:40 ` [patch net-next v5 8/8] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
  2015-10-15 13:10 ` [patch net-next v5 0/8] switchdev: change locking David Miller
  8 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

No need to avoid sleeping in switchdev callbacks now, as the switchdev
core allows it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 2cd7435..32a80d2 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
 	    rocker_port->stp_state == BR_STATE_FORWARDING)
 		return 0;
 
-	flags |= ROCKER_OP_FLAG_REMOVE;
+	flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
 
 	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
 
@@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev,
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
-		err = rocker_port_stp_update(rocker_port, trans,
-					     ROCKER_OP_FLAG_NOWAIT,
+		err = rocker_port_stp_update(rocker_port, trans, 0,
 					     attr->u.stp_state);
 		break;
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
@@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port,
 			       const struct switchdev_obj_port_fdb *fdb)
 {
 	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
-	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
+	int flags = ROCKER_OP_FLAG_REMOVE;
 
 	if (!rocker_port_is_bridged(rocker_port))
 		return -EINVAL;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [patch net-next v5 8/8] switchdev: assert rtnl mutex when going over lower netdevs
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (6 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 7/8] rocker: remove nowait from switchdev callbacks Jiri Pirko
@ 2015-10-14 17:40 ` Jiri Pirko
  2015-10-15 13:10 ` [patch net-next v5 0/8] switchdev: change locking David Miller
  8 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-14 17:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

netdev_for_each_lower_dev has to be called with rtnl mutex held. So
better enforce it in switchdev functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/switchdev/switchdev.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index eac68c4..73e3895 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -520,6 +520,8 @@ EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
  *	@id: object ID
  *	@obj: object to dump
  *	@cb: function to call with a filled object
+ *
+ *	rtnl_lock must be held.
  */
 int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj,
 			    switchdev_obj_dump_cb_t *cb)
@@ -529,6 +531,8 @@ int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj,
 	struct list_head *iter;
 	int err = -EOPNOTSUPP;
 
+	ASSERT_RTNL();
+
 	if (ops && ops->switchdev_port_obj_dump)
 		return ops->switchdev_port_obj_dump(dev, obj, cb);
 
@@ -1097,6 +1101,8 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
 	struct net_device *dev = NULL;
 	int nhsel;
 
+	ASSERT_RTNL();
+
 	/* For this route, all nexthop devs must be on the same switch. */
 
 	for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
@@ -1327,10 +1333,11 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 	u32 mark = dev->ifindex;
 	u32 reset_mark = 0;
 
-	if (group_dev && joining) {
-		mark = switchdev_port_fwd_mark_get(dev, group_dev);
-	} else if (group_dev && !joining) {
-		if (dev->offload_fwd_mark == mark)
+	if (group_dev) {
+		ASSERT_RTNL();
+		if (joining)
+			mark = switchdev_port_fwd_mark_get(dev, group_dev);
+		else if (dev->offload_fwd_mark == mark)
 			/* Ohoh, this port was the mark reference port,
 			 * but it's leaving the group, so reset the
 			 * mark for the remaining ports in the group.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls
  2015-10-14 17:40 ` [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls Jiri Pirko
@ 2015-10-14 19:01   ` Vivien Didelot
  0 siblings, 0 replies; 20+ messages in thread
From: Vivien Didelot @ 2015-10-14 19:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, sfeldma, f.fainelli, linux, andrew,
	john.fastabend, David.Laight, stephen

On Oct. Wednesday 14 (42) 07:40 PM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
@ 2015-10-15  4:34   ` Scott Feldman
  2015-10-15  5:57     ` Jiri Pirko
  2015-10-15 15:21   ` John Fastabend
  1 sibling, 1 reply; 20+ messages in thread
From: Scott Feldman @ 2015-10-15  4:34 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Caller should know if he can call attr_set directly (when holding RTNL)
> or if he has to defer the att_set processing for later.
>
> This also allows drivers to sleep inside attr_set and report operation
> status back to switchdev core. Switchdev core then warns if status is
> not ok, instead of silent errors happening in drivers.
>
> Benefit from newly introduced switchdev deferred ops infrastructure.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_stp.c       |   3 +-
>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>  3 files changed, 46 insertions(+), 66 deletions(-)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d1c7f90..f7de6f8 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -17,6 +17,7 @@
>
>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
> +#define SWITCHDEV_F_DEFER              BIT(2)
>
>  struct switchdev_trans_item {
>         struct list_head list;
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index db6d243de..80c34d7 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>  {
>         struct switchdev_attr attr = {
>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
> +               .flags = SWITCHDEV_F_DEFER,
>                 .u.stp_state = state,
>         };
>         int err;
>
>         p->state = state;
>         err = switchdev_port_attr_set(p->dev, &attr);
> -       if (err && err != -EOPNOTSUPP)
> +       if (err)
>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>                                 (unsigned int) p->port_no, p->dev->name);
>  }

Should this part of the patch be moved to patch 6/8 where
switchdev_deferred_process() is called from del_nbp()?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-15  4:34   ` Scott Feldman
@ 2015-10-15  5:57     ` Jiri Pirko
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-15  5:57 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

Thu, Oct 15, 2015 at 06:34:01AM CEST, sfeldma@gmail.com wrote:
>On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the att_set processing for later.
>>
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>>
>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c       |   3 +-
>>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>  3 files changed, 46 insertions(+), 66 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d1c7f90..f7de6f8 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>
>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>
>>  struct switchdev_trans_item {
>>         struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>>         struct switchdev_attr attr = {
>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +               .flags = SWITCHDEV_F_DEFER,
>>                 .u.stp_state = state,
>>         };
>>         int err;
>>
>>         p->state = state;
>>         err = switchdev_port_attr_set(p->dev, &attr);
>> -       if (err && err != -EOPNOTSUPP)
>> +       if (err)
>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>                                 (unsigned int) p->port_no, p->dev->name);
>>  }
>
>Should this part of the patch be moved to patch 6/8 where
>switchdev_deferred_process() is called from del_nbp()?

No, this part relates to the fact that attr_set now does not defer
automagically. So caller must say when to defer. So having this here is
correct.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 0/8] switchdev: change locking
  2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
                   ` (7 preceding siblings ...)
  2015-10-14 17:40 ` [patch net-next v5 8/8] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
@ 2015-10-15 13:10 ` David Miller
  8 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2015-10-15 13:10 UTC (permalink / raw)
  To: jiri
  Cc: netdev, idosch, eladr, sfeldma, f.fainelli, linux,
	vivien.didelot, andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 14 Oct 2015 19:40:47 +0200

> This is something which I'm currently struggling with.
> Callers of attr_set and obj_add/del often hold not only RTNL, but also
> spinlock (bridge). So in that case, the driver implementing the op cannot sleep.
> 
> The way rocker is dealing with this now is just to invoke driver operation
> and go out, without any checking or reporting of the operation status.
> 
> Since it would be nice to at least put a warning in case the operation fails,
> it makes sense to do this in delayed work directly in switchdev core
> instead of implementing this in separate drivers. And that is what this patchset
> is introducing.
> 
> So from now on, the locking of switchdev mod ops is consistent. Caller either
> holds rtnl mutex or in case it does not, caller sets defer flag, telling
> switchdev core to process the op later, in deferred queue.
> 
> Function to force to process switchdev deferred ops can be called by op
> caller in appropriate location, for example after it releases
> spin lock, to force switchdev core to process pending ops.

Series applied, thanks Jiri.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
  2015-10-15  4:34   ` Scott Feldman
@ 2015-10-15 15:21   ` John Fastabend
  2015-10-16  8:23     ` Jiri Pirko
  1 sibling, 1 reply; 20+ messages in thread
From: John Fastabend @ 2015-10-15 15:21 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, David.Laight, stephen

On 15-10-14 10:40 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Caller should know if he can call attr_set directly (when holding RTNL)
> or if he has to defer the att_set processing for later.
> 
> This also allows drivers to sleep inside attr_set and report operation
> status back to switchdev core. Switchdev core then warns if status is
> not ok, instead of silent errors happening in drivers.
> 
> Benefit from newly introduced switchdev deferred ops infrastructure.
> 

A nit but the patch description should note your setting the defer bit
on the bridge set state.

> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_stp.c       |   3 +-
>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>  3 files changed, 46 insertions(+), 66 deletions(-)
> 
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d1c7f90..f7de6f8 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -17,6 +17,7 @@
>  
>  #define SWITCHDEV_F_NO_RECURSE		BIT(0)
>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
> +#define SWITCHDEV_F_DEFER		BIT(2)
>  
>  struct switchdev_trans_item {
>  	struct list_head list;
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index db6d243de..80c34d7 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>  {
>  	struct switchdev_attr attr = {
>  		.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
> +		.flags = SWITCHDEV_F_DEFER,
>  		.u.stp_state = state,
>  	};


This creates a possible race (with 6/8) I think, please check!

In del_nbp() we call br_stp_disable_port() to set the port state
to BR_STATE_DISABLE and disabling learning events. But with this
patch it can be deferred. Also note the STP agent may be in userspace
which actually seems more likely the case because you likely want to
run some more modern variant of STP than the kernel supports.

So at some point in the future the driver will turn off learning. At
the same time we call br_fdb_delete_by_port which calls a deferred
set of fdb deletes.

I don't see how you guarantee learning is off before you start doing
the deletes here and possibly learning new addresses after the software
side believes the port is down.

So

   br_stp_disable_port
                           br_fdb_delete_by_port
                           {fdb_del_external_learn}
   [hw learns a fdb]
   [hw disables learning]

What stops this from happening?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-15 15:21   ` John Fastabend
@ 2015-10-16  8:23     ` Jiri Pirko
  2015-10-16 16:20       ` Scott Feldman
  2015-10-17  2:11       ` John Fastabend
  0 siblings, 2 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-16  8:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, idosch, eladr, sfeldma, f.fainelli, linux,
	vivien.didelot, andrew, David.Laight, stephen

Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@gmail.com wrote:
>On 15-10-14 10:40 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the att_set processing for later.
>> 
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>> 
>> Benefit from newly introduced switchdev deferred ops infrastructure.
>> 
>
>A nit but the patch description should note your setting the defer bit
>on the bridge set state.
>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c       |   3 +-
>>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>  3 files changed, 46 insertions(+), 66 deletions(-)
>> 
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d1c7f90..f7de6f8 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>  
>>  #define SWITCHDEV_F_NO_RECURSE		BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
>> +#define SWITCHDEV_F_DEFER		BIT(2)
>>  
>>  struct switchdev_trans_item {
>>  	struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>>  	struct switchdev_attr attr = {
>>  		.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +		.flags = SWITCHDEV_F_DEFER,
>>  		.u.stp_state = state,
>>  	};
>
>
>This creates a possible race (with 6/8) I think, please check!

Wait. This patch does not change the previous behaviour. Patch 6 does,
so I don't understand why you are asking here. Confusing.


>
>In del_nbp() we call br_stp_disable_port() to set the port state
>to BR_STATE_DISABLE and disabling learning events. But with this
>patch it can be deferred. Also note the STP agent may be in userspace
>which actually seems more likely the case because you likely want to
>run some more modern variant of STP than the kernel supports.
>
>So at some point in the future the driver will turn off learning. At
>the same time we call br_fdb_delete_by_port which calls a deferred
>set of fdb deletes.
>
>I don't see how you guarantee learning is off before you start doing
>the deletes here and possibly learning new addresses after the software
>side believes the port is down.
>
>So
>
>   br_stp_disable_port
>                           br_fdb_delete_by_port
>                           {fdb_del_external_learn}
>   [hw learns a fdb]
>   [hw disables learning]
>
>What stops this from happening?

Okay. This behaviour is the same as without the patchset. What would
resolve the issue it to put switchdev_deferred_process() after
br_stp_disable_port() and before br_fdb_delete_by_port() call.
That would enforce stp change to happen in hw before fdbs are explicitly
deleted. Sound good to you?



>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-16  8:23     ` Jiri Pirko
@ 2015-10-16 16:20       ` Scott Feldman
  2015-10-16 17:16         ` John Fastabend
  2015-10-17  2:11       ` John Fastabend
  1 sibling, 1 reply; 20+ messages in thread
From: Scott Feldman @ 2015-10-16 16:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On Fri, Oct 16, 2015 at 1:23 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@gmail.com wrote:
>>On 15-10-14 10:40 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>> or if he has to defer the att_set processing for later.
>>>
>>> This also allows drivers to sleep inside attr_set and report operation
>>> status back to switchdev core. Switchdev core then warns if status is
>>> not ok, instead of silent errors happening in drivers.
>>>
>>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>>
>>
>>A nit but the patch description should note your setting the defer bit
>>on the bridge set state.
>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/switchdev.h   |   1 +
>>>  net/bridge/br_stp.c       |   3 +-
>>>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>>  3 files changed, 46 insertions(+), 66 deletions(-)
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index d1c7f90..f7de6f8 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,7 @@
>>>
>>>  #define SWITCHDEV_F_NO_RECURSE              BIT(0)
>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>>> +#define SWITCHDEV_F_DEFER           BIT(2)
>>>
>>>  struct switchdev_trans_item {
>>>      struct list_head list;
>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>> index db6d243de..80c34d7 100644
>>> --- a/net/bridge/br_stp.c
>>> +++ b/net/bridge/br_stp.c
>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>  {
>>>      struct switchdev_attr attr = {
>>>              .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>> +            .flags = SWITCHDEV_F_DEFER,
>>>              .u.stp_state = state,
>>>      };
>>
>>
>>This creates a possible race (with 6/8) I think, please check!
>
> Wait. This patch does not change the previous behaviour. Patch 6 does,
> so I don't understand why you are asking here. Confusing.
>
>
>>
>>In del_nbp() we call br_stp_disable_port() to set the port state
>>to BR_STATE_DISABLE and disabling learning events. But with this
>>patch it can be deferred. Also note the STP agent may be in userspace
>>which actually seems more likely the case because you likely want to
>>run some more modern variant of STP than the kernel supports.
>>
>>So at some point in the future the driver will turn off learning. At
>>the same time we call br_fdb_delete_by_port which calls a deferred
>>set of fdb deletes.
>>
>>I don't see how you guarantee learning is off before you start doing
>>the deletes here and possibly learning new addresses after the software
>>side believes the port is down.
>>
>>So
>>
>>   br_stp_disable_port
>>                           br_fdb_delete_by_port
>>                           {fdb_del_external_learn}
>>   [hw learns a fdb]
>>   [hw disables learning]
>>
>>What stops this from happening?
>
> Okay. This behaviour is the same as without the patchset. What would
> resolve the issue it to put switchdev_deferred_process() after
> br_stp_disable_port() and before br_fdb_delete_by_port() call.
> That would enforce stp change to happen in hw before fdbs are explicitly
> deleted. Sound good to you?

Doesn't HW already see things in the right order since items are
dequeued from the deferred list in the order queued?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-16 16:20       ` Scott Feldman
@ 2015-10-16 17:16         ` John Fastabend
  0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2015-10-16 17:16 UTC (permalink / raw)
  To: Scott Feldman, Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-16 09:20 AM, Scott Feldman wrote:
> On Fri, Oct 16, 2015 at 1:23 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@gmail.com wrote:
>>> On 15-10-14 10:40 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>> or if he has to defer the att_set processing for later.
>>>>
>>>> This also allows drivers to sleep inside attr_set and report operation
>>>> status back to switchdev core. Switchdev core then warns if status is
>>>> not ok, instead of silent errors happening in drivers.
>>>>
>>>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>>>
>>>
>>> A nit but the patch description should note your setting the defer bit
>>> on the bridge set state.
>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>  include/net/switchdev.h   |   1 +
>>>>  net/bridge/br_stp.c       |   3 +-
>>>>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>>>  3 files changed, 46 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index d1c7f90..f7de6f8 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,7 @@
>>>>
>>>>  #define SWITCHDEV_F_NO_RECURSE              BIT(0)
>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>>>> +#define SWITCHDEV_F_DEFER           BIT(2)
>>>>
>>>>  struct switchdev_trans_item {
>>>>      struct list_head list;
>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>> index db6d243de..80c34d7 100644
>>>> --- a/net/bridge/br_stp.c
>>>> +++ b/net/bridge/br_stp.c
>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>  {
>>>>      struct switchdev_attr attr = {
>>>>              .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>> +            .flags = SWITCHDEV_F_DEFER,
>>>>              .u.stp_state = state,
>>>>      };
>>>
>>>
>>> This creates a possible race (with 6/8) I think, please check!
>>
>> Wait. This patch does not change the previous behaviour. Patch 6 does,
>> so I don't understand why you are asking here. Confusing.
>>
>>
>>>
>>> In del_nbp() we call br_stp_disable_port() to set the port state
>>> to BR_STATE_DISABLE and disabling learning events. But with this
>>> patch it can be deferred. Also note the STP agent may be in userspace
>>> which actually seems more likely the case because you likely want to
>>> run some more modern variant of STP than the kernel supports.
>>>
>>> So at some point in the future the driver will turn off learning. At
>>> the same time we call br_fdb_delete_by_port which calls a deferred
>>> set of fdb deletes.
>>>
>>> I don't see how you guarantee learning is off before you start doing
>>> the deletes here and possibly learning new addresses after the software
>>> side believes the port is down.
>>>
>>> So
>>>
>>>   br_stp_disable_port
>>>                           br_fdb_delete_by_port
>>>                           {fdb_del_external_learn}
>>>   [hw learns a fdb]
>>>   [hw disables learning]
>>>
>>> What stops this from happening?
>>
>> Okay. This behaviour is the same as without the patchset. What would
>> resolve the issue it to put switchdev_deferred_process() after
>> br_stp_disable_port() and before br_fdb_delete_by_port() call.
>> That would enforce stp change to happen in hw before fdbs are explicitly
>> deleted. Sound good to you?
> 
> Doesn't HW already see things in the right order since items are
> dequeued from the deferred list in the order queued?
> 

I just noticed there is the rtnl lock around the deferred process so
this should mean only a single deferred task is run at a time. OK sort
of a big hammer though.

+static void switchdev_deferred_process_work(struct work_struct *work)
+{
+	rtnl_lock();
+	switchdev_deferred_process();
+	rtnl_unlock();
+}

But I don't see how this block of code works,

        spin_lock_bh(&br->lock);
        br_stp_disable_port(p);
        spin_unlock_bh(&br->lock);

        br_ifinfo_notify(RTM_DELLINK, p);

        list_del_rcu(&p->list);

        nbp_vlan_flush(p);
        br_fdb_delete_by_port(br, p, 0, 1);
        switchdev_deferred_process();


If you defer disabling learning then setup a bunch of deletes also
deferred, what guarantees that learning was disabled before you setup
the deferred fdb delete list?

It seems you may call rocker_port_fdb_learn_work() after the
br_fdb_delete_by_port call above but before the deferred learn disable
call which would have disabled it. Also the rocker_port_fdb_learn_work
queue is on a lw->work and doesn't use the same locking?

Its sort of an interesting hieararchy of work queues being built up
here. I'm being a bit overly paranoid at this point because I've seen
some really ugly bugs around these types of things that are difficult
to spot because they are correctness bugs instead of hard crashes.

.John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-16  8:23     ` Jiri Pirko
  2015-10-16 16:20       ` Scott Feldman
@ 2015-10-17  2:11       ` John Fastabend
  1 sibling, 0 replies; 20+ messages in thread
From: John Fastabend @ 2015-10-17  2:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, sfeldma, f.fainelli, linux,
	vivien.didelot, andrew, David.Laight, stephen

On 15-10-16 01:23 AM, Jiri Pirko wrote:
> Thu, Oct 15, 2015 at 05:21:22PM CEST, john.fastabend@gmail.com wrote:
>> On 15-10-14 10:40 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>> or if he has to defer the att_set processing for later.
>>>
>>> This also allows drivers to sleep inside attr_set and report operation
>>> status back to switchdev core. Switchdev core then warns if status is
>>> not ok, instead of silent errors happening in drivers.
>>>
>>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>>
>>
>> A nit but the patch description should note your setting the defer bit
>> on the bridge set state.
>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/switchdev.h   |   1 +
>>>  net/bridge/br_stp.c       |   3 +-
>>>  net/switchdev/switchdev.c | 108 ++++++++++++++++++----------------------------
>>>  3 files changed, 46 insertions(+), 66 deletions(-)
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index d1c7f90..f7de6f8 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,7 @@
>>>  
>>>  #define SWITCHDEV_F_NO_RECURSE		BIT(0)
>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
>>> +#define SWITCHDEV_F_DEFER		BIT(2)
>>>  
>>>  struct switchdev_trans_item {
>>>  	struct list_head list;
>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>> index db6d243de..80c34d7 100644
>>> --- a/net/bridge/br_stp.c
>>> +++ b/net/bridge/br_stp.c
>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>  {
>>>  	struct switchdev_attr attr = {
>>>  		.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>> +		.flags = SWITCHDEV_F_DEFER,
>>>  		.u.stp_state = state,
>>>  	};
>>
>>
>> This creates a possible race (with 6/8) I think, please check!
> 
> Wait. This patch does not change the previous behaviour. Patch 6 does,
> so I don't understand why you are asking here. Confusing.
> 

Sorry if its confusing I keyed of the addition of the SWITCHDEV_F_DEFER
here.

> 
>>
>> In del_nbp() we call br_stp_disable_port() to set the port state
>> to BR_STATE_DISABLE and disabling learning events. But with this
>> patch it can be deferred. Also note the STP agent may be in userspace
>> which actually seems more likely the case because you likely want to
>> run some more modern variant of STP than the kernel supports.
>>
>> So at some point in the future the driver will turn off learning. At
>> the same time we call br_fdb_delete_by_port which calls a deferred
>> set of fdb deletes.
>>
>> I don't see how you guarantee learning is off before you start doing
>> the deletes here and possibly learning new addresses after the software
>> side believes the port is down.
>>
>> So
>>
>>   br_stp_disable_port
>>                           br_fdb_delete_by_port
>>                           {fdb_del_external_learn}
>>   [hw learns a fdb]
>>   [hw disables learning]
>>
>> What stops this from happening?
> 
> Okay. This behaviour is the same as without the patchset. What would
> resolve the issue it to put switchdev_deferred_process() after
> br_stp_disable_port() and before br_fdb_delete_by_port() call.
> That would enforce stp change to happen in hw before fdbs are explicitly
> deleted. Sound good to you?

OK so putting the switchdev_deferred_process() between the disable_port
and the delete_by_port will enforce the stp change to happen in hw
before the fdbs are explicitly deleted. I think this is minimally
required. I don't like scattering these flush_workqueue() calls all
over the place but I don't have any better ideas right now so sounds
good enough.

But now I'm wondering if you can have a deferred fdb add in the rocker
driver (rocker_port_fdb_learn_work) running in parallel with this that
could happen after the delete and add a bogus fdb entry. I think you
also need to have a flush in rocker_port_stp_update() to handle this
case.

Also I agree these issues were not completely caused by your patches.

Thanks,
John

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del
  2015-10-14 17:40 ` [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
@ 2015-10-19  7:55   ` Scott Feldman
  2015-10-19  8:24     ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: Scott Feldman @ 2015-10-19  7:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Similar to the attr usecase, the caller knows if he is holding RTNL and is
> in atomic section. So let the called to decide the correct call variant.
>
> This allows drivers to sleep inside their ops and wait for hw to get the
> operation status. Then the status is propagated into switchdev core.
> This avoids silent errors in drivers.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/switchdev.h   |   1 +
>  net/switchdev/switchdev.c | 100 ++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 81 insertions(+), 20 deletions(-)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index f8672d7..bc865e2 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -69,6 +69,7 @@ enum switchdev_obj_id {
>
>  struct switchdev_obj {
>         enum switchdev_obj_id id;
> +       u32 flags;
>  };
>
>  /* SWITCHDEV_OBJ_ID_PORT_VLAN */
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 5963d7a..eac68c4 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -362,21 +362,8 @@ static int __switchdev_port_obj_add(struct net_device *dev,
>         return err;
>  }
>
> -/**
> - *     switchdev_port_obj_add - Add port object
> - *
> - *     @dev: port device
> - *     @id: object ID
> - *     @obj: object to add
> - *
> - *     Use a 2-phase prepare-commit transaction model to ensure
> - *     system is not left in a partially updated state due to
> - *     failure from driver/device.
> - *
> - *     rtnl_lock must be held.
> - */
> -int switchdev_port_obj_add(struct net_device *dev,
> -                          const struct switchdev_obj *obj)
> +static int switchdev_port_obj_add_now(struct net_device *dev,
> +                                     const struct switchdev_obj *obj)
>  {
>         struct switchdev_trans trans;
>         int err;
> @@ -418,18 +405,53 @@ int switchdev_port_obj_add(struct net_device *dev,
>
>         return err;
>  }
> -EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
> +
> +static void switchdev_port_obj_add_deferred(struct net_device *dev,
> +                                           const void *data)
> +{
> +       const struct switchdev_obj *obj = data;
> +       int err;
> +
> +       err = switchdev_port_obj_add_now(dev, obj);
> +       if (err && err != -EOPNOTSUPP)
> +               netdev_err(dev, "failed (err=%d) to add object (id=%d)\n",
> +                          err, obj->id);
> +}
> +
> +static int switchdev_port_obj_add_defer(struct net_device *dev,
> +                                       const struct switchdev_obj *obj)
> +{
> +       return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
> +                                         switchdev_port_obj_add_deferred);
> +}

Hi Jiri,

I think there is a problem here with this patch.  When we defer adding
an obj, we're only copying the inner struct switchdev_obj.  The rest
of the containing obj is lost.  So when
switchdev_port_obj_add_deferred() runs, we're up casting to garbage.
So if this FDB obj was deferred, the addr, vid, and ndm_state members
would be garbage:

struct switchdev_obj_port_fdb {
        struct switchdev_obj obj;
        unsigned char addr[ETH_ALEN];
        u16 vid;
        u16 ndm_state;
};

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del
  2015-10-19  7:55   ` Scott Feldman
@ 2015-10-19  8:24     ` Jiri Pirko
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2015-10-19  8:24 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

Mon, Oct 19, 2015 at 09:55:59AM CEST, sfeldma@gmail.com wrote:
>On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Similar to the attr usecase, the caller knows if he is holding RTNL and is
>> in atomic section. So let the called to decide the correct call variant.
>>
>> This allows drivers to sleep inside their ops and wait for hw to get the
>> operation status. Then the status is propagated into switchdev core.
>> This avoids silent errors in drivers.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/switchdev/switchdev.c | 100 ++++++++++++++++++++++++++++++++++++----------
>>  2 files changed, 81 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index f8672d7..bc865e2 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -69,6 +69,7 @@ enum switchdev_obj_id {
>>
>>  struct switchdev_obj {
>>         enum switchdev_obj_id id;
>> +       u32 flags;
>>  };
>>
>>  /* SWITCHDEV_OBJ_ID_PORT_VLAN */
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index 5963d7a..eac68c4 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -362,21 +362,8 @@ static int __switchdev_port_obj_add(struct net_device *dev,
>>         return err;
>>  }
>>
>> -/**
>> - *     switchdev_port_obj_add - Add port object
>> - *
>> - *     @dev: port device
>> - *     @id: object ID
>> - *     @obj: object to add
>> - *
>> - *     Use a 2-phase prepare-commit transaction model to ensure
>> - *     system is not left in a partially updated state due to
>> - *     failure from driver/device.
>> - *
>> - *     rtnl_lock must be held.
>> - */
>> -int switchdev_port_obj_add(struct net_device *dev,
>> -                          const struct switchdev_obj *obj)
>> +static int switchdev_port_obj_add_now(struct net_device *dev,
>> +                                     const struct switchdev_obj *obj)
>>  {
>>         struct switchdev_trans trans;
>>         int err;
>> @@ -418,18 +405,53 @@ int switchdev_port_obj_add(struct net_device *dev,
>>
>>         return err;
>>  }
>> -EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
>> +
>> +static void switchdev_port_obj_add_deferred(struct net_device *dev,
>> +                                           const void *data)
>> +{
>> +       const struct switchdev_obj *obj = data;
>> +       int err;
>> +
>> +       err = switchdev_port_obj_add_now(dev, obj);
>> +       if (err && err != -EOPNOTSUPP)
>> +               netdev_err(dev, "failed (err=%d) to add object (id=%d)\n",
>> +                          err, obj->id);
>> +}
>> +
>> +static int switchdev_port_obj_add_defer(struct net_device *dev,
>> +                                       const struct switchdev_obj *obj)
>> +{
>> +       return switchdev_deferred_enqueue(dev, obj, sizeof(*obj),
>> +                                         switchdev_port_obj_add_deferred);
>> +}
>
>Hi Jiri,
>
>I think there is a problem here with this patch.  When we defer adding
>an obj, we're only copying the inner struct switchdev_obj.  The rest
>of the containing obj is lost.  So when
>switchdev_port_obj_add_deferred() runs, we're up casting to garbage.
>So if this FDB obj was deferred, the addr, vid, and ndm_state members
>would be garbage:

Right, we should pass correct size. I'll fix this in a moment. Thanks

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-10-19  8:24 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-14 17:40 [patch net-next v5 0/8] switchdev: change locking Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 1/8] switchdev: introduce switchdev deferred ops infrastructure Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls Jiri Pirko
2015-10-14 19:01   ` Vivien Didelot
2015-10-14 17:40 ` [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
2015-10-15  4:34   ` Scott Feldman
2015-10-15  5:57     ` Jiri Pirko
2015-10-15 15:21   ` John Fastabend
2015-10-16  8:23     ` Jiri Pirko
2015-10-16 16:20       ` Scott Feldman
2015-10-16 17:16         ` John Fastabend
2015-10-17  2:11       ` John Fastabend
2015-10-14 17:40 ` [patch net-next v5 4/8] switchdev: remove pointers from switchdev objects Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 5/8] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
2015-10-19  7:55   ` Scott Feldman
2015-10-19  8:24     ` Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 6/8] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 7/8] rocker: remove nowait from switchdev callbacks Jiri Pirko
2015-10-14 17:40 ` [patch net-next v5 8/8] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
2015-10-15 13:10 ` [patch net-next v5 0/8] switchdev: change locking David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.