All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next v4 0/7] switchdev: change locking
@ 2015-10-12 17:54 Jiri Pirko
  2015-10-12 17:54 ` [patch net-next v4 1/7] switchdev: introduce switchdev workqueue Jiri Pirko
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
  0 siblings, 2 replies; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 17:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

This is something which I'm currently struggling with.
Callers of attr_set and obj_add/del often hold not only RTNL, but also
spinlock (bridge). So in that case, the driver implementing the op cannot sleep.

The way rocker is dealing with this now is just to invoke driver operation
and go out, without any checking or reporting of the operation status.

Since it would be nice to at least put a warning in case the operation fails,
it makes sense to do this in delayed work directly in switchdev core
instead of implementing this in separate drivers. And that is what this patchset
is introducing.

So from now on, the locking of switchdev mod ops is consistent. Caller either
holds rtnl mutex or in case it does not, caller sets defer flag, telling
switchdev core to process the op later in delayed work.

Flush function for switchdev deferred ops can be called by op
caller in appropriate location, for example after it releases
spin lock, to force switchdev core to process pending ops.

v1->v2:
- rebased on current net-next head (including Scott's ageing patchset)
v2->v3:
- fixed comment s/of/or/ typo suggested by Nik
v3->v4:
- the actual patchset is sent instead of different branch I send in v3 :/

Jiri Pirko (7):
  switchdev: introduce switchdev workqueue
  switchdev: allow caller to explicitly request attr_set as deferred
  switchdev: remove pointers from switchdev objects
  switchdev: introduce possibility to defer obj_add/del
  bridge: defer switchdev fdb del call in fdb_del_external_learn
  rocker: remove nowait from switchdev callbacks.
  switchdev: assert rtnl mutex when going over lower netdevs

 drivers/net/ethernet/rocker/rocker.c |  13 +-
 include/net/switchdev.h              |  14 +-
 net/bridge/br_fdb.c                  |   7 +-
 net/bridge/br_if.c                   |   3 +
 net/bridge/br_stp.c                  |   3 +-
 net/dsa/slave.c                      |   2 +-
 net/switchdev/switchdev.c            | 289 ++++++++++++++++++++++++-----------
 7 files changed, 231 insertions(+), 100 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [patch net-next v4 1/7] switchdev: introduce switchdev workqueue
  2015-10-12 17:54 [patch net-next v4 0/7] switchdev: change locking Jiri Pirko
@ 2015-10-12 17:54 ` Jiri Pirko
  2015-10-13  3:01   ` Scott Feldman
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 17:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

This is going to be used for deferred operations.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |  5 +++++
 net/switchdev/switchdev.c | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 1ce7083..d2879f2 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -205,6 +205,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 void switchdev_port_fwd_mark_set(struct net_device *dev,
 				 struct net_device *group_dev,
 				 bool joining);
+void switchdev_flush_deferred(void);
 
 #else
 
@@ -326,6 +327,10 @@ static inline void switchdev_port_fwd_mark_set(struct net_device *dev,
 {
 }
 
+static inline void switchdev_flush_deferred(void)
+{
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 7a9ab90..d119e9c 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -17,9 +17,12 @@
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
 #include <linux/list.h>
+#include <linux/workqueue.h>
 #include <net/ip_fib.h>
 #include <net/switchdev.h>
 
+static struct workqueue_struct *switchdev_wq;
+
 /**
  *	switchdev_trans_item_enqueue - Enqueue data item to transaction queue
  *
@@ -1217,3 +1220,19 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 	dev->offload_fwd_mark = mark;
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fwd_mark_set);
+
+void switchdev_flush_deferred(void)
+{
+	flush_workqueue(switchdev_wq);
+}
+EXPORT_SYMBOL_GPL(switchdev_flush_deferred);
+
+static int __init switchdev_init(void)
+{
+	switchdev_wq = create_workqueue("switchdev");
+	if (!switchdev_wq)
+		return -ENOMEM;
+	return 0;
+}
+
+subsys_initcall(switchdev_init);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-12 17:54 [patch net-next v4 0/7] switchdev: change locking Jiri Pirko
  2015-10-12 17:54 ` [patch net-next v4 1/7] switchdev: introduce switchdev workqueue Jiri Pirko
@ 2015-10-12 18:03 ` Jiri Pirko
  2015-10-12 18:03   ` [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects Jiri Pirko
                     ` (5 more replies)
  1 sibling, 6 replies; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Caller should know if he can call attr_set directly (when holding RTNL)
or if he has to defer the att_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |   1 +
 net/bridge/br_stp.c       |   3 +-
 net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
 3 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index d2879f2..6b109e4 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -17,6 +17,7 @@
 
 #define SWITCHDEV_F_NO_RECURSE		BIT(0)
 #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
+#define SWITCHDEV_F_DEFER		BIT(2)
 
 struct switchdev_trans_item {
 	struct list_head list;
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index db6d243de..80c34d7 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
 {
 	struct switchdev_attr attr = {
 		.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
+		.flags = SWITCHDEV_F_DEFER,
 		.u.stp_state = state,
 	};
 	int err;
 
 	p->state = state;
 	err = switchdev_port_attr_set(p->dev, &attr);
-	if (err && err != -EOPNOTSUPP)
+	if (err)
 		br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
 				(unsigned int) p->port_no, p->dev->name);
 }
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index d119e9c..9f94272 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -173,6 +173,49 @@ done:
 	return err;
 }
 
+static int switchdev_port_attr_set_now(struct net_device *dev,
+				       struct switchdev_attr *attr)
+{
+	struct switchdev_trans trans;
+	int err;
+
+	switchdev_trans_init(&trans);
+
+	/* Phase I: prepare for attr set. Driver/device should fail
+	 * here if there are going to be issues in the commit phase,
+	 * such as lack of resources or support.  The driver/device
+	 * should reserve resources needed for the commit phase here,
+	 * but should not commit the attr.
+	 */
+
+	trans.ph_prepare = true;
+	err = __switchdev_port_attr_set(dev, attr, &trans);
+	if (err) {
+		/* Prepare phase failed: abort the transaction.  Any
+		 * resources reserved in the prepare phase are
+		 * released.
+		 */
+
+		if (err != -EOPNOTSUPP)
+			switchdev_trans_items_destroy(&trans);
+
+		return err;
+	}
+
+	/* Phase II: commit attr set.  This cannot fail as a fault
+	 * of driver/device.  If it does, it's a bug in the driver/device
+	 * because the driver said everythings was OK in phase I.
+	 */
+
+	trans.ph_prepare = false;
+	err = __switchdev_port_attr_set(dev, attr, &trans);
+	WARN(err, "%s: Commit of attribute (id=%d) failed.\n",
+	     dev->name, attr->id);
+	switchdev_trans_items_warn_destroy(dev, &trans);
+
+	return err;
+}
+
 struct switchdev_attr_set_work {
 	struct work_struct work;
 	struct net_device *dev;
@@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
 {
 	struct switchdev_attr_set_work *asw =
 		container_of(work, struct switchdev_attr_set_work, work);
+	bool rtnl_locked = rtnl_is_locked();
 	int err;
 
-	rtnl_lock();
-	err = switchdev_port_attr_set(asw->dev, &asw->attr);
+	if (!rtnl_locked)
+		rtnl_lock();
+	err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
 	if (err && err != -EOPNOTSUPP)
 		netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
 			   err, asw->attr.id);
-	rtnl_unlock();
+	if (!rtnl_locked)
+		rtnl_unlock();
 
 	dev_put(asw->dev);
 	kfree(work);
@@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
 	asw->dev = dev;
 	memcpy(&asw->attr, attr, sizeof(asw->attr));
 
-	schedule_work(&asw->work);
+	queue_work(switchdev_wq, &asw->work);
 
 	return 0;
 }
@@ -225,57 +271,16 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
  *	Use a 2-phase prepare-commit transaction model to ensure
  *	system is not left in a partially updated state due to
  *	failure from driver/device.
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
  */
 int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr)
 {
-	struct switchdev_trans trans;
-	int err;
-
-	if (!rtnl_is_locked()) {
-		/* Running prepare-commit transaction across stacked
-		 * devices requires nothing moves, so if rtnl_lock is
-		 * not held, schedule a worker thread to hold rtnl_lock
-		 * while setting attr.
-		 */
-
+	if (attr->flags & SWITCHDEV_F_DEFER)
 		return switchdev_port_attr_set_defer(dev, attr);
-	}
-
-	switchdev_trans_init(&trans);
-
-	/* Phase I: prepare for attr set. Driver/device should fail
-	 * here if there are going to be issues in the commit phase,
-	 * such as lack of resources or support.  The driver/device
-	 * should reserve resources needed for the commit phase here,
-	 * but should not commit the attr.
-	 */
-
-	trans.ph_prepare = true;
-	err = __switchdev_port_attr_set(dev, attr, &trans);
-	if (err) {
-		/* Prepare phase failed: abort the transaction.  Any
-		 * resources reserved in the prepare phase are
-		 * released.
-		 */
-
-		if (err != -EOPNOTSUPP)
-			switchdev_trans_items_destroy(&trans);
-
-		return err;
-	}
-
-	/* Phase II: commit attr set.  This cannot fail as a fault
-	 * of driver/device.  If it does, it's a bug in the driver/device
-	 * because the driver said everythings was OK in phase I.
-	 */
-
-	trans.ph_prepare = false;
-	err = __switchdev_port_attr_set(dev, attr, &trans);
-	WARN(err, "%s: Commit of attribute (id=%d) failed.\n",
-	     dev->name, attr->id);
-	switchdev_trans_items_warn_destroy(dev, &trans);
-
-	return err;
+	ASSERT_RTNL();
+	return switchdev_port_attr_set_now(dev, attr);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_attr_set);
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
@ 2015-10-12 18:03   ` Jiri Pirko
  2015-10-13  3:01     ` Scott Feldman
  2015-10-12 18:03   ` [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

When object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may be already
used by someone else. So rather do local copy of the value.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.c |  6 +++---
 include/net/switchdev.h              |  7 +++----
 net/bridge/br_fdb.c                  |  2 +-
 net/dsa/slave.c                      |  2 +-
 net/switchdev/switchdev.c            | 11 +++++++----
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index eafa907..bb956a5 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4469,7 +4469,7 @@ static int rocker_port_obj_add(struct net_device *dev,
 		fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
 		err = rocker_port_fib_ipv4(rocker_port, trans,
 					   htonl(fib4->dst), fib4->dst_len,
-					   fib4->fi, fib4->tb_id, 0);
+					   &fib4->fi, fib4->tb_id, 0);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_port_fdb_add(rocker_port, trans,
@@ -4541,7 +4541,7 @@ static int rocker_port_obj_del(struct net_device *dev,
 		fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
 		err = rocker_port_fib_ipv4(rocker_port, NULL,
 					   htonl(fib4->dst), fib4->dst_len,
-					   fib4->fi, fib4->tb_id,
+					   &fib4->fi, fib4->tb_id,
 					   ROCKER_OP_FLAG_REMOVE);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
@@ -4571,7 +4571,7 @@ static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
 	hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
 		if (found->key.rocker_port != rocker_port)
 			continue;
-		fdb->addr = found->key.addr;
+		ether_addr_copy(fdb->addr, found->key.addr);
 		fdb->ndm_state = NUD_REACHABLE;
 		fdb->vid = rocker_port_vlan_to_vid(rocker_port,
 						   found->key.vlan_id);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 6b109e4..767d516 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -14,6 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/notifier.h>
 #include <linux/list.h>
+#include <net/ip_fib.h>
 
 #define SWITCHDEV_F_NO_RECURSE		BIT(0)
 #define SWITCHDEV_F_SKIP_EOPNOTSUPP	BIT(1)
@@ -59,8 +60,6 @@ struct switchdev_attr {
 	} u;
 };
 
-struct fib_info;
-
 enum switchdev_obj_id {
 	SWITCHDEV_OBJ_ID_UNDEFINED,
 	SWITCHDEV_OBJ_ID_PORT_VLAN,
@@ -88,7 +87,7 @@ struct switchdev_obj_ipv4_fib {
 	struct switchdev_obj obj;
 	u32 dst;
 	int dst_len;
-	struct fib_info *fi;
+	struct fib_info fi;
 	u8 tos;
 	u8 type;
 	u32 nlflags;
@@ -101,7 +100,7 @@ struct switchdev_obj_ipv4_fib {
 /* SWITCHDEV_OBJ_ID_PORT_FDB */
 struct switchdev_obj_port_fdb {
 	struct switchdev_obj obj;
-	const unsigned char *addr;
+	unsigned char addr[ETH_ALEN];
 	u16 vid;
 	u16 ndm_state;
 };
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f43ce05..f5e7da0 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -135,10 +135,10 @@ static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = f->addr.addr,
 		.vid = f->vlan_id,
 	};
 
+	ether_addr_copy(fdb.addr, f->addr.addr);
 	switchdev_port_obj_del(f->dst->dev, &fdb.obj);
 }
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bb2bd3b..2ad4e0e 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -393,7 +393,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
 		if (ret < 0)
 			break;
 
-		fdb->addr = addr;
+		ether_addr_copy(fdb->addr, addr);
 		fdb->vid = vid;
 		fdb->ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 9f94272..c23dd31 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <linux/etherdevice.h>
 #include <linux/if_bridge.h>
 #include <linux/list.h>
 #include <linux/workqueue.h>
@@ -837,10 +838,10 @@ int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = addr,
 		.vid = vid,
 	};
 
+	ether_addr_copy(fdb.addr, addr);
 	return switchdev_port_obj_add(dev, &fdb.obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_add);
@@ -862,10 +863,10 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 {
 	struct switchdev_obj_port_fdb fdb = {
 		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-		.addr = addr,
 		.vid = vid,
 	};
 
+	ether_addr_copy(fdb.addr, addr);
 	return switchdev_port_obj_del(dev, &fdb.obj);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_del);
@@ -1027,7 +1028,6 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
 		.dst = dst,
 		.dst_len = dst_len,
-		.fi = fi,
 		.tos = tos,
 		.type = type,
 		.nlflags = nlflags,
@@ -1036,6 +1036,8 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	struct net_device *dev;
 	int err = 0;
 
+	memcpy(&ipv4_fib.fi, fi, sizeof(ipv4_fib.fi));
+
 	/* Don't offload route if using custom ip rules or if
 	 * IPv4 FIB offloading has been disabled completely.
 	 */
@@ -1079,7 +1081,6 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
 		.dst = dst,
 		.dst_len = dst_len,
-		.fi = fi,
 		.tos = tos,
 		.type = type,
 		.nlflags = 0,
@@ -1088,6 +1089,8 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 	struct net_device *dev;
 	int err = 0;
 
+	memcpy(&ipv4_fib.fi, fi, sizeof(ipv4_fib.fi));
+
 	if (!(fi->fib_flags & RTNH_F_OFFLOAD))
 		return 0;
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
  2015-10-12 18:03   ` [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects Jiri Pirko
@ 2015-10-12 18:03   ` Jiri Pirko
  2015-10-13  3:08     ` Scott Feldman
  2015-10-12 18:03   ` [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Similar to the attr usecase, the caller knows if he is holding RTNL and is
in atomic section. So let the called to decide the correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/switchdev.h   |   1 +
 net/switchdev/switchdev.c | 137 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 112 insertions(+), 26 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 767d516..14e2595 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -69,6 +69,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
 	enum switchdev_obj_id id;
+	u32 flags;
 };
 
 /* SWITCHDEV_OBJ_ID_PORT_VLAN */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index c23dd31..00f7363 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -311,21 +311,8 @@ static int __switchdev_port_obj_add(struct net_device *dev,
 	return err;
 }
 
-/**
- *	switchdev_port_obj_add - Add port object
- *
- *	@dev: port device
- *	@id: object ID
- *	@obj: object to add
- *
- *	Use a 2-phase prepare-commit transaction model to ensure
- *	system is not left in a partially updated state due to
- *	failure from driver/device.
- *
- *	rtnl_lock must be held.
- */
-int switchdev_port_obj_add(struct net_device *dev,
-			   const struct switchdev_obj *obj)
+static int switchdev_port_obj_add_now(struct net_device *dev,
+				      const struct switchdev_obj *obj)
 {
 	struct switchdev_trans trans;
 	int err;
@@ -367,17 +354,9 @@ int switchdev_port_obj_add(struct net_device *dev,
 
 	return err;
 }
-EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
 
-/**
- *	switchdev_port_obj_del - Delete port object
- *
- *	@dev: port device
- *	@id: object ID
- *	@obj: object to delete
- */
-int switchdev_port_obj_del(struct net_device *dev,
-			   const struct switchdev_obj *obj)
+static int switchdev_port_obj_del_now(struct net_device *dev,
+				      const struct switchdev_obj *obj)
 {
 	const struct switchdev_ops *ops = dev->switchdev_ops;
 	struct net_device *lower_dev;
@@ -393,13 +372,119 @@ int switchdev_port_obj_del(struct net_device *dev,
 	 */
 
 	netdev_for_each_lower_dev(dev, lower_dev, iter) {
-		err = switchdev_port_obj_del(lower_dev, obj);
+		err = switchdev_port_obj_del_now(lower_dev, obj);
 		if (err)
 			break;
 	}
 
 	return err;
 }
+
+struct switchdev_obj_work {
+	struct work_struct work;
+	struct net_device *dev;
+	struct switchdev_obj obj;
+	bool add; /* add or del */
+};
+
+static void switchdev_port_obj_work(struct work_struct *work)
+{
+	struct switchdev_obj_work *ow =
+			container_of(work, struct switchdev_obj_work, work);
+	bool rtnl_locked = rtnl_is_locked();
+	int err;
+
+	if (!rtnl_locked)
+		rtnl_lock();
+	if (ow->add)
+		err = switchdev_port_obj_add_now(ow->dev, &ow->obj);
+	else
+		err = switchdev_port_obj_del_now(ow->dev, &ow->obj);
+	if (err && err != -EOPNOTSUPP)
+		netdev_err(ow->dev, "failed (err=%d) to %s object (id=%d)\n",
+			   err, ow->add ? "add" : "del", ow->obj.id);
+	if (!rtnl_locked)
+		rtnl_unlock();
+
+	dev_put(ow->dev);
+	kfree(ow);
+}
+
+static int switchdev_port_obj_work_schedule(struct net_device *dev,
+					    const struct switchdev_obj *obj,
+					    bool add)
+{
+	struct switchdev_obj_work *ow;
+
+	ow = kmalloc(sizeof(*ow), GFP_ATOMIC);
+	if (!ow)
+		return -ENOMEM;
+
+	INIT_WORK(&ow->work, switchdev_port_obj_work);
+
+	dev_hold(dev);
+	ow->dev = dev;
+	memcpy(&ow->obj, obj, sizeof(ow->obj));
+	ow->add = add;
+
+	queue_work(switchdev_wq, &ow->work);
+	return 0;
+}
+
+static int switchdev_port_obj_add_defer(struct net_device *dev,
+					const struct switchdev_obj *obj)
+{
+	return switchdev_port_obj_work_schedule(dev, obj, true);
+}
+
+/**
+ *	switchdev_port_obj_add - Add port object
+ *
+ *	@dev: port device
+ *	@id: object ID
+ *	@obj: object to add
+ *
+ *	Use a 2-phase prepare-commit transaction model to ensure
+ *	system is not left in a partially updated state due to
+ *	failure from driver/device.
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
+ */
+int switchdev_port_obj_add(struct net_device *dev,
+			   const struct switchdev_obj *obj)
+{
+	if (obj->flags & SWITCHDEV_F_DEFER)
+		return switchdev_port_obj_add_defer(dev, obj);
+	ASSERT_RTNL();
+	return switchdev_port_obj_add_now(dev, obj);
+}
+EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
+
+static int switchdev_port_obj_del_defer(struct net_device *dev,
+					const struct switchdev_obj *obj)
+{
+	return switchdev_port_obj_work_schedule(dev, obj, false);
+}
+
+/**
+ *	switchdev_port_obj_del - Delete port object
+ *
+ *	@dev: port device
+ *	@id: object ID
+ *	@obj: object to delete
+ *
+ *	rtnl_lock must be held and must not be in atomic section,
+ *	in case SWITCHDEV_F_DEFER flag is not set.
+ */
+int switchdev_port_obj_del(struct net_device *dev,
+			   const struct switchdev_obj *obj)
+{
+	if (obj->flags & SWITCHDEV_F_DEFER)
+		return switchdev_port_obj_del_defer(dev, obj);
+	ASSERT_RTNL();
+	return switchdev_port_obj_del_now(dev, obj);
+}
 EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
 
 /**
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
  2015-10-12 18:03   ` [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects Jiri Pirko
  2015-10-12 18:03   ` [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
@ 2015-10-12 18:03   ` Jiri Pirko
  2015-10-13  3:28     ` Scott Feldman
  2015-10-12 18:03   ` [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks Jiri Pirko
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

Since spinlock is held here, defer the switchdev operation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/bridge/br_fdb.c | 5 ++++-
 net/bridge/br_if.c  | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index f5e7da0..c88bd8e 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
 {
 	struct switchdev_obj_port_fdb fdb = {
-		.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
+		.obj = {
+			.id = SWITCHDEV_OBJ_ID_PORT_FDB,
+			.flags = SWITCHDEV_F_DEFER,
+		},
 		.vid = f->vlan_id,
 	};
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 934cae9..09147cb 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -24,6 +24,7 @@
 #include <linux/slab.h>
 #include <net/sock.h>
 #include <linux/if_vlan.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 
@@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
 	list_del_rcu(&p->list);
 
 	br_fdb_delete_by_port(br, p, 0, 1);
+	switchdev_flush_deferred();
+
 	nbp_update_port_count(br);
 
 	netdev_upper_dev_unlink(dev, br->dev);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
                     ` (2 preceding siblings ...)
  2015-10-12 18:03   ` [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
@ 2015-10-12 18:03   ` Jiri Pirko
  2015-10-13  4:02     ` Scott Feldman
  2015-10-12 18:03   ` [patch net-next v4 7/7] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
  2015-10-13  2:52   ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Scott Feldman
  5 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

No need to avoid sleeping in switchdev callbacks now, as the switchdev
core allows it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index bb956a5..9629c5b5 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
 	    rocker_port->stp_state == BR_STATE_FORWARDING)
 		return 0;
 
-	flags |= ROCKER_OP_FLAG_REMOVE;
+	flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
 
 	spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
 
@@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev,
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
-		err = rocker_port_stp_update(rocker_port, trans,
-					     ROCKER_OP_FLAG_NOWAIT,
+		err = rocker_port_stp_update(rocker_port, trans, 0,
 					     attr->u.stp_state);
 		break;
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
@@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port,
 			       const struct switchdev_obj_port_fdb *fdb)
 {
 	__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
-	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
+	int flags = ROCKER_OP_FLAG_REMOVE;
 
 	if (!rocker_port_is_bridged(rocker_port))
 		return -EINVAL;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [patch net-next v4 7/7] switchdev: assert rtnl mutex when going over lower netdevs
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
                     ` (3 preceding siblings ...)
  2015-10-12 18:03   ` [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks Jiri Pirko
@ 2015-10-12 18:03   ` Jiri Pirko
  2015-10-13  2:52   ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Scott Feldman
  5 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2015-10-12 18:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, sfeldma, f.fainelli, linux, vivien.didelot,
	andrew, john.fastabend, David.Laight, stephen

From: Jiri Pirko <jiri@mellanox.com>

netdev_for_each_lower_dev has to be called with rtnl mutex held. So
better enforce it in switchdev functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/switchdev/switchdev.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 00f7363..619c2dc 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -494,6 +494,8 @@ EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
  *	@id: object ID
  *	@obj: object to dump
  *	@cb: function to call with a filled object
+ *
+ *	rtnl_lock must be held.
  */
 int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj,
 			    switchdev_obj_dump_cb_t *cb)
@@ -503,6 +505,8 @@ int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj,
 	struct list_head *iter;
 	int err = -EOPNOTSUPP;
 
+	ASSERT_RTNL();
+
 	if (ops && ops->switchdev_port_obj_dump)
 		return ops->switchdev_port_obj_dump(dev, obj, cb);
 
@@ -1068,6 +1072,8 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
 	struct net_device *dev = NULL;
 	int nhsel;
 
+	ASSERT_RTNL();
+
 	/* For this route, all nexthop devs must be on the same switch. */
 
 	for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
@@ -1298,10 +1304,11 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
 	u32 mark = dev->ifindex;
 	u32 reset_mark = 0;
 
-	if (group_dev && joining) {
-		mark = switchdev_port_fwd_mark_get(dev, group_dev);
-	} else if (group_dev && !joining) {
-		if (dev->offload_fwd_mark == mark)
+	if (group_dev) {
+		ASSERT_RTNL();
+		if (joining)
+			mark = switchdev_port_fwd_mark_get(dev, group_dev);
+		else if (dev->offload_fwd_mark == mark)
 			/* Ohoh, this port was the mark reference port,
 			 * but it's leaving the group, so reset the
 			 * mark for the remaining ports in the group.
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
                     ` (4 preceding siblings ...)
  2015-10-12 18:03   ` [patch net-next v4 7/7] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
@ 2015-10-13  2:52   ` Scott Feldman
  2015-10-13  4:40     ` John Fastabend
  2015-10-13  5:44     ` Jiri Pirko
  5 siblings, 2 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  2:52 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Caller should know if he can call attr_set directly (when holding RTNL)
> or if he has to defer the att_set processing for later.
>
> This also allows drivers to sleep inside attr_set and report operation
> status back to switchdev core. Switchdev core then warns if status is
> not ok, instead of silent errors happening in drivers.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_stp.c       |   3 +-
>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>  3 files changed, 59 insertions(+), 52 deletions(-)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d2879f2..6b109e4 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -17,6 +17,7 @@
>
>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
> +#define SWITCHDEV_F_DEFER              BIT(2)
>
>  struct switchdev_trans_item {
>         struct list_head list;
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index db6d243de..80c34d7 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>  {
>         struct switchdev_attr attr = {
>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
> +               .flags = SWITCHDEV_F_DEFER,
>                 .u.stp_state = state,
>         };
>         int err;
>
>         p->state = state;
>         err = switchdev_port_attr_set(p->dev, &attr);
> -       if (err && err != -EOPNOTSUPP)
> +       if (err)

This looks like a problem as now all other non-switchdev ports will
get an WARN in the log when STP state changes.  We should only WARN if
there was an err and the err is not -EOPNOTSUPP.

>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>                                 (unsigned int) p->port_no, p->dev->name);
>  }

<snip>

>  struct switchdev_attr_set_work {
>         struct work_struct work;
>         struct net_device *dev;
> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>  {
>         struct switchdev_attr_set_work *asw =
>                 container_of(work, struct switchdev_attr_set_work, work);
> +       bool rtnl_locked = rtnl_is_locked();
>         int err;
>
> -       rtnl_lock();
> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
> +       if (!rtnl_locked)
> +               rtnl_lock();

I'm not following this change.  If someone else has rtnl_lock, we'll
not wait to grab it here ourselves, and proceed as if we have the
lock.  But what if that someone else releases the lock in the middle
of us doing switchdev_port_attr_set_now?  Seems we want to
unconditionally wait and grab the lock.  We need to block anything
from moving while we do the attr set.

> +       err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
>         if (err && err != -EOPNOTSUPP)
>                 netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
>                            err, asw->attr.id);
> -       rtnl_unlock();
> +       if (!rtnl_locked)
> +               rtnl_unlock();
>
>         dev_put(asw->dev);
>         kfree(work);
> @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
>         asw->dev = dev;
>         memcpy(&asw->attr, attr, sizeof(asw->attr));
>
> -       schedule_work(&asw->work);
> +       queue_work(switchdev_wq, &asw->work);
>
>         return 0;
>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 1/7] switchdev: introduce switchdev workqueue
  2015-10-12 17:54 ` [patch net-next v4 1/7] switchdev: introduce switchdev workqueue Jiri Pirko
@ 2015-10-13  3:01   ` Scott Feldman
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  3:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 10:54 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> This is going to be used for deferred operations.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Acked-by: Scott Feldman <sfeldma@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects
  2015-10-12 18:03   ` [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects Jiri Pirko
@ 2015-10-13  3:01     ` Scott Feldman
  2015-10-13  4:44       ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  3:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> When object is used in deferred work, we cannot use pointers in
> switchdev object structures because the memory they point at may be already
> used by someone else. So rather do local copy of the value.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Acked-by: Scott Feldman <sfeldma@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del
  2015-10-12 18:03   ` [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
@ 2015-10-13  3:08     ` Scott Feldman
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  3:08 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Similar to the attr usecase, the caller knows if he is holding RTNL and is
> in atomic section. So let the called to decide the correct call variant.
>
> This allows drivers to sleep inside their ops and wait for hw to get the
> operation status. Then the status is propagated into switchdev core.
> This avoids silent errors in drivers.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

<snip>

> +static void switchdev_port_obj_work(struct work_struct *work)
> +{
> +       struct switchdev_obj_work *ow =
> +                       container_of(work, struct switchdev_obj_work, work);
> +       bool rtnl_locked = rtnl_is_locked();
> +       int err;
> +
> +       if (!rtnl_locked)
> +               rtnl_lock();

Same comment as on patch 2/7 about not unconditionally grabbing rtnl_lock.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-12 18:03   ` [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
@ 2015-10-13  3:28     ` Scott Feldman
  2015-10-13  3:31       ` John Fastabend
  2015-10-13  6:05       ` Jiri Pirko
  0 siblings, 2 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  3:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> Since spinlock is held here, defer the switchdev operation.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  net/bridge/br_fdb.c | 5 ++++-
>  net/bridge/br_if.c  | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index f5e7da0..c88bd8e 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>  {
>         struct switchdev_obj_port_fdb fdb = {
> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
> +               .obj = {
> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
> +                       .flags = SWITCHDEV_F_DEFER,
> +               },
>                 .vid = f->vlan_id,
>         };
>
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 934cae9..09147cb 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -24,6 +24,7 @@
>  #include <linux/slab.h>
>  #include <net/sock.h>
>  #include <linux/if_vlan.h>
> +#include <net/switchdev.h>
>
>  #include "br_private.h"
>
> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>         list_del_rcu(&p->list);
>
>         br_fdb_delete_by_port(br, p, 0, 1);
> +       switchdev_flush_deferred();
> +

This potentially flushes other (valid) work on the deferred queue not
related to FDB del.

I wonder if this flush step is necessary at all?  The work we deferred
to delete the FDB entry can still happen after the port has been
removed (del_nbp).  If the port driver/device find the FDB entry, then
delete it, otherwise ignore it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-13  3:28     ` Scott Feldman
@ 2015-10-13  3:31       ` John Fastabend
  2015-10-13  4:19         ` Scott Feldman
  2015-10-13  6:05       ` Jiri Pirko
  1 sibling, 1 reply; 32+ messages in thread
From: John Fastabend @ 2015-10-13  3:31 UTC (permalink / raw)
  To: Scott Feldman, Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 08:28 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Since spinlock is held here, defer the switchdev operation.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  net/bridge/br_fdb.c | 5 ++++-
>>  net/bridge/br_if.c  | 3 +++
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index f5e7da0..c88bd8e 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>  {
>>         struct switchdev_obj_port_fdb fdb = {
>> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +               .obj = {
>> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +                       .flags = SWITCHDEV_F_DEFER,
>> +               },
>>                 .vid = f->vlan_id,
>>         };
>>
>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>> index 934cae9..09147cb 100644
>> --- a/net/bridge/br_if.c
>> +++ b/net/bridge/br_if.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/slab.h>
>>  #include <net/sock.h>
>>  #include <linux/if_vlan.h>
>> +#include <net/switchdev.h>
>>
>>  #include "br_private.h"
>>
>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>         list_del_rcu(&p->list);
>>
>>         br_fdb_delete_by_port(br, p, 0, 1);
>> +       switchdev_flush_deferred();
>> +
> 
> This potentially flushes other (valid) work on the deferred queue not
> related to FDB del.
> 
> I wonder if this flush step is necessary at all?  The work we deferred
> to delete the FDB entry can still happen after the port has been
> removed (del_nbp).  If the port driver/device find the FDB entry, then
> delete it, otherwise ignore it.
> 

Just the first thing that springs to mind reading this comment is,

  - del gets deffered
  - add fdb
  - del runs

Is there an issue here? Sorry I'll do a more thorough review now just
thought I would toss it out there before I forget.

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
  2015-10-12 18:03   ` [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks Jiri Pirko
@ 2015-10-13  4:02     ` Scott Feldman
  2015-10-13  6:25       ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  4:02 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> From: Jiri Pirko <jiri@mellanox.com>
>
> No need to avoid sleeping in switchdev callbacks now, as the switchdev
> core allows it.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  drivers/net/ethernet/rocker/rocker.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index bb956a5..9629c5b5 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
>             rocker_port->stp_state == BR_STATE_FORWARDING)
>                 return 0;
>
> -       flags |= ROCKER_OP_FLAG_REMOVE;
> +       flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;

I understand the two changes below where you're removing NOWAIT, but
here you're adding NOWAIT which I'm not sure how that is related to
the switchdev core changes.  Is this two patches?


>         spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
>
> @@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev,
>
>         switch (attr->id) {
>         case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
> -               err = rocker_port_stp_update(rocker_port, trans,
> -                                            ROCKER_OP_FLAG_NOWAIT,
> +               err = rocker_port_stp_update(rocker_port, trans, 0,
>                                              attr->u.stp_state);
>                 break;
>         case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
> @@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port,
>                                const struct switchdev_obj_port_fdb *fdb)
>  {
>         __be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
> -       int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
> +       int flags = ROCKER_OP_FLAG_REMOVE;
>
>         if (!rocker_port_is_bridged(rocker_port))
>                 return -EINVAL;
> --
> 1.9.3
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-13  3:31       ` John Fastabend
@ 2015-10-13  4:19         ` Scott Feldman
  2015-10-13  5:16           ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  4:19 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On Mon, Oct 12, 2015 at 8:31 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 15-10-12 08:28 PM, Scott Feldman wrote:
>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Since spinlock is held here, defer the switchdev operation.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  net/bridge/br_fdb.c | 5 ++++-
>>>  net/bridge/br_if.c  | 3 +++
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>> index f5e7da0..c88bd8e 100644
>>> --- a/net/bridge/br_fdb.c
>>> +++ b/net/bridge/br_fdb.c
>>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>>  {
>>>         struct switchdev_obj_port_fdb fdb = {
>>> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +               .obj = {
>>> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +                       .flags = SWITCHDEV_F_DEFER,
>>> +               },
>>>                 .vid = f->vlan_id,
>>>         };
>>>
>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>> index 934cae9..09147cb 100644
>>> --- a/net/bridge/br_if.c
>>> +++ b/net/bridge/br_if.c
>>> @@ -24,6 +24,7 @@
>>>  #include <linux/slab.h>
>>>  #include <net/sock.h>
>>>  #include <linux/if_vlan.h>
>>> +#include <net/switchdev.h>
>>>
>>>  #include "br_private.h"
>>>
>>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>>         list_del_rcu(&p->list);
>>>
>>>         br_fdb_delete_by_port(br, p, 0, 1);
>>> +       switchdev_flush_deferred();
>>> +
>>
>> This potentially flushes other (valid) work on the deferred queue not
>> related to FDB del.
>>
>> I wonder if this flush step is necessary at all?  The work we deferred
>> to delete the FDB entry can still happen after the port has been
>> removed (del_nbp).  If the port driver/device find the FDB entry, then
>> delete it, otherwise ignore it.
>>
>
> Just the first thing that springs to mind reading this comment is,
>
>   - del gets deffered
>   - add fdb
>   - del runs
>
> Is there an issue here? Sorry I'll do a more thorough review now just
> thought I would toss it out there before I forget.

It's a valid thought to consider, for sure.  The context is these are
only FDB entries added by an external learn event.  So I believe in
your sequence, the second step to add fdb entry wouldn't happen as the
fdb entry already exists at that point (in other words, the entry has
already been learned on external device and pushed up via notifier to
bridge).  So I think we're OK in regards to your question.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  2:52   ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Scott Feldman
@ 2015-10-13  4:40     ` John Fastabend
  2015-10-13  5:45       ` Jiri Pirko
  2015-10-13  5:44     ` Jiri Pirko
  1 sibling, 1 reply; 32+ messages in thread
From: John Fastabend @ 2015-10-13  4:40 UTC (permalink / raw)
  To: Scott Feldman, Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 07:52 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the att_set processing for later.
>>
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c       |   3 +-
>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d2879f2..6b109e4 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>
>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>
>>  struct switchdev_trans_item {
>>         struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>>         struct switchdev_attr attr = {
>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +               .flags = SWITCHDEV_F_DEFER,
>>                 .u.stp_state = state,
>>         };
>>         int err;
>>
>>         p->state = state;
>>         err = switchdev_port_attr_set(p->dev, &attr);
>> -       if (err && err != -EOPNOTSUPP)
>> +       if (err)
> 
> This looks like a problem as now all other non-switchdev ports will
> get an WARN in the log when STP state changes.  We should only WARN if
> there was an err and the err is not -EOPNOTSUPP.
> 
>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>                                 (unsigned int) p->port_no, p->dev->name);
>>  }
> 
> <snip>
> 
>>  struct switchdev_attr_set_work {
>>         struct work_struct work;
>>         struct net_device *dev;
>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>  {
>>         struct switchdev_attr_set_work *asw =
>>                 container_of(work, struct switchdev_attr_set_work, work);
>> +       bool rtnl_locked = rtnl_is_locked();
>>         int err;
>>
>> -       rtnl_lock();
>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>> +       if (!rtnl_locked)
>> +               rtnl_lock();
> 
> I'm not following this change.  If someone else has rtnl_lock, we'll
> not wait to grab it here ourselves, and proceed as if we have the
> lock.  But what if that someone else releases the lock in the middle
> of us doing switchdev_port_attr_set_now?  Seems we want to
> unconditionally wait and grab the lock.  We need to block anything
> from moving while we do the attr set.
> 

Also an additional race between setting rtnl_locked and the if stmt
and then grabbing the lock. There seems to be a something of pattern
around this where other subsystems use a rtnl_trylock and if it fails
do a restart/re-queue operation to retry. Looks like how you handle
it in the team driver at least.

>> +       err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
>>         if (err && err != -EOPNOTSUPP)
>>                 netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
>>                            err, asw->attr.id);
>> -       rtnl_unlock();
>> +       if (!rtnl_locked)
>> +               rtnl_unlock();
>>
>>         dev_put(asw->dev);
>>         kfree(work);
>> @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
>>         asw->dev = dev;
>>         memcpy(&asw->attr, attr, sizeof(asw->attr));
>>
>> -       schedule_work(&asw->work);
>> +       queue_work(switchdev_wq, &asw->work);
>>
>>         return 0;
>>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects
  2015-10-13  3:01     ` Scott Feldman
@ 2015-10-13  4:44       ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2015-10-13  4:44 UTC (permalink / raw)
  To: Scott Feldman, Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 08:01 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> When object is used in deferred work, we cannot use pointers in
>> switchdev object structures because the memory they point at may be already
>> used by someone else. So rather do local copy of the value.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> 
> Acked-by: Scott Feldman <sfeldma@gmail.com>
> 

also fwiw

Reviewed-by: John Fastabend <john.r.fastabend@intel.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-13  4:19         ` Scott Feldman
@ 2015-10-13  5:16           ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2015-10-13  5:16 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 09:19 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 8:31 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 15-10-12 08:28 PM, Scott Feldman wrote:
>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Since spinlock is held here, defer the switchdev operation.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>  net/bridge/br_fdb.c | 5 ++++-
>>>>  net/bridge/br_if.c  | 3 +++
>>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>> index f5e7da0..c88bd8e 100644
>>>> --- a/net/bridge/br_fdb.c
>>>> +++ b/net/bridge/br_fdb.c
>>>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>>>  {
>>>>         struct switchdev_obj_port_fdb fdb = {
>>>> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>>> +               .obj = {
>>>> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>>> +                       .flags = SWITCHDEV_F_DEFER,
>>>> +               },
>>>>                 .vid = f->vlan_id,
>>>>         };
>>>>
>>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>>> index 934cae9..09147cb 100644
>>>> --- a/net/bridge/br_if.c
>>>> +++ b/net/bridge/br_if.c
>>>> @@ -24,6 +24,7 @@
>>>>  #include <linux/slab.h>
>>>>  #include <net/sock.h>
>>>>  #include <linux/if_vlan.h>
>>>> +#include <net/switchdev.h>
>>>>
>>>>  #include "br_private.h"
>>>>
>>>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>>>         list_del_rcu(&p->list);
>>>>
>>>>         br_fdb_delete_by_port(br, p, 0, 1);
>>>> +       switchdev_flush_deferred();
>>>> +
>>>
>>> This potentially flushes other (valid) work on the deferred queue not
>>> related to FDB del.
>>>
>>> I wonder if this flush step is necessary at all?  The work we deferred
>>> to delete the FDB entry can still happen after the port has been
>>> removed (del_nbp).  If the port driver/device find the FDB entry, then
>>> delete it, otherwise ignore it.
>>>
>>
>> Just the first thing that springs to mind reading this comment is,
>>
>>   - del gets deffered
>>   - add fdb
>>   - del runs
>>
>> Is there an issue here? Sorry I'll do a more thorough review now just
>> thought I would toss it out there before I forget.
> 
> It's a valid thought to consider, for sure.  The context is these are
> only FDB entries added by an external learn event.  So I believe in
> your sequence, the second step to add fdb entry wouldn't happen as the
> fdb entry already exists at that point (in other words, the entry has
> already been learned on external device and pushed up via notifier to
> bridge).  So I think we're OK in regards to your question.
> 

ah I see so the take away is we need to be very careful about who/what
sets the deferred bit or you might get yourself in a world of hurt.

Here you are just ensuring you get all the fdb addr's out of the device.
Seems OK to me just be sure you don't try to set the deferred bit on
the attributes setting the state to DISABLED so we don't get a race
there.

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  2:52   ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Scott Feldman
  2015-10-13  4:40     ` John Fastabend
@ 2015-10-13  5:44     ` Jiri Pirko
  2015-10-13  6:03       ` John Fastabend
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  5:44 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the att_set processing for later.
>>
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c       |   3 +-
>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d2879f2..6b109e4 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>
>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>
>>  struct switchdev_trans_item {
>>         struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>>         struct switchdev_attr attr = {
>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +               .flags = SWITCHDEV_F_DEFER,
>>                 .u.stp_state = state,
>>         };
>>         int err;
>>
>>         p->state = state;
>>         err = switchdev_port_attr_set(p->dev, &attr);
>> -       if (err && err != -EOPNOTSUPP)
>> +       if (err)
>
>This looks like a problem as now all other non-switchdev ports will
>get an WARN in the log when STP state changes.  We should only WARN if
>there was an err and the err is not -EOPNOTSUPP.

If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.


>
>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>                                 (unsigned int) p->port_no, p->dev->name);
>>  }
>
><snip>
>
>>  struct switchdev_attr_set_work {
>>         struct work_struct work;
>>         struct net_device *dev;
>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>  {
>>         struct switchdev_attr_set_work *asw =
>>                 container_of(work, struct switchdev_attr_set_work, work);
>> +       bool rtnl_locked = rtnl_is_locked();
>>         int err;
>>
>> -       rtnl_lock();
>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>> +       if (!rtnl_locked)
>> +               rtnl_lock();
>
>I'm not following this change.  If someone else has rtnl_lock, we'll
>not wait to grab it here ourselves, and proceed as if we have the
>lock.  But what if that someone else releases the lock in the middle
>of us doing switchdev_port_attr_set_now?  Seems we want to
>unconditionally wait and grab the lock.  We need to block anything
>from moving while we do the attr set.

Why would someone we call (driver) return the lock? In that case, he is
buggy and should be fixed.

This hunk only ensures we have rtnl_lock. If not, we take it here. We do
not take it unconditionally because we may already have it, for example
if caller of switchdev_flush_deferred holds rtnl_lock.



>
>> +       err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
>>         if (err && err != -EOPNOTSUPP)
>>                 netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
>>                            err, asw->attr.id);
>> -       rtnl_unlock();
>> +       if (!rtnl_locked)
>> +               rtnl_unlock();
>>
>>         dev_put(asw->dev);
>>         kfree(work);
>> @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
>>         asw->dev = dev;
>>         memcpy(&asw->attr, attr, sizeof(asw->attr));
>>
>> -       schedule_work(&asw->work);
>> +       queue_work(switchdev_wq, &asw->work);
>>
>>         return 0;
>>  }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  4:40     ` John Fastabend
@ 2015-10-13  5:45       ` Jiri Pirko
  2015-10-13  5:48         ` John Fastabend
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  5:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: Scott Feldman, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

Tue, Oct 13, 2015 at 06:40:25AM CEST, john.fastabend@gmail.com wrote:
>On 15-10-12 07:52 PM, Scott Feldman wrote:
>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>> or if he has to defer the att_set processing for later.
>>>
>>> This also allows drivers to sleep inside attr_set and report operation
>>> status back to switchdev core. Switchdev core then warns if status is
>>> not ok, instead of silent errors happening in drivers.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/switchdev.h   |   1 +
>>>  net/bridge/br_stp.c       |   3 +-
>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index d2879f2..6b109e4 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,7 @@
>>>
>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>
>>>  struct switchdev_trans_item {
>>>         struct list_head list;
>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>> index db6d243de..80c34d7 100644
>>> --- a/net/bridge/br_stp.c
>>> +++ b/net/bridge/br_stp.c
>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>  {
>>>         struct switchdev_attr attr = {
>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>> +               .flags = SWITCHDEV_F_DEFER,
>>>                 .u.stp_state = state,
>>>         };
>>>         int err;
>>>
>>>         p->state = state;
>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>> -       if (err && err != -EOPNOTSUPP)
>>> +       if (err)
>> 
>> This looks like a problem as now all other non-switchdev ports will
>> get an WARN in the log when STP state changes.  We should only WARN if
>> there was an err and the err is not -EOPNOTSUPP.
>> 
>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>  }
>> 
>> <snip>
>> 
>>>  struct switchdev_attr_set_work {
>>>         struct work_struct work;
>>>         struct net_device *dev;
>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>  {
>>>         struct switchdev_attr_set_work *asw =
>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>> +       bool rtnl_locked = rtnl_is_locked();
>>>         int err;
>>>
>>> -       rtnl_lock();
>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>> +       if (!rtnl_locked)
>>> +               rtnl_lock();
>> 
>> I'm not following this change.  If someone else has rtnl_lock, we'll
>> not wait to grab it here ourselves, and proceed as if we have the
>> lock.  But what if that someone else releases the lock in the middle
>> of us doing switchdev_port_attr_set_now?  Seems we want to
>> unconditionally wait and grab the lock.  We need to block anything
>> from moving while we do the attr set.
>> 
>
>Also an additional race between setting rtnl_locked and the if stmt
>and then grabbing the lock. There seems to be a something of pattern
>around this where other subsystems use a rtnl_trylock and if it fails
>do a restart/re-queue operation to retry. Looks like how you handle
>it in the team driver at least.

No, this is for different case. This is for case someone calls
switchdev_flush_defererd holding the rtnl_lock.


>
>>> +       err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
>>>         if (err && err != -EOPNOTSUPP)
>>>                 netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
>>>                            err, asw->attr.id);
>>> -       rtnl_unlock();
>>> +       if (!rtnl_locked)
>>> +               rtnl_unlock();
>>>
>>>         dev_put(asw->dev);
>>>         kfree(work);
>>> @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
>>>         asw->dev = dev;
>>>         memcpy(&asw->attr, attr, sizeof(asw->attr));
>>>
>>> -       schedule_work(&asw->work);
>>> +       queue_work(switchdev_wq, &asw->work);
>>>
>>>         return 0;
>>>  }
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  5:45       ` Jiri Pirko
@ 2015-10-13  5:48         ` John Fastabend
  0 siblings, 0 replies; 32+ messages in thread
From: John Fastabend @ 2015-10-13  5:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Scott Feldman, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 10:45 PM, Jiri Pirko wrote:
> Tue, Oct 13, 2015 at 06:40:25AM CEST, john.fastabend@gmail.com wrote:
>> On 15-10-12 07:52 PM, Scott Feldman wrote:
>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>> or if he has to defer the att_set processing for later.
>>>>
>>>> This also allows drivers to sleep inside attr_set and report operation
>>>> status back to switchdev core. Switchdev core then warns if status is
>>>> not ok, instead of silent errors happening in drivers.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>  include/net/switchdev.h   |   1 +
>>>>  net/bridge/br_stp.c       |   3 +-
>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index d2879f2..6b109e4 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,7 @@
>>>>
>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>
>>>>  struct switchdev_trans_item {
>>>>         struct list_head list;
>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>> index db6d243de..80c34d7 100644
>>>> --- a/net/bridge/br_stp.c
>>>> +++ b/net/bridge/br_stp.c
>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>  {
>>>>         struct switchdev_attr attr = {
>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>                 .u.stp_state = state,
>>>>         };
>>>>         int err;
>>>>
>>>>         p->state = state;
>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>> -       if (err && err != -EOPNOTSUPP)
>>>> +       if (err)
>>>
>>> This looks like a problem as now all other non-switchdev ports will
>>> get an WARN in the log when STP state changes.  We should only WARN if
>>> there was an err and the err is not -EOPNOTSUPP.
>>>
>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>  }
>>>
>>> <snip>
>>>
>>>>  struct switchdev_attr_set_work {
>>>>         struct work_struct work;
>>>>         struct net_device *dev;
>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>  {
>>>>         struct switchdev_attr_set_work *asw =
>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>         int err;
>>>>
>>>> -       rtnl_lock();
>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>> +       if (!rtnl_locked)
>>>> +               rtnl_lock();
>>>
>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>> not wait to grab it here ourselves, and proceed as if we have the
>>> lock.  But what if that someone else releases the lock in the middle
>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>> unconditionally wait and grab the lock.  We need to block anything
>>> from moving while we do the attr set.
>>>
>>
>> Also an additional race between setting rtnl_locked and the if stmt
>> and then grabbing the lock. There seems to be a something of pattern
>> around this where other subsystems use a rtnl_trylock and if it fails
>> do a restart/re-queue operation to retry. Looks like how you handle
>> it in the team driver at least.
> 
> No, this is for different case. This is for case someone calls
> switchdev_flush_defererd holding the rtnl_lock.
> 

OK rather than funky if stmt could you just do a rtnl_trylock() and
put a comment explaining the reasoning?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  5:44     ` Jiri Pirko
@ 2015-10-13  6:03       ` John Fastabend
  2015-10-13  6:21         ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: John Fastabend @ 2015-10-13  6:03 UTC (permalink / raw)
  To: Jiri Pirko, Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On 15-10-12 10:44 PM, Jiri Pirko wrote:
> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>> or if he has to defer the att_set processing for later.
>>>
>>> This also allows drivers to sleep inside attr_set and report operation
>>> status back to switchdev core. Switchdev core then warns if status is
>>> not ok, instead of silent errors happening in drivers.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  include/net/switchdev.h   |   1 +
>>>  net/bridge/br_stp.c       |   3 +-
>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>> index d2879f2..6b109e4 100644
>>> --- a/include/net/switchdev.h
>>> +++ b/include/net/switchdev.h
>>> @@ -17,6 +17,7 @@
>>>
>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>
>>>  struct switchdev_trans_item {
>>>         struct list_head list;
>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>> index db6d243de..80c34d7 100644
>>> --- a/net/bridge/br_stp.c
>>> +++ b/net/bridge/br_stp.c
>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>  {
>>>         struct switchdev_attr attr = {
>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>> +               .flags = SWITCHDEV_F_DEFER,
>>>                 .u.stp_state = state,
>>>         };
>>>         int err;
>>>
>>>         p->state = state;
>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>> -       if (err && err != -EOPNOTSUPP)
>>> +       if (err)
>>
>> This looks like a problem as now all other non-switchdev ports will
>> get an WARN in the log when STP state changes.  We should only WARN if
>> there was an err and the err is not -EOPNOTSUPP.
> 
> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
> 
> 
>>
>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>  }
>>
>> <snip>
>>
>>>  struct switchdev_attr_set_work {
>>>         struct work_struct work;
>>>         struct net_device *dev;
>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>  {
>>>         struct switchdev_attr_set_work *asw =
>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>> +       bool rtnl_locked = rtnl_is_locked();
>>>         int err;
>>>
>>> -       rtnl_lock();
>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>> +       if (!rtnl_locked)
>>> +               rtnl_lock();
>>
>> I'm not following this change.  If someone else has rtnl_lock, we'll
>> not wait to grab it here ourselves, and proceed as if we have the
>> lock.  But what if that someone else releases the lock in the middle
>> of us doing switchdev_port_attr_set_now?  Seems we want to
>> unconditionally wait and grab the lock.  We need to block anything
>>from moving while we do the attr set.
> 
> Why would someone we call (driver) return the lock? In that case, he is
> buggy and should be fixed.
> 
> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
> not take it unconditionally because we may already have it, for example
> if caller of switchdev_flush_deferred holds rtnl_lock.
> 

This is where you lost me. How do you know another core doesn't happen
to have the lock when you hit this code path? Maybe someone is running
an ethtool command on another core or something.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-13  3:28     ` Scott Feldman
  2015-10-13  3:31       ` John Fastabend
@ 2015-10-13  6:05       ` Jiri Pirko
  2015-10-13  6:46         ` Scott Feldman
  1 sibling, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  6:05 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

Tue, Oct 13, 2015 at 05:28:20AM CEST, sfeldma@gmail.com wrote:
>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> Since spinlock is held here, defer the switchdev operation.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  net/bridge/br_fdb.c | 5 ++++-
>>  net/bridge/br_if.c  | 3 +++
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index f5e7da0..c88bd8e 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>  {
>>         struct switchdev_obj_port_fdb fdb = {
>> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +               .obj = {
>> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +                       .flags = SWITCHDEV_F_DEFER,
>> +               },
>>                 .vid = f->vlan_id,
>>         };
>>
>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>> index 934cae9..09147cb 100644
>> --- a/net/bridge/br_if.c
>> +++ b/net/bridge/br_if.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/slab.h>
>>  #include <net/sock.h>
>>  #include <linux/if_vlan.h>
>> +#include <net/switchdev.h>
>>
>>  #include "br_private.h"
>>
>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>         list_del_rcu(&p->list);
>>
>>         br_fdb_delete_by_port(br, p, 0, 1);
>> +       switchdev_flush_deferred();
>> +
>
>This potentially flushes other (valid) work on the deferred queue not
>related to FDB del.

flush_deferred is implemented by flush_workqueue. This enforces the
workqueue to be processed heve and sleeps until it is done. Nothing is
lost.


>
>I wonder if this flush step is necessary at all?  The work we deferred
>to delete the FDB entry can still happen after the port has been
>removed (del_nbp).  If the port driver/device find the FDB entry, then
>delete it, otherwise ignore it.

This step is necessary. Consider following scenario:

spin_lock
switchdev_schedule_deferred (workx)
spin_unlock
netdev_upper_dev_unlink
workx is executed-> driver op is executed.

But during netdev_upper_dev_unlink, resources may have been removed, so
the workx after that would fail (this happens in case of rocker).

So I added switchdev_flush_deferred so the caller may force to safely
process the pending work in right time, for example after he released
the spinlock (after br_fdb_delete_by_port in this case).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  6:03       ` John Fastabend
@ 2015-10-13  6:21         ` Jiri Pirko
  2015-10-13  6:53           ` Scott Feldman
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  6:21 UTC (permalink / raw)
  To: John Fastabend
  Cc: Scott Feldman, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastabend@gmail.com wrote:
>On 15-10-12 10:44 PM, Jiri Pirko wrote:
>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>> or if he has to defer the att_set processing for later.
>>>>
>>>> This also allows drivers to sleep inside attr_set and report operation
>>>> status back to switchdev core. Switchdev core then warns if status is
>>>> not ok, instead of silent errors happening in drivers.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>>>>  include/net/switchdev.h   |   1 +
>>>>  net/bridge/br_stp.c       |   3 +-
>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>
>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>> index d2879f2..6b109e4 100644
>>>> --- a/include/net/switchdev.h
>>>> +++ b/include/net/switchdev.h
>>>> @@ -17,6 +17,7 @@
>>>>
>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>
>>>>  struct switchdev_trans_item {
>>>>         struct list_head list;
>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>> index db6d243de..80c34d7 100644
>>>> --- a/net/bridge/br_stp.c
>>>> +++ b/net/bridge/br_stp.c
>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>  {
>>>>         struct switchdev_attr attr = {
>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>                 .u.stp_state = state,
>>>>         };
>>>>         int err;
>>>>
>>>>         p->state = state;
>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>> -       if (err && err != -EOPNOTSUPP)
>>>> +       if (err)
>>>
>>> This looks like a problem as now all other non-switchdev ports will
>>> get an WARN in the log when STP state changes.  We should only WARN if
>>> there was an err and the err is not -EOPNOTSUPP.
>> 
>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
>> 
>> 
>>>
>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>  }
>>>
>>> <snip>
>>>
>>>>  struct switchdev_attr_set_work {
>>>>         struct work_struct work;
>>>>         struct net_device *dev;
>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>  {
>>>>         struct switchdev_attr_set_work *asw =
>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>         int err;
>>>>
>>>> -       rtnl_lock();
>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>> +       if (!rtnl_locked)
>>>> +               rtnl_lock();
>>>
>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>> not wait to grab it here ourselves, and proceed as if we have the
>>> lock.  But what if that someone else releases the lock in the middle
>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>> unconditionally wait and grab the lock.  We need to block anything
>>>from moving while we do the attr set.
>> 
>> Why would someone we call (driver) return the lock? In that case, he is
>> buggy and should be fixed.
>> 
>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
>> not take it unconditionally because we may already have it, for example
>> if caller of switchdev_flush_deferred holds rtnl_lock.
>> 
>
>This is where you lost me. How do you know another core doesn't happen
>to have the lock when you hit this code path? Maybe someone is running
>an ethtool command on another core or something.

You are right. The same problem exists currently in switchdev_port_attr_set.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
  2015-10-13  4:02     ` Scott Feldman
@ 2015-10-13  6:25       ` Jiri Pirko
  2015-10-13  6:55         ` Scott Feldman
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  6:25 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

Tue, Oct 13, 2015 at 06:02:28AM CEST, sfeldma@gmail.com wrote:
>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>>
>> No need to avoid sleeping in switchdev callbacks now, as the switchdev
>> core allows it.
>>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  drivers/net/ethernet/rocker/rocker.c | 7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>> index bb956a5..9629c5b5 100644
>> --- a/drivers/net/ethernet/rocker/rocker.c
>> +++ b/drivers/net/ethernet/rocker/rocker.c
>> @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
>>             rocker_port->stp_state == BR_STATE_FORWARDING)
>>                 return 0;
>>
>> -       flags |= ROCKER_OP_FLAG_REMOVE;
>> +       flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
>
>I understand the two changes below where you're removing NOWAIT, but
>here you're adding NOWAIT which I'm not sure how that is related to
>the switchdev core changes.  Is this two patches?

I removed ROCKER_OP_FLAG_NOWAIT from attr_set. But here in
rocker_port_fdb_flush, which is called from attr_set, we call
rocker_port_fdb_learn with spin lock. Therefore I had to put
ROCKER_OP_FLAG_NOWAIT here. Before ROCKER_OP_FLAG_NOWAIT removal from
attr_set this was there already.


>
>
>>         spin_lock_irqsave(&rocker->fdb_tbl_lock, lock_flags);
>>
>> @@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev,
>>
>>         switch (attr->id) {
>>         case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
>> -               err = rocker_port_stp_update(rocker_port, trans,
>> -                                            ROCKER_OP_FLAG_NOWAIT,
>> +               err = rocker_port_stp_update(rocker_port, trans, 0,
>>                                              attr->u.stp_state);
>>                 break;
>>         case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
>> @@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port,
>>                                const struct switchdev_obj_port_fdb *fdb)
>>  {
>>         __be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
>> -       int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
>> +       int flags = ROCKER_OP_FLAG_REMOVE;
>>
>>         if (!rocker_port_is_bridged(rocker_port))
>>                 return -EINVAL;
>> --
>> 1.9.3
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn
  2015-10-13  6:05       ` Jiri Pirko
@ 2015-10-13  6:46         ` Scott Feldman
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  6:46 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:05 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Oct 13, 2015 at 05:28:20AM CEST, sfeldma@gmail.com wrote:
>>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Since spinlock is held here, defer the switchdev operation.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  net/bridge/br_fdb.c | 5 ++++-
>>>  net/bridge/br_if.c  | 3 +++
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>> index f5e7da0..c88bd8e 100644
>>> --- a/net/bridge/br_fdb.c
>>> +++ b/net/bridge/br_fdb.c
>>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>>  {
>>>         struct switchdev_obj_port_fdb fdb = {
>>> -               .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +               .obj = {
>>> +                       .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +                       .flags = SWITCHDEV_F_DEFER,
>>> +               },
>>>                 .vid = f->vlan_id,
>>>         };
>>>
>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>> index 934cae9..09147cb 100644
>>> --- a/net/bridge/br_if.c
>>> +++ b/net/bridge/br_if.c
>>> @@ -24,6 +24,7 @@
>>>  #include <linux/slab.h>
>>>  #include <net/sock.h>
>>>  #include <linux/if_vlan.h>
>>> +#include <net/switchdev.h>
>>>
>>>  #include "br_private.h"
>>>
>>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>>         list_del_rcu(&p->list);
>>>
>>>         br_fdb_delete_by_port(br, p, 0, 1);
>>> +       switchdev_flush_deferred();
>>> +
>>
>>This potentially flushes other (valid) work on the deferred queue not
>>related to FDB del.
>
> flush_deferred is implemented by flush_workqueue. This enforces the
> workqueue to be processed heve and sleeps until it is done. Nothing is
> lost.

My bad, I saw "flush" and assumed everything went down the toilet.

Acked-by: Scott Feldman <sfeldma@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  6:21         ` Jiri Pirko
@ 2015-10-13  6:53           ` Scott Feldman
  2015-10-13  7:30             ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  6:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastabend@gmail.com wrote:
>>On 15-10-12 10:44 PM, Jiri Pirko wrote:
>>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>>> or if he has to defer the att_set processing for later.
>>>>>
>>>>> This also allows drivers to sleep inside attr_set and report operation
>>>>> status back to switchdev core. Switchdev core then warns if status is
>>>>> not ok, instead of silent errors happening in drivers.
>>>>>
>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>> ---
>>>>>  include/net/switchdev.h   |   1 +
>>>>>  net/bridge/br_stp.c       |   3 +-
>>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>>
>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>> index d2879f2..6b109e4 100644
>>>>> --- a/include/net/switchdev.h
>>>>> +++ b/include/net/switchdev.h
>>>>> @@ -17,6 +17,7 @@
>>>>>
>>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>>
>>>>>  struct switchdev_trans_item {
>>>>>         struct list_head list;
>>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>>> index db6d243de..80c34d7 100644
>>>>> --- a/net/bridge/br_stp.c
>>>>> +++ b/net/bridge/br_stp.c
>>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>>  {
>>>>>         struct switchdev_attr attr = {
>>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>>                 .u.stp_state = state,
>>>>>         };
>>>>>         int err;
>>>>>
>>>>>         p->state = state;
>>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>>> -       if (err && err != -EOPNOTSUPP)
>>>>> +       if (err)
>>>>
>>>> This looks like a problem as now all other non-switchdev ports will
>>>> get an WARN in the log when STP state changes.  We should only WARN if
>>>> there was an err and the err is not -EOPNOTSUPP.
>>>
>>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
>>>
>>>
>>>>
>>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>>  }
>>>>
>>>> <snip>
>>>>
>>>>>  struct switchdev_attr_set_work {
>>>>>         struct work_struct work;
>>>>>         struct net_device *dev;
>>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>>  {
>>>>>         struct switchdev_attr_set_work *asw =
>>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>>         int err;
>>>>>
>>>>> -       rtnl_lock();
>>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>>> +       if (!rtnl_locked)
>>>>> +               rtnl_lock();
>>>>
>>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>>> not wait to grab it here ourselves, and proceed as if we have the
>>>> lock.  But what if that someone else releases the lock in the middle
>>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>>> unconditionally wait and grab the lock.  We need to block anything
>>>>from moving while we do the attr set.
>>>
>>> Why would someone we call (driver) return the lock? In that case, he is
>>> buggy and should be fixed.
>>>
>>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
>>> not take it unconditionally because we may already have it, for example
>>> if caller of switchdev_flush_deferred holds rtnl_lock.
>>>
>>
>>This is where you lost me. How do you know another core doesn't happen
>>to have the lock when you hit this code path? Maybe someone is running
>>an ethtool command on another core or something.
>
> You are right. The same problem exists currently in switchdev_port_attr_set.

You are right as in you'll change this back to unconditional grabbing
of rtnl_lock?  I don't follow how this problem currently exists as
current code does an unconditional grab of rtnl_lock.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.
  2015-10-13  6:25       ` Jiri Pirko
@ 2015-10-13  6:55         ` Scott Feldman
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Feldman @ 2015-10-13  6:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	john fastabend, David Laight, stephen

On Mon, Oct 12, 2015 at 11:25 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Oct 13, 2015 at 06:02:28AM CEST, sfeldma@gmail.com wrote:
>>On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> No need to avoid sleeping in switchdev callbacks now, as the switchdev
>>> core allows it.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---
>>>  drivers/net/ethernet/rocker/rocker.c | 7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
>>> index bb956a5..9629c5b5 100644
>>> --- a/drivers/net/ethernet/rocker/rocker.c
>>> +++ b/drivers/net/ethernet/rocker/rocker.c
>>> @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
>>>             rocker_port->stp_state == BR_STATE_FORWARDING)
>>>                 return 0;
>>>
>>> -       flags |= ROCKER_OP_FLAG_REMOVE;
>>> +       flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
>>
>>I understand the two changes below where you're removing NOWAIT, but
>>here you're adding NOWAIT which I'm not sure how that is related to
>>the switchdev core changes.  Is this two patches?
>
> I removed ROCKER_OP_FLAG_NOWAIT from attr_set. But here in
> rocker_port_fdb_flush, which is called from attr_set, we call
> rocker_port_fdb_learn with spin lock. Therefore I had to put
> ROCKER_OP_FLAG_NOWAIT here. Before ROCKER_OP_FLAG_NOWAIT removal from
> attr_set this was there already.

Gotcha.

Acked-by: Scott Feldman <sfeldma@gmail.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  6:53           ` Scott Feldman
@ 2015-10-13  7:30             ` Jiri Pirko
  2015-10-13 14:07               ` Scott Feldman
  0 siblings, 1 reply; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13  7:30 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

Tue, Oct 13, 2015 at 08:53:45AM CEST, sfeldma@gmail.com wrote:
>On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastabend@gmail.com wrote:
>>>On 15-10-12 10:44 PM, Jiri Pirko wrote:
>>>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>>>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>
>>>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>>>> or if he has to defer the att_set processing for later.
>>>>>>
>>>>>> This also allows drivers to sleep inside attr_set and report operation
>>>>>> status back to switchdev core. Switchdev core then warns if status is
>>>>>> not ok, instead of silent errors happening in drivers.
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>>> ---
>>>>>>  include/net/switchdev.h   |   1 +
>>>>>>  net/bridge/br_stp.c       |   3 +-
>>>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>>>
>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>> index d2879f2..6b109e4 100644
>>>>>> --- a/include/net/switchdev.h
>>>>>> +++ b/include/net/switchdev.h
>>>>>> @@ -17,6 +17,7 @@
>>>>>>
>>>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>>>
>>>>>>  struct switchdev_trans_item {
>>>>>>         struct list_head list;
>>>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>>>> index db6d243de..80c34d7 100644
>>>>>> --- a/net/bridge/br_stp.c
>>>>>> +++ b/net/bridge/br_stp.c
>>>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>>>  {
>>>>>>         struct switchdev_attr attr = {
>>>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>>>                 .u.stp_state = state,
>>>>>>         };
>>>>>>         int err;
>>>>>>
>>>>>>         p->state = state;
>>>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>>>> -       if (err && err != -EOPNOTSUPP)
>>>>>> +       if (err)
>>>>>
>>>>> This looks like a problem as now all other non-switchdev ports will
>>>>> get an WARN in the log when STP state changes.  We should only WARN if
>>>>> there was an err and the err is not -EOPNOTSUPP.
>>>>
>>>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
>>>>
>>>>
>>>>>
>>>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>>>  }
>>>>>
>>>>> <snip>
>>>>>
>>>>>>  struct switchdev_attr_set_work {
>>>>>>         struct work_struct work;
>>>>>>         struct net_device *dev;
>>>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>>>  {
>>>>>>         struct switchdev_attr_set_work *asw =
>>>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>>>         int err;
>>>>>>
>>>>>> -       rtnl_lock();
>>>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>>>> +       if (!rtnl_locked)
>>>>>> +               rtnl_lock();
>>>>>
>>>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>>>> not wait to grab it here ourselves, and proceed as if we have the
>>>>> lock.  But what if that someone else releases the lock in the middle
>>>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>>>> unconditionally wait and grab the lock.  We need to block anything
>>>>>from moving while we do the attr set.
>>>>
>>>> Why would someone we call (driver) return the lock? In that case, he is
>>>> buggy and should be fixed.
>>>>
>>>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
>>>> not take it unconditionally because we may already have it, for example
>>>> if caller of switchdev_flush_deferred holds rtnl_lock.
>>>>
>>>
>>>This is where you lost me. How do you know another core doesn't happen
>>>to have the lock when you hit this code path? Maybe someone is running
>>>an ethtool command on another core or something.
>>
>> You are right. The same problem exists currently in switchdev_port_attr_set.
>
>You are right as in you'll change this back to unconditional grabbing
>of rtnl_lock?  I don't follow how this problem currently exists as
>current code does an unconditional grab of rtnl_lock.

  cpu1				cpu2
  				rtnl_lock()
switchdev_port_attr_set
  !rtnl_is_locked() == false
  switchdev_trans_init
 				rtnl_unlock()
  __switchdev_port_attr_set

now __switchdev_port_attr_set is called without rtnl_lock.

Would make sense to introduce rtnl_is_locked_by_me() or something.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13  7:30             ` Jiri Pirko
@ 2015-10-13 14:07               ` Scott Feldman
  2015-10-13 14:39                 ` Jiri Pirko
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Feldman @ 2015-10-13 14:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

On Tue, Oct 13, 2015 at 12:30 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Oct 13, 2015 at 08:53:45AM CEST, sfeldma@gmail.com wrote:
>>On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastabend@gmail.com wrote:
>>>>On 15-10-12 10:44 PM, Jiri Pirko wrote:
>>>>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>>>>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>>
>>>>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>>>>> or if he has to defer the att_set processing for later.
>>>>>>>
>>>>>>> This also allows drivers to sleep inside attr_set and report operation
>>>>>>> status back to switchdev core. Switchdev core then warns if status is
>>>>>>> not ok, instead of silent errors happening in drivers.
>>>>>>>
>>>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>>>> ---
>>>>>>>  include/net/switchdev.h   |   1 +
>>>>>>>  net/bridge/br_stp.c       |   3 +-
>>>>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>> index d2879f2..6b109e4 100644
>>>>>>> --- a/include/net/switchdev.h
>>>>>>> +++ b/include/net/switchdev.h
>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>
>>>>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>>>>
>>>>>>>  struct switchdev_trans_item {
>>>>>>>         struct list_head list;
>>>>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>>>>> index db6d243de..80c34d7 100644
>>>>>>> --- a/net/bridge/br_stp.c
>>>>>>> +++ b/net/bridge/br_stp.c
>>>>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>>>>  {
>>>>>>>         struct switchdev_attr attr = {
>>>>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>>>>                 .u.stp_state = state,
>>>>>>>         };
>>>>>>>         int err;
>>>>>>>
>>>>>>>         p->state = state;
>>>>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>>>>> -       if (err && err != -EOPNOTSUPP)
>>>>>>> +       if (err)
>>>>>>
>>>>>> This looks like a problem as now all other non-switchdev ports will
>>>>>> get an WARN in the log when STP state changes.  We should only WARN if
>>>>>> there was an err and the err is not -EOPNOTSUPP.
>>>>>
>>>>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
>>>>>
>>>>>
>>>>>>
>>>>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>>>>  }
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>>  struct switchdev_attr_set_work {
>>>>>>>         struct work_struct work;
>>>>>>>         struct net_device *dev;
>>>>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>>>>  {
>>>>>>>         struct switchdev_attr_set_work *asw =
>>>>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>>>>         int err;
>>>>>>>
>>>>>>> -       rtnl_lock();
>>>>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>>>>> +       if (!rtnl_locked)
>>>>>>> +               rtnl_lock();
>>>>>>
>>>>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>>>>> not wait to grab it here ourselves, and proceed as if we have the
>>>>>> lock.  But what if that someone else releases the lock in the middle
>>>>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>>>>> unconditionally wait and grab the lock.  We need to block anything
>>>>>>from moving while we do the attr set.
>>>>>
>>>>> Why would someone we call (driver) return the lock? In that case, he is
>>>>> buggy and should be fixed.
>>>>>
>>>>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
>>>>> not take it unconditionally because we may already have it, for example
>>>>> if caller of switchdev_flush_deferred holds rtnl_lock.
>>>>>
>>>>
>>>>This is where you lost me. How do you know another core doesn't happen
>>>>to have the lock when you hit this code path? Maybe someone is running
>>>>an ethtool command on another core or something.
>>>
>>> You are right. The same problem exists currently in switchdev_port_attr_set.
>>
>>You are right as in you'll change this back to unconditional grabbing
>>of rtnl_lock?  I don't follow how this problem currently exists as
>>current code does an unconditional grab of rtnl_lock.
>
>   cpu1                          cpu2
>                                 rtnl_lock()
> switchdev_port_attr_set
>   !rtnl_is_locked() == false
>   switchdev_trans_init
>                                 rtnl_unlock()
>   __switchdev_port_attr_set
>
> now __switchdev_port_attr_set is called without rtnl_lock.

Got it.  Another example of trying to guess context and getting it
wrong.  This is why I like your DEFERRED option so caller can be
explicit.

> Would make sense to introduce rtnl_is_locked_by_me() or something.

Is it sufficient to simply call rtnl_lock() in your deferred context?
You can sleep there and that way there is no question who has the
lock.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred
  2015-10-13 14:07               ` Scott Feldman
@ 2015-10-13 14:39                 ` Jiri Pirko
  0 siblings, 0 replies; 32+ messages in thread
From: Jiri Pirko @ 2015-10-13 14:39 UTC (permalink / raw)
  To: Scott Feldman
  Cc: John Fastabend, Netdev, David S. Miller, Ido Schimmel, Elad Raz,
	Florian Fainelli, Guenter Roeck, Vivien Didelot, andrew,
	David Laight, stephen

Tue, Oct 13, 2015 at 04:07:31PM CEST, sfeldma@gmail.com wrote:
>On Tue, Oct 13, 2015 at 12:30 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Oct 13, 2015 at 08:53:45AM CEST, sfeldma@gmail.com wrote:
>>>On Mon, Oct 12, 2015 at 11:21 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Tue, Oct 13, 2015 at 08:03:46AM CEST, john.fastabend@gmail.com wrote:
>>>>>On 15-10-12 10:44 PM, Jiri Pirko wrote:
>>>>>> Tue, Oct 13, 2015 at 04:52:42AM CEST, sfeldma@gmail.com wrote:
>>>>>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>>>
>>>>>>>> Caller should know if he can call attr_set directly (when holding RTNL)
>>>>>>>> or if he has to defer the att_set processing for later.
>>>>>>>>
>>>>>>>> This also allows drivers to sleep inside attr_set and report operation
>>>>>>>> status back to switchdev core. Switchdev core then warns if status is
>>>>>>>> not ok, instead of silent errors happening in drivers.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>>>>> ---
>>>>>>>>  include/net/switchdev.h   |   1 +
>>>>>>>>  net/bridge/br_stp.c       |   3 +-
>>>>>>>>  net/switchdev/switchdev.c | 107 ++++++++++++++++++++++++----------------------
>>>>>>>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>>>>>>>> index d2879f2..6b109e4 100644
>>>>>>>> --- a/include/net/switchdev.h
>>>>>>>> +++ b/include/net/switchdev.h
>>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>>
>>>>>>>>  #define SWITCHDEV_F_NO_RECURSE         BIT(0)
>>>>>>>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP    BIT(1)
>>>>>>>> +#define SWITCHDEV_F_DEFER              BIT(2)
>>>>>>>>
>>>>>>>>  struct switchdev_trans_item {
>>>>>>>>         struct list_head list;
>>>>>>>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>>>>>>>> index db6d243de..80c34d7 100644
>>>>>>>> --- a/net/bridge/br_stp.c
>>>>>>>> +++ b/net/bridge/br_stp.c
>>>>>>>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>>>>>>>  {
>>>>>>>>         struct switchdev_attr attr = {
>>>>>>>>                 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>>>>>>>> +               .flags = SWITCHDEV_F_DEFER,
>>>>>>>>                 .u.stp_state = state,
>>>>>>>>         };
>>>>>>>>         int err;
>>>>>>>>
>>>>>>>>         p->state = state;
>>>>>>>>         err = switchdev_port_attr_set(p->dev, &attr);
>>>>>>>> -       if (err && err != -EOPNOTSUPP)
>>>>>>>> +       if (err)
>>>>>>>
>>>>>>> This looks like a problem as now all other non-switchdev ports will
>>>>>>> get an WARN in the log when STP state changes.  We should only WARN if
>>>>>>> there was an err and the err is not -EOPNOTSUPP.
>>>>>>
>>>>>> If SWITCHDEV_F_DEFER flag is set, there's only 0 of -ENOMEM.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>                 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>>>>>>>>                                 (unsigned int) p->port_no, p->dev->name);
>>>>>>>>  }
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>>  struct switchdev_attr_set_work {
>>>>>>>>         struct work_struct work;
>>>>>>>>         struct net_device *dev;
>>>>>>>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>>>>>>>  {
>>>>>>>>         struct switchdev_attr_set_work *asw =
>>>>>>>>                 container_of(work, struct switchdev_attr_set_work, work);
>>>>>>>> +       bool rtnl_locked = rtnl_is_locked();
>>>>>>>>         int err;
>>>>>>>>
>>>>>>>> -       rtnl_lock();
>>>>>>>> -       err = switchdev_port_attr_set(asw->dev, &asw->attr);
>>>>>>>> +       if (!rtnl_locked)
>>>>>>>> +               rtnl_lock();
>>>>>>>
>>>>>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>>>>>> not wait to grab it here ourselves, and proceed as if we have the
>>>>>>> lock.  But what if that someone else releases the lock in the middle
>>>>>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>>>>>> unconditionally wait and grab the lock.  We need to block anything
>>>>>>>from moving while we do the attr set.
>>>>>>
>>>>>> Why would someone we call (driver) return the lock? In that case, he is
>>>>>> buggy and should be fixed.
>>>>>>
>>>>>> This hunk only ensures we have rtnl_lock. If not, we take it here. We do
>>>>>> not take it unconditionally because we may already have it, for example
>>>>>> if caller of switchdev_flush_deferred holds rtnl_lock.
>>>>>>
>>>>>
>>>>>This is where you lost me. How do you know another core doesn't happen
>>>>>to have the lock when you hit this code path? Maybe someone is running
>>>>>an ethtool command on another core or something.
>>>>
>>>> You are right. The same problem exists currently in switchdev_port_attr_set.
>>>
>>>You are right as in you'll change this back to unconditional grabbing
>>>of rtnl_lock?  I don't follow how this problem currently exists as
>>>current code does an unconditional grab of rtnl_lock.
>>
>>   cpu1                          cpu2
>>                                 rtnl_lock()
>> switchdev_port_attr_set
>>   !rtnl_is_locked() == false
>>   switchdev_trans_init
>>                                 rtnl_unlock()
>>   __switchdev_port_attr_set
>>
>> now __switchdev_port_attr_set is called without rtnl_lock.
>
>Got it.  Another example of trying to guess context and getting it
>wrong.  This is why I like your DEFERRED option so caller can be
>explicit.
>
>> Would make sense to introduce rtnl_is_locked_by_me() or something.
>
>Is it sufficient to simply call rtnl_lock() in your deferred context?
>You can sleep there and that way there is no question who has the
>lock.

The problem is that caller of flust_deferred may hold the lock already
and then we would deadlock. I'm cooking up v5 with different approach.
Stay tuned.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-10-13 14:39 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-12 17:54 [patch net-next v4 0/7] switchdev: change locking Jiri Pirko
2015-10-12 17:54 ` [patch net-next v4 1/7] switchdev: introduce switchdev workqueue Jiri Pirko
2015-10-13  3:01   ` Scott Feldman
2015-10-12 18:03 ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Jiri Pirko
2015-10-12 18:03   ` [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects Jiri Pirko
2015-10-13  3:01     ` Scott Feldman
2015-10-13  4:44       ` John Fastabend
2015-10-12 18:03   ` [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del Jiri Pirko
2015-10-13  3:08     ` Scott Feldman
2015-10-12 18:03   ` [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn Jiri Pirko
2015-10-13  3:28     ` Scott Feldman
2015-10-13  3:31       ` John Fastabend
2015-10-13  4:19         ` Scott Feldman
2015-10-13  5:16           ` John Fastabend
2015-10-13  6:05       ` Jiri Pirko
2015-10-13  6:46         ` Scott Feldman
2015-10-12 18:03   ` [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks Jiri Pirko
2015-10-13  4:02     ` Scott Feldman
2015-10-13  6:25       ` Jiri Pirko
2015-10-13  6:55         ` Scott Feldman
2015-10-12 18:03   ` [patch net-next v4 7/7] switchdev: assert rtnl mutex when going over lower netdevs Jiri Pirko
2015-10-13  2:52   ` [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred Scott Feldman
2015-10-13  4:40     ` John Fastabend
2015-10-13  5:45       ` Jiri Pirko
2015-10-13  5:48         ` John Fastabend
2015-10-13  5:44     ` Jiri Pirko
2015-10-13  6:03       ` John Fastabend
2015-10-13  6:21         ` Jiri Pirko
2015-10-13  6:53           ` Scott Feldman
2015-10-13  7:30             ` Jiri Pirko
2015-10-13 14:07               ` Scott Feldman
2015-10-13 14:39                 ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.