netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/3] Expose switching attributes via PF_BRIDGE
@ 2012-05-30  3:06 John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 1/3] net: create generic bridge ops John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: John Fastabend @ 2012-05-30  3:06 UTC (permalink / raw)
  To: krkumar2, hadi, shemminger, mst, buytenh, eilong
  Cc: sri, gregory.v.rose, netdev, bhutchings, jeffrey.t.kirsher,
	eric.w.multanen

This series decouples the remaining netlink PF_BRIDGE messages
from the bridging module and moves them into rtnetlink proper.
By doing this we can use these netlink messages to handle any
type of bridge and extend the attributes as needed.

I hope this resolves some of the concerns with the DSA patch
below and expect the attached series can be extended to
support the DSA infrastructure as needed:

http://patchwork.ozlabs.org/patch/16578/

Also this should resolve a patch here that tried to expose
the switching modes but did so using a device specific hook:

http://lists.openwall.net/netdev/2012/04/16/10

I've used a hacked version of the 'bridge' tool Stephen
Hemminger submitted as an RFC some months back to test this
the output looks like this:

[root@jf-dev1-dcblab iproute2]# ./br/br bridge show
eth2: bridge mode: VEB		embedded
eth3: bridge mode: VEB		embedded
[root@jf-dev1-dcblab iproute2]# ./br/br bridge mode dev eth2 mode vepa
[root@jf-dev1-dcblab iproute2]# ./br/br bridge show
eth2: bridge mode: VEPA		embedded
eth3: bridge mode: VEB		embedded

I could have just added a ndo op and IFLA_XXX message to
set the switching mode but IMHO this is not going to scale
as more bridging functionality becomes offloaded. The DSA
example is a case where we already have a fully offloaded
switch. Any solution we come up with should support both
embedded switches and SW switches.

Any comments would be appreciated Thanks!

---

John Fastabend (3):
      ixgbe: add setlink, getlink support to ixgbe and ixgbevf
      net: add VEPA, VEB bridge mode
      net: create generic bridge ops


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  100 +++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |    3 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   10 ++
 include/linux/if_link.h                           |   16 ++
 include/linux/netdevice.h                         |   10 ++
 net/bridge/br_device.c                            |    2 
 net/bridge/br_netlink.c                           |   75 ++---------
 net/bridge/br_private.h                           |    3 
 net/core/rtnetlink.c                              |  137 +++++++++++++++++++++
 9 files changed, 291 insertions(+), 65 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH v1 1/3] net: create generic bridge ops
  2012-05-30  3:06 [RFC PATCH v1 0/3] Expose switching attributes via PF_BRIDGE John Fastabend
@ 2012-05-30  3:07 ` John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend
  2 siblings, 0 replies; 7+ messages in thread
From: John Fastabend @ 2012-05-30  3:07 UTC (permalink / raw)
  To: krkumar2, hadi, shemminger, mst, buytenh, eilong
  Cc: sri, gregory.v.rose, netdev, bhutchings, jeffrey.t.kirsher,
	eric.w.multanen

The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this being hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks. One to set link bridge attributes
and another to dump the current bride attributes.

	ndo_bridge_setlink()
	ndo_bridge_getlink()

This avoids adding many ndo_ops to the net_device but does
require drivers do more nlmsg handling.

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/netdevice.h |   10 ++++++
 net/bridge/br_device.c    |    2 +
 net/bridge/br_netlink.c   |   73 ++++++++----------------------------------
 net/bridge/br_private.h   |    3 ++
 net/core/rtnetlink.c      |   78 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 106 insertions(+), 60 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e7fd468..a307d77 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -916,6 +916,10 @@ struct netdev_fcoe_hbainfo {
  *		       struct net_device *dev, int idx)
  *	Used to add FDB entries to dump requests. Implementers should add
  *	entries to skb and update idx with the number of entries.
+ *
+ * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
+ * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
+ *			     struct net_device *dev)
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1025,6 +1029,12 @@ struct net_device_ops {
 						struct netlink_callback *cb,
 						struct net_device *dev,
 						int idx);
+
+	int			(*ndo_bridge_setlink)(struct net_device *dev,
+						      struct nlmsghdr *nlh);
+	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
+						      u32 pid, u32 seq,
+						      struct net_device *dev);
 };
 
 /*
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 929e48aed..e942180 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -320,6 +320,8 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_fdb_add		 = br_fdb_add,
 	.ndo_fdb_del		 = br_fdb_delete,
 	.ndo_fdb_dump		 = br_fdb_dump,
+	.ndo_bridge_getlink	 = br_getlink,
+	.ndo_bridge_setlink	 = br_setlink,
 };
 
 static void br_dev_free(struct net_device *dev)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 2080485..f207234 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -111,54 +111,33 @@ errout:
 /*
  * Dump information about all ports, in response to GETLINK
  */
-static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
+int br_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+	       struct net_device *dev)
 {
-	struct net *net = sock_net(skb->sk);
-	struct net_device *dev;
-	int idx;
-
-	idx = 0;
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		struct net_bridge_port *port = br_port_get_rcu(dev);
-
-		/* not a bridge port */
-		if (!port || idx < cb->args[0])
-			goto skip;
-
-		if (br_fill_ifinfo(skb, port,
-				   NETLINK_CB(cb->skb).pid,
-				   cb->nlh->nlmsg_seq, RTM_NEWLINK,
-				   NLM_F_MULTI) < 0)
-			break;
-skip:
-		++idx;
-	}
-	rcu_read_unlock();
-	cb->args[0] = idx;
+	int err = 0;
+	struct net_bridge_port *port = br_port_get_rcu(dev);
+
+	/* not a bridge port */
+	if (!port)
+		goto out;
 
-	return skb->len;
+	err = br_fill_ifinfo(skb, port, pid, seq, RTM_NEWLINK, NLM_F_MULTI);
+out:
+	return err;
 }
 
 /*
  * Change state of port (ie from forwarding to blocking etc)
  * Used by spanning tree in user space.
  */
-static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
+int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
 {
-	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
 	struct nlattr *protinfo;
-	struct net_device *dev;
 	struct net_bridge_port *p;
 	u8 new_state;
 
-	if (nlmsg_len(nlh) < sizeof(*ifm))
-		return -EINVAL;
-
 	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_family != AF_BRIDGE)
-		return -EPFNOSUPPORT;
 
 	protinfo = nlmsg_find_attr(nlh, sizeof(*ifm), IFLA_PROTINFO);
 	if (!protinfo || nla_len(protinfo) < sizeof(u8))
@@ -168,10 +147,6 @@ static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 	if (new_state > BR_STATE_BLOCKING)
 		return -EINVAL;
 
-	dev = __dev_get_by_index(net, ifm->ifi_index);
-	if (!dev)
-		return -ENODEV;
-
 	p = br_port_get_rtnl(dev);
 	if (!p)
 		return -EINVAL;
@@ -218,29 +193,7 @@ static struct rtnl_link_ops br_link_ops __read_mostly = {
 
 int __init br_netlink_init(void)
 {
-	int err;
-
-	err = rtnl_link_register(&br_link_ops);
-	if (err < 0)
-		goto err1;
-
-	err = __rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL,
-			      br_dump_ifinfo, NULL);
-	if (err)
-		goto err2;
-	err = __rtnl_register(PF_BRIDGE, RTM_SETLINK,
-			      br_rtm_setlink, NULL, NULL);
-	if (err)
-		goto err3;
-
-	return 0;
-
-err3:
-	rtnl_unregister_all(PF_BRIDGE);
-err2:
-	rtnl_link_unregister(&br_link_ops);
-err1:
-	return err;
+	return rtnl_link_register(&br_link_ops);
 }
 
 void __exit br_netlink_fini(void)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1a8ad4f..659907c 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -552,6 +552,9 @@ extern int (*br_fdb_test_addr_hook)(struct net_device *dev, unsigned char *addr)
 extern int br_netlink_init(void);
 extern void br_netlink_fini(void);
 extern void br_ifinfo_notify(int event, struct net_bridge_port *port);
+extern int br_setlink(struct net_device *dev, struct nlmsghdr *nlmsg);
+extern int br_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+		      struct net_device *dev);
 
 #ifdef CONFIG_SYSFS
 /* br_sysfs_if.c */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 21318d1..de6c371 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2239,6 +2239,81 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	int idx = 0;
+	u32 pid = NETLINK_CB(cb->skb).pid;
+	u32 seq = cb->nlh->nlmsg_seq;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		const struct net_device_ops *ops = dev->netdev_ops;
+		struct net_device *master = dev->master;
+
+		if (idx < cb->args[0])
+			continue;
+
+		if (master && master->netdev_ops->ndo_bridge_getlink) {
+			const struct net_device_ops *bops = master->netdev_ops;
+			int err = bops->ndo_bridge_getlink(skb, pid, seq, dev);
+
+			if (err < 0)
+				break;
+			else
+				idx++;
+		}
+
+		if (ops->ndo_bridge_getlink) {
+			int err = ops->ndo_bridge_getlink(skb, pid, seq, dev);
+
+			if (err < 0)
+				break;
+			else
+				idx++;
+		}
+	}
+	rcu_read_unlock();
+	cb->args[0] = idx;
+
+	return skb->len;
+}
+
+static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+			       void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ifinfomsg *ifm;
+	struct net_device *dev;
+	int err = -EINVAL;
+
+	if (nlmsg_len(nlh) < sizeof(*ifm))
+		return -EINVAL;
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->ifi_family != AF_BRIDGE)
+		return -EPFNOSUPPORT;
+
+	dev = __dev_get_by_index(net, ifm->ifi_index);
+	if (!dev) {
+		pr_info("PF_BRIDGE: RTM_SETLINK with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
+		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
+		if (err)
+			goto out;
+	}
+
+	if (dev->netdev_ops->ndo_bridge_setlink)
+		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
+
+out:
+	return err;
+}
+
 /* Protected by RTNL sempahore.  */
 static struct rtattr **rta_buf;
 static int rtattr_max;
@@ -2415,5 +2490,8 @@ void __init rtnetlink_init(void)
 	rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, NULL);
 	rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, NULL);
 	rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, rtnl_fdb_dump, NULL);
+
+	rtnl_register(PF_BRIDGE, RTM_GETLINK, NULL, rtnl_bridge_getlink, NULL);
+	rtnl_register(PF_BRIDGE, RTM_SETLINK, rtnl_bridge_setlink, NULL, NULL);
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode
  2012-05-30  3:06 [RFC PATCH v1 0/3] Expose switching attributes via PF_BRIDGE John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 1/3] net: create generic bridge ops John Fastabend
@ 2012-05-30  3:07 ` John Fastabend
  2012-06-04 14:59   ` Krishna Kumar2
  2012-05-30  3:07 ` [RFC PATCH v1 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend
  2 siblings, 1 reply; 7+ messages in thread
From: John Fastabend @ 2012-05-30  3:07 UTC (permalink / raw)
  To: krkumar2, hadi, shemminger, mst, buytenh, eilong
  Cc: sri, gregory.v.rose, netdev, bhutchings, jeffrey.t.kirsher,
	eric.w.multanen

Hardware switches may support enabling and disabling the
loopback switch which puts the device in a VEPA mode defined
in the IEEE 802.1Qbg specification. In this mode frames are
not switched in the hardware but sent directly to the switch.
SR-IOV capabable NICs will likely support this mode I am
aware of at least two such devices.

This patch adds an additional IFLA_BRIDGE_MODE attribute
that can be set and dumped via the PF_BRIDGE:{SET|GET}LINK
operations. Also anticipating bridge attributes that may
be common for both embedded bridges and software bridges
this adds a flags attribute IFLA_BRIDGE_FLAGS currently
used to determine if the IFLA_BRIDGE command or event is
being generated to/from an embedded bridge or software
bridge. Finally, the event generation is pulled out of
the bridge module and into rtnetlink proper.

For example using the macvlan driver in VEPA mode on top of
an embedded switch requires putting the embedded switch into
a VEPA mode to get the expected results.

	--------  --------
        | VEPA |  | VEPA |       <-- macvlan vepa edge relays
        --------  --------
           |        |
           |        |
        ------------------
        |      VEPA      |       <-- embedded switch in NIC
        ------------------
                |
                |
        -------------------
        | external switch |      <-- shiny new physical
	-------------------          switch with VEPA support

A packet sent from the macvlan VEPA at the top could be
loopbacked on the embedded switch and never seen by the
external switch. So in order for this to work the embedded
switch needs to be set in the VEPA state via the above
described commands.

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/if_link.h |   16 ++++++++++
 net/bridge/br_netlink.c |    2 -
 net/core/rtnetlink.c    |   73 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index f715750..30489e5 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -139,6 +139,7 @@ enum {
 	IFLA_NET_NS_FD,
 	IFLA_EXT_MASK,		/* Extended info mask, VFs, etc */
 	IFLA_PROMISCUITY,	/* Promiscuity count: > 0 means acts PROMISC */
+	IFLA_BRIDGE,		/* Bridge attributes */
 #define IFLA_PROMISCUITY IFLA_PROMISCUITY
 	__IFLA_MAX
 };
@@ -396,4 +397,19 @@ struct ifla_port_vsi {
 	__u8 pad[3];
 };
 
+/* Bridge Flags */
+#define BRIDGE_FLAGS_MASTER	0	/* Bridge command to/from master */
+#define BRIDGE_FLAGS_SELF	1	/* Bridge command to/from lowerdev */
+
+#define BRIDGE_MODE_VEB		0	/* Default loopback mode */
+#define BRIDGE_MODE_VEPA	1	/* 802.1Qbg defined VEPA mode */
+
+/* Bridge management nested attributes */
+enum {
+	IFLA_BRIDGE_FLAGS,
+	IFLA_BRIDGE_MODE,
+	__IFLA_BRIDGE_MAX,
+};
+#define IFLA_BRIDGE_MAX (__IFLA_BRIDGE_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f207234..8edbe0d 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -166,8 +166,6 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
 	br_port_state_selection(p->br);
 	spin_unlock_bh(&p->br->lock);
 
-	br_ifinfo_notify(RTM_NEWLINK, p);
-
 	return 0;
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index de6c371..9cd50ab 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1121,6 +1121,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_AF_SPEC]		= { .type = NLA_NESTED },
 	[IFLA_EXT_MASK]		= { .type = NLA_U32 },
 	[IFLA_PROMISCUITY]	= { .type = NLA_U32 },
+	[IFLA_BRIDGE]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -1158,6 +1159,11 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
 	[IFLA_PORT_RESPONSE]	= { .type = NLA_U16, },
 };
 
+static const struct nla_policy bridge_policy[IFLA_BRIDGE_MAX + 1] = {
+	[IFLA_BRIDGE_FLAGS]	= { .type = NLA_U16 },
+	[IFLA_BRIDGE_MODE]	= { .type = NLA_U16 },
+};
+
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -2280,13 +2286,60 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+static inline size_t bridge_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
+		+ nla_total_size(IFNAMSIZ)	/* IFLA_IFNAME */
+		+ nla_total_size(MAX_ADDR_LEN)	/* IFLA_ADDRESS */
+		+ nla_total_size(sizeof(u32))	/* IFLA_MASTER */
+		+ nla_total_size(sizeof(u32))	/* IFLA_MTU */
+		+ nla_total_size(sizeof(u32))	/* IFLA_LINK */
+		+ nla_total_size(sizeof(u32))	/* IFLA_OPERSTATE */
+		+ nla_total_size(sizeof(u8))	/* IFLA_PROTINFO */
+		+ nla_total_size(sizeof(struct nlattr))	/* IFLA_BRIDGE */
+		+ nla_total_size(sizeof(u16))	/* IFLA_BRIDGE_FLAGS */
+		+ nla_total_size(sizeof(u16));	/* IFLA_BRIDGE_MODE */
+}
+
+static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
+{
+	struct net *net = dev_net(dev);
+	struct net_device *master = dev->master;
+	struct sk_buff *skb;
+	int err = -EOPNOTSUPP;
+
+	skb = nlmsg_new(bridge_nlmsg_size(), GFP_ATOMIC);
+	if (!skb) {
+		err = -ENOMEM;
+		goto errout;
+	}
+
+	if (!flags && master && master->netdev_ops->ndo_bridge_getlink)
+		err = master->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
+	else if (dev->netdev_ops->ndo_bridge_getlink)
+		err = dev->netdev_ops->ndo_bridge_getlink(skb, 0, 0, dev);
+
+	if (err < 0)
+		goto errout;
+
+	rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	return 0;
+errout:
+	WARN_ON(err == -EMSGSIZE);
+	kfree_skb(skb);
+	rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+	return err;
+}
+
 static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       void *arg)
 {
 	struct net *net = sock_net(skb->sk);
 	struct ifinfomsg *ifm;
 	struct net_device *dev;
-	int err = -EINVAL;
+	struct nlattr *bridge, *attr;
+	int rem, err = -EOPNOTSUPP;
+	u16 flags = 0;
 
 	if (nlmsg_len(nlh) < sizeof(*ifm))
 		return -EINVAL;
@@ -2301,16 +2354,22 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -ENODEV;
 	}
 
-	if (dev->master && dev->master->netdev_ops->ndo_bridge_setlink) {
-		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
-		if (err)
-			goto out;
+	bridge = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_BRIDGE);
+	nla_for_each_nested(attr, bridge, rem) {
+		if (nla_type(attr) == IFLA_BRIDGE_FLAGS)
+			flags = nla_get_u16(attr);
 	}
 
-	if (dev->netdev_ops->ndo_bridge_setlink)
+	if (!flags && dev->master &&
+	    dev->master->netdev_ops->ndo_bridge_setlink)
+		err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
+	else if ((flags & BRIDGE_FLAGS_SELF) &&
+		   dev->netdev_ops->ndo_bridge_setlink)
 		err = dev->netdev_ops->ndo_bridge_setlink(dev, nlh);
 
-out:
+	/* Generate event to notify upper layer of bridge change */
+	if (!err)
+		err = rtnl_bridge_notify(dev, flags);
 	return err;
 }
 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH v1 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf
  2012-05-30  3:06 [RFC PATCH v1 0/3] Expose switching attributes via PF_BRIDGE John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 1/3] net: create generic bridge ops John Fastabend
  2012-05-30  3:07 ` [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode John Fastabend
@ 2012-05-30  3:07 ` John Fastabend
  2 siblings, 0 replies; 7+ messages in thread
From: John Fastabend @ 2012-05-30  3:07 UTC (permalink / raw)
  To: krkumar2, hadi, shemminger, mst, buytenh, eilong
  Cc: sri, gregory.v.rose, netdev, bhutchings, jeffrey.t.kirsher,
	eric.w.multanen

This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |  100 ++++++++++++++++++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |    3 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   10 ++
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bf20457..55dfb06 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3144,7 +3144,6 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), 0);
 	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (1 << vf_shift));
 	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), 0);
-	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 
 	/* Map PF MAC address in RAR Entry 0 to first pool following VFs */
 	hw->mac.ops.set_vmdq(hw, 0, adapter->num_vfs);
@@ -3158,8 +3157,6 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	gcr_ext |= IXGBE_GCR_EXT_VT_MODE_64;
 	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr_ext);
 
-	/* enable Tx loopback for VF/PF communication */
-	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 	/* Enable MAC Anti-Spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw,
 					   (adapter->num_vfs != 0),
@@ -6844,6 +6841,101 @@ static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
 	return idx;
 }
 
+static int ixgbe_ndo_bridge_setlink(struct net_device *dev,
+				    struct nlmsghdr *nlh)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct nlattr *attr, *bridge;
+	int rem;
+
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return -EOPNOTSUPP;
+
+	bridge = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_BRIDGE);
+
+	nla_for_each_nested(attr, bridge, rem) {
+		__u16 mode;
+		u32 reg = 0;
+
+		if (nla_type(attr) != IFLA_BRIDGE_MODE)
+			continue;
+
+		mode = nla_get_u16(attr);
+		if (mode == BRIDGE_MODE_VEPA)
+			reg = 0;
+		else if (mode == BRIDGE_MODE_VEB)
+			reg = IXGBE_PFDTXGSWC_VT_LBEN;
+		else
+			return -EINVAL;
+
+		IXGBE_WRITE_REG(&adapter->hw, IXGBE_PFDTXGSWC, reg);
+
+		e_info(drv, "enabling bridge mode: %s\n",
+			mode == BRIDGE_MODE_VEPA ? "VEPA" : "VEB");
+	}
+
+	return 0;
+}
+
+static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
+				    struct net_device *dev)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct nlmsghdr *nlh;
+	struct ifinfomsg *ifm;
+	struct nlattr *bridge;
+	u8 operstate = netif_running(dev) ? dev->operstate : IF_OPER_DOWN;
+	u16 bridge_mode;
+
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return 0;
+
+	nlh = nlmsg_put(skb, pid, seq, RTM_NEWLINK, sizeof(*ifm), NLM_F_MULTI);
+	if (nlh == NULL)
+		return -EMSGSIZE;
+
+	ifm = nlmsg_data(nlh);
+	ifm->ifi_family = AF_BRIDGE;
+	ifm->__ifi_pad = 0;
+	ifm->ifi_type = dev->type;
+	ifm->ifi_index = dev->ifindex;
+	ifm->ifi_flags = dev_get_flags(dev);
+	ifm->ifi_change = 0;
+
+
+	if (nla_put_string(skb, IFLA_IFNAME, dev->name) ||
+	    nla_put_u32(skb, IFLA_MTU, dev->mtu) ||
+	    nla_put_u8(skb, IFLA_OPERSTATE, operstate) ||
+	    (dev->master &&
+	     nla_put_u32(skb, IFLA_MASTER, dev->master->ifindex)) ||
+	    (dev->addr_len &&
+	     nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
+	    (dev->ifindex != dev->iflink &&
+	     nla_put_u32(skb, IFLA_LINK, dev->iflink)))
+		goto nla_put_failure;
+
+	if (IXGBE_READ_REG(&adapter->hw, IXGBE_PFDTXGSWC) & 1)
+		bridge_mode = BRIDGE_MODE_VEB;
+	else
+		bridge_mode = BRIDGE_MODE_VEPA;
+
+	bridge = nla_nest_start(skb, IFLA_BRIDGE);
+	if (!bridge)
+		goto nla_put_failure;
+
+	if (nla_put_u16(skb, IFLA_BRIDGE_FLAGS, BRIDGE_FLAGS_SELF) ||
+	    nla_put_u16(skb, IFLA_BRIDGE_MODE, bridge_mode)) {
+		nla_nest_cancel(skb, bridge);
+		goto nla_put_failure;
+	}
+	nla_nest_end(skb, bridge);
+
+	return nlmsg_end(skb, nlh);
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -6883,6 +6975,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_fdb_del		= ixgbe_ndo_fdb_del,
 	.ndo_fdb_dump		= ixgbe_ndo_fdb_dump,
+	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
+	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
 };
 
 static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 2d971d1..bd932c6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -135,6 +135,9 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 		}
 	}
 
+	/* Initialize default switching mode VEB */
+	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+
 	/* If call to enable VFs succeeded then allocate memory
 	 * for per VF control structures.
 	 */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index f69ec42..f8d6f04 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -575,6 +575,16 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		}
 		skb->protocol = eth_type_trans(skb, adapter->netdev);
 
+		/* Workaround hardware that can't do proper VEPA multicast
+		 * source pruning.
+		 */
+		if ((skb->pkt_type & (PACKET_BROADCAST | PACKET_MULTICAST)) &&
+		    !(compare_ether_addr(adapter->netdev->dev_addr,
+					eth_hdr(skb)->h_source))) {
+			dev_kfree_skb_irq(skb);
+			goto next_desc;
+		}
+
 		ixgbevf_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
 
 next_desc:

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode
  2012-05-30  3:07 ` [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode John Fastabend
@ 2012-06-04 14:59   ` Krishna Kumar2
  2012-06-04 16:38     ` John Fastabend
  0 siblings, 1 reply; 7+ messages in thread
From: Krishna Kumar2 @ 2012-06-04 14:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: bhutchings, buytenh, eilong, eric.w.multanen, gregory.v.rose,
	hadi, jeffrey.t.kirsher, mst, netdev, shemminger, sri

John Fastabend <john.r.fastabend@intel.com> wrote on 05/30/2012 08:37:06
AM:

Some comments below:

> +static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
> +{
> ...
> +		 if (!flags && master && master->netdev_ops->
ndo_bridge_getlink)
> +		 		 err = master->netdev_ops->ndo_bridge_getlink(skb,
0, 0, dev);
> +		 else if (dev->netdev_ops->ndo_bridge_getlink)
> +		 		 err = dev->netdev_ops->ndo_bridge_getlink(skb, 0,
0, dev);

I think you should do something like:

        if ((flags == BRIDGE_FLAGS_MASTER) && ...)
                ...

Also you could use BRIDGE_FLAGS_MASTER=1, SELF=2, and use
"if (flags & BRIDGE_FLAGS_MASTER)" for consistency?


+		 if (!err)
+		 		 err = rtnl_bridge_notify(dev, flags);

It is possible to return a reporting error even though
the operation succeeded. Maybe something that could be
done here to indicate that the operation succeeded, or
is that a TODO?

>  static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr
*nlh,
>                  void *arg)
>  {
..
> +   if (!flags && dev->master &&
> +       dev->master->netdev_ops->ndo_bridge_setlink)
> +      err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
> +   else if ((flags & BRIDGE_FLAGS_SELF) &&
> +         dev->netdev_ops->ndo_bridge_setlink)

Same usage of MASTER here.

Thanks,
- KK

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode
  2012-06-04 14:59   ` Krishna Kumar2
@ 2012-06-04 16:38     ` John Fastabend
  2012-06-05  3:11       ` Krishna Kumar2
  0 siblings, 1 reply; 7+ messages in thread
From: John Fastabend @ 2012-06-04 16:38 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: bhutchings, buytenh, eilong, eric.w.multanen, gregory.v.rose,
	hadi, jeffrey.t.kirsher, mst, netdev, shemminger, sri

On 6/4/2012 7:59 AM, Krishna Kumar2 wrote:
> John Fastabend <john.r.fastabend@intel.com> wrote on 05/30/2012 08:37:06
> AM:
> 
> Some comments below:
> 
>> +static int rtnl_bridge_notify(struct net_device *dev, u16 flags)
>> +{
>> ...
>> +		 if (!flags && master && master->netdev_ops->
> ndo_bridge_getlink)
>> +		 		 err = master->netdev_ops->ndo_bridge_getlink(skb,
> 0, 0, dev);
>> +		 else if (dev->netdev_ops->ndo_bridge_getlink)
>> +		 		 err = dev->netdev_ops->ndo_bridge_getlink(skb, 0,
> 0, dev);
> 
> I think you should do something like:
> 
>         if ((flags == BRIDGE_FLAGS_MASTER) && ...)
>                 ...
> 
> Also you could use BRIDGE_FLAGS_MASTER=1, SELF=2, and use
> "if (flags & BRIDGE_FLAGS_MASTER)" for consistency?

OK this is likely a good thing otherwise user space is a
bit tedious when managing FDB and bridge modes. We do still
need the !flags case to support existing applications though,
(we must maintain existing semantics)

if (!flags || (flags & BRIDGE_FLAGS_MASTER) && ...)
	...
else (flags & BRIDGE_FLAGS_SELF)
	...

> 
> 
> +		 if (!err)
> +		 		 err = rtnl_bridge_notify(dev, flags);
> 
> It is possible to return a reporting error even though
> the operation succeeded. Maybe something that could be
> done here to indicate that the operation succeeded, or
> is that a TODO?
> 

The problem is if rtnl_bridge_notify fails due to memory
constraints or otherwise. In this case the set has already
completed successfully as you note so we should not return
any error. This should fix it if I understand your concern
correctly.

	if (!err)
		rtnl_bridge_notify(dev, flags);
	return err;



>>  static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr
> *nlh,
>>                  void *arg)
>>  {
> ..
>> +   if (!flags && dev->master &&
>> +       dev->master->netdev_ops->ndo_bridge_setlink)
>> +      err = dev->master->netdev_ops->ndo_bridge_setlink(dev, nlh);
>> +   else if ((flags & BRIDGE_FLAGS_SELF) &&
>> +         dev->netdev_ops->ndo_bridge_setlink)
> 
> Same usage of MASTER here.

Agreed. Thanks.

> 
> Thanks,
> - KK
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode
  2012-06-04 16:38     ` John Fastabend
@ 2012-06-05  3:11       ` Krishna Kumar2
  0 siblings, 0 replies; 7+ messages in thread
From: Krishna Kumar2 @ 2012-06-05  3:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: bhutchings, buytenh, eilong, eric.w.multanen, gregory.v.rose,
	hadi, jeffrey.t.kirsher, mst, netdev, shemminger, sri

John Fastabend <john.r.fastabend@intel.com> wrote on 06/04/2012 10:08:04
PM:

> > I think you should do something like:
> >
> >         if ((flags == BRIDGE_FLAGS_MASTER) && ...)
> >                 ...
> >
> > Also you could use BRIDGE_FLAGS_MASTER=1, SELF=2, and use
> > "if (flags & BRIDGE_FLAGS_MASTER)" for consistency?
>
> OK this is likely a good thing otherwise user space is a
> bit tedious when managing FDB and bridge modes. We do still
> need the !flags case to support existing applications though,
> (we must maintain existing semantics)
>
> if (!flags || (flags & BRIDGE_FLAGS_MASTER) && ...)
>    ...
> else (flags & BRIDGE_FLAGS_SELF)
>    ...

Yes, looks good.

> > It is possible to return a reporting error even though
> > the operation succeeded. Maybe something that could be
> > done here to indicate that the operation succeeded, or
> > is that a TODO?
> >
>
> The problem is if rtnl_bridge_notify fails due to memory
> constraints or otherwise. In this case the set has already
> completed successfully as you note so we should not return
> any error. This should fix it if I understand your concern
> correctly.
>
>    if (!err)
>       rtnl_bridge_notify(dev, flags);
>    return err;

Yes. I guess user will not hang waiting for a response as it
will pass NLM_F_ACK, which allows netlink_rcv_skb to call
netlink_ack.

Thanks,
- KK

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-06-05  3:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-30  3:06 [RFC PATCH v1 0/3] Expose switching attributes via PF_BRIDGE John Fastabend
2012-05-30  3:07 ` [RFC PATCH v1 1/3] net: create generic bridge ops John Fastabend
2012-05-30  3:07 ` [RFC PATCH v1 2/3] net: add VEPA, VEB bridge mode John Fastabend
2012-06-04 14:59   ` Krishna Kumar2
2012-06-04 16:38     ` John Fastabend
2012-06-05  3:11       ` Krishna Kumar2
2012-05-30  3:07 ` [RFC PATCH v1 3/3] ixgbe: add setlink, getlink support to ixgbe and ixgbevf John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).