All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH v3 0/8] Managing the forwarding database(FDB)
@ 2012-04-12 17:06 John Fastabend
  2012-04-12 17:06 ` [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:06 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

The following series is a submission for net-next to allow
embedded switches and other stacked devices other then the
Linux bridge to manage a forwarding database.

Previously posted and discussed here,

http://lists.openwall.net/netdev/2012/03/19/26

Sorry for the thrash on v2 this version resolves the macvlan
patch 8/8 to fix a dev_set_promiscuity() error and add the
flags field to change and get link routines.

v2 addressed feedback from Ben Hutchings resolving a typo
in the multicast add/del routines and improving the error
handling when both NTF_SELF and NTF_MASTER are set.

As always thanks for the feedback and any comments welcom.
I've tested this with 'br' tool published by Stephen Hemminger
soon to be renamed 'bridge' I believe and various traffic
generators mostly pktgen, ping, and netperf.

Thanks!
John

---

Greg Rose (1):
      ixgbe: UTA table incorrectly programmed

John Fastabend (7):
      macvlan: add FDB bridge ops and macvlan flags
      ixgbe: allow RAR table to be updated in promisc mode
      ixgbe: enable FDB netdevice ops
      net: rtnetlink notify events for FDB NTF_SELF adds and deletes
      net: add fdb generic dump routine
      net: addr_list: add exclusive dev_uc_add and dev_mc_add
      net: add generic PF_BRIDGE:RTM_ FDB hooks


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  121 ++++++++---
 drivers/net/macvlan.c                         |   73 ++++++-
 include/linux/if_link.h                       |    3 
 include/linux/if_macvlan.h                    |    1 
 include/linux/neighbour.h                     |    3 
 include/linux/netdevice.h                     |   25 ++
 include/linux/rtnetlink.h                     |    4 
 net/bridge/br_device.c                        |    3 
 net/bridge/br_fdb.c                           |  128 +++---------
 net/bridge/br_netlink.c                       |   12 -
 net/bridge/br_private.h                       |   15 +
 net/core/dev_addr_lists.c                     |   97 ++++++++-
 net/core/rtnetlink.c                          |  267 +++++++++++++++++++++++++
 13 files changed, 579 insertions(+), 173 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
@ 2012-04-12 17:06 ` John Fastabend
  2012-04-12 20:27   ` Ben Hutchings
  2012-04-12 17:06 ` [net-next PATCH v3 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:06 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also without any flags set the command
will be handled by the master device as well so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_SELF and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev' this allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication in the case with both flags set
when an error occurs. To resolve this the rtnl handler clears
the NTF_ flag in the netlink ack to indicate which sets completed
successfully. The add/del handlers will abort as soon as any
error occurs.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no required changes in user space
to support the current bridge behavior.

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Examples session using the 'br'[1] tool to add, dump and then
delete a mac address with a new "embedded" option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
#br fdb add 22:35:19:ac:60:59 embedded dev eth3
#br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
#br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It
is my opinion that the merit of this patch is now embedded and SW
bridges can both be modeled correctly in user space using very nearly
the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/neighbour.h |    3 +
 include/linux/netdevice.h |   23 +++++++
 include/linux/rtnetlink.h |    4 +
 net/bridge/br_device.c    |    3 +
 net/bridge/br_fdb.c       |  128 +++++++++-----------------------------
 net/bridge/br_netlink.c   |   12 ----
 net/bridge/br_private.h   |   15 ++++
 net/core/rtnetlink.c      |  152 +++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 228 insertions(+), 112 deletions(-)

diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h
index b188f68..275e5d6 100644
--- a/include/linux/neighbour.h
+++ b/include/linux/neighbour.h
@@ -33,6 +33,9 @@ enum {
 #define NTF_PROXY	0x08	/* == ATF_PUBL */
 #define NTF_ROUTER	0x80
 
+#define NTF_SELF	0x02
+#define NTF_MASTER	0x04
+
 /*
  *	Neighbor Cache Entry States.
  */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbaa20..7600c61 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -54,6 +54,7 @@
 #include <net/netprio_cgroup.h>
 
 #include <linux/netdev_features.h>
+#include <linux/neighbour.h>
 
 struct netpoll_info;
 struct device;
@@ -905,6 +906,16 @@ struct netdev_fcoe_hbainfo {
  *	feature set might be less than what was returned by ndo_fix_features()).
  *	Must return >0 or -errno if it changed dev->features itself.
  *
+ * int (*ndo_fdb_add)(struct ndmsg *ndm, struct net_device *dev,
+ *		      unsigned char *addr, u16 flags)
+ *	Adds an FDB entry to dev for addr.
+ * int (*ndo_fdb_del)(struct ndmsg *ndm, struct net_device *dev,
+ *		      unsigned char *addr)
+ *	Deletes the FDB entry from dev coresponding to addr.
+ * int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb,
+ *		       struct net_device *dev, int idx)
+ *	Used to add FDB entries to dump requests. Implementers should add
+ *	entries to skb and update idx with the number of entries.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1002,6 +1013,18 @@ struct net_device_ops {
 						    netdev_features_t features);
 	int			(*ndo_neigh_construct)(struct neighbour *n);
 	void			(*ndo_neigh_destroy)(struct neighbour *n);
+
+	int			(*ndo_fdb_add)(struct ndmsg *ndm,
+					       struct net_device *dev,
+					       unsigned char *addr,
+					       u16 flags);
+	int			(*ndo_fdb_del)(struct ndmsg *ndm,
+					       struct net_device *dev,
+					       unsigned char *addr);
+	int			(*ndo_fdb_dump)(struct sk_buff *skb,
+						struct netlink_callback *cb,
+						struct net_device *dev,
+						int idx);
 };
 
 /*
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 577592e..2c1de89 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -801,6 +801,10 @@ rtattr_failure:
 	return table;
 }
 
+extern int ndo_dflt_fdb_dump(struct sk_buff *skb,
+			     struct netlink_callback *cb,
+			     struct net_device *dev,
+			     int idx);
 #endif /* __KERNEL__ */
 
 
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index ba829de..d6e5929 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -317,6 +317,9 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_add_slave		 = br_add_slave,
 	.ndo_del_slave		 = br_del_slave,
 	.ndo_fix_features        = br_fix_features,
+	.ndo_fdb_add		 = br_fdb_add,
+	.ndo_fdb_del		 = br_fdb_delete,
+	.ndo_fdb_dump		 = br_fdb_dump,
 };
 
 static void br_dev_free(struct net_device *dev)
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 80dbce4..5945c54 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -535,44 +535,38 @@ errout:
 }
 
 /* Dump information about entries, in response to GETNEIGH */
-int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+int br_fdb_dump(struct sk_buff *skb,
+		struct netlink_callback *cb,
+		struct net_device *dev,
+		int idx)
 {
-	struct net *net = sock_net(skb->sk);
-	struct net_device *dev;
-	int idx = 0;
-
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		struct net_bridge *br = netdev_priv(dev);
-		int i;
-
-		if (!(dev->priv_flags & IFF_EBRIDGE))
-			continue;
+	struct net_bridge *br = netdev_priv(dev);
+	int i;
 
-		for (i = 0; i < BR_HASH_SIZE; i++) {
-			struct hlist_node *h;
-			struct net_bridge_fdb_entry *f;
+	if (!(dev->priv_flags & IFF_EBRIDGE))
+		goto out;
 
-			hlist_for_each_entry_rcu(f, h, &br->hash[i], hlist) {
-				if (idx < cb->args[0])
-					goto skip;
+	for (i = 0; i < BR_HASH_SIZE; i++) {
+		struct hlist_node *h;
+		struct net_bridge_fdb_entry *f;
 
-				if (fdb_fill_info(skb, br, f,
-						  NETLINK_CB(cb->skb).pid,
-						  cb->nlh->nlmsg_seq,
-						  RTM_NEWNEIGH,
-						  NLM_F_MULTI) < 0)
-					break;
+		hlist_for_each_entry_rcu(f, h, &br->hash[i], hlist) {
+			if (idx < cb->args[0])
+				goto skip;
+
+			if (fdb_fill_info(skb, br, f,
+					  NETLINK_CB(cb->skb).pid,
+					  cb->nlh->nlmsg_seq,
+					  RTM_NEWNEIGH,
+					  NLM_F_MULTI) < 0)
+				break;
 skip:
-				++idx;
-			}
+			++idx;
 		}
 	}
-	rcu_read_unlock();
-
-	cb->args[0] = idx;
 
-	return skb->len;
+out:
+	return idx;
 }
 
 /* Update (create or replace) forwarding database entry */
@@ -614,43 +608,11 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 }
 
 /* Add new permanent fdb entry with RTM_NEWNEIGH */
-int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+int br_fdb_add(struct ndmsg *ndm, struct net_device *dev,
+	       unsigned char *addr, u16 nlh_flags)
 {
-	struct net *net = sock_net(skb->sk);
-	struct ndmsg *ndm;
-	struct nlattr *tb[NDA_MAX+1];
-	struct net_device *dev;
 	struct net_bridge_port *p;
-	const __u8 *addr;
-	int err;
-
-	ASSERT_RTNL();
-	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
-	if (err < 0)
-		return err;
-
-	ndm = nlmsg_data(nlh);
-	if (ndm->ndm_ifindex == 0) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid ifindex\n");
-		return -EINVAL;
-	}
-
-	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
-	if (dev == NULL) {
-		pr_info("bridge: RTM_NEWNEIGH with unknown ifindex\n");
-		return -ENODEV;
-	}
-
-	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid address\n");
-		return -EINVAL;
-	}
-
-	addr = nla_data(tb[NDA_LLADDR]);
-	if (!is_valid_ether_addr(addr)) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid ether address\n");
-		return -EINVAL;
-	}
+	int err = 0;
 
 	if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
 		pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", ndm->ndm_state);
@@ -670,14 +632,14 @@ int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		rcu_read_unlock();
 	} else {
 		spin_lock_bh(&p->br->hash_lock);
-		err = fdb_add_entry(p, addr, ndm->ndm_state, nlh->nlmsg_flags);
+		err = fdb_add_entry(p, addr, ndm->ndm_state, nlh_flags);
 		spin_unlock_bh(&p->br->hash_lock);
 	}
 
 	return err;
 }
 
-static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
+static int fdb_delete_by_addr(struct net_bridge_port *p, u8 *addr)
 {
 	struct net_bridge *br = p->br;
 	struct hlist_head *head = &br->hash[br_mac_hash(addr)];
@@ -692,40 +654,12 @@ static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
 }
 
 /* Remove neighbor entry with RTM_DELNEIGH */
-int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+int br_fdb_delete(struct ndmsg *ndm, struct net_device *dev,
+		  unsigned char *addr)
 {
-	struct net *net = sock_net(skb->sk);
-	struct ndmsg *ndm;
 	struct net_bridge_port *p;
-	struct nlattr *llattr;
-	const __u8 *addr;
-	struct net_device *dev;
 	int err;
 
-	ASSERT_RTNL();
-	if (nlmsg_len(nlh) < sizeof(*ndm))
-		return -EINVAL;
-
-	ndm = nlmsg_data(nlh);
-	if (ndm->ndm_ifindex == 0) {
-		pr_info("bridge: RTM_DELNEIGH with invalid ifindex\n");
-		return -EINVAL;
-	}
-
-	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
-	if (dev == NULL) {
-		pr_info("bridge: RTM_DELNEIGH with unknown ifindex\n");
-		return -ENODEV;
-	}
-
-	llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
-	if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
-		pr_info("bridge: RTM_DELNEIGH with invalid address\n");
-		return -EINVAL;
-	}
-
-	addr = nla_data(llattr);
-
 	p = br_port_get_rtnl(dev);
 	if (p == NULL) {
 		pr_info("bridge: RTM_DELNEIGH %s not a bridge port\n",
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 346b368..1fa0535 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -232,18 +232,6 @@ int __init br_netlink_init(void)
 			      br_rtm_setlink, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH,
-			      br_fdb_add, NULL, NULL);
-	if (err)
-		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH,
-			      br_fdb_delete, NULL, NULL);
-	if (err)
-		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH,
-			      NULL, br_fdb_dump, NULL);
-	if (err)
-		goto err3;
 
 	return 0;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0b67a63..929b9f6 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -363,9 +363,18 @@ extern int br_fdb_insert(struct net_bridge *br,
 extern void br_fdb_update(struct net_bridge *br,
 			  struct net_bridge_port *source,
 			  const unsigned char *addr);
-extern int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb);
-extern int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
-extern int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
+
+extern int br_fdb_delete(struct ndmsg *ndm,
+			 struct net_device *dev,
+			 unsigned char *addr);
+extern int br_fdb_add(struct ndmsg *nlh,
+		      struct net_device *dev,
+		      unsigned char *addr,
+		      u16 nlh_flags);
+extern int br_fdb_dump(struct sk_buff *skb,
+		       struct netlink_callback *cb,
+		       struct net_device *dev,
+		       int idx);
 
 /* br_forward.c */
 extern void br_deliver(const struct net_bridge_port *to,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 545a969..5ec09d5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -35,7 +35,9 @@
 #include <linux/security.h>
 #include <linux/mutex.h>
 #include <linux/if_addr.h>
+#include <linux/if_bridge.h>
 #include <linux/pci.h>
+#include <linux/etherdevice.h>
 
 #include <asm/uaccess.h>
 
@@ -1978,6 +1980,152 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
+static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *master = NULL;
+	struct ndmsg *ndm;
+	struct nlattr *tb[NDA_MAX+1];
+	struct net_device *dev;
+	u8 *addr;
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
+	if (err < 0)
+		return err;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(tb[NDA_LLADDR]);
+	if (!is_valid_ether_addr(addr)) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ether address\n");
+		return -EINVAL;
+	}
+
+	err = -EOPNOTSUPP;
+
+	/* Support fdb on master device the net/bridge default case */
+	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
+	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
+		master = dev->master;
+		err = master->netdev_ops->ndo_fdb_add(ndm, dev, addr,
+						      nlh->nlmsg_flags);
+		if (err)
+			goto out;
+		else
+			ndm->ndm_flags &= ~NTF_MASTER;
+	}
+
+	/* Embedded bridge, macvlan, and any other device support */
+	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_add) {
+		err = dev->netdev_ops->ndo_fdb_add(ndm, dev, addr,
+						   nlh->nlmsg_flags);
+
+		if (!err)
+			ndm->ndm_flags &= ~NTF_SELF;
+	}
+out:
+	return err;
+}
+
+static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ndmsg *ndm;
+	struct nlattr *llattr;
+	struct net_device *dev;
+	int err = -EINVAL;
+	__u8 *addr;
+
+	if (nlmsg_len(nlh) < sizeof(*ndm))
+		return -EINVAL;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("PF_BRIDGE: RTM_DELNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
+	if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
+		pr_info("PF_BRIGDE: RTM_DELNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(llattr);
+	err = -EOPNOTSUPP;
+
+	/* Support fdb on master device the net/bridge default case */
+	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
+	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
+		struct net_device *master = dev->master;
+
+		if (master->netdev_ops->ndo_fdb_del)
+			err = master->netdev_ops->ndo_fdb_del(ndm, dev, addr);
+
+		if (err)
+			goto out;
+		else
+			ndm->ndm_flags &= ~NTF_MASTER;
+	}
+
+	/* Embedded bridge, macvlan, and any other device support */
+	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_del) {
+		err = dev->netdev_ops->ndo_fdb_del(ndm, dev, addr);
+
+		if (!err)
+			ndm->ndm_flags &= ~NTF_SELF;
+	}
+out:
+	return err;
+}
+
+static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	int idx = 0;
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		if (dev->priv_flags & IFF_BRIDGE_PORT) {
+			struct net_device *master = dev->master;
+			const struct net_device_ops *ops = master->netdev_ops;
+
+			if (ops->ndo_fdb_dump)
+				idx = ops->ndo_fdb_dump(skb, cb, dev, idx);
+		}
+
+		if (dev->netdev_ops->ndo_fdb_dump)
+			idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, dev, idx);
+	}
+	rcu_read_unlock();
+
+	cb->args[0] = idx;
+	return skb->len;
+}
+
 /* Protected by RTNL sempahore.  */
 static struct rtattr **rta_buf;
 static int rtattr_max;
@@ -2150,5 +2298,9 @@ void __init rtnetlink_init(void)
 
 	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, NULL);
+
+	rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, NULL);
+	rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, NULL);
+	rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, rtnl_fdb_dump, NULL);
 }
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
  2012-04-12 17:06 ` [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
@ 2012-04-12 17:06 ` John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 3/8] net: add fdb generic dump routine John Fastabend
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:06 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds a dev_uc_add_excl() and dev_mc_add_excl() calls
similar to the original dev_{uc|mc}_add() except it sets
the global bit and returns -EEXIST for duplicat entires.

This is useful for drivers that support SR-IOV, macvlan
devices and any other devices that need to manage the
unicast and multicast lists.

v2: fix typo UNICAST should be MULTICAST in dev_mc_add_excl()

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/netdevice.h |    2 +
 net/core/dev_addr_lists.c |   97 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7600c61..3f738ca 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2569,6 +2569,7 @@ extern int dev_addr_init(struct net_device *dev);
 
 /* Functions used for unicast addresses handling */
 extern int dev_uc_add(struct net_device *dev, unsigned char *addr);
+extern int dev_uc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_sync(struct net_device *to, struct net_device *from);
 extern void dev_uc_unsync(struct net_device *to, struct net_device *from);
@@ -2578,6 +2579,7 @@ extern void dev_uc_init(struct net_device *dev);
 /* Functions used for multicast addresses handling */
 extern int dev_mc_add(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_add_global(struct net_device *dev, unsigned char *addr);
+extern int dev_mc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_del_global(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_sync(struct net_device *to, struct net_device *from);
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 626698f..c4cc2bc 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -21,12 +21,35 @@
  * General list handling functions
  */
 
+static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
+			       unsigned char *addr, int addr_len,
+			       unsigned char addr_type, bool global)
+{
+	struct netdev_hw_addr *ha;
+	int alloc_size;
+
+	alloc_size = sizeof(*ha);
+	if (alloc_size < L1_CACHE_BYTES)
+		alloc_size = L1_CACHE_BYTES;
+	ha = kmalloc(alloc_size, GFP_ATOMIC);
+	if (!ha)
+		return -ENOMEM;
+	memcpy(ha->addr, addr, addr_len);
+	ha->type = addr_type;
+	ha->refcount = 1;
+	ha->global_use = global;
+	ha->synced = false;
+	list_add_tail_rcu(&ha->list, &list->list);
+	list->count++;
+
+	return 0;
+}
+
 static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
 			    unsigned char *addr, int addr_len,
 			    unsigned char addr_type, bool global)
 {
 	struct netdev_hw_addr *ha;
-	int alloc_size;
 
 	if (addr_len > MAX_ADDR_LEN)
 		return -EINVAL;
@@ -46,21 +69,7 @@ static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
 		}
 	}
 
-
-	alloc_size = sizeof(*ha);
-	if (alloc_size < L1_CACHE_BYTES)
-		alloc_size = L1_CACHE_BYTES;
-	ha = kmalloc(alloc_size, GFP_ATOMIC);
-	if (!ha)
-		return -ENOMEM;
-	memcpy(ha->addr, addr, addr_len);
-	ha->type = addr_type;
-	ha->refcount = 1;
-	ha->global_use = global;
-	ha->synced = false;
-	list_add_tail_rcu(&ha->list, &list->list);
-	list->count++;
-	return 0;
+	return __hw_addr_create_ex(list, addr, addr_len, addr_type, global);
 }
 
 static int __hw_addr_add(struct netdev_hw_addr_list *list, unsigned char *addr,
@@ -377,6 +386,34 @@ EXPORT_SYMBOL(dev_addr_del_multiple);
  */
 
 /**
+ *	dev_uc_add_excl - Add a global secondary unicast address
+ *	@dev: device
+ *	@addr: address to add
+ */
+int dev_uc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+
+	netif_addr_lock_bh(dev);
+	list_for_each_entry(ha, &dev->uc.list, list) {
+		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		    ha->type == NETDEV_HW_ADDR_T_UNICAST) {
+			err = -EEXIST;
+			goto out;
+		}
+	}
+	err = __hw_addr_create_ex(&dev->uc, addr, dev->addr_len,
+				  NETDEV_HW_ADDR_T_UNICAST, true);
+	if (!err)
+		__dev_set_rx_mode(dev);
+out:
+	netif_addr_unlock_bh(dev);
+	return err;
+}
+EXPORT_SYMBOL(dev_uc_add_excl);
+
+/**
  *	dev_uc_add - Add a secondary unicast address
  *	@dev: device
  *	@addr: address to add
@@ -501,6 +538,34 @@ EXPORT_SYMBOL(dev_uc_init);
  * Multicast list handling functions
  */
 
+/**
+ *	dev_mc_add_excl - Add a global secondary multicast address
+ *	@dev: device
+ *	@addr: address to add
+ */
+int dev_mc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+
+	netif_addr_lock_bh(dev);
+	list_for_each_entry(ha, &dev->mc.list, list) {
+		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		    ha->type == NETDEV_HW_ADDR_T_MULTICAST) {
+			err = -EEXIST;
+			goto out;
+		}
+	}
+	err = __hw_addr_create_ex(&dev->mc, addr, dev->addr_len,
+				  NETDEV_HW_ADDR_T_MULTICAST, true);
+	if (!err)
+		__dev_set_rx_mode(dev);
+out:
+	netif_addr_unlock_bh(dev);
+	return err;
+}
+EXPORT_SYMBOL(dev_mc_add_excl);
+
 static int __dev_mc_add(struct net_device *dev, unsigned char *addr,
 			bool global)
 {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 3/8] net: add fdb generic dump routine
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
  2012-04-12 17:06 ` [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
  2012-04-12 17:06 ` [net-next PATCH v3 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  2012-04-12 20:20   ` Ben Hutchings
  2012-04-12 17:07 ` [net-next PATCH v3 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds a generic dump routine drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should be most SR-IOV
enabled NICs.

v2: return error on nlmsg_put and use -EMSGSIZE instead
    of -ENOMEM this is inline other usages

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/rtnetlink.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ec09d5..347c420 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1980,6 +1980,37 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
+static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
+				   struct net_device *dev,
+				   u8 *addr, u32 pid, u32 seq,
+				   int type, unsigned int flags)
+{
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+
+	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndm), NLM_F_MULTI);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	ndm = nlmsg_data(nlh);
+	ndm->ndm_family  = AF_BRIDGE;
+	ndm->ndm_pad1	 = 0;
+	ndm->ndm_pad2    = 0;
+	ndm->ndm_flags	 = flags;
+	ndm->ndm_type	 = 0;
+	ndm->ndm_ifindex = dev->ifindex;
+	ndm->ndm_state   = NUD_PERMANENT;
+
+	if (nla_put(skb, NDA_LLADDR, ETH_ALEN, addr))
+		goto nla_put_failure;
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
 static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -2101,6 +2132,59 @@ out:
 	return err;
 }
 
+static int nlmsg_populate_fdb(struct sk_buff *skb,
+			      struct netlink_callback *cb,
+			      struct net_device *dev,
+			      int *idx,
+			      struct netdev_hw_addr_list *list)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+	u32 pid, seq;
+
+	pid = NETLINK_CB(cb->skb).pid;
+	seq = cb->nlh->nlmsg_seq;
+
+	list_for_each_entry(ha, &list->list, list) {
+		if (*idx < cb->args[0])
+			goto skip;
+
+		err = nlmsg_populate_fdb_fill(skb, dev, ha->addr,
+					      pid, seq, 0, NTF_SELF);
+		if (err < 0)
+			break;
+skip:
+		*idx += 1;
+	}
+	return 0;
+}
+
+/**
+ * ndo_dflt_fdb_dump: default netdevice operation to dump an FDB table.
+ * @nlh: netlink message header
+ * @dev: netdevice
+ *
+ * Default netdevice operation to dump the existing unicast address list.
+ * Returns zero on success.
+ */
+int ndo_dflt_fdb_dump(struct sk_buff *skb,
+		      struct netlink_callback *cb,
+		      struct net_device *dev,
+		      int idx)
+{
+	int err;
+
+	netif_addr_lock_bh(dev);
+	err = nlmsg_populate_fdb(skb, cb, dev, &idx, &dev->uc);
+	if (err)
+		goto out;
+	nlmsg_populate_fdb(skb, cb, dev, &idx, &dev->mc);
+out:
+	netif_addr_unlock_bh(dev);
+	return idx;
+}
+EXPORT_SYMBOL(ndo_dflt_fdb_dump);
+
 static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx = 0;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (2 preceding siblings ...)
  2012-04-12 17:07 ` [net-next PATCH v3 3/8] net: add fdb generic dump routine John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 5/8] ixgbe: enable FDB netdevice ops John Fastabend
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

It is useful to be able to monitor for FDB events in user space.
This patch adds support to generate netlink events when a change
is made to a device supporting the FDB ops.

This brings embedded switches inline with the SW net/bridge which
triggers events on FDB updates as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/rtnetlink.c |   35 +++++++++++++++++++++++++++++++++--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 347c420..4b7ed7b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2011,6 +2011,33 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static inline size_t rtnl_fdb_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN);
+}
+
+static void rtnl_fdb_notify(struct net_device *dev, u8 *addr, int type)
+{
+	struct net *net = dev_net(dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(rtnl_fdb_nlmsg_size(), GFP_ATOMIC);
+	if (!skb)
+		goto errout;
+
+	err = nlmsg_populate_fdb_fill(skb, dev, addr, 0, 0, type, NTF_SELF);
+	if (err < 0) {
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+	return;
+errout:
+	rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
+}
+
 static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -2067,8 +2094,10 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		err = dev->netdev_ops->ndo_fdb_add(ndm, dev, addr,
 						   nlh->nlmsg_flags);
 
-		if (!err)
+		if (!err) {
+			rtnl_fdb_notify(dev, addr, RTM_NEWNEIGH);
 			ndm->ndm_flags &= ~NTF_SELF;
+		}
 	}
 out:
 	return err;
@@ -2125,8 +2154,10 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_del) {
 		err = dev->netdev_ops->ndo_fdb_del(ndm, dev, addr);
 
-		if (!err)
+		if (!err) {
+			rtnl_fdb_notify(dev, addr, RTM_DELNEIGH);
 			ndm->ndm_flags &= ~NTF_SELF;
+		}
 	}
 out:
 	return err;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 5/8] ixgbe: enable FDB netdevice ops
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (3 preceding siblings ...)
  2012-04-12 17:07 ` [net-next PATCH v3 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

Enable FDB ops on ixgbe when in SR-IOV mode.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   71 +++++++++++++++++++++++++
 1 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3e26b1f..8b37395 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6681,6 +6681,74 @@ static int ixgbe_set_features(struct net_device *netdev,
 	return 0;
 }
 
+static int ixgbe_ndo_fdb_add(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr,
+			     u16 flags)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+		if (is_unicast_ether_addr(addr))
+			err = dev_uc_add_excl(dev, addr);
+		else if (is_multicast_ether_addr(addr))
+			err = dev_mc_add_excl(dev, addr);
+		else
+			err = -EINVAL;
+	}
+
+	/* Only return duplicate errors if NLM_F_EXCL is set */
+	if (err == -EEXIST && !(flags & NLM_F_EXCL))
+		err = 0;
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_del(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+		if (is_unicast_ether_addr(addr))
+			err = dev_uc_del(dev, addr);
+		else if (is_multicast_ether_addr(addr))
+			err = dev_mc_del(dev, addr);
+		else
+			err = -EINVAL;
+	}
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
+			      struct netlink_callback *cb,
+			      struct net_device *dev,
+			      int idx)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		idx = ndo_dflt_fdb_dump(skb, cb, dev, idx);
+
+	return idx;
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -6717,6 +6785,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #endif /* IXGBE_FCOE */
 	.ndo_set_features = ixgbe_set_features,
 	.ndo_fix_features = ixgbe_fix_features,
+	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
+	.ndo_fdb_del		= ixgbe_ndo_fdb_del,
+	.ndo_fdb_dump		= ixgbe_ndo_fdb_dump,
 };
 
 static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 6/8] ixgbe: allow RAR table to be updated in promisc mode
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (4 preceding siblings ...)
  2012-04-12 17:07 ` [net-next PATCH v3 5/8] ixgbe: enable FDB netdevice ops John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This allows RAR table updates while in promiscuous. With
SR-IOV enabled it is valuable to allow the RAR table to
be updated even when in promisc mode to configure forwarding

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 8b37395..25a7ed9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3462,16 +3462,17 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		}
 		ixgbe_vlan_filter_enable(adapter);
 		hw->addr_ctrl.user_set_promisc = false;
-		/*
-		 * Write addresses to available RAR registers, if there is not
-		 * sufficient space to store all the addresses then enable
-		 * unicast promiscuous mode
-		 */
-		count = ixgbe_write_uc_addr_list(netdev);
-		if (count < 0) {
-			fctrl |= IXGBE_FCTRL_UPE;
-			vmolr |= IXGBE_VMOLR_ROPE;
-		}
+	}
+
+	/*
+	 * Write addresses to available RAR registers, if there is not
+	 * sufficient space to store all the addresses then enable
+	 * unicast promiscuous mode
+	 */
+	count = ixgbe_write_uc_addr_list(netdev);
+	if (count < 0) {
+		fctrl |= IXGBE_FCTRL_UPE;
+		vmolr |= IXGBE_VMOLR_ROPE;
 	}
 
 	if (adapter->num_vfs) {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 7/8] ixgbe: UTA table incorrectly programmed
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (5 preceding siblings ...)
  2012-04-12 17:07 ` [net-next PATCH v3 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  2012-04-12 17:07 ` [net-next PATCH v3 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

From: Greg Rose <gregory.v.rose@intel.com>

The UTA table was being set to the functional equivalent of promiscuous
mode.  This was resulting in traffic from the virtual function being
flooded onto the wire and the PF device. This resulted in additional
overhead for VF traffic sent to the network and in the case of traffic
sent to the PF or another VF resulted in unwanted packets on the wire.

This was actually not the intended behavior. Now that we can program
the embedded switch correctly we can remove this snippit of code. Users
who want to support this should configure the FDB correctly using the
FDB ops.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   29 -------------------------
 1 files changed, 0 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25a7ed9..10606bd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2904,33 +2904,6 @@ static void ixgbe_configure_rscctl(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(reg_idx), rscctrl);
 }
 
-/**
- *  ixgbe_set_uta - Set unicast filter table address
- *  @adapter: board private structure
- *
- *  The unicast table address is a register array of 32-bit registers.
- *  The table is meant to be used in a way similar to how the MTA is used
- *  however due to certain limitations in the hardware it is necessary to
- *  set all the hash bits to 1 and use the VMOLR ROPE bit as a promiscuous
- *  enable bit to allow vlan tag stripping when promiscuous mode is enabled
- **/
-static void ixgbe_set_uta(struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	int i;
-
-	/* The UTA table only exists on 82599 hardware and newer */
-	if (hw->mac.type < ixgbe_mac_82599EB)
-		return;
-
-	/* we only need to do this if VMDq is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
-		return;
-
-	for (i = 0; i < 128; i++)
-		IXGBE_WRITE_REG(hw, IXGBE_UTA(i), ~0);
-}
-
 #define IXGBE_MAX_RX_DESC_POLL 10
 static void ixgbe_rx_desc_queue_enable(struct ixgbe_adapter *adapter,
 				       struct ixgbe_ring *ring)
@@ -3224,8 +3197,6 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter)
 	/* Program registers for the distribution of queues */
 	ixgbe_setup_mrqc(adapter);
 
-	ixgbe_set_uta(adapter);
-
 	/* set_rx_buffer_len must be called before ring initialization */
 	ixgbe_set_rx_buffer_len(adapter);
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [net-next PATCH v3 8/8] macvlan: add FDB bridge ops and macvlan flags
  2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (6 preceding siblings ...)
  2012-04-12 17:07 ` [net-next PATCH v3 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
@ 2012-04-12 17:07 ` John Fastabend
  7 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-12 17:07 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings, sri
  Cc: hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds FDB bridge ops to the macvlan device passthru mode.
Additionally a flags field was added and a NOPROMISC bit to
allow users to use passthru mode without the driver calling
dev_set_promiscuity(). The flags field is a u16 placed in a
4 byte hole (consuming 2 bytes) of the macvlan_dev struct.

We want to do this so that the macvlan driver or stack
above the macvlan driver does not have to process every
packet. For the use case where we know all the MAC addresses
of the endstations above us this works well.

This patch is a result of Roopa Prabhu's work. Follow up
patches are needed for VEPA and VEB macvlan modes.

v2: Change from distinct nopromisc mode to a flags field to
    configure this. This avoids the tendency to add a new
    mode every time we need some slightly different behavior.
v3: fix error in dev_set_promiscuity and add change and get
    link attributes for flags.

CC: Roopa Prabhu <roprabhu@cisco.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/macvlan.c      |   73 ++++++++++++++++++++++++++++++++++++++++----
 include/linux/if_link.h    |    3 ++
 include/linux/if_macvlan.h |    1 +
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b17fc90..9653ed6 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -312,7 +312,8 @@ static int macvlan_open(struct net_device *dev)
 	int err;
 
 	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, 1);
+		if (!(vlan->flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(lowerdev, 1);
 		goto hash_add;
 	}
 
@@ -344,12 +345,15 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
+	dev_uc_unsync(lowerdev, dev);
+	dev_mc_unsync(lowerdev, dev);
+
 	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, -1);
+		if (!(vlan->flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(lowerdev, -1);
 		goto hash_del;
 	}
 
-	dev_mc_unsync(lowerdev, dev);
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, -1);
 
@@ -399,10 +403,11 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change)
 		dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1);
 }
 
-static void macvlan_set_multicast_list(struct net_device *dev)
+static void macvlan_set_mac_lists(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	dev_uc_sync(vlan->lowerdev, dev);
 	dev_mc_sync(vlan->lowerdev, dev);
 }
 
@@ -542,6 +547,43 @@ static int macvlan_vlan_rx_kill_vid(struct net_device *dev,
 	return 0;
 }
 
+static int macvlan_fdb_add(struct ndmsg *ndm,
+			   struct net_device *dev,
+			   unsigned char *addr,
+			   u16 flags)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	if (!vlan->port->passthru)
+		return -EOPNOTSUPP;
+
+	if (is_unicast_ether_addr(addr))
+		err = dev_uc_add_excl(dev, addr);
+	else if (is_multicast_ether_addr(addr))
+		err = dev_mc_add_excl(dev, addr);
+
+	return err;
+}
+
+static int macvlan_fdb_del(struct ndmsg *ndm,
+			   struct net_device *dev,
+			   unsigned char *addr)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	if (!vlan->port->passthru)
+		return -EOPNOTSUPP;
+
+	if (is_unicast_ether_addr(addr))
+		err = dev_uc_del(dev, addr);
+	else if (is_multicast_ether_addr(addr))
+		err = dev_mc_del(dev, addr);
+
+	return err;
+}
+
 static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
 					struct ethtool_drvinfo *drvinfo)
 {
@@ -572,11 +614,14 @@ static const struct net_device_ops macvlan_netdev_ops = {
 	.ndo_change_mtu		= macvlan_change_mtu,
 	.ndo_change_rx_flags	= macvlan_change_rx_flags,
 	.ndo_set_mac_address	= macvlan_set_mac_address,
-	.ndo_set_rx_mode	= macvlan_set_multicast_list,
+	.ndo_set_rx_mode	= macvlan_set_mac_lists,
 	.ndo_get_stats64	= macvlan_dev_get_stats64,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_vlan_rx_add_vid	= macvlan_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= macvlan_vlan_rx_kill_vid,
+	.ndo_fdb_add		= macvlan_fdb_add,
+	.ndo_fdb_del		= macvlan_fdb_del,
+	.ndo_fdb_dump		= ndo_dflt_fdb_dump,
 };
 
 void macvlan_common_setup(struct net_device *dev)
@@ -711,6 +756,9 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	if (data && data[IFLA_MACVLAN_MODE])
 		vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
 
+	if (data && data[IFLA_MACVLAN_FLAGS])
+		vlan->flags = nla_get_u16(data[IFLA_MACVLAN_FLAGS]);
+
 	if (vlan->mode == MACVLAN_MODE_PASSTHRU) {
 		if (port->count)
 			return -EINVAL;
@@ -760,6 +808,16 @@ static int macvlan_changelink(struct net_device *dev,
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	if (data && data[IFLA_MACVLAN_MODE])
 		vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+	if (data && data[IFLA_MACVLAN_FLAGS]) {
+		__u16 flags = nla_get_u16(data[IFLA_MACVLAN_FLAGS]);
+		bool promisc = (flags ^ vlan->flags) & MACVLAN_FLAG_NOPROMISC;
+
+		if (promisc && (flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(vlan->lowerdev, -1);
+		else if (promisc && !(flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(vlan->lowerdev, 1);
+		vlan->flags = flags;
+	}
 	return 0;
 }
 
@@ -775,6 +833,8 @@ static int macvlan_fill_info(struct sk_buff *skb,
 
 	if (nla_put_u32(skb, IFLA_MACVLAN_MODE, vlan->mode))
 		goto nla_put_failure;
+	if (nla_put_u16(skb, IFLA_MACVLAN_FLAGS, vlan->flags))
+		goto nla_put_failure;
 	return 0;
 
 nla_put_failure:
@@ -782,7 +842,8 @@ nla_put_failure:
 }
 
 static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
-	[IFLA_MACVLAN_MODE] = { .type = NLA_U32 },
+	[IFLA_MACVLAN_MODE]  = { .type = NLA_U32 },
+	[IFLA_MACVLAN_FLAGS] = { .type = NLA_U16 },
 };
 
 int macvlan_link_register(struct rtnl_link_ops *ops)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 2f4fa93..f715750 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -255,6 +255,7 @@ struct ifla_vlan_qos_mapping {
 enum {
 	IFLA_MACVLAN_UNSPEC,
 	IFLA_MACVLAN_MODE,
+	IFLA_MACVLAN_FLAGS,
 	__IFLA_MACVLAN_MAX,
 };
 
@@ -267,6 +268,8 @@ enum macvlan_mode {
 	MACVLAN_MODE_PASSTHRU = 8,/* take over the underlying device */
 };
 
+#define MACVLAN_FLAG_NOPROMISC	1
+
 /* SR-IOV virtual function management section */
 
 enum {
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..f65e8d2 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -60,6 +60,7 @@ struct macvlan_dev {
 	struct net_device	*lowerdev;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 	enum macvlan_mode	mode;
+	u16			flags;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next PATCH v3 3/8] net: add fdb generic dump routine
  2012-04-12 17:07 ` [net-next PATCH v3 3/8] net: add fdb generic dump routine John Fastabend
@ 2012-04-12 20:20   ` Ben Hutchings
  2012-04-15 16:05     ` John Fastabend
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Hutchings @ 2012-04-12 20:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, mst, davem, sri, hadi, jeffrey.t.kirsher, netdev,
	gregory.v.rose, krkumar2

On Thu, 2012-04-12 at 10:07 -0700, John Fastabend wrote:
> This adds a generic dump routine drivers can call. It
> should be sufficient to handle any bridging model that
> uses the unicast address list. This should be most SR-IOV
> enabled NICs.
> 
> v2: return error on nlmsg_put and use -EMSGSIZE instead
>     of -ENOMEM this is inline other usages

It's still not propagated up to ndo_dflt_fdb_dump() though:

[...]
> +static int nlmsg_populate_fdb(struct sk_buff *skb,
> +			      struct netlink_callback *cb,
> +			      struct net_device *dev,
> +			      int *idx,
> +			      struct netdev_hw_addr_list *list)
> +{
> +	struct netdev_hw_addr *ha;
> +	int err;
> +	u32 pid, seq;
> +
> +	pid = NETLINK_CB(cb->skb).pid;
> +	seq = cb->nlh->nlmsg_seq;
> +
> +	list_for_each_entry(ha, &list->list, list) {
> +		if (*idx < cb->args[0])
> +			goto skip;
> +
> +		err = nlmsg_populate_fdb_fill(skb, dev, ha->addr,
> +					      pid, seq, 0, NTF_SELF);
> +		if (err < 0)
> +			break;
			return err;
> +skip:
> +		*idx += 1;
> +	}
> +	return 0;
> +}
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks
  2012-04-12 17:06 ` [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
@ 2012-04-12 20:27   ` Ben Hutchings
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2012-04-12 20:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, mst, davem, sri, hadi, jeffrey.t.kirsher, netdev,
	gregory.v.rose, krkumar2

On Thu, 2012-04-12 at 10:06 -0700, John Fastabend wrote:
[...]
> There is a slight complication in the case with both flags set
> when an error occurs. To resolve this the rtnl handler clears
> the NTF_ flag in the netlink ack to indicate which sets completed
> successfully. The add/del handlers will abort as soon as any
> error occurs.
[...]

Yes, this should work nicely.  Presumably these new flags are completely
ignored by the current implementation of PF_BRIDGE, and new userland
will also be able to use the flags in the response to detect the old
implementation.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [net-next PATCH v3 3/8] net: add fdb generic dump routine
  2012-04-12 20:20   ` Ben Hutchings
@ 2012-04-15 16:05     ` John Fastabend
  0 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2012-04-15 16:05 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: shemminger, mst, davem, sri, hadi, jeffrey.t.kirsher, netdev,
	gregory.v.rose, krkumar2

On 4/12/2012 1:20 PM, Ben Hutchings wrote:
> On Thu, 2012-04-12 at 10:07 -0700, John Fastabend wrote:
>> This adds a generic dump routine drivers can call. It
>> should be sufficient to handle any bridging model that
>> uses the unicast address list. This should be most SR-IOV
>> enabled NICs.
>>
>> v2: return error on nlmsg_put and use -EMSGSIZE instead
>>     of -ENOMEM this is inline other usages
> 
> It's still not propagated up to ndo_dflt_fdb_dump() though:
> 

Thanks Ben. I'll fix this in v4.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-04-15 16:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-12 17:06 [net-next PATCH v3 0/8] Managing the forwarding database(FDB) John Fastabend
2012-04-12 17:06 ` [net-next PATCH v3 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
2012-04-12 20:27   ` Ben Hutchings
2012-04-12 17:06 ` [net-next PATCH v3 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 3/8] net: add fdb generic dump routine John Fastabend
2012-04-12 20:20   ` Ben Hutchings
2012-04-15 16:05     ` John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 5/8] ixgbe: enable FDB netdevice ops John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
2012-04-12 17:07 ` [net-next PATCH v3 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.