* [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
@ 2012-04-15 16:43 John Fastabend
  2012-04-15 16:43 ` [net-next PATCH v4 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:43 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

The following series is a submission for net-next to allow
embedded switches and other stacked devices other than the
Linux bridge to manage a forwarding database.

Previously discussed here,

http://lists.openwall.net/netdev/2012/03/19/26

v4: propagate return codes correctly for ndo_dflt_fdb_dump()

v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
    error and add the flags field to change and get link routines.

v2: addressed feedback from Ben Hutchings resolving a typo in the
    multicast add/del routines and improving the error handling
    when both NTF_SELF and NTF_MASTER are set.

I've tested this with the 'br' tool published by Stephen Hemminger
(soon to be renamed 'bridge', I believe) and various traffic
generators, mostly pktgen, ping, and netperf.

Thanks for the feedback; any comments are welcome.
John

---

Greg Rose (1):
      ixgbe: UTA table incorrectly programmed

John Fastabend (7):
      macvlan: add FDB bridge ops and macvlan flags
      ixgbe: allow RAR table to be updated in promisc mode
      ixgbe: enable FDB netdevice ops
      net: rtnetlink notify events for FDB NTF_SELF adds and deletes
      net: add fdb generic dump routine
      net: addr_list: add exclusive dev_uc_add and dev_mc_add
      net: add generic PF_BRIDGE:RTM_ FDB hooks


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  121 ++++++++---
 drivers/net/macvlan.c                         |   73 ++++++-
 include/linux/if_link.h                       |    3 
 include/linux/if_macvlan.h                    |    1 
 include/linux/neighbour.h                     |    3 
 include/linux/netdevice.h                     |   25 ++
 include/linux/rtnetlink.h                     |    4 
 net/bridge/br_device.c                        |    3 
 net/bridge/br_fdb.c                           |  128 +++---------
 net/bridge/br_netlink.c                       |   12 -
 net/bridge/br_private.h                       |   15 +
 net/core/dev_addr_lists.c                     |   97 ++++++++-
 net/core/rtnetlink.c                          |  267 +++++++++++++++++++++++++
 13 files changed, 579 insertions(+), 173 deletions(-)

-- 
Signature


* [net-next PATCH v4 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
@ 2012-04-15 16:43 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:43 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds two new flags, NTF_MASTER and NTF_SELF, that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the Linux net/bridge
or an open-vswitch device. Without any flags set, the command
is also handled by the master device, so that current user
space tools continue to work as expected.

The NTF_SELF flag pushes the PF_BRIDGE commands to the device
itself. In the basic example below, the commands are then parsed
and programmed into the embedded bridge.

Note that if both the NTF_SELF and NTF_MASTER bits are set, the
command is sent to both 'dev->master' and 'dev'. This allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication when both flags are set and an
error occurs. To resolve this, the rtnl handler clears the
corresponding NTF_ flag in the netlink ack to indicate which
operations completed successfully. The add/del handlers abort as
soon as any error occurs.
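
As a rough illustration of the dispatch and flag-clearing rule just
described, here is a user-space model. It is not part of the patch:
struct model_dev, fdb_add_fn and the model_* helpers are hypothetical
stand-ins for the kernel structures, and the NULL check on the master
callback is an extra precaution of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as added to include/linux/neighbour.h by this patch. */
#define NTF_SELF	0x02
#define NTF_MASTER	0x04

/* Hypothetical stand-in for an ndo_fdb_add callback: 0 or -errno. */
typedef int (*fdb_add_fn)(const unsigned char *addr);

/* Hypothetical, heavily reduced model of a port device. */
struct model_dev {
	int is_bridge_port;	/* models dev->priv_flags & IFF_BRIDGE_PORT */
	fdb_add_fn master_add;	/* models dev->master->netdev_ops->ndo_fdb_add */
	fdb_add_fn self_add;	/* models dev->netdev_ops->ndo_fdb_add */
};

/*
 * Dispatch an add request. A flag is cleared in *flags only once the
 * matching target has accepted the entry, so a flag that is still set
 * on return marks a target that did not complete.
 */
static int model_fdb_add(struct model_dev *dev, unsigned char *flags,
			 const unsigned char *addr)
{
	int err = -EOPNOTSUPP;

	assert(dev && flags && addr);

	/* No flags at all, or NTF_MASTER: hand the request to the master. */
	if ((!*flags || (*flags & NTF_MASTER)) &&
	    dev->is_bridge_port && dev->master_add) {
		err = dev->master_add(addr);
		if (err)
			return err;	/* abort on the first error */
		*flags &= (unsigned char)~NTF_MASTER;
	}

	/* NTF_SELF: hand the request to the device itself. */
	if ((*flags & NTF_SELF) && dev->self_add) {
		err = dev->self_add(addr);
		if (!err)
			*flags &= (unsigned char)~NTF_SELF;
	}

	return err;
}

/* Trivial callbacks for experimenting with the model. */
static int model_add_ok(const unsigned char *addr)
{
	(void)addr;
	return 0;
}

static int model_add_fail(const unsigned char *addr)
{
	(void)addr;
	return -ENOSPC;
}
```

With both flags set, a succeeding master and a failing device, the
model returns the device's error and leaves only NTF_SELF set, which
tells the caller that the master update went through.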

To support this, new net device ops were added to call into
the device, and the existing bridging code was refactored
to use them. No changes should be required in user space
to support the current bridge behavior.

A basic setup with an SR-IOV enabled NIC looks like this:

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present, drivers managing
the embedded bridge either send frames onto the network, where they
get dropped by the switch, or the embedded bridge floods these
frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV, but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Example session using the 'br'[1] tool to add, dump, and then
delete a MAC address with a new "embedded" option and an ixgbe
driver with the FDB ops enabled:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
# br fdb add 22:35:19:ac:60:59 embedded dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
# br fdb del 22:35:19:ac:60:59 embedded dev eth3

All I added was a couple of lines to 'br' to set the flags correctly.
In my opinion, the merit of this patch is that embedded and SW
bridges can now both be modeled correctly in user space using very
nearly the same message passing.
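
For readers who want to see what "setting the flags" amounts to on
the wire, here is a sketch of how a user space tool might build such
a request. It is not the actual 'br' code: struct fdb_request,
fdb_build_add() and the choice of nlmsg_flags are assumptions made
for illustration, and the message is only built, not sent.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Fallback for headers that predate this series. */
#ifndef NTF_SELF
#define NTF_SELF 0x02
#endif

#define FDB_ETH_ALEN 6

/* Hypothetical request layout: header, ndmsg, then room for attributes. */
struct fdb_request {
	struct nlmsghdr nlh;
	struct ndmsg ndm;
	char attrs[64];
};

/*
 * Fill req with a PF_BRIDGE RTM_NEWNEIGH request for mac on ifindex.
 * ntf_flags selects the target: 0 or NTF_MASTER for the master device,
 * NTF_SELF for the device itself, or both bits together.
 */
static void fdb_build_add(struct fdb_request *req, int ifindex,
			  const unsigned char *mac, unsigned char ntf_flags)
{
	struct rtattr *rta;

	assert(req && mac);
	memset(req, 0, sizeof(*req));

	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
	req->nlh.nlmsg_type = RTM_NEWNEIGH;
	/* Assumed flag choice for a "create, fail if present" request. */
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;

	req->ndm.ndm_family = AF_BRIDGE;
	req->ndm.ndm_ifindex = ifindex;
	req->ndm.ndm_state = NUD_PERMANENT;
	req->ndm.ndm_flags = ntf_flags;

	/* Append the NDA_LLADDR attribute right after the ndmsg. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nlh.nlmsg_len));
	rta->rta_type = NDA_LLADDR;
	rta->rta_len = RTA_LENGTH(FDB_ETH_ALEN);
	memcpy(RTA_DATA(rta), mac, FDB_ETH_ALEN);
	req->nlh.nlmsg_len = NLMSG_ALIGN(req->nlh.nlmsg_len) +
			     RTA_LENGTH(FDB_ETH_ALEN);
}
```

The only difference between a request for the software bridge and one
for the embedded bridge is the value passed as ntf_flags; the rest of
the message is identical.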

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/neighbour.h |    3 +
 include/linux/netdevice.h |   23 +++++++
 include/linux/rtnetlink.h |    4 +
 net/bridge/br_device.c    |    3 +
 net/bridge/br_fdb.c       |  128 +++++++++-----------------------------
 net/bridge/br_netlink.c   |   12 ----
 net/bridge/br_private.h   |   15 ++++
 net/core/rtnetlink.c      |  152 +++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 228 insertions(+), 112 deletions(-)

diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h
index b188f68..275e5d6 100644
--- a/include/linux/neighbour.h
+++ b/include/linux/neighbour.h
@@ -33,6 +33,9 @@ enum {
 #define NTF_PROXY	0x08	/* == ATF_PUBL */
 #define NTF_ROUTER	0x80
 
+#define NTF_SELF	0x02
+#define NTF_MASTER	0x04
+
 /*
  *	Neighbor Cache Entry States.
  */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbaa20..7600c61 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -54,6 +54,7 @@
 #include <net/netprio_cgroup.h>
 
 #include <linux/netdev_features.h>
+#include <linux/neighbour.h>
 
 struct netpoll_info;
 struct device;
@@ -905,6 +906,16 @@ struct netdev_fcoe_hbainfo {
  *	feature set might be less than what was returned by ndo_fix_features()).
  *	Must return >0 or -errno if it changed dev->features itself.
  *
+ * int (*ndo_fdb_add)(struct ndmsg *ndm, struct net_device *dev,
+ *		      unsigned char *addr, u16 flags)
+ *	Adds an FDB entry to dev for addr.
+ * int (*ndo_fdb_del)(struct ndmsg *ndm, struct net_device *dev,
+ *		      unsigned char *addr)
+ *	Deletes the FDB entry from dev corresponding to addr.
+ * int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb,
+ *		       struct net_device *dev, int idx)
+ *	Used to add FDB entries to dump requests. Implementers should add
+ *	entries to skb and update idx with the number of entries.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1002,6 +1013,18 @@ struct net_device_ops {
 						    netdev_features_t features);
 	int			(*ndo_neigh_construct)(struct neighbour *n);
 	void			(*ndo_neigh_destroy)(struct neighbour *n);
+
+	int			(*ndo_fdb_add)(struct ndmsg *ndm,
+					       struct net_device *dev,
+					       unsigned char *addr,
+					       u16 flags);
+	int			(*ndo_fdb_del)(struct ndmsg *ndm,
+					       struct net_device *dev,
+					       unsigned char *addr);
+	int			(*ndo_fdb_dump)(struct sk_buff *skb,
+						struct netlink_callback *cb,
+						struct net_device *dev,
+						int idx);
 };
 
 /*
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 577592e..2c1de89 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -801,6 +801,10 @@ rtattr_failure:
 	return table;
 }
 
+extern int ndo_dflt_fdb_dump(struct sk_buff *skb,
+			     struct netlink_callback *cb,
+			     struct net_device *dev,
+			     int idx);
 #endif /* __KERNEL__ */
 
 
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index ba829de..d6e5929 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -317,6 +317,9 @@ static const struct net_device_ops br_netdev_ops = {
 	.ndo_add_slave		 = br_add_slave,
 	.ndo_del_slave		 = br_del_slave,
 	.ndo_fix_features        = br_fix_features,
+	.ndo_fdb_add		 = br_fdb_add,
+	.ndo_fdb_del		 = br_fdb_delete,
+	.ndo_fdb_dump		 = br_fdb_dump,
 };
 
 static void br_dev_free(struct net_device *dev)
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 80dbce4..5945c54 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -535,44 +535,38 @@ errout:
 }
 
 /* Dump information about entries, in response to GETNEIGH */
-int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+int br_fdb_dump(struct sk_buff *skb,
+		struct netlink_callback *cb,
+		struct net_device *dev,
+		int idx)
 {
-	struct net *net = sock_net(skb->sk);
-	struct net_device *dev;
-	int idx = 0;
-
-	rcu_read_lock();
-	for_each_netdev_rcu(net, dev) {
-		struct net_bridge *br = netdev_priv(dev);
-		int i;
-
-		if (!(dev->priv_flags & IFF_EBRIDGE))
-			continue;
+	struct net_bridge *br = netdev_priv(dev);
+	int i;
 
-		for (i = 0; i < BR_HASH_SIZE; i++) {
-			struct hlist_node *h;
-			struct net_bridge_fdb_entry *f;
+	if (!(dev->priv_flags & IFF_EBRIDGE))
+		goto out;
 
-			hlist_for_each_entry_rcu(f, h, &br->hash[i], hlist) {
-				if (idx < cb->args[0])
-					goto skip;
+	for (i = 0; i < BR_HASH_SIZE; i++) {
+		struct hlist_node *h;
+		struct net_bridge_fdb_entry *f;
 
-				if (fdb_fill_info(skb, br, f,
-						  NETLINK_CB(cb->skb).pid,
-						  cb->nlh->nlmsg_seq,
-						  RTM_NEWNEIGH,
-						  NLM_F_MULTI) < 0)
-					break;
+		hlist_for_each_entry_rcu(f, h, &br->hash[i], hlist) {
+			if (idx < cb->args[0])
+				goto skip;
+
+			if (fdb_fill_info(skb, br, f,
+					  NETLINK_CB(cb->skb).pid,
+					  cb->nlh->nlmsg_seq,
+					  RTM_NEWNEIGH,
+					  NLM_F_MULTI) < 0)
+				break;
 skip:
-				++idx;
-			}
+			++idx;
 		}
 	}
-	rcu_read_unlock();
-
-	cb->args[0] = idx;
 
-	return skb->len;
+out:
+	return idx;
 }
 
 /* Update (create or replace) forwarding database entry */
@@ -614,43 +608,11 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
 }
 
 /* Add new permanent fdb entry with RTM_NEWNEIGH */
-int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+int br_fdb_add(struct ndmsg *ndm, struct net_device *dev,
+	       unsigned char *addr, u16 nlh_flags)
 {
-	struct net *net = sock_net(skb->sk);
-	struct ndmsg *ndm;
-	struct nlattr *tb[NDA_MAX+1];
-	struct net_device *dev;
 	struct net_bridge_port *p;
-	const __u8 *addr;
-	int err;
-
-	ASSERT_RTNL();
-	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
-	if (err < 0)
-		return err;
-
-	ndm = nlmsg_data(nlh);
-	if (ndm->ndm_ifindex == 0) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid ifindex\n");
-		return -EINVAL;
-	}
-
-	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
-	if (dev == NULL) {
-		pr_info("bridge: RTM_NEWNEIGH with unknown ifindex\n");
-		return -ENODEV;
-	}
-
-	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid address\n");
-		return -EINVAL;
-	}
-
-	addr = nla_data(tb[NDA_LLADDR]);
-	if (!is_valid_ether_addr(addr)) {
-		pr_info("bridge: RTM_NEWNEIGH with invalid ether address\n");
-		return -EINVAL;
-	}
+	int err = 0;
 
 	if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
 		pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", ndm->ndm_state);
@@ -670,14 +632,14 @@ int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		rcu_read_unlock();
 	} else {
 		spin_lock_bh(&p->br->hash_lock);
-		err = fdb_add_entry(p, addr, ndm->ndm_state, nlh->nlmsg_flags);
+		err = fdb_add_entry(p, addr, ndm->ndm_state, nlh_flags);
 		spin_unlock_bh(&p->br->hash_lock);
 	}
 
 	return err;
 }
 
-static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
+static int fdb_delete_by_addr(struct net_bridge_port *p, u8 *addr)
 {
 	struct net_bridge *br = p->br;
 	struct hlist_head *head = &br->hash[br_mac_hash(addr)];
@@ -692,40 +654,12 @@ static int fdb_delete_by_addr(struct net_bridge_port *p, const u8 *addr)
 }
 
 /* Remove neighbor entry with RTM_DELNEIGH */
-int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+int br_fdb_delete(struct ndmsg *ndm, struct net_device *dev,
+		  unsigned char *addr)
 {
-	struct net *net = sock_net(skb->sk);
-	struct ndmsg *ndm;
 	struct net_bridge_port *p;
-	struct nlattr *llattr;
-	const __u8 *addr;
-	struct net_device *dev;
 	int err;
 
-	ASSERT_RTNL();
-	if (nlmsg_len(nlh) < sizeof(*ndm))
-		return -EINVAL;
-
-	ndm = nlmsg_data(nlh);
-	if (ndm->ndm_ifindex == 0) {
-		pr_info("bridge: RTM_DELNEIGH with invalid ifindex\n");
-		return -EINVAL;
-	}
-
-	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
-	if (dev == NULL) {
-		pr_info("bridge: RTM_DELNEIGH with unknown ifindex\n");
-		return -ENODEV;
-	}
-
-	llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
-	if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
-		pr_info("bridge: RTM_DELNEIGH with invalid address\n");
-		return -EINVAL;
-	}
-
-	addr = nla_data(llattr);
-
 	p = br_port_get_rtnl(dev);
 	if (p == NULL) {
 		pr_info("bridge: RTM_DELNEIGH %s not a bridge port\n",
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 346b368..1fa0535 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -232,18 +232,6 @@ int __init br_netlink_init(void)
 			      br_rtm_setlink, NULL, NULL);
 	if (err)
 		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_NEWNEIGH,
-			      br_fdb_add, NULL, NULL);
-	if (err)
-		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_DELNEIGH,
-			      br_fdb_delete, NULL, NULL);
-	if (err)
-		goto err3;
-	err = __rtnl_register(PF_BRIDGE, RTM_GETNEIGH,
-			      NULL, br_fdb_dump, NULL);
-	if (err)
-		goto err3;
 
 	return 0;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e1d8822..dd8a121 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -359,9 +359,18 @@ extern int br_fdb_insert(struct net_bridge *br,
 extern void br_fdb_update(struct net_bridge *br,
 			  struct net_bridge_port *source,
 			  const unsigned char *addr);
-extern int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb);
-extern int br_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
-extern int br_fdb_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
+
+extern int br_fdb_delete(struct ndmsg *ndm,
+			 struct net_device *dev,
+			 unsigned char *addr);
+extern int br_fdb_add(struct ndmsg *nlh,
+		      struct net_device *dev,
+		      unsigned char *addr,
+		      u16 nlh_flags);
+extern int br_fdb_dump(struct sk_buff *skb,
+		       struct netlink_callback *cb,
+		       struct net_device *dev,
+		       int idx);
 
 /* br_forward.c */
 extern void br_deliver(const struct net_bridge_port *to,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4a0d8cf..037f53c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -35,7 +35,9 @@
 #include <linux/security.h>
 #include <linux/mutex.h>
 #include <linux/if_addr.h>
+#include <linux/if_bridge.h>
 #include <linux/pci.h>
+#include <linux/etherdevice.h>
 
 #include <asm/uaccess.h>
 
@@ -1978,6 +1980,152 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
+static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *master = NULL;
+	struct ndmsg *ndm;
+	struct nlattr *tb[NDA_MAX+1];
+	struct net_device *dev;
+	u8 *addr;
+	int err;
+
+	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
+	if (err < 0)
+		return err;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	if (!tb[NDA_LLADDR] || nla_len(tb[NDA_LLADDR]) != ETH_ALEN) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(tb[NDA_LLADDR]);
+	if (!is_valid_ether_addr(addr)) {
+		pr_info("PF_BRIDGE: RTM_NEWNEIGH with invalid ether address\n");
+		return -EINVAL;
+	}
+
+	err = -EOPNOTSUPP;
+
+	/* Support fdb on master device the net/bridge default case */
+	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
+	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
+		master = dev->master;
+		err = master->netdev_ops->ndo_fdb_add(ndm, dev, addr,
+						      nlh->nlmsg_flags);
+		if (err)
+			goto out;
+		else
+			ndm->ndm_flags &= ~NTF_MASTER;
+	}
+
+	/* Embedded bridge, macvlan, and any other device support */
+	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_add) {
+		err = dev->netdev_ops->ndo_fdb_add(ndm, dev, addr,
+						   nlh->nlmsg_flags);
+
+		if (!err)
+			ndm->ndm_flags &= ~NTF_SELF;
+	}
+out:
+	return err;
+}
+
+static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ndmsg *ndm;
+	struct nlattr *llattr;
+	struct net_device *dev;
+	int err = -EINVAL;
+	__u8 *addr;
+
+	if (nlmsg_len(nlh) < sizeof(*ndm))
+		return -EINVAL;
+
+	ndm = nlmsg_data(nlh);
+	if (ndm->ndm_ifindex == 0) {
+		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid ifindex\n");
+		return -EINVAL;
+	}
+
+	dev = __dev_get_by_index(net, ndm->ndm_ifindex);
+	if (dev == NULL) {
+		pr_info("PF_BRIDGE: RTM_DELNEIGH with unknown ifindex\n");
+		return -ENODEV;
+	}
+
+	llattr = nlmsg_find_attr(nlh, sizeof(*ndm), NDA_LLADDR);
+	if (llattr == NULL || nla_len(llattr) != ETH_ALEN) {
+		pr_info("PF_BRIDGE: RTM_DELNEIGH with invalid address\n");
+		return -EINVAL;
+	}
+
+	addr = nla_data(llattr);
+	err = -EOPNOTSUPP;
+
+	/* Support fdb on master device the net/bridge default case */
+	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
+	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
+		struct net_device *master = dev->master;
+
+		if (master->netdev_ops->ndo_fdb_del)
+			err = master->netdev_ops->ndo_fdb_del(ndm, dev, addr);
+
+		if (err)
+			goto out;
+		else
+			ndm->ndm_flags &= ~NTF_MASTER;
+	}
+
+	/* Embedded bridge, macvlan, and any other device support */
+	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_del) {
+		err = dev->netdev_ops->ndo_fdb_del(ndm, dev, addr);
+
+		if (!err)
+			ndm->ndm_flags &= ~NTF_SELF;
+	}
+out:
+	return err;
+}
+
+static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	int idx = 0;
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		if (dev->priv_flags & IFF_BRIDGE_PORT) {
+			struct net_device *master = dev->master;
+			const struct net_device_ops *ops = master->netdev_ops;
+
+			if (ops->ndo_fdb_dump)
+				idx = ops->ndo_fdb_dump(skb, cb, dev, idx);
+		}
+
+		if (dev->netdev_ops->ndo_fdb_dump)
+			idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, dev, idx);
+	}
+	rcu_read_unlock();
+
+	cb->args[0] = idx;
+	return skb->len;
+}
+
 /* Protected by RTNL sempahore.  */
 static struct rtattr **rta_buf;
 static int rtattr_max;
@@ -2150,5 +2298,9 @@ void __init rtnetlink_init(void)
 
 	rtnl_register(PF_UNSPEC, RTM_GETADDR, NULL, rtnl_dump_all, NULL);
 	rtnl_register(PF_UNSPEC, RTM_GETROUTE, NULL, rtnl_dump_all, NULL);
+
+	rtnl_register(PF_BRIDGE, RTM_NEWNEIGH, rtnl_fdb_add, NULL, NULL);
+	rtnl_register(PF_BRIDGE, RTM_DELNEIGH, rtnl_fdb_del, NULL, NULL);
+	rtnl_register(PF_BRIDGE, RTM_GETNEIGH, NULL, rtnl_fdb_dump, NULL);
 }
 


* [net-next PATCH v4 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
  2012-04-15 16:43 ` [net-next PATCH v4 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 3/8] net: add fdb generic dump routine John Fastabend
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds dev_uc_add_excl() and dev_mc_add_excl() calls,
similar to the original dev_{uc|mc}_add() except that they set
the global bit and return -EEXIST for duplicate entries.

This is useful for drivers that support SR-IOV, macvlan
devices and any other devices that need to manage the
unicast and multicast lists.
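
The difference between the existing add and the new exclusive add can
be sketched with a small user-space model. This is only an
illustration under simplifying assumptions: the model_* names and the
fixed-size array are hypothetical, there is no locking, and the
address type check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_ALEN	6
#define MODEL_MAX	8

/* Reduced model of struct netdev_hw_addr. */
struct model_addr {
	unsigned char addr[MODEL_ALEN];
	int refcount;
	int global_use;
};

/* Reduced model of struct netdev_hw_addr_list. */
struct model_list {
	struct model_addr entries[MODEL_MAX];
	int count;
};

static struct model_addr *model_find(struct model_list *l,
				     const unsigned char *addr)
{
	int i;

	for (i = 0; i < l->count; i++)
		if (!memcmp(l->entries[i].addr, addr, MODEL_ALEN))
			return &l->entries[i];
	return NULL;
}

/* Plays the role of __hw_addr_create_ex(): always appends a new entry. */
static int model_create(struct model_list *l, const unsigned char *addr,
			int global)
{
	struct model_addr *ha;

	if (l->count == MODEL_MAX)
		return -ENOMEM;
	ha = &l->entries[l->count++];
	memcpy(ha->addr, addr, MODEL_ALEN);
	ha->refcount = 1;
	ha->global_use = global;
	return 0;
}

/* dev_uc_add()-like: a duplicate only takes another reference. */
static int model_add(struct model_list *l, const unsigned char *addr)
{
	struct model_addr *ha = model_find(l, addr);

	assert(l && addr);
	if (ha) {
		ha->refcount++;
		return 0;
	}
	return model_create(l, addr, 0);
}

/* dev_uc_add_excl()-like: a duplicate is an error, new entries are global. */
static int model_add_excl(struct model_list *l, const unsigned char *addr)
{
	assert(l && addr);
	if (model_find(l, addr))
		return -EEXIST;
	return model_create(l, addr, 1);
}
```

The -EEXIST return is what lets an FDB add report a duplicate entry
to the caller instead of silently taking another reference.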

v2: fix typo UNICAST should be MULTICAST in dev_mc_add_excl()

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 include/linux/netdevice.h |    2 +
 net/core/dev_addr_lists.c |   97 ++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7600c61..3f738ca 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2569,6 +2569,7 @@ extern int dev_addr_init(struct net_device *dev);
 
 /* Functions used for unicast addresses handling */
 extern int dev_uc_add(struct net_device *dev, unsigned char *addr);
+extern int dev_uc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_uc_sync(struct net_device *to, struct net_device *from);
 extern void dev_uc_unsync(struct net_device *to, struct net_device *from);
@@ -2578,6 +2579,7 @@ extern void dev_uc_init(struct net_device *dev);
 /* Functions used for multicast addresses handling */
 extern int dev_mc_add(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_add_global(struct net_device *dev, unsigned char *addr);
+extern int dev_mc_add_excl(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_del(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_del_global(struct net_device *dev, unsigned char *addr);
 extern int dev_mc_sync(struct net_device *to, struct net_device *from);
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index 626698f..c4cc2bc 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -21,12 +21,35 @@
  * General list handling functions
  */
 
+static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
+			       unsigned char *addr, int addr_len,
+			       unsigned char addr_type, bool global)
+{
+	struct netdev_hw_addr *ha;
+	int alloc_size;
+
+	alloc_size = sizeof(*ha);
+	if (alloc_size < L1_CACHE_BYTES)
+		alloc_size = L1_CACHE_BYTES;
+	ha = kmalloc(alloc_size, GFP_ATOMIC);
+	if (!ha)
+		return -ENOMEM;
+	memcpy(ha->addr, addr, addr_len);
+	ha->type = addr_type;
+	ha->refcount = 1;
+	ha->global_use = global;
+	ha->synced = false;
+	list_add_tail_rcu(&ha->list, &list->list);
+	list->count++;
+
+	return 0;
+}
+
 static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
 			    unsigned char *addr, int addr_len,
 			    unsigned char addr_type, bool global)
 {
 	struct netdev_hw_addr *ha;
-	int alloc_size;
 
 	if (addr_len > MAX_ADDR_LEN)
 		return -EINVAL;
@@ -46,21 +69,7 @@ static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
 		}
 	}
 
-
-	alloc_size = sizeof(*ha);
-	if (alloc_size < L1_CACHE_BYTES)
-		alloc_size = L1_CACHE_BYTES;
-	ha = kmalloc(alloc_size, GFP_ATOMIC);
-	if (!ha)
-		return -ENOMEM;
-	memcpy(ha->addr, addr, addr_len);
-	ha->type = addr_type;
-	ha->refcount = 1;
-	ha->global_use = global;
-	ha->synced = false;
-	list_add_tail_rcu(&ha->list, &list->list);
-	list->count++;
-	return 0;
+	return __hw_addr_create_ex(list, addr, addr_len, addr_type, global);
 }
 
 static int __hw_addr_add(struct netdev_hw_addr_list *list, unsigned char *addr,
@@ -377,6 +386,34 @@ EXPORT_SYMBOL(dev_addr_del_multiple);
  */
 
 /**
+ *	dev_uc_add_excl - Add a global secondary unicast address
+ *	@dev: device
+ *	@addr: address to add
+ */
+int dev_uc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+
+	netif_addr_lock_bh(dev);
+	list_for_each_entry(ha, &dev->uc.list, list) {
+		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		    ha->type == NETDEV_HW_ADDR_T_UNICAST) {
+			err = -EEXIST;
+			goto out;
+		}
+	}
+	err = __hw_addr_create_ex(&dev->uc, addr, dev->addr_len,
+				  NETDEV_HW_ADDR_T_UNICAST, true);
+	if (!err)
+		__dev_set_rx_mode(dev);
+out:
+	netif_addr_unlock_bh(dev);
+	return err;
+}
+EXPORT_SYMBOL(dev_uc_add_excl);
+
+/**
  *	dev_uc_add - Add a secondary unicast address
  *	@dev: device
  *	@addr: address to add
@@ -501,6 +538,34 @@ EXPORT_SYMBOL(dev_uc_init);
  * Multicast list handling functions
  */
 
+/**
+ *	dev_mc_add_excl - Add a global secondary multicast address
+ *	@dev: device
+ *	@addr: address to add
+ */
+int dev_mc_add_excl(struct net_device *dev, unsigned char *addr)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+
+	netif_addr_lock_bh(dev);
+	list_for_each_entry(ha, &dev->mc.list, list) {
+		if (!memcmp(ha->addr, addr, dev->addr_len) &&
+		    ha->type == NETDEV_HW_ADDR_T_MULTICAST) {
+			err = -EEXIST;
+			goto out;
+		}
+	}
+	err = __hw_addr_create_ex(&dev->mc, addr, dev->addr_len,
+				  NETDEV_HW_ADDR_T_MULTICAST, true);
+	if (!err)
+		__dev_set_rx_mode(dev);
+out:
+	netif_addr_unlock_bh(dev);
+	return err;
+}
+EXPORT_SYMBOL(dev_mc_add_excl);
+
 static int __dev_mc_add(struct net_device *dev, unsigned char *addr,
 			bool global)
 {


* [net-next PATCH v4 3/8] net: add fdb generic dump routine
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
  2012-04-15 16:43 ` [net-next PATCH v4 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds a generic dump routine that drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should cover most SR-IOV
enabled NICs.

v2: return error on nlmsg_put and use -EMSGSIZE instead
    of -ENOMEM; this is in line with other usages
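
The dump is resumable: idx counts entries across calls, and entries
below the saved index (cb->args[0] in the kernel) are skipped on the
next pass. A user-space model of that bookkeeping might look like the
following. The model_* names and the two-slot buffer standing in for
the skb are assumptions of the sketch, not kernel code.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_BUF_CAP 2		/* tiny "skb" so that a dump needs several passes */

struct model_buf {
	int data[MODEL_BUF_CAP];
	int used;
};

/*
 * Walk one address list. Entries below resume_idx were delivered by an
 * earlier pass and are only counted. When the buffer is full, return
 * -EMSGSIZE without counting the entry, so the next pass starts there.
 */
static int model_populate(const int *entries, int n, int resume_idx,
			  int *idx, struct model_buf *buf)
{
	int i;

	assert(entries && idx && buf);
	for (i = 0; i < n; i++) {
		if (*idx >= resume_idx) {
			if (buf->used == MODEL_BUF_CAP)
				return -EMSGSIZE;
			buf->data[buf->used++] = entries[i];
		}
		*idx += 1;
	}
	return 0;
}

/*
 * Dump the unicast list, then the multicast list unless the first one
 * already ran out of room. Returns the updated idx, which the caller
 * saves as the resume point for the next pass.
 */
static int model_dflt_dump(const int *uc, int n_uc, const int *mc, int n_mc,
			   int resume_idx, int idx, struct model_buf *buf)
{
	if (model_populate(uc, n_uc, resume_idx, &idx, buf) == 0)
		model_populate(mc, n_mc, resume_idx, &idx, buf);
	return idx;
}
```

A caller would empty the buffer between passes and feed the returned
value back in as resume_idx until a pass delivers nothing new.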

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/rtnetlink.c |   84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 037f53c..9149018 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1980,6 +1980,37 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
+static int nlmsg_populate_fdb_fill(struct sk_buff *skb,
+				   struct net_device *dev,
+				   u8 *addr, u32 pid, u32 seq,
+				   int type, unsigned int flags)
+{
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+
+	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ndm), NLM_F_MULTI);
+	if (!nlh)
+		return -EMSGSIZE;
+
+	ndm = nlmsg_data(nlh);
+	ndm->ndm_family  = AF_BRIDGE;
+	ndm->ndm_pad1	 = 0;
+	ndm->ndm_pad2    = 0;
+	ndm->ndm_flags	 = flags;
+	ndm->ndm_type	 = 0;
+	ndm->ndm_ifindex = dev->ifindex;
+	ndm->ndm_state   = NUD_PERMANENT;
+
+	if (nla_put(skb, NDA_LLADDR, ETH_ALEN, addr))
+		goto nla_put_failure;
+
+	return nlmsg_end(skb, nlh);
+
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
+}
+
 static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -2101,6 +2132,59 @@ out:
 	return err;
 }
 
+static int nlmsg_populate_fdb(struct sk_buff *skb,
+			      struct netlink_callback *cb,
+			      struct net_device *dev,
+			      int *idx,
+			      struct netdev_hw_addr_list *list)
+{
+	struct netdev_hw_addr *ha;
+	int err;
+	u32 pid, seq;
+
+	pid = NETLINK_CB(cb->skb).pid;
+	seq = cb->nlh->nlmsg_seq;
+
+	list_for_each_entry(ha, &list->list, list) {
+		if (*idx < cb->args[0])
+			goto skip;
+
+		err = nlmsg_populate_fdb_fill(skb, dev, ha->addr,
+					      pid, seq, 0, NTF_SELF);
+		if (err < 0)
+			return err;
+skip:
+		*idx += 1;
+	}
+	return 0;
+}
+
+/**
+ * ndo_dflt_fdb_dump: default netdevice operation to dump an FDB table.
+ * @nlh: netlink message header
+ * @dev: netdevice
+ *
+ * Default netdevice operation to dump the existing unicast address list.
+ * Returns the updated idx (index of the next FDB entry to dump).
+ */
+int ndo_dflt_fdb_dump(struct sk_buff *skb,
+		      struct netlink_callback *cb,
+		      struct net_device *dev,
+		      int idx)
+{
+	int err;
+
+	netif_addr_lock_bh(dev);
+	err = nlmsg_populate_fdb(skb, cb, dev, &idx, &dev->uc);
+	if (err)
+		goto out;
+	nlmsg_populate_fdb(skb, cb, dev, &idx, &dev->mc);
+out:
+	netif_addr_unlock_bh(dev);
+	return idx;
+}
+EXPORT_SYMBOL(ndo_dflt_fdb_dump);
+
 static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	int idx = 0;


* [net-next PATCH v4 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (2 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 3/8] net: add fdb generic dump routine John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 5/8] ixgbe: enable FDB netdevice ops John Fastabend
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

It is useful to be able to monitor for FDB events in user space.
This patch adds support to generate netlink events when a change
is made to a device supporting the FDB ops.

This brings embedded switches in line with the software net/bridge,
which also triggers events on FDB updates.
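
A user-space monitor could decode one of these events roughly as
sketched below. This is an illustration only and not part of the patch:
the helper name fdb_event_mac() is hypothetical, and it assumes the
notification carries a struct ndmsg with ndm_family set to AF_BRIDGE
followed by an NDA_LLADDR attribute holding the 6-byte MAC, which is
what rtnl_fdb_nlmsg_size() suggests. A real listener would first bind
an AF_NETLINK/NETLINK_ROUTE socket to the RTNLGRP_NEIGH group.

```c
#include <linux/neighbour.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: copy the MAC address out of one FDB notification.
 * Returns RTM_NEWNEIGH or RTM_DELNEIGH on success, or -1 if the message
 * is not an NTF_SELF bridge-family event with a 6-byte NDA_LLADDR.
 */
static int fdb_event_mac(struct nlmsghdr *nlh, unsigned char mac[6])
{
	struct ndmsg *ndm;
	struct rtattr *rta;
	int len;

	if (nlh->nlmsg_type != RTM_NEWNEIGH && nlh->nlmsg_type != RTM_DELNEIGH)
		return -1;
	if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ndm)))
		return -1;

	ndm = NLMSG_DATA(nlh);
	if (ndm->ndm_family != AF_BRIDGE || !(ndm->ndm_flags & NTF_SELF))
		return -1;

	/* Attributes start right after the aligned ndmsg header. */
	rta = (struct rtattr *)((char *)ndm + NLMSG_ALIGN(sizeof(*ndm)));
	len = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ndm));

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == NDA_LLADDR && RTA_PAYLOAD(rta) == 6) {
			memcpy(mac, RTA_DATA(rta), 6);
			return nlh->nlmsg_type;
		}
	}
	return -1;
}
```

The return value lets the caller tell adds from deletes without
inspecting the header a second time.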

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/rtnetlink.c |   35 +++++++++++++++++++++++++++++++++--
 1 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9149018..46f69b5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2011,6 +2011,33 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static inline size_t rtnl_fdb_nlmsg_size(void)
+{
+	return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN);
+}
+
+static void rtnl_fdb_notify(struct net_device *dev, u8 *addr, int type)
+{
+	struct net *net = dev_net(dev);
+	struct sk_buff *skb;
+	int err = -ENOBUFS;
+
+	skb = nlmsg_new(rtnl_fdb_nlmsg_size(), GFP_ATOMIC);
+	if (!skb)
+		goto errout;
+
+	err = nlmsg_populate_fdb_fill(skb, dev, addr, 0, 0, type, NTF_SELF);
+	if (err < 0) {
+		kfree_skb(skb);
+		goto errout;
+	}
+
+	rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+	return;
+errout:
+	rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
+}
+
 static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -2067,8 +2094,10 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		err = dev->netdev_ops->ndo_fdb_add(ndm, dev, addr,
 						   nlh->nlmsg_flags);
 
-		if (!err)
+		if (!err) {
+			rtnl_fdb_notify(dev, addr, RTM_NEWNEIGH);
 			ndm->ndm_flags &= ~NTF_SELF;
+		}
 	}
 out:
 	return err;
@@ -2125,8 +2154,10 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	if ((ndm->ndm_flags & NTF_SELF) && dev->netdev_ops->ndo_fdb_del) {
 		err = dev->netdev_ops->ndo_fdb_del(ndm, dev, addr);
 
-		if (!err)
+		if (!err) {
+			rtnl_fdb_notify(dev, addr, RTM_DELNEIGH);
 			ndm->ndm_flags &= ~NTF_SELF;
+		}
 	}
 out:
 	return err;


* [net-next PATCH v4 5/8] ixgbe: enable FDB netdevice ops
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (3 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

Enable FDB ops on ixgbe when in SR-IOV mode.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   71 +++++++++++++++++++++++++
 1 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3e26b1f..8b37395 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6681,6 +6681,74 @@ static int ixgbe_set_features(struct net_device *netdev,
 	return 0;
 }
 
+static int ixgbe_ndo_fdb_add(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr,
+			     u16 flags)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+		if (is_unicast_ether_addr(addr))
+			err = dev_uc_add_excl(dev, addr);
+		else if (is_multicast_ether_addr(addr))
+			err = dev_mc_add_excl(dev, addr);
+		else
+			err = -EINVAL;
+	}
+
+	/* Only return duplicate errors if NLM_F_EXCL is set */
+	if (err == -EEXIST && !(flags & NLM_F_EXCL))
+		err = 0;
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_del(struct ndmsg *ndm,
+			     struct net_device *dev,
+			     unsigned char *addr)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int err = -EOPNOTSUPP;
+
+	if (ndm->ndm_state & NUD_PERMANENT) {
+		pr_info("%s: FDB only supports static addresses\n",
+			ixgbe_driver_name);
+		return -EINVAL;
+	}
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
+		if (is_unicast_ether_addr(addr))
+			err = dev_uc_del(dev, addr);
+		else if (is_multicast_ether_addr(addr))
+			err = dev_mc_del(dev, addr);
+		else
+			err = -EINVAL;
+	}
+
+	return err;
+}
+
+static int ixgbe_ndo_fdb_dump(struct sk_buff *skb,
+			      struct netlink_callback *cb,
+			      struct net_device *dev,
+			      int idx)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		idx = ndo_dflt_fdb_dump(skb, cb, dev, idx);
+
+	return idx;
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -6717,6 +6785,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #endif /* IXGBE_FCOE */
 	.ndo_set_features = ixgbe_set_features,
 	.ndo_fix_features = ixgbe_fix_features,
+	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
+	.ndo_fdb_del		= ixgbe_ndo_fdb_del,
+	.ndo_fdb_dump		= ixgbe_ndo_fdb_dump,
 };
 
 static void __devinit ixgbe_probe_vf(struct ixgbe_adapter *adapter,


* [net-next PATCH v4 6/8] ixgbe: allow RAR table to be updated in promisc mode
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (4 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 5/8] ixgbe: enable FDB netdevice ops John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This allows RAR table updates while in promiscuous mode. With
SR-IOV enabled it is valuable to allow the RAR table to be
updated even when in promisc mode, in order to configure forwarding.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 8b37395..25a7ed9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3462,16 +3462,17 @@ void ixgbe_set_rx_mode(struct net_device *netdev)
 		}
 		ixgbe_vlan_filter_enable(adapter);
 		hw->addr_ctrl.user_set_promisc = false;
-		/*
-		 * Write addresses to available RAR registers, if there is not
-		 * sufficient space to store all the addresses then enable
-		 * unicast promiscuous mode
-		 */
-		count = ixgbe_write_uc_addr_list(netdev);
-		if (count < 0) {
-			fctrl |= IXGBE_FCTRL_UPE;
-			vmolr |= IXGBE_VMOLR_ROPE;
-		}
+	}
+
+	/*
+	 * Write addresses to available RAR registers, if there is not
+	 * sufficient space to store all the addresses then enable
+	 * unicast promiscuous mode
+	 */
+	count = ixgbe_write_uc_addr_list(netdev);
+	if (count < 0) {
+		fctrl |= IXGBE_FCTRL_UPE;
+		vmolr |= IXGBE_VMOLR_ROPE;
 	}
 
 	if (adapter->num_vfs) {


* [net-next PATCH v4 7/8] ixgbe: UTA table incorrectly programmed
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (5 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 16:44 ` [net-next PATCH v4 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend
  2012-04-15 17:06 ` [net-next PATCH v4 0/8] Managing the forwarding database(FDB) David Miller
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

From: Greg Rose <gregory.v.rose@intel.com>

The UTA table was being set to the functional equivalent of promiscuous
mode. As a result, traffic from the virtual function was flooded onto
the wire and to the PF device. This added overhead for VF traffic sent
to the network, and traffic sent to the PF or another VF put unwanted
packets on the wire.

This was not the intended behavior. Now that we can program the
embedded switch correctly we can remove this snippet of code. Users
who want to support this should configure the FDB correctly using the
FDB ops.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   29 -------------------------
 1 files changed, 0 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25a7ed9..10606bd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2904,33 +2904,6 @@ static void ixgbe_configure_rscctl(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_RSCCTL(reg_idx), rscctrl);
 }
 
-/**
- *  ixgbe_set_uta - Set unicast filter table address
- *  @adapter: board private structure
- *
- *  The unicast table address is a register array of 32-bit registers.
- *  The table is meant to be used in a way similar to how the MTA is used
- *  however due to certain limitations in the hardware it is necessary to
- *  set all the hash bits to 1 and use the VMOLR ROPE bit as a promiscuous
- *  enable bit to allow vlan tag stripping when promiscuous mode is enabled
- **/
-static void ixgbe_set_uta(struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	int i;
-
-	/* The UTA table only exists on 82599 hardware and newer */
-	if (hw->mac.type < ixgbe_mac_82599EB)
-		return;
-
-	/* we only need to do this if VMDq is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
-		return;
-
-	for (i = 0; i < 128; i++)
-		IXGBE_WRITE_REG(hw, IXGBE_UTA(i), ~0);
-}
-
 #define IXGBE_MAX_RX_DESC_POLL 10
 static void ixgbe_rx_desc_queue_enable(struct ixgbe_adapter *adapter,
 				       struct ixgbe_ring *ring)
@@ -3224,8 +3197,6 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter)
 	/* Program registers for the distribution of queues */
 	ixgbe_setup_mrqc(adapter);
 
-	ixgbe_set_uta(adapter);
-
 	/* set_rx_buffer_len must be called before ring initialization */
 	ixgbe_set_rx_buffer_len(adapter);
 


* [net-next PATCH v4 8/8] macvlan: add FDB bridge ops and macvlan flags
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (6 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
@ 2012-04-15 16:44 ` John Fastabend
  2012-04-15 17:06 ` [net-next PATCH v4 0/8] Managing the forwarding database(FDB) David Miller
  8 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-04-15 16:44 UTC (permalink / raw)
  To: shemminger, mst, davem, bhutchings
  Cc: sri, hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

This adds FDB bridge ops to the macvlan device passthru mode.
Additionally, a flags field with a NOPROMISC bit was added to
allow users to use passthru mode without the driver calling
dev_set_promiscuity(). The flags field is a u16 placed in a
4-byte hole (consuming 2 bytes) of the macvlan_dev struct.
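
Because dev_set_promiscuity() takes a +1/-1 adjustment, a change of
flags only needs to act when the NOPROMISC bit itself toggles. The
stand-alone sketch below restates that rule outside the kernel. The
helper name macvlan_promisc_delta() is hypothetical and exists only for
illustration; MACVLAN_FLAG_NOPROMISC uses the value defined by this
patch.

```c
#include <stdint.h>

#define MACVLAN_FLAG_NOPROMISC 1 /* value taken from the patch */

/*
 * Hypothetical helper: adjustment to apply to the lower device's
 * promiscuity count when the flags change from old_flags to new_flags.
 */
static int macvlan_promisc_delta(uint16_t old_flags, uint16_t new_flags)
{
	/* Nothing to do unless the NOPROMISC bit itself toggled. */
	if (!((old_flags ^ new_flags) & MACVLAN_FLAG_NOPROMISC))
		return 0;

	/* Bit newly set: drop our reference; newly cleared: take one. */
	return (new_flags & MACVLAN_FLAG_NOPROMISC) ? -1 : 1;
}
```

Setting the flag twice in a row therefore yields 0 the second time, so
the lower device's count is not decremented twice.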

We want to do this so that the macvlan driver or stack
above the macvlan driver does not have to process every
packet. For the use case where we know all the MAC addresses
of the end stations above us this works well.

This patch is a result of Roopa Prabhu's work. Follow-up
patches are needed for VEPA and VEB macvlan modes.

v2: Change from distinct nopromisc mode to a flags field to
    configure this. This avoids the tendency to add a new
    mode every time we need some slightly different behavior.
v3: fix error in dev_set_promiscuity and add change and get
    link attributes for flags.

CC: Roopa Prabhu <roprabhu@cisco.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 drivers/net/macvlan.c      |   73 ++++++++++++++++++++++++++++++++++++++++----
 include/linux/if_link.h    |    3 ++
 include/linux/if_macvlan.h |    1 +
 3 files changed, 71 insertions(+), 6 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b17fc90..9653ed6 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -312,7 +312,8 @@ static int macvlan_open(struct net_device *dev)
 	int err;
 
 	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, 1);
+		if (!(vlan->flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(lowerdev, 1);
 		goto hash_add;
 	}
 
@@ -344,12 +345,15 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
+	dev_uc_unsync(lowerdev, dev);
+	dev_mc_unsync(lowerdev, dev);
+
 	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, -1);
+		if (!(vlan->flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(lowerdev, -1);
 		goto hash_del;
 	}
 
-	dev_mc_unsync(lowerdev, dev);
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, -1);
 
@@ -399,10 +403,11 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change)
 		dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1);
 }
 
-static void macvlan_set_multicast_list(struct net_device *dev)
+static void macvlan_set_mac_lists(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	dev_uc_sync(vlan->lowerdev, dev);
 	dev_mc_sync(vlan->lowerdev, dev);
 }
 
@@ -542,6 +547,43 @@ static int macvlan_vlan_rx_kill_vid(struct net_device *dev,
 	return 0;
 }
 
+static int macvlan_fdb_add(struct ndmsg *ndm,
+			   struct net_device *dev,
+			   unsigned char *addr,
+			   u16 flags)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	if (!vlan->port->passthru)
+		return -EOPNOTSUPP;
+
+	if (is_unicast_ether_addr(addr))
+		err = dev_uc_add_excl(dev, addr);
+	else if (is_multicast_ether_addr(addr))
+		err = dev_mc_add_excl(dev, addr);
+
+	return err;
+}
+
+static int macvlan_fdb_del(struct ndmsg *ndm,
+			   struct net_device *dev,
+			   unsigned char *addr)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err = -EINVAL;
+
+	if (!vlan->port->passthru)
+		return -EOPNOTSUPP;
+
+	if (is_unicast_ether_addr(addr))
+		err = dev_uc_del(dev, addr);
+	else if (is_multicast_ether_addr(addr))
+		err = dev_mc_del(dev, addr);
+
+	return err;
+}
+
 static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
 					struct ethtool_drvinfo *drvinfo)
 {
@@ -572,11 +614,14 @@ static const struct net_device_ops macvlan_netdev_ops = {
 	.ndo_change_mtu		= macvlan_change_mtu,
 	.ndo_change_rx_flags	= macvlan_change_rx_flags,
 	.ndo_set_mac_address	= macvlan_set_mac_address,
-	.ndo_set_rx_mode	= macvlan_set_multicast_list,
+	.ndo_set_rx_mode	= macvlan_set_mac_lists,
 	.ndo_get_stats64	= macvlan_dev_get_stats64,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_vlan_rx_add_vid	= macvlan_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= macvlan_vlan_rx_kill_vid,
+	.ndo_fdb_add		= macvlan_fdb_add,
+	.ndo_fdb_del		= macvlan_fdb_del,
+	.ndo_fdb_dump		= ndo_dflt_fdb_dump,
 };
 
 void macvlan_common_setup(struct net_device *dev)
@@ -711,6 +756,9 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	if (data && data[IFLA_MACVLAN_MODE])
 		vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
 
+	if (data && data[IFLA_MACVLAN_FLAGS])
+		vlan->flags = nla_get_u16(data[IFLA_MACVLAN_FLAGS]);
+
 	if (vlan->mode == MACVLAN_MODE_PASSTHRU) {
 		if (port->count)
 			return -EINVAL;
@@ -760,6 +808,16 @@ static int macvlan_changelink(struct net_device *dev,
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	if (data && data[IFLA_MACVLAN_MODE])
 		vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
+	if (data && data[IFLA_MACVLAN_FLAGS]) {
+		__u16 flags = nla_get_u16(data[IFLA_MACVLAN_FLAGS]);
+		bool promisc = (flags ^ vlan->flags) & MACVLAN_FLAG_NOPROMISC;
+
+		if (promisc && (flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(vlan->lowerdev, -1);
+		else if (promisc && !(flags & MACVLAN_FLAG_NOPROMISC))
+			dev_set_promiscuity(vlan->lowerdev, 1);
+		vlan->flags = flags;
+	}
 	return 0;
 }
 
@@ -775,6 +833,8 @@ static int macvlan_fill_info(struct sk_buff *skb,
 
 	if (nla_put_u32(skb, IFLA_MACVLAN_MODE, vlan->mode))
 		goto nla_put_failure;
+	if (nla_put_u16(skb, IFLA_MACVLAN_FLAGS, vlan->flags))
+		goto nla_put_failure;
 	return 0;
 
 nla_put_failure:
@@ -782,7 +842,8 @@ nla_put_failure:
 }
 
 static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
-	[IFLA_MACVLAN_MODE] = { .type = NLA_U32 },
+	[IFLA_MACVLAN_MODE]  = { .type = NLA_U32 },
+	[IFLA_MACVLAN_FLAGS] = { .type = NLA_U16 },
 };
 
 int macvlan_link_register(struct rtnl_link_ops *ops)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 2f4fa93..f715750 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -255,6 +255,7 @@ struct ifla_vlan_qos_mapping {
 enum {
 	IFLA_MACVLAN_UNSPEC,
 	IFLA_MACVLAN_MODE,
+	IFLA_MACVLAN_FLAGS,
 	__IFLA_MACVLAN_MAX,
 };
 
@@ -267,6 +268,8 @@ enum macvlan_mode {
 	MACVLAN_MODE_PASSTHRU = 8,/* take over the underlying device */
 };
 
+#define MACVLAN_FLAG_NOPROMISC	1
+
 /* SR-IOV virtual function management section */
 
 enum {
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..f65e8d2 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -60,6 +60,7 @@ struct macvlan_dev {
 	struct net_device	*lowerdev;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 	enum macvlan_mode	mode;
+	u16			flags;
 	int (*receive)(struct sk_buff *skb);
 	int (*forward)(struct net_device *dev, struct sk_buff *skb);
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
                   ` (7 preceding siblings ...)
  2012-04-15 16:44 ` [net-next PATCH v4 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend
@ 2012-04-15 17:06 ` David Miller
  2012-05-02 15:08   ` Michael S. Tsirkin
  8 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2012-04-15 17:06 UTC (permalink / raw)
  To: john.r.fastabend
  Cc: shemminger, mst, bhutchings, sri, hadi, jeffrey.t.kirsher,
	netdev, gregory.v.rose, krkumar2

From: John Fastabend <john.r.fastabend@intel.com>
Date: Sun, 15 Apr 2012 09:43:51 -0700

> The following series is a submission for net-next to allow
> embedded switches and other stacked devices other then the
> Linux bridge to manage a forwarding database.
> 
> Previously discussed here,
> 
> http://lists.openwall.net/netdev/2012/03/19/26
> 
> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
> 
> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>     error and add the flags field to change and get link routines.
> 
> v2: addressed feedback from Ben Hutchings resolving a typo in the
>     multicast add/del routines and improving the error handling
>     when both NTF_SELF and NTF_MASTER are set.
> 
> I've tested this with 'br' tool published by Stephen Hemminger
> soon to be renamed 'bridge' I believe and various traffic
> generators mostly pktgen, ping, and netperf.

All applied, if we need any more tweaks we can just add them
on top of this work.

Thanks John.


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-04-15 17:06 ` [net-next PATCH v4 0/8] Managing the forwarding database(FDB) David Miller
@ 2012-05-02 15:08   ` Michael S. Tsirkin
  2012-05-02 21:52     ` John Fastabend
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-05-02 15:08 UTC (permalink / raw)
  Cc: john.r.fastabend, shemminger, bhutchings, sri, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2, roprabhu

On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
> From: John Fastabend <john.r.fastabend@intel.com>
> Date: Sun, 15 Apr 2012 09:43:51 -0700
> 
> > The following series is a submission for net-next to allow
> > embedded switches and other stacked devices other then the
> > Linux bridge to manage a forwarding database.
> > 
> > Previously discussed here,
> > 
> > http://lists.openwall.net/netdev/2012/03/19/26
> > 
> > v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
> > 
> > v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
> >     error and add the flags field to change and get link routines.
> > 
> > v2: addressed feedback from Ben Hutchings resolving a typo in the
> >     multicast add/del routines and improving the error handling
> >     when both NTF_SELF and NTF_MASTER are set.
> > 
> > I've tested this with 'br' tool published by Stephen Hemminger
> > soon to be renamed 'bridge' I believe and various traffic
> > generators mostly pktgen, ping, and netperf.
> 
> All applied, if we need any more tweaks we can just add them
> on top of this work.
> 
> Thanks John.

John, do you plan to update kvm userspace to use this interface?

-- 
MST


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-02 15:08   ` Michael S. Tsirkin
@ 2012-05-02 21:52     ` John Fastabend
  2012-05-02 23:36       ` Sridhar Samudrala
  2012-05-03  5:48       ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: John Fastabend @ 2012-05-02 21:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: shemminger, bhutchings, sri, hadi, jeffrey.t.kirsher, netdev,
	gregory.v.rose, krkumar2, roprabhu

On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>> From: John Fastabend <john.r.fastabend@intel.com>
>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>
>>> The following series is a submission for net-next to allow
>>> embedded switches and other stacked devices other then the
>>> Linux bridge to manage a forwarding database.
>>>
>>> Previously discussed here,
>>>
>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>
>>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
>>>
>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>     error and add the flags field to change and get link routines.
>>>
>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>     multicast add/del routines and improving the error handling
>>>     when both NTF_SELF and NTF_MASTER are set.
>>>
>>> I've tested this with 'br' tool published by Stephen Hemminger
>>> soon to be renamed 'bridge' I believe and various traffic
>>> generators mostly pktgen, ping, and netperf.
>>
>> All applied, if we need any more tweaks we can just add them
>> on top of this work.
>>
>> Thanks John.
> 
> John, do you plan to update kvm userspace to use this interface?
> 

No immediate plans. I would really appreciate it if you or one
of the IBM developers working in this space took it on. Of course,
if no one steps up I guess I can eventually get to it, but it will
be some time. For now I've been doing this manually with the bridge
tool, which is yet to be published.
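
For other user space that wants to drive this interface directly, the
request for an NTF_SELF add can be sketched in C as below. This is an
assumption-laden illustration, not code taken from the bridge tool:
struct fdb_req and fdb_fill_add_req() are hypothetical names, and the
ndm_state value is left to the caller because which states a driver
accepts is driver-specific. Sending the message over a NETLINK_ROUTE
socket and reading the ack are omitted.

```c
#include <linux/neighbour.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical request layout: header, ndmsg, room for one MAC attribute. */
struct fdb_req {
	struct nlmsghdr n;
	struct ndmsg ndm;
	char attrs[RTA_SPACE(6)];
};

/* Fill an RTM_NEWNEIGH request that adds mac to the device's own FDB. */
static void fdb_fill_add_req(struct fdb_req *req, int ifindex,
			     const unsigned char mac[6], unsigned short state)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(req->ndm));
	req->n.nlmsg_type = RTM_NEWNEIGH;
	/* NLM_F_EXCL asks for an error if the address already exists. */
	req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
	req->ndm.ndm_family = PF_BRIDGE;
	req->ndm.ndm_ifindex = ifindex;
	req->ndm.ndm_state = state;	/* caller's choice, see lead-in */
	req->ndm.ndm_flags = NTF_SELF;	/* target the device, not a master */

	/* Append NDA_LLADDR after the aligned fixed part of the message. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->n.nlmsg_len));
	rta->rta_type = NDA_LLADDR;
	rta->rta_len = RTA_LENGTH(6);
	memcpy(RTA_DATA(rta), mac, 6);
	req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) +
			   RTA_ALIGN(rta->rta_len);
}
```

A delete would use RTM_DELNEIGH with the same body and without the
create/exclusive flags.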

.John


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-02 21:52     ` John Fastabend
@ 2012-05-02 23:36       ` Sridhar Samudrala
  2012-05-03 19:38         ` John Fastabend
  2012-05-03  5:48       ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-05-02 23:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: Michael S. Tsirkin, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2, roprabhu

On 5/2/2012 2:52 PM, John Fastabend wrote:
> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>> From: John Fastabend<john.r.fastabend@intel.com>
>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>
>>>> The following series is a submission for net-next to allow
>>>> embedded switches and other stacked devices other then the
>>>> Linux bridge to manage a forwarding database.
>>>>
>>>> Previously discussed here,
>>>>
>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>
>>>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
>>>>
>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>      error and add the flags field to change and get link routines.
>>>>
>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>      multicast add/del routines and improving the error handling
>>>>      when both NTF_SELF and NTF_MASTER are set.
>>>>
>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>> soon to be renamed 'bridge' I believe and various traffic
>>>> generators mostly pktgen, ping, and netperf.
>>> All applied, if we need any more tweaks we can just add them
>>> on top of this work.
>>>
>>> Thanks John.
>> John, do you plan to update kvm userspace to use this interface?
>>
> No immediate plans. I would really appreciate it if you or one
> of the IBM developers working in this space took it on. Of course
> if no one steps up I guess I can eventually get at it but it will
> be sometime. For now I've been doing this manually with the bridge
> tool yet to be published.
>
>
Does this mean that when we add an interface to a bridge, it need not
be put in promiscuous mode, and FDB entries can instead be added and
deleted dynamically?
Or are we talking only about VMs attached to macvtap?

Thanks
Sridhar


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-02 21:52     ` John Fastabend
  2012-05-02 23:36       ` Sridhar Samudrala
@ 2012-05-03  5:48       ` Michael S. Tsirkin
  2012-05-03 19:26         ` John Fastabend
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03  5:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: shemminger, bhutchings, sri, hadi, jeffrey.t.kirsher, netdev,
	gregory.v.rose, krkumar2, roprabhu

On Wed, May 02, 2012 at 02:52:33PM -0700, John Fastabend wrote:
> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
> > On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
> >> From: John Fastabend <john.r.fastabend@intel.com>
> >> Date: Sun, 15 Apr 2012 09:43:51 -0700
> >>
> >>> The following series is a submission for net-next to allow
> >>> embedded switches and other stacked devices other then the
> >>> Linux bridge to manage a forwarding database.
> >>>
> >>> Previously discussed here,
> >>>
> >>> http://lists.openwall.net/netdev/2012/03/19/26
> >>>
> >>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
> >>>
> >>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
> >>>     error and add the flags field to change and get link routines.
> >>>
> >>> v2: addressed feedback from Ben Hutchings resolving a typo in the
> >>>     multicast add/del routines and improving the error handling
> >>>     when both NTF_SELF and NTF_MASTER are set.
> >>>
> >>> I've tested this with 'br' tool published by Stephen Hemminger
> >>> soon to be renamed 'bridge' I believe and various traffic
> >>> generators mostly pktgen, ping, and netperf.
> >>
> >> All applied, if we need any more tweaks we can just add them
> >> on top of this work.
> >>
> >> Thanks John.
> > 
> > John, do you plan to update kvm userspace to use this interface?
> > 
> 
> No immediate plans. I would really appreciate it if you or one
> of the IBM developers working in this space took it on. Of course
> if no one steps up I guess I can eventually get at it but it will
> be sometime. For now I've been doing this manually with the bridge
> tool yet to be published.
> 
> .John

It'll be easier once you publish the tool; qemu can then just run
scripts, as it does for ifup/ifdown now.

-- 
MST


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-03  5:48       ` Michael S. Tsirkin
@ 2012-05-03 19:26         ` John Fastabend
  0 siblings, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-05-03 19:26 UTC (permalink / raw)
  To: shemminger
  Cc: Michael S. Tsirkin, bhutchings, sri, hadi, jeffrey.t.kirsher,
	netdev, gregory.v.rose, krkumar2, roprabhu

On 5/2/2012 10:48 PM, Michael S. Tsirkin wrote:
> On Wed, May 02, 2012 at 02:52:33PM -0700, John Fastabend wrote:
>> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>>> From: John Fastabend <john.r.fastabend@intel.com>
>>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>>
>>>>> The following series is a submission for net-next to allow
>>>>> embedded switches and other stacked devices other then the
>>>>> Linux bridge to manage a forwarding database.
>>>>>
>>>>> Previously discussed here,
>>>>>
>>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>>
>>>>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
>>>>>
>>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>>     error and add the flags field to change and get link routines.
>>>>>
>>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>>     multicast add/del routines and improving the error handling
>>>>>     when both NTF_SELF and NTF_MASTER are set.
>>>>>
>>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>>> soon to be renamed 'bridge' I believe and various traffic
>>>>> generators mostly pktgen, ping, and netperf.
>>>>
>>>> All applied, if we need any more tweaks we can just add them
>>>> on top of this work.
>>>>
>>>> Thanks John.
>>>
>>> John, do you plan to update kvm userspace to use this interface?
>>>
>>
>> No immediate plans. I would really appreciate it if you or one
>> of the IBM developers working in this space took it on. Of course
>> if no one steps up I guess I can eventually get at it but it will
>> be sometime. For now I've been doing this manually with the bridge
>> tool yet to be published.
>>
>> .John
> 
> It'll be easier once you publish the tool, qemu can just run
> scripts like it does for ifup/ifdown now.
> 

Agreed.

Stephen, when do you think you will be able to submit 'br' (renamed
'bridge') for the iproute2 package? I've been using it for some time
now without any issues, and it seems to be working great for me.

Thanks,
John


* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-02 23:36       ` Sridhar Samudrala
@ 2012-05-03 19:38         ` John Fastabend
  2012-05-04  5:43           ` Sridhar Samudrala
  0 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2012-05-03 19:38 UTC (permalink / raw)
  To: Sridhar Samudrala, Roopa Prabhu
  Cc: Michael S. Tsirkin, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
> On 5/2/2012 2:52 PM, John Fastabend wrote:
>> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>>> From: John Fastabend<john.r.fastabend@intel.com>
>>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>>
>>>>> The following series is a submission for net-next to allow
>>>>> embedded switches and other stacked devices other then the
>>>>> Linux bridge to manage a forwarding database.
>>>>>
>>>>> Previously discussed here,
>>>>>
>>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>>
>>>>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
>>>>>
>>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>>      error and add the flags field to change and get link routines.
>>>>>
>>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>>      multicast add/del routines and improving the error handling
>>>>>      when both NTF_SELF and NTF_MASTER are set.
>>>>>
>>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>>> soon to be renamed 'bridge' I believe and various traffic
>>>>> generators mostly pktgen, ping, and netperf.
>>>> All applied, if we need any more tweaks we can just add them
>>>> on top of this work.
>>>>
>>>> Thanks John.
>>> John, do you plan to update kvm userspace to use this interface?
>>>
>> No immediate plans. I would really appreciate it if you or one
>> of the IBM developers working in this space took it on. Of course
>> if no one steps up I guess I can eventually get at it but it will
>> be sometime. For now I've been doing this manually with the bridge
>> tool yet to be published.
>>
>>
> Does this mean that when we add an interface to a bridge, it need not be put in promiscuous mode and
> add/delete fdb entries dynamically?

The net/bridge will automatically put the interface in promisc mode
when the device is attached. We do need to add/delete fdb entries
though to allow forwarding packets from the virtual function and
any emulated devices e.g. tap devices on the bridge.

Currently I am doing this by manually running a tool Stephen created.
My hope is to integrate this with KVM so that when I set up a VM with
an emulated device and have SR-IOV enabled, perhaps for the direct
assign use case, qemu/libvirt also adds the VM address to the embedded
switch FDB.
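
As a rough illustration of what such a tool has to send, the sketch below
builds (but does not send) an RTM_NEWNEIGH request in the PF_BRIDGE family
with NTF_SELF set, which is the kind of message the hooks in this series
handle. The constant values are copied from the Linux uapi headers as I
understand them; the helper name build_fdb_add, the ifindex and the MAC
address are made up for the example, and the exact flag combination a given
kernel accepts is an assumption.

```python
import struct

# Values as found in linux/rtnetlink.h, linux/neighbour.h, linux/netlink.h
# and linux/socket.h (copied by hand, so treat them as assumptions).
RTM_NEWNEIGH = 28
AF_BRIDGE = 7
NTF_SELF = 0x02        # program the device's own (embedded) FDB
NUD_PERMANENT = 0x80   # static entry
NDA_LLADDR = 2         # attribute carrying the MAC address
NLM_F_REQUEST, NLM_F_ACK = 0x001, 0x004
NLM_F_EXCL, NLM_F_CREATE = 0x200, 0x400


def build_fdb_add(ifindex, mac, seq=1):
    """Hypothetical helper: return the bytes of an FDB 'add' request."""
    lladdr = bytes.fromhex(mac.replace(":", ""))
    if len(lladdr) != 6:
        raise ValueError("expected a 6-byte Ethernet address")

    # struct ndmsg: family, pad1, pad2, ifindex, state, flags, type
    ndmsg = struct.pack("=BBHiHBB", AF_BRIDGE, 0, 0, ifindex,
                        NUD_PERMANENT, NTF_SELF, 0)

    # struct rtattr header (len, type) + payload, padded to 4 bytes
    rta_len = 4 + len(lladdr)
    attr = struct.pack("=HH", rta_len, NDA_LLADDR) + lladdr
    attr += b"\0" * (-rta_len % 4)

    payload = ndmsg + attr
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(payload), RTM_NEWNEIGH,
                         flags, seq, 0)
    return header + payload


msg = build_fdb_add(ifindex=2, mac="52:54:00:12:34:56")
print(len(msg), msg.hex())
```

Actually delivering the message would mean writing it to a NETLINK_ROUTE
socket with sufficient privileges, which the sketch deliberately leaves out.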

> Or are we talking only about VMs attached to macvtap?
> 

The macvlan bridge calls dev_uc_add and dev_uc_sync so in this case
we shouldn't need to explicitly add entries to the embedded bridge
on the physical function.

> Thanks
> Sridhar
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
  2012-05-03 19:38         ` John Fastabend
@ 2012-05-04  5:43           ` Sridhar Samudrala
       [not found]             ` <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Sridhar Samudrala @ 2012-05-04  5:43 UTC (permalink / raw)
  To: John Fastabend
  Cc: Roopa Prabhu, Michael S. Tsirkin, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

On 5/3/2012 12:38 PM, John Fastabend wrote:
> On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
>> On 5/2/2012 2:52 PM, John Fastabend wrote:
>>> On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
>>>> On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
>>>>> From: John Fastabend<john.r.fastabend@intel.com>
>>>>> Date: Sun, 15 Apr 2012 09:43:51 -0700
>>>>>
>>>>>> The following series is a submission for net-next to allow
>>>>>> embedded switches and other stacked devices other then the
>>>>>> Linux bridge to manage a forwarding database.
>>>>>>
>>>>>> Previously discussed here,
>>>>>>
>>>>>> http://lists.openwall.net/netdev/2012/03/19/26
>>>>>>
>>>>>> v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
>>>>>>
>>>>>> v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>>>>>>       error and add the flags field to change and get link routines.
>>>>>>
>>>>>> v2: addressed feedback from Ben Hutchings resolving a typo in the
>>>>>>       multicast add/del routines and improving the error handling
>>>>>>       when both NTF_SELF and NTF_MASTER are set.
>>>>>>
>>>>>> I've tested this with 'br' tool published by Stephen Hemminger
>>>>>> soon to be renamed 'bridge' I believe and various traffic
>>>>>> generators mostly pktgen, ping, and netperf.
>>>>> All applied, if we need any more tweaks we can just add them
>>>>> on top of this work.
>>>>>
>>>>> Thanks John.
>>>> John, do you plan to update kvm userspace to use this interface?
>>>>
>>> No immediate plans. I would really appreciate it if you or one
>>> of the IBM developers working in this space took it on. Of course
>>> if no one steps up I guess I can eventually get at it but it will
>>> be sometime. For now I've been doing this manually with the bridge
>>> tool yet to be published.
>>>
>>>
>> Does this mean that when we add an interface to a bridge, it need not be put in promiscuous mode and
>> add/delete fdb entries dynamically?
> The net/bridge will automatically put the interface in promisc mode
> when the device is attached. We do need to add/delete fdb entries
> though to allow forwarding packets from the virtual function and
> any emulated devices e.g. tap devices on the bridge.

Consider the following scenario where we have an SR-IOV NIC with 1 PF
and 2 VFs (VF1 & VF2).
- eth0 is the PF, which is attached to bridge br0 and connected to 2 VMs,
  VM1 and VM2.
- eth1 is VF1, terminated on the host and assigned to VM3 via macvtap0
  in passthru mode.
- VF2 is directly assigned to VM4 via PCI device assignment.

  VM1      VM2         VM3           VM4
(mac1)  (mac2)     (mac3)         (mac4)
  |        |           |             |
  |        |           |             |
vnet0   vnet1         |             |
  |        |           |             |
  \        /           |             |
   \      /            |             |
     br0            macvtap0         |
      |              (mac3)          |
      |                |             |
     eth0            eth1            |
      |              (mac3)          |
      |               |              |
    ------------------------------------
   | PF              VF1           VF2  |
   |                                    |
   |                 VEB                |
   ------------------------------------

In this setup, I think when VM1 and VM2 come up, mac1 and mac2 have to
be added to the embedded bridge's fdb. Once we add these 2 entries, all
4 VMs can talk to each other. Is this correct?

Now, if VM1 or VM2 wants to add secondary mac addresses, I think we need
qemu to add a new fdb entry when it receives an add mac address command
via the virtio control vq.

Can we add multiple mac addresses to VFs? For example, VM3 and VM4 trying
to add a secondary mac address.

What about VMs trying to create VLANs? I think this will work on VM1 and
VM2. However, with VM3 and VM4, I think we need qemu to add vlans to the
VFs when the VMs create them.

Thanks
Sridhar

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
       [not found]             ` <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>
@ 2012-05-05  5:00               ` John Fastabend
  2012-05-05 19:53               ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: John Fastabend @ 2012-05-05  5:00 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Sridhar Samudrala, Michael S. Tsirkin, shemminger, bhutchings,
	hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

On 5/4/2012 1:34 PM, Roopa Prabhu wrote:
> 
> 
> On Thu, May 3, 2012 at 10:43 PM, Sridhar Samudrala <sri@us.ibm.com> wrote:
> 
>     On 5/3/2012 12:38 PM, John Fastabend wrote:
> 
>         On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
> 
>             On 5/2/2012 2:52 PM, John Fastabend wrote:
> 
>                 On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
> 
>                     On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
> 
>                         From: John Fastabend<john.r.fastabend@intel.com>
>                         Date: Sun, 15 Apr 2012 09:43:51 -0700
> 
>                             The following series is a submission for net-next to allow
>                             embedded switches and other stacked devices other then the
>                             Linux bridge to manage a forwarding database.
> 
>                             Previously discussed here,
> 
>                             http://lists.openwall.net/netdev/2012/03/19/26
> 
>                             v4: propagate return codes correctly for ndo_dflt_Fdb_dump()
> 
>                             v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>                                  error and add the flags field to change and get link routines.
> 
>                             v2: addressed feedback from Ben Hutchings resolving a typo in the
>                                  multicast add/del routines and improving the error handling
>                                  when both NTF_SELF and NTF_MASTER are set.
> 
>                             I've tested this with 'br' tool published by Stephen Hemminger
>                             soon to be renamed 'bridge' I believe and various traffic
>                             generators mostly pktgen, ping, and netperf.
> 
>                         All applied, if we need any more tweaks we can just add them
>                         on top of this work.
> 
>                         Thanks John.
> 
>                     John, do you plan to update kvm userspace to use this interface?
> 
>                 No immediate plans. I would really appreciate it if you or one
>                 of the IBM developers working in this space took it on. Of course
>                 if no one steps up I guess I can eventually get at it but it will
>                 be sometime. For now I've been doing this manually with the bridge
>                 tool yet to be published.
> 
> 
>             Does this mean that when we add an interface to a bridge, it need not be put in promiscuous mode and
>             add/delete fdb entries dynamically?
> 
>         The net/bridge will automatically put the interface in promisc mode
>         when the device is attached. We do need to add/delete fdb entries
>         though to allow forwarding packets from the virtual function and
>         any emulated devices e.g. tap devices on the bridge.
> 
> 
>     Consider the following scenario where we have a SR-IOV NIC with 1 PF
>     and 2 VFs (VF1 & VF2).
>     - eth0 is the PF which is attached to bridge br0 and connected to 2 VMs VM1 and VM2.
>     - eth1 is the VF1 terminated on the host and assigned to VM3 via macvtap0 in passthru mode.
>     - VF2 is directly assigned to VM4 via pci-device assignment.
> 
>       VM1      VM2         VM3           VM4
>     (mac1)  (mac2)     (mac3)         (mac4)
>       |        |           |             |
>       |        |           |             |
>     vnet0   vnet1         |             |
>       |        |           |             |
>       \        /           |             |
>        \      /            |             |
>          br0            macvtap0         |
>           |              (mac3)          |
>           |                |             |
>          eth0            eth1            |
>           |              (mac3)          |
>           |               |              |
>         ------------------------------------
>        | PF              VF1           VF2  |
>        |                                    |
>        |                 VEB                |
>        ------------------------------------
> 
>     In this setup, i think when VM1 and VM2 come up, mac1 and mac2 have to be added to the
>     embedded bridge's fdb.  Once we add these 2 entries, all the 4 VMs can talk to each other.
>     Is this correct?
> 

Correct as Roopa indicated.
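
To make the scenario concrete, here is a toy model of the embedded bridge's
FDB for the setup quoted above. It is only a sketch: the ToyVeb class is
invented for illustration, mac1..mac4 are the labels from the diagram rather
than real addresses, and the rule that an unknown destination is sent out the
uplink instead of to the PF is a simplifying assumption, not a statement
about any particular NIC.

```python
UPLINK = "uplink"  # assumed default for destinations the VEB does not know


class ToyVeb:
    """Hypothetical stand-in for the NIC's embedded switch."""

    def __init__(self):
        self.fdb = {}  # MAC label -> port ("PF", "VF1", "VF2")

    def fdb_add(self, mac, port):
        self.fdb[mac] = port

    def forward(self, dst_mac):
        # Known address: deliver to its port. Unknown: assume uplink.
        return self.fdb.get(dst_mac, UPLINK)


veb = ToyVeb()
veb.fdb_add("mac3", "VF1")  # address in use on VF1 (eth1 / macvtap0)
veb.fdb_add("mac4", "VF2")  # address of the VF assigned to VM4

# Before mac1/mac2 are added, traffic from VM3 or VM4 to VM1 misses the PF.
before = veb.forward("mac1")

# Adding the addresses of the VMs behind br0 on the PF (NTF_SELF on eth0).
veb.fdb_add("mac1", "PF")
veb.fdb_add("mac2", "PF")
after = {mac: veb.forward(mac) for mac in ("mac1", "mac2", "mac3", "mac4")}

print(before)
print(after)
```

With both entries in place, every destination in the diagram resolves to a
port on the NIC, which matches the claim that all four VMs can then reach
each other.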

>     Now, if VM1 or VM2 wants to add secondary mac addresses, i think we need qemu to add a new fdb
>     entry when it receives add mac address command via virtio control vq.
> 
> 
> yes. I had used (with some tweaks) some existing qemu patches on patchwork to try this out with my implementation.
> 
> The links to the patches on patchwork are listed in my cover mail at http://marc.info/?l=linux-netdev&m=131534911001054&w=2
> 
>  
> 
>     Can we add multiple mac addresses to VFs? For example VM3 and VM4 trying to add a secondary mac address.

Yes this is why we also added the fdb interface to the macvlan device as well.

> 
>     What about VMs trying to create VLANs? I think this will work on VM1 and VM2. However with VM3
>     and VM4, i think we need qemu to add vlans to the VFs when the VMs create them.
> 
> 
> yes for vlans too, the qemu patches pointed out above can be reused.
> 
> Thanks,
> Roopa
>  
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
       [not found]             ` <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>
  2012-05-05  5:00               ` John Fastabend
@ 2012-05-05 19:53               ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2012-05-05 19:53 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Sridhar Samudrala, John Fastabend, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2

On Fri, May 04, 2012 at 01:34:24PM -0700, Roopa Prabhu wrote:
> the qemu patches pointed out above can be reused.

Do you have plans to do this?

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-05-05 19:53 UTC | newest]

Thread overview: 19+ messages
2012-04-15 16:43 [net-next PATCH v4 0/8] Managing the forwarding database(FDB) John Fastabend
2012-04-15 16:43 ` [net-next PATCH v4 1/8] net: add generic PF_BRIDGE:RTM_ FDB hooks John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 2/8] net: addr_list: add exclusive dev_uc_add and dev_mc_add John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 3/8] net: add fdb generic dump routine John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 4/8] net: rtnetlink notify events for FDB NTF_SELF adds and deletes John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 5/8] ixgbe: enable FDB netdevice ops John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 6/8] ixgbe: allow RAR table to be updated in promisc mode John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 7/8] ixgbe: UTA table incorrectly programmed John Fastabend
2012-04-15 16:44 ` [net-next PATCH v4 8/8] macvlan: add FDB bridge ops and macvlan flags John Fastabend
2012-04-15 17:06 ` [net-next PATCH v4 0/8] Managing the forwarding database(FDB) David Miller
2012-05-02 15:08   ` Michael S. Tsirkin
2012-05-02 21:52     ` John Fastabend
2012-05-02 23:36       ` Sridhar Samudrala
2012-05-03 19:38         ` John Fastabend
2012-05-04  5:43           ` Sridhar Samudrala
     [not found]             ` <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>
2012-05-05  5:00               ` John Fastabend
2012-05-05 19:53               ` Michael S. Tsirkin
2012-05-03  5:48       ` Michael S. Tsirkin
2012-05-03 19:26         ` John Fastabend
