* [RFC PATCH 0/4] Series short description
@ 2013-09-11 18:45 John Fastabend
  2013-09-11 18:46 ` [RFC PATCH 1/4] net: rtnetlink: make priv_size a function for devs with dynamic size John Fastabend
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: John Fastabend @ 2013-09-11 18:45 UTC (permalink / raw)
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson

This patch series implements virtual station interfaces or VSIs. The
VSI term comes from the IEEE std 802.1Qbg-2012 specification which
should be merged with 802.1Q proper at some point.

	3.18 Virtual Station Interface (VSI): An interface to a
	     virtual station that is attached to a DRP of an
	     edge relay.

A DRP (downlink relay port) is the link between an edge relay and
a VSI. An edge relay is basically a simple bridge that does not
need to support learning, flooding, xSTP, etc., which matches up
well with the ixgbe embedded bridge (ebridge) implementation and,
I believe, other hardware ebridge implementations.

This series adds a new VSI rtnl link type. I chose to do this via
a link type because it allows us to reuse a lot of the existing
infrastructure to bring up a net device, and it lets a VSI look
similar to a macvlan, the macvlan link type being the software
equivalent of a VSI. In many cases I can simply replace the
macvlan type with vsi in the ip command line tool and my existing
scripts work, but use the ebridge instead of the software bridge.

The usage model looks like this,

# ip link add link p3p2 numtxqueues 2 numrxqueues 2 type vsi
# ip link add link p3p2 numtxqueues 2 numrxqueues 2 type vsi
# ip link add link p3p2 numtxqueues 4 numrxqueues 4 type vsi
# ip link set dev vsi0 addr 00:1b:21:69:9f:15
# ip link set dev vsi1 addr 00:1b:21:69:9f:16
# ip link set dev vsi2 addr 00:1b:21:69:9f:17
# ip link set dev vsi0 up
# ip link set dev vsi1 up
# ip link set dev vsi2 up
# ip link show
16: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1b:21:69:9f:09 brd ff:ff:ff:ff:ff:ff
17: vsi0@p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1b:21:69:9f:15 brd ff:ff:ff:ff:ff:ff
18: vsi1@p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1b:21:69:9f:16 brd ff:ff:ff:ff:ff:ff
19: vsi2@p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1b:21:69:9f:17 brd ff:ff:ff:ff:ff:ff

And creates this topology,

      vsi0      vsi1       vsi2
       |         |          |
     +------------------------+
     |                        |
     |        ebridge         |
     |                        |
     +------------------------+
                 |
               p3p2

At this point each vsi# will receive traffic for its assigned MAC
address. Using the 'fdb' interfaces, additional L2/L3/tunnel entries
could be added to the vsi#, and VSIs can be assigned to net
namespaces.

The topology of the ebridge is tracked via the upper and lower dev
lists. Once Veaceslav Falico's work to expose these via sysfs is
merged, the topology will be directly visible, although it can
already be learned to some extent via iflink:ifindex.
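
For example, given the topology above, the parent can be recovered
from sysfs (iflink reports the lower dev's ifindex):

# cat /sys/class/net/vsi0/iflink
16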

PATCH DESCRIPTION:

The first patch extends the rtnl link ops priv_size routine so the
VSI link type can set the private space for the netdev being
created. This is required because device drivers will use this
space.

The second patch adds some helper routines to traverse the lower dev
list.

The third patch is the interface work to support VSI devices.

And the last patch is an implementation for ixgbe. Notice there are
still a few items I need to clean up on this patch before
submitting, but it is working now (without DCB/FCoE), and I think
I can simplify it some to bring down the line count.

My plan would be to submit this as a real patch after net-next opens
but I wanted to see if there was any initial feedback especially
related to the VSI link type.

phew... sorry that was a bit long-winded.

---

John Fastabend (4):
      net: rtnetlink: make priv_size a function for devs with dynamic size
      net: Add lower dev list helpers
      net: VSI: Add virtual station interface support
      ixgbe: Adding VSI support to ixgbe


 drivers/infiniband/ulp/ipoib/ipoib_netlink.c     |    4 
 drivers/net/Kconfig                              |    9 
 drivers/net/Makefile                             |    1 
 drivers/net/bonding/bond_main.c                  |    4 
 drivers/net/caif/caif_hsi.c                      |    4 
 drivers/net/ethernet/intel/ixgbe/Makefile        |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   32 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    4 
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  307 +++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c     |  428 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h     |   71 ++++
 drivers/net/ifb.c                                |    4 
 drivers/net/macvlan.c                            |    4 
 drivers/net/nlmon.c                              |    4 
 drivers/net/team/team.c                          |    4 
 drivers/net/tun.c                                |    4 
 drivers/net/veth.c                               |    4 
 drivers/net/vsi.c                                |  124 ++++++
 drivers/net/vxlan.c                              |    4 
 include/linux/netdevice.h                        |   35 ++
 include/net/rtnetlink.h                          |   11 +
 include/uapi/linux/if.h                          |    1 
 net/8021q/vlan_netlink.c                         |    4 
 net/bridge/br_netlink.c                          |    4 
 net/core/dev.c                                   |   25 +
 net/core/rtnetlink.c                             |    2 
 net/ieee802154/6lowpan.c                         |    4 
 net/ipv4/ip_gre.c                                |    6 
 net/ipv4/ip_tunnel.c                             |    2 
 net/ipv4/ip_vti.c                                |    4 
 net/ipv4/ipip.c                                  |    4 
 net/ipv6/ip6_tunnel.c                            |    4 
 net/ipv6/sit.c                                   |    4 
 34 files changed, 1028 insertions(+), 116 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h
 create mode 100644 drivers/net/vsi.c

-- 
Signature


* [RFC PATCH 1/4] net: rtnetlink: make priv_size a function for devs with dynamic size
  2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
@ 2013-09-11 18:46 ` John Fastabend
  2013-09-11 18:46 ` [RFC PATCH 2/4] net: Add lower dev list helpers John Fastabend
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2013-09-11 18:46 UTC (permalink / raw)
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson

The priv_size field in rtnl_link_ops today is a fixed size_t value.

For upcoming VSI support via rtnl_link_ops it is useful to allow this
size to be configurable. This patch converts the existing static
definition into a function that returns the size_t value.

To make this conversion as easy as possible the patch uses a new
macro RTNL_LINK_OPS_PRIV_SIZE which existing users can call to
generate a function equivalent to the previous static value.

RFC NOTE: I'm not entirely sure the macro makes the code easier
          to read or just obfuscates what is really happening.
          Any opinions?
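
For reference, with this macro each converted driver ends up with a
trivial generated function; e.g. RTNL_LINK_OPS_PRIV_SIZE(bond,
bonding) expands to roughly:

	static size_t bond_priv_size(struct net *src_net,
				     struct nlattr *tb[])
	{
		return sizeof(struct bonding);
	}

so existing users keep returning the same constant they used to
store in .priv_size, while link types like VSI can provide a real
function that inspects tb[] to size the private area dynamically.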

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_netlink.c |    4 +++-
 drivers/net/bonding/bond_main.c              |    4 +++-
 drivers/net/caif/caif_hsi.c                  |    4 +++-
 drivers/net/ifb.c                            |    4 +++-
 drivers/net/macvlan.c                        |    4 +++-
 drivers/net/nlmon.c                          |    4 +++-
 drivers/net/team/team.c                      |    4 +++-
 drivers/net/tun.c                            |    4 +++-
 drivers/net/veth.c                           |    4 +++-
 drivers/net/vxlan.c                          |    4 +++-
 include/net/rtnetlink.h                      |   11 ++++++++++-
 net/8021q/vlan_netlink.c                     |    4 +++-
 net/bridge/br_netlink.c                      |    4 +++-
 net/core/rtnetlink.c                         |    2 +-
 net/ieee802154/6lowpan.c                     |    4 +++-
 net/ipv4/ip_gre.c                            |    6 ++++--
 net/ipv4/ip_tunnel.c                         |    2 +-
 net/ipv4/ip_vti.c                            |    4 +++-
 net/ipv4/ipip.c                              |    4 +++-
 net/ipv6/ip6_tunnel.c                        |    4 +++-
 net/ipv6/sit.c                               |    4 +++-
 21 files changed, 67 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
index f81abe1..8fd93fc 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c
@@ -155,11 +155,13 @@ static size_t ipoib_get_size(const struct net_device *dev)
 		nla_total_size(2);	/* IFLA_IPOIB_UMCAST */
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(ipoib, ipoib_dev_priv);
+
 static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
 	.kind		= "ipoib",
 	.maxtype	= IFLA_IPOIB_MAX,
 	.policy		= ipoib_policy,
-	.priv_size	= sizeof(struct ipoib_dev_priv),
+	.priv_size	= ipoib_priv_size,
 	.setup		= ipoib_setup,
 	.newlink	= ipoib_new_child_link,
 	.changelink	= ipoib_changelink,
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 39e5b1c..1b3caf4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4503,9 +4503,11 @@ static unsigned int bond_get_num_tx_queues(void)
 	return tx_queues;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(bond, bonding);
+
 static struct rtnl_link_ops bond_link_ops __read_mostly = {
 	.kind			= "bond",
-	.priv_size		= sizeof(struct bonding),
+	.priv_size		= bond_priv_size,
 	.setup			= bond_setup,
 	.validate		= bond_validate,
 	.get_num_tx_queues	= bond_get_num_tx_queues,
diff --git a/drivers/net/caif/caif_hsi.c b/drivers/net/caif/caif_hsi.c
index 5e40a8b..bf4f19a 100644
--- a/drivers/net/caif/caif_hsi.c
+++ b/drivers/net/caif/caif_hsi.c
@@ -1445,9 +1445,11 @@ err:
 	return -ENODEV;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(caif, cfhsi);
+
 static struct rtnl_link_ops caif_hsi_link_ops __read_mostly = {
 	.kind		= "cfhsi",
-	.priv_size	= sizeof(struct cfhsi),
+	.priv_size	= caif_priv_size,
 	.setup		= cfhsi_setup,
 	.maxtype	= __IFLA_CAIF_HSI_MAX,
 	.policy	= caif_hsi_policy,
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index a3bed28..ed8f5fa 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -251,9 +251,11 @@ static int ifb_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(ifb, ifb_private);
+
 static struct rtnl_link_ops ifb_link_ops __read_mostly = {
 	.kind		= "ifb",
-	.priv_size	= sizeof(struct ifb_private),
+	.priv_size	= ifb_priv_size,
 	.setup		= ifb_setup,
 	.validate	= ifb_validate,
 };
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 64dfaa3..6f293f7 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -934,10 +934,12 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 	[IFLA_MACVLAN_FLAGS] = { .type = NLA_U16 },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(macvlan, macvlan_dev);
+
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
 	/* common fields */
-	ops->priv_size		= sizeof(struct macvlan_dev);
+	ops->priv_size		= macvlan_priv_size;
 	ops->validate		= macvlan_validate;
 	ops->maxtype		= IFLA_MACVLAN_MAX;
 	ops->policy		= macvlan_policy;
diff --git a/drivers/net/nlmon.c b/drivers/net/nlmon.c
index b57ce5f..693f357 100644
--- a/drivers/net/nlmon.c
+++ b/drivers/net/nlmon.c
@@ -154,9 +154,11 @@ static int nlmon_validate(struct nlattr *tb[], struct nlattr *data[])
 	return 0;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(nlmon, nlmon);
+
 static struct rtnl_link_ops nlmon_link_ops __read_mostly = {
 	.kind			= "nlmon",
-	.priv_size		= sizeof(struct nlmon),
+	.priv_size		= nlmon_priv_size,
 	.setup			= nlmon_setup,
 	.validate		= nlmon_validate,
 };
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 50e43e6..b5f0526 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2068,9 +2068,11 @@ static unsigned int team_get_num_rx_queues(void)
 	return TEAM_DEFAULT_NUM_RX_QUEUES;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(team, team);
+
 static struct rtnl_link_ops team_link_ops __read_mostly = {
 	.kind			= DRV_NAME,
-	.priv_size		= sizeof(struct team),
+	.priv_size		= team_priv_size,
 	.setup			= team_setup,
 	.newlink		= team_newlink,
 	.validate		= team_validate,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a639de8..d5aebec 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1380,9 +1380,11 @@ static int tun_validate(struct nlattr *tb[], struct nlattr *data[])
 	return -EINVAL;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(tun, tun_struct);
+
 static struct rtnl_link_ops tun_link_ops __read_mostly = {
 	.kind		= DRV_NAME,
-	.priv_size	= sizeof(struct tun_struct),
+	.priv_size	= tun_priv_size,
 	.setup		= tun_setup,
 	.validate	= tun_validate,
 };
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index eee1f19..61f7122 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -434,9 +434,11 @@ static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(veth, veth_priv);
+
 static struct rtnl_link_ops veth_link_ops = {
 	.kind		= DRV_NAME,
-	.priv_size	= sizeof(struct veth_priv),
+	.priv_size	= veth_priv_size,
 	.setup		= veth_setup,
 	.validate	= veth_validate,
 	.newlink	= veth_newlink,
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index bf64b41..cff4c45 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2616,11 +2616,13 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(vxlan, vxlan_dev);
+
 static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.kind		= "vxlan",
 	.maxtype	= IFLA_VXLAN_MAX,
 	.policy		= vxlan_policy,
-	.priv_size	= sizeof(struct vxlan_dev),
+	.priv_size	= vxlan_priv_size,
 	.setup		= vxlan_setup,
 	.validate	= vxlan_validate,
 	.newlink	= vxlan_newlink,
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 7026648..b6a2ffb 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -4,6 +4,14 @@
 #include <linux/rtnetlink.h>
 #include <net/netlink.h>
 
+/* generate a priv_size function returning a fixed struct size */
+#define RTNL_LINK_OPS_PRIV_SIZE(device, size_struct)		\
+static size_t device##_priv_size(struct net *src_net,		\
+				  struct nlattr *tb[])		\
+{								\
+	return sizeof(struct size_struct);			\
+}
+
 typedef int (*rtnl_doit_func)(struct sk_buff *, struct nlmsghdr *);
 typedef int (*rtnl_dumpit_func)(struct sk_buff *, struct netlink_callback *);
 typedef u16 (*rtnl_calcit_func)(struct sk_buff *, struct nlmsghdr *);
@@ -54,7 +62,8 @@ struct rtnl_link_ops {
 
 	const char		*kind;
 
-	size_t			priv_size;
+	size_t			(*priv_size)(struct net *src_net,
+					     struct nlattr *tb[]);
 	void			(*setup)(struct net_device *dev);
 
 	int			maxtype;
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 3091297..6052370 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -238,11 +238,13 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(vlan, vlan_dev_priv);
+
 struct rtnl_link_ops vlan_link_ops __read_mostly = {
 	.kind		= "vlan",
 	.maxtype	= IFLA_VLAN_MAX,
 	.policy		= vlan_policy,
-	.priv_size	= sizeof(struct vlan_dev_priv),
+	.priv_size	= vlan_priv_size,
 	.setup		= vlan_setup,
 	.validate	= vlan_validate,
 	.newlink	= vlan_newlink,
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index b9259ef..3f4a792 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -469,9 +469,11 @@ static struct rtnl_af_ops br_af_ops = {
 	.get_link_af_size	= br_get_link_af_size,
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(br, net_bridge);
+
 struct rtnl_link_ops br_link_ops __read_mostly = {
 	.kind		= "bridge",
-	.priv_size	= sizeof(struct net_bridge),
+	.priv_size	= br_priv_size,
 	.setup		= br_dev_setup,
 	.validate	= br_validate,
 	.dellink	= br_dev_delete,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2a0e21d..76320fb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1673,7 +1673,7 @@ struct net_device *rtnl_create_link(struct net *net,
 		num_rx_queues = ops->get_num_rx_queues();
 
 	err = -ENOMEM;
-	dev = alloc_netdev_mqs(ops->priv_size, ifname, ops->setup,
+	dev = alloc_netdev_mqs(ops->priv_size(net, tb), ifname, ops->setup,
 			       num_tx_queues, num_rx_queues);
 	if (!dev)
 		goto err;
diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index c85e71e..8ee9235 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -1420,9 +1420,11 @@ static void lowpan_dellink(struct net_device *dev, struct list_head *head)
 	dev_put(real_dev);
 }
 
+RTNL_LINK_OPS_PRIV_SIZE(lowpan, lowpan_dev_info);
+
 static struct rtnl_link_ops lowpan_link_ops __read_mostly = {
 	.kind		= "lowpan",
-	.priv_size	= sizeof(struct lowpan_dev_info),
+	.priv_size	= lowpan_priv_size,
 	.setup		= lowpan_setup,
 	.newlink	= lowpan_newlink,
 	.dellink	= lowpan_dellink,
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index d7aea4c..c3cfb7b 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -731,11 +731,13 @@ static const struct nla_policy ipgre_policy[IFLA_GRE_MAX + 1] = {
 	[IFLA_GRE_PMTUDISC]	= { .type = NLA_U8 },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(ipgre, ip_tunnel);
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
 	.kind		= "gre",
 	.maxtype	= IFLA_GRE_MAX,
 	.policy		= ipgre_policy,
-	.priv_size	= sizeof(struct ip_tunnel),
+	.priv_size	= ipgre_priv_size,
 	.setup		= ipgre_tunnel_setup,
 	.validate	= ipgre_tunnel_validate,
 	.newlink	= ipgre_newlink,
@@ -749,7 +751,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
 	.kind		= "gretap",
 	.maxtype	= IFLA_GRE_MAX,
 	.policy		= ipgre_policy,
-	.priv_size	= sizeof(struct ip_tunnel),
+	.priv_size	= ipgre_priv_size,
 	.setup		= ipgre_tap_setup,
 	.validate	= ipgre_tap_validate,
 	.newlink	= ipgre_newlink,
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index ac9fabe..562dd12 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -293,7 +293,7 @@ static struct net_device *__ip_tunnel_create(struct net *net,
 	}
 
 	ASSERT_RTNL();
-	dev = alloc_netdev(ops->priv_size, name, ops->setup);
+	dev = alloc_netdev(ops->priv_size(net, NULL), name, ops->setup);
 	if (!dev) {
 		err = -ENOMEM;
 		goto failed;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index e805e7b..545cc20 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -416,11 +416,13 @@ static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(vti, ip_tunnel);
+
 static struct rtnl_link_ops vti_link_ops __read_mostly = {
 	.kind		= "vti",
 	.maxtype	= IFLA_VTI_MAX,
 	.policy		= vti_policy,
-	.priv_size	= sizeof(struct ip_tunnel),
+	.priv_size	= vti_priv_size,
 	.setup		= vti_tunnel_setup,
 	.validate	= vti_tunnel_validate,
 	.newlink	= vti_newlink,
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 7f80fb4..ee5a926 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -408,11 +408,13 @@ static const struct nla_policy ipip_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_PMTUDISC]		= { .type = NLA_U8 },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(ipip, ip_tunnel);
+
 static struct rtnl_link_ops ipip_link_ops __read_mostly = {
 	.kind		= "ipip",
 	.maxtype	= IFLA_IPTUN_MAX,
 	.policy		= ipip_policy,
-	.priv_size	= sizeof(struct ip_tunnel),
+	.priv_size	= ipip_priv_size,
 	.setup		= ipip_tunnel_setup,
 	.newlink	= ipip_newlink,
 	.changelink	= ipip_changelink,
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 61355f7..2ab41b9 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1682,11 +1682,13 @@ static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_PROTO]		= { .type = NLA_U8 },
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(ip6_tnl, ip6_tnl);
+
 static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.kind		= "ip6tnl",
 	.maxtype	= IFLA_IPTUN_MAX,
 	.policy		= ip6_tnl_policy,
-	.priv_size	= sizeof(struct ip6_tnl),
+	.priv_size	= ip6_tnl_priv_size,
 	.setup		= ip6_tnl_dev_setup,
 	.validate	= ip6_tnl_validate,
 	.newlink	= ip6_tnl_newlink,
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7ee5cb9..bd638ba 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1540,11 +1540,13 @@ static const struct nla_policy ipip6_policy[IFLA_IPTUN_MAX + 1] = {
 #endif
 };
 
+RTNL_LINK_OPS_PRIV_SIZE(ipip6, ip_tunnel);
+
 static struct rtnl_link_ops sit_link_ops __read_mostly = {
 	.kind		= "sit",
 	.maxtype	= IFLA_IPTUN_MAX,
 	.policy		= ipip6_policy,
-	.priv_size	= sizeof(struct ip_tunnel),
+	.priv_size	= ipip6_priv_size,
 	.setup		= ipip6_tunnel_setup,
 	.validate	= ipip6_validate,
 	.newlink	= ipip6_newlink,


* [RFC PATCH 2/4] net: Add lower dev list helpers
  2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
  2013-09-11 18:46 ` [RFC PATCH 1/4] net: rtnetlink: make priv_size a function for devs with dynamic size John Fastabend
@ 2013-09-11 18:46 ` John Fastabend
  2013-09-14 12:27   ` Veaceslav Falico
  2013-09-11 18:47 ` [RFC PATCH 3/4] net: VSI: Add virtual station interface support John Fastabend
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-09-11 18:46 UTC (permalink / raw)
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson

This patch adds helpers to traverse the lower dev lists; these
helpers match the upper dev list implementation.

VSI implementers may use these to track a list of connected netdevs.
This is easier than having drivers do their own accounting.
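
As a minimal usage sketch (the caller must hold the RCU read lock,
which the helper checks with WARN_ON_ONCE):

	struct net_device *lower;
	struct list_head *iter;

	rcu_read_lock();
	netdev_for_each_lower_dev_rcu(dev, lower, iter)
		pr_info("%s is a lower dev of %s\n",
			lower->name, dev->name);
	rcu_read_unlock();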

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/netdevice.h |    8 ++++++++
 net/core/dev.c            |   25 +++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 041b42a..4d24b38 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2813,6 +2813,8 @@ extern int		bpf_jit_enable;
 extern bool netdev_has_upper_dev(struct net_device *dev,
 				 struct net_device *upper_dev);
 extern bool netdev_has_any_upper_dev(struct net_device *dev);
+extern struct net_device *netdev_lower_get_next_dev_rcu(struct net_device *dev,
+							struct list_head **iter);
 extern struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
 							struct list_head **iter);
 
@@ -2823,6 +2825,12 @@ extern struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
 	     upper; \
 	     upper = netdev_upper_get_next_dev_rcu(dev, &(iter)))
 
+#define netdev_for_each_lower_dev_rcu(dev, lower, iter) \
+	for (iter = &(dev)->lower_dev_list, \
+	     lower = netdev_lower_get_next_dev_rcu(dev, &(iter)); \
+	     lower; \
+	     lower = netdev_lower_get_next_dev_rcu(dev, &(iter)))
+
 extern struct net_device *netdev_master_upper_dev_get(struct net_device *dev);
 extern struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev);
 extern int netdev_upper_dev_link(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..65ed610 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4468,6 +4468,31 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_master_upper_dev_get);
 
+/* netdev_lower_get_next_dev_rcu - Get the next dev from lower list
+ * @dev: device
+ * @iter: list_head ** of the current position
+ *
+ * Gets the next device from the dev's lower list, starting from iter
+ * position. The caller must hold RTNL/RCU read lock.
+ */
+struct net_device *netdev_lower_get_next_dev_rcu(struct net_device *dev,
+						 struct list_head **iter)
+{
+	struct netdev_adjacent *lower;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
+
+	if (&lower->list == &dev->lower_dev_list)
+		return NULL;
+
+	*iter = &lower->list;
+
+	return lower->dev;
+}
+EXPORT_SYMBOL(netdev_lower_get_next_dev_rcu);
+
 /* netdev_upper_get_next_dev_rcu - Get the next dev from upper list
  * @dev: device
  * @iter: list_head ** of the current position


* [RFC PATCH 3/4] net: VSI: Add virtual station interface support
  2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
  2013-09-11 18:46 ` [RFC PATCH 1/4] net: rtnetlink: make priv_size a function for devs with dynamic size John Fastabend
  2013-09-11 18:46 ` [RFC PATCH 2/4] net: Add lower dev list helpers John Fastabend
@ 2013-09-11 18:47 ` John Fastabend
  2013-09-20 23:12   ` Neil Horman
  2013-09-11 18:47 ` [RFC PATCH 4/4] ixgbe: Adding VSI support to ixgbe John Fastabend
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
  4 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-09-11 18:47 UTC (permalink / raw)
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson

This patch adds support for a new device type, the VSI (virtual
station interface). This device type exposes additional net devices,
complete with queues and a MAC/VLAN pair, to the host OS. They are
logically stacked on top of a switching/routing component, with the
physical link acting as the downlink to the peer switch.

On the receive path the hardware will forward packets to the new
VSI net device using the forwarding database (FDB) already exposed
via the ndo ops ndo_fdb_{add|del|dump}. On transmit the hardware
may use either a VEB or a VEPA. In the VEB case traffic may be
"switched" between VSI net devices by the hardware; in the VEPA
case all traffic is sent to the adjacent switch. The hardware
_should_ expose this functionality via the ndo_bridge_{set|get}link
ndo operations.

This net device should be functionally analogous to an offloaded
macvlan device with the ebridge component offloaded into hardware.

Also notice that for now the ixgbe implementation accompanying this
patch set only supports L2 forwarding; the fdb interfaces could
push L3/L4 forwarding to the hardware for more advanced usages,
including vxlan and other tunnel schemes.
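
As a rough sketch only (the driver names here are hypothetical, not
part of this series), a lower device driver wires up the new ndo
ops along these lines:

	static size_t foo_vsi_size(struct net_device *dev)
	{
		/* private space the VSI netdev needs; hypothetical */
		return sizeof(struct foo_vsi_priv);
	}

	static int foo_vsi_add(struct net_device *lower,
			       struct net_device *vsi)
	{
		/* claim a hardware pool/queues for 'vsi'; return a
		 * negative errno on resource exhaustion
		 */
		return 0;
	}

	static void foo_vsi_del(struct net_device *vsi)
	{
		/* release the pool; this op may not fail */
	}

and sets .ndo_vsi_add, .ndo_vsi_del and .ndo_vsi_size in its
net_device_ops. The ixgbe patch that follows is the real example.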

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/Kconfig       |    9 +++
 drivers/net/Makefile      |    1 
 drivers/net/vsi.c         |  124 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h |   27 ++++++++++
 include/uapi/linux/if.h   |    1 
 5 files changed, 162 insertions(+)
 create mode 100644 drivers/net/vsi.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b45b240..19be0fb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -362,4 +362,13 @@ config VMXNET3
 
 source "drivers/net/hyperv/Kconfig"
 
+config VSI
+	tristate "Virtual Station Interfaces (VSI)"
+	help
+	  This supports chip sets with embedded switching components
+	  and allows creating additional net devices that are
+	  logically slaves of a master net device typically the net
+	  device associated with the physical function. For these
+	  child devices switching occurs in the hardware component.
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3fef8a8..3ef1d66 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
 obj-$(CONFIG_NLMON) += nlmon.o
+obj-$(CONFIG_VSI) += vsi.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vsi.c b/drivers/net/vsi.c
new file mode 100644
index 0000000..e9d39da
--- /dev/null
+++ b/drivers/net/vsi.c
@@ -0,0 +1,124 @@
+/*
+ * VSI - Virtual Station Interface
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * John Fastabend <john.r.fastabend@intel.com>
+ */
+#include <linux/module.h>
+#include <net/rtnetlink.h>
+#include <linux/etherdevice.h>
+
+size_t vsi_priv_size(struct net *src_net, struct nlattr *tb[])
+{
+	struct net_device *dev;
+	size_t size = 0;
+
+	if (!tb[IFLA_LINK])
+		return 0;
+
+	dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!dev)
+		return 0; /* size_t can't carry -ENODEV; newlink reports it */
+
+	if (dev->netdev_ops->ndo_vsi_size)
+		size = dev->netdev_ops->ndo_vsi_size(dev);
+	return size;
+}
+
+static int vsi_newlink(struct net *src_net, struct net_device *dev,
+		       struct nlattr *tb[], struct nlattr *data[])
+{
+	struct net_device *lower;
+	int err;
+
+	if (!tb[IFLA_LINK])
+		return -EINVAL;
+
+	lower = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!lower)
+		return -ENODEV;
+
+	if (!tb[IFLA_MTU])
+		dev->mtu = lower->mtu;
+	else if (lower->mtu > dev->mtu)
+		return -EINVAL;
+
+	dev->priv_flags |= IFF_VSI_PORT;
+	err = lower->netdev_ops->ndo_vsi_add(lower, dev);
+	if (err < 0)
+		return err;
+
+	err = netdev_upper_dev_link(lower, dev);
+	if (err)
+		goto destroy_port;
+
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto upper_dev_unlink;
+
+	netif_stacked_transfer_operstate(lower, dev);
+	return 0;
+upper_dev_unlink:
+	netdev_upper_dev_unlink(lower, dev);
+destroy_port:
+	if (lower->netdev_ops->ndo_vsi_del)
+		lower->netdev_ops->ndo_vsi_del(dev);
+	return err;
+}
+
+void vsi_dellink(struct net_device *dev, struct list_head *head)
+{
+	struct net_device *lower;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev_rcu(dev, lower, iter) {
+		if (lower->netdev_ops->ndo_vsi_del)
+			lower->netdev_ops->ndo_vsi_del(dev);
+		netdev_upper_dev_unlink(lower, dev);
+	}
+
+	unregister_netdevice_queue(dev, head);
+}
+
+static struct rtnl_link_ops vsi_link_ops __read_mostly = {
+	.kind		= "vsi",
+	.priv_size	= vsi_priv_size,
+	.setup		= ether_setup,
+	.newlink	= vsi_newlink,
+	.dellink	= vsi_dellink,
+};
+
+static int __init vsi_init_module(void)
+{
+	return rtnl_link_register(&vsi_link_ops);
+}
+
+static void __exit vsi_cleanup_module(void)
+{
+	rtnl_link_unregister(&vsi_link_ops);
+}
+
+module_init(vsi_init_module);
+module_exit(vsi_cleanup_module);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("John Fastabend <john.r.fastabend@intel.com>");
+MODULE_DESCRIPTION("Virtual Station Interfaces (VSI)");
+MODULE_ALIAS_RTNL_LINK("vsi");
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d24b38..9817745 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -961,6 +961,24 @@ struct netdev_phys_port_id {
  *	Called by vxlan to notify the driver about a UDP port and socket
  *	address family that vxlan is not listening to anymore. The operation
  *	is protected by the vxlan_net->sock_lock.
+ *
+ * int (*ndo_vsi_add)(struct net_device *lower, struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to add a new
+ *	net device 'dev' to an embedded switch where the embedded switch
+ *	management net device is identified by 'lower'. This should return
+ *	0 on success or may return negative error codes. Error codes should
+ *	be used here to signify resource constraints, unsupportable attributes,
+ *	or any other condition which caused the creation to fail.
+ * void (*ndo_vsi_del)(struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to remove the
+ *	net device 'dev' from an embedded switch. Drivers may not fail this
+ *	command.
+ * size_t (*ndo_vsi_size)(struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to get the
+ *	private size required by a VSI interface that is being created. If
+ *	this routine is not implemented a size of 0 is used. The 'dev' argument
+ *	indicates the embedded switch management interface where the new
+ *	net device is being attached.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1097,6 +1115,10 @@ struct net_device_ops {
 	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __u16 port);
+	int			(*ndo_vsi_add)(struct net_device *lower,
+					       struct net_device *dev);
+	void			(*ndo_vsi_del)(struct net_device *dev);
+	size_t			(*ndo_vsi_size)(struct net_device *dev);
 };
 
 /*
@@ -2967,6 +2989,11 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
 	return dev->priv_flags & IFF_SUPP_NOFCS;
 }
 
+static inline bool netif_is_vsi_port(struct net_device *dev)
+{
+	return dev->priv_flags & IFF_VSI_PORT;
+}
+
 extern struct pernet_operations __net_initdata loopback_net_ops;
 
 /* Logging, debugging and troubleshooting/diagnostic helpers. */
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b..9b8d6a0 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -83,6 +83,7 @@
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
+#define IFF_VSI_PORT 0x200000		/* Virtual Station Interface port */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */


* [RFC PATCH 4/4] ixgbe: Adding VSI support to ixgbe
  2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
                   ` (2 preceding siblings ...)
  2013-09-11 18:47 ` [RFC PATCH 3/4] net: VSI: Add virtual station interface support John Fastabend
@ 2013-09-11 18:47 ` John Fastabend
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
  4 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2013-09-11 18:47 UTC (permalink / raw)
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson

This adds initial base support for VSI interfaces to the ixgbe
driver.

This supports up to 32 VSIs per physical function, with each VSI
supporting up to 4 TX/RX queue pairs. The hardware has support
for modes using 2 queue pairs that enable up to 64 VSIs, but they
are not enabled here. VSIs can be instantiated with 1, 2, or 4
queues; unfortunately, due to hardware restrictions related to RSS
masking across queues, 3 TX/RX queues are not supported. If a user
requests a queue parameter of 3 for either TX or RX the vsi_add
will fail with EINVAL and a message will be printed to dmesg
explaining the error. EBUSY is returned if more than 32 VSIs are
added.

The driver uses a bitmask added to the ixgbe_adapter struct to
track VSIs. VSIs spawn net devices that can be independently
managed via their own ndo and ethtool ops.
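
As a rough sketch of the checks described above (not the exact code
from ixgbe_vsi.c), the vsi_add validation amounts to:

	/* only 1, 2 or 4 queues work; 3 trips the RSS mask limit */
	if (txq == IXGBE_BAD_VSI_QUEUE || rxq == IXGBE_BAD_VSI_QUEUE) {
		e_err(drv, "VSIs support 1, 2 or 4 queues, not 3\n");
		return -EINVAL;
	}

	pool = find_first_zero_bit(&adapter->vsi_bitmask,
				   IXGBE_MAX_VMDQ_INDICES);
	if (pool >= IXGBE_MAX_VMDQ_INDICES)	/* all 32 pools used */
		return -EBUSY;
	set_bit(pool, &adapter->vsi_bitmask);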

Currently, MAC addresses must be added manually via 'ip link set'
or similar configuration (netlink) commands before bringing the
spawned devices online. I could easily generate random MAC addresses
if that is preferred, although it is not required in my use case
and seems a bit arbitrary. I expect most users have a pool of known
"good" MAC addresses they can use, or for testing can simply pick
their favorite address.

Additionally, VSIs do not support promisc mode. It is not clear,
to me at least, what promiscuous mode means when the VSI is behind
a non-learning embedded bridge which does not do flooding. The
hardware does not support flooding on the embedded bridge; however,
it does support mirroring, which could be enabled to get something
that looks like promiscuous mode. Currently there is no kernel
support (netlink cmd?) to configure mirroring.

The embedded bridge that connects VSIs looks like an edge relay
in IEEE std terms, and can support both VEB and VEPA modes, which
are managed independently of VSIs through the {set|get} bridge
APIs.
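
For example, the mode can be toggled with iproute2's bridge tool,
independent of any VSI:

# bridge link set dev p3p2 hwmode vepa
# bridge link set dev p3p2 hwmode veb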

VSIs only support a single MAC address in this patch. This
is a software limitation and can/will be extended in subsequent
patches.

NOTE:
The SRIOV and VMDQ flags in ixgbe seem to be a bit overloaded
and convoluted. This patch uses these flags. A follow up patch
could clean up the ixgbe flag usage scheme.

RFC TODO: I'll complete these before sending the real patch
(1) interop with DCB/FCOE/SR-IOV
(2) vmdq_netdev can be dropped from tx_ring and we can just use a
    correctly set netdev. In the slowpath we can look up the
    management netdev via vadapter->real_adapter->netdev and save
    the hotpath vmdq_netdev if/else branch.

Otherwise I've been adding/deleting VSIs with multiple netperf
TCP sessions without any crashes, but I still need to do some
more testing.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/Makefile        |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   32 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    4 
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  307 +++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c     |  428 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h     |   71 ++++
 7 files changed, 766 insertions(+), 94 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index be2989e..24136e7 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -34,7 +34,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
               ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-              ixgbe_mbx.o ixgbe_x540.o ixgbe_lib.o ixgbe_ptp.o
+              ixgbe_mbx.o ixgbe_x540.o ixgbe_lib.o ixgbe_ptp.o \
+	      ixgbe_vsi.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
                               ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..ba2ab14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -244,6 +244,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
 
@@ -288,11 +289,14 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_VSI_QUEUES 4
+#define IXGBE_BAD_VSI_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +743,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long vsi_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -747,6 +753,17 @@ struct ixgbe_fdir_filter {
 	u16 action;
 };
 
+struct ixgbe_vsi_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 enum ixgbe_state_t {
 	__IXGBE_TESTING,
 	__IXGBE_RESETTING,
@@ -879,9 +896,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +937,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 0e1b973..48b2d81 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-                              struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7aba452..3a2138a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_vsi.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -872,7 +873,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1057,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1074,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index)
 		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1228,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1262,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1563,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1743,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1909,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1990,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -2041,7 +2046,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2460,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2899,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2959,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3349,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3366,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->vsi_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3676,6 +3629,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3706,6 +3661,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3736,15 +3693,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly less RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3765,6 +3723,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+
 	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
@@ -4114,6 +4073,8 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 
 	ixgbe_configure_pb(adapter);
 #ifdef CONFIG_IXGBE_DCB
@@ -4126,6 +4087,13 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+	netdev_for_each_upper_dev_rcu(adapter->netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+
+		ixgbe_vsi_set_rx_mode(upper);
+	}
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4452,7 +4420,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4576,8 +4544,9 @@ static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
 
 void ixgbe_down(struct ixgbe_adapter *adapter)
 {
-	struct net_device *netdev = adapter->netdev;
+	struct net_device *upper, *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct list_head *iter;
 	u32 rxctrl;
 	int i;
 
@@ -4588,6 +4557,14 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	rxctrl = IXGBE_READ_REG(hw, IXGBE_RXCTRL);
 	IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, rxctrl & ~IXGBE_RXCTRL_RXEN);
 
+	/* disable all VSIs */
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		if (netif_running(upper))
+			ixgbe_vsi_down(upper);
+	}
+
 	/* disable all enabled rx queues */
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		/* this call also flushes the previous write */
@@ -4831,6 +4808,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->vsi_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5136,7 +5115,9 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	struct net_device *upper;
+	struct list_head *iter;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5161,16 +5142,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_VSI_QUEUES)
+		queues = IXGBE_MAX_VSI_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_VSI_QUEUES)
+		queues = IXGBE_MAX_VSI_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5178,6 +5165,16 @@ static int ixgbe_open(struct net_device *netdev)
 
 	ixgbe_up_complete(adapter);
 
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		struct ixgbe_vsi_adapter *vadapter;
+
+		if (!netif_is_vsi_port(upper))
+			continue;
+		vadapter = netdev_priv(upper);
+		if (vadapter->online)
+			ixgbe_vsi_up(upper);
+	}
+
 	return 0;
 
 err_set_queues:
@@ -5208,7 +5205,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -5383,9 +5379,10 @@ static void ixgbe_shutdown(struct pci_dev *pdev)
  **/
 void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 {
-	struct net_device *netdev = adapter->netdev;
+	struct net_device *vsi, *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_hw_stats *hwstats = &adapter->stats;
+	struct list_head *iter;
 	u64 total_mpc = 0;
 	u32 i, missed_rx = 0, mpc, bprc, lxon, lxoff, xon_off_tot;
 	u64 non_eop_descs = 0, restart_queue = 0, tx_busy = 0;
@@ -5407,6 +5404,34 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 		adapter->rsc_total_flush = rsc_flush;
 	}
 
+	netdev_for_each_upper_dev_rcu(netdev, vsi, iter) {
+		struct net_device_stats *vmdq_net_stats;
+		struct ixgbe_ring *rx_ring, *tx_ring;
+		struct ixgbe_vsi_adapter *vadapter;
+		unsigned int txq, rxq;
+		int j;
+
+		if (!netif_is_vsi_port(vsi))
+			continue;
+
+		vmdq_net_stats = &vsi->stats;
+		vadapter = netdev_priv(vsi);
+		txq = vadapter->tx_base_queue;
+		rxq = vadapter->rx_base_queue;
+
+		vmdq_net_stats->tx_packets = 0;
+		vmdq_net_stats->tx_bytes = 0;
+		for (j = txq; j < txq + adapter->num_rx_queues_per_pool; j++) {
+			tx_ring = adapter->tx_ring[j];
+			vmdq_net_stats->tx_packets += tx_ring->stats.packets;
+			vmdq_net_stats->tx_bytes += tx_ring->stats.bytes;
+		}
+
+		vmdq_net_stats->rx_packets = 0;
+		vmdq_net_stats->rx_bytes = 0;
+		for (j = rxq; j < rxq + adapter->num_rx_queues_per_pool; j++) {
+			rx_ring = adapter->rx_ring[j];
+			vmdq_net_stats->rx_packets += rx_ring->stats.packets;
+			vmdq_net_stats->rx_bytes += rx_ring->stats.bytes;
+		}
+	}
+
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct ixgbe_ring *rx_ring = adapter->rx_ring[i];
 		non_eop_descs += rx_ring->rx_stats.non_eop_descs;
@@ -5743,6 +5768,8 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 link_speed = adapter->link_speed;
+	struct net_device *upper;
+	struct list_head *iter;
 	bool flow_rx, flow_tx;
 
 	/* only continue if link was previously down */
@@ -5798,6 +5825,11 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 
 	/* ping all the active vfs to let them know link has changed */
 	ixgbe_ping_all_vfs(adapter);
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		netif_carrier_on(upper);
+	}
 }
 
 /**
@@ -5809,6 +5841,8 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 
 	adapter->link_up = false;
 	adapter->link_speed = 0;
@@ -5829,6 +5863,11 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 
 	/* ping all the active vfs to let them know link has changed */
 	ixgbe_ping_all_vfs(adapter);
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		netif_carrier_off(upper);
+	}
 }
 
 /**
@@ -6561,7 +6600,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6573,7 +6612,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6624,6 +6663,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6715,9 +6757,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -7042,6 +7085,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7290,6 +7334,94 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static const struct net_device_ops ixgbe_vsi_netdev_ops = {
+	.ndo_init		= ixgbe_vsi_init,
+	.ndo_open		= ixgbe_vsi_open,
+	.ndo_stop		= ixgbe_vsi_close,
+	.ndo_start_xmit		= ixgbe_vsi_xmit_frame,
+	.ndo_set_rx_mode	= ixgbe_vsi_set_rx_mode,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_set_mac_address	= ixgbe_vsi_set_mac,
+	.ndo_change_mtu		= ixgbe_vsi_change_mtu,
+	.ndo_tx_timeout		= ixgbe_vsi_tx_timeout,
+	.ndo_vlan_rx_add_vid	= ixgbe_vsi_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid	= ixgbe_vsi_vlan_rx_kill_vid,
+	.ndo_set_features	= ixgbe_vsi_set_features,
+};
+
+static int ixgbe_vsi_add(struct net_device *dev, struct net_device *vsi)
+{
+	struct ixgbe_vsi_adapter *vsi_adapter = netdev_priv(vsi);
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int pool, vmdq_pool, base_queue;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vsi->num_rx_queues != vsi->num_tx_queues ||
+	    vsi->num_tx_queues > IXGBE_MAX_VSI_QUEUES ||
+	    vsi->num_tx_queues == IXGBE_BAD_VSI_QUEUE) {
+	e_info(drv, "%s: supports RX/TX queue counts of 1, 2, and 4\n",
+		       dev->name);
+		return -EINVAL;
+	}
+
+	if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES)
+		return -EBUSY;
+
+	pool = find_first_zero_bit(&adapter->vsi_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->vsi_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_VSI_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(pool);
+	base_queue = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(dev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   pool, adapter->num_rx_pools,
+		   base_queue, base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->vsi_bitmask);
+
+	vsi_adapter->pool = pool;
+	vsi_adapter->netdev = vsi;
+	vsi_adapter->real_adapter = adapter;
+	vsi_adapter->rx_base_queue = base_queue;
+	vsi_adapter->tx_base_queue = base_queue;
+
+	vsi->netdev_ops = &ixgbe_vsi_netdev_ops;
+	ixgbe_vsi_set_ethtool_ops(vsi);
+
+	return 0;
+}
+
+static void ixgbe_vsi_del(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	ixgbe_vsi_close(dev);
+	clear_bit(vadapter->pool, &adapter->vsi_bitmask);
+	adapter->num_rx_pools--;
+
+	netdev_dbg(dev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   vadapter->pool, adapter->num_rx_pools,
+		   vadapter->rx_base_queue,
+		   vadapter->rx_base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->vsi_bitmask);
+}
+
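+/* report how much netdev priv space the vsi link type must reserve for
+ * our per-VSI state (via the priv_size hook added in patch 1/4) */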
+size_t ixgbe_vsi_size(struct net_device *dev)
+{
+	return sizeof(struct ixgbe_vsi_adapter);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7334,6 +7466,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
+	.ndo_vsi_add		= ixgbe_vsi_add,
+	.ndo_vsi_del		= ixgbe_vsi_del,
+	.ndo_vsi_size		= ixgbe_vsi_size,
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
new file mode 100644
index 0000000..48ad793
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
@@ -0,0 +1,428 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include <linux/tcp.h>
+
+#include "ixgbe.h"
+#include "ixgbe_vsi.h"
+
+/**
+ * ixgbe_del_mac_filter - Remove the MAC filter for a VSI pool
+ * @adapter: pointer to private adapter struct
+ * @pool: VSI pool whose MAC filter entry is cleared
+ */
+static void ixgbe_del_mac_filter(struct ixgbe_adapter *adapter, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool - 1;
+	hw->mac.ops.clear_vmdq(hw, entry, VMDQ_P(pool));
+}
+
+/**
+ * ixgbe_add_mac_filter - Add a MAC filter for the VSI netdev
+ * @adapter: pointer to private adapter struct
+ * @addr: pointer to the MAC address (u8 *) to program
+ * @pool: VSI pool to configure with the MAC address
+ */
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+/**
+ * ixgbe_disable_vsi_ring - shut down a hardware queue and release buffers
+ * @vadapter: pointer to the per-VSI adapter struct
+ * @rx_ring: ring to free
+ */
+static void ixgbe_disable_vsi_ring(struct ixgbe_vsi_adapter *vadapter,
+				    struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+}
+
+/**
+ * ixgbe_enable_vsi_ring - allocate rx buffers and start queue
+ * @adapter: pointer to private adapter struct
+ * @rx_ring: ring to configure and enable
+ */
+static void ixgbe_enable_vsi_ring(struct ixgbe_adapter *adapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+int ixgbe_vsi_init(struct net_device *dev)
+{
+	struct net_device *lower;
+	struct list_head *iter;
+
+	/* There can only be one lower dev at this point */
+	netdev_for_each_lower_dev_rcu(dev, lower, iter) {
+		dev->features		= lower->features;
+		dev->vlan_features	= lower->vlan_features;
+		dev->gso_max_size	= lower->gso_max_size;
+		dev->iflink		= lower->ifindex;
+		dev->hard_header_len	= lower->hard_header_len;
+	}
+
+	return 0;
+}
+
+static void ixgbe_vsi_psrtype(struct ixgbe_vsi_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+int ixgbe_vsi_open(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int err = 0;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state)) {
+		e_dev_info("%s is down\n", adapter->netdev->name);
+		return -EBUSY;
+	}
+
+	vadapter->online = true;
+	err = ixgbe_vsi_up(dev);
+	if (!err)
+		netif_carrier_on(dev);
+	return err;
+}
+
+int ixgbe_vsi_up(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	unsigned int rxbase = vadapter->pool * adapter->num_rx_queues_per_pool;
+	unsigned int txbase = vadapter->pool * adapter->num_rx_queues_per_pool;
+	int err, i;
+
+	netif_carrier_off(dev);
+
+	vadapter->rx_base_queue = rxbase;
+	vadapter->tx_base_queue = txbase;
+
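+	/* quiesce the pool's rings before handing them over to this VSI */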
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < dev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->vmdq_netdev = dev;
+		ixgbe_enable_vsi_ring(adapter, adapter->rx_ring[rxbase + i]);
+	}
+
+	for (i = 0; i < dev->num_tx_queues; i++)
+		adapter->tx_ring[txbase + i]->vmdq_netdev = dev;
+
+	if (is_valid_ether_addr(dev->dev_addr))
+		ixgbe_add_mac_filter(adapter, dev->dev_addr, vadapter->pool);
+
+	err = netif_set_real_num_tx_queues(dev, dev->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+	err = netif_set_real_num_rx_queues(dev, dev->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
+	ixgbe_vsi_psrtype(vadapter);
+	netif_tx_start_all_queues(dev);
+	return 0;
+err_set_queues:
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+	return err;
+}
+
+int ixgbe_vsi_close(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+
+	vadapter->online = false;
+	return ixgbe_vsi_down(dev);
+}
+
+int ixgbe_vsi_down(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	unsigned int rxbase = vadapter->rx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(dev);
+	netif_carrier_off(dev);
+	netif_tx_disable(dev);
+
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+
+	return 0;
+}
+
+netdev_tx_t ixgbe_vsi_xmit_frame(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_ring *tx_ring;
+	unsigned int queue;
+
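+	/* remap the VSI-relative queue index onto the PF's absolute ring */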
+	queue = skb->queue_mapping + vadapter->tx_base_queue;
+	tx_ring = vadapter->real_adapter->tx_ring[queue];
+
+	return ixgbe_xmit_frame_ring(skb, vadapter->real_adapter, tx_ring);
+}
+
+struct net_device_stats *ixgbe_vsi_get_stats(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+
+	/* only return the current stats */
+	return &vadapter->net_stats;
+}
+
+int ixgbe_vsi_set_features(struct net_device *dev, netdev_features_t features)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vlnctrl;
+	int i;
+
+	for (i = 0; i < dev->num_rx_queues; i++) {
+		unsigned int index = i + vadapter->rx_base_queue;
+		int reg_idx = adapter->rx_ring[index]->reg_idx;
+
+		vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+		if (features & NETIF_F_HW_VLAN_CTAG_RX)
+			vlnctrl |= IXGBE_RXDCTL_VME;
+		else
+			vlnctrl &= ~IXGBE_RXDCTL_VME;
+		IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), vlnctrl);
+	}
+
+	return 0;
+}
+
+void ixgbe_vsi_set_rx_mode(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vmolr;
+
+	/* No unicast promiscuous support for VMDQ devices. */
+	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vadapter->pool));
+	vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
+
+	/* clear multicast promisc (MPE); re-set below if allmulti is requested */
+	vmolr &= ~IXGBE_VMOLR_MPE;
+
+	if (dev->flags & IFF_ALLMULTI) {
+		vmolr |= IXGBE_VMOLR_MPE;
+	} else {
+		vmolr |= IXGBE_VMOLR_ROMPE;
+		hw->mac.ops.update_mc_addr_list(hw, dev);
+	}
+	ixgbe_write_uc_addr_list(adapter->netdev);
+	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vadapter->pool), vmolr);
+}
+
+int ixgbe_vsi_set_mac(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	ixgbe_del_mac_filter(adapter, vadapter->pool);
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	ixgbe_add_mac_filter(adapter, dev->dev_addr, vadapter->pool);
+	return 0;
+}
+
+int ixgbe_vsi_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	if (adapter->netdev->mtu < new_mtu) {
+		e_warn(probe,
+		       "Set MTU on %s to >= %d before changing MTU on %s\n",
+			adapter->netdev->name, new_mtu, dev->name);
+		return -EINVAL;
+	}
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+void ixgbe_vsi_tx_timeout(struct net_device *dev)
+{
+	return;
+}
+
+int ixgbe_vsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_hw *hw = &vadapter->real_adapter->hw;
+
+	hw->mac.ops.set_vfta(hw, vid, vadapter->pool, true);
+	return 0;
+}
+
+int ixgbe_vsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_hw *hw = &vadapter->real_adapter->hw;
+
+	hw->mac.ops.set_vfta(hw, vid, vadapter->pool, false);
+	return 0;
+}
+
+static int ixgbe_vsi_get_settings(struct net_device *netdev,
+				   struct ethtool_cmd *ecmd)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	struct net_device *real_netdev = vadapter->real_adapter->netdev;
+
+	return ixgbe_get_settings(real_netdev, ecmd);
+}
+
+static u32 ixgbe_vsi_get_msglevel(struct net_device *netdev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+
+	return vadapter->real_adapter->msg_enable;
+}
+
+static void ixgbe_vsi_get_drvinfo(struct net_device *netdev,
+				   struct ethtool_drvinfo *drvinfo)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct net_device *main_netdev = adapter->netdev;
+
+	strncpy(drvinfo->driver, ixgbe_driver_name, 32);
+	strncpy(drvinfo->version, ixgbe_driver_version, 32);
+
+	strncpy(drvinfo->fw_version, "N/A", 4);
+	snprintf(drvinfo->bus_info, 32, "%s VSI %d",
+		 main_netdev->name, vadapter->pool);
+	drvinfo->n_stats = 0;
+	drvinfo->testinfo_len = 0;
+	drvinfo->regdump_len = 0;
+}
+
+static void ixgbe_vsi_get_ringparam(struct net_device *netdev,
+				     struct ethtool_ringparam *ring)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	unsigned int txr = vadapter->tx_base_queue;
+	unsigned int rxr = vadapter->rx_base_queue;
+	struct ixgbe_ring *tx_ring, *rx_ring;
+
+	tx_ring = vadapter->real_adapter->tx_ring[txr];
+	rx_ring = vadapter->real_adapter->rx_ring[rxr];
+
+	ring->rx_max_pending = IXGBE_MAX_RXD;
+	ring->tx_max_pending = IXGBE_MAX_TXD;
+	ring->rx_mini_max_pending = 0;
+	ring->rx_jumbo_max_pending = 0;
+	ring->rx_pending = rx_ring->count;
+	ring->tx_pending = tx_ring->count;
+	ring->rx_mini_pending = 0;
+	ring->rx_jumbo_pending = 0;
+}
+
+static struct ethtool_ops ixgbe_vsi_ethtool_ops = {
+	.get_settings	= ixgbe_vsi_get_settings,
+	.get_drvinfo	= ixgbe_vsi_get_drvinfo,
+	.get_link	= ethtool_op_get_link,
+	.get_ringparam	= ixgbe_vsi_get_ringparam,
+	.get_msglevel	= ixgbe_vsi_get_msglevel,
+};
+
+void ixgbe_vsi_set_ethtool_ops(struct net_device *netdev)
+{
+	SET_ETHTOOL_OPS(netdev, &ixgbe_vsi_ethtool_ops);
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h
new file mode 100644
index 0000000..509f485
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h
@@ -0,0 +1,71 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+void ixgbe_ping_all_vadapters(struct net_device *dev);
+int ixgbe_vsi_init(struct net_device *dev);
+int ixgbe_vsi_up(struct net_device *dev);
+int ixgbe_vsi_open(struct net_device *dev);
+int ixgbe_vsi_close(struct net_device *dev);
+int ixgbe_vsi_down(struct net_device *dev);
+netdev_tx_t ixgbe_vsi_xmit_frame(struct sk_buff *skb, struct net_device *dev);
+struct net_device_stats *ixgbe_vsi_get_stats(struct net_device *dev);
+void ixgbe_vsi_set_rx_mode(struct net_device *dev);
+int ixgbe_vsi_set_mac(struct net_device *dev, void *addr);
+int ixgbe_vsi_change_mtu(struct net_device *dev, int new_mtu);
+void ixgbe_vsi_tx_timeout(struct net_device *dev);
+int ixgbe_vsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid);
+int ixgbe_vsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid);
+void ixgbe_vsi_set_ethtool_ops(struct net_device *netdev);
+int ixgbe_vsi_set_features(struct net_device *netdev,
+			   netdev_features_t features);
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}
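
For reviewers without patches 1/4 and 3/4 in front of them, here is a
rough sketch of how the vsi link type ends up driving the ndo hooks
added above; the function name and error handling are illustrative, not
the exact code from those patches:

	static int vsi_newlink_sketch(struct net_device *lowerdev,
				      struct net_device *vsi)
	{
		const struct net_device_ops *ops = lowerdev->netdev_ops;

		/* priv_size for alloc_netdev_mqs() was already taken
		 * from ndo_vsi_size() via the extended rtnl priv_size
		 * hook from patch 1/4 */
		if (!ops->ndo_vsi_add)
			return -EOPNOTSUPP;

		return ops->ndo_vsi_add(lowerdev, vsi);
	}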

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/4] net: Add lower dev list helpers
  2013-09-11 18:46 ` [RFC PATCH 2/4] net: Add lower dev list helpers John Fastabend
@ 2013-09-14 12:27   ` Veaceslav Falico
  2013-09-14 20:43     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Veaceslav Falico @ 2013-09-14 12:27 UTC (permalink / raw)
  To: John Fastabend
  Cc: stephen, bhutchings, ogerlitz, john.ronciak, netdev, shannon.nelson

On Wed, Sep 11, 2013 at 11:46:49AM -0700, John Fastabend wrote:
>This patch adds helpers to traverse the lower dev lists; these
>helpers match the upper dev list implementation.
>
>VSI implementers may use these to track a list of connected netdevs.
>This is easier than having drivers do their own accounting.

Just as a note (as I have almost no idea how ixgbe works) - you are aware
that the upper/lower lists currently include *all* upper/lower devices, and
not only the first-connected ones?

I've seen that you usually verify it, however not always - just a heads-up,
and sorry if I misread.

I'm also currently trying to get the new patchset included, which would
split the first-tier connected devices from all the 'other' devices (as in,
lower of a lower and upper of an upper); that way, I think, it would be
easier/faster for you to use it. It also has a ->private field, easily
accessible, which you could use instead of, in conjunction with, or for
struct ixgbe_vsi_adapter.

[PATCH v2 net-next 00/27] bonding: use neighbours instead of own lists

Here's the patchset.

Hope that helps.

>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> include/linux/netdevice.h |    8 ++++++++
> net/core/dev.c            |   25 +++++++++++++++++++++++++
> 2 files changed, 33 insertions(+)
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 041b42a..4d24b38 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -2813,6 +2813,8 @@ extern int		bpf_jit_enable;
> extern bool netdev_has_upper_dev(struct net_device *dev,
> 				 struct net_device *upper_dev);
> extern bool netdev_has_any_upper_dev(struct net_device *dev);
>+extern struct net_device *netdev_lower_get_next_dev_rcu(struct net_device *dev,
>+							struct list_head **iter);
> extern struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
> 							struct list_head **iter);
>
>@@ -2823,6 +2825,12 @@ extern struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev,
> 	     upper; \
> 	     upper = netdev_upper_get_next_dev_rcu(dev, &(iter)))
>
>+#define netdev_for_each_lower_dev_rcu(dev, lower, iter) \
>+	for (iter = &(dev)->lower_dev_list, \
>+	     lower = netdev_lower_get_next_dev_rcu(dev, &(iter)); \
>+	     lower; \
>+	     lower = netdev_lower_get_next_dev_rcu(dev, &(iter)))
>+
> extern struct net_device *netdev_master_upper_dev_get(struct net_device *dev);
> extern struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev);
> extern int netdev_upper_dev_link(struct net_device *dev,
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 5c713f2..65ed610 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4468,6 +4468,31 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
> }
> EXPORT_SYMBOL(netdev_master_upper_dev_get);
>
>+/* netdev_lower_get_next_dev_rcu - Get the next dev from lower list
>+ * @dev: device
>+ * @iter: list_head ** of the current position
>+ *
>+ * Gets the next device from the dev's lower list, starting from iter
>+ * position. The caller must hold RTNL/RCU read lock.
>+ */
>+struct net_device *netdev_lower_get_next_dev_rcu(struct net_device *dev,
>+						 struct list_head **iter)
>+{
>+	struct netdev_adjacent *lower;
>+
>+	WARN_ON_ONCE(!rcu_read_lock_held());
>+
>+	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
>+
>+	if (&lower->list == &dev->lower_dev_list)
>+		return NULL;
>+
>+	*iter = &lower->list;
>+
>+	return lower->dev;
>+}
>+EXPORT_SYMBOL(netdev_lower_get_next_dev_rcu);
>+
> /* netdev_upper_get_next_dev_rcu - Get the next dev from upper list
>  * @dev: device
>  * @iter: list_head ** of the current position
>
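
For anyone wanting to try it, a minimal usage sketch of the new
iterator - essentially what ixgbe_vsi_init() in patch 4/4 does; the
caller must hold the RCU read lock:

	struct net_device *lower;
	struct list_head *iter;

	rcu_read_lock();
	netdev_for_each_lower_dev_rcu(dev, lower, iter)
		netdev_dbg(dev, "lower dev: %s\n", lower->name);
	rcu_read_unlock();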

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/4] net: Add lower dev list helpers
  2013-09-14 12:27   ` Veaceslav Falico
@ 2013-09-14 20:43     ` John Fastabend
  2013-09-14 21:14       ` Veaceslav Falico
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-09-14 20:43 UTC (permalink / raw)
  To: Veaceslav Falico
  Cc: stephen, bhutchings, ogerlitz, john.ronciak, netdev, shannon.nelson

On 09/14/2013 05:27 AM, Veaceslav Falico wrote:
> On Wed, Sep 11, 2013 at 11:46:49AM -0700, John Fastabend wrote:
>> This patch adds helpers to traverse the lower dev lists; these
>> helpers match the upper dev list implementation.
>>
>> VSI implementers may use these to track a list of connected netdevs.
>> This is easier than having drivers do their own accounting.
>
> Just as a note (as I have almost no idea how ixgbe works) - you are aware
> that the upper/lower lists currently include *all* upper/lower devices, and
> not only the first-connected ones?

Yep, the netif_is_vsi_port() should hopefully catch all the cases.

>
> I've seen that you usually verify it, however not always - just a heads-up,
> and sorry if I misread.

I'll audit it again, maybe I missed a case.

>
> I'm also currently trying to get the new patchset included - which would
> split the first-tier connected devices from all the 'other' (as in - lower
> of a lower and upper of an upper) devices, so that way, I think, it would
> be easier/faster for you to use it. It also has a ->private field, easily
> accessible, which you could have used instead of/in conjunction with/for
> struct ixgbe_vsi_adapter.

Yeah, I saw the patch set; I'll use the improved lists once I see the
series get applied. Although in my case the list walking only occurs in
the configuration path, so it's not critical.

Also the net device structure already has private space to hang the
device dependent structure off of, so I'm not sure having another
private field helps here.

Thanks,
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 2/4] net: Add lower dev list helpers
  2013-09-14 20:43     ` John Fastabend
@ 2013-09-14 21:14       ` Veaceslav Falico
  0 siblings, 0 replies; 35+ messages in thread
From: Veaceslav Falico @ 2013-09-14 21:14 UTC (permalink / raw)
  To: John Fastabend
  Cc: stephen, bhutchings, ogerlitz, john.ronciak, netdev, shannon.nelson

On Sat, Sep 14, 2013 at 01:43:09PM -0700, John Fastabend wrote:
>On 09/14/2013 05:27 AM, Veaceslav Falico wrote:
>>On Wed, Sep 11, 2013 at 11:46:49AM -0700, John Fastabend wrote:
>>>This patch adds helpers to traverse the lower dev lists; these
>>>helpers match the upper dev list implementation.
>>>
>>>VSI implementers may use these to track a list of connected netdevs.
>>>This is easier than having drivers do their own accounting.
>>
>>Just as a note (as I have almost no idea how ixgbe works) - you are aware
>>that the upper/lower lists currently include *all* upper/lower devices, and
>>not only the first-connected ones?
>
>Yep, the netif_is_vsi_port() should hopefully catch all the cases.
>
>>
>>I've seen that you usually verify it, however not always - just a heads-up,
>>and sorry if I misread.
>
>I'll audit it again, maybe I missed a case.

It was about lower devices, and I think I've just really misread it -
indeed they can only be 'your' devices.

>
>>
>>I'm also currently trying to get the new patchset included - which would
>>split the first-tier connected devices from all the 'other' (as in - lower
>>of a lower and upper of an upper) devices, so that way, I think, it would
>>be easier/faster for you to use it. It also has a ->private field, easily
>>accessible, which you could have used instead of/in conjunction with/for
>>struct ixgbe_vsi_adapter.
>
>Yeah, I saw the patch set; I'll use the improved lists once I see the
>series get applied. Although in my case the list walking only occurs in
>the configuration path, so it's not critical.
>
>Also the net device structure already has private space to hang the
>device dependent structure off of, so I'm not sure having another
>private field helps here.

Yep, it should/can be used when linking devices which might already be
using their private space for their own structures, which is not your case,
I suppose.

>
>Thanks,
>John
>
>-- 
>John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 3/4] net: VSI: Add virtual station interface support
  2013-09-11 18:47 ` [RFC PATCH 3/4] net: VSI: Add virtual station interface support John Fastabend
@ 2013-09-20 23:12   ` Neil Horman
  2013-09-21 17:30     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-09-20 23:12 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend


John-
     Sorry for not copying your original patch into this email, but I apparently
erased this email prior to fully reading it.  At any rate, as we discussed the
other day, I think this idea is great, but it would benefit from being
implemented using macvlans rather than a new link type.  If we can set a
hardware flag in the underlying driver to indicate support for hardware
forwarding, we can skip the macvlan soft forwarding, and just transmit directly.
It should save us needing to create a whole new link type that is so similar to
one we already have.

     I'm back home now, so I can start looking at this on Monday, if you like.

Best
Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 3/4] net: VSI: Add virtual station interface support
  2013-09-20 23:12   ` Neil Horman
@ 2013-09-21 17:30     ` John Fastabend
  2013-09-22 16:44       ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-09-21 17:30 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, John Fastabend

On 9/20/2013 4:12 PM, Neil Horman wrote:
>
> John-
>       Sorry for not copying your original patch into this email, but I apparently
> erased this email prior to fully reading it.  At any rate, as we discussed the
> other day, I think this idea is great, but it would benefit from being
> implemented using macvlans rather than a new link type.  If we can set a
> hardware flag in the underlying driver to indicate support for hardware
> forwarding, we can skip the macvlan soft forwarding, and just transmit directly.
> It should save us needing to create a whole new link type that is so similar to
> one we already have.
>
>       I'm back home now, so I can start looking at this on Monday, if you like.
>
> Best
> Neil
>
>

Yes, definitely take a look; making it an extension (flag) to macvlan
would be more useful if it can work. The one question I have is whether
we would need to clear the flag when a feature that cannot be done in
hardware is attached to the PF. For example: attaching an ingress qdisc,
loading ebtables, or starting up tcpdump all come to mind.
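
Roughly the kind of check I mean - the predicates here are illustrative
only, and the real list of conditions is exactly the open question:

	/* sketch: only bypass the sw path while nothing needs to
	 * observe or filter traffic on the lower dev */
	static bool can_offload_l2(struct net_device *lowerdev)
	{
		if (rcu_access_pointer(lowerdev->ingress_queue))
			return false;	/* ingress qdisc attached */
		if (lowerdev->flags & IFF_PROMISC)
			return false;	/* e.g. tcpdump */
		return true;
	}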

.John

Sorry for the duplication Neil, I sent this mail from a broken email
client just a minute ago and vger dropped it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 3/4] net: VSI: Add virtual station interface support
  2013-09-21 17:30     ` John Fastabend
@ 2013-09-22 16:44       ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-09-22 16:44 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, John Fastabend

On Sat, Sep 21, 2013 at 10:30:29AM -0700, John Fastabend wrote:
> On 9/20/2013 4:12 PM, Neil Horman wrote:
> >
> >John-
> >      Sorry for not copying your original patch into this email, but I apparently
> >erased this email prior to fully reading it.  At any rate, as we discussed the
> >other day, I think this idea is great, but it would benefit from being
> >implemented using macvlans rather than a new link type.  If we can set a
> >hardware flag in the underlying driver to indicate support for hardware
> >forwarding, we can skip the macvlan soft forwarding, and just transmit directly.
> >It should save us needing to create a whole new link type that is so similar to
> >one we already have.
> >
> >      I'm back home now, so I can start looking at this on Monday, if you like.
> >
> >Best
> >Neil
> >
> >
> 
> Yes, definitely take a look; making it an extension (flag) to macvlan
> would be more useful if it can work. The one question I have is whether
> we would need to clear the flag when a feature that cannot be done in
> hardware is attached to the PF. For example: attaching an ingress qdisc,
> loading ebtables, or starting up tcpdump all come to mind.
> 
I think that it probably would, although I expect that this would be the case
independent of the implementations (virtual station interfaces vs. macvlans).
The only other option I see is to disallow those features that can't be done in
hardware, but I think that's unworkable long term.

Neil

> .John
> 
> Sorry for the duplication Neil, I sent this mail from a broken email
> client just a minute ago and vger dropped it.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
                   ` (3 preceding siblings ...)
  2013-09-11 18:47 ` [RFC PATCH 4/4] ixgbe: Adding VSI support to ixgbe John Fastabend
@ 2013-09-25 20:16 ` Neil Horman
  2013-09-25 20:16   ` [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
                     ` (4 more replies)
  4 siblings, 5 replies; 35+ messages in thread
From: Neil Horman @ 2013-09-25 20:16 UTC (permalink / raw)
  To: netdev

John, et al. -
     As promised, here's my (very rough) first pass at an alternate proposal for
what you're trying to do with virtual station interfaces here.  It's completely
untested, but it builds, and I'll be trying to run it over the next few days
(though I'm sure I got part of the hardware manipulation wrong).  I wanted to
post it early, though, so you could get a look at it and see what you did and
didn't like about it.  Some notes:

1) As discussed, the major effort here is to tie in macvlans with l2 forwarding
acceleration, rather than creating a new vsi link type.  That should make
management easier for admins (be it via ovs or some other mechanism).  It
basically exposes a bit less to the user, which I think is good.

2) I've separated out the l2 forwarding acceleration operations from the
net_device_operations structure.  I'm not sure I like that yet, but I'm kind of
leaning that way.  Since only a limited set of hardware supports forwarding
acceleration, it makes for a nice easy way to group functionality without
polluting the net_device_operations structure.  It also lets us group similar
functions together nicely (I can see a future l3_accel_ops structure if we can
do l3 flows in hardware).  Anywho, it's a divergence from what we've been doing,
so I thought I would call attention to it.
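
For a driver that ends up looking roughly like the below - patch 2/2
wires up the real ixgbe version, and the my_* names here are just
stand-ins:

	static const struct l2_forwarding_accel_ops my_l2a_ops = {
		.priv_size	  = sizeof(struct my_l2a_priv),
		.l2_accel_add_dev = my_l2a_add_dev,
		.l2_accel_del_dev = my_l2a_del_dev,
		.l2_accel_xmit	  = my_l2a_xmit,
	};

	/* in the probe path, before register_netdev() */
	netdev->l2a_ops = &my_l2a_ops;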

3) I've included an l2_accel_xmit method in the accel_ops structure for fast path
forwarding, but I'm not sure I like that.  It seems we should be able to use
ndo_start_xmit and key off some data to recognize that we should be doing
hardware forwarding.  I'm not quite sure how to do that yet though.  Something
to think about.

4) I've borrowed heavily from your vsi work, of course, just to get this building.
I think there's probably a lot of consolidation that can be done in the code that
I added to ixgbe_main.c to make it smaller.  Again, I just wanted to post this
so you could speak up if you thought this was all crap before I went too far
down the rabbit hole.

Regards
Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
@ 2013-09-25 20:16   ` Neil Horman
  2013-10-02  7:08     ` John Fastabend
  2013-09-25 20:16   ` [RFC PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-09-25 20:16 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, john.fastabend, David S. Miller

Add an operations structure that allows a network interface to export the fact
that it supports packet forwarding in hardware between physical interfaces and
other mac layer devices assigned to it (such as macvlans).  This operations
structure can be used by virtual mac devices to bypass software switching so
that forwarding can be done in hardware more efficiently.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.fastabend@intel.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/macvlan.c      | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/if_macvlan.h |  1 +
 include/linux/netdevice.h  | 10 ++++++++++
 net/core/dev.c             |  3 +++
 4 files changed, 51 insertions(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..0c37b30 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -296,8 +296,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	unsigned int len = skb->len;
 	int ret;
 	const struct macvlan_dev *vlan = netdev_priv(dev);
+	const struct l2_forwarding_accel_ops *l2a_ops = vlan->lowerdev->l2a_ops;
+
+	if (l2a_ops->l2_accel_xmit && vlan->l2a_priv) {
+		ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
+		if (likely(ret == NETDEV_TX_OK))
+			goto update_stats;
+	}
 
 	ret = macvlan_queue_xmit(skb, dev);
+update_stats:
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct macvlan_pcpu_stats *pcpu_stats;
 
@@ -336,6 +344,7 @@ static int macvlan_open(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
+	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
 	int err;
 
 	if (vlan->port->passthru) {
@@ -347,6 +356,19 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	if (l2a_ops->l2_accel_add_dev) {
+		/* The lowerdev supports l2 switching
+		 * try to add this macvlan to it
+		 */
+		vlan->l2a_priv = kzalloc(l2a_ops->priv_size, GFP_KERNEL);
+		if (!vlan->l2a_priv)
+			return -ENOMEM;
+		err = l2a_ops->l2_accel_add_dev(vlan->lowerdev,
+						dev, vlan->l2a_priv);
+		if (err < 0) {
+			kfree(vlan->l2a_priv);
+			vlan->l2a_priv = NULL;
+			return err;
+		}
+	}
+
 	err = -EBUSY;
 	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
 		goto out;
@@ -367,6 +389,13 @@ hash_add:
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
+	if (vlan->l2a_priv) {
+		if (l2a_ops->l2_accel_del_dev)
+			l2a_ops->l2_accel_del_dev(vlan->lowerdev,
+						  vlan->l2a_priv);
+		kfree(vlan->l2a_priv);
+		vlan->l2a_priv = NULL;
+	}
 	return err;
 }
 
@@ -374,6 +403,7 @@ static int macvlan_stop(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
+	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
 
 	dev_uc_unsync(lowerdev, dev);
 	dev_mc_unsync(lowerdev, dev);
@@ -391,6 +421,12 @@ static int macvlan_stop(struct net_device *dev)
 
 hash_del:
 	macvlan_hash_del(vlan, !dev->dismantle);
+	if (vlan->l2a_priv) {
+		l2a_ops->l2_accel_del_dev(vlan->lowerdev, vlan->l2a_priv);
+		kfree(vlan->l2a_priv);
+		vlan->l2a_priv = NULL;
+	}
+
 	return 0;
 }
 
@@ -801,6 +837,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
+
 	port = macvlan_port_get_rtnl(lowerdev);
 
 	/* Only 1 macvlan device can be created in passthru mode */
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index ddd33fd..9b16af9 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,6 +61,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	void			*l2a_priv;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49ac..4cbc535 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1099,6 +1099,15 @@ struct net_device_ops {
 						      __be16 port);
 };
 
+struct l2_forwarding_accel_ops {
+	size_t priv_size;
+	int	(*l2_accel_add_dev)(struct net_device *pdev,
+				    struct net_device *vdev, void *priv);
+	void	(*l2_accel_del_dev)(struct net_device *pdev, void *priv);
+	netdev_tx_t             (*l2_accel_xmit)(struct sk_buff *skb,
+						 void *priv);
+};
+
 /*
  *	The DEVICE structure.
  *	Actually, this whole structure is a big mistake.  It mixes I/O
@@ -1183,6 +1192,7 @@ struct net_device {
 	/* Management operations */
 	const struct net_device_ops *netdev_ops;
 	const struct ethtool_ops *ethtool_ops;
+	const struct l2_forwarding_accel_ops *l2a_ops;
 
 	/* Hardware header description */
 	const struct header_ops *header_ops;
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..5c5eec2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5992,6 +5992,7 @@ struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
 }
 
 static const struct ethtool_ops default_ethtool_ops;
+static const struct l2_forwarding_accel_ops default_l2a_ops;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops)
@@ -6090,6 +6091,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
+	if (!dev->l2a_ops)
+		dev->l2a_ops = &default_l2a_ops;
 	return dev;
 
 free_all:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
  2013-09-25 20:16   ` [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-09-25 20:16   ` Neil Horman
  2013-10-02  6:31   ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-09-25 20:16 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, john.fastabend, David S. Miller

Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
take advantage of these operations.  Allow it to allocate queues for a macvlan
so that when we transmit a frame, we can do the switching in hardware inside the
ixgbe card, rather than in software.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.fastabend@intel.com
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  32 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h     |  54 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |  15 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 360 +++++++++++++++++------
 5 files changed, 374 insertions(+), 91 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..cfa6a8b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -244,6 +244,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
 
@@ -288,11 +289,15 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_L2A_QUEUES 4
+#define IXGBE_BAD_L2A_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +743,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long l2a_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -747,6 +753,17 @@ struct ixgbe_fdir_filter {
 	u16 action;
 };
 
+struct ixgbe_l2a_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 enum ixgbe_state_t {
 	__IXGBE_TESTING,
 	__IXGBE_RESETTING,
@@ -879,9 +896,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
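+/* rings claimed by an accelerated macvlan carry that macvlan's netdev in
+ * vmdq_netdev; queue-level stack operations must target it, not the PF */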
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +937,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e8649ab..277af14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-                              struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
new file mode 100644
index 0000000..2f36584
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
@@ -0,0 +1,54 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..dccec86 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_l2a.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -872,7 +873,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1057,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1074,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index)
 		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1228,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1262,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1563,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1743,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1909,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1990,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -2041,7 +2046,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2460,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2899,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2959,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3349,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3366,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->l2a_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3683,6 +3636,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3713,6 +3668,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3743,15 +3700,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly fewer RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3772,6 +3730,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+
 	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
@@ -4133,6 +4092,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4459,7 +4419,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4838,6 +4798,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->l2a_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5143,7 +5105,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5168,16 +5130,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5215,7 +5183,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -6576,7 +6543,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6588,7 +6555,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6639,6 +6606,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6730,9 +6700,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -7057,6 +7028,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7305,6 +7277,217 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_l2a_psrtype(struct ixgbe_l2a_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+static void ixgbe_enable_l2a_ring(struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *rx_ring)
+{
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+static void ixgbe_disable_l2a_ring(struct ixgbe_l2a_adapter *vadapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+}
+
+int ixgbe_l2a_ring_up(struct net_device *vdev, struct ixgbe_l2a_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->pool * adapter->num_rx_queues_per_pool;
+	unsigned int txbase = accel->pool * adapter->num_rx_queues_per_pool;
+	int err, i;
+
+	accel->rx_base_queue = rxbase;
+	accel->tx_base_queue = txbase;
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_l2a_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < vdev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->vmdq_netdev = vdev;
+		ixgbe_enable_l2a_ring(adapter, adapter->rx_ring[rxbase + i]);
+	}
+
+	for (i = 0; i < vdev->num_tx_queues; i++)
+		adapter->tx_ring[txbase + i]->vmdq_netdev = vdev;
+
+	if (is_valid_ether_addr(vdev->dev_addr))
+		ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool);
+
+	err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+	err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
+	ixgbe_l2a_psrtype(accel);
+	netif_tx_start_all_queues(vdev);
+	return 0;
+err_set_queues:
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_l2a_ring(accel, adapter->rx_ring[rxbase + i]);
+	return err;
+}
+
+int ixgbe_l2a_ring_down(struct net_device *vdev, struct ixgbe_l2a_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->rx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(vdev);
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_l2a_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	return 0;
+}
+
+static int ixgbe_l2a_add(struct net_device *pdev, struct net_device *vdev,
+			 void *priv)
+{
+	struct ixgbe_l2a_adapter *l2a_adapter = priv;
+	struct ixgbe_adapter *adapter = netdev_priv(pdev);
+	int pool, vmdq_pool, base_queue;
+	int err;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vdev->num_rx_queues != vdev->num_tx_queues ||
+	    vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES ||
+	    vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) {
+		netdev_info(pdev, "%s: Supports RX/TX Queue counts 1, 2, and 4\n",
+			    pdev->name);
+		return -EINVAL;
+	}
+
+	if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES)
+		return -EBUSY;
+
+	pool = find_first_zero_bit(&adapter->l2a_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->l2a_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev));
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(pool);
+	base_queue = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   pool, adapter->num_rx_pools,
+		   base_queue, base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->l2a_bitmask);
+
+	l2a_adapter->pool = pool;
+	l2a_adapter->netdev = vdev;
+	l2a_adapter->real_adapter = adapter;
+	l2a_adapter->rx_base_queue = base_queue;
+	l2a_adapter->tx_base_queue = base_queue;
+
+	err = ixgbe_l2a_ring_up(vdev, l2a_adapter);
+	return err;
+}
+
+static void ixgbe_l2a_del(struct net_device *pdev, void *priv)
+{
+	struct ixgbe_l2a_adapter *l2a_adapter = priv;
+	struct ixgbe_adapter *adapter = l2a_adapter->real_adapter;
+
+	clear_bit(l2a_adapter->pool, &adapter->l2a_bitmask);
+	adapter->num_rx_pools--;
+
+	ixgbe_l2a_ring_down(l2a_adapter->netdev, l2a_adapter);
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   l2a_adapter->pool, adapter->num_rx_pools,
+		   l2a_adapter->rx_base_queue,
+		   l2a_adapter->rx_base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->l2a_bitmask);
+}
+
+netdev_tx_t ixgbe_l2a_xmit_frame(struct sk_buff *skb, void *priv)
+{
+	struct ixgbe_l2a_adapter *l2a_adapter = priv;
+	struct ixgbe_ring *tx_ring;
+	unsigned int queue;
+
+	queue = skb->queue_mapping + l2a_adapter->tx_base_queue;
+	tx_ring = l2a_adapter->real_adapter->tx_ring[queue];
+
+	return ixgbe_xmit_frame_ring(skb, l2a_adapter->real_adapter, tx_ring);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7351,6 +7536,13 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
 };
 
+const struct l2_forwarding_accel_ops ixgbe_l2a_ops = {
+	.priv_size = sizeof(struct ixgbe_l2a_adapter),
+	.l2_accel_add_dev = ixgbe_l2a_add,
+	.l2_accel_del_dev = ixgbe_l2a_del,
+	.l2_accel_xmit = ixgbe_l2a_xmit_frame,
+};
+
 /**
  * ixgbe_enumerate_functions - Get the number of ports this device has
  * @adapter: adapter structure
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
  2013-09-25 20:16   ` [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
  2013-09-25 20:16   ` [RFC PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
@ 2013-10-02  6:31   ` John Fastabend
  2013-10-02 13:28     ` Neil Horman
  2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
  2013-10-11 18:43   ` [RFC PATCH 0/2 v3] " Neil Horman
  4 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-02  6:31 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev

On 09/25/2013 01:16 PM, Neil Horman wrote:
> John, et al. -
>       As promised, here's my (very rough) first pass at an alternate proposal for
> what you're trying to do with virtual station interfaces here.  It's completely
> untested, but it builds, and I'll be trying to run it over the next few days
> (though I'm sure I got part of the hardware manipulation wrong).  I wanted to
> post it early though so you could get a look at it to see what you did and
> didn't like about it.  Some notes:

Sorry for the delay. I like the idea; one nice win here is that my
macvlan KVM setup would use the offloads without having to reconfigure.

>
> 1) As discussed, the major effort here is to tie in macvlans with l2 forwarding
> acceleration, rather than creating a new vsi link type.  That should make
> management easier for admins (be it via ovs or some other mechanism).  It
> basically exposes a bit less to the user, which I think is good.
>

The trick here is that the offload path may be functionally different
from the non-offload path. The user needs some visibility into this.
For example, any qdiscs running on the lowerdev will not be visible
from the accelerated path.

When a new link type was being used I was able to convince myself that
it was a property of the link type. But if we reuse macvlan I think we
need some way to either automatically disable the offload path when this
occurs or provide the user visibility. Maybe a feature flag and a
netif_can_hw_offload() routine is needed?
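
Something like this is what I had in mind (sketch only; the feature
flag is made up, nothing like it exists today):

static inline bool netif_can_hw_offload(struct net_device *lowerdev)
{
	/* take the offload path only when the lowerdev advertises it
	 * and the admin hasn't turned the feature off
	 */
	return (lowerdev->features & NETIF_F_HW_L2FW_OFFLOAD) &&
	       lowerdev->l2a_ops && lowerdev->l2a_ops->l2_accel_add_dev;
}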

> 2) I've separated out the l2 forwarding acceleration operations from the
> net_device_operations structure.  I'm not sure I like that yet, but I'm kind of
> leaning that way.  Since a limited set of hardware supports forwarding
> acceleration, it makes for a nice easy way to group functionality without
> polluting the net_device_operations structure.  It also lets us group similar
> functions together nicely (I can see a future l3_accel_ops structure if we can
> do l3 flows in hardware).  Anywho, its a divergence from what we've been doing
> so I thought I would call attention to it.
>
> 3) I've included a l2_accel_xmit method in the accel_ops structure for fast path
> forwarding, but I'm not sure I like that.  It seems we should be able to use
> ndo_start_xmit and key off some data to recognize that we should be doing
> hardware forwarding.  I'm not quite sure how to do that yet though.  Something
> to think about.

Without a specific xmit routine though you will be adding operations in
the common case for a special case. Having a new op fixes this.

>
> 4) I've borrowed heavily from your vsi work of course just to get this building.
> I think there's probably a lot of consolidation that can be done in the code that
> I added to ixgbe_main.c to make it smaller.  Again, I just wanted to post this
> so you could speak up if you thought this was all crap before I went too far
> down the rabbit hole.

There was some consolidation needed in my original RFC as well. I can
help clean some of this stuff up if you want.

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-09-25 20:16   ` [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-10-02  7:08     ` John Fastabend
  2013-10-02 12:53       ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-02  7:08 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, john.fastabend, David S. Miller

On 09/25/2013 01:16 PM, Neil Horman wrote:
Add an operations structure that allows a network interface to export the fact
that it supports packet forwarding in hardware between physical interfaces and
other mac layer devices assigned to it (such as macvlans).  This operations
structure can be used by virtual mac devices to bypass software switching so
that forwarding can be done in hardware more efficiently.

Some additional nits below which maybe you have already thought of.

>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: john.fastabend@intel.com
> CC: "David S. Miller" <davem@davemloft.net>
> ---
>   drivers/net/macvlan.c      | 37 +++++++++++++++++++++++++++++++++++++
>   include/linux/if_macvlan.h |  1 +
>   include/linux/netdevice.h  | 10 ++++++++++
>   net/core/dev.c             |  3 +++
>   4 files changed, 51 insertions(+)
>
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index 9bf46bd..0c37b30 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -296,8 +296,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
>   	unsigned int len = skb->len;
>   	int ret;
>   	const struct macvlan_dev *vlan = netdev_priv(dev);
> +	const struct l2_forwarding_accel_ops *l2a_ops = vlan->lowerdev->l2a_ops;
> +
> +	if (l2a_ops->l2_accel_xmit) {
> +		ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
> +		if (likely(ret == NETDEV_TX_OK))

maybe dev_xmit_complete() would be more appropriate?
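
i.e. something like (untested):

	ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
	if (likely(dev_xmit_complete(ret)))
		goto update_stats;

dev_xmit_complete() counts every case where the driver consumed the
skb (including NET_XMIT_DROP/NET_XMIT_CN) as done, so only
NETDEV_TX_BUSY falls through; testing for NETDEV_TX_OK alone would
re-submit an skb the driver may already have freed.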

> +			goto update_stats;
> +	}
>
>   	ret = macvlan_queue_xmit(skb, dev);
> +update_stats:
>   	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
>   		struct macvlan_pcpu_stats *pcpu_stats;
>
> @@ -336,6 +344,7 @@ static int macvlan_open(struct net_device *dev)
>   {
>   	struct macvlan_dev *vlan = netdev_priv(dev);
>   	struct net_device *lowerdev = vlan->lowerdev;
> +	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
>   	int err;
>
>   	if (vlan->port->passthru) {
> @@ -347,6 +356,19 @@ static int macvlan_open(struct net_device *dev)
>   		goto hash_add;

Looks like this might break in the passthru case, since you don't call
l2_accel_add_dev here but still use l2_accel_xmit.

>   	}
>
> +	if (l2a_ops->l2_accel_add_dev) {

In the error cases it might be preferred to fall back to the
non-offloaded software path. For example, hardware may have a limit
on the number of VSIs that can be created, but we wouldn't want to
push that up the stack.
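
Roughly this (sketch only, on top of your patch):

	err = l2a_ops->l2_accel_add_dev(vlan->lowerdev, dev, vlan->l2a_priv);
	if (err < 0) {
		/* e.g. hardware pools exhausted: fall back to software
		 * switching instead of failing the open
		 */
		kfree(vlan->l2a_priv);
		vlan->l2a_priv = NULL;
	}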

> +		/* The lowerdev supports l2 switching
> +		 * try to add this macvlan to it
> +		 */
> +		vlan->l2a_priv = kzalloc(l2a_ops->priv_size, GFP_KERNEL);
> +		if (!vlan->l2a_priv)
> +			return -ENOMEM;
> +		err = l2a_ops->l2_accel_add_dev(vlan->lowerdev,
> +						dev, vlan->l2a_priv);
> +		if (err < 0)
> +			return err;
> +	}
> +
>   	err = -EBUSY;
>   	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
>   		goto out;
> @@ -367,6 +389,13 @@ hash_add:
>   del_unicast:
>   	dev_uc_del(lowerdev, dev->dev_addr);
>   out:
> +	if (vlan->l2a_priv) {

Add a feature flag here so it can be disabled.

> +		if (l2a_ops->l2_accel_del_dev)
> +			l2a_ops->l2_accel_del_dev(vlan->lowerdev,
> +						  vlan->l2a_priv);
> +		kfree(vlan->l2a_priv);
> +		vlan->l2a_priv = NULL;
> +	}
>   	return err;
>   }

[...]

Thanks,
John


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-02  7:08     ` John Fastabend
@ 2013-10-02 12:53       ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-02 12:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, john.fastabend, David S. Miller

On Wed, Oct 02, 2013 at 12:08:11AM -0700, John Fastabend wrote:
> On 09/25/2013 01:16 PM, Neil Horman wrote:
> >Add an operations structure that allows a network interface to export the fact
> >that it supports packet forwarding in hardware between physical interfaces and
> >other mac layer devices assigned to it (such as macvlans).  This operations
> >structure can be used by virtual mac devices to bypass software switching so
> >that forwarding can be done in hardware more efficiently.
> 
> Some additional nits below which maybe you have already thought of.
> 
It's ok, you can say it, they're more than nits :); gaping holes is more like it.
This pass was completely untested, I'm still ironing out bits, but I think for
the most part we're in agreement on the items below.  Comments inline.

> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@intel.com
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> >  drivers/net/macvlan.c      | 37 +++++++++++++++++++++++++++++++++++++
> >  include/linux/if_macvlan.h |  1 +
> >  include/linux/netdevice.h  | 10 ++++++++++
> >  net/core/dev.c             |  3 +++
> >  4 files changed, 51 insertions(+)
> >
> >diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> >index 9bf46bd..0c37b30 100644
> >--- a/drivers/net/macvlan.c
> >+++ b/drivers/net/macvlan.c
> >@@ -296,8 +296,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
> >  	unsigned int len = skb->len;
> >  	int ret;
> >  	const struct macvlan_dev *vlan = netdev_priv(dev);
> >+	const struct l2_forwarding_accel_ops *l2a_ops = vlan->lowerdev->l2a_ops;
> >+
> >+	if (l2a_ops->l2_accel_xmit) {
> >+		ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
> >+		if (likely(ret == NETDEV_TX_OK))
> 
> maybe dev_xmit_complete() would be more appropriate?
> 
Yup, I've currently modified this to dev_queue_xmit, but _complete might be a
better option.

> >+			goto update_stats;
> >+	}
> >
> >  	ret = macvlan_queue_xmit(skb, dev);
> >+update_stats:
> >  	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
> >  		struct macvlan_pcpu_stats *pcpu_stats;
> >
> >@@ -336,6 +344,7 @@ static int macvlan_open(struct net_device *dev)
> >  {
> >  	struct macvlan_dev *vlan = netdev_priv(dev);
> >  	struct net_device *lowerdev = vlan->lowerdev;
> >+	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
> >  	int err;
> >
> >  	if (vlan->port->passthru) {
> >@@ -347,6 +356,19 @@ static int macvlan_open(struct net_device *dev)
> >  		goto hash_add;
> 
> Looks like this might break in the passthru case, since you don't call
> l2_accel_add_dev here but still use l2_accel_xmit.
> 
Yeah, I need to clean this up.

> >  	}
> >
> >+	if (l2a_ops->l2_accel_add_dev) {
> 
> In the error cases it might be preferred to fall back to the
> non-offloaded software path. For example, hardware may have a limit
> on the number of VSIs that can be created, but we wouldn't want to
> push that up the stack.
> 
Hadn't thought of that, but yes, I agree, falling back to software switching is
definitely desirable here.

> >+		/* The lowerdev supports l2 switching
> >+		 * try to add this macvlan to it
> >+		 */
> >+		vlan->l2a_priv = kzalloc(l2a_ops->priv_size, GFP_KERNEL);
> >+		if (!vlan->l2a_priv)
> >+			return -ENOMEM;
> >+		err = l2a_ops->l2_accel_add_dev(vlan->lowerdev,
> >+						dev, vlan->l2a_priv);
> >+		if (err < 0)
> >+			return err;
> >+	}
> >+
> >  	err = -EBUSY;
> >  	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
> >  		goto out;
> >@@ -367,6 +389,13 @@ hash_add:
> >  del_unicast:
> >  	dev_uc_del(lowerdev, dev->dev_addr);
> >  out:
> >+	if (vlan->l2a_priv) {
> 
> Add a feature flag here so it can be disabled.
> 
Makes sense.

I'm still working out kinks, but I hope to have a next version later this
week/early next.  I'll make sure to incorporate these changes. 

Thanks!
Neil

> >+		if (l2a_ops->l2_accel_del_dev)
> >+			l2a_ops->l2_accel_del_dev(vlan->lowerdev,
> >+						  vlan->l2a_priv);
> >+		kfree(vlan->l2a_priv);
> >+		vlan->l2a_priv = NULL;
> >+	}
> >  	return err;
> >  }
> 
> [...]
> 
> Thanks,
> John
> 
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-10-02  6:31   ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
@ 2013-10-02 13:28     ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-02 13:28 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev

On Tue, Oct 01, 2013 at 11:31:29PM -0700, John Fastabend wrote:
> On 09/25/2013 01:16 PM, Neil Horman wrote:
> >John, et al. -
> >      As promised, here's my (very rough) first pass at an alternate proposal for
> >what you're trying to do with virtual station interfaces here.  It's completely
> >untested, but it builds, and I'll be trying to run it over the next few days
> >(though I'm sure I got part of the hardware manipulation wrong).  I wanted to
> >post it early though so you could get a look at it to see what you did and
> >didn't like about it.  Some notes:
> 
> Sorry for the delay. I like the idea; one nice win here is that my
> macvlan KVM setup would use the offloads without having to reconfigure.
> 
That's ok, I've been swamped, so this has gone slower for me than I had hoped.

> >
> >1) As discussed, the major effort here is to tie in macvlans with l2 forwarding
> >acceleration, rather than creating a new vsi link type.  That should make
> >management easier for admins (be it via ovs or some other mechanism).  It
> >basically exposes a bit less to the user, which I think is good.
> >
> 
> The trick here is that the offload path may be functionally different
> from the non-offload path. The user needs some visibility into this.
> For example, any qdiscs running on the lowerdev will not be visible
> from the accelerated path.
> 
Actually, given your suggestion in your other note (using dev_queue_xmit), I
think this will be handled, in that we will now pass accelerated traffic through
any lowerdev attached qdiscs.

> When a new link type was being used I was able to convince myself that
> it was a property of the link type. But if we reuse macvlan I think we
> need some way to either automatically disable the offload path when this
> occurs or provide the user visibility. Maybe a feature flag and a
> netif_can_hw_offload() routine is needed?
> 
As noted above, I think we can make macvlans work identically with and without
acceleration by reusing the xmit path, and keying off data (what I've done is
attach a pointer to acceleration data in the skb so the common ndo_start_xmit
path in the driver can differentiate accelerated from non-accelerated skbs).
That will allow everything to function the same way.  I do think though that a
feature flag just to provide some visibility to the admin is warranted.
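
For ixgbe that works out to something like this (rough sketch of what
I have locally, still untested):

static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
				    struct net_device *netdev)
{
	struct ixgbe_adapter *adapter = netdev_priv(netdev);
	struct ixgbe_fwd_adapter *fwd = skb->accel_priv;
	unsigned int queue = skb->queue_mapping;

	/* accelerated skbs carry their pool's private data, so steer
	 * them onto that pool's queue range; everything else keeps
	 * the queue the stack picked
	 */
	if (fwd)
		queue += fwd->tx_base_queue;

	return ixgbe_xmit_frame_ring(skb, adapter, adapter->tx_ring[queue]);
}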

> >2) I've separated out the l2 forwarding acceleration operations from the
> >net_device_operations structure.  I'm not sure I like that yet, but I'm kind of
> >leaning that way.  Since a limited set of hardware supports forwarding
> >acceleration, it makes for a nice easy way to group functionality without
> >polluting the net_device_operations structure.  It also lets us group similar
> >functions together nicely (I can see a future l3_accel_ops structure if we can
> >do l3 flows in hardware).  Anywho, its a divergence from what we've been doing
> >so I thought I would call attention to it.
> >
> >3) I've included a l2_accel_xmit method in the accel_ops structure for fast path
> >forwarding, but I'm not sure I like that.  It seems we should be able to use
> >ndo_start_xmit and key off some data to recognize that we should be doing
> >hardware forwarding.  I'm not quite sure how to do that yet though.  Something
> >to think about.
> 
> Without a specific xmit routine though you will be adding operations in
> the common case for a special case. Having a new op fixes this.
> 
Agreed, and that's something to consider.  I'm not sure if separating the special
case into its own operation is a net win however, given the behavioral differences
that arise (the qdisc issue above).  My next version will use a common xmit
path.  We can look it over then for comparison and weigh the pros and cons.

> >
> >4) I've borrowed heavily from your vsi work of course just to get this building.
> >I think there's probably a lot of consolidation that can be done in the code that
> >I added to ixgbe_main.c to make it smaller.  Again, I just wanted to post this
> >so you could speak up if you thought this was all crap before I went too far
> >down the rabbit hole.
> 
> There was some consolidation needed in my original RFC as well. I can
> help clean some of this stuff up if you want.
> 
That would be greatly appreciated.  Thanks.  Next version should be available
in the next few days.  I'll get it to you asap.

Best
Neil

> .John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
                     ` (2 preceding siblings ...)
  2013-10-02  6:31   ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
@ 2013-10-04 20:10   ` Neil Horman
  2013-10-04 20:10     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
                       ` (2 more replies)
  2013-10-11 18:43   ` [RFC PATCH 0/2 v3] " Neil Horman
  4 siblings, 3 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-04 20:10 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Andy Gospodarek, David Miller

Hey all-
     here's the next, updated version of the vsi/macvlan integration that we've
been discussing.

Some change notes:

* Changes to the forwarding ops structure - Removed the priv_size field, and
added a flags field.  Removal of the priv_size field was accomplished by just
having the add method return a void * and using ERR_PTR and PTR_ERR checks,
which also allows us to allocate memory for the acceleration path in the driver,
which I like.  I'm not super happy still with how I'm using the flags (currently
only used to indicate support for feature sets), but at least we have the flags
now, and they can be exposed to user space via iproute2 or ethtool if need be.
(A rough sketch of the driver-side add routine follows these notes.)

* Changes to the Transmit path - Specifically I'm using dev_queue_xmit to send
frames now, which I like as it makes the macvlan subject to the lowerdevs qdisc
configuration.

* Changes to the acceleration fail path behavior - Now if we don't/can't use
acceleration, we just fall back to using the normal macvlan software switch
strategy

* General cleanups (some renaming, that I'm not super sure of, but I thought
forwarding acceleration (fwd) would be a better prefix than l2 acceleration).
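
To make the ERR_PTR flow from the first note concrete, the driver-side
add now reads roughly like this (sketch; the function name is
illustrative and the hardware setup is elided):

static void *ixgbe_fwd_add_station(struct net_device *pdev,
				   struct net_device *vdev)
{
	struct ixgbe_fwd_adapter *fwd;

	fwd = kzalloc(sizeof(*fwd), GFP_KERNEL);
	if (!fwd)
		return ERR_PTR(-ENOMEM);

	/* ... claim a pool and program the hardware for vdev ... */

	return fwd;	/* macvlan keeps this as its fwd_priv */
}

macvlan then just does an IS_ERR_OR_NULL() check on the result and
falls back to software switching when the add fails.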

Still a long way to go I think, and lots of tweaking to do, but I didn't want to
keep you waiting, John.  Anywho, take a look at what I'm doing and feel free to
rip it apart.

Thanks!
Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
@ 2013-10-04 20:10     ` Neil Horman
  2013-10-07 19:52       ` David Miller
  2013-10-04 20:10     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
  2013-10-07 22:09     ` [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
  2 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-10-04 20:10 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Andy Gospodarek, David Miller, Neil Horman

Add an operations structure that allows a network interface to export the fact
that it supports packet forwarding in hardware between physical interfaces and
other mac layer devices assigned to it (such as macvlans).  This operations
structure can be used by virtual mac devices to bypass software switching so
that forwarding can be done in hardware more efficiently.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/macvlan.c      | 32 ++++++++++++++++++++++++++++++++
 include/linux/if_macvlan.h |  1 +
 include/linux/netdevice.h  | 22 ++++++++++++++++++++++
 include/linux/skbuff.h     |  9 ++++++---
 net/core/dev.c             |  3 +++
 5 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..38d0fc5 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -297,7 +297,17 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	int ret;
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fwd_priv) {
+		skb->dev = vlan->lowerdev;
+		skb->accel_priv = vlan->fwd_priv;
+		ret = dev_queue_xmit(skb);
+		if (likely(ret == NETDEV_TX_OK))
+			goto update_stats;
+	}
+
+	skb->accel_priv = NULL;
 	ret = macvlan_queue_xmit(skb, dev);
+update_stats:
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct macvlan_pcpu_stats *pcpu_stats;
 
@@ -347,6 +357,18 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	if (fwd_accel_supports(lowerdev, FA_FLG_STA_SUPPORT)) {
+		vlan->fwd_priv = fwd_accel_add_station(lowerdev, dev);
+		/*
+		 * If we get a NULL pointer back, or if we get an error,
+		 * then we should just fall through to the non-accelerated path
+		 */
+		if (IS_ERR_OR_NULL(vlan->fwd_priv))
+			vlan->fwd_priv = NULL;
+		else
+			return 0;
+	}
+
 	err = -EBUSY;
 	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
 		goto out;
@@ -367,6 +389,10 @@ hash_add:
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
+	if (vlan->fwd_priv) {
+		fwd_accel_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
 	return err;
 }
 
@@ -391,6 +417,11 @@ static int macvlan_stop(struct net_device *dev)
 
 hash_del:
 	macvlan_hash_del(vlan, !dev->dismantle);
+	if (vlan->fwd_priv) {
+		fwd_accel_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
+
 	return 0;
 }
 
@@ -801,6 +832,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
+
 	port = macvlan_port_get_rtnl(lowerdev);
 
 	/* Only 1 macvlan device can be created in passthru mode */
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index ddd33fd..c270285 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,6 +61,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	void			*fwd_priv;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49ac..ea18f07 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1100,6 +1100,27 @@ struct net_device_ops {
 };
 
 /*
+ * Flags to enumerate hardware acceleration support
+ */
+#define FA_FLG_STA_SUPPORT (1 << 1)
+
+#define fwd_accel_supports(dev, feature) ((dev)->fwd_ops->flags & (feature))
+#define fwd_accel_add_station(pdev, vdev) ((pdev)->fwd_ops->fwd_accel_add_station(pdev, vdev))
+#define fwd_accel_del_station(pdev, priv) ((pdev)->fwd_ops->fwd_accel_del_station(pdev, priv))
+
+struct forwarding_accel_ops {
+	unsigned int flags;
+
+	/*
+	 * fwd_accel_[add|del]_station must be set if
+	 * FA_FLG_STA_SUPPORT is set
+	 */
+	void*	(*fwd_accel_add_station)(struct net_device *pdev,
+					struct net_device *vdev);
+	void	(*fwd_accel_del_station)(struct net_device *pdev, void *priv);
+};
+
+/*
  *	The DEVICE structure.
  *	Actually, this whole structure is a big mistake.  It mixes I/O
  *	data with strictly "high-level" data, and it has to know about
@@ -1183,6 +1204,7 @@ struct net_device {
 	/* Management operations */
 	const struct net_device_ops *netdev_ops;
 	const struct ethtool_ops *ethtool_ops;
+	const struct forwarding_accel_ops *fwd_ops;
 
 	/* Hardware header description */
 	const struct header_ops *header_ops;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2ddb48d..0be9152 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -426,9 +426,12 @@ struct sk_buff {
 	char			cb[48] __aligned(8);
 
 	unsigned long		_skb_refdst;
-#ifdef CONFIG_XFRM
-	struct	sec_path	*sp;
-#endif
+
+	union {
+		struct	sec_path	*sp;
+		void 			*accel_priv;
+	};
+
 	unsigned int		len,
 				data_len;
 	__u16			mac_len,
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..5f99382 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5992,6 +5992,7 @@ struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
 }
 
 static const struct ethtool_ops default_ethtool_ops;
+static const struct forwarding_accel_ops default_fwd_ops;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops)
@@ -6090,6 +6091,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
+	if (!dev->fwd_ops)
+		dev->fwd_ops = &default_fwd_ops;
 	return dev;
 
 free_all:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
  2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
  2013-10-04 20:10     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-10-04 20:10     ` Neil Horman
  2013-10-07 22:09     ` [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
  2 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-04 20:10 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Andy Gospodarek, David Miller, Neil Horman

Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
take advantage of these operations.  Allow it to allocate queues for a macvlan
so that when we transmit a frame, we can do the switching in hardware inside the
ixgbe card, rather than in software.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  32 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h     |  53 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |  15 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 373 +++++++++++++++++------
 5 files changed, 385 insertions(+), 92 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..e924efa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -219,6 +219,17 @@ enum ixgbe_ring_state_t {
 	__IXGBE_RX_FCOE,
 };
 
+struct ixgbe_fwd_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 #define check_for_tx_hang(ring) \
 	test_bit(__IXGBE_TX_DETECT_HANG, &(ring)->state)
 #define set_check_for_tx_hang(ring) \
@@ -236,6 +247,7 @@ struct ixgbe_ring {
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
 	struct device *dev;		/* device for DMA mapping */
+	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
 	union {
 		struct ixgbe_tx_buffer *tx_buffer_info;
@@ -244,6 +256,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
 
@@ -288,11 +301,14 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_L2A_QUEUES 4
+#define IXGBE_BAD_L2A_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +755,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -879,9 +897,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +938,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e8649ab..277af14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-                              struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
new file mode 100644
index 0000000..2f36584
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
@@ -0,0 +1,53 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..5fa553f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_l2a.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -118,6 +119,8 @@ static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = {
 };
 MODULE_DEVICE_TABLE(pci, ixgbe_pci_tbl);
 
+netdev_tx_t ixgbe_fwd_xmit_frame(struct sk_buff *skb, void *priv);
+
 #ifdef CONFIG_IXGBE_DCA
 static int ixgbe_notify_dca(struct notifier_block *, unsigned long event,
 			    void *p);
@@ -872,7 +875,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1059,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1076,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index)
 		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1230,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1264,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1565,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1745,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1911,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1992,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -1993,6 +2000,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct sk_buff *skb;
 
+		if (rx_ring->l2_accel_priv) {
+			printk(KERN_CRIT "RECEIVING ON AN ACCELERATED QUEUE\n");
+		}
+
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
 			ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
@@ -2041,7 +2052,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2466,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2905,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2965,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3355,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3372,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->fwd_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3683,6 +3642,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3713,6 +3674,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3743,15 +3706,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly less RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3772,6 +3736,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+
 	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
@@ -4133,6 +4098,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4459,7 +4425,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4838,6 +4804,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->fwd_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5143,7 +5111,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5168,16 +5136,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5215,7 +5189,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -6576,7 +6549,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6588,7 +6561,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6639,6 +6612,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6730,9 +6706,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -6784,7 +6761,15 @@ static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
 		skb_set_tail_pointer(skb, 17);
 	}
 
-	tx_ring = adapter->tx_ring[skb->queue_mapping];
+	if (skb->accel_priv) {
+		struct ixgbe_fwd_adapter *fwd_adapter = skb->accel_priv;
+		unsigned int queue;
+
+		queue = skb->queue_mapping + fwd_adapter->tx_base_queue;
+		tx_ring = fwd_adapter->real_adapter->tx_ring[queue];
+	} else {
+		tx_ring = adapter->tx_ring[skb->queue_mapping];
+	}
+
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
 
@@ -7057,6 +7042,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7305,6 +7291,217 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_fwd_psrtype(struct ixgbe_fwd_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+static void ixgbe_enable_fwd_ring(struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *rx_ring,
+				  struct ixgbe_fwd_adapter *accel)
+{
+	rx_ring->l2_accel_priv = accel;
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+static void ixgbe_disable_fwd_ring(struct ixgbe_fwd_adapter *vadapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+	rx_ring->l2_accel_priv = NULL;
+}
+
+int ixgbe_fwd_ring_up(struct net_device *vdev, struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->pool * adapter->num_rx_queues_per_pool;
+	unsigned int txbase = accel->pool * adapter->num_rx_queues_per_pool;
+	int err, i;
+
+	accel->rx_base_queue = rxbase;
+	accel->tx_base_queue = txbase;
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < vdev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->vmdq_netdev = vdev;
+		ixgbe_enable_fwd_ring(adapter, adapter->rx_ring[rxbase + i], accel);
+	}
+
+	for (i = 0; i < vdev->num_tx_queues; i++)
+		adapter->tx_ring[txbase + i]->vmdq_netdev = vdev;
+
+	if (is_valid_ether_addr(vdev->dev_addr))
+		ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool);
+
+	err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+	err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
+	ixgbe_fwd_psrtype(accel);
+	netif_tx_start_all_queues(vdev);
+	return 0;
+err_set_queues:
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+	return err;
+}
+
+int ixgbe_fwd_ring_down(struct net_device *vdev, struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->rx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(vdev);
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	return 0;
+}
+
+static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = NULL;
+	struct ixgbe_adapter *adapter = netdev_priv(pdev);
+	int pool, vmdq_pool, base_queue;
+	int err;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vdev->num_rx_queues != vdev->num_tx_queues ||
+	    vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES ||
+	    vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) {
+		netdev_info(pdev, "%s: Supports RX/TX Queue counts 1, 2, and 4\n",
+			    pdev->name);
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES)
+		return ERR_PTR(-EBUSY);
+
+	fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
+	if (!fwd_adapter)
+		return ERR_PTR(-ENOMEM);
+
+	pool = find_first_zero_bit(&adapter->fwd_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->fwd_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev));
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(pool);
+	base_queue = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   pool, adapter->num_rx_pools,
+		   base_queue, base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+
+	fwd_adapter->pool = pool;
+	fwd_adapter->netdev = vdev;
+	fwd_adapter->real_adapter = adapter;
+	fwd_adapter->rx_base_queue = base_queue;
+	fwd_adapter->tx_base_queue = base_queue;
+
+	err = ixgbe_fwd_ring_up(vdev, fwd_adapter);
+	if (err) {
+		kfree(fwd_adapter);
+		return ERR_PTR(err);
+	}
+	return fwd_adapter;
+}
+
+static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = priv;
+	struct ixgbe_adapter *adapter = fwd_adapter->real_adapter;
+
+	clear_bit(fwd_adapter->pool, &adapter->fwd_bitmask);
+	adapter->num_rx_pools--;
+
+	ixgbe_fwd_ring_down(fwd_adapter->netdev, fwd_adapter);
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   fwd_adapter->pool, adapter->num_rx_pools,
+		   fwd_adapter->rx_base_queue,
+		   fwd_adapter->rx_base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7351,6 +7548,11 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
 };
 
+const struct forwarding_accel_ops ixgbe_fwd_ops = {
+	.fwd_accel_add_station = ixgbe_fwd_add,
+	.fwd_accel_del_station = ixgbe_fwd_del,
+};
+
 /**
  * ixgbe_enumerate_functions - Get the number of ports this device has
  * @adapter: adapter structure
@@ -7554,6 +7756,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	netdev->netdev_ops = &ixgbe_netdev_ops;
+	netdev->fwd_ops = &ixgbe_fwd_ops;
 	ixgbe_set_ethtool_ops(netdev);
 	netdev->watchdog_timeo = 5 * HZ;
 	strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-04 20:10     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-10-07 19:52       ` David Miller
  2013-10-07 21:20         ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2013-10-07 19:52 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, john.r.fastabend, andy

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri,  4 Oct 2013 16:10:04 -0400

> @@ -426,9 +426,12 @@ struct sk_buff {
>  	char			cb[48] __aligned(8);
>  
>  	unsigned long		_skb_refdst;
> -#ifdef CONFIG_XFRM
> -	struct	sec_path	*sp;
> -#endif
> +
> +	union {
> +		struct	sec_path	*sp;
> +		void 			*accel_priv;
> +	};
> +

I'm not 100% sure these two things are really mutually exclusive.

What if bridging ebtables does an input route lookup?  That can
populate the security path.

Also, why have you not added this to the usual netdev_ops and
hw_features?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-07 19:52       ` David Miller
@ 2013-10-07 21:20         ` Neil Horman
  2013-10-07 21:34           ` David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-10-07 21:20 UTC (permalink / raw)
  To: davem; +Cc: netdev

Forgive the poor reply format, Dave; I deleted your email (too fast on the
trigger apparently), so I have to reconstruct it.

>> @@ -426,9 +426,12 @@ struct sk_buff {
>>  	char			cb[48] __aligned(8);
>>  
>>  	unsigned long		_skb_refdst;
>> -#ifdef CONFIG_XFRM
>> -	struct	sec_path	*sp;
>> -#endif
>> +
>> +	union {
>> +		struct	sec_path	*sp;
>> +		void 			*accel_priv;
>> +	};
>> +
>
>I'm not 100% sure these two things are really mutually exclusive.
>
>What if bridging ebtables does an input route lookup?  That can
>populate the security path.
>
You are most likely right; that's why this is an RFC.  I haven't really thought
that bit through fully yet, to be perfectly honest.  I wanted a place for a
pointer to the accelerated data path data to live, and that looked like a
reasonably safe place at the time, but as you point out, it's not.  I'll need to
find a better place for it.

>Also, why have you not added this to the usual netdev_ops and
>hw_features?

That's me experimenting.  I was originally thinking that this functionality
might be grouped separately, so that we could handle it independently of the
standard network device operations (you might have noticed in v1 of my patch I
had a size_t variable in there, so I thought the separation might be
organizationally nice).  It was also something I was tinkering with for
potential future work to support other data plane accelerators (like the FM6000
switch chip from Intel) in a manner that didn't pollute the more typical host
network devices.  Like I said though, just experimenting at the moment....

Regards
Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-07 21:20         ` Neil Horman
@ 2013-10-07 21:34           ` David Miller
  2013-10-07 22:39             ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2013-10-07 21:34 UTC (permalink / raw)
  To: nhorman; +Cc: netdev

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 7 Oct 2013 17:20:00 -0400

> That's me experimenting.  I was originally thinking that this functionality
> might be grouped separately, so that we could handle it independently of the
> standard network device operations (you might have noticed in v1 of my patch I
> had a size_t variable in there, so I thought the separation might be
> organizationally nice).  It was also something I was tinkering with for
> potential future work to support other data plane accelerators (like the FM6000
> switch chip from Intel) in a manner that didn't pollute the more typical host
> network devices.  Like I said though, just experimenting at the moment....

Can these dataplane devices still act like a normal networking port and
send and receive packets at the host level?

If yes, that would be an extremely strong argument for netdev_ops.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
  2013-10-04 20:10     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
  2013-10-04 20:10     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
@ 2013-10-07 22:09     ` John Fastabend
  2013-10-08  1:08       ` Neil Horman
  2 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-07 22:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, John Fastabend, Andy Gospodarek, David Miller

On 10/04/2013 01:10 PM, Neil Horman wrote:
> Hey all-
>       here's the next, updated version of the vsi/macvlan integration that we've
> been discussing.
>
> Some change notes:
>
> * Changes to the forwarding ops structure - Removed the priv_size field, and
> added a flags field.  Removal of the priv_size field was accomplished by just
> having the add method return a void * and using ERR_PTR and PTR_ERR checks,
> which also allows us to allocate memory for the acceleration path in the driver,
> which I like.  I'm still not super happy with how I'm using the flags (currently
> only used to indicate support for feature sets), but at least we have the flags
> now, and they can be exposed to user space via iproute2 or ethtool if need be

For the flag, why not use the existing feature flag namespace? Adding
NETIF_F_HW_L2FW_DOFFLOAD or something equivalent to netdev_features.h
and also doing the string updates would get the userspace support for
free.

The one downside of a global feature flag approach like this is that if
the user wants to create some offloaded macvlan devices and some SW-only
macvlans, the control sequence is a bit clumsy. But unless we add
a new flag to the macvlan netlink message I'm not sure how to avoid
this. Further, if you have multiple control applications
creating/deleting these macvlans, the flag mechanism starts to break
down: one app sets the flag, the other deletes it; you get the idea.

It might be worth adding a netlink flag to change the state of
the HW offload. I know this goes against the 'it just gets offloaded'
line of thought, but if you make offload the default then it should
be OK.
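
E.g., roughly (sketch only; IFLA_MACVLAN_FLAGS exists today, but the
MACVLAN_FLAG_HWACCEL bit below is made up for illustration):

	if (data && data[IFLA_MACVLAN_FLAGS]) {
		u16 flags = nla_get_u16(data[IFLA_MACVLAN_FLAGS]);

		/* hypothetical bit asking for the lowerdev offload */
		if (flags & MACVLAN_FLAG_HWACCEL)
			vlan->flags |= MACVLAN_FLAG_HWACCEL;
	}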

>
> * Changes to the Transmit path - Specifically I'm using dev_queue_xmit to send
> frames now, which I like as it makes the macvlan subject to the lowerdevs qdisc
> configuration.

This creates perhaps another oddity where on TX the packets will be
visible to the lowerdev (think tcpdump and an egress qdisc), but on
ingress the packets will be delivered directly to the macvlan device.
Also I was thinking there might be good reasons to skip the lowerdev
qdisc, for performance or QOS setups.

So it might be best the way you had it in the previous revision, even if
it was not functionally equivalent to the SW path. If you skip the lower
dev and submit directly to the hardware then you also don't need an
sk_buff change. The driver can do a lookup via the skb->dev field, and
additional hints from the stack are not needed.
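
I.e. the driver can resolve the acceleration context on its own,
something like this (sketch; the fwd_table[] array is an assumption,
not something in the current patches):

	static struct ixgbe_fwd_adapter *
	ixgbe_find_accel(struct ixgbe_adapter *adapter, struct net_device *vdev)
	{
		int i;

		/* hypothetical per-pool table keyed by the macvlan netdev */
		for (i = 0; i < adapter->num_rx_pools; i++)
			if (adapter->fwd_table[i] &&
			    adapter->fwd_table[i]->netdev == vdev)
				return adapter->fwd_table[i];
		return NULL;
	}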

If there is a perceived problem with not having the HW and SW completely
equivalent functionally, we could always default the feature flag to off.
Then enabling the feature would be a single 'ethtool' command. This
seems like a nice compromise.
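
E.g., assuming the 'l2-fwd-offload' feature string used later in this
thread, with eth2 standing in for the lowerdev:

	# ethtool -K eth2 l2-fwd-offload on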

I like where this is going though!

Thanks,
John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-07 21:34           ` David Miller
@ 2013-10-07 22:39             ` John Fastabend
  2013-10-08  0:52               ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-07 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, netdev

On 10/07/2013 02:34 PM, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 7 Oct 2013 17:20:00 -0400
>
>> Thats me experimenting.  I was thinking that origionally this functionality
>> might be grouped separately, so that we could handle it independently of the
>> standard network device operations (you might have noticed in v1 of my patch I
>> had a size_t variable in there, so I thought the separation might be
>> organizationally nice).  It was also something I was tinkering with for
>> potential future work to support other data plane accelerators (like the FM6000
>> switch chip from intel) in a manner that didn't pollute the more typical host network
>> devices.  Like I said though, just experimenting at the moment....
>

We can do something like the dcbnl ops: add another pointer off
the net device structure and then use the skb->dev field to find the
correct set of ops. This seems like the simplest option to me and
isolates the ops structure.
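
Roughly (sketch, reusing the forwarding_accel_ops from the earlier RFC;
exact placement and naming still to be decided):

	/* net_device grows a pointer, a la dcbnl_ops */
	const struct forwarding_accel_ops *fwd_ops;

	/* then, wherever we hold the skb, the ops are one dereference away */
	const struct forwarding_accel_ops *ops = skb->dev->fwd_ops;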

Is there some information loss from hanging it off the netdevice
structure vs the skb? I can't see any.

> Can these dataplane devices still act like a normal networking port and
> send and receive packets at the host level?
>

Yes, they act like normal networking ports, except that there is a
switching component in the hardware. These patches are not looking at
virtual or multiple physical functions at the moment.

> If yes, that would be an extremely strong argument for netdev_ops.

I agree.


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-07 22:39             ` John Fastabend
@ 2013-10-08  0:52               ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-08  0:52 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, netdev

On Mon, Oct 07, 2013 at 03:39:01PM -0700, John Fastabend wrote:
> On 10/07/2013 02:34 PM, David Miller wrote:
> >From: Neil Horman <nhorman@tuxdriver.com>
> >Date: Mon, 7 Oct 2013 17:20:00 -0400
> >
> >>That's me experimenting.  I was originally thinking that this functionality
> >>might be grouped separately, so that we could handle it independently of the
> >>standard network device operations (you might have noticed in v1 of my patch I
> >>had a size_t variable in there, so I thought the separation might be
> >>organizationally nice).  It was also something I was tinkering with for
> >>potential future work to support other data plane accelerators (like the FM6000
> >>switch chip from Intel) in a manner that didn't pollute the more typical host
> >>network devices.  Like I said though, just experimenting at the moment....
> >
> 
> We can do something like the dcbnl ops and add another pointer off
> the net device structure and then use the skb->dev field to find the
> correct set of ops? This seems like the simplest option to me and
> isolates the ops structure.
> 
We certainly could do that, or perhaps, for what we're trying to do here, just
using standard netdev_ops is sufficient.  I kind of like the separation (like
the dcbnl_ops), but like I said, experimenting.  I'll try the next version with
the accel methods added to the netdev structure for comparison.

> Is there some information loss from hanging it off the netdevice
> structure vs the skb? I can't see any.
> 
No, not that I'm aware of.  The only reason I added it to the skb in this
version was that, by doing so, I was able to make dual use of the netdev's
standard tx path.

> >Can these dataplane devices still act like a normal networking port and
> >send and receive packets at the host level?
> >
> 
> Yes, they act like normal networking ports, except that there is a
> switching component in the hardware. These patches are not looking at
> virtual or multiple physical functions at the moment.
> 
To be clear, as John says, these patches aren't addressing any dataplane
acceleration devices beyond the internal switching capabilities of the ixgbe
cards.  That said, other chips will have varying degrees of capability, from
simple L2 switching, to full content-addressable memories that allow for L2/L3
forwarding, as well as higher level routing functions.  Again however, these
patches are just to integrate macvlans with John's virtual station interface
work.

> >If yes, that would be an extremely strong argument for netdev_ops.
> 
> I agree.
In this specific case that may well be true, yes.  I'm not so sure of it
for more advanced switching/routing accelerators, but we probably should do what
makes sense now, and worry about future bridges when we forward over them
(pardon the pun :) ).

Neil

> 
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration
  2013-10-07 22:09     ` [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
@ 2013-10-08  1:08       ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-08  1:08 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, John Fastabend, Andy Gospodarek, David Miller

On Mon, Oct 07, 2013 at 03:09:11PM -0700, John Fastabend wrote:
> On 10/04/2013 01:10 PM, Neil Horman wrote:
> >Hey all-
> >      here's the next, updated version of the vsi/macvlan integration that we've
> >been discussing.
> >
> >Some change notes:
> >
> >* Changes to the forwarding ops structure - Removed the priv_size field, and
> >added a flags field.  Removal of the priv_size field was accomplished by just
> >having the add method return a void * and using ERR_PTR and PTR_ERR checks,
> >which also allows us to allocate memory for the acceleration path in the driver,
> >which I like.  I'm still not super happy with how I'm using the flags (currently
> >only used to indicate support for feature sets), but at least we have the flags
> >now, and they can be exposed to user space via iproute2 or ethtool if need be
> 
> For the flag, why not use the existing feature flag namespace? Adding
> NETIF_F_HW_L2FW_DOFFLOAD or something equivalent to netdev_features.h
> and also doing the string updates would get the userspace support for
> free.
> 
> The one downside of a global feature flag approach like this is that if
> the user wants to create some offloaded macvlan devices and some SW-only
> macvlans, the control sequence is a bit clumsy. But unless we add
> a new flag to the macvlan netlink message I'm not sure how to avoid
> this. Further, if you have multiple control applications
> creating/deleting these macvlans, the flag mechanism starts to break
> down: one app sets the flag, the other deletes it; you get the idea.
> 
> It might be worth adding a netlink flag to change the state of
> the HW offload. I know this goes against the 'it just gets offloaded'
> line of thought, but if you make offload the default then it should
> be OK.
> 
Good suggestions, I should have thought to use the global flags previously. I
got hung up on the notion that for some reason our feature set should be
separated like the ops.  I really don't need to do that.

> >
> >* Changes to the Transmit path - Specifically I'm using dev_queue_xmit to send
> >frames now, which I like as it makes the macvlan subject to the lowerdevs qdisc
> >configuration.
> 
> This creates perhaps another oddity where on TX the packets will be
> visible to the lowerdev (think tcpdump and an egress qdisc), but on
> ingress the packets will be delivered directly to the macvlan device.
> Also I was thinking there might be good reasons to skip the lowerdev
> qdisc, for performance or QOS setups.
> 
Hmm, is that avoidable by doing an extra check in dev_queue_xmit (i.e. checking
to see if accel_data or some other pointer is set)?  Or do you think it's worth
separating the tx path into accelerated and non-accelerated cases?
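
I.e. something like this (sketch only; it assumes dev_hard_start_xmit
grows an accel argument and that the pointer lives somewhere reachable
from the skb):

	/* in dev_queue_xmit(), ahead of the qdisc path */
	if (skb->accel_priv)
		return dev_hard_start_xmit(skb, dev, txq, skb->accel_priv);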

> So it might be best the way you had it in the previous revision, even if
> it was not functionally equivalent to the SW path. If you skip the lower
> dev and submit directly to the hardware then you also don't need an
> sk_buff change. The driver can do a lookup via the skb->dev field, and
> additional hints from the stack are not needed.
> 
Ok, fair enough, I'll put it back the way I had it previously.

> If there is a perceived problem with not having the HW and SW completely
> equivalent functionally, we could always default the feature flag to off.
> Then enabling the feature would be a single 'ethtool' command. This
> seems like a nice compromise.
> 
Yeah, I can agree with that.

> I like where this is going though!
I'm glad you think so!  I'll have another version out later this week!
Best
Neil

> 
> Thanks,
> John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH 0/2 v3] net: alternate proposal for using macvlans with forwarding acceleration
  2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
                     ` (3 preceding siblings ...)
  2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
@ 2013-10-11 18:43   ` Neil Horman
  2013-10-11 18:43     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
  2013-10-11 18:43     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
  4 siblings, 2 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-11 18:43 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend, Andy Gospodarek, David Miller

Hey all-
     here's the next, updated version of the vsi/macvlan integration that we've
been discussing.

Change notes:

* Moved the feature flag to netdev_features.h.  No ethtool option for disabling
it yet, but it's there now, and it seems to fit fairly well.  I was actually
thinking about your comment, John, regarding the clumsiness in allowing sw and
hw accel vlans on the same lowerdev, and it just occurred to me that we could
use the same flag on the macvlan device directly: i.e., if we found that a
lowerdev supported acceleration, then call ndo_dfwd_add_station, and, if
successful, set the same feature flag in the macvlan device.  Then we could use
ethtool to control enabling/disabling acceleration at the macvlan device
directly.  Thoughts?

* Moved the acceleration net device methods back into net_device_ops.  Looks
pretty good to me there.

* Restored the use of a separate xmit routine so we weren't subject to the
lowerdev's queue disciplines.  I integrated its use with dev_hard_start_xmit, so
we could share the linearization code, etc.  Let me know what you
think.

Best
Neil

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-11 18:43   ` [RFC PATCH 0/2 v3] " Neil Horman
@ 2013-10-11 18:43     ` Neil Horman
  2013-10-13 20:46       ` John Fastabend
  2013-10-11 18:43     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
  1 sibling, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-10-11 18:43 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend, Andy Gospodarek, David Miller, Neil Horman

Add an operations structure that allows a network interface to export the fact
that it supports packet forwarding in hardware between physical interfaces and
other mac layer devices assigned to it (such as macvlans).  This operations
structure can be used by virtual mac devices to bypass software switching so
that forwarding can be done in hardware more efficiently.
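
A driver opting in wires the new hooks up through its net_device_ops,
roughly as follows (names are illustrative; the ixgbe patch in this
series is the real example):

	static const struct net_device_ops foo_netdev_ops = {
		.ndo_open		= foo_open,
		.ndo_stop		= foo_stop,
		.ndo_start_xmit		= foo_xmit,
		.ndo_dfwd_add_station	= foo_fwd_add_station,
		.ndo_dfwd_del_station	= foo_fwd_del_station,
		.ndo_dfwd_start_xmit	= foo_fwd_start_xmit,
	};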

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.fastabend@gmail.com
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/macvlan.c           | 31 +++++++++++++++++++++++++++++++
 include/linux/if_macvlan.h      |  1 +
 include/linux/netdev_features.h |  2 ++
 include/linux/netdevice.h       | 11 ++++++++++-
 include/linux/skbuff.h          |  4 ++--
 net/core/dev.c                  | 18 +++++++++++++-----
 net/core/ethtool.c              |  1 +
 net/sched/sch_generic.c         |  2 +-
 8 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..c5a2718 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -297,7 +297,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	int ret;
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 
+	if (vlan->fwd_priv) {
+		skb->dev = vlan->lowerdev;
+		ret = dev_hard_start_xmit(skb, skb->dev, NULL, vlan->fwd_priv);
+		if (likely(ret == NETDEV_TX_OK))
+			goto update_stats;
+	}
+
 	ret = macvlan_queue_xmit(skb, dev);
+update_stats:
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct macvlan_pcpu_stats *pcpu_stats;
 
@@ -347,6 +356,18 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
+		vlan->fwd_priv = lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
+		/*
+		 * If we get a NULL pointer back, or if we get an error,
+		 * then we should just fall through to the non-accelerated path.
+		 */
+		if (IS_ERR_OR_NULL(vlan->fwd_priv))
+			vlan->fwd_priv = NULL;
+		else
+			return 0;
+	}
+
 	err = -EBUSY;
 	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
 		goto out;
@@ -367,6 +388,10 @@ hash_add:
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
+	if (vlan->fwd_priv) {
+		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
 	return err;
 }
 
@@ -391,6 +416,11 @@ static int macvlan_stop(struct net_device *dev)
 
 hash_del:
 	macvlan_hash_del(vlan, !dev->dismantle);
+	if (vlan->fwd_priv) {
+		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev, vlan->fwd_priv);
+		vlan->fwd_priv = NULL;
+	}
+
 	return 0;
 }
 
@@ -801,6 +831,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
+
 	port = macvlan_port_get_rtnl(lowerdev);
 
 	/* Only 1 macvlan device can be created in passthru mode */
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index ddd33fd..c270285 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,6 +61,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
+	void			*fwd_priv;
 	struct macvlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index a2a89a5..9d1ee76 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -60,6 +60,7 @@ enum {
 	NETIF_F_HW_VLAN_STAG_TX_BIT,	/* Transmit VLAN STAG HW acceleration */
 	NETIF_F_HW_VLAN_STAG_RX_BIT,	/* Receive VLAN STAG HW acceleration */
 	NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
+	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -112,6 +113,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
+#define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 
 /* Features valid for ethtool to change */
 /* = all defined minus driver/device-class-related */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49ac..0249179 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1097,6 +1097,13 @@ struct net_device_ops {
 	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __be16 port);
+
+	void*			(*ndo_dfwd_add_station)(struct net_device *pdev,
+							struct net_device *vdev);
+	void			(*ndo_dfwd_del_station)(struct net_device *pdev, void *priv);
+
+	netdev_tx_t		(*ndo_dfwd_start_xmit)(struct sk_buff *skb,
+							struct net_device *dev, void *priv);
 };
 
 /*
@@ -1183,6 +1190,7 @@ struct net_device {
 	/* Management operations */
 	const struct net_device_ops *netdev_ops;
 	const struct ethtool_ops *ethtool_ops;
+	const struct forwarding_accel_ops *fwd_ops;
 
 	/* Hardware header description */
 	const struct header_ops *header_ops;
@@ -2383,7 +2391,8 @@ extern int		dev_get_phys_port_id(struct net_device *dev,
 					     struct netdev_phys_port_id *ppid);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev,
-					    struct netdev_queue *txq);
+					    struct netdev_queue *txq,
+					    void *accel_priv);
 extern int		dev_forward_skb(struct net_device *dev,
 					struct sk_buff *skb);
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2ddb48d..1710fdb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -426,9 +426,9 @@ struct sk_buff {
 	char			cb[48] __aligned(8);
 
 	unsigned long		_skb_refdst;
-#ifdef CONFIG_XFRM
+
 	struct	sec_path	*sp;
-#endif
+
 	unsigned int		len,
 				data_len;
 	__u16			mac_len,
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..ecad8c2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2536,7 +2536,7 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 }
 
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
-			struct netdev_queue *txq)
+			struct netdev_queue *txq, void *accel_priv)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	int rc = NETDEV_TX_OK;
@@ -2602,9 +2602,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			dev_queue_xmit_nit(skb, dev);
 
 		skb_len = skb->len;
-		rc = ops->ndo_start_xmit(skb, dev);
+		if (accel_priv)
+			rc = ops->ndo_dfwd_start_xmit(skb, dev, accel_priv);
+		else
+			rc = ops->ndo_start_xmit(skb, dev);
+
 		trace_net_dev_xmit(skb, rc, dev, skb_len);
-		if (rc == NETDEV_TX_OK)
+		if (rc == NETDEV_TX_OK && txq)
 			txq_trans_update(txq);
 		return rc;
 	}
@@ -2620,7 +2624,10 @@ gso:
 			dev_queue_xmit_nit(nskb, dev);
 
 		skb_len = nskb->len;
-		rc = ops->ndo_start_xmit(nskb, dev);
+		if (accel_priv)
+			rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv);
+		else
+			rc = ops->ndo_start_xmit(nskb, dev);
 		trace_net_dev_xmit(nskb, rc, dev, skb_len);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
@@ -2645,6 +2652,7 @@ out_kfree_skb:
 out:
 	return rc;
 }
+EXPORT_SYMBOL_GPL(dev_hard_start_xmit);
 
 static void qdisc_pkt_len_init(struct sk_buff *skb)
 {
@@ -2852,7 +2860,7 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 			if (!netif_xmit_stopped(txq)) {
 				__this_cpu_inc(xmit_recursion);
-				rc = dev_hard_start_xmit(skb, dev, txq);
+				rc = dev_hard_start_xmit(skb, dev, txq, NULL);
 				__this_cpu_dec(xmit_recursion);
 				if (dev_xmit_complete(rc)) {
 					HARD_TX_UNLOCK(dev, txq);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 78e9d92..9f0c599b 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -94,6 +94,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_LOOPBACK_BIT] =         "loopback",
 	[NETIF_F_RXFCS_BIT] =            "rx-fcs",
 	[NETIF_F_RXALL_BIT] =            "rx-all",
+	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 };
 
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e7121d2..8c44b1b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -126,7 +126,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	HARD_TX_LOCK(dev, txq, smp_processor_id());
 	if (!netif_xmit_frozen_or_stopped(txq))
-		ret = dev_hard_start_xmit(skb, dev, txq);
+		ret = dev_hard_start_xmit(skb, dev, txq, NULL);
 
 	HARD_TX_UNLOCK(dev, txq);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
  2013-10-11 18:43   ` [RFC PATCH 0/2 v3] " Neil Horman
  2013-10-11 18:43     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-10-11 18:43     ` Neil Horman
  2013-10-13 20:48       ` John Fastabend
  1 sibling, 1 reply; 35+ messages in thread
From: Neil Horman @ 2013-10-11 18:43 UTC (permalink / raw)
  To: netdev; +Cc: john.fastabend, Andy Gospodarek, David Miller, Neil Horman

Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
take advantage of these operations.  Allow it to allocate queues for a macvlan
so that when we transmit a frame, we can do the switching in hardware inside the
ixgbe card, rather than in software.
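
Nothing changes from userspace; an accelerated macvlan is created the
usual way and the offload is picked up transparently when the device is
opened (sketch, with eth2 standing in for the lowerdev):

	# ip link add link eth2 mv0 type macvlan
	# ip link set mv0 up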

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: john.fastabend@gmail.com
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  33 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h     |  54 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |  15 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 388 ++++++++++++++++++-----
 5 files changed, 400 insertions(+), 94 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..e924efa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -219,6 +219,17 @@ enum ixgbe_ring_state_t {
 	__IXGBE_RX_FCOE,
 };
 
+struct ixgbe_fwd_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 #define check_for_tx_hang(ring) \
 	test_bit(__IXGBE_TX_DETECT_HANG, &(ring)->state)
 #define set_check_for_tx_hang(ring) \
@@ -236,6 +247,7 @@ struct ixgbe_ring {
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
 	struct device *dev;		/* device for DMA mapping */
+	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
 	union {
 		struct ixgbe_tx_buffer *tx_buffer_info;
@@ -244,6 +256,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
 
@@ -288,11 +301,15 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_L2A_QUEUES 4
+#define IXGBE_BAD_L2A_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +755,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long fwd_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -879,9 +897,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +938,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e8649ab..277af14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-                              struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
new file mode 100644
index 0000000..2f36584
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_l2a.h
@@ -0,0 +1,54 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..582d30b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_l2a.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -118,6 +119,8 @@ static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = {
 };
 MODULE_DEVICE_TABLE(pci, ixgbe_pci_tbl);
 
+netdev_tx_t ixgbe_fwd_xmit_frame(struct sk_buff *skb, void *priv);
+
 #ifdef CONFIG_IXGBE_DCA
 static int ixgbe_notify_dca(struct notifier_block *, unsigned long event,
 			    void *p);
@@ -872,7 +875,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1059,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1076,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index)
 		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1230,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1264,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1565,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1745,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1911,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1992,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -1993,6 +2000,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct sk_buff *skb;
 
+		if (rx_ring->l2_accel_priv)
+			printk(KERN_CRIT "RECEIVING ON AN ACCELERATED QUEUE\n");
+
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
 			ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
@@ -2041,7 +2052,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2466,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2905,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2965,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3355,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3372,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->fwd_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3683,6 +3642,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3713,6 +3674,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3743,15 +3706,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly fewer RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3772,6 +3736,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+
 	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
@@ -4133,6 +4098,7 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4459,7 +4425,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4838,6 +4804,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->fwd_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5143,7 +5111,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5168,16 +5136,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_L2A_QUEUES)
+		queues = IXGBE_MAX_L2A_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5215,7 +5189,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -6576,7 +6549,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6588,7 +6561,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6639,6 +6612,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6730,9 +6706,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -6767,8 +6744,9 @@ out_drop:
 	return NETDEV_TX_OK;
 }
 
-static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
-				    struct net_device *netdev)
+static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
+				      struct net_device *netdev,
+				      struct ixgbe_ring *ring)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_ring *tx_ring;
@@ -6784,10 +6762,17 @@ static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
 		skb_set_tail_pointer(skb, 17);
 	}
 
-	tx_ring = adapter->tx_ring[skb->queue_mapping];
+	tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
 
+static netdev_tx_t ixgbe_xmit_frame(struct sk_buff *skb,
+				    struct net_device *netdev)
+{
+	return __ixgbe_xmit_frame(skb, netdev, NULL);
+}
+
 /**
  * ixgbe_set_mac - Change the Ethernet Address of the NIC
  * @netdev: network interface device structure
@@ -7057,6 +7042,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7305,6 +7291,231 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_fwd_psrtype(struct ixgbe_fwd_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
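+	/* PSRTYPE[30:29] encodes log2 of the RSS queues per pool */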
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+static void ixgbe_enable_fwd_ring(struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *rx_ring,
+				  struct ixgbe_fwd_adapter *accel)
+{
+	rx_ring->l2_accel_priv = accel;
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+static void ixgbe_disable_fwd_ring(struct ixgbe_fwd_adapter *vadapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+	rx_ring->l2_accel_priv = NULL;
+}
+
+int ixgbe_fwd_ring_up(struct net_device *vdev, struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->pool * adapter->num_rx_queues_per_pool;
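+	/* per-pool tx queue count matches rx (enforced in ixgbe_fwd_add) */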
+	unsigned int txbase = accel->pool * adapter->num_rx_queues_per_pool;
+	int err, i;
+
+	accel->rx_base_queue = rxbase;
+	accel->tx_base_queue = txbase;
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < vdev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->vmdq_netdev = vdev;
+		ixgbe_enable_fwd_ring(adapter,
+				      adapter->rx_ring[rxbase + i], accel);
+	}
+
+	for (i = 0; i < vdev->num_tx_queues; i++)
+		adapter->tx_ring[txbase + i]->vmdq_netdev = vdev;
+
+	if (is_valid_ether_addr(vdev->dev_addr))
+		ixgbe_add_mac_filter(adapter, vdev->dev_addr, accel->pool);
+
+	err = netif_set_real_num_tx_queues(vdev, vdev->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+	err = netif_set_real_num_rx_queues(vdev, vdev->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
+	ixgbe_fwd_psrtype(accel);
+	netif_tx_start_all_queues(vdev);
+	return 0;
+err_set_queues:
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+	return err;
+}
+
+int ixgbe_fwd_ring_down(struct net_device *vdev, struct ixgbe_fwd_adapter *accel)
+{
+	struct ixgbe_adapter *adapter = accel->real_adapter;
+	unsigned int rxbase = accel->rx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(vdev);
+
+	for (i = 0; i < vdev->num_rx_queues; i++)
+		ixgbe_disable_fwd_ring(accel, adapter->rx_ring[rxbase + i]);
+
+	return 0;
+}
+
+static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = NULL;
+	struct ixgbe_adapter *adapter = netdev_priv(pdev);
+	int pool, vmdq_pool, base_queue;
+	int err;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vdev->num_rx_queues != vdev->num_tx_queues ||
+	    vdev->num_tx_queues > IXGBE_MAX_L2A_QUEUES ||
+	    vdev->num_tx_queues == IXGBE_BAD_L2A_QUEUE) {
+		netdev_info(pdev,
+			    "Supports RX/TX Queue counts 1, 2, and 4\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES)
+		return ERR_PTR(-EBUSY);
+
+	fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
+	if (!fwd_adapter)
+		return ERR_PTR(-ENOMEM);
+
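+	/* pool 0 is reserved for the PF (claimed in ixgbe_sw_init) */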
+	pool = find_first_zero_bit(&adapter->fwd_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->fwd_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_L2A_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(pdev, netdev_get_num_tc(pdev));
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(pool);
+	base_queue = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   pool, adapter->num_rx_pools,
+		   base_queue, base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+
+	fwd_adapter->pool = pool;
+	fwd_adapter->netdev = vdev;
+	fwd_adapter->real_adapter = adapter;
+	fwd_adapter->rx_base_queue = base_queue;
+	fwd_adapter->tx_base_queue = base_queue;
+
+	err = ixgbe_fwd_ring_up(vdev, fwd_adapter);
+	if (err) {
+		kfree(fwd_adapter);
+		return ERR_PTR(err);
+	}
+	return fwd_adapter;
+}
+
+static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = priv;
+	struct ixgbe_adapter *adapter = fwd_adapter->real_adapter;
+
+	clear_bit(fwd_adapter->pool, &adapter->fwd_bitmask);
+	adapter->num_rx_pools--;
+
+	ixgbe_fwd_ring_down(fwd_adapter->netdev, fwd_adapter);
+
+	netdev_dbg(pdev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   fwd_adapter->pool, adapter->num_rx_pools,
+		   fwd_adapter->rx_base_queue,
+		   fwd_adapter->rx_base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->fwd_bitmask);
+}
+
+static netdev_tx_t ixgbe_fwd_xmit(struct sk_buff *skb,
+				  struct net_device *dev,
+				  void *priv)
+{
+	struct ixgbe_fwd_adapter *fwd_adapter = priv;
+	unsigned int queue;
+	struct ixgbe_ring *tx_ring;
+
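+	/* skb->queue_mapping is relative to the macvlan dev; offset it
+	 * into this pool's slice of the PF tx rings */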
+	queue = skb->queue_mapping + fwd_adapter->tx_base_queue;
+	tx_ring = fwd_adapter->real_adapter->tx_ring[queue];
+
+	return __ixgbe_xmit_frame(skb, dev, tx_ring);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7349,6 +7560,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
+	.ndo_dfwd_add_station	= ixgbe_fwd_add,
+	.ndo_dfwd_del_station	= ixgbe_fwd_del,
+	.ndo_dfwd_start_xmit	= ixgbe_fwd_xmit,
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-11 18:43     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
@ 2013-10-13 20:46       ` John Fastabend
  2013-10-14 10:48         ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-13 20:46 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, David Miller

On 10/11/2013 11:43 AM, Neil Horman wrote:
> Add an operations structure that allows a network interface to export the fact
> that it supports packet forwarding in hardware between physical interfaces and
> other mac layer devices assigned to it (such as macvlans).  This operations
> structure can be used by virtual mac devices to bypass software switching so
> that forwarding can be done in hardware more efficiently.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: john.fastabend@gmail.com
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: "David S. Miller" <davem@davemloft.net>
> ---

[...]
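
For anyone wiring a driver up to these ops, the three hooks get plugged
into net_device_ops roughly the way the ixgbe patch elsewhere in this
thread does:

	.ndo_dfwd_add_station	= ixgbe_fwd_add,
	.ndo_dfwd_del_station	= ixgbe_fwd_del,
	.ndo_dfwd_start_xmit	= ixgbe_fwd_xmit,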

>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2ddb48d..1710fdb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -426,9 +426,9 @@ struct sk_buff {
>   	char			cb[48] __aligned(8);
>
>   	unsigned long		_skb_refdst;
> -#ifdef CONFIG_XFRM
> +

Is this a hold-over from the previous patches? 'sp' isn't touched
anywhere else so put the ifdef/endif back.
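
That is, restore the pre-patch layout (reconstructed from this hunk):

	unsigned long		_skb_refdst;
#ifdef CONFIG_XFRM
	struct	sec_path	*sp;
#endif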

>   	struct	sec_path	*sp;
> -#endif
> +
>   	unsigned int		len,
>   				data_len;
>   	__u16			mac_len,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5c713f2..ecad8c2 100644

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
  2013-10-11 18:43     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
@ 2013-10-13 20:48       ` John Fastabend
  2013-10-14 10:50         ` Neil Horman
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2013-10-13 20:48 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Andy Gospodarek, David Miller

On 10/11/2013 11:43 AM, Neil Horman wrote:
> Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
> take advantage of these operations.  Allow it to allocate queues for a macvlan
> so that when we transmit a frame, we can do the switching in hardware inside the
> ixgbe card, rather than in software.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: john.fastabend@gmail.com
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: "David S. Miller" <davem@davemloft.net>
> ---

Neil, I'm fairly sure I can simplify this patch further. I'll be able
to take a shot at it Tuesday if you don't mind waiting.

I can resubmit the series then and preserve your signed-off on at least
the first patch. Seem reasonable?

.John

-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
  2013-10-13 20:46       ` John Fastabend
@ 2013-10-14 10:48         ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-14 10:48 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, Andy Gospodarek, David Miller

On Sun, Oct 13, 2013 at 01:46:01PM -0700, John Fastabend wrote:
> On 10/11/2013 11:43 AM, Neil Horman wrote:
> >Add an operations structure that allows a network interface to export the fact
> >that it supports packet forwarding in hardware between physical interfaces and
> >other mac layer devices assigned to it (such as macvlans).  This operations
> >structure can be used by virtual mac devices to bypass software switching so
> >that forwarding can be done in hardware more efficiently.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@gmail.com
> >CC: Andy Gospodarek <andy@greyhouse.net>
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> 
> [...]
> 
> >
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >index 2ddb48d..1710fdb 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -426,9 +426,9 @@ struct sk_buff {
> >  	char			cb[48] __aligned(8);
> >
> >  	unsigned long		_skb_refdst;
> >-#ifdef CONFIG_XFRM
> >+
> 
> Is this a hold-over from the previous patches? 'sp' isn't touched
> anywhere else so put the ifdef/endif back.
> 
Yeah, my screw up. I wanted to get this out before the weekend and missed
it.  Sorry.

Neil

> >  	struct	sec_path	*sp;
> >-#endif
> >+
> >  	unsigned int		len,
> >  				data_len;
> >  	__u16			mac_len,
> >diff --git a/net/core/dev.c b/net/core/dev.c
> >index 5c713f2..ecad8c2 100644
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
  2013-10-13 20:48       ` John Fastabend
@ 2013-10-14 10:50         ` Neil Horman
  0 siblings, 0 replies; 35+ messages in thread
From: Neil Horman @ 2013-10-14 10:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, Andy Gospodarek, David Miller

On Sun, Oct 13, 2013 at 01:48:33PM -0700, John Fastabend wrote:
> On 10/11/2013 11:43 AM, Neil Horman wrote:
> >Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
> >take advantage of these operations.  Allow it to allocate queues for a macvlan
> >so that when we transmit a frame, we can do the switching in hardware inside the
> >ixgbe card, rather than in software.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@gmail.com
> >CC: Andy Gospodarek <andy@greyhouse.net>
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> 
> Neil, I'm fairly sure I can simplify this patch further. I'll be able
> to take a shot at it Tuesday if you don't mind waiting.
> 
> I can resubmit the series then and preserve your signed-off on at least
> the first patch. Seem reasonable?
> 
That sounds fine.
Thanks!
Neil

> .John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2013-10-14 10:50 UTC | newest]

Thread overview: 35+ messages
2013-09-11 18:45 [RFC PATCH 0/4] Series short description John Fastabend
2013-09-11 18:46 ` [RFC PATCH 1/4] net: rtnetlink: make priv_size a function for devs with dynamic size John Fastabend
2013-09-11 18:46 ` [RFC PATCH 2/4] net: Add lower dev list helpers John Fastabend
2013-09-14 12:27   ` Veaceslav Falico
2013-09-14 20:43     ` John Fastabend
2013-09-14 21:14       ` Veaceslav Falico
2013-09-11 18:47 ` [RFC PATCH 3/4] net: VSI: Add virtual station interface support John Fastabend
2013-09-20 23:12   ` Neil Horman
2013-09-21 17:30     ` John Fastabend
2013-09-22 16:44       ` Neil Horman
2013-09-11 18:47 ` [RFC PATCH 4/4] ixgbe: Adding VSI support to ixgbe John Fastabend
2013-09-25 20:16 ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration Neil Horman
2013-09-25 20:16   ` [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
2013-10-02  7:08     ` John Fastabend
2013-10-02 12:53       ` Neil Horman
2013-09-25 20:16   ` [RFC PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
2013-10-02  6:31   ` [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
2013-10-02 13:28     ` Neil Horman
2013-10-04 20:10   ` [RFC PATCH 0/2 v2] " Neil Horman
2013-10-04 20:10     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
2013-10-07 19:52       ` David Miller
2013-10-07 21:20         ` Neil Horman
2013-10-07 21:34           ` David Miller
2013-10-07 22:39             ` John Fastabend
2013-10-08  0:52               ` Neil Horman
2013-10-04 20:10     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
2013-10-07 22:09     ` [RFC PATCH 0/2 v2] net: alternate proposal for using macvlans with forwarding acceleration John Fastabend
2013-10-08  1:08       ` Neil Horman
2013-10-11 18:43   ` [RFC PATCH 0/2 v3] " Neil Horman
2013-10-11 18:43     ` [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices Neil Horman
2013-10-13 20:46       ` John Fastabend
2013-10-14 10:48         ` Neil Horman
2013-10-11 18:43     ` [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans Neil Horman
2013-10-13 20:48       ` John Fastabend
2013-10-14 10:50         ` Neil Horman
