All of lore.kernel.org
 help / color / mirror / Atom feed
* [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics
@ 2010-04-19 19:18 Scott Feldman
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
                   ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Scott Feldman @ 2010-04-19 19:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw

Patch 1/2 adds new I/O Virtualization netlink ops:

  IOV netlink (IOVNL) adds I/O Virtualization control support to a master
  device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
  control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
  design allows for the case where master and slave are the
  same netdev interface.

  The ops currently defined are:
  	set_mac_vlan: set mac+vlan on VF
	set_port_profile: set port-profile on VF
	unset_port_profile: unset port-profile on VF

Patch 2/2 adds IOV netlink ops support to enic dynamic vnics:

  Add enic iovnl ops to support setting port-profile for dynamic vnics.  Enic
  dynamic vnics are just like normal enic eth vnics except dynamic vnics require
  an extra configuration step to assign a port-profile identifier to the
  interface before the interface is useable. Once assigned, link comes up
  on the interface and is ready for I/O.  The port-profile is used to configure
  the network port assigned to the interface.  The network port configuration
  includes VLAN membership, QoS policies, and port security settings typical
  of a data center network.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [net-next PATCH 1/2] add iovnl netlink support
  2010-04-19 19:18 [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics Scott Feldman
@ 2010-04-19 19:18 ` Scott Feldman
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
                     ` (2 more replies)
  2010-04-19 19:18 ` [net-next PATCH 2/2] add enic iovnl ops for dynamic vnics Scott Feldman
  2010-04-19 21:35 ` [net-next PATCH 0/2] iovnl netlink ops + enic " Chris Wright
  2 siblings, 3 replies; 53+ messages in thread
From: Scott Feldman @ 2010-04-19 19:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw

From: Scott Feldman <scofeldm@cisco.com>

IOV netlink (IOVNL) adds I/O Virtualization control support to a master
device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
design allows for the case where master and slave are the
same netdev interface.

One control setting example is MAC/VLAN settings for a VF.  Another
example control setting is a port-profile for a VF.  A port-profile is an
identifier that defines policy-based settings on the network port
backing the VF.  The network port settings examples are VLAN membership,
QoS settings, and L2 security settings, typical of a data center network.

This patch adds the iovnl interface definitions and an iovnl module.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 include/linux/iovnl.h     |  124 +++++++++++++++++++++
 include/linux/netdevice.h |    4 +
 include/linux/rtnetlink.h |    5 +
 include/net/iovnl.h       |   36 ++++++
 net/Kconfig               |    1 
 net/Makefile              |    3 +
 net/iovnl/Kconfig         |   10 ++
 net/iovnl/Makefile        |    1 
 net/iovnl/iovnl.c         |  260 +++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 444 insertions(+), 0 deletions(-)

diff --git a/include/linux/iovnl.h b/include/linux/iovnl.h
new file mode 100644
index 0000000..ac5fcd3
--- /dev/null
+++ b/include/linux/iovnl.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __LINUX_IOVNL_H__
+#define __LINUX_IOVNL_H__
+
+#include <linux/types.h>
+
+#define IOVNL_PROTO_VERSION 1
+
+/**
+ * IOV netlink (IOVNL) adds I/O Virtualization control support to a master
+ * device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
+ * control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
+ * design allows for the degenerative case where master and slave are the
+ * same netdev interface.
+ *
+ * One control setting example is MAC/VLAN settings for a VF.  Another
+ * example control setting is a port-profile for a VF.  A port-profile is an
+ * identifier that defines policy-based settings on the network port
+ * backing the VF.  The network port settings examples are VLAN membership,
+ * QoS settings, and L2 security settings, typical of a data center network.
+ *
+ * This file defines an rtnetlink interface to allow setting of IOVNL
+ * on capable netdev devices.
+ */
+
+struct iovnlmsg {
+	__u8	family;
+	__u8	cmd;
+	__u16	pad;
+};
+
+/**
+ * enum iovnl_cmds - supported IOV commands
+ *
+ * @IOV_CMD_UNDEFINED: unspecified command to catch errors
+ * @IOV_CMD_SET_PORT_PROFILE: set the port-profile on the device
+ * @IOV_CMD_UNSET_PORT_PROFILE: clear port-profile on the device
+ * @IOV_CMD_GET_PORT_PROFILE_STATUS: return status of last
+ *   IOV_CMD_SET_PORT_PROFILE command
+ * @IOV_SET_MAC_VLAN: Set the MAC address and VLAN on the device
+ */
+enum iovnl_cmds {
+	IOV_CMD_UNDEFINED,
+
+	IOV_CMD_SET_PORT_PROFILE,
+	IOV_CMD_UNSET_PORT_PROFILE,
+	IOV_CMD_GET_PORT_PROFILE_STATUS,
+
+	IOV_CMD_SET_MAC_VLAN,
+
+	__IOV_CMD_ENUM_MAX,
+	IOV_CMD_MAX = __IOV_CMD_ENUM_MAX - 1,
+};
+
+/**
+ * enum iovnl_attrs - IOV top-level netlink attributes
+ *
+ * @IOV_ATTR_UNDEFINED: unspecified attribute to catch errors
+ * @IOV_ATTR_IFNAME: interface name of master (PF) net device (NLA_NUL_STRING)
+ * @IOV_ATTR_VF_IFNAME: interface name of target VF device (NLA_NUL_STRING)
+ * @IOV_ATTR_PORT_PROFILE: port-profile name to assign to device
+ *   (NLA_NUL_STRING)
+ * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
+ * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)
+ * @IOV_ATTR_PORT_PROFILE_STATUS: status of last IOV_CMD_SET_PORT_PROFILE
+ *   command (NLA_U8)
+ * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])
+ * @IOV_ATTR_VLAN: device 8021q VLAN ID (NLA_U16)
+ # @IOV_ATTR_STATUS: cmd return status code
+ */
+enum iovnl_attrs {
+	IOV_ATTR_UNDEFINED,
+
+	IOV_ATTR_IFNAME,
+	IOV_ATTR_VF_IFNAME,
+
+	IOV_ATTR_PORT_PROFILE,
+	IOV_ATTR_CLIENT_NAME,
+	IOV_ATTR_HOST_UUID,
+	IOV_ATTR_PORT_PROFILE_STATUS,
+
+	IOV_ATTR_MAC_ADDR,
+	IOV_ATTR_VLAN,
+
+	IOV_ATTR_STATUS,
+
+	__IOV_ATTR_ENUM_MAX,
+	IOV_ATTR_MAX = __IOV_ATTR_ENUM_MAX - 1,
+};
+
+/**
+ * enum iovnl_port_profile_status - IOV_ATTR_PORT_PROFILE_STATUS status
+ * return codes
+ *
+ * @IOV_PORT_PROFILE_STATUS_UNKNOWN: unspecified to catch errors
+ * @IOV_PORT_PROFILE_STATUS_SUCCESS:  port-profile aiovlied successfully
+ * @IOV_PORT_PROFILE_STATUS_ERROR: port-profile setting had error
+ * @IOV_PORT_PROFILE_STATUS_INPROGRESS: port-profile setting in-progress
+ */
+enum iovnl_port_profile_status {
+	IOV_PORT_PROFILE_STATUS_UNKNOWN,
+	IOV_PORT_PROFILE_STATUS_SUCCESS,
+	IOV_PORT_PROFILE_STATUS_ERROR,
+	IOV_PORT_PROFILE_STATUS_INPROGRESS,
+};
+
+#endif /* __LINUX_IOVNL_H__ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 649a025..b531b0d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -50,6 +50,7 @@
 #ifdef CONFIG_DCB
 #include <net/dcbnl.h>
 #endif
+#include <net/iovnl.h>
 
 struct vlan_group;
 struct netpoll_info;
@@ -1048,6 +1049,9 @@ struct net_device {
 	const struct dcbnl_rtnl_ops *dcbnl_ops;
 #endif
 
+	/* IOV netlink ops */
+	const struct iovnl_ops *iovnl_ops;
+
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	/* max exchange id for FCoE LRO by ddp */
 	unsigned int		fcoe_ddp_xid;
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index d1c7c90..aafadf7 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -113,6 +113,11 @@ enum {
 	RTM_SETDCB,
 #define RTM_SETDCB RTM_SETDCB
 
+	RTM_GETIOV = 82,
+#define RTM_GETIOV RTM_GETIOV
+	RTM_SETIOV,
+#define RTM_SETIOV RTM_SETIOV
+
 	__RTM_MAX,
 #define RTM_MAX		(((__RTM_MAX + 3) & ~3) - 1)
 };
diff --git a/include/net/iovnl.h b/include/net/iovnl.h
new file mode 100644
index 0000000..c353eee
--- /dev/null
+++ b/include/net/iovnl.h
@@ -0,0 +1,36 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef __NET_IOVNL_H__
+#define __NET_IOVNL_H__
+
+/*
+ * Ops struct for the netlink callbacks.  Used by IOVNL-enabled drivers through
+ * the netdevice struct.
+ */
+struct iovnl_ops {
+	int (*set_port_profile)(struct net_device *, struct net_device *,
+		char *, u8 *, char *, char *);
+	int (*unset_port_profile)(struct net_device *, struct net_device *);
+	int (*get_port_profile_status)(struct net_device *,
+		struct net_device *);
+	int (*set_mac_vlan)(struct net_device *, struct net_device *,
+		u8 *, u16);
+};
+
+#endif /* __NET_IOVNL_H__ */
diff --git a/net/Kconfig b/net/Kconfig
index 0d68b40..aca5de0 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -203,6 +203,7 @@ source "net/phonet/Kconfig"
 source "net/ieee802154/Kconfig"
 source "net/sched/Kconfig"
 source "net/dcb/Kconfig"
+source "net/iovnl/Kconfig"
 
 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index cb7bdc1..23589e9 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -61,6 +61,9 @@ obj-$(CONFIG_CAIF)		+= caif/
 ifneq ($(CONFIG_DCB),)
 obj-y				+= dcb/
 endif
+ifneq ($(CONFIG_IOVNL),)
+obj-y				+= iovnl/
+endif
 obj-y				+= ieee802154/
 
 ifeq ($(CONFIG_NET),y)
diff --git a/net/iovnl/Kconfig b/net/iovnl/Kconfig
new file mode 100644
index 0000000..4548417
--- /dev/null
+++ b/net/iovnl/Kconfig
@@ -0,0 +1,10 @@
+config IOVNL
+	tristate "IOV rtnetlink support"
+	default n
+	---help---
+	  This enables support for configuring IOV
+	  on Ethernet adapters via rtnetlink.  Say 'Y'
+	  if you have a Ethernet adapter which supports network
+	  configuration using IOV rtnetlinl.
+
+	  If unsure, say N.
diff --git a/net/iovnl/Makefile b/net/iovnl/Makefile
new file mode 100644
index 0000000..9256d01
--- /dev/null
+++ b/net/iovnl/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_IOVNL) += iovnl.o
diff --git a/net/iovnl/iovnl.c b/net/iovnl/iovnl.c
new file mode 100644
index 0000000..ce9db50
--- /dev/null
+++ b/net/iovnl/iovnl.c
@@ -0,0 +1,260 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/iovnl.h>
+#include <net/netlink.h>
+#include <net/rtnetlink.h>
+#include <net/iovnl.h>
+#include <net/sock.h>
+
+MODULE_AUTHOR("Roopa Prabhu <roprabhu@cisco.com, "
+	"Scott Feldman <scofeldm@cisco.com>");
+MODULE_DESCRIPTION("IOV netlink");
+MODULE_LICENSE("GPL");
+
+/* IOVNL netlink attributes policy */
+static const struct nla_policy iovnl_rtnl_policy[IOV_ATTR_MAX + 1] = {
+	[IOV_ATTR_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
+	[IOV_ATTR_VF_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
+	[IOV_ATTR_PORT_PROFILE] =  { .type = NLA_NUL_STRING, .len = 32 },
+	[IOV_ATTR_CLIENT_NAME] = { .type = NLA_NUL_STRING, .len = 32 },
+	[IOV_ATTR_HOST_UUID] = { .type = NLA_NUL_STRING, .len = 64 },
+	[IOV_ATTR_PORT_PROFILE_STATUS] = { .type = NLA_U8 },
+	[IOV_ATTR_MAC_ADDR] = { .len = 6 },
+	[IOV_ATTR_VLAN] = { .type = NLA_U16 },
+	[IOV_ATTR_STATUS] = { .type = NLA_U8 },
+};
+
+/* standard netlink reply call */
+static int iovnl_reply(u8 value, u8 event, u8 cmd, u8 attr, u32 pid,
+	u32 seq, u16 flags)
+{
+	struct sk_buff *skb;
+	struct iovnlmsg *iov;
+	struct nlmsghdr *nlh;
+	int ret = -EINVAL;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return ret;
+
+	nlh = NLMSG_NEW(skb, pid, seq, event, sizeof(*iov), flags);
+
+	iov = NLMSG_DATA(nlh);
+	iov->family = AF_UNSPEC;
+	iov->cmd = cmd;
+	iov->pad = 0;
+
+	ret = nla_put_u8(skb, attr, value);
+	if (ret)
+		goto err;
+
+	/* end the message, assign the nlmsg_len. */
+	nlmsg_end(skb, nlh);
+	ret = rtnl_unicast(skb, &init_net, pid);
+	if (ret)
+		return -EINVAL;
+
+	return 0;
+nlmsg_failure:
+err:
+	kfree_skb(skb);
+	return ret;
+}
+
+static int iovnl_get_port_profile_status(struct net_device *dev,
+	struct net_device *vf_dev, u32 pid, u32 seq, u16 flags)
+{
+	int ret;
+
+	if (!dev->iovnl_ops->get_port_profile_status)
+		return -EINVAL;
+
+	ret = dev->iovnl_ops->get_port_profile_status(dev, vf_dev);
+
+	return  iovnl_reply(ret, RTM_GETIOV,
+		IOV_CMD_GET_PORT_PROFILE_STATUS, IOV_ATTR_PORT_PROFILE_STATUS,
+		pid, seq, flags);
+}
+
+
+static int iovnl_set_port_profile(struct net_device *dev,
+	struct net_device *vf_dev, struct nlattr **tb,
+	u32 pid, u32 seq, u16 flags)
+{
+	int i, ret;
+	char *port_profile = NULL;
+	u8 *mac_addr = NULL;
+	char *client_name = NULL;
+	char *host_uuid = NULL;
+
+	if (!tb[IOV_ATTR_PORT_PROFILE] || !dev->iovnl_ops->set_port_profile)
+		return -EINVAL;
+
+	for (i = 0; i <= IOV_ATTR_MAX; i++) {
+		if (!tb[i])
+			continue;
+		switch (tb[i]->nla_type) {
+		case IOV_ATTR_PORT_PROFILE:
+			port_profile = nla_data(tb[i]);
+			break;
+		case IOV_ATTR_MAC_ADDR:
+			mac_addr = nla_data(tb[i]);
+			break;
+		case IOV_ATTR_CLIENT_NAME:
+			client_name = nla_data(tb[i]);
+			break;
+		case IOV_ATTR_HOST_UUID:
+			host_uuid = nla_data(tb[i]);
+			break;
+		}
+	}
+
+	ret = dev->iovnl_ops->set_port_profile(dev, vf_dev,
+		port_profile, mac_addr, client_name, host_uuid);
+
+	return iovnl_reply(ret, RTM_SETIOV, IOV_CMD_SET_PORT_PROFILE,
+		IOV_ATTR_STATUS, pid, seq, flags);
+}
+
+static int iovnl_set_mac_vlan(struct net_device *dev,
+	struct net_device *vf_dev, struct nlattr **tb,
+	u32 pid, u32 seq, u16 flags)
+{
+	int i, ret;
+	u8 *mac_addr = NULL;
+	u16 vlan = 0;
+
+	if (!dev->iovnl_ops->set_mac_vlan)
+		return -EINVAL;
+
+	for (i = 0; i <= IOV_ATTR_MAX; i++) {
+		if (!tb[i])
+			continue;
+		switch (tb[i]->nla_type) {
+		case IOV_ATTR_MAC_ADDR:
+			mac_addr = nla_data(tb[i]);
+			break;
+		case IOV_ATTR_VLAN:
+			vlan = nla_get_u16(tb[i]);
+			break;
+		}
+	}
+
+	ret = dev->iovnl_ops->set_mac_vlan(dev, vf_dev,
+		mac_addr, vlan);
+
+	return iovnl_reply(ret, RTM_SETIOV, IOV_CMD_SET_MAC_VLAN,
+		IOV_ATTR_STATUS, pid, seq, flags);
+}
+
+static int iovnl_unset_port_profile(struct net_device *dev,
+	struct net_device *vf_dev, struct nlattr **tb,
+	u32 pid, u32 seq, u16 flags)
+{
+	int ret;
+
+	if (!dev->iovnl_ops->unset_port_profile)
+		return -EINVAL;
+
+	ret = dev->iovnl_ops->unset_port_profile(dev, vf_dev);
+
+	return iovnl_reply(ret, RTM_SETIOV, IOV_CMD_UNSET_PORT_PROFILE,
+		IOV_ATTR_STATUS, pid, seq, flags);
+}
+
+static int iovnl_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
+{
+	struct net *net = sock_net(skb->sk);
+	struct net_device *dev;
+	struct net_device *vf_dev = NULL;
+	struct iovnlmsg  *iov = (struct iovnlmsg *)NLMSG_DATA(nlh);
+	struct nlattr *tb[IOV_ATTR_MAX + 1];
+	u32 pid = skb ? NETLINK_CB(skb).pid : 0;
+	int ret;
+
+	if (!net_eq(net, &init_net))
+		return -EINVAL;
+
+	ret = nlmsg_parse(nlh, sizeof(*iov), tb, IOV_ATTR_MAX,
+		iovnl_rtnl_policy);
+	if (ret < 0)
+		return ret;
+
+	if (!tb[IOV_ATTR_IFNAME])
+		return -EINVAL;
+
+	dev = dev_get_by_name(&init_net, nla_data(tb[IOV_ATTR_IFNAME]));
+	if (!dev)
+		return -EINVAL;
+
+	if (tb[IOV_ATTR_VF_IFNAME])
+		vf_dev = dev_get_by_name(&init_net,
+			nla_data(tb[IOV_ATTR_VF_IFNAME]));
+
+	if (!dev->iovnl_ops)
+		goto errout;
+
+	switch (iov->cmd) {
+	case IOV_CMD_SET_PORT_PROFILE:
+		ret = iovnl_set_port_profile(dev, vf_dev,
+			tb, pid, nlh->nlmsg_seq, nlh->nlmsg_flags);
+		goto out;
+	case IOV_CMD_UNSET_PORT_PROFILE:
+		ret = iovnl_unset_port_profile(dev, vf_dev,
+			tb, pid, nlh->nlmsg_seq, nlh->nlmsg_flags);
+		goto out;
+	case IOV_CMD_GET_PORT_PROFILE_STATUS:
+		ret = iovnl_get_port_profile_status(dev, vf_dev,
+			pid, nlh->nlmsg_seq, nlh->nlmsg_flags);
+		goto out;
+	case IOV_CMD_SET_MAC_VLAN:
+		ret = iovnl_set_mac_vlan(dev, vf_dev,
+			tb, pid, nlh->nlmsg_seq, nlh->nlmsg_flags);
+		goto out;
+	default:
+		goto errout;
+	}
+errout:
+	ret = -EINVAL;
+out:
+	dev_put(dev);
+	if (vf_dev)
+		dev_put(vf_dev);
+
+	return ret;
+}
+
+static int __init iovnl_init(void)
+{
+	rtnl_register(PF_UNSPEC, RTM_GETIOV, iovnl_doit, NULL);
+	rtnl_register(PF_UNSPEC, RTM_SETIOV, iovnl_doit, NULL);
+
+	return 0;
+}
+module_init(iovnl_init);
+
+static void __exit iovnl_exit(void)
+{
+	rtnl_unregister(PF_UNSPEC, RTM_GETIOV);
+	rtnl_unregister(PF_UNSPEC, RTM_SETIOV);
+}
+module_exit(iovnl_exit);


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [net-next PATCH 2/2] add enic iovnl ops for dynamic vnics
  2010-04-19 19:18 [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics Scott Feldman
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
@ 2010-04-19 19:18 ` Scott Feldman
  2010-04-19 21:35 ` [net-next PATCH 0/2] iovnl netlink ops + enic " Chris Wright
  2 siblings, 0 replies; 53+ messages in thread
From: Scott Feldman @ 2010-04-19 19:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw

From: Scott Feldman <scofeldm@cisco.com>

Add enic iovnl ops to support setting port-profile for dynamic vnics.  Enic
dynamic vnics are just like normal enic eth vnics except dynamic vnics require
an extra configuration step to assign a port-profile identifier to the
interface before the interface is useable. Once assigned, link comes up
on the interface and is ready for I/O.  The port-profile is used to configure
the network port assigned to the interface.  The network port configuration
includes VLAN membership, QoS policies, and port security settings typical
of a data center network.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 drivers/net/enic/Makefile     |    2 -
 drivers/net/enic/enic.h       |    3 +
 drivers/net/enic/enic_iovnl.c |  136 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/enic/enic_main.c  |   67 +++++++++++++++++---
 drivers/net/enic/enic_res.c   |    5 ++
 drivers/net/enic/enic_res.h   |    1 
 drivers/net/enic/vnic_dev.c   |   58 +++++++++++++++++
 drivers/net/enic/vnic_dev.h   |    4 +
 drivers/net/enic/vnic_vic.c   |   73 ++++++++++++++++++++++
 drivers/net/enic/vnic_vic.h   |   58 +++++++++++++++++
 10 files changed, 394 insertions(+), 13 deletions(-)

diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
index 391c3bc..311613b 100644
--- a/drivers/net/enic/Makefile
+++ b/drivers/net/enic/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_ENIC) := enic.o
 
 enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
-	enic_res.o vnic_dev.o vnic_rq.o
+	enic_res.o vnic_dev.o vnic_rq.o vnic_vic.o enic_iovnl.o
 
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 5fa56f1..5790655 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -122,4 +122,7 @@ struct enic {
 	unsigned int cq_count;
 };
 
+void enic_set_multicast_list(struct net_device *netdev);
+void enic_set_iovnl_ops(struct net_device *netdev);
+
 #endif /* _ENIC_H_ */
diff --git a/drivers/net/enic/enic_iovnl.c b/drivers/net/enic/enic_iovnl.c
new file mode 100644
index 0000000..37c5d85
--- /dev/null
+++ b/drivers/net/enic/enic_iovnl.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/netdevice.h>
+#include <linux/iovnl.h>
+#include <net/iovnl.h>
+
+#include "enic_res.h"
+#include "enic.h"
+#include "vnic_dev.h"
+#include "vnic_vic.h"
+
+static int enic_provinfo_add_tlv_str(struct vic_provinfo *vp, u16 type,
+	char *str)
+{
+	return str ? vic_provinfo_add_tlv(vp, type, strlen(str) + 1, str) : 0;
+}
+
+static int enic_set_port_profile(struct net_device *dev,
+	struct net_device *vf_dev, char *port_profile,
+	u8 *client_mac_addr, char *client_name, char *host_uuid)
+{
+	struct enic *enic = netdev_priv(dev);
+	struct vic_provinfo *vp;
+	u8 oui[3] = VIC_PROVINFO_CISCO_OUI;
+	int enable = 1;
+	int err;
+
+	if (vf_dev && dev != vf_dev)
+		return -EINVAL;
+
+	if (!port_profile)
+		return -EINVAL;
+
+	vp = vic_provinfo_alloc(GFP_KERNEL, oui, VIC_PROVINFO_LINUX_TYPE);
+	if (!vp)
+		return -ENOMEM;
+
+	enic_provinfo_add_tlv_str(vp,
+		VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR, port_profile);
+	vic_provinfo_add_tlv(vp, VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR,
+		6, client_mac_addr);
+	enic_provinfo_add_tlv_str(vp,
+		VIC_LINUX_PROV_TLV_CLIENT_NAME_STR, client_name);
+	enic_provinfo_add_tlv_str(vp,
+		VIC_LINUX_PROV_TLV_HOST_UUID_STR, host_uuid);
+
+	spin_lock(&enic->devcmd_lock);
+
+	err = vnic_dev_deinit(enic->vdev);
+	if (err)
+		goto err_out;
+
+	err = vnic_dev_logical_uplink(enic->vdev, enable);
+	if (err)
+		goto err_out;
+
+	err = vnic_dev_init_prov(enic->vdev, (u8 *)vp,
+		vic_provinfo_size(vp));
+
+err_out:
+	spin_unlock(&enic->devcmd_lock);
+
+	vic_provinfo_free(vp);
+
+	enic_set_multicast_list(dev);
+
+	return err;
+}
+
+static int enic_get_init_status(struct net_device *dev,
+	struct net_device *vf_dev)
+{
+	struct enic *enic = netdev_priv(dev);
+	int done, err, error;
+
+	if (vf_dev && dev != vf_dev)
+		return IOV_PORT_PROFILE_STATUS_UNKNOWN;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_init_done(enic->vdev, &done, &error);
+	spin_unlock(&enic->devcmd_lock);
+
+	if (err || error)
+		return IOV_PORT_PROFILE_STATUS_ERROR;
+
+	if (!done)
+		return IOV_PORT_PROFILE_STATUS_INPROGRESS;
+
+	if (!error)
+		return IOV_PORT_PROFILE_STATUS_SUCCESS;
+
+	return IOV_PORT_PROFILE_STATUS_UNKNOWN;
+}
+
+static int enic_unset_port_profile(struct net_device *dev,
+	struct net_device *vf_dev)
+{
+	struct enic *enic = netdev_priv(dev);
+	int err;
+
+	if (vf_dev && dev != vf_dev)
+		return -EINVAL;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_deinit(enic->vdev);
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static const struct iovnl_ops enic_iovnl_ops = {
+	.set_port_profile		= enic_set_port_profile,
+	.unset_port_profile		= enic_unset_port_profile,
+	.get_port_profile_status	= enic_get_init_status,
+};
+
+void enic_set_iovnl_ops(struct net_device *netdev)
+{
+	netdev->iovnl_ops = &enic_iovnl_ops;
+}
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 1232887..2fcc161 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -49,10 +49,12 @@
 #define ENIC_DESC_MAX_SPLITS		(MAX_TSO / WQ_ENET_MAX_DESC_LEN + 1)
 
 #define PCI_DEVICE_ID_CISCO_VIC_ENET         0x0043  /* ethernet vnic */
+#define PCI_DEVICE_ID_CISCO_VIC_ENET_DYN     0x0044  /* enet dynamic vnic */
 
 /* Supported devices */
 static DEFINE_PCI_DEVICE_TABLE(enic_id_table) = {
 	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET) },
+	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET_DYN) },
 	{ 0, }	/* end of table */
 };
 
@@ -113,6 +115,11 @@ static const struct enic_stat enic_rx_stats[] = {
 static const unsigned int enic_n_tx_stats = ARRAY_SIZE(enic_tx_stats);
 static const unsigned int enic_n_rx_stats = ARRAY_SIZE(enic_rx_stats);
 
+static int enic_is_dynamic(struct enic *enic)
+{
+	return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_DYN;
+}
+
 static int enic_get_settings(struct net_device *netdev,
 	struct ethtool_cmd *ecmd)
 {
@@ -810,16 +817,44 @@ static void enic_reset_mcaddrs(struct enic *enic)
 
 static int enic_set_mac_addr(struct net_device *netdev, char *addr)
 {
-	if (!is_valid_ether_addr(addr))
-		return -EADDRNOTAVAIL;
+	struct enic *enic = netdev_priv(netdev);
+
+	if (enic_is_dynamic(enic)) {
+		if (!is_zero_ether_addr(addr) && !is_valid_ether_addr(addr))
+			return -EADDRNOTAVAIL;
+	} else {
+		if (!is_valid_ether_addr(addr))
+			return -EADDRNOTAVAIL;
+	}
 
 	memcpy(netdev->dev_addr, addr, netdev->addr_len);
 
 	return 0;
 }
 
+static int enic_set_mac_address(struct net_device *netdev, void *p)
+{
+	struct enic *enic = netdev_priv(netdev);
+	struct sockaddr *saddr = p;
+	char *addr = saddr->sa_data;
+	int err;
+
+	if (!enic_is_dynamic(enic))
+		return 0;
+
+	err = enic_set_mac_addr(netdev, addr);
+	if (!err) {
+		spin_lock(&enic->devcmd_lock);
+		enic_del_station_addr(enic);
+		enic_add_station_addr(enic);
+		spin_unlock(&enic->devcmd_lock);
+	}
+
+	return err;
+}
+
 /* netif_tx_lock held, BHs disabled */
-static void enic_set_multicast_list(struct net_device *netdev)
+void enic_set_multicast_list(struct net_device *netdev)
 {
 	struct enic *enic = netdev_priv(netdev);
 	struct netdev_hw_addr *ha;
@@ -1440,10 +1475,12 @@ static int enic_open(struct net_device *netdev)
 	for (i = 0; i < enic->rq_count; i++)
 		vnic_rq_enable(&enic->rq[i]);
 
-	spin_lock(&enic->devcmd_lock);
-	enic_add_station_addr(enic);
-	spin_unlock(&enic->devcmd_lock);
-	enic_set_multicast_list(netdev);
+	if (!enic_is_dynamic(enic)) {
+		spin_lock(&enic->devcmd_lock);
+		enic_add_station_addr(enic);
+		spin_unlock(&enic->devcmd_lock);
+		enic_set_multicast_list(netdev);
+	}
 
 	netif_wake_queue(netdev);
 	napi_enable(&enic->napi);
@@ -1782,6 +1819,7 @@ static const struct net_device_ops enic_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address 	= eth_mac_addr,
 	.ndo_set_multicast_list	= enic_set_multicast_list,
+	.ndo_set_mac_address	= enic_set_mac_address,
 	.ndo_change_mtu		= enic_change_mtu,
 	.ndo_vlan_rx_register	= enic_vlan_rx_register,
 	.ndo_vlan_rx_add_vid	= enic_vlan_rx_add_vid,
@@ -2010,11 +2048,13 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 
 	netif_carrier_off(netdev);
 
-	err = vnic_dev_init(enic->vdev, 0);
-	if (err) {
-		printk(KERN_ERR PFX
-			"vNIC dev init failed, aborting.\n");
-		goto err_out_dev_close;
+	if (!enic_is_dynamic(enic)) {
+		err = vnic_dev_init(enic->vdev, 0);
+		if (err) {
+			printk(KERN_ERR PFX
+				"vNIC dev init failed, aborting.\n");
+			goto err_out_dev_close;
+		}
 	}
 
 	err = enic_dev_init(enic);
@@ -2069,6 +2109,9 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 	if (using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
+	if (enic_is_dynamic(enic))
+		enic_set_iovnl_ops(netdev);
+
 	enic->csum_rx_enabled = ENIC_SETTING(enic, RXCSUM);
 
 	enic->lro_mgr.max_aggr = ENIC_LRO_MAX_AGGR;
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 02839bf..9f31f58 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -108,6 +108,11 @@ void enic_add_station_addr(struct enic *enic)
 	vnic_dev_add_addr(enic->vdev, enic->mac_addr);
 }
 
+void enic_del_station_addr(struct enic *enic)
+{
+	vnic_dev_del_addr(enic->vdev, enic->mac_addr);
+}
+
 void enic_add_multicast_addr(struct enic *enic, u8 *addr)
 {
 	vnic_dev_add_addr(enic->vdev, addr);
diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index abc1974..0e16ba0 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -132,6 +132,7 @@ struct enic;
 
 int enic_get_vnic_config(struct enic *);
 void enic_add_station_addr(struct enic *enic);
+void enic_del_station_addr(struct enic *enic);
 void enic_add_multicast_addr(struct enic *enic, u8 *addr);
 void enic_del_multicast_addr(struct enic *enic, u8 *addr);
 void enic_add_vlan(struct enic *enic, u16 vlanid);
diff --git a/drivers/net/enic/vnic_dev.c b/drivers/net/enic/vnic_dev.c
index d43a9d4..44b2e41 100644
--- a/drivers/net/enic/vnic_dev.c
+++ b/drivers/net/enic/vnic_dev.c
@@ -682,6 +682,64 @@ int vnic_dev_init(struct vnic_dev *vdev, int arg)
 	return r;
 }
 
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+	int ret;
+
+	*done = 0;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_STATUS, &a0, &a1, wait);
+	if (ret)
+		return ret;
+
+	*done = (a0 == 0);
+
+	*err = (a0 == 0) ? a1 : 0;
+
+	return 0;
+}
+
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len)
+{
+	u64 a0, a1 = len;
+	int wait = 1000;
+	u64 prov_pa;
+	void *prov_buf;
+	int ret;
+
+	prov_buf = pci_alloc_consistent(vdev->pdev, len, &prov_pa);
+	if (!prov_buf)
+		return -ENOMEM;
+
+	memcpy(prov_buf, buf, len);
+
+	a0 = prov_pa;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_PROV_INFO, &a0, &a1, wait);
+
+	pci_free_consistent(vdev->pdev, len, prov_buf, prov_pa);
+
+	return ret;
+}
+
+int vnic_dev_logical_uplink(struct vnic_dev *vdev, int enable)
+{
+	u64 a0 = enable, a1 = 0;
+	int wait = 1000;
+
+	return vnic_dev_cmd(vdev, CMD_LOGICAL_UPLINK, &a0, &a1, wait);
+}
+
+int vnic_dev_deinit(struct vnic_dev *vdev)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+
+	return vnic_dev_cmd(vdev, CMD_DEINIT, &a0, &a1, wait);
+}
+
 int vnic_dev_link_status(struct vnic_dev *vdev)
 {
 	if (vdev->linkstatus)
diff --git a/drivers/net/enic/vnic_dev.h b/drivers/net/enic/vnic_dev.h
index f5be640..bd40045 100644
--- a/drivers/net/enic/vnic_dev.h
+++ b/drivers/net/enic/vnic_dev.h
@@ -124,6 +124,10 @@ int vnic_dev_disable(struct vnic_dev *vdev);
 int vnic_dev_open(struct vnic_dev *vdev, int arg);
 int vnic_dev_open_done(struct vnic_dev *vdev, int *done);
 int vnic_dev_init(struct vnic_dev *vdev, int arg);
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err);
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len);
+int vnic_dev_logical_uplink(struct vnic_dev *vdev, int enable);
+int vnic_dev_deinit(struct vnic_dev *vdev);
 int vnic_dev_soft_reset(struct vnic_dev *vdev, int arg);
 int vnic_dev_soft_reset_done(struct vnic_dev *vdev, int *done);
 void vnic_dev_set_intr_mode(struct vnic_dev *vdev,
diff --git a/drivers/net/enic/vnic_vic.c b/drivers/net/enic/vnic_vic.c
new file mode 100644
index 0000000..d769772
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+
+#include "vnic_vic.h"
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type)
+{
+	struct vic_provinfo *vp = kzalloc(VIC_PROVINFO_MAX_DATA, flags);
+
+	if (!vp || !oui)
+		return NULL;
+
+	memcpy(vp->oui, oui, sizeof(vp->oui));
+	vp->type = type;
+	vp->length = htonl(sizeof(vp->num_tlvs));
+
+	return vp;
+}
+
+void vic_provinfo_free(struct vic_provinfo *vp)
+{
+	kfree(vp);
+}
+
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value)
+{
+	struct vic_provinfo_tlv *tlv;
+
+	if (!vp || !value)
+		return -EINVAL;
+
+	if (ntohl(vp->length) + sizeof(*tlv) + length >
+		VIC_PROVINFO_MAX_TLV_DATA)
+		return -ENOMEM;
+
+	tlv = (struct vic_provinfo_tlv *)((u8 *)vp->tlv +
+		ntohl(vp->length) - sizeof(vp->num_tlvs));
+
+	tlv->type = htons(type);
+	tlv->length = htons(length);
+	memcpy(tlv->value, value, length);
+
+	vp->num_tlvs = htonl(ntohl(vp->num_tlvs) + 1);
+	vp->length = htonl(ntohl(vp->length) + sizeof(*tlv) + length);
+
+	return 0;
+}
+
+size_t vic_provinfo_size(struct vic_provinfo *vp)
+{
+	return vp ?  ntohl(vp->length) + sizeof(*vp) - sizeof(vp->num_tlvs) : 0;
+}
diff --git a/drivers/net/enic/vnic_vic.h b/drivers/net/enic/vnic_vic.h
new file mode 100644
index 0000000..a7899fb
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.h
@@ -0,0 +1,58 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef _VNIC_VIC_H_
+#define _VNIC_VIC_H_
+
+/* Note: All integer fields in NETWORK byte order */
+
+/* Note: String field lengths include null char */
+
+#define VIC_PROVINFO_CISCO_OUI		{ 0x00, 0x00, 0x0c }
+#define VIC_PROVINFO_LINUX_TYPE		0x2
+
+enum vic_linux_prov_tlv_type {
+	VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR = 0,
+	VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR = 1,			/* u8[6] */
+	VIC_LINUX_PROV_TLV_CLIENT_NAME_STR = 2,
+	VIC_LINUX_PROV_TLV_HOST_UUID_STR = 8,
+};
+
+struct vic_provinfo {
+	u8 oui[3];		/* OUI of data provider */
+	u8 type;		/* provider-specific type */
+	u32 length;		/* length of data below */
+	u32 num_tlvs;		/* number of tlvs */
+	struct vic_provinfo_tlv {
+		u16 type;
+		u16 length;
+		u8 value[0];
+	} tlv[0];
+} __attribute__ ((packed));
+
+#define VIC_PROVINFO_MAX_DATA		1385
+#define VIC_PROVINFO_MAX_TLV_DATA (VIC_PROVINFO_MAX_DATA - \
+	sizeof(struct vic_provinfo))
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type);
+void vic_provinfo_free(struct vic_provinfo *vp);
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value);
+size_t vic_provinfo_size(struct vic_provinfo *vp);
+
+#endif	/* _VNIC_VIC_H_ */


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics
  2010-04-19 19:18 [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics Scott Feldman
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
  2010-04-19 19:18 ` [net-next PATCH 2/2] add enic iovnl ops for dynamic vnics Scott Feldman
@ 2010-04-19 21:35 ` Chris Wright
  2 siblings, 0 replies; 53+ messages in thread
From: Chris Wright @ 2010-04-19 21:35 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw

* Scott Feldman (scofeldm@cisco.com) wrote:
> Patch 1/2 adds new I/O Virtualization netlink ops:
> 
>   IOV netlink (IOVNL) adds I/O Virtualization control support to a master
>   device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
>   control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
>   design allows for the case where master and slave are the
>   same netdev interface.
> 
>   The ops currently defined are:
>   	set_mac_vlan: set mac+vlan on VF
> 	set_port_profile: set port-profile on VF
> 	unset_port_profile: unset port-profile on VF
> 
> Patch 2/2 adds IOV netlink ops support to enic dynamic vnics:
> 
>   Add enic iovnl ops to support setting port-profile for dynamic vnics.  Enic
>   dynamic vnics are just like normal enic eth vnics except dynamic vnics require
>   an extra configuration step to assign a port-profile identifier to the
>   interface before the interface is useable. Once assigned, link comes up
>   on the interface and is ready for I/O.  The port-profile is used to configure
>   the network port assigned to the interface.  The network port configuration
>   includes VLAN membership, QoS policies, and port security settings typical
>   of a data center network.

Note that while these just enable enic, the framework should be generic
enough for other drivers and can deliver all the way to userspace daemon
for the case of VSI Discovery Protocol.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
@ 2010-04-20 13:48   ` Arnd Bergmann
  2010-04-20 14:34     ` Chris Wright
                       ` (2 more replies)
  2010-04-22  6:48   ` [net-next PATCH 1/2] " David Miller
  2010-04-22  6:52   ` [net-next PATCH 1/2] add iovnl netlink support David Miller
  2 siblings, 3 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-20 13:48 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw

On Monday 19 April 2010, Scott Feldman wrote:

> IOV netlink (IOVNL) adds I/O Virtualization control support to a master
> device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
> control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
> design allows for the case where master and slave are the
> same netdev interface.

What is the reason for controlling the slave device through the master,
rather than talking to the slave directly? The kernel always knows
the master for each slave, so it seems to me that this information
is redundant.

Is this new interface only for the case that you have a switch integrated
in the NIC, or also for the case where you do an LLDP and EDP exchange
with an adjacent bridge and put the device into VEPA mode?

> One control setting example is MAC/VLAN settings for a VF.  Another
> example control setting is a port-profile for a VF.  A port-profile is an
> identifier that defines policy-based settings on the network port
> backing the VF.  The network port settings examples are VLAN membership,
> QoS settings, and L2 security settings, typical of a data center network.
> 
> This patch adds the iovnl interface definitions and an iovnl module.

How does this relate to the existing DCB netlink interface? My feeling
is that there is some overlap in how it would get used, and some parts
that are very distinct. In particular, I'd guess that you'd want to
be able to set DCB parameters for each VF, but not all DCB adapters
would support SR-IOV.

Did you consider making this code an extension to the DCB interface
instead of a separate one? What was the reason for your decision
to keep it separate?

Also, do you expect your interface to be supported by dcbd/lldpad,
or is there a good reason to create a new tool for iovnl?

> + * @IOV_ATTR_IFNAME: interface name of master (PF) net device (NLA_NUL_STRING)
> + * @IOV_ATTR_VF_IFNAME: interface name of target VF device (NLA_NUL_STRING)

As mentioned above, why not drop one of these, and just pass the VF's IFNAME?

> + * @IOV_ATTR_PORT_PROFILE: port-profile name to assign to device
> + *   (NLA_NUL_STRING)

How does the definition of the port profile get into the NIC's switch?
Is there any way to list the available port profiles?

> + * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
> + * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)

Can you elaborate more on what these do? Who is the 'client' and the 'host'
in this case, and why do you need to identify them?

> + * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])

Just one mac address? What happens if we want to assign multiple mac
addresses to the VF later? Also, how is this defined specifically?
Will a SIOCSIFHWADDR with a different MAC address on the VF fail
later, or is this just the default value?

> + * @IOV_ATTR_VLAN: device 8021q VLAN ID (NLA_U16)

Same here: Should you be able to set multiple MAC addresses, or
trunk mode? Can the VF override it?
Also, for the new multi-channel VEPA, I'd guess that you also need
to supply an 802.1ad S-VLAN ID.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
@ 2010-04-20 14:34     ` Chris Wright
  2010-04-20 14:57       ` Arnd Bergmann
  2010-04-20 19:56     ` Scott Feldman
  2010-04-21 21:22     ` Arnd Bergmann
  2 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-20 14:34 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Scott Feldman, davem, netdev, chrisw

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Monday 19 April 2010, Scott Feldman wrote:
> 
> > IOV netlink (IOVNL) adds I/O Virtualization control support to a master
> > device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
> > control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
> > design allows for the case where master and slave are the
> > same netdev interface.
> 
> What is the reason for controlling the slave device through the master,
> rather than talking to the slave directly? The kernel always knows
> the master for each slave, so it seems to me that this information
> is redundant.

Not all devices have this relationship explicit (i.e. not all are pure
sr-iov devices).  If there's always a way to discover the master from
the device, then I agree we only need the slave.

> Is this new interface only for the case that you have a switch integrated
> in the NIC, or also for the case where you do an LLDP and EDP exchange
> with an adjacent bridge and put the device into VEPA mode?

It should be useful for both.  That's part of the reason for using
netlink, a userspace daemon running the VDP state machine (like lldpad)
can listen for these messages and see a set_port_profile request when
the user starts up a VM.

> > One control setting example is MAC/VLAN settings for a VF.  Another
> > example control setting is a port-profile for a VF.  A port-profile is an
> > identifier that defines policy-based settings on the network port
> > backing the VF.  The network port settings examples are VLAN membership,
> > QoS settings, and L2 security settings, typical of a data center network.
> > 
> > This patch adds the iovnl interface definitions and an iovnl module.
> 
> How does this relate to the existing DCB netlink interface? My feeling
> is that there is some overlap in how it would get used, and some parts
> that are very distinct. In particular, I'd guess that you'd want to
> be able to set DCB parameters for each VF, but not all DCB adapters
> would support SR-IOV.
> 
> Did you consider making this code an extension to the DCB interface
> instead of a separate one? What was the reason for your decision
> to keep it separate?

Well, aside from the fact that DCB and VDP have some low level
similarities in the PDU and they are both communication between
the host and the switch, they are doing different things.

> Also, do you expect your interface to be supported by dcbd/lldpad,
> or is there a good reason to create a new tool for iovnl?

lldpad would listen, I don't see why iproute2 couldn't send, and libvirt
will send as well.

> > + * @IOV_ATTR_IFNAME: interface name of master (PF) net device (NLA_NUL_STRING)
> > + * @IOV_ATTR_VF_IFNAME: interface name of target VF device (NLA_NUL_STRING)
> 
> As mentioned above, why not drop one of these, and just pass the VF's IFNAME?
> 
> > + * @IOV_ATTR_PORT_PROFILE: port-profile name to assign to device
> > + *   (NLA_NUL_STRING)
> 
> How does the definition of the port profile get into the NIC's switch?
> Is there any way to list the available port profiles?

The port profile is a concept external to the NIC's switch.  It's a value
that exists in the external physical layer 2 switching infrastructure.
So an admin knows this value and is informing the adjacent switch that a
new virutal interface is coming up and needs some particular port profile.

> > + * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
> > + * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)
> 
> Can you elaborate more on what these do? Who is the 'client' and the 'host'
> in this case, and why do you need to identify them?
> 
> > + * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])
> 
> Just one mac address? What happens if we want to assign multiple mac
> addresses to the VF later? Also, how is this defined specifically?
> Will a SIOCSIFHWADDR with a different MAC address on the VF fail
> later, or is this just the default value?
> 
> > + * @IOV_ATTR_VLAN: device 8021q VLAN ID (NLA_U16)
> 
> Same here: Should you be able to set multiple MAC addresses, or
> trunk mode? Can the VF override it?
> Also, for the new multi-channel VEPA, I'd guess that you also need
> to supply an 802.1ad S-VLAN ID.

Something like set_port_profile() would initiate the negotiation for the
s-vlan id for a particular channel, not sure it's needed as part of the
netlink interface or not.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 14:34     ` Chris Wright
@ 2010-04-20 14:57       ` Arnd Bergmann
  2010-04-20 15:22         ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-20 14:57 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev

On Tuesday 20 April 2010, Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Monday 19 April 2010, Scott Feldman wrote:
> > 
> > What is the reason for controlling the slave device through the master,
> > rather than talking to the slave directly? The kernel always knows
> > the master for each slave, so it seems to me that this information
> > is redundant.
> 
> Not all devices have this relationship explicit (i.e. not all are pure
> sr-iov devices).  If there's always a way to discover the master from
> the device, then I agree we only need the slave.

Hmm, is there an actual example of a card where the relationship is not
known to the kernel?

> > Is this new interface only for the case that you have a switch integrated
> > in the NIC, or also for the case where you do an LLDP and EDP exchange
> > with an adjacent bridge and put the device into VEPA mode?
> 
> It should be useful for both.  That's part of the reason for using
> netlink, a userspace daemon running the VDP state machine (like lldpad)
> can listen for these messages and see a set_port_profile request when
> the user starts up a VM.

After thinking some more about this case, I now believe we should do
it the other way around, and have lldpad in control of this interface
from the user space side, and letting user programs (lldptool, libvirt,
...) talk to lldpad in order to set it up.
 
> > Also, do you expect your interface to be supported by dcbd/lldpad,
> > or is there a good reason to create a new tool for iovnl?
> 
> lldpad would listen, I don't see why iproute2 couldn't send, and libvirt
> will send as well.

Not sure. We need lldpad to do this exchange for the case of VEPA with
VDP, so always using lldpad would let us unify the user interface for
both cases. We can of course have iproute2 talk to lldpad, in the
same way that libvirt does.

> > > + * @IOV_ATTR_PORT_PROFILE: port-profile name to assign to device
> > > + *   (NLA_NUL_STRING)
> > 
> > How does the definition of the port profile get into the NIC's switch?
> > Is there any way to list the available port profiles?
> 
> The port profile is a concept external to the NIC's switch.  It's a value
> that exists in the external physical layer 2 switching infrastructure.
> So an admin knows this value and is informing the adjacent switch that a
> new virutal interface is coming up and needs some particular port profile.

But that's only the case if the NIC itself is in VEPA mode. If that
were the case, there would be no need for a kernel interface at all,
because then we could just drive the port profile selection from user
space.

The proposed interface only seems to make sense if you use it to
configure the NIC itself! Why should it care about the port profile
otherwise?

> > Same here: Should you be able to set multiple MAC addresses, or
> > trunk mode? Can the VF override it?
> > Also, for the new multi-channel VEPA, I'd guess that you also need
> > to supply an 802.1ad S-VLAN ID.
> 
> Something like set_port_profile() would initiate the negotiation for the
> s-vlan id for a particular channel, not sure it's needed as part of the
> netlink interface or not.

Well, you have to set up the s-vlan ID in order to have something to
set the port profile in.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 14:57       ` Arnd Bergmann
@ 2010-04-20 15:22         ` Chris Wright
  2010-04-20 16:19           ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-20 15:22 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Tuesday 20 April 2010, Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > On Monday 19 April 2010, Scott Feldman wrote:
> > > 
> > > What is the reason for controlling the slave device through the master,
> > > rather than talking to the slave directly? The kernel always knows
> > > the master for each slave, so it seems to me that this information
> > > is redundant.
> > 
> > Not all devices have this relationship explicit (i.e. not all are pure
> > sr-iov devices).  If there's always a way to discover the master from
> > the device, then I agree we only need the slave.
> 
> Hmm, is there an actual example of a card where the relationship is not
> known to the kernel?
> 
> > > Is this new interface only for the case that you have a switch integrated
> > > in the NIC, or also for the case where you do an LLDP and EDP exchange
> > > with an adjacent bridge and put the device into VEPA mode?
> > 
> > It should be useful for both.  That's part of the reason for using
> > netlink, a userspace daemon running the VDP state machine (like lldpad)
> > can listen for these messages and see a set_port_profile request when
> > the user starts up a VM.
> 
> After thinking some more about this case, I now believe we should do
> it the other way around, and have lldpad in control of this interface
> from the user space side, and letting user programs (lldptool, libvirt,
> ...) talk to lldpad in order to set it up.

lldpad won't be involved in all cases, yet a mgmt tool like libvirt will.
so this seems backwards.

> > > Also, do you expect your interface to be supported by dcbd/lldpad,
> > > or is there a good reason to create a new tool for iovnl?
> > 
> > lldpad would listen, I don't see why iproute2 couldn't send, and libvirt
> > will send as well.
> 
> Not sure. We need lldpad to do this exchange for the case of VEPA with
> VDP, so always using lldpad would let us unify the user interface for
> both cases. We can of course have iproute2 talk to lldpad, in the
> same way that libvirt does.
> 
> > > > + * @IOV_ATTR_PORT_PROFILE: port-profile name to assign to device
> > > > + *   (NLA_NUL_STRING)
> > > 
> > > How does the definition of the port profile get into the NIC's switch?
> > > Is there any way to list the available port profiles?
> > 
> > The port profile is a concept external to the NIC's switch.  It's a value
> > that exists in the external physical layer 2 switching infrastructure.
> > So an admin knows this value and is informing the adjacent switch that a
> > new virutal interface is coming up and needs some particular port profile.
> 
> But that's only the case if the NIC itself is in VEPA mode. If that
> were the case, there would be no need for a kernel interface at all,
> because then we could just drive the port profile selection from user
> space.
> 
> The proposed interface only seems to make sense if you use it to
> configure the NIC itself! Why should it care about the port profile
> otherwise?

In the case of devices that can do adjacent switch negotiations directly.

> > > Same here: Should you be able to set multiple MAC addresses, or
> > > trunk mode? Can the VF override it?
> > > Also, for the new multi-channel VEPA, I'd guess that you also need
> > > to supply an 802.1ad S-VLAN ID.
> > 
> > Something like set_port_profile() would initiate the negotiation for the
> > s-vlan id for a particular channel, not sure it's needed as part of the
> > netlink interface or not.
> 
> Well, you have to set up the s-vlan ID in order to have something to
> set the port profile in.

Right, depends if the use the port profile to establish the channel and
negotiate the s-vlan ID.  I don't recall the order there.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 15:22         ` Chris Wright
@ 2010-04-20 16:19           ` Arnd Bergmann
  2010-04-20 20:26             ` Scott Feldman
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-20 16:19 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev

On Tuesday 20 April 2010, Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Tuesday 20 April 2010, Chris Wright wrote:
> >
> > After thinking some more about this case, I now believe we should do
> > it the other way around, and have lldpad in control of this interface
> > from the user space side, and letting user programs (lldptool, libvirt,
> > ...) talk to lldpad in order to set it up.
> 
> lldpad won't be involved in all cases, yet a mgmt tool like libvirt will.
> so this seems backwards.

Well, that part is still the matter of this discussion, as far as I can tell ;-)

> > But that's only the case if the NIC itself is in VEPA mode. If that
> > were the case, there would be no need for a kernel interface at all,
> > because then we could just drive the port profile selection from user
> > space.
> > 
> > The proposed interface only seems to make sense if you use it to
> > configure the NIC itself! Why should it care about the port profile
> > otherwise?
> 
> In the case of devices that can do adjacent switch negotiations directly.

I thought the idea to deal with those devices was to beat sense into
the respective developers until they do the negotiation in software 8-)

> > > > Same here: Should you be able to set multiple MAC addresses, or
> > > > trunk mode? Can the VF override it?
> > > > Also, for the new multi-channel VEPA, I'd guess that you also need
> > > > to supply an 802.1ad S-VLAN ID.
> > > 
> > > Something like set_port_profile() would initiate the negotiation for the
> > > s-vlan id for a particular channel, not sure it's needed as part of the
> > > netlink interface or not.
> > 
> > Well, you have to set up the s-vlan ID in order to have something to
> > set the port profile in.
> 
> Right, depends if the use the port profile to establish the channel and
> negotiate the s-vlan ID.  I don't recall the order there.

I'm pretty sure that setting up the channel (for 802.1bg) is done before
any port profile comes in.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
  2010-04-20 14:34     ` Chris Wright
@ 2010-04-20 19:56     ` Scott Feldman
  2010-04-21 11:26       ` Arnd Bergmann
  2010-04-21 21:22     ` Arnd Bergmann
  2 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-20 19:56 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: davem, netdev, chrisw

On 4/20/10 6:48 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Monday 19 April 2010, Scott Feldman wrote:
> 
>> IOV netlink (IOVNL) adds I/O Virtualization control support to a master
>> device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
>> control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
>> design allows for the case where master and slave are the
>> same netdev interface.
> 
> What is the reason for controlling the slave device through the master,
> rather than talking to the slave directly? The kernel always knows
> the master for each slave, so it seems to me that this information
> is redundant.

The interface would allow talking to the slave directly. In fact, that's the
example with enic port-profile in patch 2/2.  But, it would be nice not to
rule out the case where the master proxies slave control and the master is
under exclusively controlled by hypervisor.
 
> Is this new interface only for the case that you have a switch integrated
> in the NIC, or also for the case where you do an LLDP and EDP exchange
> with an adjacent bridge and put the device into VEPA mode?

All of the above.  Basing this on netlink give us flexibility to work with
user-space mgmt tools or directly with kernel netdev as in the enic case.
Not trying to make assumptions about where (user-space, kernel) and by which
entity sources or sinks the netlink msg.
 
>> One control setting example is MAC/VLAN settings for a VF.  Another
>> example control setting is a port-profile for a VF.  A port-profile is an
>> identifier that defines policy-based settings on the network port
>> backing the VF.  The network port settings examples are VLAN membership,
>> QoS settings, and L2 security settings, typical of a data center network.
>> 
>> This patch adds the iovnl interface definitions and an iovnl module.
> 
> How does this relate to the existing DCB netlink interface? My feeling
> is that there is some overlap in how it would get used, and some parts
> that are very distinct. In particular, I'd guess that you'd want to
> be able to set DCB parameters for each VF, but not all DCB adapters
> would support SR-IOV.
>
> Did you consider making this code an extension to the DCB interface
> instead of a separate one? What was the reason for your decision
> to keep it separate?

Considered it but DCB interface is well defined for DCB and it didn't seem
right gluing on interfaces not specified within DCB.  I agree that there is
some overlap in the sense that both interface are used to configure a netdev
with some properties interesting for the data center, but the DCB interface
is for local setting of the properties on the host whereas iovnl is about
pushing the setting of those properties to the network for policy-based
control.
 
> Also, do you expect your interface to be supported by dcbd/lldpad,
> or is there a good reason to create a new tool for iovnl?

Lldpad supporting this interface would seem right, for those cases where
lldpad is responsible for configuring the netdev.
 
>> + * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
>> + * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)
> 
> Can you elaborate more on what these do? Who is the 'client' and the 'host'
> in this case, and why do you need to identify them?

Those are optional and useful, for example, by the network mgmt tool for
presenting a view such as:

    - blade 1/2                     // know by host uuid
        - vm-rhel5-eth0             // client name
            - port-profile: xyz

Something like that.
 
>> + * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])
> 
> Just one mac address? What happens if we want to assign multiple mac
> addresses to the VF later? Also, how is this defined specifically?
> Will a SIOCSIFHWADDR with a different MAC address on the VF fail
> later, or is this just the default value?

Depends on how the VF wants to handle this.  For our use-case with enic we
only need the port-profile op so I'm not sure what the best design is for
mac+vlan on a VF.  Looking for advise from folks like yourself.  If it's not
needed, let's scratch it.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 16:19           ` Arnd Bergmann
@ 2010-04-20 20:26             ` Scott Feldman
  2010-04-21 13:17               ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-20 20:26 UTC (permalink / raw)
  To: Arnd Bergmann, Chris Wright; +Cc: davem, netdev

On 4/20/10 9:19 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:

>>> But that's only the case if the NIC itself is in VEPA mode. If that
>>> were the case, there would be no need for a kernel interface at all,
>>> because then we could just drive the port profile selection from user
>>> space.
>>> 
>>> The proposed interface only seems to make sense if you use it to
>>> configure the NIC itself! Why should it care about the port profile
>>> otherwise?
>> 
>> In the case of devices that can do adjacent switch negotiations directly.
> 
> I thought the idea to deal with those devices was to beat sense into
> the respective developers until they do the negotiation in software 8-)

When the device can do the negotiation directly with the switch, why does it
make sense to bypass that and use software on the host?  I don't think we'd
want to give up on link speed/duplex auto-negotiation and punt those setting
back to the user/host like in the old days.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 19:56     ` Scott Feldman
@ 2010-04-21 11:26       ` Arnd Bergmann
  2010-04-21 11:42         ` Selective MD5 Checksum Failuers Bijay Singh
  2010-04-21 16:18         ` [net-next,1/2] add iovnl netlink support Chris Wright
  0 siblings, 2 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 11:26 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw

On Tuesday 20 April 2010, Scott Feldman wrote:
> On 4/20/10 6:48 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> > On Monday 19 April 2010, Scott Feldman wrote:
> > 
> >> IOV netlink (IOVNL) adds I/O Virtualization control support to a master
> >> device (MD) netdev interface.  The MD (e.g. SR-IOV PF) will set/get
> >> control settings on behalf of a slave netdevice (e.g. SR-IOV VF).  The
> >> design allows for the case where master and slave are the
> >> same netdev interface.
> > 
> > What is the reason for controlling the slave device through the master,
> > rather than talking to the slave directly? The kernel always knows
> > the master for each slave, so it seems to me that this information
> > is redundant.
> 
> The interface would allow talking to the slave directly. In fact, that's the
> example with enic port-profile in patch 2/2.  But, it would be nice not to
> rule out the case where the master proxies slave control and the master is
> under exclusively controlled by hypervisor.

Not sure I understand. Do you mean the case where this code runs in
the hypervisor (e.g. KVM), or a different scerario with the setup being
done in a guest driver?

So far, I have assumed that we would always do the setup on the host
side, which always has access to both the master, and a slave proxy.
In particular, your interface requires access to the slave AFAICT,
because otherwise the VF IFNAME does not have any significance.

Take the case where you use network namespaces and put the VF into
a separate namespace. With your interface, the PF is still in the
root namespace, but passing both interface names in this interface
won't help you because they are never visible in the same namespace
(e.g. both might be named eth0 in their respective containers).

> > Is this new interface only for the case that you have a switch integrated
> > in the NIC, or also for the case where you do an LLDP and EDP exchange
> > with an adjacent bridge and put the device into VEPA mode?
> 
> All of the above.  Basing this on netlink give us flexibility to work with
> user-space mgmt tools or directly with kernel netdev as in the enic case.
> Not trying to make assumptions about where (user-space, kernel) and by which
> entity sources or sinks the netlink msg.

ok.

> > Did you consider making this code an extension to the DCB interface
> > instead of a separate one? What was the reason for your decision
> > to keep it separate?
> 
> Considered it but DCB interface is well defined for DCB and it didn't seem
> right gluing on interfaces not specified within DCB.  I agree that there is
> some overlap in the sense that both interface are used to configure a netdev
> with some properties interesting for the data center, but the DCB interface
> is for local setting of the properties on the host whereas iovnl is about
> pushing the setting of those properties to the network for policy-based
> control.
>
> > Also, do you expect your interface to be supported by dcbd/lldpad,
> > or is there a good reason to create a new tool for iovnl?
> 
> Lldpad supporting this interface would seem right, for those cases where
> lldpad is responsible for configuring the netdev.

I believe we meant different things here, because I misunderstood the
intention of the code. My question was whether lldpad would send the
netlink messages to iovnl, but from what you and Chris write, the
real idea was that both lldpad and kernel/iovnl can receive the
same messages, right?
 
> >> + * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
> >> + * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)
> > 
> > Can you elaborate more on what these do? Who is the 'client' and the 'host'
> > in this case, and why do you need to identify them?
> 
> Those are optional and useful, for example, by the network mgmt tool for
> presenting a view such as:
> 
>     - blade 1/2                     // know by host uuid
>         - vm-rhel5-eth0             // client name
>             - port-profile: xyz
> 
> Something like that.

Hmm, but how do they get from the device driver to the the network
management tool then? Also, these are similar to the attributes
that are passed in the 802.1Qbg VDP protocol, but not compatible.
If the idea is use the same netlink protocol for both your internal
representation and for the standard based protocol, I think we should
make them compatible.

Instead of a string identifying the port profile, this needs to pass
a four byte field for a VSI type (3 bytes) and VSI manager ID (1 byte).

There is also a UUID in VDP, but it identifies the guest, not the host,
so this is really confusing.

VDP also needs a list of MAC addresses and VLAN IDs (normally only
one of each), but that would be separate from what you tell the adapter,
see below: 

> >> + * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])
> > 
> > Just one mac address? What happens if we want to assign multiple mac
> > addresses to the VF later? Also, how is this defined specifically?
> > Will a SIOCSIFHWADDR with a different MAC address on the VF fail
> > later, or is this just the default value?
> 
> Depends on how the VF wants to handle this.  For our use-case with enic we
> only need the port-profile op so I'm not sure what the best design is for
> mac+vlan on a VF.  Looking for advise from folks like yourself.  If it's not
> needed, let's scratch it.

In order to make VEPA work, it's absolutely required to impose a hard limit
on what MAC+VLAN IDs are visible to the VF, because the switch identifies
the guest by those and forwards any frames to/from that address according
to the VSI type.

However, I feel that we should strictly separate the steps of configuring
the adapter from talking to the switch. When we do the VDP association
in user land, we still need to set up the VLAN and MAC configuration for
the VF through a kernel interface. If we ignore the port profile stuff
for a moment, your netlink interface looks like a good fit for that.

Since it seems what you really want to do is to do the exchange with the
switch from here, maybe the hardware configuration part should be moved
the DCB interface?

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Selective MD5 Checksum Failuers
  2010-04-21 11:26       ` Arnd Bergmann
@ 2010-04-21 11:42         ` Bijay Singh
  2010-04-21 16:18         ` [net-next,1/2] add iovnl netlink support Chris Wright
  1 sibling, 0 replies; 53+ messages in thread
From: Bijay Singh @ 2010-04-21 11:42 UTC (permalink / raw)
  To: netdev; +Cc: chrisw, davem

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]


I am noticing a strange and a troublesome behavior with tcp md5 checksums. Some selective packets are going out with invalid md5 checksums.

The only thing that is changing is the ack number (between the packets with valid and invalid md5 checksums), so while most packets have correct md5 checksums few 1 in 1000s have md5 checksums errors.

I am on 2.6.26 and I know that there have been significant changes since this version in this area. I have gone thru them but none of issues they address seem like the cause for this problem.

I have the scatter/gather and tcp segmentation disabled in the card.

The packet captures are attached.

Bijay

[-- Attachment #2: md5sum-2010-04-16.pcap1-754478487 --]
[-- Type: application/octet-stream, Size: 126 bytes --]

[-- Attachment #3: md5sum-2010-04-16.pcap1-754479689 --]
[-- Type: application/octet-stream, Size: 126 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 20:26             ` Scott Feldman
@ 2010-04-21 13:17               ` Arnd Bergmann
  2010-04-21 16:28                 ` Scott Feldman
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 13:17 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Chris Wright, davem, netdev

On Tuesday 20 April 2010, Scott Feldman wrote:
> On 4/20/10 9:19 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>
> >> In the case of devices that can do adjacent switch negotiations directly.
> > 
> > I thought the idea to deal with those devices was to beat sense into
> > the respective developers until they do the negotiation in software 8-)
> 
> When the device can do the negotiation directly with the switch, why does it
> make sense to bypass that and use software on the host?  I don't think we'd
> want to give up on link speed/duplex auto-negotiation and punt those setting
> back to the user/host like in the old days.

For the link negotiation, the card is the right place because it's necessary
to get the link working before the OS can talk to the switch.
For VDP, that's different because the hypervisor needs to talk to the switch
before the guest can communicate, so there is no interdependency.

More importantly, the card cannot possibly do the protocol by itself,
because the information that gets exchanged is specific to the hypervisor and
the guest, not to the hardware. What you have implemented is another protocol
between the hypervisor and the NIC that exchanges the exact same data that
then gets sent to the switch. We already need to have an implementation that
sends this data to the switch from user space for all cards that don't do
it in firmware, so doing an alternative path in the adapter really creates
more work for the users, and means that when we fix bugs or add features
to the common code, you don't get them ;-).

	Arnd


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 11:26       ` Arnd Bergmann
  2010-04-21 11:42         ` Selective MD5 Checksum Failuers Bijay Singh
@ 2010-04-21 16:18         ` Chris Wright
  2010-04-21 17:52           ` Arnd Bergmann
  1 sibling, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-21 16:18 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Scott Feldman, davem, netdev, chrisw

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Tuesday 20 April 2010, Scott Feldman wrote:
> > On 4/20/10 6:48 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > > Also, do you expect your interface to be supported by dcbd/lldpad,
> > > or is there a good reason to create a new tool for iovnl?
> > 
> > Lldpad supporting this interface would seem right, for those cases where
> > lldpad is responsible for configuring the netdev.
> 
> I believe we meant different things here, because I misunderstood the
> intention of the code. My question was whether lldpad would send the
> netlink messages to iovnl, but from what you and Chris write, the
> real idea was that both lldpad and kernel/iovnl can receive the
> same messages, right?

Correct.  An example set of steps for initiating host to switch
negotiation and subsequently launching a VM would be (expect user below
to be a mgmt tool like libvirt):

1) user sends netlink message w/ relevant host interface and port profile id
2) recipient picks this up (enic, lldpad, whatever)
3) recipient does negotiation w/ adjacent switch
4) user creates macvtap associated w/ relevant host interface
5) user launches guest

> > >> + * @IOV_ATTR_CLIENT_NAME: client name (NLA_NUL_STRING)
> > >> + * @IOV_ATTR_HOST_UUID: host UUID (NLA_NUL_STRING)
> > > 
> > > Can you elaborate more on what these do? Who is the 'client' and the 'host'
> > > in this case, and why do you need to identify them?
> > 
> > Those are optional and useful, for example, by the network mgmt tool for
> > presenting a view such as:
> > 
> >     - blade 1/2                     // know by host uuid
> >         - vm-rhel5-eth0             // client name
> >             - port-profile: xyz
> > 
> > Something like that.
> 
> Hmm, but how do they get from the device driver to the the network
> management tool then? Also, these are similar to the attributes
> that are passed in the 802.1Qbg VDP protocol, but not compatible.
> If the idea is use the same netlink protocol for both your internal
> representation and for the standard based protocol, I think we should
> make them compatible.

Indeed, that's my expectation.

> Instead of a string identifying the port profile, this needs to pass
> a four byte field for a VSI type (3 bytes) and VSI manager ID (1 byte).

I think we just need a u8 array, 4 bytes for VDP, some maxlen that is
at least as large as enic expects.

> There is also a UUID in VDP, but it identifies the guest, not the host,
> so this is really confusing.

Yes, I had same confusion.  I expected guest, enic wants to send host as
well.

> VDP also needs a list of MAC addresses and VLAN IDs (normally only
> one of each), but that would be separate from what you tell the adapter,
> see below: 
> 
> > >> + * @IOV_ATTR_MAC_ADDR: device station MAC address (NLA_U8[6])
> > > 
> > > Just one mac address? What happens if we want to assign multiple mac
> > > addresses to the VF later? Also, how is this defined specifically?
> > > Will a SIOCSIFHWADDR with a different MAC address on the VF fail
> > > later, or is this just the default value?
> > 
> > Depends on how the VF wants to handle this.  For our use-case with enic we
> > only need the port-profile op so I'm not sure what the best design is for
> > mac+vlan on a VF.  Looking for advise from folks like yourself.  If it's not
> > needed, let's scratch it.
> 
> In order to make VEPA work, it's absolutely required to impose a hard limit
> on what MAC+VLAN IDs are visible to the VF, because the switch identifies
> the guest by those and forwards any frames to/from that address according
> to the VSI type.
> 
> However, I feel that we should strictly separate the steps of configuring
> the adapter from talking to the switch. When we do the VDP association
> in user land, we still need to set up the VLAN and MAC configuration for
> the VF through a kernel interface. If we ignore the port profile stuff
> for a moment, your netlink interface looks like a good fit for that.
> 
> Since it seems what you really want to do is to do the exchange with the
> switch from here, maybe the hardware configuration part should be moved
> the DCB interface?

I suppose this would work  (although it's a bit odd being out of scope
of DCB spec).  I don't expect mgmt app to care about the implementation
specifics of an adapter, so it will always send this and iovnl message
too.  All as part of same setup.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 13:17               ` Arnd Bergmann
@ 2010-04-21 16:28                 ` Scott Feldman
  2010-04-21 18:04                   ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-21 16:28 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, davem, netdev

On 4/21/10 6:17 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Tuesday 20 April 2010, Scott Feldman wrote:
>> On 4/20/10 9:19 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> 
>>>> In the case of devices that can do adjacent switch negotiations directly.
>>> 
>>> I thought the idea to deal with those devices was to beat sense into
>>> the respective developers until they do the negotiation in software 8-)
>> 
>> When the device can do the negotiation directly with the switch, why does it
>> make sense to bypass that and use software on the host?  I don't think we'd
>> want to give up on link speed/duplex auto-negotiation and punt those setting
>> back to the user/host like in the old days.
> 
> For the link negotiation, the card is the right place because it's necessary
> to get the link working before the OS can talk to the switch.
> For VDP, that's different because the hypervisor needs to talk to the switch
> before the guest can communicate, so there is no interdependency.
> 
> More importantly, the card cannot possibly do the protocol by itself,
> because the information that gets exchanged is specific to the hypervisor and
> the guest, not to the hardware. What you have implemented is another protocol
> between the hypervisor and the NIC that exchanges the exact same data that
> then gets sent to the switch. We already need to have an implementation that
> sends this data to the switch from user space for all cards that don't do
> it in firmware, so doing an alternative path in the adapter really creates
> more work for the users, and means that when we fix bugs or add features
> to the common code, you don't get them ;-).

But the point of iovnl was to provide a single mechanism for both types of
adapters (w/ or w/o firmware assist) to exchange this data with the switch,
therefore making the difference in the adapters transparent to the user.  So
I'm missing your point about more work for the users.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 16:18         ` [net-next,1/2] add iovnl netlink support Chris Wright
@ 2010-04-21 17:52           ` Arnd Bergmann
  2010-04-21 18:10             ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 17:52 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev

On Wednesday 21 April 2010, Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Tuesday 20 April 2010, Scott Feldman wrote:
> > I believe we meant different things here, because I misunderstood the
> > intention of the code. My question was whether lldpad would send the
> > netlink messages to iovnl, but from what you and Chris write, the
> > real idea was that both lldpad and kernel/iovnl can receive the
> > same messages, right?
> 
> Correct.  An example set of steps for initiating host to switch
> negotiation and subsequently launching a VM would be (expect user below
> to be a mgmt tool like libvirt):
> 
> 1) user sends netlink message w/ relevant host interface and port profile id
> 2) recipient picks this up (enic, lldpad, whatever)
> 3) recipient does negotiation w/ adjacent switch
> 4) user creates macvtap associated w/ relevant host interface
> 5) user launches guest

I'd move point 4 before 1, but otherwise it makes sense and it would still
work either way.

> > If the idea is use the same netlink protocol for both your internal
> > representation and for the standard based protocol, I think we should
> > make them compatible.
> 
> Indeed, that's my expectation.
>
> [...]
>
> > Instead of a string identifying the port profile, this needs to pass
> > a four byte field for a VSI type (3 bytes) and VSI manager ID (1 byte).
> 
> I think we just need a u8 array, 4 bytes for VDP, some maxlen that is
> at least as large as enic expects.
> 
> > There is also a UUID in VDP, but it identifies the guest, not the host,
> > so this is really confusing.
> 
> Yes, I had same confusion.  I expected guest, enic wants to send host as
> well.

So given all these differences, how compatible can we make them?

With the current definition, most of fields are at least slightly
different. The differences seem to stem mostly from the fact that
Cisco switches use a nonstandard protocol, rather than the difference
between the firmware and userland implementations of the protocol,
and of course we shouldn't confuse the two.

> > In order to make VEPA work, it's absolutely required to impose a hard limit
> > on what MAC+VLAN IDs are visible to the VF, because the switch identifies
> > the guest by those and forwards any frames to/from that address according
> > to the VSI type.
> > 
> > However, I feel that we should strictly separate the steps of configuring
> > the adapter from talking to the switch. When we do the VDP association
> > in user land, we still need to set up the VLAN and MAC configuration for
> > the VF through a kernel interface. If we ignore the port profile stuff
> > for a moment, your netlink interface looks like a good fit for that.
> > 
> > Since it seems what you really want to do is to do the exchange with the
> > switch from here, maybe the hardware configuration part should be moved
> > the DCB interface?
> 
> I suppose this would work  (although it's a bit odd being out of scope
> of DCB spec).

It could be anywhere, it doesn't have to be the DCB interface, but could
be anything ranging from ethtool to iplink I guess. And we should define
it in a way that works for any SR-IOV card, whether it's using Cisco's
protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
none of the above and just provides an internal switch like all the
existing NICs.

> I don't expect mgmt app to care about the implementation
> specifics of an adapter, so it will always send this and iovnl message
> too.  All as part of same setup.

Why? I really see these things as separate. Obviously a management
tool like libvirt would need to do both these things eventually, but
each of them has multiple options that can be combined in various
ways:

1. Setting up the slave device
 a) create an SR-IOV VF to assign to a guest
 b) create a macvtap device to pass to qemu or vhost
 c) attach a tap device to a bridge
 d) create a macvlan device and put it into a container
 e) create a virtual interface for a VMDq adapter

2) Registering the slave with the switch
 a) use Cisco protocol in enic firmware (see patch 2/2)
 b) use standard VDP in lldpad
 c) use reverse-engineered cisco protocol in some user tool for
    non-enic adapters.
 d) use standard VDP in firmware (hopefully this never happens)
 e) do nothing at all (as we do today)

Some of the cases can be treated identically, e.g. 1d) and 1e), or
2a) and 2c), but in general the management app needs to have some
idea of which combination it's going to set up.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 16:28                 ` Scott Feldman
@ 2010-04-21 18:04                   ` Arnd Bergmann
  0 siblings, 0 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 18:04 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Chris Wright, davem, netdev

On Wednesday 21 April 2010, Scott Feldman wrote:
> On 4/21/10 6:17 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > More importantly, the card cannot possibly do the protocol by itself,
> > because the information that gets exchanged is specific to the hypervisor and
> > the guest, not to the hardware. What you have implemented is another protocol
> > between the hypervisor and the NIC that exchanges the exact same data that
> > then gets sent to the switch. We already need to have an implementation that
> > sends this data to the switch from user space for all cards that don't do
> > it in firmware, so doing an alternative path in the adapter really creates
> > more work for the users, and means that when we fix bugs or add features
> > to the common code, you don't get them ;-).
> 
> But the point of iovnl was to provide a single mechanism for both types of
> adapters (w/ or w/o firmware assist) to exchange this data with the switch,
> therefore making the difference in the adapters transparent to the user.  So
> I'm missing your point about more work for the users.

It creates an extra step: Normally we'd simply implement the network protocol
in user space, e.g. in lldpad and have other code use the lldptool command
line interface to start the negotiation.

Now we have a user protocol based on netlink that is about as complex as the
wire protocol itself, at least if you want to implement both the standard
VDP and the Cisco variant, and do all the interesting parts like guest migration
and synchronously waiting for the negotiation to complete.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 17:52           ` Arnd Bergmann
@ 2010-04-21 18:10             ` Chris Wright
  2010-04-21 19:39               ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-21 18:10 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > Since it seems what you really want to do is to do the exchange with the
> > > switch from here, maybe the hardware configuration part should be moved
> > > the DCB interface?
> > 
> > I suppose this would work  (although it's a bit odd being out of scope
> > of DCB spec).
> 
> It could be anywhere, it doesn't have to be the DCB interface, but could
> be anything ranging from ethtool to iplink I guess. And we should define
> it in a way that works for any SR-IOV card, whether it's using Cisco's
> protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> none of the above and just provides an internal switch like all the
> existing NICs.

Heh, that's exactly what iovnl does ;-)

> > I don't expect mgmt app to care about the implementation
> > specifics of an adapter, so it will always send this and iovnl message
> > too.  All as part of same setup.
> 
> Why? I really see these things as separate. Obviously a management
> tool like libvirt would need to do both these things eventually, but
> each of them has multiple options that can be combined in various
> ways:
> 
> 1. Setting up the slave device
>  a) create an SR-IOV VF to assign to a guest
>  b) create a macvtap device to pass to qemu or vhost
>  c) attach a tap device to a bridge
>  d) create a macvlan device and put it into a container
>  e) create a virtual interface for a VMDq adapter

OK, but iovnl isn't doing this.

> 2) Registering the slave with the switch
>  a) use Cisco protocol in enic firmware (see patch 2/2)
>  b) use standard VDP in lldpad
>  c) use reverse-engineered cisco protocol in some user tool for
>     non-enic adapters.
>  d) use standard VDP in firmware (hopefully this never happens)
>  e) do nothing at all (as we do today)

And this is the step that is the main purpose of iovnl.

Here's the simplest snippet of libvirt to show this.  It sends
set_port_profile netlink messages and then creates macvtap.  As simple
as it gets...

--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -1470,6 +1470,11 @@ qemudPhysIfaceConnect(virConnectPtr conn,
         net->model && STREQ(net->model, "virtio"))
         vnet_hdr = 1;
 
+    setPortProfileId(net->data.direct.linkdev,
+                      net->data.direct.mode,
+                      net->data.direct.profileid,
+                      net->mac);
+
     rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
                         &res_ifname, vnet_hdr);

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 18:10             ` Chris Wright
@ 2010-04-21 19:39               ` Arnd Bergmann
  2010-04-21 20:25                 ` Scott Feldman
  2010-04-21 22:18                 ` Chris Wright
  0 siblings, 2 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 19:39 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev

On Wednesday 21 April 2010, Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Wednesday 21 April 2010, Chris Wright wrote:
> > > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > > Since it seems what you really want to do is to do the exchange with the
> > > > switch from here, maybe the hardware configuration part should be moved
> > > > the DCB interface?
> > > 
> > > I suppose this would work  (although it's a bit odd being out of scope
> > > of DCB spec).
> > 
> > It could be anywhere, it doesn't have to be the DCB interface, but could
> > be anything ranging from ethtool to iplink I guess. And we should define
> > it in a way that works for any SR-IOV card, whether it's using Cisco's
> > protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> > none of the above and just provides an internal switch like all the
> > existing NICs.
> 
> Heh, that's exactly what iovnl does ;-)

No, according to what you write below, it's exactly what iovnl does *not* do,
i.e. part 1 in my list.

> > 1. Setting up the slave device
> >  a) create an SR-IOV VF to assign to a guest
> >  b) create a macvtap device to pass to qemu or vhost
> >  c) attach a tap device to a bridge
> >  d) create a macvlan device and put it into a container
> >  e) create a virtual interface for a VMDq adapter
> 
> OK, but iovnl isn't doing this.

The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
as I can tell. Interestingly, this is not actually implemented in
the enic driver in patch 2/2. So if we all agree that this is out of the
scope of iovnl, let's just remove it from the interface and find another
way for it (ethtool, iplink, ..., as listed above).

Note that we still need to pass the MAC address and VLAN ID (or a list
of these) to the external switch, my point is just that this should be
separate from enforcing it in the hypervisor.

> > 2) Registering the slave with the switch
> >  a) use Cisco protocol in enic firmware (see patch 2/2)
> >  b) use standard VDP in lldpad
> >  c) use reverse-engineered cisco protocol in some user tool for
> >     non-enic adapters.
> >  d) use standard VDP in firmware (hopefully this never happens)
> >  e) do nothing at all (as we do today)
> 
> And this is the step that is the main purpose of iovnl.
> 
> Here's the simplest snippet of libvirt to show this.  It sends
> set_port_profile netlink messages and then creates macvtap.  As simple
> as it gets...
> 
> --- a/src/qemu/qemu_conf.c
> +++ b/src/qemu/qemu_conf.c
> @@ -1470,6 +1470,11 @@ qemudPhysIfaceConnect(virConnectPtr conn,
>          net->model && STREQ(net->model, "virtio"))
>          vnet_hdr = 1;
>  
> +    setPortProfileId(net->data.direct.linkdev,
> +                      net->data.direct.mode,
> +                      net->data.direct.profileid,
> +                      net->mac);
> +
>      rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
>                          &res_ifname, vnet_hdr);

Ok. In case of VDP, I guess this needs to be extended with the vlan ID
that has been configured, and possibly with a UUID, because we need to
pass the same one on the target machine if we migrate it.

Alternatively, the setPortProfileId could figure out the MAC address and
VLAN ID from the device itself, but then we don't need to pass either of
them.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 19:39               ` Arnd Bergmann
@ 2010-04-21 20:25                 ` Scott Feldman
  2010-04-21 21:13                   ` Arnd Bergmann
  2010-04-21 22:18                 ` Chris Wright
  1 sibling, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-21 20:25 UTC (permalink / raw)
  To: Arnd Bergmann, Chris Wright; +Cc: davem, netdev

On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

>>> 1. Setting up the slave device
>>>  a) create an SR-IOV VF to assign to a guest
>>>  b) create a macvtap device to pass to qemu or vhost
>>>  c) attach a tap device to a bridge
>>>  d) create a macvlan device and put it into a container
>>>  e) create a virtual interface for a VMDq adapter
>> 
>> OK, but iovnl isn't doing this.
> 
> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> as I can tell. Interestingly, this is not actually implemented in
> the enic driver in patch 2/2. So if we all agree that this is out of the
> scope of iovnl, let's just remove it from the interface and find another
> way for it (ethtool, iplink, ..., as listed above).

You're right, not needed for enic since mac addr is included with
port-profile push and vlan membership is implied by port-profile.  So I put
set_mac_vlan in there basically to elicit feedback.

There really wouldn't be much different between iplink and iovnl since
they're both rtnetlink...seems we should keep IOV-related APIs in one place.
Maybe there are other IOV APIs to add to iovnl in the future like:

    vf <- add_vf(pf)
    del_vf(pf, vf)

Ethtool doesn't seem the right place for this.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 20:25                 ` Scott Feldman
@ 2010-04-21 21:13                   ` Arnd Bergmann
  2010-04-21 22:48                     ` Chris Wright
                                       ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 21:13 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Chris Wright, davem, netdev

On Wednesday 21 April 2010, Scott Feldman wrote:
> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> >>> 1. Setting up the slave device
> >>>  a) create an SR-IOV VF to assign to a guest
> >>>  b) create a macvtap device to pass to qemu or vhost
> >>>  c) attach a tap device to a bridge
> >>>  d) create a macvlan device and put it into a container
> >>>  e) create a virtual interface for a VMDq adapter
> >> 
> >> OK, but iovnl isn't doing this.
> > 
> > The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> > as I can tell. Interestingly, this is not actually implemented in
> > the enic driver in patch 2/2. So if we all agree that this is out of the
> > scope of iovnl, let's just remove it from the interface and find another
> > way for it (ethtool, iplink, ..., as listed above).
> 
> You're right, not needed for enic since mac addr is included with
> port-profile push and vlan membership is implied by port-profile.  So I put
> set_mac_vlan in there basically to elicit feedback.

Ok. Two points though:

- when you say that the mac address is included in the port-profile push,
  does that imply that the VF does not have a mac address prior to this?
  This would again mix the NIC configuration phase with the switch
  association, which I think we really need to avoid if we want to be
  able to implement the association in user space!

- The VLAN ID being implied in the port profile seems to be another
  difference between what enic is doing and the current draft VDP
  that will eventually become 802.1Qbg, and I fear that this difference
  will be visible in the iovnl protocol.

> There really wouldn't be much different between iplink and iovnl since
> they're both rtnetlink...seems we should keep IOV-related APIs in one place.
> Maybe there are other IOV APIs to add to iovnl in the future like:
> 
>     vf <- add_vf(pf)
>     del_vf(pf, vf)
> 
> Ethtool doesn't seem the right place for this.

Right. My preference would probably be make these a subcategory of
the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
This would make it resemble the existing interfaces and mean you can
use

ip link add link eth0 type macvlan    # for a container
ip link add link eth0 type macvtap    # for qemu/vhost
ip link add link eth0 type vf         # for device assignment

There are obviously significant differences between these three, but
they also share enough of their properties to let us treat them
in similar ways.

If we integrate the iovnl client into iproute2, the sequence for setting
up an enic VF and associating it to the port profile could be

# create vf0, pass mac and vlan id to HW, no association yet
ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78

# associate vf with port profile, mac address must match the one assigned
#  to the interface before.
ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
	 mac fe:dc:ba:12:34:56

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
  2010-04-20 14:34     ` Chris Wright
  2010-04-20 19:56     ` Scott Feldman
@ 2010-04-21 21:22     ` Arnd Bergmann
  2 siblings, 0 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-21 21:22 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw

On Tuesday 20 April 2010, Arnd Bergmann wrote:
> > + * @IOV_ATTR_IFNAME: interface name of master (PF) net device (NLA_NUL_STRING)
> > + * @IOV_ATTR_VF_IFNAME: interface name of target VF device (NLA_NUL_STRING)
> 
> As mentioned above, why not drop one of these, and just pass the VF's IFNAME?
> 

Coming back to this point, I now think it would be ideal if we could actually
leave out IOV_ATTR_VF_IFNAME and just pass the master IFNAME and the slave
MAC address. Since we're not actually doing anything with the slave itself
but really talking the switch, it should not be needed at all.

That would solve all problems with the slave having moved to another namespace
already, and make it totally clear that this is not about configuring the
slave but about registering it.

Scott, would that still work with your driver?

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 19:39               ` Arnd Bergmann
  2010-04-21 20:25                 ` Scott Feldman
@ 2010-04-21 22:18                 ` Chris Wright
  2010-04-22  0:01                   ` Scott Feldman
  1 sibling, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-21 22:18 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > On Wednesday 21 April 2010, Chris Wright wrote:
> > > > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > > > Since it seems what you really want to do is to do the exchange with the
> > > > > switch from here, maybe the hardware configuration part should be moved
> > > > > the DCB interface?
> > > > 
> > > > I suppose this would work  (although it's a bit odd being out of scope
> > > > of DCB spec).
> > > 
> > > It could be anywhere, it doesn't have to be the DCB interface, but could
> > > be anything ranging from ethtool to iplink I guess. And we should define
> > > it in a way that works for any SR-IOV card, whether it's using Cisco's
> > > protocol in firmware, 802.1Qbg VDP in firmware, lldpad to do VDP or
> > > none of the above and just provides an internal switch like all the
> > > existing NICs.
> > 
> > Heh, that's exactly what iovnl does ;-)
> 
> No, according to what you write below, it's exactly what iovnl does *not* do,
> i.e. part 1 in my list.

OK, I see...in this case to me hw setup was the port profile for the
enic to initiate host<->switch negotiation, sorry for confusion.

> > > 1. Setting up the slave device
> > >  a) create an SR-IOV VF to assign to a guest
> > >  b) create a macvtap device to pass to qemu or vhost
> > >  c) attach a tap device to a bridge
> > >  d) create a macvlan device and put it into a container
> > >  e) create a virtual interface for a VMDq adapter
> > 
> > OK, but iovnl isn't doing this.
> 
> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> as I can tell. Interestingly, this is not actually implemented in
> the enic driver in patch 2/2. So if we all agree that this is out of the
> scope of iovnl, let's just remove it from the interface and find another
> way for it (ethtool, iplink, ..., as listed above).

Scott, any objection?  At least a way to keep moving forward on the port
profile bit.

> Note that we still need to pass the MAC address and VLAN ID (or a list
> of these) to the external switch, my point is just that this should be
> separate from enforcing it in the hypervisor.

Yup, we should focus on reconciling the diff of enic vs vpd port profile
needs.

> > > 2) Registering the slave with the switch
> > >  a) use Cisco protocol in enic firmware (see patch 2/2)
> > >  b) use standard VDP in lldpad
> > >  c) use reverse-engineered cisco protocol in some user tool for
> > >     non-enic adapters.
> > >  d) use standard VDP in firmware (hopefully this never happens)
> > >  e) do nothing at all (as we do today)
> > 
> > And this is the step that is the main purpose of iovnl.
> > 
> > Here's the simplest snippet of libvirt to show this.  It sends
> > set_port_profile netlink messages and then creates macvtap.  As simple
> > as it gets...
> > 
> > --- a/src/qemu/qemu_conf.c
> > +++ b/src/qemu/qemu_conf.c
> > @@ -1470,6 +1470,11 @@ qemudPhysIfaceConnect(virConnectPtr conn,
> >          net->model && STREQ(net->model, "virtio"))
> >          vnet_hdr = 1;
> >  
> > +    setPortProfileId(net->data.direct.linkdev,
> > +                      net->data.direct.mode,
> > +                      net->data.direct.profileid,
> > +                      net->mac);
> > +
> >      rc = openMacvtapTap(net->ifname, net->mac, linkdev, brmode,
> >                          &res_ifname, vnet_hdr);
> 
> Ok. In case of VDP, I guess this needs to be extended with the vlan ID
> that has been configured, and possibly with a UUID, because we need to
> pass the same one on the target machine if we migrate it.
> 
> Alternatively, the setPortProfileId could figure out the MAC address and
> VLAN ID from the device itself, but then we don't need to pass either of
> them.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 21:13                   ` Arnd Bergmann
@ 2010-04-21 22:48                     ` Chris Wright
  2010-04-22  6:51                       ` Arnd Bergmann
  2010-04-21 23:54                     ` Scott Feldman
  2010-04-22  7:09                     ` David Miller
  2 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-21 22:48 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Scott Feldman, Chris Wright, davem, netdev

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Wednesday 21 April 2010, Scott Feldman wrote:
> > On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > 
> > >>> 1. Setting up the slave device
> > >>>  a) create an SR-IOV VF to assign to a guest
> > >>>  b) create a macvtap device to pass to qemu or vhost
> > >>>  c) attach a tap device to a bridge
> > >>>  d) create a macvlan device and put it into a container
> > >>>  e) create a virtual interface for a VMDq adapter
> > >> 
> > >> OK, but iovnl isn't doing this.
> > > 
> > > The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
> > > as I can tell. Interestingly, this is not actually implemented in
> > > the enic driver in patch 2/2. So if we all agree that this is out of the
> > > scope of iovnl, let's just remove it from the interface and find another
> > > way for it (ethtool, iplink, ..., as listed above).
> > 
> > You're right, not needed for enic since mac addr is included with
> > port-profile push and vlan membership is implied by port-profile.  So I put
> > set_mac_vlan in there basically to elicit feedback.
> 
> Ok. Two points though:
> 
> - when you say that the mac address is included in the port-profile push,
>   does that imply that the VF does not have a mac address prior to this?
>   This would again mix the NIC configuration phase with the switch
>   association, which I think we really need to avoid if we want to be
>   able to implement the association in user space!
> 
> - The VLAN ID being implied in the port profile seems to be another
>   difference between what enic is doing and the current draft VDP
>   that will eventually become 802.1Qbg, and I fear that this difference
>   will be visible in the iovnl protocol.
> 
> > There really wouldn't be much different between iplink and iovnl since
> > they're both rtnetlink...seems we should keep IOV-related APIs in one place.
> > Maybe there are other IOV APIs to add to iovnl in the future like:
> > 
> >     vf <- add_vf(pf)
> >     del_vf(pf, vf)
> > 
> > Ethtool doesn't seem the right place for this.
> 
> Right. My preference would probably be make these a subcategory of
> the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> This would make it resemble the existing interfaces and mean you can
> use
> 
> ip link add link eth0 type macvlan    # for a container
> ip link add link eth0 type macvtap    # for qemu/vhost
> ip link add link eth0 type vf         # for device assignment

BTW, what do you mean by device assignment?

> There are obviously significant differences between these three, but
> they also share enough of their properties to let us treat them
> in similar ways.
> 
> If we integrate the iovnl client into iproute2, the sequence for setting
> up an enic VF and associating it to the port profile could be
> 
> # create vf0, pass mac and vlan id to HW, no association yet
> ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78

Just to clarify...right now, the normal SR-IOV VF is already there.
And, or course, can have its mac addr/vlan set already.

> # associate vf with port profile, mac address must match the one assigned
> #  to the interface before.
> ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> 	 mac fe:dc:ba:12:34:56

At that point you could just do s/mac fe:.*/link vf0/

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 21:13                   ` Arnd Bergmann
  2010-04-21 22:48                     ` Chris Wright
@ 2010-04-21 23:54                     ` Scott Feldman
  2010-04-22 12:49                       ` Arnd Bergmann
  2010-04-22  7:09                     ` David Miller
  2 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-21 23:54 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, davem, netdev

On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Wednesday 21 April 2010, Scott Feldman wrote:
>> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> 
>>>>> 1. Setting up the slave device
>>>>>  a) create an SR-IOV VF to assign to a guest
>>>>>  b) create a macvtap device to pass to qemu or vhost
>>>>>  c) attach a tap device to a bridge
>>>>>  d) create a macvlan device and put it into a container
>>>>>  e) create a virtual interface for a VMDq adapter
>>>> 
>>>> OK, but iovnl isn't doing this.
>>> 
>>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>>> as I can tell. Interestingly, this is not actually implemented in
>>> the enic driver in patch 2/2. So if we all agree that this is out of the
>>> scope of iovnl, let's just remove it from the interface and find another
>>> way for it (ethtool, iplink, ..., as listed above).
>> 
>> You're right, not needed for enic since mac addr is included with
>> port-profile push and vlan membership is implied by port-profile.  So I put
>> set_mac_vlan in there basically to elicit feedback.
> 
> Ok. Two points though:
> 
> - when you say that the mac address is included in the port-profile push,
>   does that imply that the VF does not have a mac address prior to this?

Correct, VF has no mac addr prior to port-profile being applied.  The
mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
the port-profile defines L2 mac spoofing, for example, the switch wants to
know the mac address before i/o starts.   I/o doesn't start until
port-profile is applied and the switch virtual port is setup.

>   This would again mix the NIC configuration phase with the switch
>   association, which I think we really need to avoid if we want to be
>   able to implement the association in user space!
> 
> - The VLAN ID being implied in the port profile seems to be another
>   difference between what enic is doing and the current draft VDP
>   that will eventually become 802.1Qbg, and I fear that this difference
>   will be visible in the iovnl protocol.

It's not just a VLAN ID, but the entire VLAN membership for the switch
virtual port.  The port-profile may define a single native VLAN for access
mode on the switch port, or a trunk mode with a list of allowed vlans, with
on native vlan.

The key is the port-profile.  The port-profile resolves the configuration of
the switch virtual port.  The configuration of the switch virtual port
includes many setting like I mentioned earlier: VLAN membership, QoS (rate
limits, priority class, L2 security, etc).
 
>> There really wouldn't be much different between iplink and iovnl since
>> they're both rtnetlink...seems we should keep IOV-related APIs in one place.
>> Maybe there are other IOV APIs to add to iovnl in the future like:
>> 
>>     vf <- add_vf(pf)
>>     del_vf(pf, vf)
>> 
>> Ethtool doesn't seem the right place for this.
> 
> Right. My preference would probably be make these a subcategory of
> the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> This would make it resemble the existing interfaces and mean you can
> use
>
> ip link add link eth0 type macvlan    # for a container
> ip link add link eth0 type macvtap    # for qemu/vhost
> ip link add link eth0 type vf         # for device assignment
> 
> There are obviously significant differences between these three, but
> they also share enough of their properties to let us treat them
> in similar ways.
> 

I don't have strong preference for iovnl vs. extending if_link.  I thought I
had a reason against if_link, but I can't recall that now...it'll probably
come to me when I look at it again.  Let me look again...
 
> If we integrate the iovnl client into iproute2, the sequence for setting
> up an enic VF and associating it to the port profile could be
> 
> # create vf0, pass mac and vlan id to HW, no association yet
> ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> # associate vf with port profile, mac address must match the one assigned
> #  to the interface before.
> ip iov assoc eth0 port-profile "general" host-uuid
> "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> mac fe:dc:ba:12:34:56

Ya, that sounds pretty close.  I still want the flexibility to direct ops to
a PF link for a VF link.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 22:18                 ` Chris Wright
@ 2010-04-22  0:01                   ` Scott Feldman
  0 siblings, 0 replies; 53+ messages in thread
From: Scott Feldman @ 2010-04-22  0:01 UTC (permalink / raw)
  To: Chris Wright, Arnd Bergmann; +Cc: davem, netdev

On 4/21/10 3:18 PM, "Chris Wright" <chrisw@redhat.com> wrote:

>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>> as I can tell. Interestingly, this is not actually implemented in
>> the enic driver in patch 2/2. So if we all agree that this is out of the
>> scope of iovnl, let's just remove it from the interface and find another
>> way for it (ethtool, iplink, ..., as listed above).
> 
> Scott, any objection?  At least a way to keep moving forward on the port
> profile bit.

Yes, that's fine with me, port-profile bit is the most important part.
 
>> Note that we still need to pass the MAC address and VLAN ID (or a list
>> of these) to the external switch, my point is just that this should be
>> separate from enforcing it in the hypervisor.
> 
> Yup, we should focus on reconciling the diff of enic vs vpd port profile
> needs.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
@ 2010-04-22  6:48   ` David Miller
  2010-04-22 21:23     ` Scott Feldman
  2010-04-22  6:52   ` [net-next PATCH 1/2] add iovnl netlink support David Miller
  2 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2010-04-22  6:48 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +#define IOVNL_PROTO_VERSION 1
> +

Please delete this in the final version, the macro isn't even used by
the code.

We don't do protocol versioning in netlink.  Instead we get the base
stuff solid from the beginning, and then if something needs fixing up
we handle this using new attributes in a way which is both backward
and forward compatible.

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 22:48                     ` Chris Wright
@ 2010-04-22  6:51                       ` Arnd Bergmann
  2010-04-22 17:47                         ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22  6:51 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev

On Thursday 22 April 2010, Chris Wright wrote:
> > 
> > ip link add link eth0 type macvlan    # for a container
> > ip link add link eth0 type macvtap    # for qemu/vhost
> > ip link add link eth0 type vf         # for device assignment
> 
> BTW, what do you mean by device assignment?

I mean giving an SR-IOV VF to the guest as a native PCI device
rather than having qemu or vhost present a virtio-net to the
guest.

> > There are obviously significant differences between these three, but
> > they also share enough of their properties to let us treat them
> > in similar ways.
> > 
> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> Just to clarify...right now, the normal SR-IOV VF is already there.
> And, or course, can have its mac addr/vlan set already.

I don't have an SR-IOV card available for testing yet. How is this
configured now?

> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> >        mac fe:dc:ba:12:34:56
> 
> At that point you could just do s/mac fe:.*/link vf0/

My point was that this information should be irrelevant to the code doing the
association with the switch. It sort of makes sense when the receiver is enic,
but when we send the same data to lldpad, it doesn't care about the slave device
name but only about the mac address. Especially since the slave device might not
be in the root name space any more, meaning we have no way to find it.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
  2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
  2010-04-22  6:48   ` [net-next PATCH 1/2] " David Miller
@ 2010-04-22  6:52   ` David Miller
  2010-04-22 10:53     ` Arnd Bergmann
  2 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2010-04-22  6:52 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw

From: Scott Feldman <scofeldm@cisco.com>
Date: Mon, 19 Apr 2010 12:18:07 -0700

> +	if (tb[IOV_ATTR_VF_IFNAME])
> +		vf_dev = dev_get_by_name(&init_net,
> +			nla_data(tb[IOV_ATTR_VF_IFNAME]));

It's probably best to check this for NULL and notify
the user with an error in that case (don't forget to
put 'dev' in that error path :-)

As things stand it looks like if we can't find vf_dev, we'll just send
NULL down to the vf_dev arg of the various operations and possibly
silently succeed.

That's not desirable, semantically.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 21:13                   ` Arnd Bergmann
  2010-04-21 22:48                     ` Chris Wright
  2010-04-21 23:54                     ` Scott Feldman
@ 2010-04-22  7:09                     ` David Miller
  2 siblings, 0 replies; 53+ messages in thread
From: David Miller @ 2010-04-22  7:09 UTC (permalink / raw)
  To: arnd; +Cc: scofeldm, chrisw, netdev

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 21 Apr 2010 23:13:04 +0200

> My preference would probably be make these a subcategory of the
> if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.

I was going to suggest this as well.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-22  6:52   ` [net-next PATCH 1/2] add iovnl netlink support David Miller
@ 2010-04-22 10:53     ` Arnd Bergmann
  2010-04-22 10:56       ` David Miller
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22 10:53 UTC (permalink / raw)
  To: David Miller; +Cc: scofeldm, netdev, chrisw

On Thursday 22 April 2010, David Miller wrote:
> From: Scott Feldman <scofeldm@cisco.com>
> Date: Mon, 19 Apr 2010 12:18:07 -0700
> 
> > +     if (tb[IOV_ATTR_VF_IFNAME])
> > +             vf_dev = dev_get_by_name(&init_net,
> > +                     nla_data(tb[IOV_ATTR_VF_IFNAME]));
> 
> It's probably best to check this for NULL and notify
> the user with an error in that case (don't forget to
> put 'dev' in that error path :-)

Since you brought up that hunk: shouldn't the namespace better
be current->nsproxy->net_ns instead of init_ns? If the sender
is confined in a separate network namespace, I would expect
that it should be able to modify devices in its own namespace
but none that are in the root namespace.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-22 10:53     ` Arnd Bergmann
@ 2010-04-22 10:56       ` David Miller
  2010-04-22 11:12         ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: David Miller @ 2010-04-22 10:56 UTC (permalink / raw)
  To: arnd; +Cc: scofeldm, netdev, chrisw

From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 22 Apr 2010 12:53:11 +0200

> On Thursday 22 April 2010, David Miller wrote:
>> From: Scott Feldman <scofeldm@cisco.com>
>> Date: Mon, 19 Apr 2010 12:18:07 -0700
>> 
>> > +     if (tb[IOV_ATTR_VF_IFNAME])
>> > +             vf_dev = dev_get_by_name(&init_net,
>> > +                     nla_data(tb[IOV_ATTR_VF_IFNAME]));
>> 
>> It's probably best to check this for NULL and notify
>> the user with an error in that case (don't forget to
>> put 'dev' in that error path :-)
> 
> Since you brought up that hunk: shouldn't the namespace better
> be current->nsproxy->net_ns instead of init_ns? If the sender
> is confined in a separate network namespace, I would expect
> that it should be able to modify devices in its own namespace
> but none that are in the root namespace.

Yes, the namespace needs to be handled better.

But reading other parts of the discussion it seems that
IOV_ATTR_VF_IFNAME and some other bits will likely be
removed in the initial implementation of this stuff.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-22 10:56       ` David Miller
@ 2010-04-22 11:12         ` Arnd Bergmann
  0 siblings, 0 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22 11:12 UTC (permalink / raw)
  To: David Miller; +Cc: scofeldm, netdev, chrisw

On Thursday 22 April 2010, David Miller wrote:
> But reading other parts of the discussion it seems that
> IOV_ATTR_VF_IFNAME and some other bits will likely be
> removed in the initial implementation of this stuff.

That's what I suggested, yes. However, I'm still waiting for
a reply from Scott wether it's actually possibly to remove
it based on the way that the enic firmware works.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-21 23:54                     ` Scott Feldman
@ 2010-04-22 12:49                       ` Arnd Bergmann
  0 siblings, 0 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22 12:49 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Chris Wright, davem, netdev

On Thursday 22 April 2010, Scott Feldman wrote:
> On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> > On Wednesday 21 April 2010, Scott Feldman wrote:
> >> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> >> You're right, not needed for enic since mac addr is included with
> >> port-profile push and vlan membership is implied by port-profile.  So I put
> >> set_mac_vlan in there basically to elicit feedback.
> > 
> > Ok. Two points though:
> > 
> > - when you say that the mac address is included in the port-profile push,
> >   does that imply that the VF does not have a mac address prior to this?
> 
> Correct, VF has no mac addr prior to port-profile being applied.  The
> mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
> the port-profile defines L2 mac spoofing, for example, the switch wants to
> know the mac address before i/o starts.   I/o doesn't start until
> port-profile is applied and the switch virtual port is setup.

Is it possible to split this this process, in order to make it more
closely resemble what we have when the registration is in user space?
This would mean that you assign a MAC address to the interface when the
interface gets created, and register the same MAC address at the switch
independent from the creation.

Obviously, if the port-profile (for enic) or the VSI list in the switch
enforces a the mac address and you pass one that's different from the
one that's set in the VF, it won't be able to send any data, but it
remains the job of the switch to enforce that case.

> It's not just a VLAN ID, but the entire VLAN membership for the switch
> virtual port.  The port-profile may define a single native VLAN for access
> mode on the switch port, or a trunk mode with a list of allowed vlans, with
> on native vlan.
> 
> The key is the port-profile.  The port-profile resolves the configuration of
> the switch virtual port.  The configuration of the switch virtual port
> includes many setting like I mentioned earlier: VLAN membership, QoS (rate
> limits, priority class, L2 security, etc).

Ok, I see.

> > If we integrate the iovnl client into iproute2, the sequence for setting
> > up an enic VF and associating it to the port profile could be
> > 
> > # create vf0, pass mac and vlan id to HW, no association yet
> > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> > 
> > # associate vf with port profile, mac address must match the one assigned
> > #  to the interface before.
> > ip iov assoc eth0 port-profile "general" host-uuid
> > "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> > mac fe:dc:ba:12:34:56
> 
> Ya, that sounds pretty close.  I still want the flexibility to direct ops to
> a PF link for a VF link.

Does that mean you require passing both the PF and the VF name?

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-22  6:51                       ` Arnd Bergmann
@ 2010-04-22 17:47                         ` Chris Wright
  2010-04-22 18:48                           ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-22 17:47 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev, Mitch Williams

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010, Chris Wright wrote:
> > > 
> > > ip link add link eth0 type macvlan    # for a container
> > > ip link add link eth0 type macvtap    # for qemu/vhost
> > > ip link add link eth0 type vf         # for device assignment
> > 
> > BTW, what do you mean by device assignment?
> 
> I mean giving an SR-IOV VF to the guest as a native PCI device
> rather than having qemu or vhost present a virtio-net to the
> guest.

OK, wasn't clear if you meant that or simply 100% dedicating the interface
via something like virtio.  The add_vf() idea, while neat, doesn't really
match how VF's are allocated.

> > > There are obviously significant differences between these three, but
> > > they also share enough of their properties to let us treat them
> > > in similar ways.
> > > 
> > > If we integrate the iovnl client into iproute2, the sequence for setting
> > > up an enic VF and associating it to the port profile could be
> > > 
> > > # create vf0, pass mac and vlan id to HW, no association yet
> > > ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> > 
> > Just to clarify...right now, the normal SR-IOV VF is already there.
> > And, or course, can have its mac addr/vlan set already.
> 
> I don't have an SR-IOV card available for testing yet. How is this
> configured now?

The device shows up in the host as a normal network device, so mgmt tools
currently treat it as if it's no different from a PF.  So that's just
plain old:

SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)

There's also the possiblity of configuring through the PF (although
this isn't really widely used ATM, and has the disadvantage of exposing
the VF number to userspace in a way that's difficult to use).  This is
also done via RTM_SETLINK (on the PF this time), and will result in
->ndo_set_vf_mac().

> > > # associate vf with port profile, mac address must match the one assigned
> > > #  to the interface before.
> > > ip iov assoc eth0 port-profile "general" host-uuid "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> > >        mac fe:dc:ba:12:34:56
> > 
> > At that point you could just do s/mac fe:.*/link vf0/
> 
> My point was that this information should be irrelevant to the code doing the
> association with the switch. It sort of makes sense when the receiver is enic,
> but when we send the same data to lldpad, it doesn't care about the slave device
> name but only about the mac address. Especially since the slave device might not
> be in the root name space any more, meaning we have no way to find it.

Yeah, w/ namespace I think you'd normally do all setup before handing
into a new namespace.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-22 17:47                         ` Chris Wright
@ 2010-04-22 18:48                           ` Arnd Bergmann
  2010-04-22 19:02                             ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22 18:48 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, Mitch Williams

On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> OK, wasn't clear if you meant that or simply 100% dedicating the interface
> via something like virtio.  The add_vf() idea, while neat, doesn't really
> match how VF's are allocated.

But we still need something like that for allocating queues in VMDq
and similar cases where we do not have pass-through, right?

As far as I can tell we don't have an interface for that yet, but
we have drivers for a number of cards that could do this.

> > I don't have an SR-IOV card available for testing yet. How is this
> > configured now?
> 
> The device shows up in the host as a normal network device, so mgmt tools
> currently treat it as if it's no different from a PF.  So that's just
> plain old:
> 
> SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)

Ok, but that only works for a fixed number of VFs and you can only
configure the VF before it's assigned to the guest, right?

Both are not serious limitations, but it would be nice to
have an easy way around them. In particular, for assigning
the mac address and vlan id (VF in access mode), there needs
to be some interface that allows the host but not the guest
to change the settings after assigning the card to the guest.

This is a fundamental requirement for VEPA, because the switch
applied its forwarding rules based on the mac address and trusts
the hypervisor to make sure it cannot be faked by the guest.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-22 18:48                           ` Arnd Bergmann
@ 2010-04-22 19:02                             ` Chris Wright
  2010-04-22 19:36                               ` Arnd Bergmann
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-22 19:02 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev, Mitch Williams

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > match how VF's are allocated.
> 
> But we still need something like that for allocating queues in VMDq
> and similar cases where we do not have pass-through, right?

Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
capable and already has a queue-pair + interrupt + net_dev), yes.

And it's not just VMDq, it's any multi-queue card that can do mac/vlan
filter in hw + header/data split (for direct data DMA to guest buffers).

> As far as I can tell we don't have an interface for that yet, but
> we have drivers for a number of cards that could do this.
> 
> > > I don't have an SR-IOV card available for testing yet. How is this
> > > configured now?
> > 
> > The device shows up in the host as a normal network device, so mgmt tools
> > currently treat it as if it's no different from a PF.  So that's just
> > plain old:
> > 
> > SIOCSIFHWADDR or RTM_SETLINK (i.e. normal ->ndo_set_mac_addr)
> 
> Ok, but that only works for a fixed number of VFs and you can only
> configure the VF before it's assigned to the guest, right?

Depends on assign.

Assign meaning it's still visible in host, but only one guest is using
it via virtio (e.g. vhost-net)....then no, can change anytime (although
it's not typically changed during VM lifecycle).

Assign meaning direct PCI device assignment of the VF to the guest,
then yes, only while the device has driver in host.

> Both are not serious limitations, but it would be nice to
> have an easy way around them. In particular, for assigning
> the mac address and vlan id (VF in access mode), there needs
> to be some interface that allows the host but not the guest
> to change the settings after assigning the card to the guest.
> 
> This is a fundamental requirement for VEPA, because the switch
> applied its forwarding rules based on the mac address and trusts
> the hypervisor to make sure it cannot be faked by the guest.

Sure, but the VF (when directly assigned to the guest) is going to (at
least it should, for security reasons) always trap to a privileged code if
the guest tries to do something like set mac or vlan id.  All the SR-IOV
cards I've seen do this.  The "set VF mac addr" is really a message to
the PF.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-22 19:02                             ` Chris Wright
@ 2010-04-22 19:36                               ` Arnd Bergmann
  2010-04-22 21:03                                 ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-22 19:36 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, Mitch Williams

On Thursday 22 April 2010 21:02:30 Chris Wright wrote:
> * Arnd Bergmann (arnd@arndb.de) wrote:
> > On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > > match how VF's are allocated.
> > 
> > But we still need something like that for allocating queues in VMDq
> > and similar cases where we do not have pass-through, right?
> 
> Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
> capable and already has a queue-pair + interrupt + net_dev), yes.
> 
> And it's not just VMDq, it's any multi-queue card that can do mac/vlan
> filter in hw + header/data split (for direct data DMA to guest buffers).

Right, that's what I meant by VMDq. Do we have a better term to describe
this class of devices, i.e. VMDq and other cards that also have the
features you listed?

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next,1/2] add iovnl netlink support
  2010-04-22 19:36                               ` Arnd Bergmann
@ 2010-04-22 21:03                                 ` Chris Wright
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Wright @ 2010-04-22 21:03 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, Scott Feldman, davem, netdev, Mitch Williams

* Arnd Bergmann (arnd@arndb.de) wrote:
> On Thursday 22 April 2010 21:02:30 Chris Wright wrote:
> > * Arnd Bergmann (arnd@arndb.de) wrote:
> > > On Thursday 22 April 2010 19:47:29 Chris Wright wrote:
> > > > OK, wasn't clear if you meant that or simply 100% dedicating the interface
> > > > via something like virtio.  The add_vf() idea, while neat, doesn't really
> > > > match how VF's are allocated.
> > > 
> > > But we still need something like that for allocating queues in VMDq
> > > and similar cases where we do not have pass-through, right?
> > 
> > Iff we care about VMDq w/out SR-IOV (since SR-IOV hardware is VMDq
> > capable and already has a queue-pair + interrupt + net_dev), yes.
> > 
> > And it's not just VMDq, it's any multi-queue card that can do mac/vlan
> > filter in hw + header/data split (for direct data DMA to guest buffers).
> 
> Right, that's what I meant by VMDq. Do we have a better term to describe
> this class of devices, i.e. VMDq and other cards that also have the
> features you listed?

I don't have a good term.  Some of these devices can already surface
multiple netdevs.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-22  6:48   ` [net-next PATCH 1/2] " David Miller
@ 2010-04-22 21:23     ` Scott Feldman
  2010-04-22 23:04       ` David Miller
  2010-04-22 23:16       ` eSwitch management Anirban Chakraborty
  0 siblings, 2 replies; 53+ messages in thread
From: Scott Feldman @ 2010-04-22 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, chrisw

On 4/21/10 11:48 PM, "David Miller" <davem@davemloft.net> wrote:

> From: Scott Feldman <scofeldm@cisco.com>
> Date: Mon, 19 Apr 2010 12:18:07 -0700
> 
>> +#define IOVNL_PROTO_VERSION 1
>> +
> 
> Please delete this in the final version, the macro isn't even used by
> the code.
> 
> We don't do protocol versioning in netlink.  Instead we get the base
> stuff solid from the beginning, and then if something needs fixing up
> we handle this using new attributes in a way which is both backward
> and forward compatible.

Sounds good to me, was a cut-and-paste from dcbnl.h.  How about:

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index b7cdbb4..8723491 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -22,8 +22,6 @@
 
 #include <linux/types.h>
 
-#define DCB_PROTO_VERSION 1
-
 struct dcbmsg {
        __u8               dcb_family;
        __u8               cmd;

-scott


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [net-next PATCH 1/2] add iovnl netlink support
  2010-04-22 21:23     ` Scott Feldman
@ 2010-04-22 23:04       ` David Miller
  2010-04-22 23:16       ` eSwitch management Anirban Chakraborty
  1 sibling, 0 replies; 53+ messages in thread
From: David Miller @ 2010-04-22 23:04 UTC (permalink / raw)
  To: scofeldm; +Cc: netdev, chrisw

From: Scott Feldman <scofeldm@cisco.com>
Date: Thu, 22 Apr 2010 14:23:30 -0700

> On 4/21/10 11:48 PM, "David Miller" <davem@davemloft.net> wrote:
> 
>> From: Scott Feldman <scofeldm@cisco.com>
>> Date: Mon, 19 Apr 2010 12:18:07 -0700
>> 
>>> +#define IOVNL_PROTO_VERSION 1
>>> +
>> 
>> Please delete this in the final version, the macro isn't even used by
>> the code.
>> 
>> We don't do protocol versioning in netlink.  Instead we get the base
>> stuff solid from the beginning, and then if something needs fixing up
>> we handle this using new attributes in a way which is both backward
>> and forward compatible.
> 
> Sounds good to me, was a cut-and-paste from dcbnl.h.  How about:

This is perfectly fine except it got whitespace damanged by your
email client and needs a proper commit message and signoff :-)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* eSwitch management
  2010-04-22 21:23     ` Scott Feldman
  2010-04-22 23:04       ` David Miller
@ 2010-04-22 23:16       ` Anirban Chakraborty
  2010-04-23  0:47         ` Scott Feldman
  1 sibling, 1 reply; 53+ messages in thread
From: Anirban Chakraborty @ 2010-04-22 23:16 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Scott Feldman, chrisw, Arnd Bergmann, Ameen Rahman,
	Amit Salecha, Rajesh Borundia

Hi All,

I am following the discussions on iovnl patch closely. While it is going to take some time for iovnl patch to be reviewed and accepted, what would be the interim approach to manage the eswitch in NIC? We need to add support in qlcnic driver to configure the eswitch in our 10G NIC. Some of the things that we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We would like to set these parameters for a bunch of ports at the start of the day and set it to the eswitch.
Can we expose sysfs nodes to manage the eswitch or should we have a netlink/ioctl support put in the driver?  Not sure if we can do it via sysfs in a clean way. Netlink seems to be the ideal candidate for this.  What is an acceptable solution? Any suggesstion, advice will be highly appreciated.

thanks much,
Anirban Chakraborty
 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-22 23:16       ` eSwitch management Anirban Chakraborty
@ 2010-04-23  0:47         ` Scott Feldman
  2010-04-23  1:29           ` Scott Feldman
  0 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-23  0:47 UTC (permalink / raw)
  To: Anirban Chakraborty, David Miller
  Cc: netdev, chrisw, Arnd Bergmann, Ameen Rahman, Amit Salecha,
	Rajesh Borundia

On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
wrote:

> I am following the discussions on iovnl patch closely. While it is going to
> take some time for iovnl patch to be reviewed and accepted, what would be the
> interim approach to manage the eswitch in NIC? We need to add support in
> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things that
> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
> would like to set these parameters for a bunch of ports at the start of the
> day and set it to the eswitch.

Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
get a start there?  Not sure not knowing your device requirements.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23  0:47         ` Scott Feldman
@ 2010-04-23  1:29           ` Scott Feldman
  2010-04-23  5:57             ` Anirban Chakraborty
  0 siblings, 1 reply; 53+ messages in thread
From: Scott Feldman @ 2010-04-23  1:29 UTC (permalink / raw)
  To: Scott Feldman, Anirban Chakraborty, David Miller
  Cc: netdev, chrisw, Arnd Bergmann, Ameen Rahman, Amit Salecha,
	Rajesh Borundia

On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:

> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
> wrote:
> 
>> I am following the discussions on iovnl patch closely. While it is going to
>> take some time for iovnl patch to be reviewed and accepted, what would be the
>> interim approach to manage the eswitch in NIC? We need to add support in
>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>> that
>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>> would like to set these parameters for a bunch of ports at the start of the
>> day and set it to the eswitch.
> 
> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> get a start there?  Not sure not knowing your device requirements.

Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
seem like what you're looking for.  I'm looking at moving iovnl here as well
for port-profile.

-scott


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23  1:29           ` Scott Feldman
@ 2010-04-23  5:57             ` Anirban Chakraborty
  2010-04-23 12:42               ` Arnd Bergmann
  2010-04-23 16:23               ` Chris Wright
  0 siblings, 2 replies; 53+ messages in thread
From: Anirban Chakraborty @ 2010-04-23  5:57 UTC (permalink / raw)
  To: Scott Feldman
  Cc: David Miller, netdev, chrisw, Arnd Bergmann, Ameen Rahman,
	Amit Salecha, Rajesh Borundia


On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:

> On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> 
>> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
>> wrote:
>> 
>>> I am following the discussions on iovnl patch closely. While it is going to
>>> take some time for iovnl patch to be reviewed and accepted, what would be the
>>> interim approach to manage the eswitch in NIC? We need to add support in
>>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>>> that
>>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>>> would like to set these parameters for a bunch of ports at the start of the
>>> day and set it to the eswitch.
>> 
>> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
>> get a start there?  Not sure not knowing your device requirements.
> 
> Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> seem like what you're looking for.  I'm looking at moving iovnl here as well
> for port-profile.

It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?

Thanks a lot,
Anirban


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23  5:57             ` Anirban Chakraborty
@ 2010-04-23 12:42               ` Arnd Bergmann
  2010-04-23 16:23               ` Chris Wright
  1 sibling, 0 replies; 53+ messages in thread
From: Arnd Bergmann @ 2010-04-23 12:42 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Scott Feldman, David Miller, netdev, chrisw, Ameen Rahman,
	Amit Salecha, Rajesh Borundia

On Friday 23 April 2010, Anirban Chakraborty wrote:
> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
> > On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> >> 
> >> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> >> get a start there?  Not sure not knowing your device requirements.
> > 
> > Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> > seem like what you're looking for.  I'm looking at moving iovnl here as well
> > for port-profile.
> 
> It looks like ifla_vf_info does contain most of the data set. But if I use it, what
> NETLINK protocol family should I use in my driver to receive netlink messages? Do I
> need to create a private protocol family?

Your driver should implement the ndo_set_vf_*/ndo_get_vf_* callbacks, not
implement the netlink protocol itself. If there is anything missing in the
existing callbacks that you require for the operation of your driver, you
should send patches to extend the implementation in net/core/rtnetlink.c.

	Arnd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23  5:57             ` Anirban Chakraborty
  2010-04-23 12:42               ` Arnd Bergmann
@ 2010-04-23 16:23               ` Chris Wright
  2010-04-23 19:00                 ` Anirban Chakraborty
  1 sibling, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-23 16:23 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Scott Feldman, David Miller, netdev, chrisw, Arnd Bergmann,
	Ameen Rahman, Amit Salecha, Rajesh Borundia

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> 
> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
> 
> > On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
> > 
> >> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
> >> wrote:
> >> 
> >>> I am following the discussions on iovnl patch closely. While it is going to
> >>> take some time for iovnl patch to be reviewed and accepted, what would be the
> >>> interim approach to manage the eswitch in NIC? We need to add support in
> >>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
> >>> that
> >>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
> >>> would like to set these parameters for a bunch of ports at the start of the
> >>> day and set it to the eswitch.
> >> 
> >> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
> >> get a start there?  Not sure not knowing your device requirements.
> > 
> > Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
> > seem like what you're looking for.  I'm looking at moving iovnl here as well
> > for port-profile.
> 
> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?

No, you don't need to use netlink in your driver.  You just need to fill
in the relevant net_device_ops in your driver init.  Specifically:

 *      SR-IOV management functions.
 * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
 * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
 * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
 * int (*ndo_get_vf_config)(struct net_device *dev,
 *                          int vf, struct ifla_vf_info *ivf);

These are all operating on a VF indexed internally w/in the driver, so it's
a little cumbersome to use from userspace.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23 16:23               ` Chris Wright
@ 2010-04-23 19:00                 ` Anirban Chakraborty
  2010-04-23 19:44                   ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Anirban Chakraborty @ 2010-04-23 19:00 UTC (permalink / raw)
  To: Chris Wright
  Cc: Scott Feldman, David Miller, netdev, Arnd Bergmann, Ameen Rahman,
	Amit Salecha, Rajesh Borundia


On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> 
>> On Apr 22, 2010, at 6:29 PM, Scott Feldman wrote:
>> 
>>> On 4/22/10 5:47 PM, "Scott Feldman" <scofeldm@cisco.com> wrote:
>>> 
>>>> On 4/22/10 4:16 PM, "Anirban Chakraborty" <anirban.chakraborty@qlogic.com>
>>>> wrote:
>>>> 
>>>>> I am following the discussions on iovnl patch closely. While it is going to
>>>>> take some time for iovnl patch to be reviewed and accepted, what would be the
>>>>> interim approach to manage the eswitch in NIC? We need to add support in
>>>>> qlcnic driver to configure the eswitch in our 10G NIC. Some of the things
>>>>> that
>>>>> we need to set to the switch are setting a port's VLAN, tx bandwidth etc. We
>>>>> would like to set these parameters for a bunch of ports at the start of the
>>>>> day and set it to the eswitch.
>>>> 
>>>> Are any of these settings covered in DCB?  (net/dcb/dcbnl.c).  Maybe you can
>>>> get a start there?  Not sure not knowing your device requirements.
>>> 
>>> Or maybe the RTM_SETLINK IFLA_VF_* ops in include/linux/if_link.h?  Those
>>> seem like what you're looking for.  I'm looking at moving iovnl here as well
>>> for port-profile.
>> 
>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> 
> No, you don't need to use netlink in your driver.  You just need to fill
> in the relevant net_device_ops in your driver init.  Specifically:
> 
> *      SR-IOV management functions.
> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> * int (*ndo_get_vf_config)(struct net_device *dev,
> *                          int vf, struct ifla_vf_info *ivf);
> 
> These are all operating on a VF indexed internally w/in the driver, so it's
> a little cumbersome to use from userspace.

These are all intended for VFs and are configureable from PF. However, in our case, there are multiple physical NIC function on a port which are configureable by the eswitch. So, what we are setting is essentially switch ports, rather than configuring any setting on the physical functions. If netlink doesn't fly, is sysfs going to work? If we allocate a buffer and fill it up with user space tools that the driver grabs it and does the configuration itself?  

thanks,
Anirban



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23 19:00                 ` Anirban Chakraborty
@ 2010-04-23 19:44                   ` Chris Wright
  2010-04-23 21:08                     ` Anirban Chakraborty
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-23 19:44 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Chris Wright, Scott Feldman, David Miller, netdev, Arnd Bergmann,
	Ameen Rahman, Amit Salecha, Rajesh Borundia, shemminger

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
> > * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> > 
> > No, you don't need to use netlink in your driver.  You just need to fill
> > in the relevant net_device_ops in your driver init.  Specifically:
> > 
> > *      SR-IOV management functions.
> > * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> > * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> > * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> > * int (*ndo_get_vf_config)(struct net_device *dev,
> > *                          int vf, struct ifla_vf_info *ivf);
> > 
> > These are all operating on a VF indexed internally w/in the driver, so it's
> > a little cumbersome to use from userspace.
> 
> These are all intended for VFs and are configureable from PF.

Yes, and while the set of callbacks can change, they are always tied to
some net_device (typically the PF) that knows how to make hardware
settings on behalf of a VF.

> However, in our case, there are multiple physical NIC function on a
> port which are configureable by the eswitch.

Is there a PCI function that represents the switch?  Or a special PCI
NIC function that has VEB mgmt plane access?  And do you have examples
of configuration that you'll do here?

> So, what we are setting
> is essentially switch ports, rather than configuring any setting on the
> physical functions. If netlink doesn't fly, is sysfs going to work?

Before we go to implementation specifics (i.e. netlink vs. sysfs, and my
guess is sysfs isn't going to be the right fit), let's step back and
look at what needs setting.

> If
> we allocate a buffer and fill it up with user space tools that the driver
> grabs it and does the configuration itself?

One idea that has been discussed in the past is to create essentially
a pluggable set of bridge_ops.  The first step would be purely internal
shuffling, to make the existing sw bridge code go through the bridge_ops.
The second step would be making your driver for whichever PCI function
you have that supports managing the bridge create a net_device which is
a bridge during driver init.  And now normal brctl can call into your
VEB via the bridge_ops callbacks. </handwave>

But this too starts w/ looking at what the management requirements are
for your bridge.  Can you enumerate those?

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23 19:44                   ` Chris Wright
@ 2010-04-23 21:08                     ` Anirban Chakraborty
  2010-04-23 23:04                       ` Chris Wright
  0 siblings, 1 reply; 53+ messages in thread
From: Anirban Chakraborty @ 2010-04-23 21:08 UTC (permalink / raw)
  To: Chris Wright
  Cc: Scott Feldman, David Miller, netdev, Arnd Bergmann, Ameen Rahman,
	Amit Salecha, Rajesh Borundia, shemminger


On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
>>> 
>>> No, you don't need to use netlink in your driver.  You just need to fill
>>> in the relevant net_device_ops in your driver init.  Specifically:
>>> 
>>> *      SR-IOV management functions.
>>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
>>> * int (*ndo_get_vf_config)(struct net_device *dev,
>>> *                          int vf, struct ifla_vf_info *ivf);
>>> 
>>> These are all operating on a VF indexed internally w/in the driver, so it's
>>> a little cumbersome to use from userspace.
>> 
>> These are all intended for VFs and are configureable from PF.
> 
> Yes, and while the set of callbacks can change, they are always tied to
> some net_device (typically the PF) that knows how to make hardware
> settings on behalf of a VF.
> 
>> However, in our case, there are multiple physical NIC function on a
>> port which are configureable by the eswitch.
> 
> Is there a PCI function that represents the switch?  Or a special PCI
> NIC function that has VEB mgmt plane access?  And do you have examples
> of configuration that you'll do here?
There is no PCI function that represents the switch. However, one of the NIC functions can act as a privileged function to configure the eswitch. Typically the first NIC function that is enumerated in the bus manages the eswitch. Typical configurations would be to set tx bandwidth, VLAN ID, MAC address, promiscuous mode setting for each of these ports at the start of the day. This is useful in virtualization scenario where we can do PCI passthru of the functions to the guest and these settings for the guest are configured via the driver in the host.

<snip>
> 
> One idea that has been discussed in the past is to create essentially
> a pluggable set of bridge_ops.  The first step would be purely internal
> shuffling, to make the existing sw bridge code go through the bridge_ops.
> The second step would be making your driver for whichever PCI function
> you have that supports managing the bridge create a net_device which is
> a bridge during driver init.  And now normal brctl can call into your
> VEB via the bridge_ops callbacks. </handwave>
> 
I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

thanks,
Anirban





^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23 21:08                     ` Anirban Chakraborty
@ 2010-04-23 23:04                       ` Chris Wright
  2010-04-24  6:21                         ` Anirban Chakraborty
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wright @ 2010-04-23 23:04 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Chris Wright, Scott Feldman, David Miller, netdev, Arnd Bergmann,
	Ameen Rahman, Amit Salecha, Rajesh Borundia, shemminger

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> 
> On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:
> 
> > * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
> >>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> >>> 
> >>> No, you don't need to use netlink in your driver.  You just need to fill
> >>> in the relevant net_device_ops in your driver init.  Specifically:
> >>> 
> >>> *      SR-IOV management functions.
> >>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> >>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> >>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> >>> * int (*ndo_get_vf_config)(struct net_device *dev,
> >>> *                          int vf, struct ifla_vf_info *ivf);
> >>> 
> >>> These are all operating on a VF indexed internally w/in the driver, so it's
> >>> a little cumbersome to use from userspace.
> >> 
> >> These are all intended for VFs and are configureable from PF.
> > 
> > Yes, and while the set of callbacks can change, they are always tied to
> > some net_device (typically the PF) that knows how to make hardware
> > settings on behalf of a VF.
> > 
> >> However, in our case, there are multiple physical NIC function on a
> >> port which are configureable by the eswitch.
> > 
> > Is there a PCI function that represents the switch?  Or a special PCI
> > NIC function that has VEB mgmt plane access?  And do you have examples
> > of configuration that you'll do here?
> 
> There is no PCI function that represents the switch. However, one
> of the NIC functions can act as a privileged function to configure the
> eswitch. Typically the first NIC function that is enumerated in the bus
> manages the eswitch. Typical configurations would be to set tx bandwidth,
> VLAN ID, MAC address, promiscuous mode setting for each of these ports
> at the start of the day. This is useful in virtualization scenario where
> we can do PCI passthru of the functions to the guest and these settings
> for the guest are configured via the driver in the host.

(btw, this is not uncommon, there other adapters that have multiple
functions for a single physical port that is not SR-IOV based)

How does the privileged function identify the other functions?  IOW, the
existing SR-IOV ndo callbacks have most of the above (tx bw control, mac,
vlan id), and have an 'int vf' which is basically just a driver specific
identifier to a non-privileged function or set of hw resources.  It looks
like you can use the existing bits (just need to expand a little).

So far we have only:

- tx bw control
- set mac addr
- set vlan id

You've additionally identified:

- set promiscuous mode

I'm also aware of:

- setting port aggregation
- issuing a function reset
- setting port mirroring or bcast/mcast replication
- setting anti-spoofing (mac/vlan..)
- setting security/filtering
- getting port statistics
- ...whatever else I'm forgetting

> <snip>
> > 
> > One idea that has been discussed in the past is to create essentially
> > a pluggable set of bridge_ops.  The first step would be purely internal
> > shuffling, to make the existing sw bridge code go through the bridge_ops.
> > The second step would be making your driver for whichever PCI function
> > you have that supports managing the bridge create a net_device which is
> > a bridge during driver init.  And now normal brctl can call into your
> > VEB via the bridge_ops callbacks. </handwave>
> > 
> I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

I don't quite follow you here.

thanks,
-chris

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: eSwitch management
  2010-04-23 23:04                       ` Chris Wright
@ 2010-04-24  6:21                         ` Anirban Chakraborty
  0 siblings, 0 replies; 53+ messages in thread
From: Anirban Chakraborty @ 2010-04-24  6:21 UTC (permalink / raw)
  To: Chris Wright
  Cc: Scott Feldman, David Miller, netdev, Arnd Bergmann, Ameen Rahman,
	Amit Salecha, Rajesh Borundia, shemminger


On Apr 23, 2010, at 4:04 PM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> 
>> On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:
>> 
>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
>>>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
>>>>> 
>>>>> No, you don't need to use netlink in your driver.  You just need to fill
>>>>> in the relevant net_device_ops in your driver init.  Specifically:
>>>>> 
>>>>> *      SR-IOV management functions.
>>>>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>>>>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>>>>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
>>>>> * int (*ndo_get_vf_config)(struct net_device *dev,
>>>>> *                          int vf, struct ifla_vf_info *ivf);
>>>>> 
>>>>> These are all operating on a VF indexed internally w/in the driver, so it's
>>>>> a little cumbersome to use from userspace.
>>>> 
>>>> These are all intended for VFs and are configureable from PF.
>>> 
>>> Yes, and while the set of callbacks can change, they are always tied to
>>> some net_device (typically the PF) that knows how to make hardware
>>> settings on behalf of a VF.
>>> 
>>>> However, in our case, there are multiple physical NIC function on a
>>>> port which are configureable by the eswitch.
>>> 
>>> Is there a PCI function that represents the switch?  Or a special PCI
>>> NIC function that has VEB mgmt plane access?  And do you have examples
>>> of configuration that you'll do here?
>> 
>> There is no PCI function that represents the switch. However, one
>> of the NIC functions can act as a privileged function to configure the
>> eswitch. Typically the first NIC function that is enumerated in the bus
>> manages the eswitch. Typical configurations would be to set tx bandwidth,
>> VLAN ID, MAC address, promiscuous mode setting for each of these ports
>> at the start of the day. This is useful in virtualization scenario where
>> we can do PCI passthru of the functions to the guest and these settings
>> for the guest are configured via the driver in the host.
> 
> (btw, this is not uncommon, there other adapters that have multiple
> functions for a single physical port that is not SR-IOV based)
> 
> How does the privileged function identify the other functions?  IOW, the
> existing SR-IOV ndo callbacks have most of the above (tx bw control, mac,
> vlan id), and have an 'int vf' which is basically just a driver specific
> identifier to a non-privileged function or set of hw resources.  It looks
> like you can use the existing bits (just need to expand a little).
> 
> So far we have only:
> 
> - tx bw control
> - set mac addr
> - set vlan id
> 
> You've additionally identified:
> 
> - set promiscuous mode
> 
> I'm also aware of:
> 
> - setting port aggregation
> - issuing a function reset
> - setting port mirroring or bcast/mcast replication
> - setting anti-spoofing (mac/vlan..)
> - setting security/filtering
> - getting port statistics
> - ...whatever else I'm forgetting
Scott's latest patch already addressed some of these. May be we should add the missing pieces, e.g. setting promiscuous mode, port mirroring etc. from the above list to ndo_ops. Function reset should be handled via FLR.

> 
>> <snip>
>>> 
>>> One idea that has been discussed in the past is to create essentially
>>> a pluggable set of bridge_ops.  The first step would be purely internal
>>> shuffling, to make the existing sw bridge code go through the bridge_ops.
>>> The second step would be making your driver for whichever PCI function
>>> you have that supports managing the bridge create a net_device which is
>>> a bridge during driver init.  And now normal brctl can call into your
>>> VEB via the bridge_ops callbacks. </handwave>
>>> 
>> I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.
> 
> I don't quite follow you here.
If I am not mistaken, port profile is supposed to keep configuration data of a NIC port and a software vswitch typically residing at the host uses it. When there are multiple physical NICs (on the same physical port) in the hypervisor, there are multiple vswitches created for each of the pNICs. The inter vm traffic in this case goes via the eswitch and thats where the eswitch configuration for these ports comes into picture.

thanks,
Anirban

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2010-04-24  6:21 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-19 19:18 [net-next PATCH 0/2] iovnl netlink ops + enic dynamic vnics Scott Feldman
2010-04-19 19:18 ` [net-next PATCH 1/2] add iovnl netlink support Scott Feldman
2010-04-20 13:48   ` [net-next,1/2] " Arnd Bergmann
2010-04-20 14:34     ` Chris Wright
2010-04-20 14:57       ` Arnd Bergmann
2010-04-20 15:22         ` Chris Wright
2010-04-20 16:19           ` Arnd Bergmann
2010-04-20 20:26             ` Scott Feldman
2010-04-21 13:17               ` Arnd Bergmann
2010-04-21 16:28                 ` Scott Feldman
2010-04-21 18:04                   ` Arnd Bergmann
2010-04-20 19:56     ` Scott Feldman
2010-04-21 11:26       ` Arnd Bergmann
2010-04-21 11:42         ` Selective MD5 Checksum Failuers Bijay Singh
2010-04-21 16:18         ` [net-next,1/2] add iovnl netlink support Chris Wright
2010-04-21 17:52           ` Arnd Bergmann
2010-04-21 18:10             ` Chris Wright
2010-04-21 19:39               ` Arnd Bergmann
2010-04-21 20:25                 ` Scott Feldman
2010-04-21 21:13                   ` Arnd Bergmann
2010-04-21 22:48                     ` Chris Wright
2010-04-22  6:51                       ` Arnd Bergmann
2010-04-22 17:47                         ` Chris Wright
2010-04-22 18:48                           ` Arnd Bergmann
2010-04-22 19:02                             ` Chris Wright
2010-04-22 19:36                               ` Arnd Bergmann
2010-04-22 21:03                                 ` Chris Wright
2010-04-21 23:54                     ` Scott Feldman
2010-04-22 12:49                       ` Arnd Bergmann
2010-04-22  7:09                     ` David Miller
2010-04-21 22:18                 ` Chris Wright
2010-04-22  0:01                   ` Scott Feldman
2010-04-21 21:22     ` Arnd Bergmann
2010-04-22  6:48   ` [net-next PATCH 1/2] " David Miller
2010-04-22 21:23     ` Scott Feldman
2010-04-22 23:04       ` David Miller
2010-04-22 23:16       ` eSwitch management Anirban Chakraborty
2010-04-23  0:47         ` Scott Feldman
2010-04-23  1:29           ` Scott Feldman
2010-04-23  5:57             ` Anirban Chakraborty
2010-04-23 12:42               ` Arnd Bergmann
2010-04-23 16:23               ` Chris Wright
2010-04-23 19:00                 ` Anirban Chakraborty
2010-04-23 19:44                   ` Chris Wright
2010-04-23 21:08                     ` Anirban Chakraborty
2010-04-23 23:04                       ` Chris Wright
2010-04-24  6:21                         ` Anirban Chakraborty
2010-04-22  6:52   ` [net-next PATCH 1/2] add iovnl netlink support David Miller
2010-04-22 10:53     ` Arnd Bergmann
2010-04-22 10:56       ` David Miller
2010-04-22 11:12         ` Arnd Bergmann
2010-04-19 19:18 ` [net-next PATCH 2/2] add enic iovnl ops for dynamic vnics Scott Feldman
2010-04-19 21:35 ` [net-next PATCH 0/2] iovnl netlink ops + enic " Chris Wright

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.