* [PATCH net-next V1 0/9] Add Ethernet IPoIB driver
@ 2012-07-18 10:59 Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition Or Gerlitz
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Or Gerlitz

changes from V0:
 - applied feedback from Eric/Dave - RX flow uses only the last 20 bytes of skb->cb[]
 - applied feedback from Ben H. on ethtool changes
 - fixed a sparse warning about a function that should have been static
 - made the driver's netdev-features related code more elegant/robust
 - used _bh locking in some paths that used plain rw locking in V0
 - rearranged some code in the flows that send ARPs


The eIPoIB driver provides a standard Ethernet netdevice over 
the InfiniBand IPoIB interface.

Some services can run only on top of Ethernet L2 interfaces, and cannot be
bound to an IPoIB interface. With this new driver, these services can run
seamlessly.

The main use case of the driver is Ethernet Virtual Switching in
virtualized environments, where an eipoib netdevice can serve as the
Physical Interface (PIF) in the hypervisor domain, allowing the Virtual
Interfaces (VIFs) of other guests connected to the same Virtual Switch
to run over the InfiniBand fabric.

This driver supports L2 Switching (Direct Bridging) as well as other L3
Switching modes (e.g. NAT).

Whenever an IPoIB interface is created, one eIPoIB PIF netdevice is
created on top of it. The default naming scheme is the same as for other
Ethernet interfaces (ethX). For example, on a system with two IPoIB
interfaces, ib0 and ib1, two interfaces named ethX and ethX+1 are
created, where "X" is the next free Ethernet index in the system.
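The "next free index" rule can be illustrated with a small userspace sketch. It assumes the kernel picks the lowest unused index for an "eth%d" name pattern, as the generic netdev name allocation does; the helpers `parse_eth_index()` and `next_free_eth()` are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if 'name' is exactly "eth<digits>" and stores the number. */
static bool parse_eth_index(const char *name, int *out)
{
	char *end;
	long n;

	if (strncmp(name, "eth", 3) != 0 || !isdigit((unsigned char)name[3]))
		return false;
	n = strtol(name + 3, &end, 10);
	if (*end != '\0')
		return false;
	*out = (int)n;
	return true;
}

/* Lowest N >= 0 such that "ethN" is not among the existing names. */
static int next_free_eth(const char *const names[], size_t count)
{
	assert(names != NULL || count == 0);

	for (int n = 0;; n++) {
		bool used = false;

		for (size_t i = 0; i < count && !used; i++) {
			int idx;

			used = parse_eth_index(names[i], &idx) && idx == n;
		}
		if (!used)
			return n;
	}
}
```

With lo, eth0, eth1, ib0 and ib1 present, this model would hand eth2 to the PIF over ib0 and, once eth2 exists, eth3 to the PIF over ib1.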

Running "ethtool -i" on the new interface shows which IPoIB PIF
interface it runs over. For example, if "ethtool -i eth3" reports
"driver: eth_ipoib:ib0", then eth3 is the Ethernet interface over the
ib0 IPoIB interface.
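A tool that maps ethX interfaces back to their IPoIB parents could split that driver string at the colon. The sketch below assumes the "<driver>:<ipoib-if>" form shown in the example; `split_drvinfo()` is a hypothetical helper, and a real tool would first fetch the string, e.g. through the `SIOCETHTOOL` ioctl with `ETHTOOL_GDRVINFO`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an ethtool driver string of the assumed form "<driver>:<ipoib-if>"
 * (e.g. "eth_ipoib:ib0") into its two parts. Returns false if there is no
 * ':' separator, either part is empty, or a part does not fit its buffer.
 */
static bool split_drvinfo(const char *drv, char *driver, size_t driver_len,
			  char *ipoib_if, size_t ipoib_len)
{
	const char *colon;
	size_t dlen, ilen;

	assert(drv && driver && ipoib_if);
	colon = strchr(drv, ':');
	if (!colon)
		return false;
	dlen = (size_t)(colon - drv);
	ilen = strlen(colon + 1);
	if (dlen == 0 || ilen == 0 || dlen >= driver_len || ilen >= ipoib_len)
		return false;
	memcpy(driver, drv, dlen);
	driver[dlen] = '\0';
	memcpy(ipoib_if, colon + 1, ilen + 1);	/* copies the NUL as well */
	return true;
}
```

A plain Ethernet driver string such as "mlx4_en" has no colon, so the helper returns false and the interface can be treated as not being an eIPoIB PIF.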

The driver can be used as an independent interface, or, in a
virtualized environment, as the physical layer for the virtual
interfaces of the guests.

The driver interface (the eipoib interface, also referred to as the
parent) uses slave interfaces, namely IPoIB clones, which are the VIFs
described above.

VIFs are enslaved to and released from the eipoib interface on demand,
through the management interface provided to user space.

The management interface of the driver consists of sysfs entries.
Through these entries the driver learns about new VIFs to manage, and it
can enslave a new VIF (an IPoIB cloned interface) or release it.
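Since the management interface is just a set of sysfs attributes, a management tool can drive it with ordinary file writes instead of a shell. A minimal sketch, assuming the attribute paths used in this letter; `sysfs_write_cmd()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one management command (e.g. "+ib0.1") to a sysfs attribute
 * (e.g. "/sys/class/net/eth4/eth/slaves") with a single write().
 * Returns 0 on success or a negative errno; a store handler that
 * rejects the command makes write() fail with its error code.
 */
static int sysfs_write_cmd(const char *path, const char *cmd)
{
	size_t len;
	ssize_t n;
	int err = 0;
	int fd;

	assert(path != NULL && cmd != NULL);
	len = strlen(cmd);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, cmd, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;
	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

Unlike "echo", this sketch sends no trailing newline; whether a given store handler needs one is not checked here.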

Here are a few of the sysfs commands used to manage the driver in common
scenarios:

1. create a new clone of an IPoIB interface:

	$ echo .Y > /sys/class/net/ibX/create_child

creates a new clone ibX.Y with the same pkey as ibX, for example:

	$ echo .1 > /sys/class/net/ib0/create_child

creates the new interface ib0.1

2. notify the parent interface of a new VIF to enslave:

	$ echo +ibX.Y > /sys/class/net/ethZ/eth/slaves

where ethZ is the driver interface, for example:

	$ echo +ib0.1 > /sys/class/net/eth4/eth/slaves

will enslave ib0.1 to eth4

3. notify the parent interface of the VIF details (MAC and VLAN):

	$ echo +ibX.Y <MAC address> > /sys/class/net/ethZ/eth/vifs

for example:

	$ echo +ib0.1 00:02:c9:43:3b:f1 > /sys/class/net/eth4/eth/vifs

4. notify the parent to release a VIF:

	$ echo -ibX.Y > /sys/class/net/ethZ/eth/slaves

where ethZ is the driver interface, for example:

	$ echo -ib0.1 > /sys/class/net/eth4/eth/slaves

will release ib0.1 from eth4

5. list the IPoIB interfaces enslaved to an eipoib interface:

	$ cat /sys/class/net/ethX/eth/vifs

for example:
	
	$ cat /sys/class/net/eth4/eth/vifs

	SLAVE=ib0.1      MAC=9a:c2:1f:d7:3b:63 VLAN=N/A
	SLAVE=ib0.2      MAC=52:54:00:60:55:88 VLAN=N/A
	SLAVE=ib0.3      MAC=52:54:00:60:55:89 VLAN=N/A
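A management tool that reads the vifs attribute can parse each line field by field. The sketch below assumes exactly the SLAVE=/MAC=/VLAN= layout of the sample output above; `struct vif_entry` and `parse_vif_line()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of the vifs listing, e.g. "SLAVE=ib0.1 MAC=... VLAN=N/A". */
struct vif_entry {
	char slave[16];	/* IPoIB clone name; IFNAMSIZ is 16 in Linux */
	char mac[18];	/* "xx:xx:xx:xx:xx:xx" plus NUL */
	char vlan[8];	/* VLAN id as text, or "N/A" */
};

/* Returns 0 if 'line' matches the assumed layout, -1 otherwise. */
static int parse_vif_line(const char *line, struct vif_entry *e)
{
	assert(line != NULL && e != NULL);
	/* A space in the format matches any amount of white space, even none. */
	if (sscanf(line, " SLAVE=%15s MAC=%17s VLAN=%7s",
		   e->slave, e->mac, e->vlan) != 3)
		return -1;
	return 0;
}
```

The field widths (%15s, %17s, %7s) leave room for the terminating NUL in each buffer, so an over-long field is truncated rather than overflowing.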

Note: each ethX interface has at least one ibX.Y slave that serves the
PIF itself. In the VIF list of ethX you will notice that ibX.1 is always
created, to serve applications running in the hypervisor directly on top
of the ethX interface.

IB applications that require native IPoIB interfaces (e.g. RDMA-CM) can
still use the original ipoib interfaces ibX. For example, the RDMA-CM and
eth_ipoib drivers can co-exist and both make use of IPoIB.

The last patch of this series was made so that the series works as is
over net-next. In parallel to the submission of this driver, a patch was
posted that modifies IPoIB so that it no longer assumes a dst/neighbour
on the skb.

Or.


Erez Shitrit (8):
  include/linux: Add private flags for IPoIB interfaces
  IB/ipoib: Add support for acting as VIF
  net/eipoib: Add private header file
  net/eipoib: Add ethtool file support
  net/eipoib: Add sysfs support
  net/eipoib: Add main driver functionality
  net/eipoib: Add Makefile, Kconfig and MAINTAINERS entries
  IB/ipoib: Add support for transmission of skbs w.o dst/neighbour

Or Gerlitz (1):
  IB/ipoib: Add support for clones / multiple childs on the same partition

 Documentation/infiniband/ipoib.txt         |   23 +
 MAINTAINERS                                |    6 +
 drivers/infiniband/ulp/ipoib/ipoib.h       |   12 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c    |    9 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c    |    8 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   76 +-
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c  |   46 +-
 drivers/net/Kconfig                        |   15 +
 drivers/net/Makefile                       |    1 +
 drivers/net/eipoib/Makefile                |    4 +
 drivers/net/eipoib/eth_ipoib.h             |  227 ++++
 drivers/net/eipoib/eth_ipoib_ethtool.c     |  126 ++
 drivers/net/eipoib/eth_ipoib_main.c        | 1915 ++++++++++++++++++++++++++++
 drivers/net/eipoib/eth_ipoib_sysfs.c       |  640 ++++++++++
 include/linux/if.h                         |    2 +
 include/rdma/e_ipoib.h                     |   54 +
 17 files changed, 3134 insertions(+), 33 deletions(-)
 create mode 100644 drivers/net/eipoib/Makefile
 create mode 100644 drivers/net/eipoib/eth_ipoib.h
 create mode 100644 drivers/net/eipoib/eth_ipoib_ethtool.c
 create mode 100644 drivers/net/eipoib/eth_ipoib_main.c
 create mode 100644 drivers/net/eipoib/eth_ipoib_sysfs.c
 create mode 100644 include/rdma/e_ipoib.h


* [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-18 18:38   ` David Miller
  2012-07-18 10:59 ` [PATCH net-next V1 2/9] include/linux: Add private flags for IPoIB interfaces Or Gerlitz
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Or Gerlitz, Erez Shitrit

Allow creating "clone" child interfaces, which further partition an
IPoIB interface into sub-interfaces that use either the same pkey as
their parent or the same pkey as an already created child interface.

Each child now has a child index, which together with the pkey is
used as the identifier of the created network device.

All kinds of children are still created and deleted through sysfs, in
the same way as legacy child interfaces.
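The accepted input forms and the resulting interface names can be modeled in userspace. This is a sketch that mirrors parse_child() and the naming branches of ipoib_vlan_add() in the diff below; the function names are hypothetical, and the step in create_child() that ORs 0x8000 (full membership) into the pkey between parsing and naming is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Model of parse_child(): accepts "pkey", "pkey.index" or ".index".
 * Returns 0 and fills *pkey / *index, or -1 on bad input.
 */
static int parse_child_model(const char *buf, int parent_pkey,
			     int *pkey, int *index)
{
	int ret;

	assert(buf && pkey && index);
	ret = sscanf(buf, "%i.%i", pkey, index);
	if (ret == 1) {
		*index = 0;			/* legacy form: pkey only */
	} else if (ret != 2) {
		*pkey = parent_pkey;		/* ".index": parent's pkey */
		if (sscanf(buf, ".%i", index) != 1 || *index == 0)
			return -1;
	}
	if (*index < 0 || *index > 0xff || *pkey < 0 || *pkey > 0xffff)
		return -1;
	return 0;
}

/* Model of the naming branches in the patched ipoib_vlan_add(). */
static int child_name(char *out, size_t len, const char *parent,
		      int parent_pkey, int pkey, int index)
{
	if (pkey != parent_pkey && index == 0)		/* legacy child */
		snprintf(out, len, "%s.%04x", parent, pkey);
	else if (pkey != parent_pkey)			/* non-legacy child */
		snprintf(out, len, "%s.%04x.%d", parent, pkey, index);
	else if (index != 0)				/* same pkey child */
		snprintf(out, len, "%s.%d", parent, index);
	else
		return -1;	/* same pkey as the parent and index 0 */
	return 0;
}
```

For a parent ib0 with pkey 0xffff, "0x8001.1" yields ib0.8001.1, ".1" yields ib0.1 and "0x8001" yields the legacy name ib0.8001, matching the examples in the documentation hunk below.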

A major use case for clone children is virtualization, where a per-VM
NIC is desired at the hypervisor level, as in the solution provided by
the newly introduced Ethernet IPoIB driver.

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 Documentation/infiniband/ipoib.txt         |   23 +++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib.h       |    7 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   48 +++++++++++++++++++++-------
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c |    3 +-
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c  |   46 ++++++++++++++++++--------
 5 files changed, 98 insertions(+), 29 deletions(-)

diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 64eeb55..601f78f 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -24,6 +24,29 @@ Partitions and P_Keys
   The P_Key for any interface is given by the "pkey" file, and the
   main interface for a subinterface is in "parent."
 
+Clones
+  It's possible to further partition an IPoIB interface and create
+  "clone" child interfaces, which use either the same P_Key as their
+  parent or the same P_Key as an already created child interface. Each
+  child has a child index, which together with the P_Key is used as the
+  identifier of the created network device.
+
+  All kinds of children are created and deleted through sysfs, in the
+  same way as legacy child interfaces, for example:
+
+    echo 0x8001.1 > /sys/class/net/ib0/create_child
+
+  will create an interface named ib0.8001.1 with P_Key 0x8001 and index 1
+
+    echo .1 > /sys/class/net/ib0/create_child
+
+  will create an interface named ib0.1 with same P_Key as ib0 and index 1
+
+  To remove such an interface, use the "delete_child" file:
+
+    echo 0x8001.1 > /sys/class/net/ib0/delete_child
+    echo .1 > /sys/class/net/ib0/delete_child
+
 Datagram vs Connected modes
 
   The IPoIB driver supports two modes of operation: datagram and
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 86df632..a57db27 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -332,6 +332,7 @@ struct ipoib_dev_priv {
 	struct net_device *parent;
 	struct list_head child_intfs;
 	struct list_head list;
+	int child_index;
 
 #ifdef CONFIG_INFINIBAND_IPOIB_CM
 	struct ipoib_cm_dev_priv cm;
@@ -490,8 +491,10 @@ void ipoib_transport_dev_cleanup(struct net_device *dev);
 void ipoib_event(struct ib_event_handler *handler,
 		 struct ib_event *record);
 
-int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
-int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
+int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey,
+		   unsigned char child_index);
+int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey,
+		      unsigned char child_index);
 
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index bbee4b2..d0cb5cc 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1095,17 +1095,44 @@ int ipoib_add_umcast_attr(struct net_device *dev)
 	return device_create_file(&dev->dev, &dev_attr_umcast);
 }
 
+static int parse_child(struct device *dev, const char *buf, int *pkey,
+		       int *child_index)
+{
+	int ret;
+	struct ipoib_dev_priv *priv = netdev_priv(to_net_dev(dev));
+
+	*pkey = *child_index = -1;
+
+	/* 'pkey' or 'pkey.child_index' or '.child_index' are allowed */
+	ret = sscanf(buf, "%i.%i", pkey, child_index);
+	if (ret == 1)  /* just pkey, implicit child index is 0 */
+		*child_index = 0;
+	else  if (ret != 2) { /* pkey same as parent, specified child index */
+		*pkey = priv->pkey;
+		ret  = sscanf(buf, ".%i", child_index);
+		if (ret != 1 || *child_index == 0)
+			return -EINVAL;
+	}
+
+	if (*child_index < 0 || *child_index > 0xff)
+		return -EINVAL;
+
+	if (*pkey < 0 || *pkey > 0xffff)
+		return -EINVAL;
+
+	ipoib_dbg(priv, "parse_child inp %s out pkey %04x index %d\n",
+		buf, *pkey, *child_index);
+	return 0;
+}
+
 static ssize_t create_child(struct device *dev,
 			    struct device_attribute *attr,
 			    const char *buf, size_t count)
 {
-	int pkey;
+	int pkey, child_index;
 	int ret;
 
-	if (sscanf(buf, "%i", &pkey) != 1)
-		return -EINVAL;
-
-	if (pkey < 0 || pkey > 0xffff)
+	if (parse_child(dev, buf, &pkey, &child_index))
 		return -EINVAL;
 
 	/*
@@ -1114,7 +1141,7 @@ static ssize_t create_child(struct device *dev,
 	 */
 	pkey |= 0x8000;
 
-	ret = ipoib_vlan_add(to_net_dev(dev), pkey);
+	ret = ipoib_vlan_add(to_net_dev(dev), pkey, child_index);
 
 	return ret ? ret : count;
 }
@@ -1124,16 +1151,13 @@ static ssize_t delete_child(struct device *dev,
 			    struct device_attribute *attr,
 			    const char *buf, size_t count)
 {
-	int pkey;
+	int pkey, child_index;
 	int ret;
 
-	if (sscanf(buf, "%i", &pkey) != 1)
-		return -EINVAL;
-
-	if (pkey < 0 || pkey > 0xffff)
+	if (parse_child(dev, buf, &pkey, &child_index))
 		return -EINVAL;
 
-	ret = ipoib_vlan_delete(to_net_dev(dev), pkey);
+	ret = ipoib_vlan_delete(to_net_dev(dev), pkey, child_index);
 
 	return ret ? ret : count;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
index 049a997..2131772 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
@@ -167,7 +167,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca)
 			size += ipoib_recvq_size * ipoib_max_conn_qp;
 	}
 
-	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0);
+	priv->recv_cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size,
+				     priv->child_index % priv->ca->num_comp_vectors);
 	if (IS_ERR(priv->recv_cq)) {
 		printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name);
 		goto out_free_mr;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index d7e9740..2d35cb4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -49,7 +49,8 @@ static ssize_t show_parent(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(parent, S_IRUGO, show_parent, NULL);
 
-int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey,
+		unsigned char child_index)
 {
 	struct ipoib_dev_priv *ppriv, *priv;
 	char intf_name[IFNAMSIZ];
@@ -65,25 +66,40 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	mutex_lock(&ppriv->vlan_mutex);
 
 	/*
-	 * First ensure this isn't a duplicate. We check the parent device and
-	 * then all of the child interfaces to make sure the Pkey doesn't match.
+	 * First ensure this isn't a duplicate. We check all of the child
+	 * interfaces to make sure the Pkey AND the child index
+	 * don't match.
 	 */
-	if (ppriv->pkey == pkey) {
-		result = -ENOTUNIQ;
-		priv = NULL;
-		goto err;
-	}
-
 	list_for_each_entry(priv, &ppriv->child_intfs, list) {
-		if (priv->pkey == pkey) {
+		if (priv->pkey == pkey && priv->child_index == child_index) {
 			result = -ENOTUNIQ;
 			priv = NULL;
 			goto err;
 		}
 	}
 
-	snprintf(intf_name, sizeof intf_name, "%s.%04x",
-		 ppriv->dev->name, pkey);
+	priv = NULL;	/* the loop cursor above is not a valid entry here */
+	/*
+	 * For non-legacy and same pkey children we wanted the ibN.pkey:index
+	 * and ibN:index notations, but tools like ifconfig treat names with
+	 * ":" as aliases, which are restricted, e.g. w.r.t. counters.
+	 */
+	if (ppriv->pkey != pkey && child_index == 0) /* legacy child */
+		snprintf(intf_name, sizeof intf_name, "%s.%04x",
+			 ppriv->dev->name, pkey);
+	else if (ppriv->pkey != pkey && child_index != 0) /* non-legacy child */
+		snprintf(intf_name, sizeof intf_name, "%s.%04x.%d",
+			 ppriv->dev->name, pkey, child_index);
+	else if (ppriv->pkey == pkey && child_index != 0) /* same pkey child */
+		snprintf(intf_name, sizeof intf_name, "%s.%d",
+			 ppriv->dev->name, child_index);
+	else  {
+		ipoib_warn(ppriv, "wrong pkey/child_index pairing %04x %d\n",
+			   pkey, child_index);
+		result = -EINVAL;
+		goto err;
+	}
+
 	priv = ipoib_intf_alloc(intf_name);
 	if (!priv) {
 		result = -ENOMEM;
@@ -101,6 +117,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 		goto err;
 
 	priv->pkey = pkey;
+	priv->child_index = child_index;
 
 	memcpy(priv->dev->dev_addr, ppriv->dev->dev_addr, INFINIBAND_ALEN);
 	priv->dev->broadcast[8] = pkey >> 8;
@@ -157,7 +174,8 @@ err:
 	return result;
 }
 
-int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
+int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey,
+		unsigned char child_index)
 {
 	struct ipoib_dev_priv *ppriv, *priv, *tpriv;
 	struct net_device *dev = NULL;
@@ -171,7 +189,7 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 		return restart_syscall();
 	mutex_lock(&ppriv->vlan_mutex);
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
-		if (priv->pkey == pkey) {
+		if (priv->pkey == pkey && priv->child_index == child_index) {
 			unregister_netdevice(priv->dev);
 			ipoib_dev_cleanup(priv->dev);
 			list_del(&priv->list);
-- 
1.7.1


* [PATCH net-next V1 2/9] include/linux: Add private flags for IPoIB interfaces
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 3/9] IB/ipoib: Add support for acting as VIF Or Gerlitz
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

The two new bits indicate whether a device is considered a PIF, i.e.
one of the "main" interfaces (ib0, ib1, etc.), or a VIF, i.e. a cloned
interface (ib0.1, ib1.2, etc.) that is currently in use by the eIPoIB
driver.
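A consumer can classify a device by testing these bits. The flag values below are copied from the if.h hunk in this patch; the enum and `ipoib_role_of()` are hypothetical, and in the kernel the word being tested would be `dev->priv_flags`.

```c
#include <assert.h>

/* Values as added to include/linux/if.h by this patch. */
#define IFF_EIPOIB_PIF	0x200000	/* IPoIB PIF intf (ib0, ib1 etc.) */
#define IFF_EIPOIB_VIF	0x400000	/* IPoIB VIF intf (ib0.x, ib1.x etc.) */

_Static_assert((IFF_EIPOIB_PIF & IFF_EIPOIB_VIF) == 0,
	       "PIF and VIF must be distinct bits");

enum ipoib_role { IPOIB_ROLE_NONE, IPOIB_ROLE_PIF, IPOIB_ROLE_VIF };

/* Classify a device from its private flags word. */
static enum ipoib_role ipoib_role_of(unsigned int priv_flags)
{
	if (priv_flags & IFF_EIPOIB_PIF)
		return IPOIB_ROLE_PIF;
	if (priv_flags & IFF_EIPOIB_VIF)
		return IPOIB_ROLE_VIF;
	return IPOIB_ROLE_NONE;
}
```

Unrelated bits in the same word, such as the existing 0x100000 flag just above the new definitions, do not affect the result.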

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/linux/if.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/if.h b/include/linux/if.h
index 1ec407b..f50dbf2 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -84,6 +84,8 @@
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
 
+#define IFF_EIPOIB_PIF  0x200000       /* IPoIB PIF intf (ib0, ib1 etc.) */
+#define IFF_EIPOIB_VIF  0x400000       /* IPoIB VIF intf (ib0.x, ib1.x etc.) */
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
-- 
1.7.1


* [PATCH net-next V1 3/9] IB/ipoib: Add support for acting as VIF
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 2/9] include/linux: Add private flags for IPoIB interfaces Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 4/9] net/eipoib: Add private header file Or Gerlitz
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

When an IPoIB interface acts as a VIF for an eIPoIB interface, it uses
the skb cb storage area in the RX flow to pass information that the
upper-layer device can use.

One example is when an eIPoIB interface needs to generate a source MAC
for incoming Ethernet frames.

The IPoIB code checks the VIF private flag on the RX path and, when it
is set, fills in the skb cb data before passing the skb up the stack.
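The RX-side contract can be modeled in userspace to check the 20-byte budget and the QPN choice. This sketch mirrors the `rx` member of `struct eipoib_cb_data` and the logic of `set_skb_oob_cb_data()` from this patch; the model names are hypothetical, `struct napi_struct` is only an opaque stand-in, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct napi_struct;	/* opaque stand-in for the kernel type */

/* Mirrors the packed 'rx' member of struct eipoib_cb_data. */
struct eipoib_rx_model {
	uint32_t sqpn;
	struct napi_struct *napi;
	uint16_t slid;
	uint8_t data[6];
} __attribute__((packed));

/* 4 + 8 + 2 + 6 = 20 bytes on LP64, 16 on 32-bit targets. */
_Static_assert(sizeof(struct eipoib_rx_model) <= 20,
	       "rx area must fit in the last 20 bytes of skb->cb[]");

/*
 * Model of set_skb_oob_cb_data(): record the source LID and QPN of a
 * receive completion; in CM mode (cm_base_qpn != NULL) report the
 * peer's base QPN instead of the per-connection source QP.
 */
static void fill_rx_model(struct eipoib_rx_model *rx, uint16_t slid,
			  uint32_t src_qp, const uint32_t *cm_base_qpn,
			  struct napi_struct *napi)
{
	assert(rx != NULL);
	rx->slid = slid;
	rx->sqpn = cm_base_qpn ? *cm_base_qpn : src_qp;
	rx->napi = napi;
}
```

Reporting the base QPN in CM mode lets the upper layer derive the same source identity for a peer whether its packets arrive over datagram or connected mode.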

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |    5 +++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |    9 +++++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |    8 ++++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   21 +++++++++++
 include/rdma/e_ipoib.h                    |   54 +++++++++++++++++++++++++++++
 5 files changed, 96 insertions(+), 1 deletions(-)
 create mode 100644 include/rdma/e_ipoib.h

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index a57db27..0416e8f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -52,6 +52,7 @@
 #include <rdma/ib_pack.h>
 #include <rdma/ib_sa.h>
 #include <linux/sched.h>
+#include <rdma/e_ipoib.h>
 
 /* constants */
 
@@ -209,6 +210,7 @@ struct ipoib_cm_rx {
 	unsigned long		jiffies;
 	enum ipoib_cm_state	state;
 	int			recv_count;
+	u32			qpn;
 };
 
 struct ipoib_cm_tx {
@@ -695,6 +697,9 @@ extern int ipoib_recvq_size;
 
 extern struct ib_sa_client ipoib_sa_client;
 
+void set_skb_oob_cb_data(struct sk_buff *skb, struct ib_wc *wc,
+			 struct napi_struct *napi);
+
 #ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
 extern int ipoib_debug_level;
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 1ca7322..6042905 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -440,6 +440,7 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	struct net_device *dev = cm_id->context;
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	struct ipoib_cm_rx *p;
+	struct ipoib_cm_data *data = event->private_data;
 	unsigned psn;
 	int ret;
 
@@ -452,6 +453,10 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
 	cm_id->context = p;
 	p->state = IPOIB_CM_RX_LIVE;
 	p->jiffies = jiffies;
+
+	/* used to keep track of base qpn in CM mode */
+	p->qpn = be32_to_cpu(data->qpn);
+
 	INIT_LIST_HEAD(&p->list);
 
 	p->qp = ipoib_cm_create_rx_qp(dev, p);
@@ -669,6 +674,10 @@ copied:
 	skb->dev = dev;
 	/* XXX get correct PACKET_ type here */
 	skb->pkt_type = PACKET_HOST;
+	/* if handler is registered on top of ipoib, set skb oob data. */
+	if (skb->dev->priv_flags & IFF_EIPOIB_VIF)
+		set_skb_oob_cb_data(skb, wc, NULL);
+
 	netif_receive_skb(skb);
 
 repost:
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f10221f..f248e6e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -304,7 +304,13 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
 			likely(wc->wc_flags & IB_WC_IP_CSUM_OK))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-	napi_gro_receive(&priv->napi, skb);
+	/* if handler is registered on top of ipoib, set skb oob data */
+	if (dev->priv_flags & IFF_EIPOIB_VIF) {
+		set_skb_oob_cb_data(skb, wc, &priv->napi);
+		/* the registered handler will take care of the skb */
+		netif_receive_skb(skb);
+	} else
+		napi_gro_receive(&priv->napi, skb);
 
 repost:
 	if (unlikely(ipoib_ib_post_receive(dev, wr_id)))
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index d0cb5cc..8575fa7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -91,6 +91,24 @@ static struct ib_client ipoib_client = {
 	.remove = ipoib_remove_one
 };
 
+void set_skb_oob_cb_data(struct sk_buff *skb, struct ib_wc *wc,
+			 struct napi_struct *napi)
+{
+	struct ipoib_cm_rx *p_cm_ctx = NULL;
+	struct eipoib_cb_data *data = NULL;
+
+	p_cm_ctx = wc->qp->qp_context;
+	data = IPOIB_HANDLER_CB(skb);
+
+	data->rx.slid = wc->slid;
+	data->rx.sqpn = wc->src_qp;
+	data->rx.napi = napi;
+
+	/* in CM mode, use the "base" qpn as sqpn */
+	if (p_cm_ctx)
+		data->rx.sqpn = p_cm_ctx->qpn;
+}
+
 int ipoib_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -1277,6 +1295,9 @@ static struct net_device *ipoib_add_port(const char *format,
 		goto event_failed;
 	}
 
+	/* indicates pif port */
+	priv->dev->priv_flags |= IFF_EIPOIB_PIF;
+
 	result = register_netdev(priv->dev);
 	if (result) {
 		printk(KERN_WARNING "%s: couldn't register ipoib port %d; error %d\n",
diff --git a/include/rdma/e_ipoib.h b/include/rdma/e_ipoib.h
new file mode 100644
index 0000000..7249334
--- /dev/null
+++ b/include/rdma/e_ipoib.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * openfabric.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _LINUX_ETH_IB_IPOIB_H
+#define _LINUX_ETH_IB_IPOIB_H
+
+#include <net/sch_generic.h>
+
+struct eipoib_cb_data {
+	/*
+	 * extra care taken not to collide with the usage done
+	 * by the qdisc layer in struct skb cb data.
+	 */
+	struct qdisc_skb_cb	qdisc_cb;
+	struct { /* must be <= 20 bytes */
+		u32 sqpn;
+		struct napi_struct *napi;
+		u16 slid;
+		u8 data[6];
+	} __packed rx;
+};
+
+#define IPOIB_HANDLER_CB(skb) ((struct eipoib_cb_data *)(skb)->cb)
+
+#endif /* _LINUX_ETH_IB_IPOIB_H */
-- 
1.7.1


* [PATCH net-next V1 4/9] net/eipoib: Add private header file
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (2 preceding siblings ...)
  2012-07-18 10:59 ` [PATCH net-next V1 3/9] IB/ipoib: Add support for acting as VIF Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-18 10:59 ` [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support Or Gerlitz
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

The header file contains the structures, macros and prototypes of the
non-static functions used by the driver.
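As an illustration of one of these macros, IS_E_IPOIB_PROTO() accepts only the ARP, RARP and IPv4 EtherTypes, presumably the protocols the driver knows how to carry over IPoIB. The function form below is a userspace sketch; the name `is_e_ipoib_proto()` is hypothetical, and the EtherType constants are defined locally with their standard values.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EtherType values, as defined in <linux/if_ether.h>. */
#ifndef ETH_P_IP
#define ETH_P_IP	0x0800
#endif
#ifndef ETH_P_ARP
#define ETH_P_ARP	0x0806
#endif
#ifndef ETH_P_RARP
#define ETH_P_RARP	0x8035
#endif

/*
 * Function form of IS_E_IPOIB_PROTO(): 'proto' is an EtherType in
 * network byte order, as stored in skb->protocol or an Ethernet header.
 */
static bool is_e_ipoib_proto(uint16_t proto)
{
	return proto == htons(ETH_P_ARP) ||
	       proto == htons(ETH_P_RARP) ||
	       proto == htons(ETH_P_IP);
}
```

IPv6 (0x86DD) is not in the set, so such frames would not satisfy the macro.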

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/eipoib/eth_ipoib.h |  227 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 227 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/eipoib/eth_ipoib.h

diff --git a/drivers/net/eipoib/eth_ipoib.h b/drivers/net/eipoib/eth_ipoib.h
new file mode 100644
index 0000000..408cef5
--- /dev/null
+++ b/drivers/net/eipoib/eth_ipoib.h
@@ -0,0 +1,227 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * openfabric.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _LINUX_ETH_IPOIB_H
+#define _LINUX_ETH_IPOIB_H
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/arp.h>
+#include <linux/if_vlan.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <linux/if_infiniband.h>
+#include <rdma/ib_verbs.h>
+
+#include <rdma/e_ipoib.h>
+
+/* macros and definitions */
+#define DRV_VERSION		"1.0.0"
+#define DRV_RELDATE		"June 1, 2012"
+#define DRV_NAME		"eth_ipoib"
+#define SDRV_NAME		"ipoib"
+#define DRV_DESCRIPTION		"IP-over-InfiniBand Para Virtualized Driver"
+#define EIPOIB_ABI_VER	1
+
+#undef  pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#define GID_LEN			16
+#define GUID_LEN		8
+
+#define PARENT_VLAN_FEATURES \
+	(NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX | \
+	 NETIF_F_HW_VLAN_FILTER)
+
+#define parent_for_each_slave(_parent, slave)		\
+		list_for_each_entry(slave, &(_parent)->slave_list, list)
+
+#define PARENT_IS_OK(_parent)				\
+		(((_parent)->dev->flags & IFF_UP) &&	\
+		 netif_running((_parent)->dev)    &&	\
+		 ((_parent)->slave_cnt > 0))
+
+#define IS_E_IPOIB_PROTO(_proto)			\
+		 (((_proto) == htons(ETH_P_ARP)) ||	\
+		 ((_proto) == htons(ETH_P_RARP)) ||	\
+		 ((_proto) == htons(ETH_P_IP)))
+
+enum eipoib_emac_guest_info {
+	VALID,
+	MIGRATED_OUT,
+	INVALID,
+};
+
+/* structs */
+struct eth_arp_data {
+	u8 arp_sha[ETH_ALEN];
+	__be32 arp_sip;
+	u8 arp_dha[ETH_ALEN];
+	__be32 arp_dip;
+} __packed;
+
+struct ipoib_arp_data {
+	u8 arp_sha[INFINIBAND_ALEN];
+	__be32 arp_sip;
+	u8 arp_dha[INFINIBAND_ALEN];
+	__be32 arp_dip;
+} __packed;
+
+/* live migration support structures: */
+struct ip_member {
+	__be32 ip;
+	struct list_head list;
+};
+
+/*
+ * For each slave (emac), save all the IPs used over that MAC.
+ * the parent keeps that list for live migration.
+ */
+struct guest_emac_info {
+	u8 emac[ETH_ALEN];
+	u16 vlan;
+	struct list_head ip_list;
+	struct list_head list;
+	enum eipoib_emac_guest_info rec_state;
+	int num_of_retries;
+};
+
+struct neigh {
+	struct list_head list;
+	u8 emac[ETH_ALEN];
+	u8 imac[INFINIBAND_ALEN];
+	/* this part is used for neigh_add_list */
+	char cmd[PAGE_SIZE];
+};
+
+struct slave {
+	struct net_device *dev;
+	struct slave *next;
+	struct slave *prev;
+	int    index;
+	struct list_head list;
+	unsigned long jiffies;
+	s8     link;
+	s8     state;
+	u16    pkey;
+	u16    vlan;
+	u8     emac[ETH_ALEN];
+	u8     imac[INFINIBAND_ALEN];
+	struct list_head neigh_list;
+	/* this part is used for vif_add_list */
+	char cmd[PAGE_SIZE];
+};
+
+struct port_stats {
+	/* update PORT_STATS_LEN (number of stat fields) accordingly */
+	unsigned long tx_parent_dropped;
+	unsigned long tx_vif_miss;
+	unsigned long tx_neigh_miss;
+	unsigned long tx_vlan;
+	unsigned long tx_shared;
+	unsigned long tx_proto_errors;
+	unsigned long tx_skb_errors;
+	unsigned long tx_slave_err;
+
+	unsigned long rx_parent_dropped;
+	unsigned long rx_vif_miss;
+	unsigned long rx_neigh_miss;
+	unsigned long rx_vlan;
+	unsigned long rx_shared;
+	unsigned long rx_proto_errors;
+	unsigned long rx_skb_errors;
+	unsigned long rx_slave_err;
+};
+
+struct parent {
+	struct   net_device *dev;
+	int      index;
+	struct   neigh_parms nparms;
+	struct   list_head slave_list;
+	/* never change this value outside the attach/detach wrappers */
+	s32      slave_cnt;
+	rwlock_t lock;
+	struct   net_device_stats stats;
+	struct   port_stats port_stats;
+	struct   list_head parent_list;
+	struct   dev_mc_list *mc_list;
+	u16      flags;
+	struct   list_head vlan_list;
+	struct   workqueue_struct *wq;
+	s8       kill_timers;
+	struct   delayed_work neigh_learn_work;
+	struct   delayed_work vif_learn_work;
+	struct   list_head neigh_add_list;
+	union    ib_gid gid;
+	char     ipoib_main_interface[IFNAMSIZ];
+	struct   list_head emac_ip_list;
+	struct   delayed_work emac_ip_work;
+	struct   delayed_work migrate_out_work;
+};
+
+#define eipoib_slave_get_rcu(dev) \
+	((struct slave *) rcu_dereference(dev->rx_handler_data))
+
+/* name space support for sys/fs */
+struct eipoib_net {
+	struct net	*net;	/* Associated network namespace */
+	struct class_attribute class_attr_eipoib_interfaces;
+};
+
+/* exported from main.c */
+extern int eipoib_net_id;
+extern struct list_head parent_dev_list;
+
+/* function prototypes */
+int mod_create_sysfs(struct eipoib_net *eipoib_n);
+void mod_destroy_sysfs(struct eipoib_net *eipoib_n);
+void parent_destroy_sysfs_entry(struct parent *parent);
+int parent_create_sysfs_entry(struct parent *parent);
+int create_slave_symlinks(struct net_device *master,
+			  struct net_device *slave);
+void destroy_slave_symlinks(struct net_device *master,
+			    struct net_device *slave);
+int parent_enslave(struct net_device *parent_dev,
+		   struct net_device *slave_dev);
+int parent_release_slave(struct net_device *parent_dev,
+			 struct net_device *slave_dev);
+struct neigh *parent_get_neigh_cmd(char op, char *ifname,
+				   u8 *remac, u8 *rimac);
+struct slave *parent_get_vif_cmd(char op, char *ifname, u8 *lemac);
+ssize_t __parent_store_neighs(struct device *d,
+			      struct device_attribute *attr,
+			      const char *buffer, size_t count);
+void parent_set_ethtool_ops(struct net_device *dev);
+
+#endif /* _LINUX_ETH_IPOIB_H */
-- 
1.7.1

* [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (3 preceding siblings ...)
  2012-07-18 10:59 ` [PATCH net-next V1 4/9] net/eipoib: Add private header file Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-18 18:37   ` Ben Hutchings
  2012-07-18 10:59 ` [PATCH net-next V1 6/9] net/eipoib: Add sysfs support Or Gerlitz
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

Through ethtool, the driver reports its version, its ABI version, the
IPoIB interface it runs over, and various statistics.
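The statistics depend on parent_strings[] and struct port_stats staying the same length and in the same order, and the header only asks that PORT_STATS_LEN be updated by hand. A minimal userspace sketch (all names are hypothetical and only the TX half is mirrored) of how a compile-time check could catch a mismatch; inside the driver, BUILD_BUG_ON() would play the same role:

```c
#include <assert.h>
#include <stddef.h>

#define ETH_GSTRING_LEN 32 /* same value as in <linux/ethtool.h> */

/* Hypothetical userspace mirror of the TX half of struct port_stats. */
struct port_stats_sketch {
	unsigned long tx_parent_dropped;
	unsigned long tx_vif_miss;
	unsigned long tx_neigh_miss;
	unsigned long tx_vlan;
};

/* Names must appear in the same order as the fields above. */
static const char stat_names[][ETH_GSTRING_LEN] = {
	"tx_parent_dropped",
	"tx_vif_miss",
	"tx_neigh_miss",
	"tx_vlan",
};

#define N_STATS (sizeof(stat_names) / ETH_GSTRING_LEN)

/* Break the build if a counter is added without a name, or vice versa. */
_Static_assert(N_STATS ==
	       sizeof(struct port_stats_sketch) / sizeof(unsigned long),
	       "stat_names[] and struct port_stats_sketch are out of sync");

/*
 * Copy the counters in declaration order, the way
 * parent_get_ethtool_stats() does; this assumes every field is an
 * unsigned long, which the static assertion above helps enforce.
 */
static void fill_stats(const struct port_stats_sketch *ps,
		       unsigned long long *data)
{
	const unsigned long *fields = (const unsigned long *)ps;
	size_t i;

	for (i = 0; i < N_STATS; i++)
		data[i] = fields[i];
}
```

With this layout, entry i of the ethtool -S output pairs stat_names[i] with data[i].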

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/eipoib/eth_ipoib_ethtool.c |  126 ++++++++++++++++++++++++++++++++
 1 files changed, 126 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/eipoib/eth_ipoib_ethtool.c

diff --git a/drivers/net/eipoib/eth_ipoib_ethtool.c b/drivers/net/eipoib/eth_ipoib_ethtool.c
new file mode 100644
index 0000000..cd6ed91
--- /dev/null
+++ b/drivers/net/eipoib/eth_ipoib_ethtool.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * openfabric.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "eth_ipoib.h"
+
+static void parent_ethtool_get_drvinfo(struct net_device *parent_dev,
+				       struct ethtool_drvinfo *drvinfo)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
+
+	strlcpy(drvinfo->version, DRV_VERSION, sizeof(drvinfo->version));
+
+	strlcpy(drvinfo->bus_info, parent->ipoib_main_interface,
+		sizeof(drvinfo->bus_info));
+
+	/* indicates ABI version */
+	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "%d",
+		 EIPOIB_ABI_VER);
+}
+
+static const char parent_strings[][ETH_GSTRING_LEN] = {
+	/* private statistics */
+	"tx_parent_dropped",
+	"tx_vif_miss",
+	"tx_neigh_miss",
+	"tx_vlan",
+	"tx_shared",
+	"tx_proto_errors",
+	"tx_skb_errors",
+	"tx_slave_err",
+
+	"rx_parent_dropped",
+	"rx_vif_miss",
+	"rx_neigh_miss",
+	"rx_vlan",
+	"rx_shared",
+	"rx_proto_errors",
+	"rx_skb_errors",
+	"rx_slave_err",
+};
+#define PORT_STATS_LEN	(8 * 2)
+
+#define PARENT_STATS_LEN (sizeof(parent_strings) / ETH_GSTRING_LEN)
+
+static void parent_get_strings(struct net_device *parent_dev,
+			       uint32_t stringset, uint8_t *data)
+{
+	if (stringset != ETH_SS_STATS)
+		return;
+
+	/* every entry is a zero-padded ETH_GSTRING_LEN slot */
+	memcpy(data, parent_strings, sizeof(parent_strings));
+}
+
+static void parent_get_ethtool_stats(struct net_device *parent_dev,
+				     struct ethtool_stats *stats,
+				     uint64_t *data)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	int index = 0, i;
+
+	read_lock_bh(&parent->lock);
+
+	for (i = 0; i < PORT_STATS_LEN; i++)
+		data[index++] = ((unsigned long *) &parent->port_stats)[i];
+
+	read_unlock_bh(&parent->lock);
+}
+
+static int parent_get_sset_count(struct net_device *parent_dev, int sset)
+{
+	switch (sset) {
+	case ETH_SS_STATS:
+		return PARENT_STATS_LEN;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static const struct ethtool_ops parent_ethtool_ops = {
+	.get_drvinfo		= parent_ethtool_get_drvinfo,
+	.get_strings		= parent_get_strings,
+	.get_ethtool_stats	= parent_get_ethtool_stats,
+	.get_sset_count		= parent_get_sset_count,
+	.get_link		= ethtool_op_get_link,
+};
+
+void parent_set_ethtool_ops(struct net_device *dev)
+{
+	SET_ETHTOOL_OPS(dev, &parent_ethtool_ops);
+}
-- 
1.7.1

* [PATCH net-next V1 6/9] net/eipoib: Add sysfs support
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (4 preceding siblings ...)
  2012-07-18 10:59 ` [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support Or Gerlitz
@ 2012-07-18 10:59 ` Or Gerlitz
  2012-07-23 12:55   ` Or Gerlitz
  2012-07-18 11:00 ` [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality Or Gerlitz
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 10:59 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

The driver's management interface is based on sysfs entries, through which
the driver learns about new VIFs to manage. The driver can enslave a new
VIF (an IPoIB cloned interface) or release it.
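From user space, each management command is a single write of a short "(-|+)ifname [mac]" string to a sysfs attribute. A minimal C sketch of a helper a management tool might use, assuming the attribute path is passed in (the helper name eipoib_sysfs_cmd() is hypothetical, not part of the driver):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one "(-|+)ifname [mac]" command to an
 * eIPoIB sysfs attribute (e.g. .../eth/slaves or .../eth/vifs) in a
 * single write(). Returns 0 on success or a negative errno value.
 */
static int eipoib_sysfs_cmd(const char *path, char op, const char *ifname,
			    const char *mac)
{
	char cmd[64];
	ssize_t n;
	int fd, len, err = 0;

	/* The driver only accepts '+' (add) and '-' (remove). */
	if ((op != '+' && op != '-') || !ifname || !*ifname ||
	    strlen(ifname) >= IF_NAMESIZE)
		return -EINVAL;

	if (mac)
		len = snprintf(cmd, sizeof(cmd), "%c%s %s", op, ifname, mac);
	else
		len = snprintf(cmd, sizeof(cmd), "%c%s", op, ifname);
	if (len < 0 || (size_t)len >= sizeof(cmd))
		return -EINVAL;

	/* sysfs attributes already exist, so no O_CREAT. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, cmd, (size_t)len);
	if (n < 0)
		err = -errno;	/* e.g. EINVAL returned by the store handler */
	else if (n != len)
		err = -EIO;

	close(fd);
	return err;
}
```

For example, eipoib_sysfs_cmd("/sys/class/net/eth4/eth/slaves", '+', "ib0.1", NULL) would be the equivalent of "echo +ib0.1 > /sys/class/net/eth4/eth/slaves".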

Here are a few sysfs commands used to manage the driver, grouped by
scenario:

1. Create a new clone of an IPoIB interface:

	$ echo .Y > /sys/class/net/ibX/create_child

This creates a new clone ibX.Y with the same pkey as ibX. For example:

	$ echo .1 > /sys/class/net/ib0/create_child

creates the new interface ib0.1.

2. Notify the parent interface of a new VIF to enslave:

	$ echo +ibX.Y > /sys/class/net/ethZ/eth/slaves

where ethZ is the driver interface. For example:

	$ echo +ib0.1 > /sys/class/net/eth4/eth/slaves

enslaves ib0.1 to eth4.

3. Notify the parent interface of the VIF details (its MAC address):

	$ echo +ibX.Y <MAC address> > /sys/class/net/ethZ/eth/vifs

For example:

	$ echo +ib0.1 00:02:c9:43:3b:f1 > /sys/class/net/eth4/eth/vifs

4. Notify the parent interface to release a VIF:

	$ echo -ibX.Y > /sys/class/net/ethZ/eth/slaves

where ethZ is the driver interface. For example:

	$ echo -ib0.1 > /sys/class/net/eth4/eth/slaves

releases ib0.1 from eth4.

5. List the IPoIB interfaces enslaved under an eIPoIB interface:

	$ cat /sys/class/net/ethX/eth/vifs

For example:

	$ cat /sys/class/net/eth4/eth/vifs

	SLAVE=ib0.1      MAC=9a:c2:1f:d7:3b:63 VLAN=N/A
	SLAVE=ib0.2      MAC=52:54:00:60:55:88 VLAN=N/A
	SLAVE=ib0.3      MAC=52:54:00:60:55:89 VLAN=N/A
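The vifs listing is line oriented, so a management tool can read it back with a bounded sscanf(). A minimal sketch, assuming the exact "SLAVE= MAC= VLAN=" layout printed by parent_show_vifs(), where "N/A" stands for an unset MAC or VLAN (struct vif_entry and parse_vif_line() are hypothetical names):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /sys/class/net/ethX/eth/vifs (hypothetical type). */
struct vif_entry {
	char slave[17];	/* IFNAMSIZ bytes, NUL included */
	char mac[18];	/* "xx:xx:xx:xx:xx:xx" or "N/A" */
	int vlan;	/* -1 when the driver prints VLAN=N/A */
};

/*
 * Parse a line such as
 * "SLAVE=ib0.1      MAC=9a:c2:1f:d7:3b:63 VLAN=N/A".
 * Returns 0 on success, -1 if the line does not match the layout.
 */
static int parse_vif_line(const char *line, struct vif_entry *e)
{
	char vlan[8];

	/* Field widths keep every %s inside its buffer. */
	if (sscanf(line, "SLAVE=%16s MAC=%17s VLAN=%7s",
		   e->slave, e->mac, vlan) != 3)
		return -1;

	if (strcmp(vlan, "N/A") == 0) {
		e->vlan = -1;
		return 0;
	}

	return sscanf(vlan, "%d", &e->vlan) == 1 ? 0 : -1;
}
```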

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/eipoib/eth_ipoib_sysfs.c |  640 ++++++++++++++++++++++++++++++++++
 1 files changed, 640 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/eipoib/eth_ipoib_sysfs.c

diff --git a/drivers/net/eipoib/eth_ipoib_sysfs.c b/drivers/net/eipoib/eth_ipoib_sysfs.c
new file mode 100644
index 0000000..c3fc121
--- /dev/null
+++ b/drivers/net/eipoib/eth_ipoib_sysfs.c
@@ -0,0 +1,640 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * openfabric.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/in.h>
+#include <linux/sysfs.h>
+#include <linux/ctype.h>
+#include <linux/inet.h>
+#include <linux/rtnetlink.h>
+#include <linux/etherdevice.h>
+#include <net/net_namespace.h>
+
+#include "eth_ipoib.h"
+
+#define to_dev(obj)	container_of(obj, struct device, kobj)
+#define to_parent(cd)	((struct parent *)(netdev_priv(to_net_dev(cd))))
+#define MOD_NA_STRING		"N/A"
+
+#define _sprintf(p, buf, format, arg...)				\
+((PAGE_SIZE - (int)(p - buf)) <= 0 ? 0 :				\
+	scnprintf(p, PAGE_SIZE - (int)(p - buf), format, ## arg))
+
+#define _end_of_line(_p, _buf)					\
+do { if (_p - _buf) /* eat the leftover space */			\
+		_buf[_p - _buf - 1] = '\n';				\
+} while (0)
+
+/* helper functions */
+static int get_emac(u8 *mac, char *s)
+{
+	if (sscanf(s, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+		   mac + 0, mac + 1, mac + 2, mac + 3, mac + 4,
+		   mac + 5) != 6)
+		return -1;
+
+	return 0;
+}
+
+static int get_imac(u8 *mac, char *s)
+{
+	if (sscanf(s, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:"
+		   "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:"
+		   "%hhx:%hhx:%hhx:%hhx",
+		   mac + 0, mac + 1, mac + 2, mac + 3, mac + 4,
+		   mac + 5, mac + 6, mac + 7, mac + 8, mac + 9,
+		   mac + 10, mac + 11, mac + 12, mac + 13,
+		   mac + 14, mac + 15, mac + 16, mac + 17,
+		   mac + 18, mac + 19) != 20)
+		return -1;
+
+	return 0;
+}
+
+/* show/store functions per module (CLASS_ATTR) */
+static ssize_t show_parents(struct class *cls, struct class_attribute *attr,
+			    char *buf)
+{
+	char *p = buf;
+	struct parent *parent;
+
+	rtnl_lock(); /* because of parent_dev_list */
+
+	list_for_each_entry(parent, &parent_dev_list, parent_list) {
+		p += _sprintf(p, buf, "%s over IB port: %s\n",
+			      parent->dev->name,
+			      parent->ipoib_main_interface);
+	}
+	_end_of_line(p, buf);
+
+	rtnl_unlock();
+	return (ssize_t)(p - buf);
+}
+
+/* show/store functions per parent (DEVICE_ATTR) */
+static ssize_t parent_show_neighs(struct device *d,
+				  struct device_attribute *attr, char *buf)
+{
+	struct slave *slave;
+	struct neigh *neigh;
+	struct parent *parent = to_parent(d);
+	char *p = buf;
+
+	read_lock_bh(&parent->lock);
+	parent_for_each_slave(parent, slave) {
+		list_for_each_entry(neigh, &slave->neigh_list, list) {
+			p += _sprintf(p, buf, "SLAVE=%-10s EMAC=%pM IMAC=%pM:%pM:%pM:%.2x:%.2x\n",
+				      slave->dev->name,
+				      neigh->emac,
+				      neigh->imac, neigh->imac + 6, neigh->imac + 12,
+				      neigh->imac[18], neigh->imac[19]);
+		}
+	}
+
+	read_unlock_bh(&parent->lock);
+
+	_end_of_line(p, buf);
+
+	return (ssize_t)(p - buf);
+}
+
+struct neigh *parent_get_neigh_cmd(char op,
+				   char *ifname, u8 *remac, u8 *rimac)
+{
+	struct neigh *neigh_cmd;
+
+	neigh_cmd = kzalloc(sizeof *neigh_cmd, GFP_ATOMIC);
+	if (!neigh_cmd) {
+		pr_err("%s cannot allocate neigh struct\n", ifname);
+		goto out;
+	}
+
+	/*
+	 * populate emac field so it can be used easily
+	 * in neigh_cmd_find_by_mac()
+	 */
+	memcpy(neigh_cmd->emac, remac, ETH_ALEN);
+	memcpy(neigh_cmd->imac, rimac, INFINIBAND_ALEN);
+
+	/* prepare the command as a string */
+	sprintf(neigh_cmd->cmd, "%c%s %pM %pM:%pM:%pM:%.2x:%.2x",
+		op, ifname, remac, rimac, rimac + 6, rimac + 12, rimac[18], rimac[19]);
+out:
+	return neigh_cmd;
+}
+
+/* write_lock_bh(&parent->lock) must be held */
+ssize_t __parent_store_neighs(struct device *d,
+			      struct device_attribute *attr,
+			      const char *buffer, size_t count)
+{
+	char command[IFNAMSIZ + 1] = { 0, };
+	char emac_str[ETH_ALEN * 3] = { 0, };
+	u8 emac[ETH_ALEN];
+	char imac_str[INFINIBAND_ALEN * 3] = { 0, };
+	u8 imac[INFINIBAND_ALEN];
+	char *ifname;
+	int found = 0, ret = count;
+	struct slave *slave = NULL, *slave_tmp;
+	struct neigh *neigh;
+	struct parent *parent = to_parent(d);
+
+	sscanf(buffer, "%16s %17s %59s", command, emac_str, imac_str);
+
+	/* check ifname */
+	ifname = command + 1;
+	if ((strlen(command) <= 1) || !dev_valid_name(ifname) ||
+	    (command[0] != '+' && command[0] != '-'))
+		goto err_no_cmd;
+
+	/* check if ifname exist */
+	parent_for_each_slave(parent, slave_tmp) {
+		if (!strcmp(slave_tmp->dev->name, ifname)) {
+			found = 1;
+			slave = slave_tmp;
+		}
+	}
+
+	if (!found) {
+		pr_err("%s could not find slave\n", ifname);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (get_emac(emac, emac_str)) {
+		pr_err("%s bad emac %s\n", ifname, emac_str);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (get_imac(imac, imac_str)) {
+		pr_err("%s bad imac %s\n", ifname, imac_str);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* process command */
+	if (command[0] == '+') {
+		found = 0;
+		list_for_each_entry(neigh, &slave->neigh_list, list) {
+			if (!memcmp(neigh->emac, emac, ETH_ALEN))
+				found = 1;
+		}
+
+		if (found) {
+			pr_err("%s: cannot update neigh, slave already has "
+			       "this neigh mac %pM\n",
+			       slave->dev->name, emac);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		neigh = kzalloc(sizeof *neigh, GFP_ATOMIC);
+		if (!neigh) {
+			pr_err("%s cannot allocate neigh struct\n",
+			       slave->dev->name);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		/* ready to go */
+		pr_info("%s: slave %s neigh mac is set to %pM\n",
+			parent->dev->name, ifname, emac);
+		memcpy(neigh->emac, emac, ETH_ALEN);
+		memcpy(neigh->imac, imac, INFINIBAND_ALEN);
+
+		list_add_tail(&neigh->list, &slave->neigh_list);
+
+		goto out;
+	}
+
+	if (command[0] == '-') {
+		found = 0;
+		list_for_each_entry(neigh, &slave->neigh_list, list) {
+			if (!memcmp(neigh->emac, emac, ETH_ALEN)) {
+				found = 1;
+				break;
+			}
+		}
+
+		if (!found) {
+			pr_err("%s cannot delete neigh mac %pM\n",
+			       ifname, emac);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		list_del(&neigh->list);
+		kfree(neigh);
+
+		goto out;
+	}
+
+err_no_cmd:
+	pr_err("%s USAGE: (-|+)ifname emac imac\n", DRV_NAME);
+	ret = -EPERM;
+
+out:
+	return ret;
+}
+
+static ssize_t parent_store_neighs(struct device *d,
+				   struct device_attribute *attr,
+				   const char *buffer, size_t count)
+{
+	struct parent *parent = to_parent(d);
+	ssize_t rc;
+
+	write_lock_bh(&parent->lock);
+	rc = __parent_store_neighs(d, attr, buffer, count);
+	write_unlock_bh(&parent->lock);
+
+	return rc;
+}
+
+static DEVICE_ATTR(neighs, S_IRUGO | S_IWUSR, parent_show_neighs,
+		   parent_store_neighs);
+
+static ssize_t parent_show_vifs(struct device *d,
+				struct device_attribute *attr, char *buf)
+{
+	struct slave *slave;
+	struct parent *parent = to_parent(d);
+	char *p = buf;
+
+	read_lock_bh(&parent->lock);
+	parent_for_each_slave(parent, slave) {
+		if (is_zero_ether_addr(slave->emac)) {
+			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%-17s "
+				      "VLAN=%s\n", slave->dev->name,
+				      MOD_NA_STRING, MOD_NA_STRING);
+		} else if (slave->vlan == VLAN_N_VID) {
+			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%pM VLAN=%s\n",
+				      slave->dev->name,
+				      slave->emac,
+				      MOD_NA_STRING);
+		} else {
+			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%pM VLAN=%d\n",
+				      slave->dev->name,
+				      slave->emac,
+				      slave->vlan);
+		}
+	}
+	read_unlock_bh(&parent->lock);
+
+	_end_of_line(p, buf);
+
+	return (ssize_t)(p - buf);
+}
+
+static ssize_t parent_store_vifs(struct device *d,
+				 struct device_attribute *attr,
+				 const char *buffer, size_t count)
+{
+	char command[IFNAMSIZ + 1] = { 0, };
+	char mac_str[ETH_ALEN * 3] = { 0, };
+	char *ifname;
+	u8 mac[ETH_ALEN];
+	int found = 0, ret = count;
+	struct slave *slave = NULL, *slave_tmp;
+	struct parent *parent = to_parent(d);
+
+	sscanf(buffer, "%16s %17s", command, mac_str);
+
+	write_lock_bh(&parent->lock);
+
+	/* check ifname */
+	ifname = command + 1;
+	if ((strlen(command) <= 1) || !dev_valid_name(ifname) ||
+	    (command[0] != '+' && command[0] != '-'))
+		goto err_no_cmd;
+
+	/* check if ifname exist */
+	parent_for_each_slave(parent, slave_tmp) {
+		if (!strcmp(slave_tmp->dev->name, ifname)) {
+			found = 1;
+			slave = slave_tmp;
+		}
+	}
+
+	if (!found) {
+		pr_err("%s could not find slave\n", ifname);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* process command */
+	if (command[0] == '+') {
+		if (get_emac(mac, mac_str) || !is_valid_ether_addr(mac)) {
+			pr_err("%s invalid mac input\n", ifname);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (!is_zero_ether_addr(slave->emac)) {
+			pr_err("%s slave %s mac already set to %pM\n",
+			       ifname, slave->dev->name, slave->emac);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/* check another slave has this mac/vlan */
+		found = 0;
+		parent_for_each_slave(parent, slave_tmp) {
+			if (!memcmp(slave_tmp->emac, mac, ETH_ALEN) &&
+			    slave_tmp->vlan == slave->vlan) {
+				pr_err("cannot update %s, slave %s already has"
+				       " vlan 0x%x mac %pM\n",
+				       parent->dev->name, slave_tmp->dev->name,
+				       slave_tmp->vlan,
+				       mac);
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+
+		/* ready to go */
+		pr_info("slave %s mac is set to %pM\n",
+			ifname, mac);
+
+		memcpy(slave->emac, mac, ETH_ALEN);
+		goto out;
+	}
+
+	if (command[0] == '-') {
+		if (is_zero_ether_addr(slave->emac)) {
+			pr_err("%s slave mac already unset %pM\n",
+			       ifname, slave->emac);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		pr_info("slave %s mac is unset (was %pM)\n",
+			ifname, slave->emac);
+		memset(slave->emac, 0, ETH_ALEN);
+
+		goto out;
+	}
+
+err_no_cmd:
+	pr_err("%s USAGE: (-|+)ifname [mac]\n", DRV_NAME);
+	ret = -EPERM;
+
+out:
+	write_unlock_bh(&parent->lock);
+
+	return ret;
+}
+
+static DEVICE_ATTR(vifs, S_IRUGO | S_IWUSR, parent_show_vifs,
+		   parent_store_vifs);
+
+static ssize_t parent_show_slaves(struct device *d,
+				  struct device_attribute *attr, char *buf)
+{
+	struct slave *slave;
+	struct parent *parent = to_parent(d);
+	char *p = buf;
+
+	read_lock_bh(&parent->lock);
+	parent_for_each_slave(parent, slave)
+		p += _sprintf(p, buf, "%s\n", slave->dev->name);
+	read_unlock_bh(&parent->lock);
+
+	_end_of_line(p, buf);
+
+	return (ssize_t)(p - buf);
+}
+
+static ssize_t parent_store_slaves(struct device *d,
+				   struct device_attribute *attr,
+				   const char *buffer, size_t count)
+{
+	char command[IFNAMSIZ + 1] = { 0, };
+	char *ifname;
+	int res, ret = count;
+	struct slave *slave;
+	struct net_device *dev = NULL;
+	struct parent *parent = to_parent(d);
+
+	/* Quick sanity check -- is the parent interface up? */
+	if (!(parent->dev->flags & IFF_UP)) {
+		pr_warn("%s: doing slave updates when "
+			"interface is down.\n", parent->dev->name);
+	}
+
+	if (!rtnl_trylock()) /* because __dev_get_by_name */
+		return restart_syscall();
+
+	sscanf(buffer, "%16s", command);
+
+	ifname = command + 1;
+	if ((strlen(command) <= 1) || !dev_valid_name(ifname))
+		goto err_no_cmd;
+
+	if (command[0] == '+') {
+		/* Got a slave name in ifname. Is it already in the list? */
+		dev = __dev_get_by_name(dev_net(parent->dev), ifname);
+		if (!dev) {
+			pr_warn("%s: Interface %s does not exist!\n",
+				parent->dev->name, ifname);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		read_lock_bh(&parent->lock);
+		parent_for_each_slave(parent, slave) {
+			if (slave->dev == dev) {
+				pr_err("%s ERR- Interface %s is already enslaved!\n",
+				       parent->dev->name, dev->name);
+				ret = -EPERM;
+			}
+		}
+		read_unlock_bh(&parent->lock);
+
+		if (ret < 0)
+			goto out;
+
+		pr_info("%s: adding slave %s\n",
+			parent->dev->name, ifname);
+
+		res = parent_enslave(parent->dev, dev);
+		if (res)
+			ret = res;
+
+		goto out;
+	}
+
+	if (command[0] == '-') {
+		dev = NULL;
+		parent_for_each_slave(parent, slave) {
+			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
+				dev = slave->dev;
+				break;
+			}
+		}
+
+		if (dev) {
+			pr_info("%s: removing slave %s\n",
+				parent->dev->name, dev->name);
+			res = parent_release_slave(parent->dev, dev);
+			if (res) {
+				ret = res;
+				goto out;
+			}
+		} else {
+			pr_warn("%s: unable to remove non-existent "
+				"slave for parent %s.\n",
+				ifname, parent->dev->name);
+			ret = -ENODEV;
+		}
+		goto out;
+	}
+
+err_no_cmd:
+	pr_err("%s USAGE: (-|+)ifname\n", DRV_NAME);
+	ret = -EPERM;
+
+out:
+	rtnl_unlock();
+	return ret;
+}
+
+static DEVICE_ATTR(slaves, S_IRUGO | S_IWUSR, parent_show_slaves,
+		   parent_store_slaves);
+
+/* sysfs create/destroy functions */
+static struct attribute *per_parent_attrs[] = {
+	&dev_attr_slaves.attr, /* DEVICE_ATTR(slaves..) */
+	&dev_attr_vifs.attr,
+	&dev_attr_neighs.attr,
+	NULL,
+};
+
+/* namespace support */
+static const void *eipoib_namespace(struct class *cls,
+				    const struct class_attribute *attr)
+{
+	const struct eipoib_net *eipoib_n =
+		container_of(attr,
+			     struct eipoib_net, class_attr_eipoib_interfaces);
+	return eipoib_n->net;
+}
+
+static struct attribute_group parent_group = {
+	/* per parent sysfs files under: /sys/class/net/<IF>/eth/.. */
+	.name = "eth",
+	.attrs = per_parent_attrs
+};
+
+int create_slave_symlinks(struct net_device *master,
+			  struct net_device *slave)
+{
+	char linkname[IFNAMSIZ+7];
+	int ret = 0;
+
+	ret = sysfs_create_link(&(slave->dev.kobj), &(master->dev.kobj),
+				"eth_parent");
+	if (ret)
+		return ret;
+
+	sprintf(linkname, "slave_%s", slave->name);
+	ret = sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
+				linkname);
+	return ret;
+
+}
+
+void destroy_slave_symlinks(struct net_device *master,
+			    struct net_device *slave)
+{
+	char linkname[IFNAMSIZ+7];
+
+	sysfs_remove_link(&(slave->dev.kobj), "eth_parent");
+	sprintf(linkname, "slave_%s", slave->name);
+	sysfs_remove_link(&(master->dev.kobj), linkname);
+}
+
+static struct class_attribute class_attr_eth_ipoib_interfaces = {
+	.attr = {
+		.name = "eth_ipoib_interfaces",
+		.mode = S_IRUGO, /* read-only: there is no .store handler */
+	},
+	.show = show_parents,
+	.namespace = eipoib_namespace,
+};
+
+/* per module sysfs file under: /sys/class/net/eth_ipoib_interfaces */
+int mod_create_sysfs(struct eipoib_net *eipoib_n)
+{
+	int rc;
+	/* start from the static class_attr_eth_ipoib_interfaces template */
+	eipoib_n->class_attr_eipoib_interfaces =
+		class_attr_eth_ipoib_interfaces;
+
+	sysfs_attr_init(&eipoib_n->class_attr_eipoib_interfaces.attr);
+
+	rc = netdev_class_create_file(&eipoib_n->class_attr_eipoib_interfaces);
+	if (rc)
+		pr_err("%s failed to create sysfs (rc %d)\n",
+		       eipoib_n->class_attr_eipoib_interfaces.attr.name, rc);
+
+	return rc;
+}
+
+void mod_destroy_sysfs(struct eipoib_net *eipoib_n)
+{
+	netdev_class_remove_file(&eipoib_n->class_attr_eipoib_interfaces);
+}
+
+int parent_create_sysfs_entry(struct parent *parent)
+{
+	struct net_device *dev = parent->dev;
+	int rc;
+
+	rc = sysfs_create_group(&(dev->dev.kobj), &parent_group);
+	if (rc)
+		pr_err("%s: failed to create sysfs group (rc %d)\n",
+		       dev->name, rc);
+
+	return rc;
+}
+
+void parent_destroy_sysfs_entry(struct parent *parent)
+{
+	struct net_device *dev = parent->dev;
+
+	sysfs_remove_group(&(dev->dev.kobj), &parent_group);
+}
-- 
1.7.1

* [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (5 preceding siblings ...)
  2012-07-18 10:59 ` [PATCH net-next V1 6/9] net/eipoib: Add sysfs support Or Gerlitz
@ 2012-07-18 11:00 ` Or Gerlitz
  2012-07-19 13:49   ` Ben Hutchings
  2012-07-18 11:00 ` [PATCH net-next V1 8/9] net/eipoib: Add Makefile, Kconfig and MAINTAINERS entries Or Gerlitz
  2012-07-18 11:00 ` [PATCH net-next V1 9/9] IB/ipoib: Add support for transmission of skbs w.o dst/neighbour Or Gerlitz
  8 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 11:00 UTC
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

The eipoib driver provides a standard Ethernet netdevice over
the InfiniBand IPoIB interface.

Some services can run only on top of Ethernet L2 interfaces, and cannot be
bound to an IPoIB interface. With this new driver, these services can run
seamlessly.

The main use case of the driver is Ethernet Virtual Switching in
virtualized environments, where an eipoib netdevice can be used as a
Physical Interface (PIF) in the hypervisor domain, allowing the Virtual
Interfaces (VIFs) of other guests connected to the same Virtual Switch
to run over the InfiniBand fabric.

This driver supports L2 Switching (Direct Bridging) as well as other L3
Switching modes (e.g. NAT).

Whenever an IPoIB interface is created, one eIPoIB PIF netdevice is
created for it. The default naming scheme is the same as for other Ethernet
interfaces, ethX. For example, on a system with two IPoIB interfaces, ib0
and ib1, two interfaces are created, ethX and ethX+1, where "X" is the
next free Ethernet interface number in the system.

Running "ethtool -i" on the new interface tells which IPoIB interface it
runs above. For example, "driver: eth_ipoib:ib0" indicates that eth3 is
the Ethernet interface over the ib0 IPoIB interface.
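The same information can be read programmatically through the SIOCETHTOOL ioctl with ETHTOOL_GDRVINFO, which is what "ethtool -i" uses. Per parent_ethtool_get_drvinfo() in this series, the driver name lands in driver, the underlying IPoIB interface in bus_info and the ABI version in fw_version. A minimal userspace sketch (the helper name get_drvinfo() is hypothetical):

```c
#define _DEFAULT_SOURCE /* struct ifreq in glibc's <net/if.h> */
#include <assert.h>
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/*
 * Fetch the ethtool driver info of @ifname, the data "ethtool -i"
 * prints. Returns 0 on success or a negative errno value.
 */
static int get_drvinfo(const char *ifname, struct ethtool_drvinfo *info)
{
	struct ifreq ifr;
	int fd, err = 0;

	if (strlen(ifname) >= IFNAMSIZ)
		return -EINVAL;

	/* Any socket works as a handle for SIOCETHTOOL. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	memset(info, 0, sizeof(*info));
	info->cmd = ETHTOOL_GDRVINFO;
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)info;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		err = -errno;

	close(fd);
	return err;
}
```

On a host running this driver, calling get_drvinfo("eth3", &info) and printing info.bus_info would be expected to show the IPoIB interface, e.g. ib0.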

The driver can be used as an independent interface, or in a virtualized
environment as the physical layer for the virtual interfaces of the guests.

The driver interface (the eipoib interface, also referred to as the parent)
uses slave interfaces, IPoIB clones, which are the VIFs described above.

VIFs are enslaved to and released from the eipoib driver on demand, through
the management interface exposed to user space.

Note: each ethX interface has at least one ibX.Y slave that serves the PIF
itself. In the VIF list of ethX you will notice that ibX.1 is always
created, to serve applications running in the hypervisor directly on top
of the ethX interface.

For IB applications that require native IPoIB interfaces (e.g. RDMA-CM),
the original IPoIB interfaces ibX can still be used. For example, RDMA-CM
and the eth_ipoib driver can co-exist and both make use of IPoIB.

Support for live migration:

The driver exposes a sysfs interface through which the manager can notify
it of migrated-out guests ("detached VIF"). The abandoned eIPoIB driver
instance then sends a fixed number of ARP requests to the network, in order
to trigger the migrated guest to publish the MAC address of the new VIF
that hosts it. When the guest responds to that ARP, the eIPoIB driver on
the host that now owns the guest sends a gratuitous ARP on behalf of that
guest, so that all the peers on the network that communicate with this VM
learn the new VIF address used by the guest.
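The per-guest bookkeeping described above maps onto struct guest_emac_info (rec_state, num_of_retries) and MIG_OUT_MAX_ARP_RETRIES from this series. The following is a minimal sketch, under the assumption that a periodic work item (for example the migrate_out_work delayed work in struct parent, presumably firing every MIG_OUT_ARP_REQ_ISSUE_TIME) decides per tick whether to send another ARP request; it is an illustration, not the driver's actual handler:

```c
#include <assert.h>

#define MIG_OUT_MAX_ARP_RETRIES 5 /* value used in eth_ipoib_main.c */

/* Same states as enum eipoib_emac_guest_info in eth_ipoib.h. */
enum guest_state { VALID, MIGRATED_OUT, INVALID };

/* Hypothetical, trimmed-down version of struct guest_emac_info. */
struct guest_rec {
	enum guest_state rec_state;
	int num_of_retries;
};

/*
 * One tick of a hypothetical migrate-out timer. While the guest is
 * marked as migrated out, ask the caller to send one ARP request per
 * tick until the retry budget is spent, then mark the record INVALID
 * so it can be reclaimed. Returns 1 when an ARP request should be sent
 * on this tick, 0 otherwise.
 */
static int mig_out_should_send_arp(struct guest_rec *rec)
{
	if (rec->rec_state != MIGRATED_OUT)
		return 0;

	if (rec->num_of_retries >= MIG_OUT_MAX_ARP_RETRIES) {
		rec->rec_state = INVALID;
		return 0;
	}

	rec->num_of_retries++;
	return 1;
}
```

Keeping the decision separate from the transmit path keeps the state machine easy to reason about; the gratuitous ARP sent by the new owner is an ARP whose sender and target IP are both the guest's address.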

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/eipoib/eth_ipoib_main.c | 1915 +++++++++++++++++++++++++++++++++++
 1 files changed, 1915 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/eipoib/eth_ipoib_main.c

diff --git a/drivers/net/eipoib/eth_ipoib_main.c b/drivers/net/eipoib/eth_ipoib_main.c
new file mode 100644
index 0000000..fe300c7
--- /dev/null
+++ b/drivers/net/eipoib/eth_ipoib_main.c
@@ -0,0 +1,1915 @@
+/*
+ * Copyright (c) 2012 Mellanox Technologies. All rights reserved
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * openfabric.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "eth_ipoib.h"
+#include <net/ip.h>
+
+#define EMAC_IP_GC_TIME (10 * HZ)
+
+#define MIG_OUT_ARP_REQ_ISSUE_TIME (HZ / 2)
+
+#define MIG_OUT_MAX_ARP_RETRIES 5
+
+#define LIVE_MIG_PACKET 1
+
+#define PARENT_MAC_MASK 0xe7
+
+/* forward declaration */
+static rx_handler_result_t eipoib_handle_frame(struct sk_buff **pskb);
+static int eipoib_device_event(struct notifier_block *unused,
+			       unsigned long event, void *ptr);
+static void free_ip_mem_in_rec(struct guest_emac_info *emac_info);
+
+static const char * const version =
+	DRV_DESCRIPTION ": v" DRV_VERSION " (" DRV_RELDATE ")\n";
+
+LIST_HEAD(parent_dev_list);
+
+/* ----------------------------- VLAN funcs ---------------------------------- */
+static int eth_ipoib_vlan_rx_add_vid(struct net_device *dev,
+				     unsigned short vid)
+{
+	return 0;
+}
+
+static int eth_ipoib_vlan_rx_kill_vid(struct net_device *dev,
+				      unsigned short vid)
+{
+	return 0;
+}
+
+/* network namespace and sysfs functions */
+int eipoib_net_id __read_mostly;
+
+static int __net_init eipoib_net_init(struct net *net)
+{
+	int rc;
+	struct eipoib_net *eipoib_n = net_generic(net, eipoib_net_id);
+
+	eipoib_n->net = net;
+	rc = mod_create_sysfs(eipoib_n);
+
+	return rc;
+}
+
+static void __net_exit eipoib_net_exit(struct net *net)
+{
+	struct eipoib_net *eipoib_n = net_generic(net, eipoib_net_id);
+
+	mod_destroy_sysfs(eipoib_n);
+}
+
+static struct pernet_operations eipoib_net_ops = {
+	.init = eipoib_net_init,
+	.exit = eipoib_net_exit,
+	.id   = &eipoib_net_id,
+	.size = sizeof(struct eipoib_net),
+};
+
+/* build neigh emac = 02:<qpn (3 bytes)>:<lid (2 bytes)> */
+static inline
+void build_neigh_mac(u8 *_mac, u32 _qpn, u16 _lid)
+{
+	/* _qpn: 3B _lid: 2B */
+	*((__be32 *)(_mac)) = cpu_to_be32(_qpn);
+	*(u8 *)(_mac) = 0x2; /* set LG bit */
+	*(__be16 *)(_mac + sizeof(_qpn)) = cpu_to_be16(_lid);
+}
+
+static inline
+struct slave *get_slave_by_dev(struct parent *parent,
+			       struct net_device *slave_dev)
+{
+	struct slave *slave, *slave_tmp;
+	int found = 0;
+
+	parent_for_each_slave(parent, slave_tmp) {
+		if (slave_tmp->dev == slave_dev) {
+			found = 1;
+			slave = slave_tmp;
+			break;
+		}
+	}
+
+	return found ? slave : NULL;
+}
+
+static inline
+struct slave *get_slave_by_mac_and_vlan(struct parent *parent, u8 *mac,
+					u16 vlan)
+{
+	struct slave *slave, *slave_tmp;
+	int found = 0;
+
+	parent_for_each_slave(parent, slave_tmp) {
+		if ((!memcmp(slave_tmp->emac, mac, ETH_ALEN)) &&
+		    (slave_tmp->vlan == vlan)) {
+			found = 1;
+			slave = slave_tmp;
+			break;
+		}
+	}
+
+	return found ? slave : NULL;
+}
+
+
+static inline
+struct guest_emac_info *get_mac_ip_info_by_mac_and_vlan(struct parent *parent,
+							u8 *mac, u16 vlan)
+{
+	struct guest_emac_info *emac_info, *emac_info_ret;
+	int found = 0;
+
+	list_for_each_entry(emac_info, &parent->emac_ip_list, list) {
+		if ((!memcmp(emac_info->emac, mac, ETH_ALEN)) &&
+		    vlan == emac_info->vlan) {
+			found = 1;
+			emac_info_ret = emac_info;
+			break;
+		}
+	}
+
+	return found ? emac_info_ret : NULL;
+}
+
+/*
+ * searches for the relevant guest_emac_info in the parent and checks
+ * whether it contains the required ip.
+ * returns 1 if it does, 0 if there is no such guest_emac_info object
+ * or it has no such ip. The emac_info argument is passed by value, so
+ * the object found is not returned to the caller.
+ */
+static inline
+int is_mac_info_contain_ip(struct parent *parent, u8 *mac, __be32 ip,
+			  struct guest_emac_info *emac_info, u16 vlan)
+{
+	struct ip_member *ipm;
+	int found = 0;
+
+	emac_info = get_mac_ip_info_by_mac_and_vlan(parent, mac, vlan);
+	if (!emac_info)
+		return 0;
+
+	list_for_each_entry(ipm, &emac_info->ip_list, list) {
+		if (ipm->ip == ip) {
+			found = 1;
+			break;
+		}
+	}
+
+	return found;
+}
+
+static inline int netdev_set_parent_master(struct net_device *slave,
+					   struct net_device *master)
+{
+	int err;
+
+	ASSERT_RTNL();
+
+	err = netdev_set_master(slave, master);
+	if (err)
+		return err;
+	if (master) {
+		slave->priv_flags |= IFF_EIPOIB_VIF;
+		/* deny bonding from enslaving it. */
+		slave->flags |= IFF_SLAVE;
+	} else {
+		slave->priv_flags &= ~(IFF_EIPOIB_VIF);
+		slave->flags &= ~(IFF_SLAVE);
+	}
+
+	return 0;
+}
+
+static inline int is_driver_owner(struct net_device *dev, char *name)
+{
+	struct ethtool_drvinfo drvinfo;
+
+	if (dev->ethtool_ops && dev->ethtool_ops->get_drvinfo) {
+		memset(&drvinfo, 0, sizeof(drvinfo));
+		dev->ethtool_ops->get_drvinfo(dev, &drvinfo);
+		if (!strstr(drvinfo.driver, name))
+			return 0;
+	} else
+		return 0;
+
+	return 1;
+}
+
+static inline int is_parent(struct net_device *dev)
+{
+	return (dev->priv_flags & IFF_EIPOIB_PIF) &&
+		is_driver_owner(dev, DRV_NAME);
+}
+
+static inline int is_parent_mac(struct net_device *dev, u8 *mac)
+{
+	return is_parent(dev) && !memcmp(mac, dev->dev_addr, dev->addr_len);
+}
+
+static inline int __is_slave(struct net_device *dev)
+{
+	return dev->master && is_parent(dev->master);
+}
+
+static inline int is_slave(struct net_device *dev)
+{
+	return (dev->priv_flags & IFF_EIPOIB_VIF) &&
+		is_driver_owner(dev, SDRV_NAME) && __is_slave(dev);
+}
+
+/*
+ * ------------------------------- Link status ------------------
+ * set parent carrier:
+ * link is up if at least one slave has link up
+ * otherwise, bring link down
+ * return 1 if parent carrier changed, zero otherwise
+ */
+static int parent_set_carrier(struct parent *parent)
+{
+	struct slave *slave;
+
+	if (parent->slave_cnt == 0)
+		goto down;
+
+	/* bring parent link up if one slave (at least) is up */
+	parent_for_each_slave(parent, slave) {
+		if (netif_carrier_ok(slave->dev)) {
+			if (!netif_carrier_ok(parent->dev)) {
+				netif_carrier_on(parent->dev);
+				return 1;
+			}
+			return 0;
+		}
+	}
+
+down:
+	if (netif_carrier_ok(parent->dev)) {
+		pr_info("%s: bring down carrier\n", parent->dev->name);
+		netif_carrier_off(parent->dev);
+		return 1;
+	}
+	return 0;
+}
+
+static int parent_set_mtu(struct parent *parent)
+{
+	struct slave *slave, *f_slave;
+	unsigned int mtu;
+
+	if (parent->slave_cnt == 0)
+		return 0;
+
+	/* find min mtu */
+	f_slave = list_first_entry(&parent->slave_list, struct slave, list);
+	mtu = f_slave->dev->mtu;
+
+	parent_for_each_slave(parent, slave)
+		mtu = min(slave->dev->mtu, mtu);
+
+	if (parent->dev->mtu != mtu) {
+		dev_set_mtu(parent->dev, mtu);
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ *--------------------------- slave list handling ------
+ *
+ * This function attaches the slave to the end of the list.
+ * The caller must hold parent->lock.
+ */
+static void parent_attach_slave(struct parent *parent,
+				struct slave *new_slave)
+{
+	list_add_tail(&new_slave->list, &parent->slave_list);
+	parent->slave_cnt++;
+}
+
+static void parent_detach_slave(struct parent *parent, struct slave *slave)
+{
+	list_del(&slave->list);
+	parent->slave_cnt--;
+}
+
+static netdev_features_t parent_fix_features(struct net_device *dev,
+					     netdev_features_t features)
+{
+	struct slave *slave;
+	struct parent *parent = netdev_priv(dev);
+	netdev_features_t mask;
+
+	read_lock_bh(&parent->lock);
+
+	mask = features;
+	features &= ~NETIF_F_ONE_FOR_ALL;
+	features |= NETIF_F_ALL_FOR_ALL;
+
+	parent_for_each_slave(parent, slave)
+		features = netdev_increment_features(features,
+						     slave->dev->features,
+						     mask);
+
+	features &= ~NETIF_F_VLAN_CHALLENGED;
+	read_unlock_bh(&parent->lock);
+	return features;
+}
+
+static int parent_compute_features(struct parent *parent)
+{
+	struct net_device *parent_dev = parent->dev;
+	netdev_features_t hw_features, features;
+	struct slave *slave;
+
+	if (list_empty(&parent->slave_list))
+		goto done;
+
+	/* starts with the max set of features mask */
+	hw_features = features = ~0LL;
+
+	/* gets the common features from all slaves */
+	parent_for_each_slave(parent, slave) {
+		features &= slave->dev->features;
+		hw_features &= slave->dev->hw_features;
+	}
+
+	features = features | PARENT_VLAN_FEATURES;
+	hw_features = hw_features | PARENT_VLAN_FEATURES;
+
+	hw_features &= ~NETIF_F_VLAN_CHALLENGED;
+	features &= hw_features;
+
+	parent_dev->hw_features = hw_features;
+	parent_dev->features = features;
+	parent_dev->vlan_features = parent_dev->features & ~PARENT_VLAN_FEATURES;
+done:
+	pr_info("%s: %s: Features: 0x%llx\n",
+		__func__, parent_dev->name, parent_dev->features);
+
+	return 0;
+}
+
+static inline u16 slave_get_pkey(struct net_device *dev)
+{
+	u16 pkey = (dev->broadcast[8] << 8) + dev->broadcast[9];
+
+	return pkey;
+}
+
+static void parent_setup_by_slave(struct net_device *parent_dev,
+				  struct net_device *slave_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
+
+	parent_dev->mtu = slave_dev->mtu;
+	parent_dev->hard_header_len = slave_dev->hard_header_len;
+
+	if (slave_ops->ndo_neigh_setup)
+		slave_ops->ndo_neigh_setup(slave_dev, &parent->nparms);
+}
+
+/* enslave device <slave> to parent device <master> */
+int parent_enslave(struct net_device *parent_dev, struct net_device *slave_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	struct slave *new_slave = NULL;
+	netdev_features_t old_features = parent_dev->features;
+	int res = 0;
+
+	/* slave must be claimed by ipoib */
+	if (!is_driver_owner(slave_dev, SDRV_NAME))
+		return -EOPNOTSUPP;
+
+	/* parent must be initialized by parent_open() before enslaving */
+	if (!(parent_dev->flags & IFF_UP)) {
+		pr_warn("%s parent is not up in parent_enslave\n",
+			parent_dev->name);
+		return -EPERM;
+	}
+
+	/* already enslaved */
+	if ((slave_dev->flags & IFF_SLAVE) ||
+		(slave_dev->priv_flags & IFF_EIPOIB_VIF)) {
+		pr_err("%s was already enslaved!!!\n", slave_dev->name);
+		return -EBUSY;
+	}
+
+	/* mark it as ipoib clone vif */
+	slave_dev->priv_flags |= IFF_EIPOIB_VIF;
+
+	/* set parent netdev attributes */
+	if (parent->slave_cnt == 0)
+		parent_setup_by_slave(parent_dev, slave_dev);
+	else {
+		/* check netdev attr match */
+		if (slave_dev->hard_header_len != parent_dev->hard_header_len) {
+			pr_err("%s slave %s has different HDR len %d != %d\n",
+			       parent_dev->name, slave_dev->name,
+			       slave_dev->hard_header_len,
+			       parent_dev->hard_header_len);
+			res = -EINVAL;
+			goto err_undo_flags;
+		}
+
+		if (slave_dev->type != ARPHRD_INFINIBAND ||
+		    slave_dev->addr_len != INFINIBAND_ALEN) {
+			pr_err("%s slave type/addr_len is invalid (%d/%d)\n",
+			       parent_dev->name, slave_dev->type,
+			       slave_dev->addr_len);
+			res = -EINVAL;
+			goto err_undo_flags;
+		}
+	}
+	/*
+	 * verify that this (slave) device belongs to the relevant PIF;
+	 * abort if the slave name does not follow the usual ipoib naming
+	 */
+	if (!strstr(slave_dev->name, parent->ipoib_main_interface)) {
+		pr_err("%s slave name (%s) doesn't contain parent name (%s)\n",
+		       parent_dev->name, slave_dev->name,
+		       parent->ipoib_main_interface);
+		res = -EINVAL;
+		goto err_undo_flags;
+	}
+
+	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
+	if (!new_slave) {
+		res = -ENOMEM;
+		goto err_undo_flags;
+	}
+
+	INIT_LIST_HEAD(&new_slave->neigh_list);
+
+	/* save slave's vlan */
+	new_slave->pkey = slave_get_pkey(slave_dev);
+	new_slave->vlan = (new_slave->pkey != 0xffff) ?
+			  (new_slave->pkey & 0x7fff) : VLAN_N_VID;
+
+	res = netdev_set_parent_master(slave_dev, parent_dev);
+	if (res) {
+		pr_err("%s %d calling netdev_set_master\n",
+		       slave_dev->name, res);
+		goto err_free;
+	}
+
+	res = dev_open(slave_dev);
+	if (res) {
+		pr_err("open failed %s\n",
+		       slave_dev->name);
+		goto err_unset_master;
+	}
+
+	new_slave->dev = slave_dev;
+
+	write_lock_bh(&parent->lock);
+
+	parent_attach_slave(parent, new_slave);
+
+	parent_compute_features(parent);
+
+	write_unlock_bh(&parent->lock);
+
+	read_lock_bh(&parent->lock);
+
+	parent_set_carrier(parent);
+
+	read_unlock_bh(&parent->lock);
+
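+	res = create_slave_symlinks(parent_dev, slave_dev);
+	if (res)
+		goto err_detach;
+
+	/* register handler */
+	res = netdev_rx_handler_register(slave_dev, eipoib_handle_frame,
+					 new_slave);
+	if (res) {
+		pr_warn("%s %d calling netdev_rx_handler_register\n",
+			parent_dev->name, res);
+		goto err_symlinks;
+	}
+
+	pr_info("%s: enslaving %s\n", parent_dev->name, slave_dev->name);
+
+	/* enslave is successful */
+	return 0;
+
+/* Undo stages on error */
+err_symlinks:
+	destroy_slave_symlinks(parent_dev, slave_dev);
+
+err_detach:
+	/* the slave is already on the list, unlink it before freeing it */
+	write_lock_bh(&parent->lock);
+	parent_detach_slave(parent, new_slave);
+	parent_compute_features(parent);
+	parent_set_carrier(parent);
+	write_unlock_bh(&parent->lock);
+
+	dev_close(slave_dev);
+
+err_unset_master:
+	netdev_set_parent_master(slave_dev, NULL);
+
+err_free:
+	kfree(new_slave);
+
+err_undo_flags:
+	slave_dev->priv_flags &= ~IFF_EIPOIB_VIF;
+	parent_dev->features = old_features;
+
+	return res;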
+}
+
+static void slave_free(struct parent *parent, struct slave *slave)
+{
+	struct neigh *neigh, *neigh_tmp;
+
+	list_for_each_entry_safe(neigh, neigh_tmp, &slave->neigh_list, list) {
+		list_del(&neigh->list);
+		kfree(neigh);
+	}
+
+	netdev_rx_handler_unregister(slave->dev);
+
+	kfree(slave);
+}
+
+int parent_release_slave(struct net_device *parent_dev,
+			 struct net_device *slave_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	struct slave *slave;
+	struct guest_emac_info *emac_info;
+	/* slave is not a slave or master is not master of this slave */
+	if (!(slave_dev->flags & IFF_SLAVE) ||
+	    (slave_dev->master != parent_dev)) {
+		pr_err("%s cannot release %s.\n",
+		       parent_dev->name, slave_dev->name);
+		return -EINVAL;
+	}
+
+	write_lock_bh(&parent->lock);
+
+	slave = get_slave_by_dev(parent, slave_dev);
+	if (!slave) {
+		/* not a slave of this parent */
+		pr_warn("%s not enslaved %s\n",
+			parent_dev->name, slave_dev->name);
+		write_unlock_bh(&parent->lock);
+		return -EINVAL;
+	}
+
+	pr_info("%s: releasing interface %s\n", parent_dev->name,
+		slave_dev->name);
+
+	/* for live migration, mark its mac_ip record as invalid */
+	emac_info = get_mac_ip_info_by_mac_and_vlan(parent, slave->emac, slave->vlan);
+	if (!emac_info)
+		pr_warn("%s %s didn't find emac: %pM\n",
+			parent_dev->name, slave_dev->name, slave->emac);
+	else {
+		emac_info->rec_state = MIGRATED_OUT;
+		/* start GC work */
+		pr_info("%s: sending clean task for slave mac: %pM\n",
+			__func__, slave->emac);
+		queue_delayed_work(parent->wq, &parent->migrate_out_work, 0);
+		queue_delayed_work(parent->wq, &parent->emac_ip_work,
+				   EMAC_IP_GC_TIME);
+	}
+
+	/* release the slave from its parent */
+	parent_detach_slave(parent, slave);
+
+	parent_compute_features(parent);
+
+	if (parent->slave_cnt == 0)
+		parent_set_carrier(parent);
+
+	write_unlock_bh(&parent->lock);
+
+	/* must do this from outside any spinlocks */
+	destroy_slave_symlinks(parent_dev, slave_dev);
+
+	netdev_set_parent_master(slave_dev, NULL);
+
+	dev_close(slave_dev);
+
+	slave_free(parent, slave);
+
+	return 0;  /* deletion OK */
+}
+
+static int parent_release_all(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	struct slave *slave, *slave_tmp;
+	struct net_device *slave_dev;
+	struct neigh *neigh_cmd, *neigh_cmd_tmp;
+	struct guest_emac_info *emac_info, *emac_info_tmp;
+
+	write_lock_bh(&parent->lock);
+
+	netif_carrier_off(parent_dev);
+
+	if (parent->slave_cnt == 0)
+		goto out;
+
+	list_for_each_entry_safe(slave, slave_tmp, &parent->slave_list, list) {
+		slave_dev = slave->dev;
+
+		/* remove slave from parent's slave-list */
+		parent_detach_slave(parent, slave);
+
+		parent_compute_features(parent);
+
+		write_unlock_bh(&parent->lock);
+
+		destroy_slave_symlinks(parent_dev, slave_dev);
+
+		netdev_set_parent_master(slave_dev, NULL);
+
+		dev_close(slave_dev);
+
+		slave_free(parent, slave);
+
+		write_lock_bh(&parent->lock);
+	}
+
+	if (list_empty(&parent->vlan_list))
+		list_for_each_entry_safe(neigh_cmd, neigh_cmd_tmp,
+					 &parent->neigh_add_list, list) {
+			list_del(&neigh_cmd->list);
+			kfree(neigh_cmd);
+		}
+
+	list_for_each_entry_safe(emac_info, emac_info_tmp,
+				 &parent->emac_ip_list, list) {
+		free_ip_mem_in_rec(emac_info);
+		list_del(&emac_info->list);
+		kfree(emac_info);
+	}
+
+	pr_info("%s: released all slaves\n", parent_dev->name);
+
+out:
+	write_unlock_bh(&parent->lock);
+
+	return 0;
+}
+
+/* -------------------------- Device entry points --------------------------- */
+static struct net_device_stats *parent_get_stats(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+	struct net_device_stats *stats = &parent->stats;
+	struct net_device_stats local_stats;
+	struct slave *slave;
+
+	struct rtnl_link_stats64 temp;
+
+	memset(&local_stats, 0, sizeof(struct net_device_stats));
+
+	read_lock_bh(&parent->lock);
+
+	parent_for_each_slave(parent, slave) {
+
+		const struct rtnl_link_stats64 *sstats =
+			dev_get_stats(slave->dev, &temp);
+
+		local_stats.rx_packets += sstats->rx_packets;
+		local_stats.rx_bytes += sstats->rx_bytes;
+		local_stats.rx_errors += sstats->rx_errors;
+		local_stats.rx_dropped += sstats->rx_dropped;
+
+		local_stats.tx_packets += sstats->tx_packets;
+		local_stats.tx_bytes += sstats->tx_bytes;
+		local_stats.tx_errors += sstats->tx_errors;
+		local_stats.tx_dropped += sstats->tx_dropped;
+
+		local_stats.multicast += sstats->multicast;
+		local_stats.collisions += sstats->collisions;
+
+		local_stats.rx_length_errors += sstats->rx_length_errors;
+		local_stats.rx_over_errors += sstats->rx_over_errors;
+		local_stats.rx_crc_errors += sstats->rx_crc_errors;
+		local_stats.rx_frame_errors += sstats->rx_frame_errors;
+		local_stats.rx_fifo_errors += sstats->rx_fifo_errors;
+		local_stats.rx_missed_errors += sstats->rx_missed_errors;
+
+		local_stats.tx_aborted_errors += sstats->tx_aborted_errors;
+		local_stats.tx_carrier_errors += sstats->tx_carrier_errors;
+		local_stats.tx_fifo_errors += sstats->tx_fifo_errors;
+		local_stats.tx_heartbeat_errors += sstats->tx_heartbeat_errors;
+		local_stats.tx_window_errors += sstats->tx_window_errors;
+	}
+
+	memcpy(stats, &local_stats, sizeof(struct net_device_stats));
+
+	read_unlock_bh(&parent->lock);
+
+	return stats;
+}
+
+/* ---------------------------- Main funcs ---------------------------------- */
+static struct neigh *neigh_cmd_find_by_mac(struct slave *slave, u8 *mac)
+{
+	struct net_device *dev = slave->dev;
+	struct net_device *parent_dev = dev->master;
+	struct parent *parent = netdev_priv(parent_dev);
+	struct neigh *neigh;
+	int found = 0;
+
+	list_for_each_entry(neigh, &parent->neigh_add_list, list) {
+		if (!memcmp(neigh->emac, mac, ETH_ALEN)) {
+			found = 1;
+			break;
+		}
+	}
+
+	return found ? neigh : NULL;
+}
+
+static struct neigh *neigh_find_by_mac(struct slave *slave, u8 *mac)
+{
+	struct neigh *neigh;
+	int found = 0;
+
+	list_for_each_entry(neigh, &slave->neigh_list, list) {
+		if (!memcmp(neigh->emac, mac, ETH_ALEN)) {
+			found = 1;
+			break;
+		}
+	}
+
+	return found ? neigh : NULL;
+}
+
+static int neigh_learn(struct slave *slave, struct sk_buff *skb, u8 *remac)
+{
+	struct net_device *dev = slave->dev;
+	struct net_device *parent_dev = dev->master;
+	struct parent *parent = netdev_priv(parent_dev);
+	struct neigh *neigh_cmd;
+	u8 *rimac;
+	int rc;
+
+	/* linearize to ease reading the arp payload */
+	rc = skb_linearize(skb);
+	if (rc) {
+		pr_err("%s: skb_linearize failed rc %d\n", dev->name, rc);
+		goto out;
+	} else
+		rimac = skb->data + sizeof(struct arphdr);
+
+	/* check if entry is being processed or already exists */
+	if (neigh_find_by_mac(slave, remac))
+		goto out;
+
+	if (neigh_cmd_find_by_mac(slave, remac))
+		goto out;
+
+	neigh_cmd = parent_get_neigh_cmd('+', slave->dev->name, remac, rimac);
+	if (!neigh_cmd) {
+		pr_err("%s cannot build neigh cmd\n", slave->dev->name);
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	list_add_tail(&neigh_cmd->list, &parent->neigh_add_list);
+
+	/* calls neigh_learn_task() */
+	queue_delayed_work(parent->wq, &parent->neigh_learn_work, 0);
+
+out:
+	return rc;
+}
+
+static void neigh_learn_task(struct work_struct *work)
+{
+	struct parent *parent = container_of(work, struct parent,
+					     neigh_learn_work.work);
+	struct neigh *neigh_cmd, *neigh_cmd_tmp;
+
+	write_lock_bh(&parent->lock);
+
+	if (parent->kill_timers)
+		goto out;
+
+	list_for_each_entry_safe(neigh_cmd, neigh_cmd_tmp,
+				 &parent->neigh_add_list, list) {
+		__parent_store_neighs(&parent->dev->dev, NULL,
+				      neigh_cmd->cmd, PAGE_SIZE);
+		list_del(&neigh_cmd->list);
+		kfree(neigh_cmd);
+	}
+
+out:
+	write_unlock_bh(&parent->lock);
+	return;
+}
+
+static void parent_work_cancel_all(struct parent *parent)
+{
+	write_lock_bh(&parent->lock);
+	parent->kill_timers = 1;
+	write_unlock_bh(&parent->lock);
+
+	cancel_delayed_work_sync(&parent->neigh_learn_work);
+	cancel_delayed_work_sync(&parent->emac_ip_work);
+	cancel_delayed_work_sync(&parent->migrate_out_work);
+}
+
+static struct parent *get_parent_by_pif_name(char *pif_name)
+{
+	struct parent *parent, *nxt;
+
+	list_for_each_entry_safe(parent, nxt, &parent_dev_list, parent_list) {
+		if (!strcmp(parent->ipoib_main_interface, pif_name))
+			return parent;
+	}
+	return NULL;
+}
+
+static void free_ip_mem_in_rec(struct guest_emac_info *emac_info)
+{
+	struct ip_member *ipm, *tmp_ipm;
+	list_for_each_entry_safe(ipm, tmp_ipm, &emac_info->ip_list, list) {
+		list_del(&ipm->list);
+		kfree(ipm);
+	}
+}
+
+static inline void free_invalid_emac_ip_det(struct parent *parent)
+{
+	struct guest_emac_info *emac_info, *emac_info_tmp;
+
+	list_for_each_entry_safe(emac_info, emac_info_tmp,
+				 &parent->emac_ip_list, list) {
+		if (emac_info->rec_state == INVALID) {
+			free_ip_mem_in_rec(emac_info);
+			list_del(&emac_info->list);
+			kfree(emac_info);
+		}
+	}
+}
+
+static void emac_info_clean_task(struct work_struct *work)
+{
+	struct parent *parent = container_of(work, struct parent,
+					     emac_ip_work.work);
+
+	write_lock_bh(&parent->lock);
+
+	if (parent->kill_timers)
+		goto out;
+
+	free_invalid_emac_ip_det(parent);
+
+out:
+	write_unlock_bh(&parent->lock);
+	return;
+}
+
+static int migrate_out_gen_arp_req(struct parent *parent, u8 *emac,
+				   u16 vlan)
+{
+	struct guest_emac_info *emac_info;
+	struct ip_member *ipm;
+	struct slave *slave;
+	struct sk_buff *nskb;
+	int ret = 0;
+
+	slave = get_slave_by_mac_and_vlan(parent, parent->dev->dev_addr, vlan);
+	if (unlikely(!slave)) {
+		pr_info("%s: Failed to find parent slave !!! %pM\n",
+			__func__, parent->dev->dev_addr);
+		return -ENODEV;
+	}
+
+	emac_info = get_mac_ip_info_by_mac_and_vlan(parent, emac, vlan);
+
+	if (!emac_info)
+		return 0;
+
+	/* go over all ip's attached to that mac */
+	list_for_each_entry(ipm, &emac_info->ip_list, list) {
+		/* create and send arp request to that ip.*/
+		pr_info("%s: Sending arp For migrate_out event, to %pI4 "
+			"from 0.0.0.0\n", parent->dev->name, &(ipm->ip));
+
+		nskb = arp_create(ARPOP_REQUEST,
+				  ETH_P_ARP,
+				  ipm->ip,
+				  slave->dev,
+				  0,
+				  slave->dev->broadcast,
+				  slave->dev->broadcast,
+				  slave->dev->broadcast);
+		if (nskb)
+			arp_xmit(nskb);
+		else {
+			pr_err("%s: %s failed creating skb\n",
+			       __func__, slave->dev->name);
+			ret = -ENOMEM;
+		}
+	}
+	return ret;
+}
+
+static void migrate_out_work_task(struct work_struct *work)
+{
+	struct parent *parent = container_of(work, struct parent,
+					     migrate_out_work.work);
+	struct guest_emac_info *emac_info;
+	int is_reschedule = 0;
+	int ret;
+
+	write_lock_bh(&parent->lock);
+
+	if (parent->kill_timers)
+		goto out;
+
+	list_for_each_entry(emac_info, &parent->emac_ip_list, list) {
+		if (emac_info->rec_state == MIGRATED_OUT) {
+			if (emac_info->num_of_retries <
+			    MIG_OUT_MAX_ARP_RETRIES) {
+				ret = migrate_out_gen_arp_req(parent, emac_info->emac,
+							      emac_info->vlan);
+				if (ret)
+					pr_err("%s: migrate_out_gen_arp failed: %d\n",
+					       __func__, ret);
+
+				emac_info->num_of_retries++;
+				is_reschedule = 1;
+			} else
+				emac_info->rec_state = INVALID;
+		}
+	}
+	/* keep issuing arp requests until all MIGRATED_OUT entries are INVALID */
+	if (is_reschedule)
+		queue_delayed_work(parent->wq, &parent->migrate_out_work,
+				   MIG_OUT_ARP_REQ_ISSUE_TIME);
+out:
+	write_unlock_bh(&parent->lock);
+	return;
+}
+
+static inline int add_emac_ip_info(struct net_device *slave_dev, __be32 ip,
+				   u8 *mac, u16 vlan)
+{
+	struct net_device *parent_dev = slave_dev->master;
+	struct parent *parent = netdev_priv(parent_dev);
+	struct guest_emac_info *emac_info = NULL;
+	struct ip_member *ipm;
+	int ret;
+	int is_just_alloc_emac_info = 0;
+
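+	/* look up the existing record, if any, for this mac/vlan */
+	emac_info = get_mac_ip_info_by_mac_and_vlan(parent, mac, vlan);
+	ret = is_mac_info_contain_ip(parent, mac, ip, emac_info, vlan);
+	if (ret)
+		return 0;
+
+	/* new ip: add it to the emac_ip record, allocating it if needed */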
+	if (!emac_info) {
+		emac_info = kzalloc(sizeof *emac_info, GFP_ATOMIC);
+		if (!emac_info) {
+			pr_err("%s: Failed allocating emac_info\n",
+			       parent_dev->name);
+			return -ENOMEM;
+		}
+		memcpy(emac_info->emac, mac, ETH_ALEN);
+		INIT_LIST_HEAD(&emac_info->ip_list);
+		emac_info->rec_state = VALID;
+		emac_info->vlan = vlan;
+		emac_info->num_of_retries = 0;
+		list_add_tail(&emac_info->list, &parent->emac_ip_list);
+		is_just_alloc_emac_info = 1;
+	}
+
+	ipm = kzalloc(sizeof *ipm, GFP_ATOMIC);
+	if (!ipm) {
+		pr_err("%s: Failed allocating ip_member\n",
+		       parent_dev->name);
+		if (is_just_alloc_emac_info) {
+			/* it was already added to the list, unlink it first */
+			list_del(&emac_info->list);
+			kfree(emac_info);
+		}
+		return -ENOMEM;
+	}
+
+	ipm->ip = ip;
+	list_add_tail(&ipm->list, &emac_info->ip_list);
+
+	return 0;
+}
+
+/* build ipoib arp/rarp request/reply packet */
+static struct sk_buff *get_slave_skb_arp(struct slave *slave,
+					 struct sk_buff *skb,
+					 u8 *rimac, int *ret)
+{
+	struct sk_buff *nskb;
+	struct arphdr *arphdr = (struct arphdr *)
+				(skb->data + sizeof(struct ethhdr));
+	struct eth_arp_data *arp_data = (struct eth_arp_data *)
+					(skb->data + sizeof(struct ethhdr) +
+					 sizeof(struct arphdr));
+	u8 t_addr[ETH_ALEN] = {0};
+	int err = 0;
+	/* mark regular packet handling */
+	*ret = 0;
+
+	/*
+	 * live-migration support: keep the new mac/ip addresses.
+	 * This way each driver knows which IPs were used by each mac/vlan
+	 * of the guests above it; whenever a migrate_out event comes, it
+	 * sends an arp request for all these IPs.
+	 */
+	if (skb->protocol == htons(ETH_P_ARP))
+		err = add_emac_ip_info(slave->dev, arp_data->arp_sip,
+				       arp_data->arp_sha, slave->vlan);
+	if (err)
+		pr_warn("%s: Failed creating emac_ip_info for ip: %pI4\n",
+			__func__, &arp_data->arp_sip);
+	/*
+	 * live migration support:
+	 * 1. check if we are in the middle of a live migration process
+	 * 2. check if the arp response is for the parent
+	 * 3. ignore the locally administered bit, which was set to make sure
+	 *    that the bridge will not drop it.
+	 */
+	arp_data->arp_dha[0] = arp_data->arp_dha[0] & 0xFD;
+	if (htons(ARPOP_REPLY) == (arphdr->ar_op) &&
+	    !memcmp(arp_data->arp_dha, slave->dev->master->dev_addr, ETH_ALEN)) {
+		/*
+		 * when the source is the parent interface, assumes
+		 * that we are in the middle of live migration process,
+		 * so, we will send gratuitous arp.
+		 */
+		pr_info("%s: Arp packet for parent: %s\n",
+			__func__, slave->dev->master->name);
+		/* create gratuitous ARP on behalf of the guest */
+		nskb = arp_create(ARPOP_REQUEST,
+				  be16_to_cpu(skb->protocol),
+				  arp_data->arp_sip,
+				  slave->dev,
+				  arp_data->arp_sip,
+				  NULL,
+				  slave->dev->dev_addr,
+				  t_addr);
+		if (unlikely(!nskb))
+			pr_err("%s: %s live migration: failed creating skb\n",
+			       __func__, slave->dev->name);
+	} else {
+		nskb = arp_create(be16_to_cpu(arphdr->ar_op),
+				  be16_to_cpu(skb->protocol),
+				  arp_data->arp_dip,
+				  slave->dev,
+				  arp_data->arp_sip,
+				  rimac,
+				  slave->dev->dev_addr,
+				  NULL);
+	}
+
+	return nskb;
+}
+
+/*
+ * build ipoib arp request packet according to ip header.
+ * used for live migration, or for a missing neigh of a new vif.
+ */
+static void get_slave_skb_arp_by_ip(struct slave *slave,
+				    struct sk_buff *skb)
+{
+	struct sk_buff *nskb = NULL;
+	struct iphdr *iph = ip_hdr(skb);
+
+	pr_info("Sending arp on behalf of slave %s, from %pI4 to %pI4\n",
+		slave->dev->name, &(iph->saddr), &(iph->daddr));
+
+	nskb = arp_create(ARPOP_REQUEST,
+			  ETH_P_ARP,
+			  iph->daddr,
+			  slave->dev,
+			  iph->saddr,
+			  slave->dev->broadcast,
+			  slave->dev->dev_addr,
+			  NULL);
+	if (nskb)
+		arp_xmit(nskb);
+	else
+		pr_err("%s: %s failed creating skb\n",
+		       __func__, slave->dev->name);
+}
+
+/* build ipoib ipv4/ipv6 packet */
+static struct sk_buff *get_slave_skb_ip(struct slave *slave,
+					struct sk_buff *skb)
+{
+
+	skb_pull(skb, ETH_HLEN);
+	skb_reset_network_header(skb);
+
+	return skb;
+}
+
+/*
+ * get_slave_skb -- called in TX flow
+ * get skb that can be sent thru slave xmit func,
+ * if skb was adjusted (cloned, pulled, etc..) successfully
+ * the old skb (if any) is freed here.
+ */
+static struct sk_buff *get_slave_skb(struct slave *slave, struct sk_buff *skb)
+{
+	struct net_device *dev = slave->dev;
+	struct net_device *parent_dev = dev->master;
+	struct parent *parent = netdev_priv(parent_dev);
+	struct sk_buff *nskb = NULL;
+	struct ethhdr *ethh = (struct ethhdr *)(skb->data);
+	struct neigh *neigh = NULL;
+	u8 rimac[INFINIBAND_ALEN];
+	int ret = 0;
+
+	/* set neigh mac */
+	if (is_multicast_ether_addr(ethh->h_dest)) {
+		memcpy(rimac, dev->broadcast, INFINIBAND_ALEN);
+	} else {
+		neigh = neigh_find_by_mac(slave, ethh->h_dest);
+		if (neigh) {
+			memcpy(rimac, neigh->imac, INFINIBAND_ALEN);
+		} else {
+			++parent->port_stats.tx_neigh_miss;
+			/*
+			 * assume VIF migration, tries to get the neigh by
+			 * issue arp request on behalf of the vif.
+			 */
+			if (skb->protocol == htons(ETH_P_IP)) {
+				pr_info("Missed neigh for slave: %s, issue ARP request\n",
+					slave->dev->name);
+				get_slave_skb_arp_by_ip(slave, skb);
+				goto out_arp_sent_instead;
+			}
+		}
+	}
+
+	if (skb->protocol == htons(ETH_P_ARP) ||
+	    skb->protocol == htons(ETH_P_RARP)) {
+		nskb = get_slave_skb_arp(slave, skb, rimac, &ret);
+		if (!nskb && LIVE_MIG_PACKET == ret) {
+			pr_info("%s: live migration packets\n", __func__);
+			goto err;
+		}
+	} else {
+		if (!neigh)
+			goto err;
+		/* pull ethernet header here */
+		nskb = get_slave_skb_ip(slave, skb);
+	}
+
+	/* if new skb could not be adjusted/allocated, abort */
+	if (!nskb) {
+		pr_err("%s get_slave_skb_ip/arp failed 0x%x\n",
+		       dev->name, ntohs(skb->protocol));
+		goto err;
+	}
+
+	if (neigh && nskb == skb) { /* ucast & non-arp/rarp */
+		/* dev_hard_header only for ucast, for arp done already.*/
+		if (dev_hard_header(nskb, dev, ntohs(skb->protocol), rimac,
+				    dev->dev_addr, nskb->len) < 0) {
+			pr_warn("%s: dev_hard_header failed\n",
+				dev->name);
+			goto err;
+		}
+	}
+
+	/*
+	 * new skb is ready to be sent, clean old skb if we hold a clone
+	 * (old skb is not shared, already checked that.)
+	 */
+	if (nskb != skb)
+		dev_kfree_skb(skb);
+
+	nskb->dev = slave->dev;
+	return nskb;
+
+out_arp_sent_instead:/* whenever sent arp instead of ip packet */
+err:
+	/* got error after nskb was adjusted/allocated */
+	if (nskb && (nskb != skb))
+		dev_kfree_skb(nskb);
+
+	return NULL;
+}
+
+static struct sk_buff *get_parent_skb_arp(struct slave *slave,
+					  struct sk_buff *skb,
+					  u8 *remac)
+{
+	struct net_device *dev = slave->dev->master;
+	struct sk_buff *nskb;
+	struct arphdr *arphdr = (struct arphdr *)(skb->data);
+	struct ipoib_arp_data *arp_data = (struct ipoib_arp_data *)
+					(skb->data + sizeof(struct arphdr));
+	u8 *target_hw = slave->emac;
+	u8 *dst_hw = slave->emac;
+	u8 local_eth_addr[ETH_ALEN];
+
+	/* live migration: gets arp with broadcast src and dst */
+	if (!memcmp(arp_data->arp_sha, slave->dev->broadcast, INFINIBAND_ALEN) &&
+	    !memcmp(arp_data->arp_dha, slave->dev->broadcast, INFINIBAND_ALEN)) {
+		pr_info("%s: ARP with bcast src and dest sent from src_hw: %pM\n",
+			__func__, slave->dev->master->dev_addr);
+		/* replace the src with the parent src: */
+		memcpy(local_eth_addr, slave->dev->master->dev_addr, ETH_ALEN);
+		/*
+		 * set the locally administered bit so that
+		 * the bridge does not discard the packet
+		 */
+		local_eth_addr[0] = local_eth_addr[0] | 0x2;
+		memcpy(remac, local_eth_addr, ETH_ALEN);
+		target_hw = NULL;
+		dst_hw = NULL;
+	}
+
+	nskb = arp_create(be16_to_cpu(arphdr->ar_op),
+			  be16_to_cpu(skb->protocol),
+			  arp_data->arp_dip,
+			  dev,
+			  arp_data->arp_sip,
+			  dst_hw,
+			  remac,
+			  target_hw);
+
+	/* prepare place for the headers. */
+	if (nskb)
+		skb_reserve(nskb, ETH_HLEN);
+
+	return nskb;
+}
+
+static struct sk_buff *get_parent_skb_ip(struct slave *slave,
+					 struct sk_buff *skb)
+{
+	/* nop */
+	return skb;
+}
+
+/* get_parent_skb -- called in RX flow */
+static struct sk_buff *get_parent_skb(struct slave *slave,
+				      struct sk_buff *skb, u8 *remac)
+{
+	struct net_device *dev = slave->dev->master;
+	struct sk_buff *nskb = NULL;
+	struct ethhdr *ethh;
+
+	if (skb->protocol == htons(ETH_P_ARP) ||
+	    skb->protocol == htons(ETH_P_RARP))
+		nskb = get_parent_skb_arp(slave, skb, remac);
+	else
+		nskb = get_parent_skb_ip(slave, skb);
+
+	/* if new skb could not be adjusted/allocated, abort */
+	if (!nskb)
+		goto err;
+
+	/* at this point, free the old skb if a new one was allocated */
+	if (nskb != skb)
+		dev_kfree_skb(skb);
+
+	skb = nskb;
+
+	/* build ethernet header */
+	ethh = (struct ethhdr *)skb_push(skb, ETH_HLEN);
+	ethh->h_proto = skb->protocol;
+	memcpy(ethh->h_source, remac, ETH_ALEN);
+	memcpy(ethh->h_dest, slave->emac, ETH_ALEN);
+
+	/* zero-pad short frames (e.g. ARP) up to ETH_ZLEN when there is room */
+	if (unlikely(skb->len < ETH_ZLEN)) {
+		/* compute the pad once: skb_put() updates skb->len */
+		unsigned int pad = ETH_ZLEN - skb->len;
+
+		if (!skb_is_nonlinear(skb) && skb_tailroom(skb) >= pad)
+			memset(skb_put(skb, pad), 0, pad);
+	}
+
+	/* set new skb fields */
+	skb->pkt_type = PACKET_HOST;
+	/*
+	 * use master dev, to allow netpoll_receive_skb()
+	 * in netif_receive_skb()
+	 */
+	skb->dev = dev;
+
+	/* pull the Ethernet header and update other fields */
+	skb->protocol = eth_type_trans(skb, skb->dev);
+
+	return skb;
+
+err:
+	/* got error after nskb was adjusted/allocated */
+	if (nskb && (nskb != skb))
+		dev_kfree_skb(nskb);
+
+	return NULL;
+}
+
+static int parent_rx(struct sk_buff *skb, struct slave *slave)
+{
+	struct net_device *slave_dev = skb->dev;
+	struct net_device *parent_dev = slave_dev->master;
+	struct parent *parent = netdev_priv(parent_dev);
+	struct eipoib_cb_data *data = IPOIB_HANDLER_CB(skb);
+	struct napi_struct *napi = data->rx.napi;
+	struct sk_buff *nskb;
+	int rc = 0;
+	u8 remac[ETH_ALEN];
+	int vlan_tag;
+
+	build_neigh_mac(remac, data->rx.sqpn, data->rx.slid);
+
+	read_lock_bh(&parent->lock);
+
+	if (unlikely(skb_headroom(skb) < ETH_HLEN)) {
+		pr_warn("%s: small headroom %d < %d\n",
+			skb->dev->name, skb_headroom(skb), ETH_HLEN);
+		++parent->port_stats.rx_skb_errors;
+		goto drop;
+	}
+
+	/* learn neighs based on ARP snooping */
+	if (unlikely(ntohs(skb->protocol) == ETH_P_ARP)) {
+		read_unlock_bh(&parent->lock);
+		write_lock_bh(&parent->lock);
+		neigh_learn(slave, skb, remac);
+		write_unlock_bh(&parent->lock);
+		read_lock_bh(&parent->lock);
+	}
+
+	nskb = get_parent_skb(slave, skb, remac);
+	if (unlikely(!nskb)) {
+		++parent->port_stats.rx_skb_errors;
+		pr_warn("%s: failed to create parent_skb\n",
+			skb->dev->name);
+		goto drop;
+	} else
+		skb = nskb;
+
+	vlan_tag = slave->vlan & 0xfff;
+	if (vlan_tag) {
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tag);
+		if (!skb) {
+			pr_err("%s: failed to insert VLAN tag\n",
+			       parent_dev->name);
+			goto drop;
+		}
+		++parent->port_stats.rx_vlan;
+	}
+
+	if (napi)
+		rc = napi_gro_receive(napi, skb);
+	else
+		rc = netif_receive_skb(skb);
+
+	read_unlock_bh(&parent->lock);
+
+	return rc;
+
+drop:
+	dev_kfree_skb_any(skb);
+	read_unlock_bh(&parent->lock);
+
+	return NET_RX_DROP;
+}
+
+static rx_handler_result_t eipoib_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct slave *slave;
+
+	slave = eipoib_slave_get_rcu(skb->dev);
+
+	parent_rx(skb, slave);
+
+	return RX_HANDLER_CONSUMED;
+}
+
+static netdev_tx_t parent_tx(struct sk_buff *skb, struct net_device *dev)
+{
+	struct parent *parent = netdev_priv(dev);
+	struct slave *slave = NULL;
+	struct ethhdr *ethh = (struct ethhdr *)(skb->data);
+	struct sk_buff *nskb;
+	int rc;
+	u16 vlan;
+	u8 mac_no_admin_bit[ETH_ALEN];
+
+	read_lock_bh(&parent->lock);
+
+	if (unlikely(!IS_E_IPOIB_PROTO(ethh->h_proto))) {
+		++parent->port_stats.tx_proto_errors;
+		goto drop;
+	}
+	/* only non-shared skbs are handled */
+	if (unlikely(skb_shared(skb))) {
+		++parent->port_stats.tx_shared;
+		goto drop;
+	}
+
+	/* obtain VLAN information if present */
+	if (vlan_tx_tag_present(skb)) {
+		vlan = vlan_tx_tag_get(skb) & 0xfff;
+		++parent->port_stats.tx_vlan;
+	} else {
+		vlan = VLAN_N_VID;
+	}
+
+	/*
+	 * for live migration: clear the locally administered bit, if set,
+	 * but only in ARP packets that came from the parent's own VIF.
+	 */
+	if (unlikely((htons(ETH_P_ARP) == ethh->h_proto) &&
+	    !memcmp(parent->dev->dev_addr + 1, ethh->h_source + 1, ETH_ALEN - 1))) {
+		/* parent's VIF: */
+		memcpy(mac_no_admin_bit, ethh->h_source, ETH_ALEN);
+		mac_no_admin_bit[0] = mac_no_admin_bit[0] & 0xFD;
+		/* look up the slave by the MAC without the admin bit */
+		slave = get_slave_by_mac_and_vlan(parent, mac_no_admin_bit, vlan);
+	}
+	/* get slave, and queue packet */
+	if (!slave)
+		slave = get_slave_by_mac_and_vlan(parent, ethh->h_source, vlan);
+	if (unlikely(!slave)) {
+		pr_info("vif: %pM miss for parent: %s\n", ethh->h_source,
+			parent->ipoib_main_interface);
+		++parent->port_stats.tx_vif_miss;
+		goto drop;
+	}
+
+	nskb = get_slave_skb(slave, skb);
+	if (unlikely(!nskb)) {
+		++parent->port_stats.tx_skb_errors;
+		goto drop;
+	} else
+		skb = nskb;
+
+	/*
+	 * VST mode: strip the VLAN tag on TX (it is re-added on RX).
+	 * The slave is an IPoIB device, which is NETIF_F_VLAN_CHALLENGED,
+	 * so the tag must be removed.
+	 */
+	if (vlan != VLAN_N_VID)
+		skb->vlan_tci = 0;
+
+	/* arp packets: */
+	if (skb->protocol == htons(ETH_P_ARP) ||
+	    skb->protocol == htons(ETH_P_RARP)) {
+		arp_xmit(skb);
+		goto out;
+	}
+
+	/* ip packets */
+	skb_record_rx_queue(skb, skb_get_queue_mapping(skb));
+
+	rc = dev_queue_xmit(skb);
+	if (unlikely(rc)) {
+		pr_err("slave tx method failed %d\n", rc);
+		++parent->port_stats.tx_slave_err;
+		/* dev_queue_xmit() consumes the skb even on failure */
+	}
+
+	goto out;
+
+drop:
+	++parent->port_stats.tx_parent_dropped;
+	dev_kfree_skb(skb);
+
+out:
+	read_unlock_bh(&parent->lock);
+	return NETDEV_TX_OK;
+}
+
+static int parent_open(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	parent->kill_timers = 0;
+	INIT_DELAYED_WORK(&parent->neigh_learn_work, neigh_learn_task);
+	INIT_DELAYED_WORK(&parent->emac_ip_work, emac_info_clean_task);
+	INIT_DELAYED_WORK(&parent->migrate_out_work, migrate_out_work_task);
+	return 0;
+}
+
+static int parent_close(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	write_lock_bh(&parent->lock);
+	parent->kill_timers = 1;
+	write_unlock_bh(&parent->lock);
+
+	cancel_delayed_work(&parent->neigh_learn_work);
+	cancel_delayed_work(&parent->emac_ip_work);
+	cancel_delayed_work(&parent->migrate_out_work);
+
+	return 0;
+}
+
+static void parent_deinit(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	list_del(&parent->parent_list);
+
+	parent_work_cancel_all(parent);
+}
+
+static void parent_uninit(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	parent_deinit(parent_dev);
+	parent_destroy_sysfs_entry(parent);
+
+	if (parent->wq)
+		destroy_workqueue(parent->wq);
+}
+
+static struct lock_class_key parent_netdev_xmit_lock_key;
+static struct lock_class_key parent_netdev_addr_lock_key;
+
+static void parent_set_lockdep_class_one(struct net_device *dev,
+					 struct netdev_queue *txq,
+					 void *_unused)
+{
+	lockdep_set_class(&txq->_xmit_lock,
+			  &parent_netdev_xmit_lock_key);
+}
+
+static void parent_set_lockdep_class(struct net_device *dev)
+{
+	lockdep_set_class(&dev->addr_list_lock,
+			  &parent_netdev_addr_lock_key);
+	netdev_for_each_tx_queue(dev, parent_set_lockdep_class_one, NULL);
+}
+
+static int parent_init(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	parent->wq = create_singlethread_workqueue(parent_dev->name);
+	if (!parent->wq)
+		return -ENOMEM;
+
+	parent_set_lockdep_class(parent_dev);
+
+	list_add_tail(&parent->parent_list, &parent_dev_list);
+
+	return 0;
+}
+
+static u16 parent_select_q(struct net_device *dev, struct sk_buff *skb)
+{
+	return skb_tx_hash(dev, skb);
+}
+
+static const struct net_device_ops parent_netdev_ops = {
+	.ndo_init		= parent_init,
+	.ndo_uninit		= parent_uninit,
+	.ndo_open		= parent_open,
+	.ndo_stop		= parent_close,
+	.ndo_start_xmit		= parent_tx,
+	.ndo_select_queue	= parent_select_q,
+	/* parent mtu is min(slaves_mtus) */
+	.ndo_change_mtu		= NULL,
+	.ndo_fix_features	= parent_fix_features,
+	/*
+	 * the initial MAC address is derived from the port GID in
+	 * parent_create(); it can be changed later through this function
+	 */
+	.ndo_set_mac_address = eth_mac_addr,
+	.ndo_get_stats = parent_get_stats,
+	.ndo_vlan_rx_add_vid = eth_ipoib_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid = eth_ipoib_vlan_rx_kill_vid,
+};
+
+static void parent_setup(struct net_device *parent_dev)
+{
+	struct parent *parent = netdev_priv(parent_dev);
+
+	/* initialize the rwlock */
+	rwlock_init(&parent->lock);
+
+	/* Initialize pointers */
+	parent->dev = parent_dev;
+	INIT_LIST_HEAD(&parent->vlan_list);
+	INIT_LIST_HEAD(&parent->neigh_add_list);
+	INIT_LIST_HEAD(&parent->slave_list);
+	INIT_LIST_HEAD(&parent->emac_ip_list);
+	/* Initialize the device entry points */
+	ether_setup(parent_dev);
+	/* parent_dev->hard_header_len is adjusted later */
+	parent_dev->netdev_ops = &parent_netdev_ops;
+	parent_set_ethtool_ops(parent_dev);
+
+	/* Initialize the device options */
+	parent_dev->tx_queue_len = 0;
+	/* mark the parent intf as pif (master of other vifs.) */
+	parent_dev->priv_flags = IFF_EIPOIB_PIF;
+
+	parent_dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM |
+		NETIF_F_RXCSUM | NETIF_F_GRO | NETIF_F_TSO;
+
+	parent_dev->features = parent_dev->hw_features;
+	parent_dev->vlan_features = parent_dev->hw_features;
+
+	parent_dev->features |= PARENT_VLAN_FEATURES;
+}
+
+/*
+ * Create a new parent (eIPoIB PIF) netdevice on top of the given IPoIB
+ * interface. Called from the netdevice notifier, so the caller holds
+ * rtnl_lock, as register_netdevice() requires.
+ */
+static struct parent *parent_create(struct net_device *ibd)
+{
+	struct net_device *parent_dev;
+	u32 num_queues;
+	int rc;
+	union ib_gid gid;
+	struct parent *parent = NULL;
+	int i, j;
+
+	memcpy(&gid, ibd->dev_addr + 4, sizeof(union ib_gid));
+	num_queues = num_online_cpus();
+	num_queues = roundup_pow_of_two(num_queues);
+
+	parent_dev = alloc_netdev_mq(sizeof(struct parent), "",
+				     parent_setup, num_queues);
+	if (!parent_dev) {
+		pr_err("%s failed to alloc netdev!\n", ibd->name);
+		rc = -ENOMEM;
+		goto out_rtnl;
+	}
+
+	rc = dev_alloc_name(parent_dev, "eth%d");
+	if (rc < 0)
+		goto out_netdev;
+
+	/* eIPoIB interface mac format. */
+	for (i = 0, j = 0; i < 8; i++) {
+		if ((PARENT_MAC_MASK >> i) & 0x1) {
+			if (j < 6) /* only 6 bytes eth address */
+				parent_dev->dev_addr[j] =
+					gid.raw[GUID_LEN + i];
+			j++;
+		}
+	}
+
+	/* assumes that ibd->dev.parent has already been set */
+	SET_NETDEV_DEV(parent_dev, ibd->dev.parent);
+
+	rc = register_netdevice(parent_dev);
+	if (rc < 0)
+		goto out_parent;
+
+	dev_net_set(parent_dev, &init_net);
+
+	rc = parent_create_sysfs_entry(netdev_priv(parent_dev));
+	if (rc < 0)
+		goto out_unreg;
+
+	parent = netdev_priv(parent_dev);
+	memcpy(parent->gid.raw, gid.raw, GID_LEN);
+	strlcpy(parent->ipoib_main_interface, ibd->name, IFNAMSIZ);
+	parent_dev->dev_id = ibd->dev_id;
+
+	return parent;
+
+out_unreg:
+	unregister_netdevice(parent_dev);
+out_parent:
+	parent_deinit(parent_dev);
+out_netdev:
+	free_netdev(parent_dev);
+out_rtnl:
+	return ERR_PTR(rc);
+}
+
+static void parent_free(struct parent *parent)
+{
+	struct net_device *parent_dev = parent->dev;
+
+	parent_work_cancel_all(parent);
+
+	parent_release_all(parent_dev);
+
+	unregister_netdevice(parent_dev);
+}
+
+static void parent_free_all(void)
+{
+	struct parent *parent, *nxt;
+
+	list_for_each_entry_safe(parent, nxt, &parent_dev_list, parent_list)
+		parent_free(parent);
+}
+
+/* netdev events handlers */
+static inline int is_ipoib_pif_intf(struct net_device *dev)
+{
+	if (ARPHRD_INFINIBAND == dev->type && dev->priv_flags & IFF_EIPOIB_PIF)
+		return 1;
+	return 0;
+}
+
+static int parent_event_changename(struct parent *parent)
+{
+	parent_destroy_sysfs_entry(parent);
+
+	parent_create_sysfs_entry(parent);
+
+	return NOTIFY_DONE;
+}
+
+static int parent_master_netdev_event(unsigned long event,
+				      struct net_device *parent_dev)
+{
+	struct parent *event_parent = netdev_priv(parent_dev);
+
+	switch (event) {
+	case NETDEV_CHANGENAME:
+		pr_info("%s: got NETDEV_CHANGENAME event\n", parent_dev->name);
+		return parent_event_changename(event_parent);
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static int parent_slave_netdev_event(unsigned long event,
+				     struct net_device *slave_dev)
+{
+	struct net_device *parent_dev = slave_dev->master;
+	struct parent *parent;
+
+	if (!parent_dev) {
+		pr_err("slave %s has no parent\n", slave_dev->name);
+		return NOTIFY_DONE;
+	}
+
+	parent = netdev_priv(parent_dev);
+
+	switch (event) {
+	case NETDEV_UNREGISTER:
+		parent_release_slave(parent_dev, slave_dev);
+		break;
+	case NETDEV_CHANGE:
+	case NETDEV_UP:
+	case NETDEV_DOWN:
+		parent_set_carrier(parent);
+		break;
+	case NETDEV_CHANGEMTU:
+		parent_set_mtu(parent);
+		break;
+	case NETDEV_CHANGENAME:
+		break;
+	case NETDEV_FEAT_CHANGE:
+		parent_compute_features(parent);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static int eipoib_netdev_event(struct notifier_block *this,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *event_dev = (struct net_device *)ptr;
+
+	if (dev_net(event_dev) != &init_net)
+		return NOTIFY_DONE;
+
+	if (is_parent(event_dev))
+		return parent_master_netdev_event(event, event_dev);
+
+	if (is_slave(event_dev))
+		return parent_slave_netdev_event(event, event_dev);
+	/*
+	 * general network device triggers event, check if it is new
+	 * ib interface that we want to enslave.
+	 */
+	return eipoib_device_event(this, event, ptr);
+}
+
+static struct notifier_block parent_netdev_notifier = {
+	.notifier_call = eipoib_netdev_event,
+};
+
+static int eipoib_device_event(struct notifier_block *unused,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct parent *parent;
+
+	if (!is_ipoib_pif_intf(dev))
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case NETDEV_REGISTER:
+		parent = parent_create(dev);
+		if (IS_ERR(parent)) {
+			pr_warn("failed to create parent for %s\n",
+				dev->name);
+			break;
+		}
+		break;
+	case NETDEV_UNREGISTER:
+		parent = get_parent_by_pif_name(dev->name);
+		if (parent)
+			parent_free(parent);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static int __init mod_init(void)
+{
+	int rc;
+
+	pr_info(DRV_NAME": %s", version);
+
+	rc = register_pernet_subsys(&eipoib_net_ops);
+	if (rc)
+		goto out;
+
+	rc = register_netdevice_notifier(&parent_netdev_notifier);
+	if (rc) {
+		pr_err("%s failed to register_netdevice_notifier, rc: 0x%x\n",
+		       __func__, rc);
+		goto unreg_subsys;
+	}
+
+	goto out;
+
+unreg_subsys:
+	unregister_pernet_subsys(&eipoib_net_ops);
+out:
+	return rc;
+}
+
+static void __exit mod_exit(void)
+{
+	unregister_netdevice_notifier(&parent_netdev_notifier);
+
+	unregister_pernet_subsys(&eipoib_net_ops);
+
+	rtnl_lock();
+	parent_free_all();
+	rtnl_unlock();
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+MODULE_DESCRIPTION(DRV_DESCRIPTION ", v" DRV_VERSION);
+MODULE_AUTHOR("Ali Ayoub & Erez Shitrit");
-- 
1.7.1
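The live-migration handling in get_parent_skb_arp() and parent_tx() relies on the locally administered bit, which is 0x02 in the first octet of a MAC address. get_parent_skb_arp() sets this bit (`| 0x2`) on a copy of the parent's address when it receives an ARP with broadcast source and destination. parent_tx() clears it again (`& 0xFD`) on outgoing ARPs whose source matches the parent's address in every octet but the first, and uses the result for the slave lookup.

Below is a minimal user-space sketch of that round trip. The helper names are hypothetical and are not part of the driver.

```c
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical helper: set the locally administered bit (0x02 in octet 0),
 * as get_parent_skb_arp() does with "| 0x2". */
static void mac_set_local_admin(unsigned char mac[ETH_ALEN])
{
	mac[0] |= 0x02;
}

/* Hypothetical helper: clear that bit again, as parent_tx() does with "& 0xFD". */
static void mac_clear_local_admin(unsigned char mac[ETH_ALEN])
{
	mac[0] &= 0xFD;
}

/* Nonzero if a and b match in every octet except the first, mirroring the
 * memcmp(dev_addr + 1, h_source + 1, ETH_ALEN - 1) test in parent_tx(). */
static int mac_match_ignoring_octet0(const unsigned char *a,
				     const unsigned char *b)
{
	return memcmp(a + 1, b + 1, ETH_ALEN - 1) == 0;
}
```

For example, a parent address of 00:02:c9:11:22:33 is advertised as 02:02:c9:11:22:33, and the TX path maps it back. The round trip is only lossless when the parent's own address has that bit clear.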

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next V1 8/9] net/eipoib: Add Makefile, Kconfig and MAINTAINERS entries
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (6 preceding siblings ...)
  2012-07-18 11:00 ` [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality Or Gerlitz
@ 2012-07-18 11:00 ` Or Gerlitz
  2012-07-18 11:00 ` [PATCH net-next V1 9/9] IB/ipoib: Add support for transmission of skbs w.o dst/neighbour Or Gerlitz
  8 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 11:00 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

Add a Kconfig entry under drivers/net and a MAINTAINERS entry for eIPoIB,
and add the driver Makefile.

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 MAINTAINERS                 |    6 ++++++
 drivers/net/Kconfig         |   15 +++++++++++++++
 drivers/net/Makefile        |    1 +
 drivers/net/eipoib/Makefile |    4 ++++
 4 files changed, 26 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/eipoib/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index b4321fb..52f35ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2618,6 +2618,12 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/ibm/ehea/
 
+EIPoIB (Ethernet services over IPoIB) DRIVER
+M:	Erez Shitrit <erezsh@mellanox.com>
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/eipoib/
+
 EMBEDDED LINUX
 M:	Paul Gortmaker <paul.gortmaker@windriver.com>
 M:	Matt Mackall <mpm@selenic.com>
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c2bd80..ba98f61 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -68,6 +68,21 @@ config DUMMY
 	  To compile this driver as a module, choose M here: the module
 	  will be called dummy.
 
+config E_IPOIB
+	tristate "Ethernet Services over IPoIB"
+	depends on INFINIBAND_IPOIB
+	---help---
+	  This driver supports the Ethernet protocol over InfiniBand IPoIB devices.
+	  Some services can run only on top of Ethernet L2 interfaces and
+	  cannot be bound to an IPoIB interface.
+	  With this driver, these services can run seamlessly.
+
+	  The main use case of the driver is Ethernet virtual switching in
+	  virtualized environments, where an eipoib netdevice serves as a
+	  Physical Interface (PIF) in the hypervisor domain and allows other
+	  guests' Virtual Interfaces (VIFs) connected to the same virtual switch
+	  to run over the InfiniBand fabric.
+
 config EQUALIZER
 	tristate "EQL (serial line load balancing) support"
 	---help---
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3d375ca..2c3409e 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_CAIF) += caif/
 obj-$(CONFIG_CAN) += can/
 obj-$(CONFIG_ETRAX_ETHERNET) += cris/
 obj-$(CONFIG_NET_DSA) += dsa/
+obj-$(CONFIG_E_IPOIB) += eipoib/
 obj-$(CONFIG_ETHERNET) += ethernet/
 obj-$(CONFIG_FDDI) += fddi/
 obj-$(CONFIG_HIPPI) += hippi/
diff --git a/drivers/net/eipoib/Makefile b/drivers/net/eipoib/Makefile
new file mode 100644
index 0000000..b64e96e
--- /dev/null
+++ b/drivers/net/eipoib/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_E_IPOIB)                         += eth_ipoib.o
+eth_ipoib-y                                    := eth_ipoib_main.o \
+                                                  eth_ipoib_sysfs.o \
+                                                  eth_ipoib_ethtool.o
-- 
1.7.1

* [PATCH net-next V1 9/9] IB/ipoib: Add support for transmission of skbs w.o dst/neighbour
  2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
                   ` (7 preceding siblings ...)
  2012-07-18 11:00 ` [PATCH net-next V1 8/9] net/eipoib: Add Makefile, Kconfig and MAINTAINERS entries Or Gerlitz
@ 2012-07-18 11:00 ` Or Gerlitz
  8 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 11:00 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit, Or Gerlitz

From: Erez Shitrit <erezsh@mellanox.co.il>

Guest IP packets sent by eIPoIB over an IPoIB VIF interface carry no dst
or neighbour. This patch modifies IPoIB so that such packets can be
transmitted. It does so by extending an existing path in the driver,
which until now was used only for unicast ARP probes.

This patch lets the series work as is over net-next. In parallel to this
driver, a patch was posted that modifies IPoIB so that it no longer
assumes a dst/neighbour on the skb.
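The change to how IPoIB picks the destination address can be summarized as a small decision function. This is a simplified, hypothetical model and the names below are not IPoIB code. It ignores the multicast/unicast split and the ARP/RARP checks in ipoib_start_xmit(). The second hunk additionally stops non-ARP unicast packets on a VIF from being treated as errors.

```c
#include <stdbool.h>

/* Hypothetical model: where the destination hardware address for a
 * unicast skb is taken from. */
enum ipoib_daddr_source {
	DADDR_FROM_DST_NEIGH,	/* dst_neigh_lookup_skb() on skb_dst(skb) */
	DADDR_FROM_SKB_CB	/* copied into skb->cb by ipoib_hard_header() */
};

/* Before the patch only has_dst mattered. With it, skbs sent on an eIPoIB
 * VIF (IFF_EIPOIB_VIF) always use the skb->cb address, even if a dst is
 * attached. */
static enum ipoib_daddr_source
ipoib_daddr_source(bool has_dst, bool is_eipoib_vif)
{
	if (has_dst && !is_eipoib_vif)
		return DADDR_FROM_DST_NEIGH;
	return DADDR_FROM_SKB_CB;
}
```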

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8575fa7..322e5a5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -732,7 +732,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned long flags;
 
 	rcu_read_lock();
-	if (likely(skb_dst(skb))) {
+	if (likely(skb_dst(skb)) && !(dev->priv_flags & IFF_EIPOIB_VIF)) {
 		n = dst_neigh_lookup_skb(skb_dst(skb), skb);
 		if (!n) {
 			++dev->stats.tx_dropped;
@@ -800,7 +800,8 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			/* unicast GID -- should be ARP or RARP reply */
 
 			if ((be16_to_cpup((__be16 *) skb->data) != ETH_P_ARP) &&
-			    (be16_to_cpup((__be16 *) skb->data) != ETH_P_RARP)) {
+			    (be16_to_cpup((__be16 *) skb->data) != ETH_P_RARP) &&
+			    !(dev->priv_flags & IFF_EIPOIB_VIF)) {
 				ipoib_warn(priv, "Unicast, no %s: type %04x, QPN %06x %pI6\n",
 					   skb_dst(skb) ? "neigh" : "dst",
 					   be16_to_cpup((__be16 *) skb->data),
@@ -850,7 +851,7 @@ static int ipoib_hard_header(struct sk_buff *skb,
 	 * destination address into skb->cb so we can figure out where
 	 * to send the packet later.
 	 */
-	if (!skb_dst(skb)) {
+	if (!skb_dst(skb) || dev->priv_flags & IFF_EIPOIB_VIF) {
 		struct ipoib_cb *cb = (struct ipoib_cb *) skb->cb;
 		memcpy(cb->hwaddr, daddr, INFINIBAND_ALEN);
 	}
-- 
1.7.1

* Re: [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support
  2012-07-18 10:59 ` [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support Or Gerlitz
@ 2012-07-18 18:37   ` Ben Hutchings
  2012-07-19 15:55     ` Or Gerlitz
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2012-07-18 18:37 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On Wed, 2012-07-18 at 13:59 +0300, Or Gerlitz wrote:
> From: Erez Shitrit <erezsh@mellanox.co.il>
> 
> Via ethtool the driver describes its version, ABI version, on what PIF
> interface it runs and various statistics.
[...]
> +static const char parent_strings[][ETH_GSTRING_LEN] = {
> +	/* private statistics */
> +	"tx_parent_dropped",
> +	"tx_vif_miss",
> +	"tx_neigh_miss",
> +	"tx_vlan",
> +	"tx_shared",
> +	"tx_proto_errors",
> +	"tx_skb_errors",
> +	"tx_slave_err",
> +
> +	"rx_parent_dropped",
> +	"rx_vif_miss",
> +	"rx_neigh_miss",
> +	"rx_vlan",
> +	"rx_shared",
> +	"rx_proto_errors",
> +	"rx_skb_errors",
> +	"rx_slave_err",
> +#define PORT_STATS_LEN	(8 * 2)
> +};
> +
> +#define PARENT_STATS_LEN (sizeof(parent_strings) / ETH_GSTRING_LEN)
> +
> +static void parent_get_strings(struct net_device *parent_dev,
> +			       uint32_t stringset, uint8_t *data)
> +{
> +	int index = 0, stats_off = 0, i;
> +
> +	if (stringset != ETH_SS_STATS)
> +		return;
> +
> +	for (i = 0; i < PORT_STATS_LEN; i++)
> +		strcpy(data + (index++) * ETH_GSTRING_LEN,
> +		       parent_strings[i + stats_off]);
> +
> +	stats_off += PORT_STATS_LEN;

This is a very longwinded way to write:
	memcpy(data, parent_strings, sizeof(parent_strings));

> +
> +}
> +
> +static void parent_get_ethtool_stats(struct net_device *parent_dev,
> +				     struct ethtool_stats *stats,
> +				     uint64_t *data)
> +{
> +	struct parent *parent = netdev_priv(parent_dev);
> +	int index = 0, i;
> +
> +	read_lock_bh(&parent->lock);
> +
> +	for (i = 0; i < PORT_STATS_LEN; i++)
> +		data[index++] = ((unsigned long *) &parent->port_stats)[i];
> +
> +	read_unlock_bh(&parent->lock);
> +}
> +
> +static int parent_get_sset_count(struct net_device *parent_dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return PARENT_STATS_LEN;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
[...]

I get the feeling you've removed some code with unifdef; the result
looks really weird, with PORT_STATS_LEN and PARENT_STATS_LEN used
inconsistently.
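Following that suggestion, here is a minimal user-space sketch in which the strings table is the single source of truth for the count, so a separate PORT_STATS_LEN is no longer needed.

- ETH_GSTRING_LEN is 32, as in the kernel's ethtool.h.
- ARRAY_SIZE and the *_sketch functions are local stand-ins, not the driver's code.
- The counters are assumed to be laid out in the same order as the strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_GSTRING_LEN	32	/* same value as in linux/ethtool.h */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

static const char parent_strings[][ETH_GSTRING_LEN] = {
	"tx_parent_dropped", "tx_vif_miss", "tx_neigh_miss", "tx_vlan",
	"tx_shared", "tx_proto_errors", "tx_skb_errors", "tx_slave_err",
	"rx_parent_dropped", "rx_vif_miss", "rx_neigh_miss", "rx_vlan",
	"rx_shared", "rx_proto_errors", "rx_skb_errors", "rx_slave_err",
};

/* The table itself defines the count. */
#define PARENT_STATS_LEN	ARRAY_SIZE(parent_strings)

/* ETH_SS_STATS strings: one memcpy of the whole table. */
static void parent_get_strings_sketch(uint8_t *data)
{
	memcpy(data, parent_strings, sizeof(parent_strings));
}

/* Assumes 'counters' holds PARENT_STATS_LEN values in table order. */
static void parent_get_stats_sketch(const unsigned long *counters,
				    uint64_t *data)
{
	size_t i;

	for (i = 0; i < PARENT_STATS_LEN; i++)
		data[i] = counters[i];
}

static int parent_get_sset_count_sketch(void)
{
	return (int)PARENT_STATS_LEN;
}
```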

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 10:59 ` [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition Or Gerlitz
@ 2012-07-18 18:38   ` David Miller
  2012-07-18 21:24     ` Or Gerlitz
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2012-07-18 18:38 UTC (permalink / raw)
  To: ogerlitz; +Cc: roland, netdev, ali, sean.hefty, shlomop, erezsh

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Wed, 18 Jul 2012 13:59:54 +0300

> All sorts of childs are still created/deleted through sysfs, in a
> similar manner to the way legacy child interfaces are.

Network device instantiation of this type is the domain of
rtnl_link_ops rather than ugly sysfs interfaces.

* Re: [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 18:38   ` David Miller
@ 2012-07-18 21:24     ` Or Gerlitz
  2012-07-18 21:36       ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-18 21:24 UTC (permalink / raw)
  To: David Miller; +Cc: roland, netdev, ali, sean.hefty, shlomop, erezsh

On Wed, Jul 18, 2012 at 9:38 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <ogerlitz@mellanox.com>

>> All sorts of childs are still created/deleted through sysfs, in a
>> similar manner to the way legacy child interfaces are.

> Network device instantiation of this type is the domain of
> rtnl_link_ops rather than ugly sysfs interfaces.

Didn't add any **new** sysfs interfaces in this patch. The IPoIB sysfs
entries to create child devices are there from IPoIB's day one, and
we're only extending them a tiny bit.

Or.

* Re: [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 21:24     ` Or Gerlitz
@ 2012-07-18 21:36       ` David Miller
  2012-07-18 22:11         ` John Fastabend
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2012-07-18 21:36 UTC (permalink / raw)
  To: or.gerlitz; +Cc: roland, netdev, ali, sean.hefty, shlomop, erezsh

From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Thu, 19 Jul 2012 00:24:58 +0300

> On Wed, Jul 18, 2012 at 9:38 PM, David Miller <davem@davemloft.net> wrote:
>> From: Or Gerlitz <ogerlitz@mellanox.com>
> 
>>> All sorts of childs are still created/deleted through sysfs, in a
>>> similar manner to the way legacy child interfaces are.
> 
>> Network device instantiation of this type is the domain of
>> rtnl_link_ops rather than ugly sysfs interfaces.
> 
> Didn't add any **new** sysfs interfaces in this patch. The IPoIB sysfs
> entries to create child devices are there from IPoIB's day one, and
> we're only extending them a tiny bit.

That's extremely unfortunate, having private ways of instantiating
networking devices leads to an extremely poor user experience.

Would you like to have to train every single user in the case
where each and every driver author makes his own unique way
of configuring his hardware?

* Re: [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 21:36       ` David Miller
@ 2012-07-18 22:11         ` John Fastabend
  2012-07-19  8:11           ` Or Gerlitz
  0 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2012-07-18 22:11 UTC (permalink / raw)
  To: David Miller, or.gerlitz; +Cc: roland, netdev, ali, sean.hefty, shlomop, erezsh

On 7/18/2012 2:36 PM, David Miller wrote:
> From: Or Gerlitz <or.gerlitz@gmail.com>
> Date: Thu, 19 Jul 2012 00:24:58 +0300
>
>> On Wed, Jul 18, 2012 at 9:38 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Or Gerlitz <ogerlitz@mellanox.com>
>>
>>>> All sorts of childs are still created/deleted through sysfs, in a
>>>> similar manner to the way legacy child interfaces are.
>>
>>> Network device instantiation of this type is the domain of
>>> rtnl_link_ops rather than ugly sysfs interfaces.
>>
>> Didn't add any **new** sysfs interfaces in this patch. The IPoIB sysfs
>> entries to create child devices are there from IPoIB's day one, and
>> we're only extending them a tiny bit.
>
> That's extremely unfortunate, having private ways of instantiating
> networking devices leads to an extremely poor user experience.
>
> Would you like to have to train every single user in the case
> where each and every driver author makes his own unique way
> of configuring his hardware?
> --

Or,

I've got a rough patch to use rtnl_link_ops to add what we've been
calling 'virtual machine device queues' or VMDq. This looks a lot
like macvlan with offloaded switching and I believe similar to your
child case above.

Also what is a "pkey"

I'll post it as a use at your own risk shortly although this week I'm
short on time so maybe next week I can get something more "real" out.
Been stealing cycles between other work things today.

If you want to complete it, more power to you.

.John

* Re: [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition
  2012-07-18 22:11         ` John Fastabend
@ 2012-07-19  8:11           ` Or Gerlitz
  0 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-19  8:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: David Miller, roland, netdev, ali, sean.hefty, shlomop, erezsh,
	Or Gerlitz

On Thu, Jul 19, 2012 at 1:11 AM, John Fastabend
<john.r.fastabend@intel.com> wrote:

> [...] Also what is a "pkey"

Hi John,

pkey (pronounced PEE KEY) stands for "partition key"; partitions are, in a
way, IB's VLANs, so the functionality provided by IPoIB child devices is
similar to that of Ethernet 802.1Q VLAN devices. Dave suggests that we use
rtnl_link_ops to create these children instead of the proprietary sysfs
interface which was introduced when IPoIB was merged and is described in
Documentation/infiniband/ipoib.txt.
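To make the pkey answer concrete: a P_Key is a 16-bit value whose top bit marks full (1) versus limited (0) membership and whose low 15 bits name the partition, loosely comparable to the 12-bit VLAN ID in an 802.1Q tag. The user-space sketch below decodes a P_Key and builds a child name in the `ibX.PPPP` style used by the legacy `create_child` sysfs file (e.g. `ib0.8001`). The helper names are hypothetical, and forcing the full-membership bit before naming reflects my understanding of the IPoIB child code, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bit 15 of a P_Key: 1 = full member, 0 = limited member. */
#define PKEY_FULL_MEMBER    0x8000u
/* Low 15 bits: the partition number. */
#define PKEY_PARTITION_MASK 0x7fffu

/* Hypothetical helper: the partition a P_Key refers to. */
static unsigned pkey_partition(uint16_t pkey)
{
	return pkey & PKEY_PARTITION_MASK;
}

/* Hypothetical helper: nonzero if the P_Key grants full membership. */
static int pkey_is_full_member(uint16_t pkey)
{
	return (pkey & PKEY_FULL_MEMBER) != 0;
}

/*
 * Hypothetical helper: build a child name such as "ib0.8001".
 * The full-membership bit is forced on first (an assumption, see text).
 * Returns 0 on success, -1 if the name does not fit in 'len' bytes.
 */
static int ipoib_child_name(char *out, size_t len, const char *parent,
			    uint16_t pkey)
{
	int n = snprintf(out, len, "%s.%04x", parent,
			 (unsigned)(pkey | PKEY_FULL_MEMBER));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, `ipoib_child_name(name, 16, "ib0", 0x0001)` produces `ib0.8001`, and `pkey_partition(0xffff)` is `0x7fff`, the default partition.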


Or.


* Re: [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality
  2012-07-18 11:00 ` [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality Or Gerlitz
@ 2012-07-19 13:49   ` Ben Hutchings
  2012-07-19 15:46     ` Or Gerlitz
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2012-07-19 13:49 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On Wed, 2012-07-18 at 14:00 +0300, Or Gerlitz wrote:
[...]
> +/* ----------------------------- VLAN funcs ---------------------------------- */
> +static int eth_ipoib_vlan_rx_add_vid(struct net_device *dev,
> +				     unsigned short vid)
> +{
> +	return 0;
> +}
> +
> +static int eth_ipoib_vlan_rx_kill_vid(struct net_device *dev,
> +				      unsigned short vid)
> +{
> +	return 0;
> +}
[...]
> +/* -------------------------- Device entry points --------------------------- */
> +static struct net_device_stats *parent_get_stats(struct net_device *parent_dev)
> +{
[...]
> +static const struct net_device_ops parent_netdev_ops = {
> +	.ndo_init		= parent_init,
> +	.ndo_uninit		= parent_uninit,
> +	.ndo_open		= parent_open,
> +	.ndo_stop		= parent_close,
> +	.ndo_start_xmit		= parent_tx,
> +	.ndo_select_queue	= parent_select_q,
> +	/* parent mtu is min(slaves_mtus) */
> +	.ndo_change_mtu		= NULL,
> +	.ndo_fix_features	= parent_fix_features,
> +	/*
> +	 * initial mac address is randomized, can be changed
> +	 * thru this func later
> +	 */
> +	.ndo_set_mac_address = eth_mac_addr,
> +	.ndo_get_stats = parent_get_stats,

Why not implement ndo_get_stats64?  I don't think there's any good
reason for a new driver not to.

> +	.ndo_vlan_rx_add_vid = eth_ipoib_vlan_rx_add_vid,
> +	.ndo_vlan_rx_kill_vid = eth_ipoib_vlan_rx_kill_vid,

These shouldn't be needed.

[...]
> +/* netdev events handlers */
> +static inline int is_ipoib_pif_intf(struct net_device *dev)
> +{
> +	if (ARPHRD_INFINIBAND == dev->type && dev->priv_flags & IFF_EIPOIB_PIF)
> +			return 1;
[...]

Wrong indentation.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality
  2012-07-19 13:49   ` Ben Hutchings
@ 2012-07-19 15:46     ` Or Gerlitz
  2012-07-19 16:16       ` Ben Hutchings
  0 siblings, 1 reply; 22+ messages in thread
From: Or Gerlitz @ 2012-07-19 15:46 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On 7/19/2012 4:49 PM, Ben Hutchings wrote:
> On Wed, 2012-07-18 at 14:00 +0300, Or Gerlitz wrote:
> +static const struct net_device_ops parent_netdev_ops = {
> +	.ndo_init		= parent_init,
> +	.ndo_uninit		= parent_uninit,
> +	.ndo_open		= parent_open,
> +	.ndo_stop		= parent_close,
> +	.ndo_start_xmit		= parent_tx,
> +	.ndo_select_queue	= parent_select_q,
> +	/* parent mtu is min(slaves_mtus) */
> +	.ndo_change_mtu		= NULL,
> +	.ndo_fix_features	= parent_fix_features,
> +	/*
> +	 * initial mac address is randomized, can be changed
> +	 * thru this func later
> +	 */
> +	.ndo_set_mac_address = eth_mac_addr,
> +	.ndo_get_stats = parent_get_stats,
>
> Why not implement ndo_get_stats64?  I don't think there's any good
> reason for a new driver not to.

Indeed, will implement ndo_get_stats64.
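Assuming the eIPoIB parent reports the sum of its slaves' counters, ndo_get_stats64 mostly reduces to adding up 64-bit fields. The user-space sketch below models only that aggregation: `struct link_stats64` is a hypothetical, trimmed stand-in for the kernel's `struct rtnl_link_stats64`, and any locking the real callback needs (e.g. u64_stats_sync for 64-bit counters on 32-bit hosts) is left out. Returning the storage pointer mirrors the ndo_get_stats64 convention of this era.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed, hypothetical stand-in for struct rtnl_link_stats64. */
struct link_stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
	uint64_t rx_dropped;
	uint64_t tx_dropped;
};

/* Sum the counters of 'n' slaves into '*total' and return 'total'. */
static struct link_stats64 *
parent_sum_stats64(struct link_stats64 *total,
		   const struct link_stats64 *slaves, size_t n)
{
	size_t i;

	*total = (struct link_stats64){ 0 };
	for (i = 0; i < n; i++) {
		total->rx_packets += slaves[i].rx_packets;
		total->tx_packets += slaves[i].tx_packets;
		total->rx_bytes   += slaves[i].rx_bytes;
		total->tx_bytes   += slaves[i].tx_bytes;
		total->rx_dropped += slaves[i].rx_dropped;
		total->tx_dropped += slaves[i].tx_dropped;
	}
	return total;
}
```

The point of the 64-bit variant is visible when a byte counter crosses 2^32: the sum keeps going instead of wrapping as an `unsigned long` would on a 32-bit host.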

>
>
>> +	.ndo_vlan_rx_add_vid = eth_ipoib_vlan_rx_add_vid,
>> +	.ndo_vlan_rx_kill_vid = eth_ipoib_vlan_rx_kill_vid,
>
> These shouldn't be needed.

ok, here's the point: the eIPoIB driver maps Ethernet VLANs to
InfiniBand/IPoIB pkeys (partition keys). The underlying IPoIB devices
work with these pkeys in a way which is HW accelerated, and we want the
eIPoIB driver to be considered as one that supports HW-accelerated VLANs.
E.g., on the TX flow we don't want the 8021q driver to do any special SW
handling of the skb except setting the skb->vlan_tci field, and on the RX
flow we set the skb->vlan_tci field and don't want 8021q to try to
extract it from the headers, etc.

To that end, I was under the impression that all three
NETIF_F_HW_VLAN_{TX,RX,FILTER} features need to be advertised. From your
comment I now understand that RX/TX are enough in that respect?



>
>
> [...]
>> +/* netdev events handlers */
>> +static inline int is_ipoib_pif_intf(struct net_device *dev)
>> +{
>> +	if (ARPHRD_INFINIBAND == dev->type && dev->priv_flags & IFF_EIPOIB_PIF)
>> +			return 1;
> [...]
>
> Wrong indentation.

will fix, thanks for spotting this.
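For reference, the helper Ben flagged can be written with correct kernel-style indentation as a single boolean expression. The sketch below is a user-space model: `struct fake_net_device` is hypothetical, `ARPHRD_INFINIBAND` uses the value from `<linux/if_arp.h>`, and `IFF_EIPOIB_PIF` is a placeholder because its real value is defined in patch 2/9, which is not quoted here. Returning `bool` is a stylistic choice, not something the review asked for.

```c
#include <assert.h>
#include <stdbool.h>

#define ARPHRD_INFINIBAND 32		/* as in <linux/if_arp.h> */
#define IFF_EIPOIB_PIF    0x100000	/* placeholder, not the real value */

/* Hypothetical stand-in for the two net_device fields the helper reads. */
struct fake_net_device {
	unsigned short type;
	unsigned int priv_flags;
};

/* True only for an InfiniBand device flagged as an eIPoIB PIF. */
static inline bool is_ipoib_pif_intf(const struct fake_net_device *dev)
{
	return dev->type == ARPHRD_INFINIBAND &&
	       (dev->priv_flags & IFF_EIPOIB_PIF);
}
```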

Or.


* Re: [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support
  2012-07-18 18:37   ` Ben Hutchings
@ 2012-07-19 15:55     ` Or Gerlitz
  0 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-19 15:55 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On 7/18/2012 9:37 PM, Ben Hutchings wrote:
> +static void parent_get_strings(struct net_device *parent_dev,
> +			       uint32_t stringset, uint8_t *data)
> +{
> +	int index = 0, stats_off = 0, i;
> +
> +	if (stringset != ETH_SS_STATS)
> +		return;
> +
> +	for (i = 0; i < PORT_STATS_LEN; i++)
> +		strcpy(data + (index++) * ETH_GSTRING_LEN,
> +		       parent_strings[i + stats_off]);
> +
> +	stats_off += PORT_STATS_LEN;
> This is a very longwinded way to write:
> 	memcpy(data, parent_strings, sizeof(parent_strings));

Sure, will fix.
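Ben's one-liner is equivalent to the loop as long as `parent_strings` is a two-dimensional array of `ETH_GSTRING_LEN`-byte rows, the usual layout for ethtool string tables; its declaration is not quoted, so that layout is an assumption here. The user-space sketch below uses a hypothetical four-entry table and compares both forms.

```c
#include <assert.h>
#include <string.h>

#define ETH_GSTRING_LEN 32	/* as in <linux/ethtool.h> */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical string table; each row is exactly ETH_GSTRING_LEN bytes. */
static const char parent_strings[][ETH_GSTRING_LEN] = {
	"rx_packets",
	"tx_packets",
	"rx_bytes",
	"tx_bytes",
};

/* The long-winded form: one strcpy per row. */
static void get_strings_loop(unsigned char *data)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(parent_strings); i++)
		strcpy((char *)data + i * ETH_GSTRING_LEN, parent_strings[i]);
}

/* Ben's suggestion: copy the whole table, zero padding included. */
static void get_strings_memcpy(unsigned char *data)
{
	memcpy(data, parent_strings, sizeof(parent_strings));
}
```

One small difference: the `memcpy` also copies each row's zero padding, so unlike the `strcpy` loop its result does not depend on the destination buffer having been zeroed beforehand.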

>
>> +static int parent_get_sset_count(struct net_device *parent_dev, int sset)
>> +{
>> +	switch (sset) {
>> +	case ETH_SS_STATS:
>> +		return PARENT_STATS_LEN;
>>
> [...]
>
> I get the feeling you've removed some code with unifdef; the result
> looks really weird, with PORT_STATS_LEN and PARENT_STATS_LEN used
> inconsistently.

yep, this needs cleanup, will do for V2
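One way to make the PORT_STATS_LEN/PARENT_STATS_LEN mismatch impossible is to derive the count from the string table itself, so that get_sset_count and get_strings cannot disagree. The user-space sketch below shows that pattern with simplified, hypothetical signatures (no `net_device` argument) and the same hypothetical table as above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define ETH_GSTRING_LEN 32	/* as in <linux/ethtool.h> */
#define ETH_SS_STATS    1	/* as in <linux/ethtool.h> */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char parent_strings[][ETH_GSTRING_LEN] = {
	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes",
};

/* Single source of truth: the count follows the table automatically. */
#define PARENT_STATS_LEN ARRAY_SIZE(parent_strings)

static int parent_get_sset_count(int sset)
{
	switch (sset) {
	case ETH_SS_STATS:
		return (int)PARENT_STATS_LEN;
	default:
		return -EOPNOTSUPP;
	}
}

static void parent_get_strings(unsigned int stringset, unsigned char *data)
{
	if (stringset != ETH_SS_STATS)
		return;
	memcpy(data, parent_strings, PARENT_STATS_LEN * ETH_GSTRING_LEN);
}
```

Adding a row to `parent_strings` then updates both callbacks, which is the property the unifdef'ed code lost.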

Or.


* Re: [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality
  2012-07-19 15:46     ` Or Gerlitz
@ 2012-07-19 16:16       ` Ben Hutchings
  2012-07-19 16:21         ` Or Gerlitz
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2012-07-19 16:16 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On Thu, 2012-07-19 at 18:46 +0300, Or Gerlitz wrote:
> On 7/19/2012 4:49 PM, Ben Hutchings wrote:
> > On Wed, 2012-07-18 at 14:00 +0300, Or Gerlitz wrote:
[...]
> >> +	.ndo_vlan_rx_add_vid = eth_ipoib_vlan_rx_add_vid,
> >> +	.ndo_vlan_rx_kill_vid = eth_ipoib_vlan_rx_kill_vid,
> >
> > These shouldn't be needed.
> 
> ok, here's the point: the eIPoIB driver maps Ethernet VLANs to
> InfiniBand/IPoIB pkeys (partition keys). The underlying IPoIB devices
> work with these pkeys in a way which is HW accelerated, and we want the
> eIPoIB driver to be considered as one that supports HW-accelerated VLANs.
> E.g., on the TX flow we don't want the 8021q driver to do any special SW
> handling of the skb except setting the skb->vlan_tci field, and on the RX
> flow we set the skb->vlan_tci field and don't want 8021q to try to
> extract it from the headers, etc.
> 
> To that end, I was under the impression that all three
> NETIF_F_HW_VLAN_{TX,RX,FILTER} features need to be advertised. From your
> comment I now understand that RX/TX are enough in that respect?
[...]

Yes.

Ben.



* Re: [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality
  2012-07-19 16:16       ` Ben Hutchings
@ 2012-07-19 16:21         ` Or Gerlitz
  0 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-19 16:21 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On 7/19/2012 7:16 PM, Ben Hutchings wrote:
> On Thu, 2012-07-19 at 18:46 +0300, Or Gerlitz wrote:
> To that end, I was under the impression that all three
> NETIF_F_HW_VLAN_{TX,RX,FILTER} features need to be advertised. From your
> comment I now understand that RX/TX are enough in that respect?
> [...]
>
> Yes.

OK, good, will fix.
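Why TX/RX suffice: as I understand the 8021q core of this era, it calls ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid only when the real device advertises NETIF_F_HW_VLAN_FILTER, and it refuses to create VLANs on a device that advertises FILTER without providing both callbacks. The user-space sketch below models that rule; the feature bit values and `struct vlan_ops_model` are hypothetical placeholders, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits; the kernel's NETIF_F_HW_VLAN_* values differ. */
#define F_HW_VLAN_TX     (1u << 0)
#define F_HW_VLAN_RX     (1u << 1)
#define F_HW_VLAN_FILTER (1u << 2)

/* Hypothetical model of which VID callbacks a driver provides. */
struct vlan_ops_model {
	bool has_add_vid;	/* ndo_vlan_rx_add_vid != NULL */
	bool has_kill_vid;	/* ndo_vlan_rx_kill_vid != NULL */
};

/* The VID callbacks matter only if VLAN filtering is advertised. */
static bool vid_callbacks_used(uint32_t features)
{
	return (features & F_HW_VLAN_FILTER) != 0;
}

/* FILTER without both callbacks is the combination that gets rejected. */
static bool vlan_config_ok(uint32_t features,
			   const struct vlan_ops_model *ops)
{
	if (!vid_callbacks_used(features))
		return true;
	return ops->has_add_vid && ops->has_kill_vid;
}
```

Under this model, advertising only TX and RX lets the driver drop its empty add/kill stubs entirely.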

Or.


* Re: [PATCH net-next V1 6/9] net/eipoib: Add sysfs support
  2012-07-18 10:59 ` [PATCH net-next V1 6/9] net/eipoib: Add sysfs support Or Gerlitz
@ 2012-07-23 12:55   ` Or Gerlitz
  0 siblings, 0 replies; 22+ messages in thread
From: Or Gerlitz @ 2012-07-23 12:55 UTC (permalink / raw)
  To: davem; +Cc: roland, netdev, ali, sean.hefty, shlomop, Erez Shitrit

On 7/18/2012 1:59 PM, Or Gerlitz wrote:
> The management interface for the driver uses sysfs entries. Via these
> sysfs entries the driver gets details on new VIFs to manage. The driver
> can enslave a new VIF (IPoIB cloned interface) or detach from it. Here
> are a few sysfs commands used to manage the driver in several scenarios:
>
> 1. create new clone of IPoIB interface:
> 	$ echo .Y > /sys/class/net/ibX/create_child
> create new clone ibX.Y with the same pkey as ibX, for example:
> 	$ echo .1 > /sys/class/net/ib0/create_child
> will create new interface ib0.1
>
> 2. notify parent interface on new VIF to enslave:
> 	$ echo +ibX.Y > /sys/class/net/ethZ/eth/slaves
> where ethZ is the driver interface, for example:
> 	$ echo +ib0.1 > /sys/class/net/eth4/eth/slaves
> will enslave ib0.1 to eth4
>
> 3. notify parent interface on VIF details (mac and vlan)
> 	$ echo +ibX.Y <MAC address> > /sys/class/net/ethZ/eth/vifs
> for example:
> 	$ echo +ib0.1 00:02:c9:43:3b:f1 > /sys/class/net/eth4/eth/vifs

Hi Dave,

Following the comment you made on patch 1/9, we are modifying these operations:

#1 - create/delete clone of IPoIB device -- changed to use rtnl_link_ops

#2 - enslave/un-enslave an IPoIB device clone to an eIPoIB device -- changed
to support ndo_add_slave/ndo_del_slave on eIPoIB
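With ndo_add_slave/ndo_del_slave, enslaving is driven by the IFLA_MASTER link attribute: rtnetlink's do_set_master() calls ndo_del_slave() on the current master (if any) and ndo_add_slave() on the new one, and an ifindex of 0 just releases. This is the request I would expect `ip link set dev ib0.1 master eth4` to send. The user-space sketch below only builds such an RTM_SETLINK message with the standard rtnetlink macros; the `enslave_req` struct and helper name are hypothetical, the ifindex values in the test are arbitrary, and actually sending it over a NETLINK_ROUTE socket (which needs CAP_NET_ADMIN) is left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Request buffer: netlink header, link info, room for one int attribute. */
struct enslave_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifi;
	char attrbuf[RTA_SPACE(sizeof(int))];
};

/*
 * Fill 'req' asking the kernel to make 'master_ifindex' the master of
 * 'slave_ifindex' (0 releases the slave from its current master).
 */
static void build_enslave_req(struct enslave_req *req, int slave_ifindex,
			      int master_ifindex)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_type = RTM_SETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifi.ifi_family = AF_UNSPEC;
	req->ifi.ifi_index = slave_ifindex;

	/* Append IFLA_MASTER at the aligned end of the message. */
	rta = (struct rtattr *)((char *)&req->nh +
				NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = IFLA_MASTER;
	rta->rta_len = RTA_LENGTH(sizeof(int));
	memcpy(RTA_DATA(rta), &master_ifindex, sizeof(int));
	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) +
			    RTA_ALIGN(rta->rta_len);
}
```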

Re #3, which creates an association (we call it a VIF) within the eIPoIB
driver between an IPoIB slave and a MAC and VLAN: we used sysfs, as you
can see above, and I wanted to ask about the correct way to do that.

One option we are considering is to add a new ndo operation, ndo_add_vif,
to be supported by eIPoIB and called from a new netlink channel,
ifla_vif_mac_vlan. Does that make sense?
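Whatever transport ends up carrying it (sysfs, or the proposed ndo_add_vif over a new netlink attribute), the driver-side rules that parent_store_vifs() enforces in the patch quoted below stay the same: a slave gets at most one MAC, and a (MAC, VLAN) pair may belong to only one slave. The user-space sketch below models just those rules. The table type, its size, and `vif_add` are hypothetical; the VLAN is passed in explicitly here, whereas the sysfs handler reuses the slave's existing `vlan` field, and `VLAN_N_VID` stands for "no VLAN" as in parent_show_vifs().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN   6
#define VLAN_N_VID 4096	/* used by the patch to mean "no VLAN" */
#define MAX_VIFS   8	/* hypothetical table size */

/* Hypothetical VIF record: slave ifindex + MAC + VLAN. */
struct vif {
	int slave_ifindex;
	uint8_t mac[ETH_ALEN];
	uint16_t vlan;
};

struct vif_table {
	struct vif entry[MAX_VIFS];
	int count;
};

/*
 * Returns 0 on success, -1 if the slave already has a VIF, the
 * (mac, vlan) pair is already taken, or the table is full.
 */
static int vif_add(struct vif_table *t, int ifindex,
		   const uint8_t *mac, uint16_t vlan)
{
	int i;

	for (i = 0; i < t->count; i++) {
		if (t->entry[i].slave_ifindex == ifindex)
			return -1;	/* one MAC per slave */
		if (t->entry[i].vlan == vlan &&
		    !memcmp(t->entry[i].mac, mac, ETH_ALEN))
			return -1;	/* (mac, vlan) must be unique */
	}
	if (t->count == MAX_VIFS)
		return -1;
	t->entry[t->count].slave_ifindex = ifindex;
	memcpy(t->entry[t->count].mac, mac, ETH_ALEN);
	t->entry[t->count].vlan = vlan;
	t->count++;
	return 0;
}
```

The same MAC may appear twice only with different VLANs, which is what lets two guests reuse a MAC on separate partitions.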

Or.



>
>
> 4. notify parent to release VIF:
> 	$ echo -ibX.Y > /sys/class/net/ethZ/eth/slaves
> where ethZ is the driver interface, for example:
> 	$ echo -ib0.1 > /sys/class/net/eth4/eth/slaves
> will release ib0.1 from eth4
>
> 5. see the list of ipoib interfaces enslaved under eipoib interface:
> 	$ cat /sys/class/net/ethX/eth/vifs
> for example:
> 	$ cat /sys/class/net/eth4/eth/vifs
> 	SLAVE=ib0.1      MAC=9a:c2:1f:d7:3b:63 VLAN=N/A
> 	SLAVE=ib0.2      MAC=52:54:00:60:55:88 VLAN=N/A
> 	SLAVE=ib0.3      MAC=52:54:00:60:55:89 VLAN=N/A
>
> Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>   drivers/net/eipoib/eth_ipoib_sysfs.c |  640 ++++++++++++++++++++++++++++++++++
>   1 files changed, 640 insertions(+), 0 deletions(-)
>   create mode 100644 drivers/net/eipoib/eth_ipoib_sysfs.c
>
> diff --git a/drivers/net/eipoib/eth_ipoib_sysfs.c b/drivers/net/eipoib/eth_ipoib_sysfs.c
> new file mode 100644
> index 0000000..c3fc121
> --- /dev/null
> +++ b/drivers/net/eipoib/eth_ipoib_sysfs.c
> @@ -0,0 +1,640 @@
> +/*
> + * Copyright (c) 2012 Mellanox Technologies. All rights reserved
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * openfabric.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/types.h>
> +#include <linux/string.h>
> +#include <linux/netdevice.h>
> +#include <linux/inetdevice.h>
> +#include <linux/in.h>
> +#include <linux/sysfs.h>
> +#include <linux/ctype.h>
> +#include <linux/inet.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/etherdevice.h>
> +#include <net/net_namespace.h>
> +
> +#include "eth_ipoib.h"
> +
> +#define to_dev(obj)	container_of(obj, struct device, kobj)
> +#define to_parent(cd)	((struct parent *)(netdev_priv(to_net_dev(cd))))
> +#define MOD_NA_STRING		"N/A"
> +
> +#define _sprintf(p, buf, format, arg...)				\
> +((PAGE_SIZE - (int)(p - buf)) <= 0 ? 0 :				\
> +	scnprintf(p, PAGE_SIZE - (int)(p - buf), format, ## arg))
> +
> +#define _end_of_line(_p, _buf)					\
> +do { if (_p - _buf) /* eat the leftover space */			\
> +		_buf[_p - _buf - 1] = '\n';				\
> +} while (0)
> +
> +/* helper functions */
> +static int get_emac(u8 *mac, char *s)
> +{
> +	if (sscanf(s, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> +		   mac + 0, mac + 1, mac + 2, mac + 3, mac + 4,
> +		   mac + 5) != 6)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static int get_imac(u8 *mac, char *s)
> +{
> +	if (sscanf(s, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:"
> +		   "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:%hhx:"
> +		   "%hhx:%hhx:%hhx:%hhx",
> +		   mac + 0, mac + 1, mac + 2, mac + 3, mac + 4,
> +		   mac + 5, mac + 6, mac + 7, mac + 8, mac + 9,
> +		   mac + 10, mac + 11, mac + 12, mac + 13,
> +		   mac + 14, mac + 15, mac + 16, mac + 17,
> +		   mac + 18, mac + 19) != 20)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +/* show/store functions per module (CLASS_ATTR) */
> +static ssize_t show_parents(struct class *cls, struct class_attribute *attr,
> +			    char *buf)
> +{
> +	char *p = buf;
> +	struct parent *parent;
> +
> +	rtnl_lock(); /* because of parent_dev_list */
> +
> +	list_for_each_entry(parent, &parent_dev_list, parent_list) {
> +		p += _sprintf(p, buf, "%s over IB port: %s\n",
> +			      parent->dev->name,
> +			      parent->ipoib_main_interface);
> +	}
> +	_end_of_line(p, buf);
> +
> +	rtnl_unlock();
> +	return (ssize_t)(p - buf);
> +}
> +
> +/* show/store functions per parent (DEVICE_ATTR) */
> +static ssize_t parent_show_neighs(struct device *d,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	struct slave *slave;
> +	struct neigh *neigh;
> +	struct parent *parent = to_parent(d);
> +	char *p = buf;
> +
> +	read_lock_bh(&parent->lock);
> +	parent_for_each_slave(parent, slave) {
> +		list_for_each_entry(neigh, &slave->neigh_list, list) {
> +			p += _sprintf(p, buf, "SLAVE=%-10s EMAC=%pM IMAC=%pM:%pM:%pM:%.2x:%.2x\n",
> +				      slave->dev->name,
> +				      neigh->emac,
> +				      neigh->imac, neigh->imac + 6, neigh->imac + 12,
> +				      neigh->imac[18], neigh->imac[19]);
> +		}
> +	}
> +
> +	read_unlock_bh(&parent->lock);
> +
> +	_end_of_line(p, buf);
> +
> +	return (ssize_t)(p - buf);
> +}
> +
> +struct neigh *parent_get_neigh_cmd(char op,
> +				   char *ifname, u8 *remac, u8 *rimac)
> +{
> +	struct neigh *neigh_cmd;
> +
> +	neigh_cmd = kzalloc(sizeof *neigh_cmd, GFP_ATOMIC);
> +	if (!neigh_cmd) {
> +		pr_err("%s cannot allocate neigh struct\n", ifname);
> +		goto out;
> +	}
> +
> +	/*
> +	 * populate emac field so it can be used easily
> +	 * in neigh_cmd_find_by_mac()
> +	 */
> +	memcpy(neigh_cmd->emac, remac, ETH_ALEN);
> +	memcpy(neigh_cmd->imac, rimac, INFINIBAND_ALEN);
> +
> +	/* prepare the command as a string */
> +	sprintf(neigh_cmd->cmd, "%c%s %pM %pM:%pM:%pM:%.2x:%.2x",
> +		op, ifname, remac, rimac, rimac + 6, rimac + 12, rimac[18], rimac[19]);
> +out:
> +	return neigh_cmd;
> +}
> +
> +/* write_lock_bh(&parent->lock) must be held */
> +ssize_t __parent_store_neighs(struct device *d,
> +			      struct device_attribute *attr,
> +			      const char *buffer, size_t count)
> +{
> +	char command[IFNAMSIZ + 1] = { 0, };
> +	char emac_str[ETH_ALEN * 3] = { 0, };
> +	u8 emac[ETH_ALEN];
> +	char imac_str[INFINIBAND_ALEN * 3] = { 0, };
> +	u8 imac[INFINIBAND_ALEN];
> +	char *ifname;
> +	int found = 0, ret = count;
> +	struct slave *slave = NULL, *slave_tmp;
> +	struct neigh *neigh;
> +	struct parent *parent = to_parent(d);
> +
> +	sscanf(buffer, "%16s %17s %59s", command, emac_str, imac_str);
> +
> +	/* check ifname */
> +	ifname = command + 1;
> +	if ((strlen(command) <= 1) || !dev_valid_name(ifname) ||
> +	    (command[0] != '+' && command[0] != '-'))
> +		goto err_no_cmd;
> +
> +	/* check if ifname exist */
> +	parent_for_each_slave(parent, slave_tmp) {
> +		if (!strcmp(slave_tmp->dev->name, ifname)) {
> +			found = 1;
> +			slave = slave_tmp;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_err("%s could not find slave\n", ifname);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (get_emac(emac, emac_str)) {
> +		pr_err("%s bad emac %s\n", ifname, emac_str);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (get_imac(imac, imac_str)) {
> +		pr_err("%s bad imac %s\n", ifname, imac_str);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/* process command */
> +	if (command[0] == '+') {
> +		found = 0;
> +		list_for_each_entry(neigh, &slave->neigh_list, list) {
> +			if (!memcmp(neigh->emac, emac, ETH_ALEN))
> +				found = 1;
> +		}
> +
> +		if (found) {
> +			pr_err("%s: cannot update neigh, slave already has "
> +			       "this neigh mac %pM\n",
> +			       slave->dev->name, emac);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		neigh = kzalloc(sizeof *neigh, GFP_ATOMIC);
> +		if (!neigh) {
> +			pr_err("%s cannot allocate neigh struct\n",
> +			       slave->dev->name);
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +
> +		/* ready to go */
> +		pr_info("%s: slave %s neigh mac is set to %pM\n",
> +			parent->dev->name, ifname, emac);
> +		memcpy(neigh->emac, emac, ETH_ALEN);
> +		memcpy(neigh->imac, imac, INFINIBAND_ALEN);
> +
> +		list_add_tail(&neigh->list, &slave->neigh_list);
> +
> +		goto out;
> +	}
> +
> +	if (command[0] == '-') {
> +		found = 0;
> +		list_for_each_entry(neigh, &slave->neigh_list, list) {
> +			if (!memcmp(neigh->emac, emac, ETH_ALEN)) {
> +				found = 1;
> +				break;
> +			}
> +		}
> +
> +		if (!found) {
> +			pr_err("%s cannot delete neigh mac %pM\n",
> +			       ifname, emac);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		list_del(&neigh->list);
> +		kfree(neigh);
> +
> +		goto out;
> +	}
> +
> +err_no_cmd:
> +	pr_err("%s USAGE: (-|+)ifname emac imac\n", DRV_NAME);
> +	ret = -EPERM;
> +
> +out:
> +	return ret;
> +}
> +
> +static ssize_t parent_store_neighs(struct device *d,
> +				   struct device_attribute *attr,
> +				   const char *buffer, size_t count)
> +{
> +	struct parent *parent = to_parent(d);
> +	ssize_t rc;
> +
> +	write_lock_bh(&parent->lock);
> +	rc = __parent_store_neighs(d, attr, buffer, count);
> +	write_unlock_bh(&parent->lock);
> +
> +	return rc;
> +}
> +
> +static DEVICE_ATTR(neighs, S_IRUGO | S_IWUSR, parent_show_neighs,
> +		   parent_store_neighs);
> +
> +static ssize_t parent_show_vifs(struct device *d,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct slave *slave;
> +	struct parent *parent = to_parent(d);
> +	char *p = buf;
> +
> +	read_lock_bh(&parent->lock);
> +	parent_for_each_slave(parent, slave) {
> +		if (is_zero_ether_addr(slave->emac)) {
> +			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%-17s "
> +				      "VLAN=%s\n", slave->dev->name,
> +				      MOD_NA_STRING, MOD_NA_STRING);
> +		} else if (slave->vlan == VLAN_N_VID) {
> +			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%pM VLAN=%s\n",
> +				      slave->dev->name,
> +				      slave->emac,
> +				      MOD_NA_STRING);
> +		} else {
> +			p += _sprintf(p, buf, "SLAVE=%-10s MAC=%pM VLAN=%d\n",
> +				      slave->dev->name,
> +				      slave->emac,
> +				      slave->vlan);
> +		}
> +	}
> +	read_unlock_bh(&parent->lock);
> +
> +	_end_of_line(p, buf);
> +
> +	return (ssize_t)(p - buf);
> +}
> +
> +static ssize_t parent_store_vifs(struct device *d,
> +				 struct device_attribute *attr,
> +				 const char *buffer, size_t count)
> +{
> +	char command[IFNAMSIZ + 1] = { 0, };
> +	char mac_str[ETH_ALEN * 3] = { 0, };
> +	char *ifname;
> +	u8 mac[ETH_ALEN];
> +	int found = 0, ret = count;
> +	struct slave *slave = NULL, *slave_tmp;
> +	struct parent *parent = to_parent(d);
> +
> +	sscanf(buffer, "%16s %17s", command, mac_str);
> +
> +	write_lock_bh(&parent->lock);
> +
> +	/* check ifname */
> +	ifname = command + 1;
> +	if ((strlen(command) <= 1) || !dev_valid_name(ifname) ||
> +	    (command[0] != '+' && command[0] != '-'))
> +		goto err_no_cmd;
> +
> +	/* check if ifname exist */
> +	parent_for_each_slave(parent, slave_tmp) {
> +		if (!strcmp(slave_tmp->dev->name, ifname)) {
> +			found = 1;
> +			slave = slave_tmp;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_err("%s could not find slave\n", ifname);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	/* process command */
> +	if (command[0] == '+') {
> +		if (get_emac(mac, mac_str) || !is_valid_ether_addr(mac)) {
> +			pr_err("%s invalid mac input\n", ifname);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		if (!is_zero_ether_addr(slave->emac)) {
> +			pr_err("%s slave %s mac already set to %pM\n",
> +			       ifname, slave->dev->name, slave->emac);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		/* check another slave has this mac/vlan */
> +		found = 0;
> +		parent_for_each_slave(parent, slave_tmp) {
> +			if (!memcmp(slave_tmp->emac, mac, ETH_ALEN) &&
> +			    slave_tmp->vlan == slave->vlan) {
> +				pr_err("cannot update %s, slave %s already has"
> +				       " vlan 0x%x mac %pM\n",
> +				       parent->dev->name, slave_tmp->dev->name,
> +				       slave_tmp->vlan,
> +				       mac);
> +				ret = -EINVAL;
> +				goto out;
> +			}
> +		}
> +
> +		/* ready to go */
> +		pr_info("slave %s mac is set to %pM\n",
> +			ifname, mac);
> +
> +		memcpy(slave->emac, mac, ETH_ALEN);
> +		goto out;
> +	}
> +
> +	if (command[0] == '-') {
> +		if (is_zero_ether_addr(slave->emac)) {
> +			pr_err("%s slave mac already unset %pM\n",
> +			       ifname, slave->emac);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		pr_info("slave %s mac is unset (was %pM)\n",
> +			ifname, slave->emac);
> +		memset(slave->emac, 0, ETH_ALEN);
> +
> +		goto out;
> +	}
> +
> +err_no_cmd:
> +	pr_err("%s USAGE: (-|+)ifname [mac]\n", DRV_NAME);
> +	ret = -EPERM;
> +
> +out:
> +	write_unlock_bh(&parent->lock);
> +
> +	return ret;
> +}
> +
> +static DEVICE_ATTR(vifs, S_IRUGO | S_IWUSR, parent_show_vifs,
> +		   parent_store_vifs);
> +
> +static ssize_t parent_show_slaves(struct device *d,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	struct slave *slave;
> +	struct parent *parent = to_parent(d);
> +	char *p = buf;
> +
> +	read_lock_bh(&parent->lock);
> +	parent_for_each_slave(parent, slave)
> +		p += _sprintf(p, buf, "%s\n", slave->dev->name);
> +	read_unlock_bh(&parent->lock);
> +
> +	_end_of_line(p, buf);
> +
> +	return (ssize_t)(p - buf);
> +}
> +
> +static ssize_t parent_store_slaves(struct device *d,
> +				   struct device_attribute *attr,
> +				   const char *buffer, size_t count)
> +{
> +	char command[IFNAMSIZ + 1] = { 0, };
> +	char *ifname;
> +	int res, ret = count;
> +	struct slave *slave;
> +	struct net_device *dev = NULL;
> +	struct parent *parent = to_parent(d);
> +
> +	/* Quick sanity check -- is the parent interface up? */
> +	if (!(parent->dev->flags & IFF_UP)) {
> +		pr_warn("%s: doing slave updates when "
> +			"interface is down.\n", parent->dev->name);
> +	}
> +
> +	if (!rtnl_trylock()) /* because __dev_get_by_name */
> +		return restart_syscall();
> +
> +	sscanf(buffer, "%16s", command);
> +
> +	ifname = command + 1;
> +	if ((strlen(command) <= 1) || !dev_valid_name(ifname))
> +		goto err_no_cmd;
> +
> +	if (command[0] == '+') {
> +		/* Got a slave name in ifname. Is it already in the list? */
> +		dev = __dev_get_by_name(&init_net, ifname);
> +		if (!dev) {
> +			pr_warn("%s: Interface %s does not exist!\n",
> +				parent->dev->name, ifname);
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +
> +		read_lock_bh(&parent->lock);
> +		parent_for_each_slave(parent, slave) {
> +			if (slave->dev == dev) {
> +				pr_err("%s ERR- Interface %s is already enslaved!\n",
> +				       parent->dev->name, dev->name);
> +				ret = -EPERM;
> +			}
> +		}
> +		read_unlock_bh(&parent->lock);
> +
> +		if (ret < 0)
> +			goto out;
> +
> +		pr_info("%s: adding slave %s\n",
> +			parent->dev->name, ifname);
> +
> +		res = parent_enslave(parent->dev, dev);
> +		if (res)
> +			ret = res;
> +
> +		goto out;
> +	}
> +
> +	if (command[0] == '-') {
> +		dev = NULL;
> +		parent_for_each_slave(parent, slave)
> +			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
> +				dev = slave->dev;
> +				break;
> +			}
> +
> +		if (dev) {
> +			pr_info("%s: removing slave %s\n",
> +				parent->dev->name, dev->name);
> +			res = parent_release_slave(parent->dev, dev);
> +			if (res) {
> +				ret = res;
> +				goto out;
> +			}
> +		} else {
> +			pr_warn("%s: unable to remove non-existent "
> +				"slave for parent %s.\n",
> +				ifname, parent->dev->name);
> +			ret = -ENODEV;
> +		}
> +		goto out;
> +	}
> +
> +err_no_cmd:
> +	pr_err("%s USAGE: (-|+)ifname\n", DRV_NAME);
> +	ret = -EPERM;
> +
> +out:
> +	rtnl_unlock();
> +	return ret;
> +}
> +
> +static DEVICE_ATTR(slaves, S_IRUGO | S_IWUSR, parent_show_slaves,
> +		   parent_store_slaves);
> +
> +/* sysfs create/destroy functions */
> +static struct attribute *per_parent_attrs[] = {
> +	&dev_attr_slaves.attr, /* DEVICE_ATTR(slaves..) */
> +	&dev_attr_vifs.attr,
> +	&dev_attr_neighs.attr,
> +	NULL,
> +};
> +
> +/* namespace support */
> +static const void *eipoib_namespace(struct class *cls,
> +				    const struct class_attribute *attr)
> +{
> +	const struct eipoib_net *eipoib_n =
> +		container_of(attr,
> +			     struct eipoib_net, class_attr_eipoib_interfaces);
> +	return eipoib_n->net;
> +}
> +
> +static struct attribute_group parent_group = {
> +	/* per parent sysfs files under: /sys/class/net/<IF>/eth/.. */
> +	.name = "eth",
> +	.attrs = per_parent_attrs
> +};
> +
> +int create_slave_symlinks(struct net_device *master,
> +			  struct net_device *slave)
> +{
> +	char linkname[IFNAMSIZ+7];
> +	int ret = 0;
> +
> +	ret = sysfs_create_link(&(slave->dev.kobj), &(master->dev.kobj),
> +				"eth_parent");
> +	if (ret)
> +		return ret;
> +
> +	sprintf(linkname, "slave_%s", slave->name);
> +	ret = sysfs_create_link(&(master->dev.kobj), &(slave->dev.kobj),
> +				linkname);
> +	return ret;
> +
> +}
> +
> +void destroy_slave_symlinks(struct net_device *master,
> +			    struct net_device *slave)
> +{
> +	char linkname[IFNAMSIZ+7];
> +
> +	sysfs_remove_link(&(slave->dev.kobj), "eth_parent");
> +	sprintf(linkname, "slave_%s", slave->name);
> +	sysfs_remove_link(&(master->dev.kobj), linkname);
> +}
> +
> +static struct class_attribute class_attr_eth_ipoib_interfaces = {
> +	.attr = {
> +		.name = "eth_ipoib_interfaces",
> +		.mode = S_IWUSR | S_IRUGO,
> +	},
> +	.show = show_parents,
> +	.namespace = eipoib_namespace,
> +};
> +
> +/* per module sysfs file under: /sys/class/net/eth_ipoib_interfaces */
> +int mod_create_sysfs(struct eipoib_net *eipoib_n)
> +{
> +	int rc;
> +	/* defined in CLASS_ATTR(eth_ipoib_interfaces..) */
> +	eipoib_n->class_attr_eipoib_interfaces =
> +		class_attr_eth_ipoib_interfaces;
> +
> +	sysfs_attr_init(&eipoib_n->class_attr_eipoib_interfaces.attr);
> +
> +	rc = netdev_class_create_file(&eipoib_n->class_attr_eipoib_interfaces);
> +	if (rc)
> +		pr_err("%s failed to create sysfs (rc %d)\n",
> +		       eipoib_n->class_attr_eipoib_interfaces.attr.name, rc);
> +
> +	return rc;
> +}
> +
> +void mod_destroy_sysfs(struct eipoib_net *eipoib_n)
> +{
> +	netdev_class_remove_file(&eipoib_n->class_attr_eipoib_interfaces);
> +}
> +
> +int parent_create_sysfs_entry(struct parent *parent)
> +{
> +	struct net_device *dev = parent->dev;
> +	int rc;
> +
> +	rc = sysfs_create_group(&(dev->dev.kobj), &parent_group);
> +	if (rc)
> +		pr_info("failed to create sysfs group\n");
> +
> +	return rc;
> +}
> +
> +void parent_destroy_sysfs_entry(struct parent *parent)
> +{
> +	struct net_device *dev = parent->dev;
> +
> +	sysfs_remove_group(&(dev->dev.kobj), &parent_group);
> +}
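The store handlers above copy user input into fixed-size buffers such as `command[IFNAMSIZ + 1]` and `mac_str[ETH_ALEN * 3]`, so each `%s` conversion needs a field width of buffer size minus one to rule out overruns on a long write. The user-space sketch below parses the vifs command format `(+|-)ifname [mac]` that way. `struct vif_cmd` and `parse_vif_cmd` are hypothetical names, and the sketch is not a drop-in replacement for the kernel handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IFNAMSIZ 16	/* as in <linux/if.h> */
#define ETH_ALEN 6

/* Hypothetical parsed form of a vifs command. */
struct vif_cmd {
	char op;			/* '+' or '-' */
	char ifname[IFNAMSIZ];
	unsigned char mac[ETH_ALEN];
	int has_mac;
};

/* Parse "(+|-)ifname [mac]"; returns 0 on success, -1 on bad input. */
static int parse_vif_cmd(const char *buf, struct vif_cmd *cmd)
{
	char command[IFNAMSIZ + 1] = { 0 };	/* op + up to 15 chars + NUL */
	char mac_str[ETH_ALEN * 3] = { 0 };	/* "xx:xx:xx:xx:xx:xx" + NUL */
	int n;

	/* Widths are buffer size minus one; an overlong name leaves a
	 * remainder that lands in mac_str and fails the MAC parse. */
	n = sscanf(buf, "%16s %17s", command, mac_str);
	if (n < 1 || strlen(command) < 2 ||
	    (command[0] != '+' && command[0] != '-'))
		return -1;

	memset(cmd, 0, sizeof(*cmd));
	cmd->op = command[0];
	strcpy(cmd->ifname, command + 1);	/* at most 15 chars + NUL */

	if (n == 2) {
		if (sscanf(mac_str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
			   &cmd->mac[0], &cmd->mac[1], &cmd->mac[2],
			   &cmd->mac[3], &cmd->mac[4], &cmd->mac[5]) != 6)
			return -1;
		cmd->has_mac = 1;
	}
	return 0;
}
```

For example, `"+ib0.1 00:02:c9:43:3b:f1\n"` parses to op `'+'`, ifname `ib0.1` and the given MAC, while a bare `"ib0.1"` is rejected for lacking the `+`/`-` prefix.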


end of thread, other threads:[~2012-07-23 13:05 UTC | newest]

Thread overview: 22+ messages
2012-07-18 10:59 [PATCH net-next V1 0/9] Add Ethernet IPoIB driver Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 1/9] IB/ipoib: Add support for clones / multiple childs on the same partition Or Gerlitz
2012-07-18 18:38   ` David Miller
2012-07-18 21:24     ` Or Gerlitz
2012-07-18 21:36       ` David Miller
2012-07-18 22:11         ` John Fastabend
2012-07-19  8:11           ` Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 2/9] include/linux: Add private flags for IPoIB interfaces Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 3/9] IB/ipoib: Add support for acting as VIF Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 4/9] net/eipoib: Add private header file Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 5/9] net/eipoib: Add ethtool file support Or Gerlitz
2012-07-18 18:37   ` Ben Hutchings
2012-07-19 15:55     ` Or Gerlitz
2012-07-18 10:59 ` [PATCH net-next V1 6/9] net/eipoib: Add sysfs support Or Gerlitz
2012-07-23 12:55   ` Or Gerlitz
2012-07-18 11:00 ` [PATCH net-next V1 7/9] net/eipoib: Add main driver functionality Or Gerlitz
2012-07-19 13:49   ` Ben Hutchings
2012-07-19 15:46     ` Or Gerlitz
2012-07-19 16:16       ` Ben Hutchings
2012-07-19 16:21         ` Or Gerlitz
2012-07-18 11:00 ` [PATCH net-next V1 8/9] net/eipoib: Add Makefile, Kconfig and MAINTAINERS entries Or Gerlitz
2012-07-18 11:00 ` [PATCH net-next V1 9/9] IB/ipoib: Add support for transmission of skbs w.o dst/neighbour Or Gerlitz
