* [PATCH V1 for-next 0/6] IP based RoCE GID Addressing
@ 2013-06-19  7:47 Or Gerlitz
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Or Gerlitz

Currently, the IB stack (core + drivers) handles RoCE (IBoE) GIDs as
encodings of the related Ethernet net-device interface MAC address and
possibly VLAN id.

This series changes RoCE GIDs to encode IP addresses (IPv4 + IPv6)
of that Ethernet interface, for the following reasons:

1. There are environments where the compute entity that runs the RoCE
stack is not aware that its traffic is vlan-tagged. Such a node
creates/assumes GIDs that are wrong from the viewpoint of a peer node
which is aware of vlans.

Note that a "node" here can be a physical node connected to an Ethernet switch port
acting in access mode, talking to another node which does vlan insertion/stripping by itself.

Another example is an SRIOV Virtual Function configured to work in "VST"
mode (Virtual-Switch-Tagging), where the hypervisor configures the HW eSwitch
to do vlan insertion for the vPORT representing that function.

2. RoCE traffic may be inspected (mirrored/trapped) in Ethernet switches for
monitoring and security purposes. It is much more natural for both humans and
automated utilities to observe IP addresses at a fixed offset in the RoCE
frame's L3 header than MAC/VLANs (which remain in the L2 header of that
frame anyway, so they are not lost by this change).

3. Some advanced Bonding/Teaming modes such as balance-alb and balance-tlb
use multiple underlying devices in parallel, so packets always carry
the bond IP address while different streams have different source MACs.
The approach taken by this series is part of what would allow supporting
that for RoCE traffic too.

The 1st patch modifies the IB core to cope with the new scheme, and the 2nd/3rd ones
do the same for the mlx4_ib driver. The 4th patch lays the foundation for extending
uverbs to the recently introduced extension scheme, and the 5th/6th patches add two
extended uCMA commands and two extended uVERBS commands which are now exported to user space.

These extended verbs will allow enhancing user space libraries so that they work
correctly with the modified scheme. RC applications using librdmacm will not need to be
modified at all, since the change is encapsulated within that library.

The ocrdma driver needs a patch similar to the mlx4_ib one; we can
surely do that patch, we just need to dig there a little further.

Or.

changes from V0:

 - enhanced documentation of the mlx4_ib, uverbs and ucma patches
 - split the mlx4_ib patch into two
 - split the extended user space commands patch into two


Igor Ivanov (1):
  IB/core: Infra-structure to support verbs extensions through uverbs

Matan Barak (2):
  IB/core: Add RoCE IP based addressing extensions for uverbs
  IB/core: Add RoCE IP based addressing extensions for rdma_ucm

Moni Shoua (3):
  IB/core: RoCE IP based GID addressing
  IB/mlx4: Use RoCE IP based GIDs in the port GID table
  IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing

 drivers/infiniband/core/cm.c              |    3 +
 drivers/infiniband/core/cma.c             |   39 ++-
 drivers/infiniband/core/sa_query.c        |    5 +
 drivers/infiniband/core/ucma.c            |  190 +++++++++++--
 drivers/infiniband/core/uverbs.h          |    2 +
 drivers/infiniband/core/uverbs_cmd.c      |  330 ++++++++++++++++-----
 drivers/infiniband/core/uverbs_main.c     |   33 ++-
 drivers/infiniband/core/uverbs_marshall.c |   94 ++++++-
 drivers/infiniband/core/verbs.c           |    7 +
 drivers/infiniband/hw/mlx4/ah.c           |   21 +-
 drivers/infiniband/hw/mlx4/cq.c           |    5 +
 drivers/infiniband/hw/mlx4/main.c         |  461 ++++++++++++++++++++---------
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |    3 +
 drivers/infiniband/hw/mlx4/qp.c           |   19 +-
 include/linux/mlx4/cq.h                   |   14 +-
 include/rdma/ib_addr.h                    |   45 ++--
 include/rdma/ib_marshall.h                |   12 +
 include/rdma/ib_sa.h                      |    3 +
 include/rdma/ib_verbs.h                   |    4 +
 include/uapi/rdma/ib_user_sa.h            |   34 ++-
 include/uapi/rdma/ib_user_verbs.h         |  130 ++++++++-
 include/uapi/rdma/rdma_user_cm.h          |   21 ++-
 22 files changed, 1157 insertions(+), 318 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH V1 for-next 1/6] IB/core: RoCE IP based GID addressing
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Moni Shoua, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

Currently, the IB core assumes that RoCE (IBoE) GIDs encode the related
Ethernet netdevice interface MAC address and possibly VLAN id.

Change GIDs to be treated as encodings of the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded within
GIDs, we have to extend the InfiniBand address structures (e.g. ib_ah_attr)
with layer 2 address parameters, namely mac and vlan.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c       |    3 ++
 drivers/infiniband/core/cma.c      |   39 ++++++++++++++++++++++--------
 drivers/infiniband/core/sa_query.c |    5 ++++
 drivers/infiniband/core/ucma.c     |   18 +++-----------
 drivers/infiniband/core/verbs.c    |    7 +++++
 include/rdma/ib_addr.h             |   45 ++++++++++++++++++++----------------
 include/rdma/ib_sa.h               |    3 ++
 include/rdma/ib_verbs.h            |    4 +++
 8 files changed, 79 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..7af618f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1557,6 +1557,9 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
+
+	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, 6);
+	work->path[0].vlan = cm_id_priv->av.ah_attr.vlan;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 71c2c71..ba217c9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -373,7 +373,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv)
 		return -EINVAL;
 
 	mutex_lock(&lock);
-	iboe_addr_get_sgid(dev_addr, &iboe_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &iboe_gid);
+
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	list_for_each_entry(cma_dev, &dev_list, list) {
@@ -1803,7 +1805,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct sockaddr_in *src_addr = (struct sockaddr_in *)&route->addr.src_addr;
 	struct sockaddr_in *dst_addr = (struct sockaddr_in *)&route->addr.dst_addr;
 	struct net_device *ndev = NULL;
-	u16 vid;
+
 
 	if (src_addr->sin_family != dst_addr->sin_family)
 		return -EINVAL;
@@ -1830,10 +1832,13 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
-	vid = rdma_vlan_dev_vlan_id(ndev);
+	route->path_rec->vlan = rdma_vlan_dev_vlan_id(ndev);
+	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, 6);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &route->path_rec->sgid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
+		    &route->path_rec->dgid);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
@@ -1970,6 +1975,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			   RDMA_CM_ADDR_RESOLVED))
 		goto out;
 
+	memcpy(&id_priv->id.route.addr.src_addr, src_addr,
+	       ip_addr_size(src_addr));
 	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
 
@@ -1979,11 +1986,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			goto out;
 		event.event = RDMA_CM_EVENT_ADDR_ERROR;
 		event.status = status;
-	} else {
-		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
-		       ip_addr_size(src_addr));
+	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
-	}
 
 	if (id_priv->id.event_handler(&id_priv->id, &event)) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
@@ -2381,6 +2385,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
+	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2391,7 +2396,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
-	memcpy(&id->route.addr.src_addr, addr, ip_addr_size(addr));
 	if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) {
 		if (addr->sa_family == AF_INET)
 			id_priv->afonly = 1;
@@ -2951,9 +2955,13 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 	struct rdma_id_private *id_priv;
 	struct cma_multicast *mc = multicast->context;
 	struct rdma_cm_event event;
+	struct rdma_dev_addr *dev_addr;
 	int ret;
+	struct net_device *ndev = NULL;
+	u16 vlan;
 
 	id_priv = mc->id_priv;
+	dev_addr = &id_priv->id.route.addr.dev_addr;
 	if (cma_disable_callback(id_priv, RDMA_CM_ADDR_BOUND) &&
 	    cma_disable_callback(id_priv, RDMA_CM_ADDR_RESOLVED))
 		return 0;
@@ -2967,11 +2975,19 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 	memset(&event, 0, sizeof event);
 	event.status = status;
 	event.param.ud.private_data = mc->context;
+	ndev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+	if (!ndev) {
+		status = -ENODEV;
+	} else {
+		vlan = rdma_vlan_dev_vlan_id(ndev);
+		dev_put(ndev);
+	}
 	if (!status) {
 		event.event = RDMA_CM_EVENT_MULTICAST_JOIN;
 		ib_init_ah_from_mcmember(id_priv->id.device,
 					 id_priv->id.port_num, &multicast->rec,
 					 &event.param.ud.ah_attr);
+		event.param.ud.ah_attr.vlan = vlan;
 		event.param.ud.qp_num = 0xFFFFFF;
 		event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey);
 	} else
@@ -3138,7 +3154,8 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 		err = -EINVAL;
 		goto out2;
 	}
-	iboe_addr_get_sgid(dev_addr, &mc->multicast.ib->rec.port_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &mc->multicast.ib->rec.port_gid);
 	work->id = id_priv;
 	work->mc = mc;
 	INIT_WORK(&work->work, iboe_mcast_work_handler);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 934f45e..d813075 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -556,6 +556,11 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		ah_attr->grh.hop_limit     = rec->hop_limit;
 		ah_attr->grh.traffic_class = rec->traffic_class;
 	}
+	if (force_grh) {
+		memcpy(ah_attr->dmac, rec->dmac, 6);
+		ah_attr->vlan = rec->vlan;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ib_init_ah_from_path);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 5ca44cd..bc2cb5d 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -602,24 +602,14 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
-	struct rdma_dev_addr *dev_addr;
-	struct net_device *dev;
-	u16 vid = 0;
 
 	resp->num_paths = route->num_paths;
 	switch (route->num_paths) {
 	case 0:
-		dev_addr = &route->addr.dev_addr;
-		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
-			if (dev) {
-				vid = rdma_vlan_dev_vlan_id(dev);
-				dev_put(dev);
-			}
-
-		iboe_mac_vlan_to_ll((union ib_gid *) &resp->ib_route[0].dgid,
-				    dev_addr->dst_dev_addr, vid);
-		iboe_addr_get_sgid(dev_addr,
-				   (union ib_gid *) &resp->ib_route[0].sgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)&resp->ib_route[0].sgid);
 		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
 		break;
 	case 2:
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..936ec87 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -189,8 +189,15 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
+	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
+			IB_LINK_LAYER_ETHERNET);
 
 	memset(ah_attr, 0, sizeof *ah_attr);
+	if (is_eth) {
+		memcpy(ah_attr->dmac, wc->smac, 6);
+		ah_attr->vlan = wc->vlan;
+	}
+
 	ah_attr->dlid = wc->slid;
 	ah_attr->sl = wc->sl;
 	ah_attr->src_path_bits = wc->dlid_path_bits;
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 9996539..b38f837 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -38,8 +38,12 @@
 #include <linux/in6.h>
 #include <linux/if_arp.h>
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/socket.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/if_inet6.h>
+#include <net/ip.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 
@@ -130,41 +134,42 @@ static inline int rdma_addr_gid_offset(struct rdma_dev_addr *dev_addr)
 	return dev_addr->dev_type == ARPHRD_INFINIBAND ? 4 : 0;
 }
 
-static inline void iboe_mac_vlan_to_ll(union ib_gid *gid, u8 *mac, u16 vid)
-{
-	memset(gid->raw, 0, 16);
-	*((__be32 *) gid->raw) = cpu_to_be32(0xfe800000);
-	if (vid < 0x1000) {
-		gid->raw[12] = vid & 0xff;
-		gid->raw[11] = vid >> 8;
-	} else {
-		gid->raw[12] = 0xfe;
-		gid->raw[11] = 0xff;
-	}
-	memcpy(gid->raw + 13, mac + 3, 3);
-	memcpy(gid->raw + 8, mac, 3);
-	gid->raw[8] ^= 2;
-}
-
 static inline u16 rdma_vlan_dev_vlan_id(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_802_1Q_VLAN ?
 		vlan_dev_vlan_id(dev) : 0xffff;
 }
 
+static inline int rdma_ip2gid(struct sockaddr *addr, union ib_gid *gid)
+{
+	switch (addr->sa_family) {
+	case AF_INET:
+		ipv6_addr_set_v4mapped(((struct sockaddr_in *)addr)->sin_addr.s_addr,
+				       (struct in6_addr *)gid);
+		break;
+	case AF_INET6:
+		memcpy(gid->raw, &((struct sockaddr_in6 *)addr)->sin6_addr, 16);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static inline void iboe_addr_get_sgid(struct rdma_dev_addr *dev_addr,
 				      union ib_gid *gid)
 {
 	struct net_device *dev;
-	u16 vid = 0xffff;
+	struct in_device *ip4;
 
 	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
 	if (dev) {
-		vid = rdma_vlan_dev_vlan_id(dev);
+		ip4 = (struct in_device *)dev->ip_ptr;
+		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)
+			ipv6_addr_set_v4mapped(ip4->ifa_list->ifa_address,
+					       (struct in6_addr *)gid);
 		dev_put(dev);
 	}
-
-	iboe_mac_vlan_to_ll(gid, dev_addr->src_dev_addr, vid);
 }
 
 static inline void rdma_addr_get_sgid(struct rdma_dev_addr *dev_addr, union ib_gid *gid)
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 8275e53..0a9207e 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -154,6 +154,9 @@ struct ib_sa_path_rec {
 	u8           packet_life_time_selector;
 	u8           packet_life_time;
 	u8           preference;
+	u8           smac[6];
+	u8           dmac[6];
+	__be16       vlan;
 };
 
 #define IB_SA_MCMEMBER_REC_MGID				IB_SA_COMP_MASK( 0)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 98cc4b2..ef1f332 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -469,6 +469,8 @@ struct ib_ah_attr {
 	u8			static_rate;
 	u8			ah_flags;
 	u8			port_num;
+	u8			dmac[6];
+	u16			vlan;
 };
 
 enum ib_wc_status {
@@ -541,6 +543,8 @@ struct ib_wc {
 	u8			sl;
 	u8			dlid_path_bits;
 	u8			port_num;	/* valid only for DR SMPs on switches */
+	u8			smac[6];
+	u16			vlan;
 };
 
 enum ib_cq_notify_flags {
-- 
1.7.1



* [PATCH V1 for-next 2/6] IB/mlx4: Use RoCE IP based GIDs in the port GID table
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Moni Shoua, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

Currently, the mlx4 driver sets RoCE (IBoE) GIDs to encode the related
Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that GIDs encode interface IP addresses
(both IPv4 and IPv6).

This requires learning which IP addresses are in use by a netdevice
associated with the HCA port, formatting them into GIDs and adding
them to the port GID table. Further, address add and delete events
are caught in order to maintain the GID table accordingly.

Associated IP addresses may belong to a master of the Ethernet netdevice
on top of that port, so this should be considered when building and
maintaining the GID table.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  461 +++++++++++++++++++++++-----------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    3 +
 2 files changed, 320 insertions(+), 144 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 23d7343..8879b41 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -767,7 +769,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
-	u8 mac[6];
 	struct net_device *ndev;
 	int ret = 0;
 
@@ -781,11 +782,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	spin_unlock(&mdev->iboe.lock);
 
 	if (ndev) {
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		rtnl_lock();
-		dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
 		ret = 1;
-		rtnl_unlock();
 		dev_put(ndev);
 	}
 
@@ -805,6 +802,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 	u64 reg_id;
 	struct mlx4_ib_steering *ib_steering = NULL;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -816,7 +815,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
 				    !!(mqp->flags &
 				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6, &reg_id);
+				    prot, &reg_id);
 	if (err)
 		goto err_malloc;
 
@@ -835,7 +834,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
 	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-			      MLX4_PROT_IB_IPV6, reg_id);
+			      prot, reg_id);
 err_malloc:
 	kfree(ib_steering);
 
@@ -863,10 +862,11 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	int err;
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	u8 mac[6];
 	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	u64 reg_id = 0;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -889,7 +889,7 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	}
 
 	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-				    MLX4_PROT_IB_IPV6, reg_id);
+				    prot, reg_id);
 	if (err)
 		return err;
 
@@ -901,13 +901,8 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 		if (ndev)
 			dev_hold(ndev);
 		spin_unlock(&mdev->iboe.lock);
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		if (ndev) {
-			rtnl_lock();
-			dev_mc_del(mdev->iboe.netdevs[ge->port - 1], mac);
-			rtnl_unlock();
+		if (ndev)
 			dev_put(ndev);
-		}
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1003,20 +998,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, struct net_device *dev)
-{
-	memcpy(eui, dev->dev_addr, 3);
-	memcpy(eui + 5, dev->dev_addr + 3, 3);
-	if (vlan_id < 0x1000) {
-		eui[3] = vlan_id >> 8;
-		eui[4] = vlan_id & 0xff;
-	} else {
-		eui[3] = 0xff;
-		eui[4] = 0xfe;
-	}
-	eui[0] ^= 2;
-}
-
 static void update_gids_task(struct work_struct *work)
 {
 	struct update_gid_work *gw = container_of(work, struct update_gid_work, work);
@@ -1039,161 +1020,303 @@ static void update_gids_task(struct work_struct *work)
 		       MLX4_CMD_WRAPPED);
 	if (err)
 		pr_warn("set port command failed\n");
-	else {
-		memcpy(gw->dev->iboe.gid_table[gw->port - 1], gw->gids, sizeof gw->gids);
+	else
 		mlx4_ib_dispatch_event(gw->dev, gw->port, IB_EVENT_GID_CHANGE);
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	kfree(gw);
+}
+
+static void reset_gids_task(struct work_struct *work)
+{
+	struct update_gid_work *gw =
+			container_of(work, struct update_gid_work, work);
+	struct mlx4_cmd_mailbox *mailbox;
+	union ib_gid *gids;
+	int err;
+	struct mlx4_dev	*dev = gw->dev->dev;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox)) {
+		pr_warn("reset gid table failed\n");
+		goto free;
 	}
 
+	gids = mailbox->buf;
+	memcpy(gids, gw->gids, sizeof(gw->gids));
+
+	err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | 1,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (err)
+		pr_warn("set port 1 command failed\n");
+
+	err = mlx4_cmd(dev, mailbox->dma, MLX4_SET_PORT_GID_TABLE << 8 | 2,
+		       1, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B,
+		       MLX4_CMD_WRAPPED);
+	if (err)
+		pr_warn("set port 2 command failed\n");
+
 	mlx4_free_cmd_mailbox(dev, mailbox);
+free:
 	kfree(gw);
 }
 
-static int update_ipv6_gids(struct mlx4_ib_dev *dev, int port, int clear)
+static int update_gid_table(struct mlx4_ib_dev *dev, int port,
+			    union ib_gid *gid, int clear)
 {
-	struct net_device *ndev = dev->iboe.netdevs[port - 1];
 	struct update_gid_work *work;
-	struct net_device *tmp;
 	int i;
-	u8 *hits;
-	int ret;
-	union ib_gid gid;
-	int free;
-	int found;
 	int need_update = 0;
-	u16 vid;
+	int free = -1;
+	int found = -1;
+	int max_gids;
+
+	max_gids = dev->dev->caps.gid_table_len[port];
+	for (i = 0; i < max_gids; ++i) {
+		if (!memcmp(&dev->iboe.gid_table[port - 1][i], gid,
+			    sizeof(*gid)))
+			found = i;
+
+		if (clear) {
+			if (found >= 0) {
+				need_update = 1;
+				dev->iboe.gid_table[port - 1][found] = zgid;
+				break;
+			}
+		} else {
+			if (found >= 0)
+				break;
+
+			if (free < 0 && !memcmp(&dev->iboe.gid_table[port - 1][i], &zgid,
+						sizeof(*gid)))
+				free = i;
+		}
+	}
+
+	if (found == -1 && !clear && free >= 0) {
+		dev->iboe.gid_table[port - 1][free] = *gid;
+		need_update = 1;
+	}
+
+	if (!need_update)
+		return 0;
 
 	work = kzalloc(sizeof *work, GFP_ATOMIC);
 	if (!work)
 		return -ENOMEM;
 
-	hits = kzalloc(128, GFP_ATOMIC);
-	if (!hits) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	memcpy(work->gids, dev->iboe.gid_table[port - 1], sizeof(work->gids));
+	INIT_WORK(&work->work, update_gids_task);
+	work->port = port;
+	work->dev = dev;
+	queue_work(wq, &work->work);
 
-	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, tmp) {
-		if (ndev && (tmp == ndev || rdma_vlan_dev_real_dev(tmp) == ndev)) {
-			gid.global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
-			vid = rdma_vlan_dev_vlan_id(tmp);
-			mlx4_addrconf_ifid_eui48(&gid.raw[8], vid, ndev);
-			found = 0;
-			free = -1;
-			for (i = 0; i < 128; ++i) {
-				if (free < 0 &&
-				    !memcmp(&dev->iboe.gid_table[port - 1][i], &zgid, sizeof zgid))
-					free = i;
-				if (!memcmp(&dev->iboe.gid_table[port - 1][i], &gid, sizeof gid)) {
-					hits[i] = 1;
-					found = 1;
-					break;
-				}
-			}
-
-			if (!found) {
-				if (tmp == ndev &&
-				    (memcmp(&dev->iboe.gid_table[port - 1][0],
-					    &gid, sizeof gid) ||
-				     !memcmp(&dev->iboe.gid_table[port - 1][0],
-					     &zgid, sizeof gid))) {
-					dev->iboe.gid_table[port - 1][0] = gid;
-					++need_update;
-					hits[0] = 1;
-				} else if (free >= 0) {
-					dev->iboe.gid_table[port - 1][free] = gid;
-					hits[free] = 1;
-					++need_update;
-				}
-			}
-		}
-	}
-	rcu_read_unlock();
+	return 0;
+}
 
-	for (i = 0; i < 128; ++i)
-		if (!hits[i]) {
-			if (memcmp(&dev->iboe.gid_table[port - 1][i], &zgid, sizeof zgid))
-				++need_update;
-			dev->iboe.gid_table[port - 1][i] = zgid;
-		}
+static int reset_gid_table(struct mlx4_ib_dev *dev)
+{
+	struct update_gid_work *work;
 
-	if (need_update) {
-		memcpy(work->gids, dev->iboe.gid_table[port - 1], sizeof work->gids);
-		INIT_WORK(&work->work, update_gids_task);
-		work->port = port;
-		work->dev = dev;
-		queue_work(wq, &work->work);
-	} else
-		kfree(work);
 
-	kfree(hits);
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return -ENOMEM;
+	memset(dev->iboe.gid_table, 0, sizeof(dev->iboe.gid_table));
+	memset(work->gids, 0, sizeof(work->gids));
+	INIT_WORK(&work->work, reset_gids_task);
+	work->dev = dev;
+	queue_work(wq, &work->work);
 	return 0;
-
-out:
-	kfree(work);
-	return ret;
 }
 
-static void handle_en_event(struct mlx4_ib_dev *dev, int port, unsigned long event)
+static int mlx4_ib_addr_event(int event, struct net_device *event_netdev,
+			      struct mlx4_ib_dev *ibdev, union ib_gid *gid)
 {
-	switch (event) {
-	case NETDEV_UP:
-	case NETDEV_CHANGEADDR:
-		update_ipv6_gids(dev, port, 0);
-		break;
+	struct mlx4_ib_iboe *iboe;
+	int port = 0;
+	struct net_device *real_dev = rdma_vlan_dev_real_dev(event_netdev) ?
+				rdma_vlan_dev_real_dev(event_netdev) : event_netdev;
+
+	if (event != NETDEV_DOWN && event != NETDEV_UP)
+		return 0;
+
+	if ((real_dev != event_netdev) &&
+	    (event == NETDEV_DOWN) &&
+	    rdma_link_local_addr((struct in6_addr *)gid))
+		return 0;
+
+	iboe = &ibdev->iboe;
+	spin_lock(&iboe->lock);
+
+	for (port = 1; port <= MLX4_MAX_PORTS; ++port)
+		if ((netif_is_bond_master(real_dev) && (real_dev == iboe->masters[port - 1])) ||
+		    (!netif_is_bond_master(real_dev) && (real_dev == iboe->netdevs[port - 1])))
+			update_gid_table(ibdev, port, gid, event == NETDEV_DOWN);
+
+	spin_unlock(&iboe->lock);
+	return 0;
 
-	case NETDEV_DOWN:
-		update_ipv6_gids(dev, port, 1);
-		dev->iboe.netdevs[port - 1] = NULL;
-	}
 }
 
-static void netdev_added(struct mlx4_ib_dev *dev, int port)
+static u8 mlx4_ib_get_dev_port(struct net_device *dev,
+			       struct mlx4_ib_dev *ibdev)
 {
-	update_ipv6_gids(dev, port, 0);
+	u8 port = 0;
+	struct mlx4_ib_iboe *iboe;
+	struct net_device *real_dev = rdma_vlan_dev_real_dev(dev) ?
+				rdma_vlan_dev_real_dev(dev) : dev;
+
+	iboe = &ibdev->iboe;
+	spin_lock(&iboe->lock);
+
+	for (port = 1; port <= MLX4_MAX_PORTS; ++port)
+		if ((netif_is_bond_master(real_dev) && (real_dev == iboe->masters[port - 1])) ||
+		    (!netif_is_bond_master(real_dev) && (real_dev == iboe->netdevs[port - 1])))
+			break;
+
+	spin_unlock(&iboe->lock);
+
+	if ((port == 0) || (port > MLX4_MAX_PORTS))
+		return 0;
+	else
+		return port;
 }
 
-static void netdev_removed(struct mlx4_ib_dev *dev, int port)
+static int mlx4_ib_inet_event(struct notifier_block *this, unsigned long event,
+				void *ptr)
 {
-	update_ipv6_gids(dev, port, 1);
+	struct mlx4_ib_dev *ibdev;
+	struct in_ifaddr *ifa = ptr;
+	union ib_gid gid;
+	struct net_device *event_netdev = ifa->ifa_dev->dev;
+
+	ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet);
+
+	mlx4_ib_addr_event(event, event_netdev, ibdev, &gid);
+	return NOTIFY_DONE;
 }
 
-static int mlx4_ib_netdev_event(struct notifier_block *this, unsigned long event,
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+static int mlx4_ib_inet6_event(struct notifier_block *this, unsigned long event,
 				void *ptr)
 {
-	struct net_device *dev = ptr;
 	struct mlx4_ib_dev *ibdev;
-	struct net_device *oldnd;
+	struct inet6_ifaddr *ifa = ptr;
+	union  ib_gid *gid = (union ib_gid *)&ifa->addr;
+	struct net_device *event_netdev = ifa->idev->dev;
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet6);
+
+	mlx4_ib_addr_event(event, event_netdev, ibdev, gid);
+	return NOTIFY_DONE;
+}
+#endif
+
+static void mlx4_ib_get_dev_addr(struct net_device *dev, struct mlx4_ib_dev *ibdev, u8 port)
+{
+	struct in_device *in_dev;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct inet6_dev *in6_dev;
+	union ib_gid  *pgid;
+	struct inet6_ifaddr *ifp;
+#endif
+	union ib_gid gid;
+
+
+	if ((port == 0) || (port > MLX4_MAX_PORTS))
+		return;
+
+	/* IPv4 gids */
+	in_dev = in_dev_get(dev);
+	if (in_dev) {
+		for_ifa(in_dev) {
+			/*ifa->ifa_address;*/
+			ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
+			update_gid_table(ibdev, port, &gid, 0);
+		}
+		endfor_ifa(in_dev);
+		in_dev_put(in_dev);
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	/* IPv6 gids */
+	in6_dev = in6_dev_get(dev);
+	if (in6_dev) {
+		read_lock_bh(&in6_dev->lock);
+		list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
+			pgid = (union ib_gid *)&ifp->addr;
+			update_gid_table(ibdev, port, pgid, 0);
+		}
+		read_unlock_bh(&in6_dev->lock);
+		in6_dev_put(in6_dev);
+	}
+#endif
+}
+
+int mlx4_ib_init_gid_table(struct mlx4_ib_dev *ibdev)
+{
+	struct	net_device *dev;
+
+	if (reset_gid_table(ibdev))
+		return -1;
+
+	read_lock(&dev_base_lock);
+
+	for_each_netdev(&init_net, dev) {
+		u8 port = mlx4_ib_get_dev_port(dev, ibdev);
+		if (port)
+			mlx4_ib_get_dev_addr(dev, ibdev, port);
+	}
+
+	read_unlock(&dev_base_lock);
+
+	return 0;
+}
+
+static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev)
+{
 	struct mlx4_ib_iboe *iboe;
 	int port;
 
-	if (!net_eq(dev_net(dev), &init_net))
-		return NOTIFY_DONE;
-
-	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb);
 	iboe = &ibdev->iboe;
 
 	spin_lock(&iboe->lock);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
-		oldnd = iboe->netdevs[port - 1];
+		struct net_device *old_master = iboe->masters[port - 1];
+		struct net_device *curr_master;
 		iboe->netdevs[port - 1] =
 			mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port);
-		if (oldnd != iboe->netdevs[port - 1]) {
-			if (iboe->netdevs[port - 1])
-				netdev_added(ibdev, port);
-			else
-				netdev_removed(ibdev, port);
+
+		if (iboe->netdevs[port - 1] && netif_is_bond_slave(iboe->netdevs[port - 1])) {
+			rtnl_lock();
+			iboe->masters[port - 1] = netdev_master_upper_dev_get(iboe->netdevs[port - 1]);
+			rtnl_unlock();
 		}
-	}
+		curr_master = iboe->masters[port - 1];
 
-	if (dev == iboe->netdevs[0] ||
-	    (iboe->netdevs[0] && rdma_vlan_dev_real_dev(dev) == iboe->netdevs[0]))
-		handle_en_event(ibdev, 1, event);
-	else if (dev == iboe->netdevs[1]
-		 || (iboe->netdevs[1] && rdma_vlan_dev_real_dev(dev) == iboe->netdevs[1]))
-		handle_en_event(ibdev, 2, event);
+		/* if bonding is used it is possible that we add it to masters only after
+		   IP address is assigned to the net bonding interface */
+		if (curr_master && (old_master != curr_master))
+			mlx4_ib_get_dev_addr(curr_master, ibdev, port);
+	}
 
 	spin_unlock(&iboe->lock);
+}
+
+static int mlx4_ib_netdev_event(struct notifier_block *this, unsigned long event,
+				void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct mlx4_ib_dev *ibdev;
+
+	if (!net_eq(dev_net(dev), &init_net))
+		return NOTIFY_DONE;
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb);
+	mlx4_ib_scan_netdevs(ibdev);
 
 	return NOTIFY_DONE;
 }
@@ -1490,11 +1613,35 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (mlx4_ib_init_sriov(ibdev))
 		goto err_mad;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE && !iboe->nb.notifier_call) {
-		iboe->nb.notifier_call = mlx4_ib_netdev_event;
-		err = register_netdevice_notifier(&iboe->nb);
-		if (err)
-			goto err_sriov;
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) {
+		if (!iboe->nb.notifier_call) {
+			iboe->nb.notifier_call = mlx4_ib_netdev_event;
+			err = register_netdevice_notifier(&iboe->nb);
+			if (err) {
+				iboe->nb.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+		if (!iboe->nb_inet.notifier_call) {
+			iboe->nb_inet.notifier_call = mlx4_ib_inet_event;
+			err = register_inetaddr_notifier(&iboe->nb_inet);
+			if (err) {
+				iboe->nb_inet.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+		if (!iboe->nb_inet6.notifier_call) {
+			iboe->nb_inet6.notifier_call = mlx4_ib_inet6_event;
+			err = register_inet6addr_notifier(&iboe->nb_inet6);
+			if (err) {
+				iboe->nb_inet6.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+#endif
+		mlx4_ib_scan_netdevs(ibdev);
+		mlx4_ib_init_gid_table(ibdev);
 	}
 
 	for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
@@ -1520,11 +1667,25 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	return ibdev;
 
 err_notif:
-	if (unregister_netdevice_notifier(&ibdev->iboe.nb))
-		pr_warn("failure unregistering notifier\n");
+	if (ibdev->iboe.nb.notifier_call) {
+		if (unregister_netdevice_notifier(&ibdev->iboe.nb))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb.notifier_call = NULL;
+	}
+	if (ibdev->iboe.nb_inet.notifier_call) {
+		if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet.notifier_call = NULL;
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	if (ibdev->iboe.nb_inet6.notifier_call) {
+		if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet6.notifier_call = NULL;
+	}
+#endif
 	flush_workqueue(wq);
 
-err_sriov:
 	mlx4_ib_close_sriov(ibdev);
 
 err_mad:
@@ -1566,6 +1727,18 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 			pr_warn("failure unregistering notifier\n");
 		ibdev->iboe.nb.notifier_call = NULL;
 	}
+	if (ibdev->iboe.nb_inet.notifier_call) {
+		if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet.notifier_call = NULL;
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	if (ibdev->iboe.nb_inet6.notifier_call) {
+		if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet6.notifier_call = NULL;
+	}
+#endif
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
 		if (ibdev->counters[p] != -1)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index f61ec26..0c98417 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -422,7 +422,10 @@ struct mlx4_ib_sriov {
 struct mlx4_ib_iboe {
 	spinlock_t		lock;
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
+	struct net_device      *masters[MLX4_MAX_PORTS];
 	struct notifier_block 	nb;
+	struct notifier_block	nb_inet;
+	struct notifier_block	nb_inet6;
 	union ib_gid		gid_table[MLX4_MAX_PORTS][128];
 };
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH V1 for-next 3/6] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
       [not found] ` <1371628031-17546-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-06-19  7:47   ` [PATCH V1 for-next 1/6] IB/core: RoCE IP based GID addressing Or Gerlitz
  2013-06-19  7:47   ` [PATCH V1 for-next 2/6] IB/mlx4: Use RoCE IP based GIDs in the port GID table Or Gerlitz
@ 2013-06-19  7:47   ` Or Gerlitz
  2013-06-19  7:47   ` [PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs Or Gerlitz
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Moni Shoua, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

IP based RoCE GIDs don't store the Ethernet L2 parameters, MAC and VLAN.

Hence, we need to extract them from the CQE and place them in struct
ib_wc (to be used for the cases where they were previously taken from
the GID).

Also, when modifying a QP or building an address handle, instead of
parsing the DGID to get the MAC and VLAN, take them from the
address handle attributes.
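As an aside, the CQE-side extraction the patch performs in mlx4_ib/cq.c can be sketched in a small host-endian user-space model (simplified from the big-endian CQE fields; the constant values match MLX4_CQE_VLAN_PRESENT_MASK and the new MLX4_CQE_VID_MASK in the diff below, but the helper name is ours):

```c
#include <stdint.h>

#define CQE_VLAN_PRESENT (1u << 29)  /* MLX4_CQE_VLAN_PRESENT_MASK */
#define CQE_VID_MASK     0xfff       /* MLX4_CQE_VID_MASK */

/* Sketch of how the patch fills the new wc->vlan: the 12-bit VLAN id
 * lives in the low bits of sl_vid when the VLAN-present bit is set in
 * vlan_my_qpn; otherwise 0xffff marks "no VLAN". Inputs are taken as
 * already byte-swapped to host order. */
static uint16_t cqe_wc_vlan(uint32_t vlan_my_qpn, uint16_t sl_vid)
{
	if (vlan_my_qpn & CQE_VLAN_PRESENT)
		return sl_vid & CQE_VID_MASK;
	return 0xffff;
}
```

The source MAC is simpler still: with the union added to struct mlx4_cqe, the six smac bytes are copied verbatim into the new wc->smac field.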

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/ah.c |   21 +++++++++------------
 drivers/infiniband/hw/mlx4/cq.c |    5 +++++
 drivers/infiniband/hw/mlx4/qp.c |   19 ++++++++++---------
 include/linux/mlx4/cq.h         |   14 ++++++++++----
 4 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index a251bec..3941700 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -92,21 +92,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 {
 	struct mlx4_ib_dev *ibdev = to_mdev(pd->device);
 	struct mlx4_dev *dev = ibdev->dev;
-	union ib_gid sgid;
-	u8 mac[6];
-	int err;
 	int is_mcast;
+	struct in6_addr in6;
 	u16 vlan_tag;
 
-	err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
-	if (err)
-		return ERR_PTR(err);
-
-	memcpy(ah->av.eth.mac, mac, 6);
-	err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
-	if (err)
-		return ERR_PTR(err);
-	vlan_tag = rdma_get_vlan_id(&sgid);
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+	if (rdma_is_multicast_addr(&in6)) {
+		is_mcast = 1;
+		rdma_get_mcast_mac(&in6, ah->av.eth.mac);
+	} else {
+		memcpy(ah->av.eth.mac, ah_attr->dmac, 6);
+	}
+	vlan_tag = ah_attr->vlan;
 	if (vlan_tag < 0x1000)
 		vlan_tag |= (ah_attr->sl & 7) << 13;
 	ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..ba3f85b 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -793,6 +793,11 @@ repoll:
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 13;
 		else
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 12;
+		if (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK)
+			wc->vlan = be16_to_cpu(cqe->sl_vid) & MLX4_CQE_VID_MASK;
+		else
+			wc->vlan = 0xffff;
+		memcpy(wc->smac, cqe->smac, 6);
 	}
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4f10af2..ddf5a1a 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1147,11 +1147,8 @@ static void mlx4_set_sched(struct mlx4_qp_path *path, u8 port)
 static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 			 struct mlx4_qp_path *path, u8 port)
 {
-	int err;
 	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port) ==
 		IB_LINK_LAYER_ETHERNET;
-	u8 mac[6];
-	int is_mcast;
 	u16 vlan_tag;
 	int vidx;
 
@@ -1188,16 +1185,12 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 		if (!(ah->ah_flags & IB_AH_GRH))
 			return -1;
 
-		err = mlx4_ib_resolve_grh(dev, ah, mac, &is_mcast, port);
-		if (err)
-			return err;
-
-		memcpy(path->dmac, mac, 6);
+		memcpy(path->dmac, ah->dmac, 6);
 		path->ackto = MLX4_IB_LINK_TYPE_ETH;
 		/* use index 0 into MAC table for IBoE */
 		path->grh_mylmc &= 0x80;
 
-		vlan_tag = rdma_get_vlan_id(&dev->iboe.gid_table[port - 1][ah->grh.sgid_index]);
+		vlan_tag = ah->vlan;
 		if (vlan_tag < 0x1000) {
 			if (mlx4_find_cached_vlan(dev->dev, port, vlan_tag, &vidx))
 				return -ENOENT;
@@ -1236,6 +1229,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	enum mlx4_qp_optpar optpar = 0;
 	int sqd_event;
 	int err = -EINVAL;
+	int is_eth;
 
 	context = kzalloc(sizeof *context, GFP_KERNEL);
 	if (!context)
@@ -1464,6 +1458,13 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->pri_path.ackto = (context->pri_path.ackto & 0xf8) |
 					MLX4_IB_LINK_TYPE_ETH;
 
+	if (ibqp->qp_type == IB_QPT_UD)
+		if (is_eth && (new_state == IB_QPS_RTR)) {
+			context->pri_path.ackto = MLX4_IB_LINK_TYPE_ETH;
+			optpar |= MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH;
+		}
+
+
 	if (cur_state == IB_QPS_RTS && new_state == IB_QPS_SQD	&&
 	    attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY && attr->en_sqd_async_notify)
 		sqd_event = 1;
diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h
index 98fa492..72ba0a9 100644
--- a/include/linux/mlx4/cq.h
+++ b/include/linux/mlx4/cq.h
@@ -43,10 +43,15 @@ struct mlx4_cqe {
 	__be32			immed_rss_invalid;
 	__be32			g_mlpath_rqpn;
 	__be16			sl_vid;
-	__be16			rlid;
-	__be16			status;
-	u8			ipv6_ext_mask;
-	u8			badfcs_enc;
+	union {
+		struct {
+			__be16	rlid;
+			__be16  status;
+			u8      ipv6_ext_mask;
+			u8      badfcs_enc;
+		};
+		u8  smac[6];
+	};
 	__be32			byte_cnt;
 	__be16			wqe_index;
 	__be16			checksum;
@@ -83,6 +88,7 @@ struct mlx4_ts_cqe {
 enum {
 	MLX4_CQE_VLAN_PRESENT_MASK	= 1 << 29,
 	MLX4_CQE_QPN_MASK		= 0xffffff,
+	MLX4_CQE_VID_MASK		= 0xfff,
 };
 
 enum {
-- 
1.7.1


* [PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs
       [not found] ` <1371628031-17546-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-06-19  7:47   ` [PATCH V1 for-next 3/6] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing Or Gerlitz
@ 2013-06-19  7:47   ` Or Gerlitz
       [not found]     ` <1371628031-17546-5-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-06-19  7:47   ` [PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs Or Gerlitz
  2013-06-19  7:47   ` [PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm Or Gerlitz
  5 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Igor Ivanov, Hadar Hen Zion,
	Or Gerlitz

From: Igor Ivanov <Igor.Ivanov-wN0M4riKYwLQT0dZR+AlfA@public.gmane.org>

Add infrastructure to support extended uverbs capabilities in a
forward/backward compatible manner. Uverbs command opcodes based on the
verbs extensions approach must be greater than or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have a new header format and are
processed a bit differently.
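The dispatch rule above can be modeled in user space: the structs mirror the two header layouts in the patch, and the size check is the one ib_uverbs_write() now performs, where extended commands must account for the provider-specific words as well (the helper name is ours):

```c
#include <stddef.h>
#include <stdint.h>

#define IB_USER_VERBS_CMD_THRESHOLD 50

/* Legacy command header (unchanged). */
struct ib_uverbs_cmd_hdr {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
};

/* Extended header introduced by this patch. */
struct ib_uverbs_cmd_hdr_ex {
	uint32_t command;
	uint16_t in_words;
	uint16_t out_words;
	uint16_t provider_in_words;
	uint16_t provider_out_words;
	uint32_t cmd_hdr_reserved;
};

/* Sketch of the length validation: commands at or above the threshold
 * use the extended header, so the write() byte count must cover both the
 * core words and the provider words (each word being 4 bytes). */
static int hdr_len_ok(uint32_t command, size_t count,
		      const struct ib_uverbs_cmd_hdr_ex *hdr)
{
	if (command >= IB_USER_VERBS_CMD_THRESHOLD)
		return (size_t)(hdr->in_words + hdr->provider_in_words) * 4 == count;
	return (size_t)hdr->in_words * 4 == count;
}
```

Note that for extended commands the handler is then invoked on the payload past sizeof(struct ib_uverbs_cmd_hdr_ex), with combined core+provider in/out lengths.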

Signed-off-by: Igor Ivanov <Igor.Ivanov-wN0M4riKYwLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Hadar Hen Zion <hadarh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs_main.c |   29 ++++++++++++++++++++++++-----
 include/uapi/rdma/ib_user_verbs.h     |   10 ++++++++++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 2c6f0f2..e4e7b24 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -583,9 +583,6 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof hdr))
 		return -EFAULT;
 
-	if (hdr.in_words * 4 != count)
-		return -EINVAL;
-
 	if (hdr.command >= ARRAY_SIZE(uverbs_cmd_table) ||
 	    !uverbs_cmd_table[hdr.command])
 		return -EINVAL;
@@ -597,8 +594,30 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
 	if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command)))
 		return -ENOSYS;
 
-	return uverbs_cmd_table[hdr.command](file, buf + sizeof hdr,
-					     hdr.in_words * 4, hdr.out_words * 4);
+	if (hdr.command >= IB_USER_VERBS_CMD_THRESHOLD) {
+		struct ib_uverbs_cmd_hdr_ex hdr_ex;
+
+		if (copy_from_user(&hdr_ex, buf, sizeof(hdr_ex)))
+			return -EFAULT;
+
+		if (((hdr_ex.in_words + hdr_ex.provider_in_words) * 4) != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr_ex),
+						     (hdr_ex.in_words +
+						      hdr_ex.provider_in_words) * 4,
+						     (hdr_ex.out_words +
+						      hdr_ex.provider_out_words) * 4);
+	} else {
+		if (hdr.in_words * 4 != count)
+			return -EINVAL;
+
+		return uverbs_cmd_table[hdr.command](file,
+						     buf + sizeof(hdr),
+						     hdr.in_words * 4,
+						     hdr.out_words * 4);
+	}
 }
 
 static int ib_uverbs_mmap(struct file *filp, struct vm_area_struct *vma)
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 805711e..61535aa 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -43,6 +43,7 @@
  * compatibility are made.
  */
 #define IB_USER_VERBS_ABI_VERSION	6
+#define IB_USER_VERBS_CMD_THRESHOLD    50
 
 enum {
 	IB_USER_VERBS_CMD_GET_CONTEXT,
@@ -123,6 +124,15 @@ struct ib_uverbs_cmd_hdr {
 	__u16 out_words;
 };
 
+struct ib_uverbs_cmd_hdr_ex {
+	__u32 command;
+	__u16 in_words;
+	__u16 out_words;
+	__u16 provider_in_words;
+	__u16 provider_out_words;
+	__u32 cmd_hdr_reserved;
+};
+
 struct ib_uverbs_get_context {
 	__u64 response;
 	__u64 driver_data[0];
-- 
1.7.1


* [PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs
       [not found] ` <1371628031-17546-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2013-06-19  7:47   ` [PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs Or Gerlitz
@ 2013-06-19  7:47   ` Or Gerlitz
  2013-06-19  7:47   ` [PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm Or Gerlitz
  5 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add uverbs support for RoCE (IBoE) IP based addressing extensions
towards user space libraries.

Under IP based GID addressing, for RC QPs, the QP attributes should
contain the Ethernet L2 destination. Until now, indicating the GID was
sufficient. When the IP address is encoded in the GID, the QP attributes
should contain an extended destination, indicating the VLAN and DMAC as
well. This is done via a new struct ib_uverbs_qp_dest_ex, which is
contained in a new struct ib_uverbs_modify_qp_ex used by the
MODIFY_QP_EX command. To make these changes seamless, the extended
structures were added at the bottom of the current structures.

Also, when the GID encodes an IP address, the AH attributes should
contain the VLAN and DMAC as well. Therefore, ib_uverbs_create_ah was
extended to contain those fields. When creating an AH, the user
indicates the exact L2 Ethernet destination parameters. This is done by
a new CREATE_AH_EX command that uses a new struct ib_uverbs_create_ah_ex.

struct ib_user_path_rec was extended too, to contain source and
destination MAC and VLAN ID; this structure is used by the rdma_ucm
driver.
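The comp_mask convention the patch applies (a field is honored only if its bit is set, otherwise it falls back to an "unset" default: a zero MAC, VLAN id 0xFFFF) can be sketched like this. The flag names, struct names, and layout here are illustrative assumptions, not the exact uapi definitions:

```c
#include <stdint.h>
#include <string.h>

/* Assumed flag values for illustration; the patch's real names are
 * IBV_QP_DEST_EX_DMAC and IBV_QP_DEST_EX_VID. */
enum {
	QP_DEST_EX_DMAC = 1 << 0,
	QP_DEST_EX_VID  = 1 << 1,
};

/* Simplified stand-in for the extended destination carried by the
 * MODIFY_QP_EX command. */
struct qp_dest_ex {
	uint32_t comp_mask;
	uint8_t  dmac[6];
	uint16_t vid;
};

/* Simplified stand-in for the kernel-side AH attribute L2 fields. */
struct ah_l2 {
	uint8_t  dmac[6];
	uint16_t vlan;
};

/* Fields not flagged in comp_mask get "unset" defaults: zero MAC,
 * 0xFFFF meaning "no VLAN". */
static void apply_dest_ex(const struct qp_dest_ex *d, struct ah_l2 *out)
{
	if (d->comp_mask & QP_DEST_EX_DMAC)
		memcpy(out->dmac, d->dmac, 6);
	else
		memset(out->dmac, 0, 6);
	out->vlan = (d->comp_mask & QP_DEST_EX_VID) ? d->vid : 0xFFFF;
}
```

The same defaulting is applied independently to the primary and alternate destinations, and the legacy (non-extended) MODIFY_QP path simply behaves as if no bits were set.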

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/uverbs.h          |    2 +
 drivers/infiniband/core/uverbs_cmd.c      |  330 ++++++++++++++++++++++-------
 drivers/infiniband/core/uverbs_main.c     |    4 +-
 drivers/infiniband/core/uverbs_marshall.c |   94 ++++++++-
 include/rdma/ib_marshall.h                |   12 +
 include/uapi/rdma/ib_user_sa.h            |   34 +++-
 include/uapi/rdma/ib_user_verbs.h         |  120 +++++++++++-
 7 files changed, 503 insertions(+), 93 deletions(-)

diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 0fcd7aa..1ec4850 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -200,11 +200,13 @@ IB_UVERBS_DECLARE_CMD(create_qp);
 IB_UVERBS_DECLARE_CMD(open_qp);
 IB_UVERBS_DECLARE_CMD(query_qp);
 IB_UVERBS_DECLARE_CMD(modify_qp);
+IB_UVERBS_DECLARE_CMD(modify_qp_ex);
 IB_UVERBS_DECLARE_CMD(destroy_qp);
 IB_UVERBS_DECLARE_CMD(post_send);
 IB_UVERBS_DECLARE_CMD(post_recv);
 IB_UVERBS_DECLARE_CMD(post_srq_recv);
 IB_UVERBS_DECLARE_CMD(create_ah);
+IB_UVERBS_DECLARE_CMD(create_ah_ex);
 IB_UVERBS_DECLARE_CMD(destroy_ah);
 IB_UVERBS_DECLARE_CMD(attach_mcast);
 IB_UVERBS_DECLARE_CMD(detach_mcast);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a7d00f6..eb3e7e6 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1891,6 +1891,58 @@ static int modify_qp_mask(enum ib_qp_type qp_type, int mask)
 	}
 }
 
+static void ib_uverbs_modify_qp_assign(struct ib_uverbs_modify_qp *cmd,
+				       struct ib_qp_attr *attr) {
+	attr->qp_state		  = cmd->qp_state;
+	attr->cur_qp_state	  = cmd->cur_qp_state;
+	attr->path_mtu		  = cmd->path_mtu;
+	attr->path_mig_state	  = cmd->path_mig_state;
+	attr->qkey		  = cmd->qkey;
+	attr->rq_psn		  = cmd->rq_psn;
+	attr->sq_psn		  = cmd->sq_psn;
+	attr->dest_qp_num	  = cmd->dest_qp_num;
+	attr->qp_access_flags	  = cmd->qp_access_flags;
+	attr->pkey_index	  = cmd->pkey_index;
+	attr->alt_pkey_index	  = cmd->alt_pkey_index;
+	attr->en_sqd_async_notify = cmd->en_sqd_async_notify;
+	attr->max_rd_atomic	  = cmd->max_rd_atomic;
+	attr->max_dest_rd_atomic  = cmd->max_dest_rd_atomic;
+	attr->min_rnr_timer	  = cmd->min_rnr_timer;
+	attr->port_num		  = cmd->port_num;
+	attr->timeout		  = cmd->timeout;
+	attr->retry_cnt		  = cmd->retry_cnt;
+	attr->rnr_retry		  = cmd->rnr_retry;
+	attr->alt_port_num	  = cmd->alt_port_num;
+	attr->alt_timeout	  = cmd->alt_timeout;
+
+	memcpy(attr->ah_attr.grh.dgid.raw, cmd->dest.dgid, 16);
+	attr->ah_attr.grh.flow_label        = cmd->dest.flow_label;
+	attr->ah_attr.grh.sgid_index        = cmd->dest.sgid_index;
+	attr->ah_attr.grh.hop_limit         = cmd->dest.hop_limit;
+	attr->ah_attr.grh.traffic_class     = cmd->dest.traffic_class;
+	attr->ah_attr.dlid		    = cmd->dest.dlid;
+	attr->ah_attr.sl		    = cmd->dest.sl;
+	attr->ah_attr.src_path_bits	    = cmd->dest.src_path_bits;
+	attr->ah_attr.static_rate	    = cmd->dest.static_rate;
+	attr->ah_attr.ah_flags		    = cmd->dest.is_global ?
+					      IB_AH_GRH : 0;
+	attr->ah_attr.port_num		    = cmd->dest.port_num;
+
+	memcpy(attr->alt_ah_attr.grh.dgid.raw, cmd->alt_dest.dgid, 16);
+	attr->alt_ah_attr.grh.flow_label    = cmd->alt_dest.flow_label;
+	attr->alt_ah_attr.grh.sgid_index    = cmd->alt_dest.sgid_index;
+	attr->alt_ah_attr.grh.hop_limit     = cmd->alt_dest.hop_limit;
+	attr->alt_ah_attr.grh.traffic_class = cmd->alt_dest.traffic_class;
+	attr->alt_ah_attr.dlid		    = cmd->alt_dest.dlid;
+	attr->alt_ah_attr.sl		    = cmd->alt_dest.sl;
+	attr->alt_ah_attr.src_path_bits     = cmd->alt_dest.src_path_bits;
+	attr->alt_ah_attr.static_rate       = cmd->alt_dest.static_rate;
+	attr->alt_ah_attr.ah_flags	    = cmd->alt_dest.is_global
+					      ? IB_AH_GRH : 0;
+	attr->alt_ah_attr.port_num	    = cmd->alt_dest.port_num;
+}
+
+
 ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 			    const char __user *buf, int in_len,
 			    int out_len)
@@ -1917,51 +1969,11 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 		goto out;
 	}
 
-	attr->qp_state 		  = cmd.qp_state;
-	attr->cur_qp_state 	  = cmd.cur_qp_state;
-	attr->path_mtu 		  = cmd.path_mtu;
-	attr->path_mig_state 	  = cmd.path_mig_state;
-	attr->qkey 		  = cmd.qkey;
-	attr->rq_psn 		  = cmd.rq_psn;
-	attr->sq_psn 		  = cmd.sq_psn;
-	attr->dest_qp_num 	  = cmd.dest_qp_num;
-	attr->qp_access_flags 	  = cmd.qp_access_flags;
-	attr->pkey_index 	  = cmd.pkey_index;
-	attr->alt_pkey_index 	  = cmd.alt_pkey_index;
-	attr->en_sqd_async_notify = cmd.en_sqd_async_notify;
-	attr->max_rd_atomic 	  = cmd.max_rd_atomic;
-	attr->max_dest_rd_atomic  = cmd.max_dest_rd_atomic;
-	attr->min_rnr_timer 	  = cmd.min_rnr_timer;
-	attr->port_num 		  = cmd.port_num;
-	attr->timeout 		  = cmd.timeout;
-	attr->retry_cnt 	  = cmd.retry_cnt;
-	attr->rnr_retry 	  = cmd.rnr_retry;
-	attr->alt_port_num 	  = cmd.alt_port_num;
-	attr->alt_timeout 	  = cmd.alt_timeout;
-
-	memcpy(attr->ah_attr.grh.dgid.raw, cmd.dest.dgid, 16);
-	attr->ah_attr.grh.flow_label        = cmd.dest.flow_label;
-	attr->ah_attr.grh.sgid_index        = cmd.dest.sgid_index;
-	attr->ah_attr.grh.hop_limit         = cmd.dest.hop_limit;
-	attr->ah_attr.grh.traffic_class     = cmd.dest.traffic_class;
-	attr->ah_attr.dlid 	    	    = cmd.dest.dlid;
-	attr->ah_attr.sl   	    	    = cmd.dest.sl;
-	attr->ah_attr.src_path_bits 	    = cmd.dest.src_path_bits;
-	attr->ah_attr.static_rate   	    = cmd.dest.static_rate;
-	attr->ah_attr.ah_flags 	    	    = cmd.dest.is_global ? IB_AH_GRH : 0;
-	attr->ah_attr.port_num 	    	    = cmd.dest.port_num;
-
-	memcpy(attr->alt_ah_attr.grh.dgid.raw, cmd.alt_dest.dgid, 16);
-	attr->alt_ah_attr.grh.flow_label    = cmd.alt_dest.flow_label;
-	attr->alt_ah_attr.grh.sgid_index    = cmd.alt_dest.sgid_index;
-	attr->alt_ah_attr.grh.hop_limit     = cmd.alt_dest.hop_limit;
-	attr->alt_ah_attr.grh.traffic_class = cmd.alt_dest.traffic_class;
-	attr->alt_ah_attr.dlid 	    	    = cmd.alt_dest.dlid;
-	attr->alt_ah_attr.sl   	    	    = cmd.alt_dest.sl;
-	attr->alt_ah_attr.src_path_bits     = cmd.alt_dest.src_path_bits;
-	attr->alt_ah_attr.static_rate       = cmd.alt_dest.static_rate;
-	attr->alt_ah_attr.ah_flags 	    = cmd.alt_dest.is_global ? IB_AH_GRH : 0;
-	attr->alt_ah_attr.port_num 	    = cmd.alt_dest.port_num;
+	ib_uverbs_modify_qp_assign(&cmd, attr);
+	memset(attr->ah_attr.dmac, 0, sizeof(attr->ah_attr.dmac));
+	attr->ah_attr.vlan = 0xFFFF;
+	memset(attr->alt_ah_attr.dmac, 0, sizeof(attr->alt_ah_attr.dmac));
+	attr->alt_ah_attr.vlan = 0xFFFF;
 
 	if (qp->real_qp == qp) {
 		ret = qp->device->modify_qp(qp, attr,
@@ -1983,6 +1995,80 @@ out:
 	return ret;
 }
 
+ssize_t ib_uverbs_modify_qp_ex(struct ib_uverbs_file *file,
+			       const char __user *buf, int in_len,
+			       int out_len)
+{
+	struct ib_uverbs_modify_qp_ex cmd;
+	struct ib_udata               udata;
+	struct ib_qp                 *qp;
+	struct ib_qp_attr	     *attr;
+	int                           ret;
+
+	if (copy_from_user(&cmd, buf, sizeof(cmd)))
+		return -EFAULT;
+
+	INIT_UDATA(&udata, buf + sizeof(cmd), NULL, in_len - sizeof(cmd),
+		   out_len);
+
+	attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return -ENOMEM;
+
+	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
+	if (!qp) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ib_uverbs_modify_qp_assign((struct ib_uverbs_modify_qp *)((void *)&cmd +
+				    sizeof(cmd.comp_mask)), attr);
+
+	if (cmd.comp_mask & IB_UVERBS_MODIFY_QP_EX_DEST_EX_FLAGS) {
+		if (cmd.dest_ex.comp_mask & IBV_QP_DEST_EX_DMAC)
+			memcpy(attr->ah_attr.dmac, cmd.dest_ex.dmac,
+			       sizeof(attr->ah_attr.dmac));
+		else
+			memset(attr->ah_attr.dmac, 0,
+			       sizeof(attr->ah_attr.dmac));
+		if (cmd.dest_ex.comp_mask & IBV_QP_DEST_EX_VID)
+			attr->ah_attr.vlan = cmd.dest_ex.vid;
+		else
+			attr->ah_attr.vlan = 0xFFFF;
+	}
+	if (cmd.comp_mask & IB_UVERBS_MODIFY_QP_EX_ALT_DEST_EX_FLAGS) {
+		if (cmd.alt_dest_ex.comp_mask & IBV_QP_DEST_EX_DMAC)
+			memcpy(attr->alt_ah_attr.dmac, cmd.alt_dest_ex.dmac,
+			       sizeof(attr->alt_ah_attr.dmac));
+		else
+			memset(attr->alt_ah_attr.dmac, 0,
+			       sizeof(attr->alt_ah_attr.dmac));
+		if (cmd.alt_dest_ex.comp_mask & IBV_QP_DEST_EX_VID)
+			attr->alt_ah_attr.vlan = cmd.alt_dest_ex.vid;
+		else
+			attr->alt_ah_attr.vlan = 0xFFFF;
+	}
+
+	if (qp->real_qp == qp) {
+		ret = qp->device->modify_qp(qp, attr,
+			modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
+	} else {
+		ret = ib_modify_qp(qp, attr,
+				   modify_qp_mask(qp->qp_type, cmd.attr_mask));
+	}
+
+	put_qp_read(qp);
+
+	if (ret)
+		goto out;
+
+	ret = in_len;
+
+out:
+	kfree(attr);
+
+	return ret;
+}
 ssize_t ib_uverbs_destroy_qp(struct ib_uverbs_file *file,
 			     const char __user *buf, int in_len,
 			     int out_len)
@@ -2377,48 +2463,51 @@ out:
 	return ret ? ret : in_len;
 }
 
-ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
-			    const char __user *buf, int in_len,
-			    int out_len)
+struct ib_uobject *ib_uverbs_create_ah_assign(
+		struct ib_uverbs_create_ah_ex *cmd,
+		struct ib_uverbs_ah_attr_ex *src_attr,
+		struct ib_uverbs_file *file)
 {
-	struct ib_uverbs_create_ah	 cmd;
-	struct ib_uverbs_create_ah_resp	 resp;
-	struct ib_uobject		*uobj;
 	struct ib_pd			*pd;
 	struct ib_ah			*ah;
 	struct ib_ah_attr		attr;
-	int ret;
-
-	if (out_len < sizeof resp)
-		return -ENOSPC;
-
-	if (copy_from_user(&cmd, buf, sizeof cmd))
-		return -EFAULT;
+	struct ib_uobject		*uobj;
+	long				ret;
 
-	uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
+	uobj = kmalloc(sizeof(*uobj), GFP_KERNEL);
 	if (!uobj)
-		return -ENOMEM;
+		return (void *)-ENOMEM;
 
-	init_uobj(uobj, cmd.user_handle, file->ucontext, &ah_lock_class);
+	init_uobj(uobj, cmd->user_handle, file->ucontext, &ah_lock_class);
 	down_write(&uobj->mutex);
 
-	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
+	pd = idr_read_pd(cmd->pd_handle, file->ucontext);
 	if (!pd) {
 		ret = -EINVAL;
 		goto err;
 	}
 
-	attr.dlid 	       = cmd.attr.dlid;
-	attr.sl 	       = cmd.attr.sl;
-	attr.src_path_bits     = cmd.attr.src_path_bits;
-	attr.static_rate       = cmd.attr.static_rate;
-	attr.ah_flags          = cmd.attr.is_global ? IB_AH_GRH : 0;
-	attr.port_num 	       = cmd.attr.port_num;
-	attr.grh.flow_label    = cmd.attr.grh.flow_label;
-	attr.grh.sgid_index    = cmd.attr.grh.sgid_index;
-	attr.grh.hop_limit     = cmd.attr.grh.hop_limit;
-	attr.grh.traffic_class = cmd.attr.grh.traffic_class;
-	memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16);
+	attr.dlid	       = src_attr->dlid;
+	attr.sl		       = src_attr->sl;
+	attr.src_path_bits     = src_attr->src_path_bits;
+	attr.static_rate       = src_attr->static_rate;
+	attr.ah_flags          = src_attr->is_global ? IB_AH_GRH : 0;
+	attr.port_num	       = src_attr->port_num;
+	attr.grh.flow_label    = src_attr->grh.flow_label;
+	attr.grh.sgid_index    = src_attr->grh.sgid_index;
+	attr.grh.hop_limit     = src_attr->grh.hop_limit;
+	attr.grh.traffic_class = src_attr->grh.traffic_class;
+	memcpy(attr.grh.dgid.raw, src_attr->grh.dgid, 16);
+
+	if (src_attr->comp_mask & IB_UVERBS_AH_ATTR_DMAC)
+		memcpy(attr.dmac, src_attr->dmac, sizeof(attr.dmac));
+	else
+		memset(attr.dmac, 0, sizeof(attr.dmac));
+
+	if (src_attr->comp_mask & IB_UVERBS_AH_ATTR_VID)
+		attr.vlan = src_attr->vlan;
+	else
+		attr.vlan = 0xFFFF;
 
 	ah = ib_create_ah(pd, &attr);
 	if (IS_ERR(ah)) {
@@ -2427,22 +2516,62 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	}
 
 	ah->uobject  = uobj;
+
 	uobj->object = ah;
 
 	ret = idr_add_uobj(&ib_uverbs_ah_idr, uobj);
 	if (ret)
 		goto err_destroy;
 
+	put_pd_read(pd);
+
+	return uobj;
+
+err_destroy:
+	ib_destroy_ah(ah);
+err_put:
+	put_pd_read(pd);
+err:
+	put_uobj_write(uobj);
+	return (void *)ret;
+}
+
+ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
+			    const char __user *buf, int in_len,
+			    int out_len)
+{
+	struct ib_uverbs_create_ah_ex	 cmd_ex;
+	struct ib_uverbs_create_ah	*cmd = (struct ib_uverbs_create_ah *)
+					       ((void *)&cmd_ex +
+						sizeof(cmd_ex.comp_mask));
+	struct ib_uverbs_ah_attr_ex	 attr_ex;
+	struct ib_uverbs_create_ah_resp	 resp;
+	struct ib_uobject		*uobj;
+	int ret;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	cmd_ex.comp_mask = 0;
+	if (copy_from_user(cmd, buf, sizeof(*cmd)))
+		return -EFAULT;
+
+	attr_ex.comp_mask = 0;
+	memcpy(((void *)&attr_ex) + sizeof(attr_ex.comp_mask),
+	       &cmd->attr, sizeof(cmd->attr));
+
+	uobj = ib_uverbs_create_ah_assign(&cmd_ex,  &attr_ex, file);
+	if (IS_ERR(uobj))
+		return (ssize_t)uobj;
+
 	resp.ah_handle = uobj->id;
 
-	if (copy_to_user((void __user *) (unsigned long) cmd.response,
+	if (copy_to_user((void __user *)(unsigned long) cmd->response,
 			 &resp, sizeof resp)) {
 		ret = -EFAULT;
 		goto err_copy;
 	}
 
-	put_pd_read(pd);
-
 	mutex_lock(&file->mutex);
 	list_add_tail(&uobj->list, &file->ucontext->ah_list);
 	mutex_unlock(&file->mutex);
@@ -2455,15 +2584,54 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 
 err_copy:
 	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	ib_destroy_ah(uobj->object);
+	put_uobj_write(uobj);
 
-err_destroy:
-	ib_destroy_ah(ah);
+	return ret;
+}
 
-err_put:
-	put_pd_read(pd);
+ssize_t ib_uverbs_create_ah_ex(struct ib_uverbs_file *file,
+			       const char __user *buf, int in_len,
+			       int out_len)
+{
+	struct ib_uverbs_create_ah_ex	 cmd_ex;
+	struct ib_uverbs_create_ah_resp	 resp;
+	struct ib_uobject		*uobj;
+	int ret;
 
-err:
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd_ex, buf, sizeof(cmd_ex)))
+		return -EFAULT;
+
+	uobj = ib_uverbs_create_ah_assign(&cmd_ex, &cmd_ex.attr, file);
+	if (IS_ERR(uobj))
+		return PTR_ERR(uobj);
+
+	resp.ah_handle = uobj->id;
+
+	if (copy_to_user((void __user *)(unsigned long)cmd_ex.response,
+			 &resp, sizeof(resp))) {
+		ret = -EFAULT;
+		goto err_copy;
+	}
+
+	mutex_lock(&file->mutex);
+	list_add_tail(&uobj->list, &file->ucontext->ah_list);
+	mutex_unlock(&file->mutex);
+
+	uobj->live = 1;
+
+	up_write(&uobj->mutex);
+
+	return in_len;
+
+err_copy:
+	idr_remove_uobj(&ib_uverbs_ah_idr, uobj);
+	ib_destroy_ah(uobj->object);
 	put_uobj_write(uobj);
+
 	return ret;
 }
 
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index e4e7b24..93264c8 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -113,7 +113,9 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
 	[IB_USER_VERBS_CMD_OPEN_XRCD]		= ib_uverbs_open_xrcd,
 	[IB_USER_VERBS_CMD_CLOSE_XRCD]		= ib_uverbs_close_xrcd,
 	[IB_USER_VERBS_CMD_CREATE_XSRQ]		= ib_uverbs_create_xsrq,
-	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp
+	[IB_USER_VERBS_CMD_OPEN_QP]		= ib_uverbs_open_qp,
+	[IB_USER_VERBS_CMD_MODIFY_QP_EX]	= ib_uverbs_modify_qp_ex,
+	[IB_USER_VERBS_CMD_CREATE_AH_EX]	= ib_uverbs_create_ah_ex,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
index e7bee46..0470407 100644
--- a/drivers/infiniband/core/uverbs_marshall.c
+++ b/drivers/infiniband/core/uverbs_marshall.c
@@ -33,6 +33,9 @@
 #include <linux/export.h>
 #include <rdma/ib_marshall.h>
 
+#define UVERB_EX_TO_UVERB(uverb_ex) ((void *)(uverb_ex) + \
+				     sizeof(uverb_ex->comp_mask))
+
 void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
 			     struct ib_ah_attr *src)
 {
@@ -52,9 +55,20 @@ void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
 }
 EXPORT_SYMBOL(ib_copy_ah_attr_to_user);
 
-void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
-			     struct ib_qp_attr *src)
+void ib_copy_ah_attr_to_user_ex(struct ib_uverbs_ah_attr_ex *dst,
+				struct ib_ah_attr *src)
 {
+	ib_copy_ah_attr_to_user((struct ib_uverbs_ah_attr *)
+				UVERB_EX_TO_UVERB(dst), src);
+	dst->comp_mask = IB_UVERBS_AH_ATTR_DMAC;
+	memcpy(dst->dmac, src->dmac, sizeof(dst->dmac));
+	dst->comp_mask |= IB_UVERBS_AH_ATTR_VID;
+	dst->vlan = src->vlan;
+}
+EXPORT_SYMBOL(ib_copy_ah_attr_to_user_ex);
+
+static void ib_copy_qp_attr_to_user_data(struct ib_uverbs_qp_attr *dst,
+					 struct ib_qp_attr *src)
+{
 	dst->qp_state	        = src->qp_state;
 	dst->cur_qp_state	= src->cur_qp_state;
 	dst->path_mtu		= src->path_mtu;
@@ -71,9 +85,6 @@ void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
 	dst->max_recv_sge	= src->cap.max_recv_sge;
 	dst->max_inline_data	= src->cap.max_inline_data;
 
-	ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
-	ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
-
 	dst->pkey_index		= src->pkey_index;
 	dst->alt_pkey_index	= src->alt_pkey_index;
 	dst->en_sqd_async_notify = src->en_sqd_async_notify;
@@ -89,8 +100,26 @@ void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
 	dst->alt_timeout	= src->alt_timeout;
 	memset(dst->reserved, 0, sizeof(dst->reserved));
 }
+
+void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
+			     struct ib_qp_attr *src)
+{
+	ib_copy_qp_attr_to_user_data(dst, src);
+	ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr);
+	ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr);
+}
 EXPORT_SYMBOL(ib_copy_qp_attr_to_user);
 
+void ib_copy_qp_attr_to_user_ex(struct ib_uverbs_qp_attr_ex *dst,
+				struct ib_qp_attr *src)
+{
+	ib_copy_qp_attr_to_user_data((struct ib_uverbs_qp_attr *)
+				     UVERB_EX_TO_UVERB(dst), src);
+	ib_copy_ah_attr_to_user_ex(&dst->ah_attr, &src->ah_attr);
+	ib_copy_ah_attr_to_user_ex(&dst->alt_ah_attr, &src->alt_ah_attr);
+}
+EXPORT_SYMBOL(ib_copy_qp_attr_to_user_ex);
+
 void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
 			      struct ib_sa_path_rec *src)
 {
@@ -117,11 +146,27 @@ void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
 }
 EXPORT_SYMBOL(ib_copy_path_rec_to_user);
 
-void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
-				struct ib_user_path_rec *src)
+void ib_copy_path_rec_to_user_ex(struct ib_user_path_rec_ex *dst,
+				 struct ib_sa_path_rec *src)
+{
+	ib_copy_path_rec_to_user((struct ib_user_path_rec *)
+				  UVERB_EX_TO_UVERB(dst), src);
+
+	dst->comp_mask = IB_USER_PATH_REC_ATTR_DMAC |
+			 IB_USER_PATH_REC_ATTR_SMAC |
+			 IB_USER_PATH_REC_ATTR_VID;
+
+	memcpy(dst->dmac, src->dmac, sizeof(dst->dmac));
+	memcpy(dst->smac, src->smac, sizeof(dst->smac));
+	dst->vlan = src->vlan;
+}
+EXPORT_SYMBOL(ib_copy_path_rec_to_user_ex);
+
+void ib_copy_path_rec_from_user_assign(struct ib_sa_path_rec *dst,
+				       struct ib_user_path_rec *src)
 {
-	memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid);
-	memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid);
+	memcpy(dst->dgid.raw, src->dgid, sizeof(dst->dgid));
+	memcpy(dst->sgid.raw, src->sgid, sizeof(dst->sgid));
 
 	dst->dlid		= src->dlid;
 	dst->slid		= src->slid;
@@ -141,4 +186,35 @@ void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
 	dst->preference		= src->preference;
 	dst->packet_life_time_selector = src->packet_life_time_selector;
 }
+
+void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+				struct ib_user_path_rec *src)
+{
+	memset(dst->dmac, 0, sizeof(dst->dmac));
+	memset(dst->smac, 0, sizeof(dst->smac));
+	dst->vlan = 0xFFFF;
+
+	ib_copy_path_rec_from_user_assign(dst, src);
+}
 EXPORT_SYMBOL(ib_copy_path_rec_from_user);
+
+void ib_copy_path_rec_from_user_ex(struct ib_sa_path_rec *dst,
+				   struct ib_user_path_rec_ex *src)
+{
+	if (src->comp_mask & IB_USER_PATH_REC_ATTR_DMAC)
+		memcpy(dst->dmac, src->dmac, sizeof(dst->dmac));
+	else
+		memset(dst->dmac, 0, sizeof(dst->dmac));
+
+	if (src->comp_mask & IB_USER_PATH_REC_ATTR_SMAC)
+		memcpy(dst->smac, src->smac, sizeof(dst->smac));
+	else
+		memset(dst->smac, 0, sizeof(dst->smac));
+
+	if (src->comp_mask & IB_USER_PATH_REC_ATTR_VID)
+		dst->vlan = src->vlan;
+	else
+		dst->vlan = 0xFFFF;
+
+	ib_copy_path_rec_from_user_assign(dst, (struct ib_user_path_rec *)
+					  UVERB_EX_TO_UVERB(src));
+}
+EXPORT_SYMBOL(ib_copy_path_rec_from_user_ex);
diff --git a/include/rdma/ib_marshall.h b/include/rdma/ib_marshall.h
index db03720..11ab3a8 100644
--- a/include/rdma/ib_marshall.h
+++ b/include/rdma/ib_marshall.h
@@ -41,13 +41,25 @@
 void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
 			     struct ib_qp_attr *src);
 
+void ib_copy_qp_attr_to_user_ex(struct ib_uverbs_qp_attr_ex *dst,
+				struct ib_qp_attr *src);
+
 void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
 			     struct ib_ah_attr *src);
 
+void ib_copy_ah_attr_to_user_ex(struct ib_uverbs_ah_attr_ex *dst,
+				struct ib_ah_attr *src);
+
 void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
 			      struct ib_sa_path_rec *src);
 
+void ib_copy_path_rec_to_user_ex(struct ib_user_path_rec_ex *dst,
+				 struct ib_sa_path_rec *src);
+
 void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
 				struct ib_user_path_rec *src);
 
+void ib_copy_path_rec_from_user_ex(struct ib_sa_path_rec *dst,
+				   struct ib_user_path_rec_ex *src);
+
 #endif /* IB_USER_MARSHALL_H */
diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
index cfc7c9b..367d66a 100644
--- a/include/uapi/rdma/ib_user_sa.h
+++ b/include/uapi/rdma/ib_user_sa.h
@@ -48,7 +48,13 @@ enum {
 struct ib_path_rec_data {
 	__u32	flags;
 	__u32	reserved;
-	__u32	path_rec[16];
+	__u32	path_rec[20];
+};
+
+enum ib_user_path_rec_attr_mask {
+	IB_USER_PATH_REC_ATTR_DMAC = 1ULL << 0,
+	IB_USER_PATH_REC_ATTR_SMAC = 1ULL << 1,
+	IB_USER_PATH_REC_ATTR_VID  = 1ULL << 2
 };
 
 struct ib_user_path_rec {
@@ -73,4 +79,30 @@ struct ib_user_path_rec {
 	__u8	preference;
 };
 
+struct ib_user_path_rec_ex {
+	__u32   comp_mask;
+	__u8	dgid[16];
+	__u8	sgid[16];
+	__be16	dlid;
+	__be16	slid;
+	__u32	raw_traffic;
+	__be32	flow_label;
+	__u32	reversible;
+	__u32	mtu;
+	__be16	pkey;
+	__u8	hop_limit;
+	__u8	traffic_class;
+	__u8	numb_path;
+	__u8	sl;
+	__u8	mtu_selector;
+	__u8	rate_selector;
+	__u8	rate;
+	__u8	packet_life_time_selector;
+	__u8	packet_life_time;
+	__u8	preference;
+	__u8	smac[6];
+	__u8	dmac[6];
+	__be16  vlan;
+};
+
 #endif /* IB_USER_SA_H */
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 61535aa..954a790 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -86,7 +86,9 @@ enum {
 	IB_USER_VERBS_CMD_OPEN_XRCD,
 	IB_USER_VERBS_CMD_CLOSE_XRCD,
 	IB_USER_VERBS_CMD_CREATE_XSRQ,
-	IB_USER_VERBS_CMD_OPEN_QP
+	IB_USER_VERBS_CMD_OPEN_QP,
+	IB_USER_VERBS_CMD_MODIFY_QP_EX = IB_USER_VERBS_CMD_THRESHOLD,
+	IB_USER_VERBS_CMD_CREATE_AH_EX,
 };
 
 /*
@@ -392,6 +394,25 @@ struct ib_uverbs_ah_attr {
 	__u8  reserved;
 };
 
+enum ib_uverbs_ah_attr_mask {
+	IB_UVERBS_AH_ATTR_DMAC = 1 << 0,
+	IB_UVERBS_AH_ATTR_VID  = 1 << 1
+};
+
+struct ib_uverbs_ah_attr_ex {
+	__u32 comp_mask;
+	struct ib_uverbs_global_route grh;
+	__u16 dlid;
+	__u8  sl;
+	__u8  src_path_bits;
+	__u8  static_rate;
+	__u8  is_global;
+	__u8  port_num;
+	__u8  reserved;
+	__u8  dmac[6];
+	__u16 vlan;
+};
+
 struct ib_uverbs_qp_attr {
 	__u32	qp_attr_mask;
 	__u32	qp_state;
@@ -430,6 +451,45 @@ struct ib_uverbs_qp_attr {
 	__u8	reserved[5];
 };
 
+struct ib_uverbs_qp_attr_ex {
+	__u32   comp_mask;
+	__u32	qp_attr_mask;
+	__u32	qp_state;
+	__u32	cur_qp_state;
+	__u32	path_mtu;
+	__u32	path_mig_state;
+	__u32	qkey;
+	__u32	rq_psn;
+	__u32	sq_psn;
+	__u32	dest_qp_num;
+	__u32	qp_access_flags;
+
+	struct ib_uverbs_ah_attr_ex ah_attr;
+	struct ib_uverbs_ah_attr_ex alt_ah_attr;
+
+	/* ib_qp_cap */
+	__u32	max_send_wr;
+	__u32	max_recv_wr;
+	__u32	max_send_sge;
+	__u32	max_recv_sge;
+	__u32	max_inline_data;
+
+	__u16	pkey_index;
+	__u16	alt_pkey_index;
+	__u8	en_sqd_async_notify;
+	__u8	sq_draining;
+	__u8	max_rd_atomic;
+	__u8	max_dest_rd_atomic;
+	__u8	min_rnr_timer;
+	__u8	port_num;
+	__u8	timeout;
+	__u8	retry_cnt;
+	__u8	rnr_retry;
+	__u8	alt_port_num;
+	__u8	alt_timeout;
+	__u8	reserved[5];
+};
+
 struct ib_uverbs_create_qp {
 	__u64 response;
 	__u64 user_handle;
@@ -531,6 +591,17 @@ struct ib_uverbs_query_qp_resp {
 	__u64 driver_data[0];
 };
 
+enum ib_uverbs_qp_dest_ex_comp_mask {
+	IBV_QP_DEST_EX_DMAC          = (1ULL << 0),
+	IBV_QP_DEST_EX_VID           = (1ULL << 1)
+};
+
+struct ib_uverbs_qp_dest_ex {
+	__u32 comp_mask;
+	__u8  dmac[6];
+	__u16 vid;
+};
+
 struct ib_uverbs_modify_qp {
 	struct ib_uverbs_qp_dest dest;
 	struct ib_uverbs_qp_dest alt_dest;
@@ -561,6 +632,44 @@ struct ib_uverbs_modify_qp {
 	__u64 driver_data[0];
 };
 
+enum ib_uverbs_modify_qp_ex_comp_mask {
+	IB_UVERBS_MODIFY_QP_EX_DEST_EX_FLAGS          = (1ULL << 0),
+	IB_UVERBS_MODIFY_QP_EX_ALT_DEST_EX_FLAGS      = (1ULL << 1)
+};
+
+struct ib_uverbs_modify_qp_ex {
+	__u32 comp_mask;
+	struct ib_uverbs_qp_dest dest;
+	struct ib_uverbs_qp_dest alt_dest;
+	__u32 qp_handle;
+	__u32 attr_mask;
+	__u32 qkey;
+	__u32 rq_psn;
+	__u32 sq_psn;
+	__u32 dest_qp_num;
+	__u32 qp_access_flags;
+	__u16 pkey_index;
+	__u16 alt_pkey_index;
+	__u8  qp_state;
+	__u8  cur_qp_state;
+	__u8  path_mtu;
+	__u8  path_mig_state;
+	__u8  en_sqd_async_notify;
+	__u8  max_rd_atomic;
+	__u8  max_dest_rd_atomic;
+	__u8  min_rnr_timer;
+	__u8  port_num;
+	__u8  timeout;
+	__u8  retry_cnt;
+	__u8  rnr_retry;
+	__u8  alt_port_num;
+	__u8  alt_timeout;
+	__u8  reserved[2];
+	struct ib_uverbs_qp_dest_ex dest_ex;
+	struct ib_uverbs_qp_dest_ex alt_dest_ex;
+	__u64 driver_data[0];
+};
+
 struct ib_uverbs_modify_qp_resp {
 };
 
@@ -670,6 +779,15 @@ struct ib_uverbs_create_ah {
 	struct ib_uverbs_ah_attr attr;
 };
 
+struct ib_uverbs_create_ah_ex {
+	__u32 comp_mask;
+	__u64 response;
+	__u64 user_handle;
+	__u32 pd_handle;
+	__u32 reserved;
+	struct ib_uverbs_ah_attr_ex attr;
+};
+
 struct ib_uverbs_create_ah_resp {
 	__u32 ah_handle;
 };
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm
       [not found] ` <1371628031-17546-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2013-06-19  7:47   ` [PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs Or Gerlitz
@ 2013-06-19  7:47   ` Or Gerlitz
  5 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2013-06-19  7:47 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add rdma_ucm support for the RoCE (IBoE) IP based addressing extensions
exposed towards librdmacm.

Extend the INIT_QP_ATTR and QUERY_ROUTE ucma commands:

INIT_QP_ATTR_EX uses struct ib_uverbs_qp_attr_ex.

QUERY_ROUTE_EX uses struct rdma_ucm_query_route_resp_ex, which in turn
uses struct ib_user_path_rec_ex.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/ucma.c   |  172 ++++++++++++++++++++++++++++++++++++-
 include/uapi/rdma/rdma_user_cm.h |   21 +++++-
 2 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index bc2cb5d..c7dfd99 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -599,6 +599,35 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
+static void ucma_copy_ib_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				  struct rdma_route *route)
+{
+	struct rdma_dev_addr *dev_addr;
+
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		dev_addr = &route->addr.dev_addr;
+		rdma_addr_get_dgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_addr_get_sgid(dev_addr,
+				   (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey =
+			cpu_to_be16(ib_addr_get_pkey(dev_addr));
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
@@ -625,14 +654,39 @@ static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 	}
 }
 
-static void ucma_copy_iw_route(struct rdma_ucm_query_route_resp *resp,
+static void ucma_copy_iboe_route_ex(struct rdma_ucm_query_route_resp_ex *resp,
+				    struct rdma_route *route)
+{
+	resp->num_paths = route->num_paths;
+	switch (route->num_paths) {
+	case 0:
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)&resp->ib_route[0].sgid);
+		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
+		break;
+	case 2:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[1],
+					    &route->path_rec[1]);
+		/* fall through */
+	case 1:
+		ib_copy_path_rec_to_user_ex(&resp->ib_route[0],
+					    &route->path_rec[0]);
+		break;
+	default:
+		break;
+	}
+}
+
+static void ucma_copy_iw_route(struct ib_user_path_rec *resp_path,
 			       struct rdma_route *route)
 {
 	struct rdma_dev_addr *dev_addr;
 
 	dev_addr = &route->addr.dev_addr;
-	rdma_addr_get_dgid(dev_addr, (union ib_gid *) &resp->ib_route[0].dgid);
-	rdma_addr_get_sgid(dev_addr, (union ib_gid *) &resp->ib_route[0].sgid);
+	rdma_addr_get_dgid(dev_addr, (union ib_gid *)&resp_path->dgid);
+	rdma_addr_get_sgid(dev_addr, (union ib_gid *)&resp_path->sgid);
 }
 
 static ssize_t ucma_query_route(struct ucma_file *file,
@@ -684,7 +738,74 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		}
 		break;
 	case RDMA_TRANSPORT_IWARP:
-		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
+		ucma_copy_iw_route(&resp.ib_route[0], &ctx->cm_id->route);
+		break;
+	default:
+		break;
+	}
+
+out:
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
+static ssize_t ucma_query_route_ex(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_query_route_ex cmd;
+	struct rdma_ucm_query_route_resp_ex resp;
+	struct ucma_context *ctx;
+	struct sockaddr *addr;
+	int ret = 0;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	memset(&resp, 0, sizeof(resp));
+	addr = (struct sockaddr *)&ctx->cm_id->route.addr.src_addr;
+	memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ?
+				     sizeof(struct sockaddr_in) :
+				     sizeof(struct sockaddr_in6));
+	addr = (struct sockaddr *)&ctx->cm_id->route.addr.dst_addr;
+	memcpy(&resp.dst_addr, addr, addr->sa_family == AF_INET ?
+				     sizeof(struct sockaddr_in) :
+				     sizeof(struct sockaddr_in6));
+	if (!ctx->cm_id->device)
+		goto out;
+
+	resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
+	resp.port_num = ctx->cm_id->port_num;
+	switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
+		switch (rdma_port_get_link_layer(ctx->cm_id->device,
+			ctx->cm_id->port_num)) {
+		case IB_LINK_LAYER_INFINIBAND:
+			ucma_copy_ib_route_ex(&resp, &ctx->cm_id->route);
+			break;
+		case IB_LINK_LAYER_ETHERNET:
+			ucma_copy_iboe_route_ex(&resp, &ctx->cm_id->route);
+			break;
+		default:
+			break;
+		}
+		break;
+	case RDMA_TRANSPORT_IWARP:
+		ucma_copy_iw_route((struct ib_user_path_rec *)
+				   ((void *)&resp.ib_route[0] +
+				    sizeof(resp.ib_route[0].comp_mask)),
+				   &ctx->cm_id->route);
 		break;
 	default:
 		break;
@@ -862,6 +983,43 @@ out:
 	return ret;
 }
 
+static ssize_t ucma_init_qp_attr_ex(struct ucma_file *file,
+				    const char __user *inbuf,
+				    int in_len, int out_len)
+{
+	struct rdma_ucm_init_qp_attr cmd;
+	struct ib_uverbs_qp_attr_ex resp;
+	struct ucma_context *ctx;
+	struct ib_qp_attr qp_attr;
+	int ret;
+
+	if (out_len < sizeof(resp))
+		return -ENOSPC;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	ctx = ucma_get_ctx(file, cmd.id);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	memset(&resp, 0, sizeof(resp));
+	memset(&qp_attr, 0, sizeof(qp_attr));
+	qp_attr.qp_state = cmd.qp_state;
+	ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask);
+	if (ret)
+		goto out;
+
+	ib_copy_qp_attr_to_user_ex(&resp, &qp_attr);
+	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+			 &resp, sizeof(resp)))
+		ret = -EFAULT;
+
+out:
+	ucma_put_ctx(ctx);
+	return ret;
+}
+
 static int ucma_set_option_id(struct ucma_context *ctx, int optname,
 			      void *optval, size_t optlen)
 {
@@ -1229,7 +1387,9 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
 	[RDMA_USER_CM_CMD_NOTIFY]	= ucma_notify,
 	[RDMA_USER_CM_CMD_JOIN_MCAST]	= ucma_join_multicast,
 	[RDMA_USER_CM_CMD_LEAVE_MCAST]	= ucma_leave_multicast,
-	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id
+	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id,
+	[RDMA_USER_CM_CMD_QUERY_ROUTE_EX] = ucma_query_route_ex,
+	[RDMA_USER_CM_CMD_INIT_QP_ATTR_EX] = ucma_init_qp_attr_ex
 };
 
 static ssize_t ucma_write(struct file *filp, const char __user *buf,
@@ -1245,6 +1405,8 @@ static ssize_t ucma_write(struct file *filp, const char __user *buf,
 	if (copy_from_user(&hdr, buf, sizeof(hdr)))
 		return -EFAULT;
 
 	if (hdr.cmd >= ARRAY_SIZE(ucma_cmd_table))
 		return -EINVAL;
 
diff --git a/include/uapi/rdma/rdma_user_cm.h b/include/uapi/rdma/rdma_user_cm.h
index 1ee9239..8dceb35 100644
--- a/include/uapi/rdma/rdma_user_cm.h
+++ b/include/uapi/rdma/rdma_user_cm.h
@@ -61,7 +61,9 @@ enum {
 	RDMA_USER_CM_CMD_NOTIFY,
 	RDMA_USER_CM_CMD_JOIN_MCAST,
 	RDMA_USER_CM_CMD_LEAVE_MCAST,
-	RDMA_USER_CM_CMD_MIGRATE_ID
+	RDMA_USER_CM_CMD_MIGRATE_ID,
+	RDMA_USER_CM_CMD_QUERY_ROUTE_EX,
+	RDMA_USER_CM_CMD_INIT_QP_ATTR_EX
 };
 
 /*
@@ -119,6 +121,13 @@ struct rdma_ucm_query_route {
 	__u32 reserved;
 };
 
+struct rdma_ucm_query_route_ex {
+	__u32 comp_mask;
+	__u64 response;
+	__u32 id;
+	__u32 reserved;
+};
+
 struct rdma_ucm_query_route_resp {
 	__u64 node_guid;
 	struct ib_user_path_rec ib_route[2];
@@ -129,6 +138,16 @@ struct rdma_ucm_query_route_resp {
 	__u8 reserved[3];
 };
 
+struct rdma_ucm_query_route_resp_ex {
+	__u64 node_guid;
+	struct ib_user_path_rec_ex ib_route[2];
+	struct sockaddr_in6 src_addr;
+	struct sockaddr_in6 dst_addr;
+	__u32 num_paths;
+	__u8 port_num;
+	__u8 reserved[3];
+};
+
 struct rdma_ucm_conn_param {
 	__u32 qp_num;
 	__u32 reserved;
-- 
1.7.1



* Re: [PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs
       [not found]     ` <1371628031-17546-5-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-06-24 21:06       ` Roland Dreier
  0 siblings, 0 replies; 8+ messages in thread
From: Roland Dreier @ 2013-06-24 21:06 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis,
	matanb-VPRAkNaXOzVWk0Htik3J/w, Igor Ivanov, Hadar Hen Zion

On Wed, Jun 19, 2013 at 12:47 AM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
> Add infrastructure to support extended uverbs capabilities in a forward/backward
> compatible manner. Uverbs command opcodes based on the verbs extensions approach
> should be greater than or equal to IB_USER_VERBS_CMD_THRESHOLD. They have a new
> header format and are processed a bit differently.

This patch at least doesn't have a sufficient changelog.  I don't
understand what "extended capabilities" are or why we need to change
the header format.

What is the "verbs extensions approach"?  Why does the kernel need to
know about it?  What is different about the processing?  The only
difference I see is that userspace now has a more complicated way to
pass the size in, which the kernel seems to nearly ignore -- it just
adds the sizes together and proceeds as before.

 - R.


end of thread, other threads:[~2013-06-24 21:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-19  7:47 [PATCH V1 for-next 0/6] IP based RoCE GID Addressing Or Gerlitz
     [not found] ` <1371628031-17546-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-06-19  7:47   ` [PATCH V1 for-next 1/6] IB/core: RoCE IP based GID addressing Or Gerlitz
2013-06-19  7:47   ` [PATCH V1 for-next 2/6] IB/mlx4: Use RoCE IP based GIDs in the port GID table Or Gerlitz
2013-06-19  7:47   ` [PATCH V1 for-next 3/6] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing Or Gerlitz
2013-06-19  7:47   ` [PATCH V1 for-next 4/6] IB/core: Infra-structure to support verbs extensions through uverbs Or Gerlitz
     [not found]     ` <1371628031-17546-5-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-06-24 21:06       ` Roland Dreier
2013-06-19  7:47   ` [PATCH V1 for-next 5/6] IB/core: Add RoCE IP based addressing extensions for uverbs Or Gerlitz
2013-06-19  7:47   ` [PATCH V1 for-next 6/6] IB/core: Add RoCE IP based addressing extensions for rdma_ucm Or Gerlitz
