* [PATCH V5 0/8] IP based RoCE GID Addressing
@ 2013-11-13 22:29 Or Gerlitz
From: Or Gerlitz @ 2013-11-13 22:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Or Gerlitz

changes from V4:

 - addressed feedback re the need to be compatible with unmodified user
   space applications/libraries, by adding code in uverbs which does address
   resolution when dealing with Ethernet ports. This is patch #7.

 - removed the patches that deal with uverbs extended commands; they will
   be added later on, such that new applications/libraries can be coded to them.

 - added a patch fixing mlx4_en to have the correct IPv6 link local address.

See below for the full listing of the change history.

Currently, the IB stack (core + drivers) handles RoCE (IBoE) GIDs as
encoding the MAC address, and possibly the VLAN id, of the related Ethernet
net-device interface.
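
To make the current scheme concrete, here is a minimal userspace sketch of a
MAC/VLAN based link-local GID. The byte layout is an assumption modelled on the
existing iboe_mac_vlan_to_ll() helper and should be checked against
include/rdma/ib_addr.h; mac_vlan_to_gid() and the constants are hypothetical
names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

#define GID_LEN  16
#define ETH_ALEN 6

/*
 * Build a link-local GID from a MAC address and a VLAN id.
 * Assumed layout (not taken from this series): fe80::/64 prefix, then the
 * MAC split in two halves with the VLAN id (or ff:fe when untagged)
 * placed in the middle.  A vid >= 0x1000 is treated as "no VLAN".
 */
static void mac_vlan_to_gid(uint8_t gid[GID_LEN],
			    const uint8_t mac[ETH_ALEN], uint16_t vid)
{
	memset(gid, 0, GID_LEN);
	gid[0] = 0xfe;			/* link-local prefix fe80::/64 */
	gid[1] = 0x80;

	memcpy(gid + 8, mac, 3);	/* upper half of the MAC */
	gid[8] ^= 2;			/* flip the universal/local bit */

	if (vid < 0x1000) {		/* tagged: embed the VLAN id */
		gid[11] = vid >> 8;
		gid[12] = vid & 0xff;
	} else {			/* untagged: EUI-64 style filler */
		gid[11] = 0xff;
		gid[12] = 0xfe;
	}

	memcpy(gid + 13, mac + 3, 3);	/* lower half of the MAC */
}
```

With such an encoding, the same MAC yields different GIDs depending on whether
the caller knows the VLAN id, which is the mismatch described in reason 1 below.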

This series changes RoCE GIDs to encode IP addresses (IPv4 + IPv6)
of that Ethernet interface, for the following reasons:

1. There are environments where the compute entity that runs the RoCE
stack is not aware that its traffic is vlan-tagged. As a result, that node
creates/assumes GIDs which are wrong from the viewpoint of a peer node that
is aware of vlans.

Note that "node" here can be a physical node connected to an Ethernet switch acting in
access mode, talking to another node which does vlan insertion/stripping by itself.

Another example is an SRIOV Virtual Function which is configured to work in "VST"
mode (Virtual-Switch-Tagging), such that the hypervisor configures the HW eSwitch
to do vlan insertion for the vPORT representing that function.

2. When RoCE traffic is inspected (mirrored/trapped) in Ethernet switches for
monitoring and security purposes, it is much more natural for both humans and
automated utilities (...) to observe IP addresses at a certain offset into the
L3 header of RoCE frames than MAC/VLANs (which remain in the L2 header of that
frame anyway, so they are not lost by this change).

3. Some advanced Bonding/Teaming modes, such as balance-alb and balance-tlb,
use multiple underlying devices in parallel, and hence packets always
carry the bond IP address but different streams have different source MACs.
The approach brought by this series is part of what would allow
supporting that for RoCE traffic too.
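
As a rough illustration of an IP based GID, the userspace sketch below turns an
address string into 16 GID bytes. It assumes that an IPv4 address is stored as
an IPv4-mapped IPv6 address (::ffff:a.b.c.d) and that an IPv6 address is copied
as-is; the conversion helpers actually used by the series are not part of this
excerpt, and ip_str_to_gid() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/*
 * Convert a textual IPv4 or IPv6 address into a 16-byte GID.
 * Assumption: IPv4 is encoded as an IPv4-mapped IPv6 address.
 * Returns 0 on success, -1 if the string is not a valid IP address.
 */
static int ip_str_to_gid(const char *ip, uint8_t gid[GID_LEN])
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, ip, &v4) == 1) {
		memset(gid, 0, GID_LEN);
		gid[10] = 0xff;			/* ::ffff:0:0/96 prefix */
		gid[11] = 0xff;
		memcpy(gid + 12, &v4.s_addr, 4);	/* network byte order */
		return 0;
	}

	if (inet_pton(AF_INET6, ip, &v6) == 1) {
		memcpy(gid, v6.s6_addr, GID_LEN);
		return 0;
	}

	return -1;
}
```

Note that nothing in such a GID depends on the MAC or VLAN id, so a node that is
unaware of its vlan tagging still derives the same GID as its peers.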

The 1st patch adds explicit handling of Ethernet L2 attributes (source/dest
mac and vlan_id) to the kernel IB core, in data-structures and CMA/CM code.
Previously, with MAC/VLAN based addressing, they were encoded in the GIDs,
whereas now they have to be resolved and placed separately from the IP based GIDs.

The 2nd patch modifies the CMA to cope with IP based GIDs, the 3rd/4th ones do
that for the mlx4_ib driver, and the 5th/6th ones for the ocrdma driver.

The 7th patch adds address resolution for user space applications on RoCE
ports, such that these applications keep working unmodified.

The 8th/last patch fixes the mlx4_en driver such that it has the correct IPv6 link local address.

Or.

Full listing of change-history:

changes from V4:

 - addressed feedback re the need to be compatible with unmodified user
   space applications/libraries, by adding code in uverbs which does address
   resolution when dealing with Ethernet ports.

 - removed the patches that deal with uverbs extended commands; they will
   be added later on, such that new applications/libraries can be coded to them.
  
changes from V3:

  - dropped the uverbs Infrastructure patch for extensions which is now upstream
    400dbc9 "IB/core: Infrastructure for extensible uverbs commands"

  - added ocrdma patch to handle Ethernet L2 parameters, similar to the mlx4 patch.
   
  - removed the assumption that the low level driver can provide the source mac
    and vlan in the struct ib_wc returned by ib_poll_cq, and adjusted the 
    ib_init_ah_from_wc helper of the IB core accordingly.

  - fixed some vlan related issues in the mlx4 driver

changes from V2:

  - added handling of IP based GIDs in the ocrdma driver - patch #5, 
    as a result patches #5-8 of V1 became patches #6-9
  
changes from V1:

 - rebased the series against the latest kernel bits, which include Sean's 
   AF_IB changes to the rdma-cm
 
 - fixed bug in mlx4_ib where reset of the gid table was done for IB ports too
 
 - fixed build warnings and issues pointed by sparse

 - introduced patch #1 which does the explicit handling of Ethernet L2 attributes, 
   source/dest mac and vlan_id in the kernel data-structures and CMA/CM code. 

 - use smac when modifying a QP --> find smac in passive side + additional fields
   to address structures

 - add support for new QP attrs in ib_modify_qp_is_ok(), specific to ll = ETH,
   and modified all low-level drivers to keep working after that change

 -- changes around uverbs:
 - use ah_ext as pointer in qp_attr passed from user space, so this 
   field by itself can be extended in the future
 - for kernel to user command responses, comp_mask is moved into the
   right place, which is after the non-extended command response fields
 - fixed bug in copy_qp_attr_ex under which some fields were copied to
   wrong locations
 - use new structure rdma_ucm_init_qp_attr_ex which is extendable (ucma)

changes from V0:

 - enhanced documentation of the mlx4_ib, uverbs and ucma patches
 - broke the mlx4_ib patch to two
 - broke the extended user space commands patch to two

Matan Barak (1):
  IB/core: Ethernet L2 attributes in verbs/cm structures

Moni Shoua (7):
  IB/CMA: IBoE (RoCE) IP based GID addressing
  IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table
  IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
  IB/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing
  IB/ocrdma: Populate GID table with IP based gids
  IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
  mlx4_en: Avoid setting netdevice dev_id to port number

 drivers/infiniband/core/addr.c                 |   97 +++++-
 drivers/infiniband/core/cm.c                   |   50 +++
 drivers/infiniband/core/cma.c                  |   74 +++-
 drivers/infiniband/core/sa_query.c             |   12 +-
 drivers/infiniband/core/ucma.c                 |   18 +-
 drivers/infiniband/core/uverbs_cmd.c           |   27 ++
 drivers/infiniband/core/verbs.c                |   43 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c           |    2 +-
 drivers/infiniband/hw/ipath/ipath_qp.c         |    2 +-
 drivers/infiniband/hw/mlx4/ah.c                |   40 +--
 drivers/infiniband/hw/mlx4/cq.c                |    9 +
 drivers/infiniband/hw/mlx4/main.c              |  474 +++++++++++++++++-------
 drivers/infiniband/hw/mlx4/mlx4_ib.h           |    6 +-
 drivers/infiniband/hw/mlx4/qp.c                |  104 ++++-
 drivers/infiniband/hw/mlx5/qp.c                |    3 +-
 drivers/infiniband/hw/mthca/mthca_qp.c         |    3 +-
 drivers/infiniband/hw/ocrdma/ocrdma.h          |   12 +
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c       |    5 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c       |   21 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h       |    1 -
 drivers/infiniband/hw/ocrdma/ocrdma_main.c     |  138 ++-----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |    3 +-
 drivers/infiniband/hw/qib/qib_qp.c             |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    1 -
 drivers/net/ethernet/mellanox/mlx4/port.c      |   20 +
 include/linux/mlx4/cq.h                        |   15 +-
 include/linux/mlx4/device.h                    |    1 +
 include/rdma/ib_addr.h                         |   69 +++-
 include/rdma/ib_cm.h                           |    1 +
 include/rdma/ib_pack.h                         |    1 +
 include/rdma/ib_sa.h                           |    3 +
 include/rdma/ib_verbs.h                        |   21 +-
 32 files changed, 894 insertions(+), 384 deletions(-)


* [PATCH V5 1/8] IB/core: Ethernet L2 attributes in verbs/cm structures
@ 2013-11-13 22:29   ` Or Gerlitz
From: Or Gerlitz @ 2013-11-13 22:29 UTC
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Or Gerlitz

From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch adds support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a manner similar to how the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new fields when
packing it into wire format, so we don't break the IB CM and SA wire protocol.
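
The idea of keeping the new fields out of the wire format can be sketched in
userspace as follows. This is a toy record with only two wire fields, not the
real ib_sa_path_rec layout; every name here is hypothetical. On unpack, the
local-only L2 fields are reset to "unknown" defaults (vlan id 0xffff, zero
MACs), mirroring what this patch does after ib_unpack() in sa_query.c.

```c
#include <stdint.h>
#include <string.h>

#define GID_LEN  16
#define ETH_ALEN 6
#define WIRE_LEN (2 * GID_LEN)	/* toy wire format: dgid followed by sgid */
#define NO_VLAN  0xffff		/* "no vlan" marker used by this patch */

/* Toy path record: GIDs travel on the wire, L2 attributes stay local. */
struct toy_path_rec {
	uint8_t  dgid[GID_LEN];
	uint8_t  sgid[GID_LEN];
	uint8_t  smac[ETH_ALEN];	/* local only, never packed */
	uint8_t  dmac[ETH_ALEN];	/* local only, never packed */
	uint16_t vlan_id;		/* local only, never packed */
};

/* Serialize only the fields that are part of the wire protocol. */
static void toy_pack(const struct toy_path_rec *rec, uint8_t wire[WIRE_LEN])
{
	memcpy(wire, rec->dgid, GID_LEN);
	memcpy(wire + GID_LEN, rec->sgid, GID_LEN);
}

/* Restore the wire fields and reset the local-only L2 fields. */
static void toy_unpack(struct toy_path_rec *rec, const uint8_t wire[WIRE_LEN])
{
	memcpy(rec->dgid, wire, GID_LEN);
	memcpy(rec->sgid, wire + GID_LEN, GID_LEN);
	rec->vlan_id = NO_VLAN;
	memset(rec->smac, 0, ETH_ALEN);
	memset(rec->dmac, 0, ETH_ALEN);
}
```

A peer that does not know about the new fields sees exactly the same bytes as
before, which is why the IB CM and SA wire protocol is not affected.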

On the active side, the CM fills its internal structures from the path provided
by the ULP. This patch adds taking the ETH L2 attributes from that path and
placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC associated
with the REQ message. This patch adds taking the ETH L2 attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC, it
sets the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks
for the presence of these flags, and in their absence does address
resolution from the ib_init_ah_from_wc() helper function.
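
The flag check and fallback can be sketched in userspace like this. The types,
flag values and function names are hypothetical stand-ins for ib_wc, ib_ah_attr
and rdma_addr_find_dmac_by_grh(); the stub resolver only returns a fixed MAC so
that the control flow can be exercised.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define NO_VLAN  0xffff

/* Hypothetical flag values, not the kernel's. */
enum {
	TOY_WC_GRH       = 1 << 0,
	TOY_WC_WITH_SMAC = 1 << 1,
	TOY_WC_WITH_VLAN = 1 << 2,
};

struct toy_wc {
	int      flags;
	uint8_t  smac[ETH_ALEN];
	uint16_t vlan_id;
};

struct toy_ah {
	uint8_t  dmac[ETH_ALEN];
	uint16_t vlan_id;
};

/* Stand-in for address resolution by GRH. */
typedef int (*resolve_fn)(uint8_t dmac[ETH_ALEN], uint16_t *vlan_id);

/* Stub resolver: pretends resolution found a fixed, untagged peer. */
static int stub_resolve(uint8_t dmac[ETH_ALEN], uint16_t *vlan_id)
{
	static const uint8_t fake[ETH_ALEN] = {
		0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff
	};

	memcpy(dmac, fake, ETH_ALEN);
	*vlan_id = NO_VLAN;
	return 0;
}

/* Fill the L2 part of an address handle from a work completion. */
static int toy_ah_l2_from_wc(const struct toy_wc *wc, struct toy_ah *ah,
			     int is_eth, resolve_fn resolve)
{
	memset(ah, 0, sizeof(*ah));

	if (!is_eth) {			/* IB link layer: no Ethernet L2 */
		ah->vlan_id = NO_VLAN;
		return 0;
	}
	if (!(wc->flags & TOY_WC_GRH))	/* Ethernet requires a GRH */
		return -1;

	if ((wc->flags & TOY_WC_WITH_SMAC) && (wc->flags & TOY_WC_WITH_VLAN)) {
		/* The driver supplied L2 info: the sender's smac is our dmac. */
		memcpy(ah->dmac, wc->smac, ETH_ALEN);
		ah->vlan_id = wc->vlan_id;
		return 0;
	}

	/* Otherwise fall back to address resolution. */
	return resolve(ah->dmac, &ah->vlan_id);
}
```

Passing the resolver as a function pointer is only a convenience of the sketch;
the patch calls the resolution helper directly.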

ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters
are mandatory for Ethernet link layer, while they are irrelevant for IB.
Vendor drivers are modified to support the new function signature.
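
A minimal userspace sketch of the link layer aware mask check follows. The
attribute bits, the table entry and the names are hypothetical and much smaller
than the real qp_state_table; only the way the Ethernet-specific required and
optional masks are merged follows the description above.

```c
#include <stdint.h>

enum toy_link_layer { TOY_LL_UNSPECIFIED, TOY_LL_INFINIBAND, TOY_LL_ETHERNET };

/* Hypothetical attribute bits, not the kernel's IB_QP_* values. */
enum {
	TOY_QP_STATE = 1 << 0,
	TOY_QP_AV    = 1 << 1,
	TOY_QP_SMAC  = 1 << 2,
	TOY_QP_VID   = 1 << 3,
};

/* One state transition: base masks plus Ethernet-only additions. */
struct toy_transition {
	uint32_t req_param;
	uint32_t req_param_add_eth;
	uint32_t opt_param;
	uint32_t opt_param_add_eth;
};

/* Example entry: on Ethernet, SMAC becomes required and VID optional. */
static const struct toy_transition toy_init_to_rtr = {
	.req_param         = TOY_QP_AV,
	.req_param_add_eth = TOY_QP_SMAC,
	.opt_param         = 0,
	.opt_param_add_eth = TOY_QP_VID,
};

/* Returns 1 if the attribute mask is valid for this transition, else 0. */
static int toy_modify_qp_is_ok(const struct toy_transition *t, uint32_t mask,
			       enum toy_link_layer ll)
{
	uint32_t req = t->req_param;
	uint32_t opt = t->opt_param;

	if (ll == TOY_LL_ETHERNET) {
		req |= t->req_param_add_eth;
		opt |= t->opt_param_add_eth;
	}

	if ((mask & req) != req)	/* a required attribute is missing */
		return 0;
	if (mask & ~(req | opt | TOY_QP_STATE))	/* an attribute is not allowed */
		return 0;
	return 1;
}
```

With this shape, a caller that passes an unspecified link layer gets the
pre-existing IB behaviour, so an Ethernet-only attribute such as the smac is
rejected there rather than silently accepted.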

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/addr.c              |   97 ++++++++++++++++++++++++++-
 drivers/infiniband/core/cm.c                |   50 ++++++++++++++
 drivers/infiniband/core/cma.c               |   60 +++++++++++++++--
 drivers/infiniband/core/sa_query.c          |   12 +++-
 drivers/infiniband/core/verbs.c             |   43 +++++++++++-
 drivers/infiniband/hw/ehca/ehca_qp.c        |    2 +-
 drivers/infiniband/hw/ipath/ipath_qp.c      |    2 +-
 drivers/infiniband/hw/mlx4/qp.c             |    9 ++-
 drivers/infiniband/hw/mlx5/qp.c             |    3 +-
 drivers/infiniband/hw/mthca/mthca_qp.c      |    3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |    3 +-
 drivers/infiniband/hw/qib/qib_qp.c          |    2 +-
 include/linux/mlx4/device.h                 |    1 +
 include/rdma/ib_addr.h                      |   42 +++++++++++-
 include/rdma/ib_cm.h                        |    1 +
 include/rdma/ib_pack.h                      |    1 +
 include/rdma/ib_sa.h                        |    3 +
 include/rdma/ib_verbs.h                     |   21 +++++-
 18 files changed, 331 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index e90f2b2..8172d37 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -86,6 +86,8 @@ int rdma_addr_size(struct sockaddr *addr)
 }
 EXPORT_SYMBOL(rdma_addr_size);
 
+static struct rdma_addr_client self;
+
 void rdma_addr_register_client(struct rdma_addr_client *client)
 {
 	atomic_set(&client->refcount, 1);
@@ -119,7 +121,8 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 }
 EXPORT_SYMBOL(rdma_copy_addr);
 
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
+		      u16 *vlan_id)
 {
 	struct net_device *dev;
 	int ret = -EADDRNOTAVAIL;
@@ -142,6 +145,8 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 			return ret;
 
 		ret = rdma_copy_addr(dev_addr, dev, NULL);
+		if (vlan_id)
+			*vlan_id = rdma_vlan_dev_vlan_id(dev);
 		dev_put(dev);
 		break;
 
@@ -153,6 +158,8 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 					  &((struct sockaddr_in6 *) addr)->sin6_addr,
 					  dev, 1)) {
 				ret = rdma_copy_addr(dev_addr, dev, NULL);
+				if (vlan_id)
+					*vlan_id = rdma_vlan_dev_vlan_id(dev);
 				break;
 			}
 		}
@@ -238,7 +245,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	src_in->sin_addr.s_addr = fl4.saddr;
 
 	if (rt->dst.dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *) dst_in, addr);
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
 		if (!ret)
 			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
 		goto put;
@@ -286,7 +293,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	}
 
 	if (dst->dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *) dst_in, addr);
+		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
 		if (!ret)
 			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
 		goto put;
@@ -437,6 +444,88 @@ void rdma_addr_cancel(struct rdma_dev_addr *addr)
 }
 EXPORT_SYMBOL(rdma_addr_cancel);
 
+struct resolve_cb_context {
+	struct rdma_dev_addr *addr;
+	struct completion comp;
+};
+
+static void resolve_cb(int status, struct sockaddr *src_addr,
+	     struct rdma_dev_addr *addr, void *context)
+{
+	memcpy(((struct resolve_cb_context *)context)->addr, addr, sizeof(struct
+				rdma_dev_addr));
+	complete(&((struct resolve_cb_context *)context)->comp);
+}
+
+int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
+			       u16 *vlan_id)
+{
+	int ret = 0;
+	struct rdma_dev_addr dev_addr;
+	struct resolve_cb_context ctx;
+	struct net_device *dev;
+
+	union {
+		struct sockaddr     _sockaddr;
+		struct sockaddr_in  _sockaddr_in;
+		struct sockaddr_in6 _sockaddr_in6;
+	} sgid_addr, dgid_addr;
+
+
+	ret = rdma_gid2ip(&sgid_addr._sockaddr, sgid);
+	if (ret)
+		return ret;
+
+	ret = rdma_gid2ip(&dgid_addr._sockaddr, dgid);
+	if (ret)
+		return ret;
+
+	memset(&dev_addr, 0, sizeof(dev_addr));
+
+	ctx.addr = &dev_addr;
+	init_completion(&ctx.comp);
+	ret = rdma_resolve_ip(&self, &sgid_addr._sockaddr, &dgid_addr._sockaddr,
+			&dev_addr, 1000, resolve_cb, &ctx);
+	if (ret)
+		return ret;
+
+	wait_for_completion(&ctx.comp);
+
+	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
+	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+	if (vlan_id)
+		*vlan_id = rdma_vlan_dev_vlan_id(dev);
+	dev_put(dev);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
+
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
+{
+	int ret = 0;
+	struct rdma_dev_addr dev_addr;
+	union {
+		struct sockaddr     _sockaddr;
+		struct sockaddr_in  _sockaddr_in;
+		struct sockaddr_in6 _sockaddr_in6;
+	} gid_addr;
+
+	ret = rdma_gid2ip(&gid_addr._sockaddr, sgid);
+
+	if (ret)
+		return ret;
+	memset(&dev_addr, 0, sizeof(dev_addr));
+	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
+	if (ret)
+		return ret;
+
+	memcpy(smac, dev_addr.src_dev_addr, ETH_ALEN);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_addr_find_smac_by_sgid);
+
 static int netevent_callback(struct notifier_block *self, unsigned long event,
 	void *ctx)
 {
@@ -461,11 +550,13 @@ static int __init addr_init(void)
 		return -ENOMEM;
 
 	register_netevent_notifier(&nb);
+	rdma_addr_register_client(&self);
 	return 0;
 }
 
 static void __exit addr_cleanup(void)
 {
+	rdma_addr_unregister_client(&self);
 	unregister_netevent_notifier(&nb);
 	destroy_workqueue(addr_wq);
 }
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..d596a53 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -47,6 +47,7 @@
 #include <linux/sysfs.h>
 #include <linux/workqueue.h>
 #include <linux/kdev_t.h>
+#include <linux/etherdevice.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
@@ -177,6 +178,8 @@ struct cm_av {
 	struct ib_ah_attr ah_attr;
 	u16 pkey_index;
 	u8 timeout;
+	u8  valid;
+	u8  smac[ETH_ALEN];
 };
 
 struct cm_work {
@@ -346,6 +349,23 @@ static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
 			   grh, &av->ah_attr);
 }
 
+int ib_update_cm_av(struct ib_cm_id *id, const u8 *smac, const u8 *alt_smac)
+{
+	struct cm_id_private *cm_id_priv;
+
+	cm_id_priv = container_of(id, struct cm_id_private, id);
+
+	if (smac != NULL)
+		memcpy(cm_id_priv->av.smac, smac, sizeof(cm_id_priv->av.smac));
+
+	if (alt_smac != NULL)
+		memcpy(cm_id_priv->alt_av.smac, alt_smac,
+		       sizeof(cm_id_priv->alt_av.smac));
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_update_cm_av);
+
 static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 {
 	struct cm_device *cm_dev;
@@ -376,6 +396,9 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	ib_init_ah_from_path(cm_dev->ib_device, port->port_num, path,
 			     &av->ah_attr);
 	av->timeout = path->packet_life_time + 1;
+	memcpy(av->smac, path->smac, sizeof(av->smac));
+
+	av->valid = 1;
 	return 0;
 }
 
@@ -1557,6 +1580,9 @@ static int cm_req_handler(struct cm_work *work)
 
 	cm_process_routed_req(req_msg, work->mad_recv_wc->wc);
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
+
+	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
+	work->path[0].vlan_id = cm_id_priv->av.ah_attr.vlan_id;
 	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	if (ret) {
 		ib_get_cached_gid(work->port->cm_dev->ib_device,
@@ -3503,6 +3529,30 @@ static int cm_init_qp_rtr_attr(struct cm_id_private *cm_id_priv,
 		*qp_attr_mask = IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU |
 				IB_QP_DEST_QPN | IB_QP_RQ_PSN;
 		qp_attr->ah_attr = cm_id_priv->av.ah_attr;
+		if (!cm_id_priv->av.valid)
+			return -EINVAL;
+		if (cm_id_priv->av.ah_attr.vlan_id != 0xffff) {
+			qp_attr->vlan_id = cm_id_priv->av.ah_attr.vlan_id;
+			*qp_attr_mask |= IB_QP_VID;
+		}
+		if (!is_zero_ether_addr(cm_id_priv->av.smac)) {
+			memcpy(qp_attr->smac, cm_id_priv->av.smac,
+			       sizeof(qp_attr->smac));
+			*qp_attr_mask |= IB_QP_SMAC;
+		}
+		if (cm_id_priv->alt_av.valid) {
+			if (cm_id_priv->alt_av.ah_attr.vlan_id != 0xffff) {
+				qp_attr->alt_vlan_id =
+					cm_id_priv->alt_av.ah_attr.vlan_id;
+				*qp_attr_mask |= IB_QP_ALT_VID;
+			}
+			if (!is_zero_ether_addr(cm_id_priv->alt_av.smac)) {
+				memcpy(qp_attr->alt_smac,
+				       cm_id_priv->alt_av.smac,
+				       sizeof(qp_attr->alt_smac));
+				*qp_attr_mask |= IB_QP_ALT_SMAC;
+			}
+		}
 		qp_attr->path_mtu = cm_id_priv->path_mtu;
 		qp_attr->dest_qp_num = be32_to_cpu(cm_id_priv->remote_qpn);
 		qp_attr->rq_psn = be32_to_cpu(cm_id_priv->rq_psn);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 830c983..45a4010 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -340,7 +340,7 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
 	int ret;
 
 	if (addr->sa_family != AF_IB) {
-		ret = rdma_translate_ip(addr, dev_addr);
+		ret = rdma_translate_ip(addr, dev_addr, NULL);
 	} else {
 		cma_translate_ib((struct sockaddr_ib *) addr, dev_addr);
 		ret = 0;
@@ -603,6 +603,7 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 {
 	struct ib_qp_attr qp_attr;
 	int qp_attr_mask, ret;
+	union ib_gid sgid;
 
 	mutex_lock(&id_priv->qp_mutex);
 	if (!id_priv->id.qp) {
@@ -625,6 +626,20 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	if (ret)
 		goto out;
 
+	ret = ib_query_gid(id_priv->id.device, id_priv->id.port_num,
+			   qp_attr.ah_attr.grh.sgid_index, &sgid);
+	if (ret)
+		goto out;
+
+	if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
+	    == RDMA_TRANSPORT_IB &&
+	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
+	    == IB_LINK_LAYER_ETHERNET) {
+		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
+
+		if (ret)
+			goto out;
+	}
 	if (conn_param)
 		qp_attr.max_dest_rd_atomic = conn_param->responder_resources;
 	ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask);
@@ -725,6 +740,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
 		else
 			ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr,
 						 qp_attr_mask);
+
 		if (qp_attr->qp_state == IB_QPS_RTR)
 			qp_attr->rq_psn = id_priv->seq_num;
 		break;
@@ -1266,6 +1282,15 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	struct rdma_id_private *listen_id, *conn_id;
 	struct rdma_cm_event event;
 	int offset, ret;
+	u8 smac[ETH_ALEN];
+	u8 alt_smac[ETH_ALEN];
+	u8 *psmac = smac;
+	u8 *palt_smac = alt_smac;
+	int is_iboe = ((rdma_node_get_transport(cm_id->device->node_type) ==
+			RDMA_TRANSPORT_IB) &&
+		       (rdma_port_get_link_layer(cm_id->device,
+			ib_event->param.req_rcvd.port) ==
+			IB_LINK_LAYER_ETHERNET));
 
 	listen_id = cm_id->context;
 	if (!cma_check_req_qp_type(&listen_id->id, ib_event))
@@ -1310,12 +1335,29 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	if (ret)
 		goto err3;
 
+	if (is_iboe) {
+		if (ib_event->param.req_rcvd.primary_path != NULL)
+			rdma_addr_find_smac_by_sgid(
+				&ib_event->param.req_rcvd.primary_path->sgid,
+				psmac, NULL);
+		else
+			psmac = NULL;
+		if (ib_event->param.req_rcvd.alternate_path != NULL)
+			rdma_addr_find_smac_by_sgid(
+				&ib_event->param.req_rcvd.alternate_path->sgid,
+				palt_smac, NULL);
+		else
+			palt_smac = NULL;
+	}
 	/*
 	 * Acquire mutex to prevent user executing rdma_destroy_id()
 	 * while we're accessing the cm_id.
 	 */
 	mutex_lock(&lock);
-	if (cma_comp(conn_id, RDMA_CM_CONNECT) && (conn_id->id.qp_type != IB_QPT_UD))
+	if (is_iboe)
+		ib_update_cm_av(cm_id, psmac, palt_smac);
+	if (cma_comp(conn_id, RDMA_CM_CONNECT) &&
+	    (conn_id->id.qp_type != IB_QPT_UD))
 		ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0);
 	mutex_unlock(&lock);
 	mutex_unlock(&conn_id->handler_mutex);
@@ -1474,7 +1516,7 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	mutex_lock_nested(&conn_id->handler_mutex, SINGLE_DEPTH_NESTING);
 	conn_id->state = RDMA_CM_CONNECT;
 
-	ret = rdma_translate_ip(laddr, &conn_id->id.route.addr.dev_addr);
+	ret = rdma_translate_ip(laddr, &conn_id->id.route.addr.dev_addr, NULL);
 	if (ret) {
 		mutex_unlock(&conn_id->handler_mutex);
 		rdma_destroy_id(new_cm_id);
@@ -1853,7 +1895,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
-	u16 vid;
+
 
 	work = kzalloc(sizeof *work, GFP_KERNEL);
 	if (!work)
@@ -1877,10 +1919,14 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		goto err2;
 	}
 
-	vid = rdma_vlan_dev_vlan_id(ndev);
+	route->path_rec->vlan_id = rdma_vlan_dev_vlan_id(ndev);
+	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, ETH_ALEN);
+	memcpy(route->path_rec->smac, ndev->dev_addr, ndev->addr_len);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr, vid);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr, vid);
+	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr,
+			    route->path_rec->vlan_id);
+	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr,
+			    route->path_rec->vlan_id);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 9838ca4..f820958 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -42,7 +42,7 @@
 #include <linux/kref.h>
 #include <linux/idr.h>
 #include <linux/workqueue.h>
-
+#include <uapi/linux/if_ether.h>
 #include <rdma/ib_pack.h>
 #include <rdma/ib_cache.h>
 #include "sa.h"
@@ -556,6 +556,13 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		ah_attr->grh.hop_limit     = rec->hop_limit;
 		ah_attr->grh.traffic_class = rec->traffic_class;
 	}
+	if (force_grh) {
+		memcpy(ah_attr->dmac, rec->dmac, ETH_ALEN);
+		ah_attr->vlan_id = rec->vlan_id;
+	} else {
+		ah_attr->vlan_id = 0xffff;
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ib_init_ah_from_path);
@@ -670,6 +677,9 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
 
 		ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table),
 			  mad->data, &rec);
+		rec.vlan_id = 0xffff;
+		memset(rec.dmac, 0, ETH_ALEN);
+		memset(rec.smac, 0, ETH_ALEN);
 		query->callback(status, &rec, query->context);
 	} else
 		query->callback(status, NULL, query->context);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 84f5027..fb44350 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -44,6 +44,7 @@
 
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
+#include <rdma/ib_addr.h>
 
 int ib_rate_to_mult(enum ib_rate rate)
 {
@@ -192,8 +193,28 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
+	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
+			IB_LINK_LAYER_ETHERNET);
 
 	memset(ah_attr, 0, sizeof *ah_attr);
+	if (is_eth) {
+		if (!(wc->wc_flags & IB_WC_GRH))
+			return -EPROTOTYPE;
+
+		if (wc->wc_flags & IB_WC_WITH_SMAC &&
+		    wc->wc_flags & IB_WC_WITH_VLAN) {
+			memcpy(ah_attr->dmac, wc->smac, ETH_ALEN);
+			ah_attr->vlan_id = wc->vlan_id;
+		} else {
+			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
+					ah_attr->dmac, &ah_attr->vlan_id);
+			if (ret)
+				return ret;
+		}
+	} else {
+		ah_attr->vlan_id = 0xffff;
+	}
+
 	ah_attr->dlid = wc->slid;
 	ah_attr->sl = wc->sl;
 	ah_attr->src_path_bits = wc->dlid_path_bits;
@@ -476,7 +497,9 @@ EXPORT_SYMBOL(ib_create_qp);
 static const struct {
 	int			valid;
 	enum ib_qp_attr_mask	req_param[IB_QPT_MAX];
+	enum ib_qp_attr_mask	req_param_add_eth[IB_QPT_MAX];
 	enum ib_qp_attr_mask	opt_param[IB_QPT_MAX];
+	enum ib_qp_attr_mask	opt_param_add_eth[IB_QPT_MAX];
 } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
 	[IB_QPS_RESET] = {
 		[IB_QPS_RESET] = { .valid = 1 },
@@ -557,6 +580,9 @@ static const struct {
 						IB_QP_MAX_DEST_RD_ATOMIC	|
 						IB_QP_MIN_RNR_TIMER),
 			},
+			.req_param_add_eth = {
+				[IB_QPT_RC]  = (IB_QP_SMAC)
+			},
 			.opt_param = {
 				 [IB_QPT_UD]  = (IB_QP_PKEY_INDEX		|
 						 IB_QP_QKEY),
@@ -576,7 +602,12 @@ static const struct {
 						 IB_QP_QKEY),
 				 [IB_QPT_GSI] = (IB_QP_PKEY_INDEX		|
 						 IB_QP_QKEY),
-			 }
+			 },
+			.opt_param_add_eth = {
+				[IB_QPT_RC]  = (IB_QP_ALT_SMAC			|
+						IB_QP_VID			|
+						IB_QP_ALT_VID)
+			}
 		}
 	},
 	[IB_QPS_RTR]   = {
@@ -779,7 +810,8 @@ static const struct {
 };
 
 int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
-		       enum ib_qp_type type, enum ib_qp_attr_mask mask)
+		       enum ib_qp_type type, enum ib_qp_attr_mask mask,
+		       enum rdma_link_layer ll)
 {
 	enum ib_qp_attr_mask req_param, opt_param;
 
@@ -798,6 +830,13 @@ int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
 	req_param = qp_state_table[cur_state][next_state].req_param[type];
 	opt_param = qp_state_table[cur_state][next_state].opt_param[type];
 
+	if (ll == IB_LINK_LAYER_ETHERNET) {
+		req_param |= qp_state_table[cur_state][next_state].
+			req_param_add_eth[type];
+		opt_param |= qp_state_table[cur_state][next_state].
+			opt_param_add_eth[type];
+	}
+
 	if ((mask & req_param) != req_param)
 		return 0;
 
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 00d6861..2e89356 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -1329,7 +1329,7 @@ static int internal_modify_qp(struct ib_qp *ibqp,
 	qp_new_state = attr_mask & IB_QP_STATE ? attr->qp_state : qp_cur_state;
 	if (!smi_reset2init &&
 	    !ib_modify_qp_is_ok(qp_cur_state, qp_new_state, ibqp->qp_type,
-				attr_mask)) {
+				attr_mask, IB_LINK_LAYER_UNSPECIFIED)) {
 		ret = -EINVAL;
 		ehca_err(ibqp->device,
 			 "Invalid qp transition new_state=%x cur_state=%x "
diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c
index 0857a9c..face876 100644
--- a/drivers/infiniband/hw/ipath/ipath_qp.c
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c
@@ -463,7 +463,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
 	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
-				attr_mask))
+				attr_mask, IB_LINK_LAYER_UNSPECIFIED))
 		goto inval;
 
 	if (attr_mask & IB_QP_AV) {
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 4f10af2..da6f5fa 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1561,13 +1561,18 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	struct mlx4_ib_qp *qp = to_mqp(ibqp);
 	enum ib_qp_state cur_state, new_state;
 	int err = -EINVAL;
-
+	int p = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
 	mutex_lock(&qp->mutex);
 
 	cur_state = attr_mask & IB_QP_CUR_STATE ? attr->cur_qp_state : qp->state;
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
-	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask)) {
+	if (cur_state == new_state && cur_state == IB_QPS_RESET)
+		p = IB_LINK_LAYER_UNSPECIFIED;
+
+	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
+				attr_mask,
+				rdma_port_get_link_layer(&dev->ib_dev, p))) {
 		pr_debug("qpn 0x%x: invalid attribute mask specified "
 			 "for transition %d to %d. qp_type %d,"
 			 " attr_mask 0x%x\n",
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 7c6b4ba..ca29362 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1616,7 +1616,8 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
 	if (ibqp->qp_type != MLX5_IB_QPT_REG_UMR &&
-	    !ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask))
+	    !ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask,
+				IB_LINK_LAYER_UNSPECIFIED))
 		goto out;
 
 	if ((attr_mask & IB_QP_PORT) &&
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 26a6845..e354b2f 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -860,7 +860,8 @@ int mthca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask,
 
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
-	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask)) {
+	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type, attr_mask,
+				IB_LINK_LAYER_UNSPECIFIED)) {
 		mthca_dbg(dev, "Bad QP transition (transport %d) "
 			  "%d->%d with attr 0x%08x\n",
 			  qp->transport, cur_state, new_state,
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 7686dce..a0f1c47 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1326,7 +1326,8 @@ int ocrdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		new_qps = old_qps;
 	spin_unlock_irqrestore(&qp->q_lock, flags);
 
-	if (!ib_modify_qp_is_ok(old_qps, new_qps, ibqp->qp_type, attr_mask)) {
+	if (!ib_modify_qp_is_ok(old_qps, new_qps, ibqp->qp_type, attr_mask,
+				IB_LINK_LAYER_UNSPECIFIED)) {
 		pr_err("%s(%d) invalid attribute mask=0x%x specified for\n"
 		       "qpn=0x%x of type=0x%x old_qps=0x%x, new_qps=0x%x\n",
 		       __func__, dev->id, attr_mask, qp->id, ibqp->qp_type,
diff --git a/drivers/infiniband/hw/qib/qib_qp.c b/drivers/infiniband/hw/qib/qib_qp.c
index 3cca55b..0cad0c4 100644
--- a/drivers/infiniband/hw/qib/qib_qp.c
+++ b/drivers/infiniband/hw/qib/qib_qp.c
@@ -585,7 +585,7 @@ int qib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
 	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
-				attr_mask))
+				attr_mask, IB_LINK_LAYER_UNSPECIFIED))
 		goto inval;
 
 	if (attr_mask & IB_QP_AV) {
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 24ce6bd..321a788 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1076,6 +1076,7 @@ int mlx4_SET_PORT_qpn_calc(struct mlx4_dev *dev, u8 port, u32 base_qpn,
 int mlx4_SET_PORT_PRIO2TC(struct mlx4_dev *dev, u8 port, u8 *prio2tc);
 int mlx4_SET_PORT_SCHEDULER(struct mlx4_dev *dev, u8 port, u8 *tc_tx_bw,
 		u8 *pg, u16 *ratelimit);
+int mlx4_find_cached_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *idx);
 int mlx4_find_cached_vlan(struct mlx4_dev *dev, u8 port, u16 vid, int *idx);
 int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index);
 void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index f3ac0f2..a071560 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -42,6 +42,7 @@
 #include <linux/if_vlan.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
+#include <net/ipv6.h>
 
 struct rdma_addr_client {
 	atomic_t refcount;
@@ -72,7 +73,8 @@ struct rdma_dev_addr {
  * rdma_translate_ip - Translate a local IP address to an RDMA hardware
  *   address.
  */
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr);
+int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
+		      u16 *vlan_id);
 
 /**
  * rdma_resolve_ip - Resolve source and destination IP addresses to
@@ -104,6 +106,10 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 
 int rdma_addr_size(struct sockaddr *addr);
 
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
+int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *smac,
+			       u16 *vlan_id);
+
 static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
 {
 	return ((u16)dev_addr->broadcast[8] << 8) | (u16)dev_addr->broadcast[9];
@@ -142,6 +148,40 @@ static inline void iboe_mac_vlan_to_ll(union ib_gid *gid, u8 *mac, u16 vid)
 	gid->raw[8] ^= 2;
 }
 
+static inline int rdma_ip2gid(struct sockaddr *addr, union ib_gid *gid)
+{
+	switch (addr->sa_family) {
+	case AF_INET:
+		ipv6_addr_set_v4mapped(((struct sockaddr_in *)
+					addr)->sin_addr.s_addr,
+				       (struct in6_addr *)gid);
+		break;
+	case AF_INET6:
+		memcpy(gid->raw, &((struct sockaddr_in6 *)addr)->sin6_addr, 16);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Important - sockaddr should be a union of sockaddr_in and sockaddr_in6 */
+static inline int rdma_gid2ip(struct sockaddr *out, union ib_gid *gid)
+{
+	if (ipv6_addr_v4mapped((struct in6_addr *)gid)) {
+		struct sockaddr_in *out_in = (struct sockaddr_in *)out;
+		memset(out_in, 0, sizeof(*out_in));
+		out_in->sin_family = AF_INET;
+		memcpy(&out_in->sin_addr.s_addr, gid->raw + 12, 4);
+	} else {
+		struct sockaddr_in6 *out_in = (struct sockaddr_in6 *)out;
+		memset(out_in, 0, sizeof(*out_in));
+		out_in->sin6_family = AF_INET6;
+		memcpy(&out_in->sin6_addr.s6_addr, gid->raw, 16);
+	}
+	return 0;
+}
+
 static inline u16 rdma_vlan_dev_vlan_id(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_802_1Q_VLAN ?
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 0e3ff30..f29e3a2 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -601,4 +601,5 @@ struct ib_cm_sidr_rep_param {
 int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
 			struct ib_cm_sidr_rep_param *param);
 
+int ib_update_cm_av(struct ib_cm_id *id, const u8 *smac, const u8 *alt_smac);
 #endif /* IB_CM_H */
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index b37fe3b..b1f7592 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -34,6 +34,7 @@
 #define IB_PACK_H
 
 #include <rdma/ib_verbs.h>
+#include <uapi/linux/if_ether.h>
 
 enum {
 	IB_LRH_BYTES  = 8,
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 125f871..7e071a6 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -154,6 +154,9 @@ struct ib_sa_path_rec {
 	u8           packet_life_time_selector;
 	u8           packet_life_time;
 	u8           preference;
+	u8           smac[ETH_ALEN];
+	u8           dmac[ETH_ALEN];
+	u16	     vlan_id;
 };
 
 #define IB_SA_MCMEMBER_REC_MGID				IB_SA_COMP_MASK( 0)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 60354d5..b87cc4d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
 #include <asm/uaccess.h>
@@ -472,6 +473,8 @@ struct ib_ah_attr {
 	u8			static_rate;
 	u8			ah_flags;
 	u8			port_num;
+	u8			dmac[ETH_ALEN];
+	u16			vlan_id;
 };
 
 enum ib_wc_status {
@@ -524,6 +527,8 @@ enum ib_wc_flags {
 	IB_WC_WITH_IMM		= (1<<1),
 	IB_WC_WITH_INVALIDATE	= (1<<2),
 	IB_WC_IP_CSUM_OK	= (1<<3),
+	IB_WC_WITH_SMAC		= (1<<4),
+	IB_WC_WITH_VLAN		= (1<<5),
 };
 
 struct ib_wc {
@@ -544,6 +549,8 @@ struct ib_wc {
 	u8			sl;
 	u8			dlid_path_bits;
 	u8			port_num;	/* valid only for DR SMPs on switches */
+	u8			smac[ETH_ALEN];
+	u16			vlan_id;
 };
 
 enum ib_cq_notify_flags {
@@ -721,7 +728,11 @@ enum ib_qp_attr_mask {
 	IB_QP_MAX_DEST_RD_ATOMIC	= (1<<17),
 	IB_QP_PATH_MIG_STATE		= (1<<18),
 	IB_QP_CAP			= (1<<19),
-	IB_QP_DEST_QPN			= (1<<20)
+	IB_QP_DEST_QPN			= (1<<20),
+	IB_QP_SMAC			= (1<<21),
+	IB_QP_ALT_SMAC			= (1<<22),
+	IB_QP_VID			= (1<<23),
+	IB_QP_ALT_VID			= (1<<24),
 };
 
 enum ib_qp_state {
@@ -771,6 +782,10 @@ struct ib_qp_attr {
 	u8			rnr_retry;
 	u8			alt_port_num;
 	u8			alt_timeout;
+	u8			smac[ETH_ALEN];
+	u8			alt_smac[ETH_ALEN];
+	u16			vlan_id;
+	u16			alt_vlan_id;
 };
 
 enum ib_wr_opcode {
@@ -1487,6 +1502,7 @@ static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t len
  * @next_state: Next QP state
  * @type: QP type
  * @mask: Mask of supplied QP attributes
+ * @ll: Link layer of port
  *
  * This function is a helper function that a low-level driver's
  * modify_qp method can use to validate the consumer's input.  It
@@ -1495,7 +1511,8 @@ static inline int ib_copy_to_udata(struct ib_udata *udata, void *src, size_t len
  * and that the attribute mask supplied is allowed for the transition.
  */
 int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
-		       enum ib_qp_type type, enum ib_qp_attr_mask mask);
+		       enum ib_qp_type type, enum ib_qp_attr_mask mask,
+		       enum rdma_link_layer ll);
 
 int ib_register_event_handler  (struct ib_event_handler *event_handler);
 int ib_unregister_event_handler(struct ib_event_handler *event_handler);
-- 
1.7.1


* [PATCH V5 2/8] IB/CMA: IBoE (RoCE) IP based GID addressing
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-11-13 22:29   ` [PATCH V5 1/8] IB/core: Ethernet L2 attributes in verbs/cm structures Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 3/8] IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table Or Gerlitz
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Or Gerlitz, Moni Shoua

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, the IB core, and specifically the RDMA-CM, assumes that
IBoE (RoCE) gids encode the MAC address of the related Ethernet
netdevice interface and possibly its VLAN id.

Change gids to be treated as encoding the interface IP address.
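
As a rough user-space illustration of that mapping (not the kernel code
itself), the sketch below builds a 16-byte gid from a socket address: an
IPv4 address becomes the IPv4-mapped form ::ffff:a.b.c.d and an IPv6
address is copied as-is. The names demo_gid and demo_ip2gid are
hypothetical stand-ins for union ib_gid and rdma_ip2gid().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical user-space stand-in for the kernel's union ib_gid. */
struct demo_gid {
	uint8_t raw[16];
};

/*
 * Assumed to mirror the idea behind rdma_ip2gid(): IPv4 addresses are
 * stored as IPv4-mapped IPv6 (::ffff:a.b.c.d), IPv6 addresses are copied.
 * Returns 0 on success, -1 for an unsupported address family.
 */
static int demo_ip2gid(const struct sockaddr *addr, struct demo_gid *gid)
{
	switch (addr->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *in4 =
			(const struct sockaddr_in *)addr;

		memset(gid->raw, 0, 10);	/* 80 zero bits */
		gid->raw[10] = 0xff;		/* 16 one bits */
		gid->raw[11] = 0xff;
		/* s_addr is already in network byte order */
		memcpy(gid->raw + 12, &in4->sin_addr.s_addr, 4);
		return 0;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 =
			(const struct sockaddr_in6 *)addr;

		memcpy(gid->raw, &in6->sin6_addr, 16);
		return 0;
	}
	default:
		return -1;
	}
}
```

With this layout, an address such as 192.168.1.10 ends up in the last
four bytes of the gid, preceded by 0xff 0xff and ten zero bytes.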

Since Ethernet layer 2 address parameters are no longer encoded
within gids, the InfiniBand address structures (e.g. ib_ah_attr) had
to be extended with layer 2 address parameters, namely MAC and VLAN.
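
To make the consequence concrete, here is a minimal user-space sketch,
assuming a simplified address handle that carries the destination MAC
and VLAN id next to the gid instead of deriving them from it. The
demo_* names and the struct layout are hypothetical; only the 0xffff
"no VLAN" convention and the vid < 0x1000 validity check are taken from
the code in this series.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN	6
#define DEMO_NO_VLAN	0xffff	/* same "no VLAN" value the series uses */

/* Hypothetical, simplified analogue of struct ib_ah_attr. */
struct demo_ah_attr {
	uint8_t  dgid[16];		/* now IP based, carries no L2 info */
	uint8_t  dmac[DEMO_ETH_ALEN];	/* destination MAC, passed explicitly */
	uint16_t vlan_id;		/* VLAN id, or DEMO_NO_VLAN */
};

/* Fill the handle; L2 parameters come from the caller, not from the gid. */
static void demo_fill_ah(struct demo_ah_attr *ah, const uint8_t dgid[16],
			 const uint8_t dmac[DEMO_ETH_ALEN], uint16_t vlan_id)
{
	memcpy(ah->dgid, dgid, 16);
	memcpy(ah->dmac, dmac, DEMO_ETH_ALEN);
	/* Valid 802.1Q ids fit in 12 bits; anything else means "untagged". */
	ah->vlan_id = vlan_id < 0x1000 ? vlan_id : DEMO_NO_VLAN;
}
```

The point of the sketch is only that the MAC and VLAN must now travel
with the address structure, because an IP based gid cannot be decoded
back into them.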

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
---
 drivers/infiniband/core/cma.c  |   22 ++++++++++++----------
 drivers/infiniband/core/ucma.c |   18 ++++--------------
 include/rdma/ib_addr.h         |   35 ++++++++++++-----------------------
 3 files changed, 28 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 45a4010..86adf07 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -365,7 +365,9 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 		return -EINVAL;
 
 	mutex_lock(&lock);
-	iboe_addr_get_sgid(dev_addr, &iboe_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &iboe_gid);
+
 	memcpy(&gid, dev_addr->src_dev_addr +
 	       rdma_addr_gid_offset(dev_addr), sizeof gid);
 	if (listen_id_priv &&
@@ -1923,10 +1925,10 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	memcpy(route->path_rec->dmac, addr->dev_addr.dst_dev_addr, ETH_ALEN);
 	memcpy(route->path_rec->smac, ndev->dev_addr, ndev->addr_len);
 
-	iboe_mac_vlan_to_ll(&route->path_rec->sgid, addr->dev_addr.src_dev_addr,
-			    route->path_rec->vlan_id);
-	iboe_mac_vlan_to_ll(&route->path_rec->dgid, addr->dev_addr.dst_dev_addr,
-			    route->path_rec->vlan_id);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &route->path_rec->sgid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
+		    &route->path_rec->dgid);
 
 	route->path_rec->hop_limit = 1;
 	route->path_rec->reversible = 1;
@@ -2093,6 +2095,7 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			   RDMA_CM_ADDR_RESOLVED))
 		goto out;
 
+	memcpy(cma_src_addr(id_priv), src_addr, rdma_addr_size(src_addr));
 	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv, NULL);
 
@@ -2102,10 +2105,8 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 			goto out;
 		event.event = RDMA_CM_EVENT_ADDR_ERROR;
 		event.status = status;
-	} else {
-		memcpy(cma_src_addr(id_priv), src_addr, rdma_addr_size(src_addr));
+	} else
 		event.event = RDMA_CM_EVENT_ADDR_RESOLVED;
-	}
 
 	if (id_priv->id.event_handler(&id_priv->id, &event)) {
 		cma_exch(id_priv, RDMA_CM_DESTROYING);
@@ -2586,6 +2587,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err1;
 
+	memcpy(cma_src_addr(id_priv), addr, rdma_addr_size(addr));
 	if (!cma_any_addr(addr)) {
 		ret = cma_translate_addr(addr, &id->route.addr.dev_addr);
 		if (ret)
@@ -2596,7 +2598,6 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 			goto err1;
 	}
 
-	memcpy(cma_src_addr(id_priv), addr, rdma_addr_size(addr));
 	if (!(id_priv->options & (1 << CMA_OPTION_AFONLY))) {
 		if (addr->sa_family == AF_INET)
 			id_priv->afonly = 1;
@@ -3325,7 +3326,8 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 		err = -EINVAL;
 		goto out2;
 	}
-	iboe_addr_get_sgid(dev_addr, &mc->multicast.ib->rec.port_gid);
+	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
+		    &mc->multicast.ib->rec.port_gid);
 	work->id = id_priv;
 	work->mc = mc;
 	INIT_WORK(&work->work, iboe_mcast_work_handler);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 826016b..5443d33 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -655,24 +655,14 @@ static void ucma_copy_ib_route(struct rdma_ucm_query_route_resp *resp,
 static void ucma_copy_iboe_route(struct rdma_ucm_query_route_resp *resp,
 				 struct rdma_route *route)
 {
-	struct rdma_dev_addr *dev_addr;
-	struct net_device *dev;
-	u16 vid = 0;
 
 	resp->num_paths = route->num_paths;
 	switch (route->num_paths) {
 	case 0:
-		dev_addr = &route->addr.dev_addr;
-		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
-			if (dev) {
-				vid = rdma_vlan_dev_vlan_id(dev);
-				dev_put(dev);
-			}
-
-		iboe_mac_vlan_to_ll((union ib_gid *) &resp->ib_route[0].dgid,
-				    dev_addr->dst_dev_addr, vid);
-		iboe_addr_get_sgid(dev_addr,
-				   (union ib_gid *) &resp->ib_route[0].sgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.dst_addr,
+			    (union ib_gid *)&resp->ib_route[0].dgid);
+		rdma_ip2gid((struct sockaddr *)&route->addr.src_addr,
+			    (union ib_gid *)&resp->ib_route[0].sgid);
 		resp->ib_route[0].pkey = cpu_to_be16(0xffff);
 		break;
 	case 2:
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index a071560..ce55906 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -38,8 +38,12 @@
 #include <linux/in6.h>
 #include <linux/if_arp.h>
 #include <linux/netdevice.h>
+#include <linux/inetdevice.h>
 #include <linux/socket.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/if_inet6.h>
+#include <net/ip.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 #include <net/ipv6.h>
@@ -132,20 +136,10 @@ static inline int rdma_addr_gid_offset(struct rdma_dev_addr *dev_addr)
 	return dev_addr->dev_type == ARPHRD_INFINIBAND ? 4 : 0;
 }
 
-static inline void iboe_mac_vlan_to_ll(union ib_gid *gid, u8 *mac, u16 vid)
+static inline u16 rdma_vlan_dev_vlan_id(const struct net_device *dev)
 {
-	memset(gid->raw, 0, 16);
-	*((__be32 *) gid->raw) = cpu_to_be32(0xfe800000);
-	if (vid < 0x1000) {
-		gid->raw[12] = vid & 0xff;
-		gid->raw[11] = vid >> 8;
-	} else {
-		gid->raw[12] = 0xfe;
-		gid->raw[11] = 0xff;
-	}
-	memcpy(gid->raw + 13, mac + 3, 3);
-	memcpy(gid->raw + 8, mac, 3);
-	gid->raw[8] ^= 2;
+	return dev->priv_flags & IFF_802_1Q_VLAN ?
+		vlan_dev_vlan_id(dev) : 0xffff;
 }
 
 static inline int rdma_ip2gid(struct sockaddr *addr, union ib_gid *gid)
@@ -182,25 +176,20 @@ static inline int rdma_gid2ip(struct sockaddr *out, union ib_gid *gid)
 	return 0;
 }
 
-static inline u16 rdma_vlan_dev_vlan_id(const struct net_device *dev)
-{
-	return dev->priv_flags & IFF_802_1Q_VLAN ?
-		vlan_dev_vlan_id(dev) : 0xffff;
-}
-
 static inline void iboe_addr_get_sgid(struct rdma_dev_addr *dev_addr,
 				      union ib_gid *gid)
 {
 	struct net_device *dev;
-	u16 vid = 0xffff;
+	struct in_device *ip4;
 
 	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
 	if (dev) {
-		vid = rdma_vlan_dev_vlan_id(dev);
+		ip4 = (struct in_device *)dev->ip_ptr;
+		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)
+			ipv6_addr_set_v4mapped(ip4->ifa_list->ifa_address,
+					       (struct in6_addr *)gid);
 		dev_put(dev);
 	}
-
-	iboe_mac_vlan_to_ll(gid, dev_addr->src_dev_addr, vid);
 }
 
 static inline void rdma_addr_get_sgid(struct rdma_dev_addr *dev_addr, union ib_gid *gid)
-- 
1.7.1


* [PATCH V5 3/8] IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-11-13 22:29   ` [PATCH V5 1/8] IB/core: Ethernet L2 attributes in verbs/cm structures Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 2/8] IB/CMA: IBoE (RoCE) IP based GID addressing Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 4/8] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing Or Gerlitz
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Currently, the mlx4 driver sets IBoE (RoCE) gids to encode the related
Ethernet netdevice interface MAC address and possibly VLAN id.

Change this scheme such that gids encode interface IP addresses
(both IPv4 and IPv6).

This requires learning which IP addresses are in use by a netdevice
associated with the HCA port, formatting them as gids and adding them
to the port gid table. Further, address add and delete events are
caught to maintain the gid table accordingly.

Associated IP addresses may belong to a master of an Ethernet netdevice
on top of that port, so this should be considered when building and
maintaining the gid table.
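
The table maintenance described above can be sketched in user space
roughly as follows. This is a simplified model, not the driver code: it
leaves out locking and the deferred SET_PORT firmware command, and the
demo_* names and the table length are assumptions made for the example.
An address "add" event stores the gid in the first free (all-zero) slot
unless it is already present; a "delete" event zeroes the matching slot.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_GID_TABLE_LEN 8	/* arbitrary size for the example */

/* Hypothetical user-space stand-in for the kernel's union ib_gid. */
struct demo_gid {
	uint8_t raw[16];
};

/* All-zero gid marks an unused table slot, like zgid in the driver. */
static const struct demo_gid demo_zgid = { { 0 } };

/*
 * Add (clear == 0) or remove (clear != 0) a gid.
 * Returns 1 if the table changed and the hardware copy would need an
 * update, 0 if nothing changed (duplicate add, unknown delete, or a
 * full table).
 */
static int demo_update_gid_table(struct demo_gid *table, int len,
				 const struct demo_gid *gid, int clear)
{
	int free_slot = -1;
	int i;

	for (i = 0; i < len; ++i) {
		if (!memcmp(&table[i], gid, sizeof(*gid))) {
			if (!clear)
				return 0;	/* already present */
			table[i] = demo_zgid;	/* drop the entry */
			return 1;
		}
		if (free_slot < 0 &&
		    !memcmp(&table[i], &demo_zgid, sizeof(demo_zgid)))
			free_slot = i;		/* remember first free slot */
	}

	if (clear || free_slot < 0)
		return 0;

	table[free_slot] = *gid;
	return 1;
}
```

In the driver, a return value of 1 corresponds to the case where a work
item is queued to push the updated table to the device.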

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  474 ++++++++++++++++++++++++----------
 drivers/infiniband/hw/mlx4/mlx4_ib.h |    3 +
 2 files changed, 334 insertions(+), 143 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index f061264..c5ecec2 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -39,6 +39,8 @@
 #include <linux/inetdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/addrconf.h>
 
 #include <rdma/ib_smi.h>
 #include <rdma/ib_user_verbs.h>
@@ -790,7 +792,6 @@ static int add_gid_entry(struct ib_qp *ibqp, union ib_gid *gid)
 int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 		   union ib_gid *gid)
 {
-	u8 mac[6];
 	struct net_device *ndev;
 	int ret = 0;
 
@@ -804,11 +805,7 @@ int mlx4_ib_add_mc(struct mlx4_ib_dev *mdev, struct mlx4_ib_qp *mqp,
 	spin_unlock(&mdev->iboe.lock);
 
 	if (ndev) {
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		rtnl_lock();
-		dev_mc_add(mdev->iboe.netdevs[mqp->port - 1], mac);
 		ret = 1;
-		rtnl_unlock();
 		dev_put(ndev);
 	}
 
@@ -1031,6 +1028,8 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 	u64 reg_id;
 	struct mlx4_ib_steering *ib_steering = NULL;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -1042,7 +1041,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, mqp->port,
 				    !!(mqp->flags &
 				       MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
-				    MLX4_PROT_IB_IPV6, &reg_id);
+				    prot, &reg_id);
 	if (err)
 		goto err_malloc;
 
@@ -1061,7 +1060,7 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 
 err_add:
 	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-			      MLX4_PROT_IB_IPV6, reg_id);
+			      prot, reg_id);
 err_malloc:
 	kfree(ib_steering);
 
@@ -1089,10 +1088,11 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	int err;
 	struct mlx4_ib_dev *mdev = to_mdev(ibqp->device);
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
-	u8 mac[6];
 	struct net_device *ndev;
 	struct mlx4_ib_gid_entry *ge;
 	u64 reg_id = 0;
+	enum mlx4_protocol prot = (gid->raw[1] == 0x0e) ?
+		MLX4_PROT_IB_IPV4 : MLX4_PROT_IB_IPV6;
 
 	if (mdev->dev->caps.steering_mode ==
 	    MLX4_STEERING_MODE_DEVICE_MANAGED) {
@@ -1115,7 +1115,7 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	}
 
 	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
-				    MLX4_PROT_IB_IPV6, reg_id);
+				    prot, reg_id);
 	if (err)
 		return err;
 
@@ -1127,13 +1127,8 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 		if (ndev)
 			dev_hold(ndev);
 		spin_unlock(&mdev->iboe.lock);
-		rdma_get_mcast_mac((struct in6_addr *)gid, mac);
-		if (ndev) {
-			rtnl_lock();
-			dev_mc_del(mdev->iboe.netdevs[ge->port - 1], mac);
-			rtnl_unlock();
+		if (ndev)
 			dev_put(ndev);
-		}
 		list_del(&ge->list);
 		kfree(ge);
 	} else
@@ -1229,20 +1224,6 @@ static struct device_attribute *mlx4_class_attributes[] = {
 	&dev_attr_board_id
 };
 
-static void mlx4_addrconf_ifid_eui48(u8 *eui, u16 vlan_id, struct net_device *dev)
-{
-	memcpy(eui, dev->dev_addr, 3);
-	memcpy(eui + 5, dev->dev_addr + 3, 3);
-	if (vlan_id < 0x1000) {
-		eui[3] = vlan_id >> 8;
-		eui[4] = vlan_id & 0xff;
-	} else {
-		eui[3] = 0xff;
-		eui[4] = 0xfe;
-	}
-	eui[0] ^= 2;
-}
-
 static void update_gids_task(struct work_struct *work)
 {
 	struct update_gid_work *gw = container_of(work, struct update_gid_work, work);
@@ -1265,161 +1246,318 @@ static void update_gids_task(struct work_struct *work)
 		       MLX4_CMD_WRAPPED);
 	if (err)
 		pr_warn("set port command failed\n");
-	else {
-		memcpy(gw->dev->iboe.gid_table[gw->port - 1], gw->gids, sizeof gw->gids);
+	else
 		mlx4_ib_dispatch_event(gw->dev, gw->port, IB_EVENT_GID_CHANGE);
-	}
 
 	mlx4_free_cmd_mailbox(dev, mailbox);
 	kfree(gw);
 }
 
-static int update_ipv6_gids(struct mlx4_ib_dev *dev, int port, int clear)
+static void reset_gids_task(struct work_struct *work)
 {
-	struct net_device *ndev = dev->iboe.netdevs[port - 1];
-	struct update_gid_work *work;
-	struct net_device *tmp;
+	struct update_gid_work *gw =
+			container_of(work, struct update_gid_work, work);
+	struct mlx4_cmd_mailbox *mailbox;
+	union ib_gid *gids;
+	int err;
 	int i;
-	u8 *hits;
-	int ret;
-	union ib_gid gid;
-	int free;
-	int found;
-	int need_update = 0;
-	u16 vid;
+	struct mlx4_dev	*dev = gw->dev->dev;
 
-	work = kzalloc(sizeof *work, GFP_ATOMIC);
-	if (!work)
-		return -ENOMEM;
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox)) {
+		pr_warn("reset gid table failed\n");
+		goto free;
+	}
 
-	hits = kzalloc(128, GFP_ATOMIC);
-	if (!hits) {
-		ret = -ENOMEM;
-		goto out;
+	gids = mailbox->buf;
+	memcpy(gids, gw->gids, sizeof(gw->gids));
+
+	for (i = 1; i < gw->dev->num_ports + 1; i++) {
+		if (mlx4_ib_port_link_layer(&gw->dev->ib_dev, i) ==
+					    IB_LINK_LAYER_ETHERNET) {
+			err = mlx4_cmd(dev, mailbox->dma,
+				       MLX4_SET_PORT_GID_TABLE << 8 | i,
+				       1, MLX4_CMD_SET_PORT,
+				       MLX4_CMD_TIME_CLASS_B,
+				       MLX4_CMD_WRAPPED);
+			if (err)
+				pr_warn("set port %d command failed\n",
+					i);
+		}
 	}
 
-	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, tmp) {
-		if (ndev && (tmp == ndev || rdma_vlan_dev_real_dev(tmp) == ndev)) {
-			gid.global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
-			vid = rdma_vlan_dev_vlan_id(tmp);
-			mlx4_addrconf_ifid_eui48(&gid.raw[8], vid, ndev);
-			found = 0;
-			free = -1;
-			for (i = 0; i < 128; ++i) {
-				if (free < 0 &&
-				    !memcmp(&dev->iboe.gid_table[port - 1][i], &zgid, sizeof zgid))
-					free = i;
-				if (!memcmp(&dev->iboe.gid_table[port - 1][i], &gid, sizeof gid)) {
-					hits[i] = 1;
-					found = 1;
-					break;
-				}
-			}
+	mlx4_free_cmd_mailbox(dev, mailbox);
+free:
+	kfree(gw);
+}
 
-			if (!found) {
-				if (tmp == ndev &&
-				    (memcmp(&dev->iboe.gid_table[port - 1][0],
-					    &gid, sizeof gid) ||
-				     !memcmp(&dev->iboe.gid_table[port - 1][0],
-					     &zgid, sizeof gid))) {
-					dev->iboe.gid_table[port - 1][0] = gid;
-					++need_update;
-					hits[0] = 1;
-				} else if (free >= 0) {
-					dev->iboe.gid_table[port - 1][free] = gid;
-					hits[free] = 1;
-					++need_update;
-				}
+static int update_gid_table(struct mlx4_ib_dev *dev, int port,
+			    union ib_gid *gid, int clear)
+{
+	struct update_gid_work *work;
+	int i;
+	int need_update = 0;
+	int free = -1;
+	int found = -1;
+	int max_gids;
+
+	max_gids = dev->dev->caps.gid_table_len[port];
+	for (i = 0; i < max_gids; ++i) {
+		if (!memcmp(&dev->iboe.gid_table[port - 1][i], gid,
+			    sizeof(*gid)))
+			found = i;
+
+		if (clear) {
+			if (found >= 0) {
+				need_update = 1;
+				dev->iboe.gid_table[port - 1][found] = zgid;
+				break;
 			}
+		} else {
+			if (found >= 0)
+				break;
+
+			if (free < 0 &&
+			    !memcmp(&dev->iboe.gid_table[port - 1][i], &zgid,
+				    sizeof(*gid)))
+				free = i;
 		}
 	}
-	rcu_read_unlock();
 
-	for (i = 0; i < 128; ++i)
-		if (!hits[i]) {
-			if (memcmp(&dev->iboe.gid_table[port - 1][i], &zgid, sizeof zgid))
-				++need_update;
-			dev->iboe.gid_table[port - 1][i] = zgid;
-		}
+	if (found == -1 && !clear && free >= 0) {
+		dev->iboe.gid_table[port - 1][free] = *gid;
+		need_update = 1;
+	}
 
-	if (need_update) {
-		memcpy(work->gids, dev->iboe.gid_table[port - 1], sizeof work->gids);
-		INIT_WORK(&work->work, update_gids_task);
-		work->port = port;
-		work->dev = dev;
-		queue_work(wq, &work->work);
-	} else
-		kfree(work);
+	if (!need_update)
+		return 0;
+
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return -ENOMEM;
+
+	memcpy(work->gids, dev->iboe.gid_table[port - 1], sizeof(work->gids));
+	INIT_WORK(&work->work, update_gids_task);
+	work->port = port;
+	work->dev = dev;
+	queue_work(wq, &work->work);
 
-	kfree(hits);
 	return 0;
+}
 
-out:
-	kfree(work);
-	return ret;
+static int reset_gid_table(struct mlx4_ib_dev *dev)
+{
+	struct update_gid_work *work;
+
+
+	work = kzalloc(sizeof(*work), GFP_ATOMIC);
+	if (!work)
+		return -ENOMEM;
+	memset(dev->iboe.gid_table, 0, sizeof(dev->iboe.gid_table));
+	memset(work->gids, 0, sizeof(work->gids));
+	INIT_WORK(&work->work, reset_gids_task);
+	work->dev = dev;
+	queue_work(wq, &work->work);
+	return 0;
 }
 
-static void handle_en_event(struct mlx4_ib_dev *dev, int port, unsigned long event)
+static int mlx4_ib_addr_event(int event, struct net_device *event_netdev,
+			      struct mlx4_ib_dev *ibdev, union ib_gid *gid)
 {
-	switch (event) {
-	case NETDEV_UP:
-	case NETDEV_CHANGEADDR:
-		update_ipv6_gids(dev, port, 0);
-		break;
+	struct mlx4_ib_iboe *iboe;
+	int port = 0;
+	struct net_device *real_dev = rdma_vlan_dev_real_dev(event_netdev) ?
+				rdma_vlan_dev_real_dev(event_netdev) :
+				event_netdev;
+
+	if (event != NETDEV_DOWN && event != NETDEV_UP)
+		return 0;
+
+	if ((real_dev != event_netdev) &&
+	    (event == NETDEV_DOWN) &&
+	    rdma_link_local_addr((struct in6_addr *)gid))
+		return 0;
+
+	iboe = &ibdev->iboe;
+	spin_lock(&iboe->lock);
+
+	for (port = 1; port <= MLX4_MAX_PORTS; ++port)
+		if ((netif_is_bond_master(real_dev) &&
+		     (real_dev == iboe->masters[port - 1])) ||
+		     (!netif_is_bond_master(real_dev) &&
+		     (real_dev == iboe->netdevs[port - 1])))
+			update_gid_table(ibdev, port, gid,
+					 event == NETDEV_DOWN);
+
+	spin_unlock(&iboe->lock);
+	return 0;
 
-	case NETDEV_DOWN:
-		update_ipv6_gids(dev, port, 1);
-		dev->iboe.netdevs[port - 1] = NULL;
-	}
 }
 
-static void netdev_added(struct mlx4_ib_dev *dev, int port)
+static u8 mlx4_ib_get_dev_port(struct net_device *dev,
+			       struct mlx4_ib_dev *ibdev)
 {
-	update_ipv6_gids(dev, port, 0);
+	u8 port = 0;
+	struct mlx4_ib_iboe *iboe;
+	struct net_device *real_dev = rdma_vlan_dev_real_dev(dev) ?
+				rdma_vlan_dev_real_dev(dev) : dev;
+
+	iboe = &ibdev->iboe;
+	spin_lock(&iboe->lock);
+
+	for (port = 1; port <= MLX4_MAX_PORTS; ++port)
+		if ((netif_is_bond_master(real_dev) &&
+		     (real_dev == iboe->masters[port - 1])) ||
+		     (!netif_is_bond_master(real_dev) &&
+		     (real_dev == iboe->netdevs[port - 1])))
+			break;
+
+	spin_unlock(&iboe->lock);
+
+	if ((port == 0) || (port > MLX4_MAX_PORTS))
+		return 0;
+	else
+		return port;
 }
 
-static void netdev_removed(struct mlx4_ib_dev *dev, int port)
+static int mlx4_ib_inet_event(struct notifier_block *this, unsigned long event,
+				void *ptr)
 {
-	update_ipv6_gids(dev, port, 1);
+	struct mlx4_ib_dev *ibdev;
+	struct in_ifaddr *ifa = ptr;
+	union ib_gid gid;
+	struct net_device *event_netdev = ifa->ifa_dev->dev;
+
+	ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet);
+
+	mlx4_ib_addr_event(event, event_netdev, ibdev, &gid);
+	return NOTIFY_DONE;
 }
 
-static int mlx4_ib_netdev_event(struct notifier_block *this, unsigned long event,
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+static int mlx4_ib_inet6_event(struct notifier_block *this, unsigned long event,
 				void *ptr)
 {
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct mlx4_ib_dev *ibdev;
-	struct net_device *oldnd;
+	struct inet6_ifaddr *ifa = ptr;
+	union  ib_gid *gid = (union ib_gid *)&ifa->addr;
+	struct net_device *event_netdev = ifa->idev->dev;
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb_inet6);
+
+	mlx4_ib_addr_event(event, event_netdev, ibdev, gid);
+	return NOTIFY_DONE;
+}
+#endif
+
+static void mlx4_ib_get_dev_addr(struct net_device *dev,
+				 struct mlx4_ib_dev *ibdev, u8 port)
+{
+	struct in_device *in_dev;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct inet6_dev *in6_dev;
+	union ib_gid  *pgid;
+	struct inet6_ifaddr *ifp;
+#endif
+	union ib_gid gid;
+
+
+	if ((port == 0) || (port > MLX4_MAX_PORTS))
+		return;
+
+	/* IPv4 gids */
+	in_dev = in_dev_get(dev);
+	if (in_dev) {
+		for_ifa(in_dev) {
+			/*ifa->ifa_address;*/
+			ipv6_addr_set_v4mapped(ifa->ifa_address,
+					       (struct in6_addr *)&gid);
+			update_gid_table(ibdev, port, &gid, 0);
+		}
+		endfor_ifa(in_dev);
+		in_dev_put(in_dev);
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	/* IPv6 gids */
+	in6_dev = in6_dev_get(dev);
+	if (in6_dev) {
+		read_lock_bh(&in6_dev->lock);
+		list_for_each_entry(ifp, &in6_dev->addr_list, if_list) {
+			pgid = (union ib_gid *)&ifp->addr;
+			update_gid_table(ibdev, port, pgid, 0);
+		}
+		read_unlock_bh(&in6_dev->lock);
+		in6_dev_put(in6_dev);
+	}
+#endif
+}
+
+static int mlx4_ib_init_gid_table(struct mlx4_ib_dev *ibdev)
+{
+	struct	net_device *dev;
+
+	if (reset_gid_table(ibdev))
+		return -1;
+
+	read_lock(&dev_base_lock);
+
+	for_each_netdev(&init_net, dev) {
+		u8 port = mlx4_ib_get_dev_port(dev, ibdev);
+		if (port)
+			mlx4_ib_get_dev_addr(dev, ibdev, port);
+	}
+
+	read_unlock(&dev_base_lock);
+
+	return 0;
+}
+
+static void mlx4_ib_scan_netdevs(struct mlx4_ib_dev *ibdev)
+{
 	struct mlx4_ib_iboe *iboe;
 	int port;
 
-	if (!net_eq(dev_net(dev), &init_net))
-		return NOTIFY_DONE;
-
-	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb);
 	iboe = &ibdev->iboe;
 
 	spin_lock(&iboe->lock);
 	mlx4_foreach_ib_transport_port(port, ibdev->dev) {
-		oldnd = iboe->netdevs[port - 1];
+		struct net_device *old_master = iboe->masters[port - 1];
+		struct net_device *curr_master;
 		iboe->netdevs[port - 1] =
 			mlx4_get_protocol_dev(ibdev->dev, MLX4_PROT_ETH, port);
-		if (oldnd != iboe->netdevs[port - 1]) {
-			if (iboe->netdevs[port - 1])
-				netdev_added(ibdev, port);
-			else
-				netdev_removed(ibdev, port);
+
+		if (iboe->netdevs[port - 1] &&
+		    netif_is_bond_slave(iboe->netdevs[port - 1])) {
+			rtnl_lock();
+			iboe->masters[port - 1] = netdev_master_upper_dev_get(
+				iboe->netdevs[port - 1]);
+			rtnl_unlock();
 		}
-	}
+		curr_master = iboe->masters[port - 1];
 
-	if (dev == iboe->netdevs[0] ||
-	    (iboe->netdevs[0] && rdma_vlan_dev_real_dev(dev) == iboe->netdevs[0]))
-		handle_en_event(ibdev, 1, event);
-	else if (dev == iboe->netdevs[1]
-		 || (iboe->netdevs[1] && rdma_vlan_dev_real_dev(dev) == iboe->netdevs[1]))
-		handle_en_event(ibdev, 2, event);
+		/* if bonding is used it is possible that we add it to masters
+		    only after IP address is assigned to the net bonding
+		    interface */
+		if (curr_master && (old_master != curr_master))
+			mlx4_ib_get_dev_addr(curr_master, ibdev, port);
+	}
 
 	spin_unlock(&iboe->lock);
+}
+
+static int mlx4_ib_netdev_event(struct notifier_block *this,
+				unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mlx4_ib_dev *ibdev;
+
+	if (!net_eq(dev_net(dev), &init_net))
+		return NOTIFY_DONE;
+
+	ibdev = container_of(this, struct mlx4_ib_dev, iboe.nb);
+	mlx4_ib_scan_netdevs(ibdev);
 
 	return NOTIFY_DONE;
 }
@@ -1727,11 +1865,35 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	if (mlx4_ib_init_sriov(ibdev))
 		goto err_mad;
 
-	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE && !iboe->nb.notifier_call) {
-		iboe->nb.notifier_call = mlx4_ib_netdev_event;
-		err = register_netdevice_notifier(&iboe->nb);
-		if (err)
-			goto err_sriov;
+	if (dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE) {
+		if (!iboe->nb.notifier_call) {
+			iboe->nb.notifier_call = mlx4_ib_netdev_event;
+			err = register_netdevice_notifier(&iboe->nb);
+			if (err) {
+				iboe->nb.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+		if (!iboe->nb_inet.notifier_call) {
+			iboe->nb_inet.notifier_call = mlx4_ib_inet_event;
+			err = register_inetaddr_notifier(&iboe->nb_inet);
+			if (err) {
+				iboe->nb_inet.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+		if (!iboe->nb_inet6.notifier_call) {
+			iboe->nb_inet6.notifier_call = mlx4_ib_inet6_event;
+			err = register_inet6addr_notifier(&iboe->nb_inet6);
+			if (err) {
+				iboe->nb_inet6.notifier_call = NULL;
+				goto err_notif;
+			}
+		}
+#endif
+		mlx4_ib_scan_netdevs(ibdev);
+		mlx4_ib_init_gid_table(ibdev);
 	}
 
 	for (j = 0; j < ARRAY_SIZE(mlx4_class_attributes); ++j) {
@@ -1757,11 +1919,25 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	return ibdev;
 
 err_notif:
-	if (unregister_netdevice_notifier(&ibdev->iboe.nb))
-		pr_warn("failure unregistering notifier\n");
+	if (ibdev->iboe.nb.notifier_call) {
+		if (unregister_netdevice_notifier(&ibdev->iboe.nb))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb.notifier_call = NULL;
+	}
+	if (ibdev->iboe.nb_inet.notifier_call) {
+		if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet.notifier_call = NULL;
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	if (ibdev->iboe.nb_inet6.notifier_call) {
+		if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet6.notifier_call = NULL;
+	}
+#endif
 	flush_workqueue(wq);
 
-err_sriov:
 	mlx4_ib_close_sriov(ibdev);
 
 err_mad:
@@ -1803,6 +1979,18 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr)
 			pr_warn("failure unregistering notifier\n");
 		ibdev->iboe.nb.notifier_call = NULL;
 	}
+	if (ibdev->iboe.nb_inet.notifier_call) {
+		if (unregister_inetaddr_notifier(&ibdev->iboe.nb_inet))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet.notifier_call = NULL;
+	}
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	if (ibdev->iboe.nb_inet6.notifier_call) {
+		if (unregister_inet6addr_notifier(&ibdev->iboe.nb_inet6))
+			pr_warn("failure unregistering notifier\n");
+		ibdev->iboe.nb_inet6.notifier_call = NULL;
+	}
+#endif
 	iounmap(ibdev->uar_map);
 	for (p = 0; p < ibdev->num_ports; ++p)
 		if (ibdev->counters[p] != -1)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 036b663..133f41f 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -428,7 +428,10 @@ struct mlx4_ib_sriov {
 struct mlx4_ib_iboe {
 	spinlock_t		lock;
 	struct net_device      *netdevs[MLX4_MAX_PORTS];
+	struct net_device      *masters[MLX4_MAX_PORTS];
 	struct notifier_block 	nb;
+	struct notifier_block	nb_inet;
+	struct notifier_block	nb_inet6;
 	union ib_gid		gid_table[MLX4_MAX_PORTS][128];
 };
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH V5 4/8] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 3/8] IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 5/8] IB/ocrdma: " Or Gerlitz
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.

Hence, we now need to extract them from the CQE and place them in struct
ib_wc (to be used in cases where they were previously taken from the gid).

Also, when modifying a QP or building an address handle, instead of
parsing the dgid to get the MAC and VLAN, take them from the
address handle attributes.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/ah.c           |   40 +++---------
 drivers/infiniband/hw/mlx4/cq.c           |    9 +++
 drivers/infiniband/hw/mlx4/mlx4_ib.h      |    3 -
 drivers/infiniband/hw/mlx4/qp.c           |  105 ++++++++++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/port.c |   20 ++++++
 include/linux/mlx4/cq.h                   |   15 +++-
 6 files changed, 130 insertions(+), 62 deletions(-)
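
Illustration only, not part of the patch: a minimal userspace sketch of the
VLAN decoding that the cq.c hunk performs. The two mask values are copied from
this patch; decode_cqe_vlan and NO_VLAN are hypothetical names, and both inputs
are assumed to be in host byte order already (the driver converts them with
be32_to_cpu/be16_to_cpu first).

```c
#include <assert.h>	/* only needed if you add checks like the ones below */
#include <stdint.h>

/* Mask values as defined in include/linux/mlx4/cq.h by this patch. */
#define MLX4_CQE_VLAN_PRESENT_MASK	(1u << 29)
#define MLX4_CQE_VID_MASK		0xfffu

/* Hypothetical name for the "no VLAN" marker the patch stores (0xffff). */
#define NO_VLAN				0xffffu

/*
 * Hypothetical helper: return the 12-bit VLAN id carried in sl_vid when the
 * VLAN-present bit is set in vlan_my_qpn, otherwise NO_VLAN.
 * Both arguments are assumed to be in host byte order.
 */
static uint16_t decode_cqe_vlan(uint32_t vlan_my_qpn, uint16_t sl_vid)
{
	if (vlan_my_qpn & MLX4_CQE_VLAN_PRESENT_MASK)
		return (uint16_t)(sl_vid & MLX4_CQE_VID_MASK);
	return NO_VLAN;
}
```

With the present bit set and sl_vid = 0xa064, the helper returns 0x064 because
the upper four bits are masked off; with the present bit clear it returns
0xffff regardless of sl_vid.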

diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index a251bec..170dca6 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -39,25 +39,6 @@
 
 #include "mlx4_ib.h"
 
-int mlx4_ib_resolve_grh(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah_attr,
-			u8 *mac, int *is_mcast, u8 port)
-{
-	struct in6_addr in6;
-
-	*is_mcast = 0;
-
-	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof in6);
-	if (rdma_link_local_addr(&in6))
-		rdma_get_ll_mac(&in6, mac);
-	else if (rdma_is_multicast_addr(&in6)) {
-		rdma_get_mcast_mac(&in6, mac);
-		*is_mcast = 1;
-	} else
-		return -EINVAL;
-
-	return 0;
-}
-
 static struct ib_ah *create_ib_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
 				  struct mlx4_ib_ah *ah)
 {
@@ -92,21 +73,18 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
 {
 	struct mlx4_ib_dev *ibdev = to_mdev(pd->device);
 	struct mlx4_dev *dev = ibdev->dev;
-	union ib_gid sgid;
-	u8 mac[6];
-	int err;
 	int is_mcast;
+	struct in6_addr in6;
 	u16 vlan_tag;
 
-	err = mlx4_ib_resolve_grh(ibdev, ah_attr, mac, &is_mcast, ah_attr->port_num);
-	if (err)
-		return ERR_PTR(err);
-
-	memcpy(ah->av.eth.mac, mac, 6);
-	err = ib_get_cached_gid(pd->device, ah_attr->port_num, ah_attr->grh.sgid_index, &sgid);
-	if (err)
-		return ERR_PTR(err);
-	vlan_tag = rdma_get_vlan_id(&sgid);
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+	if (rdma_is_multicast_addr(&in6)) {
+		is_mcast = 1;
+		rdma_get_mcast_mac(&in6, ah->av.eth.mac);
+	} else {
+		memcpy(ah->av.eth.mac, ah_attr->dmac, ETH_ALEN);
+	}
+	vlan_tag = ah_attr->vlan_id;
 	if (vlan_tag < 0x1000)
 		vlan_tag |= (ah_attr->sl & 7) << 13;
 	ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d5e60f4..5f6113b 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -793,6 +793,15 @@ repoll:
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 13;
 		else
 			wc->sl  = be16_to_cpu(cqe->sl_vid) >> 12;
+		if (be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_VLAN_PRESENT_MASK) {
+			wc->vlan_id = be16_to_cpu(cqe->sl_vid) &
+				MLX4_CQE_VID_MASK;
+		} else {
+			wc->vlan_id = 0xffff;
+		}
+		wc->wc_flags |= IB_WC_WITH_VLAN;
+		memcpy(wc->smac, cqe->smac, ETH_ALEN);
+		wc->wc_flags |= IB_WC_WITH_SMAC;
 	}
 
 	return 0;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 133f41f..c06f571 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -678,9 +678,6 @@ int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
 int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
 			union ib_gid *gid, int netw_view);
 
-int mlx4_ib_resolve_grh(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah_attr,
-			u8 *mac, int *is_mcast, u8 port);
-
 static inline bool mlx4_ib_ah_grh_present(struct mlx4_ib_ah *ah)
 {
 	u8 port = be32_to_cpu(ah->av.ib.port_pd) >> 24 & 3;
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index da6f5fa..e0c2186 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -90,6 +90,21 @@ enum {
 	MLX4_RAW_QP_MSGMAX	= 31,
 };
 
+#ifndef ETH_ALEN
+#define ETH_ALEN        6
+#endif
+static inline u64 mlx4_mac_to_u64(u8 *addr)
+{
+	u64 mac = 0;
+	int i;
+
+	for (i = 0; i < ETH_ALEN; i++) {
+		mac <<= 8;
+		mac |= addr[i];
+	}
+	return mac;
+}
+
 static const __be32 mlx4_ib_opcode[] = {
 	[IB_WR_SEND]				= cpu_to_be32(MLX4_OPCODE_SEND),
 	[IB_WR_LSO]				= cpu_to_be32(MLX4_OPCODE_LSO),
@@ -1144,16 +1159,15 @@ static void mlx4_set_sched(struct mlx4_qp_path *path, u8 port)
 	path->sched_queue = (path->sched_queue & 0xbf) | ((port - 1) << 6);
 }
 
-static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
-			 struct mlx4_qp_path *path, u8 port)
+static int _mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
+			  u64 smac, u16 vlan_tag, struct mlx4_qp_path *path,
+			  u8 port)
 {
-	int err;
 	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port) ==
 		IB_LINK_LAYER_ETHERNET;
-	u8 mac[6];
-	int is_mcast;
-	u16 vlan_tag;
 	int vidx;
+	int smac_index;
+
 
 	path->grh_mylmc     = ah->src_path_bits & 0x7f;
 	path->rlid	    = cpu_to_be16(ah->dlid);
@@ -1188,22 +1202,27 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 		if (!(ah->ah_flags & IB_AH_GRH))
 			return -1;
 
-		err = mlx4_ib_resolve_grh(dev, ah, mac, &is_mcast, port);
-		if (err)
-			return err;
-
-		memcpy(path->dmac, mac, 6);
+		memcpy(path->dmac, ah->dmac, ETH_ALEN);
 		path->ackto = MLX4_IB_LINK_TYPE_ETH;
-		/* use index 0 into MAC table for IBoE */
-		path->grh_mylmc &= 0x80;
+		/* find the index into the MAC table for IBoE */
+		if (!is_zero_ether_addr((const u8 *)&smac)) {
+			if (mlx4_find_cached_mac(dev->dev, port, smac,
+						 &smac_index))
+				return -ENOENT;
+		} else {
+			smac_index = 0;
+		}
+
+		path->grh_mylmc &= 0x80 | smac_index;
 
-		vlan_tag = rdma_get_vlan_id(&dev->iboe.gid_table[port - 1][ah->grh.sgid_index]);
+		path->feup |= MLX4_FEUP_FORCE_ETH_UP;
 		if (vlan_tag < 0x1000) {
 			if (mlx4_find_cached_vlan(dev->dev, port, vlan_tag, &vidx))
 				return -ENOENT;
 
 			path->vlan_index = vidx;
 			path->fl = 1 << 6;
+			path->feup |= MLX4_FVL_FORCE_ETH_VLAN;
 		}
 	} else
 		path->sched_queue = MLX4_IB_DEFAULT_SCHED_QUEUE |
@@ -1212,6 +1231,28 @@ static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_ah_attr *ah,
 	return 0;
 }
 
+static int mlx4_set_path(struct mlx4_ib_dev *dev, const struct ib_qp_attr *qp,
+			 enum ib_qp_attr_mask qp_attr_mask,
+			 struct mlx4_qp_path *path, u8 port)
+{
+	return _mlx4_set_path(dev, &qp->ah_attr,
+			      mlx4_mac_to_u64((u8 *)qp->smac),
+			      (qp_attr_mask & IB_QP_VID) ? qp->vlan_id : 0xffff,
+			      path, port);
+}
+
+static int mlx4_set_alt_path(struct mlx4_ib_dev *dev,
+			     const struct ib_qp_attr *qp,
+			     enum ib_qp_attr_mask qp_attr_mask,
+			     struct mlx4_qp_path *path, u8 port)
+{
+	return _mlx4_set_path(dev, &qp->alt_ah_attr,
+			      mlx4_mac_to_u64((u8 *)qp->alt_smac),
+			      (qp_attr_mask & IB_QP_ALT_VID) ?
+			      qp->alt_vlan_id : 0xffff,
+			      path, port);
+}
+
 static void update_mcg_macs(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp)
 {
 	struct mlx4_ib_gid_entry *ge, *tmp;
@@ -1329,7 +1370,7 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	}
 
 	if (attr_mask & IB_QP_AV) {
-		if (mlx4_set_path(dev, &attr->ah_attr, &context->pri_path,
+		if (mlx4_set_path(dev, attr, attr_mask, &context->pri_path,
 				  attr_mask & IB_QP_PORT ?
 				  attr->port_num : qp->port))
 			goto out;
@@ -1352,8 +1393,8 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		    dev->dev->caps.pkey_table_len[attr->alt_port_num])
 			goto out;
 
-		if (mlx4_set_path(dev, &attr->alt_ah_attr, &context->alt_path,
-				  attr->alt_port_num))
+		if (mlx4_set_alt_path(dev, attr, attr_mask, &context->alt_path,
+				      attr->alt_port_num))
 			goto out;
 
 		context->alt_path.pkey_index = attr->alt_pkey_index;
@@ -1464,6 +1505,17 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 		context->pri_path.ackto = (context->pri_path.ackto & 0xf8) |
 					MLX4_IB_LINK_TYPE_ETH;
 
+	if (ibqp->qp_type == IB_QPT_UD && (new_state == IB_QPS_RTR)) {
+		int is_eth = rdma_port_get_link_layer(
+				&dev->ib_dev, qp->port) ==
+				IB_LINK_LAYER_ETHERNET;
+		if (is_eth) {
+			context->pri_path.ackto = MLX4_IB_LINK_TYPE_ETH;
+			optpar |= MLX4_QP_OPTPAR_PRIMARY_ADDR_PATH;
+		}
+	}
+
+
 	if (cur_state == IB_QPS_RTS && new_state == IB_QPS_SQD	&&
 	    attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY && attr->en_sqd_async_notify)
 		sqd_event = 1;
@@ -1561,18 +1613,21 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	struct mlx4_ib_qp *qp = to_mqp(ibqp);
 	enum ib_qp_state cur_state, new_state;
 	int err = -EINVAL;
-	int p = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
+	int ll;
 	mutex_lock(&qp->mutex);
 
 	cur_state = attr_mask & IB_QP_CUR_STATE ? attr->cur_qp_state : qp->state;
 	new_state = attr_mask & IB_QP_STATE ? attr->qp_state : cur_state;
 
-	if (cur_state == new_state && cur_state == IB_QPS_RESET)
-		p = IB_LINK_LAYER_UNSPECIFIED;
+	if (cur_state == new_state && cur_state == IB_QPS_RESET) {
+		ll = IB_LINK_LAYER_UNSPECIFIED;
+	} else {
+		int port = attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
+		ll = rdma_port_get_link_layer(&dev->ib_dev, port);
+	}
 
 	if (!ib_modify_qp_is_ok(cur_state, new_state, ibqp->qp_type,
-				attr_mask,
-				rdma_port_get_link_layer(&dev->ib_dev, p))) {
+				attr_mask, ll)) {
 		pr_debug("qpn 0x%x: invalid attribute mask specified "
 			 "for transition %d to %d. qp_type %d,"
 			 " attr_mask 0x%x\n",
@@ -1789,8 +1844,10 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr,
 				return err;
 		}
 
-		vlan = rdma_get_vlan_id(&sgid);
-		is_vlan = vlan < 0x1000;
+		if (ah->av.eth.vlan != 0xffff) {
+			vlan = be16_to_cpu(ah->av.eth.vlan) & 0x0fff;
+			is_vlan = 1;
+		}
 	}
 	ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh, 0, &sqp->ud_header);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 946e0af..fbe76f9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -123,6 +123,26 @@ static int mlx4_set_port_mac_table(struct mlx4_dev *dev, u8 port,
 	return err;
 }
 
+int mlx4_find_cached_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *idx)
+{
+	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
+	struct mlx4_mac_table *table = &info->mac_table;
+	int i;
+
+	for (i = 0; i < MLX4_MAX_MAC_NUM; i++) {
+		if (!table->refs[i])
+			continue;
+
+		if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) {
+			*idx = i;
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+EXPORT_SYMBOL_GPL(mlx4_find_cached_mac);
+
 int __mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac)
 {
 	struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h
index 98fa492..e186299 100644
--- a/include/linux/mlx4/cq.h
+++ b/include/linux/mlx4/cq.h
@@ -34,6 +34,7 @@
 #define MLX4_CQ_H
 
 #include <linux/types.h>
+#include <uapi/linux/if_ether.h>
 
 #include <linux/mlx4/device.h>
 #include <linux/mlx4/doorbell.h>
@@ -43,10 +44,15 @@ struct mlx4_cqe {
 	__be32			immed_rss_invalid;
 	__be32			g_mlpath_rqpn;
 	__be16			sl_vid;
-	__be16			rlid;
-	__be16			status;
-	u8			ipv6_ext_mask;
-	u8			badfcs_enc;
+	union {
+		struct {
+			__be16	rlid;
+			__be16  status;
+			u8      ipv6_ext_mask;
+			u8      badfcs_enc;
+		};
+		u8  smac[ETH_ALEN];
+	};
 	__be32			byte_cnt;
 	__be16			wqe_index;
 	__be16			checksum;
@@ -83,6 +89,7 @@ struct mlx4_ts_cqe {
 enum {
 	MLX4_CQE_VLAN_PRESENT_MASK	= 1 << 29,
 	MLX4_CQE_QPN_MASK		= 0xffffff,
+	MLX4_CQE_VID_MASK		= 0xfff,
 };
 
 enum {
-- 
1.7.1


* [PATCH V5 5/8] IB/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 4/8] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 6/8] IB/ocrdma: Populate GID table with IP based gids Or Gerlitz
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Naresh Gottumukkala, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch is similar in spirit to the "IB/mlx4: Handle Ethernet L2 parameters for
IP based GID addressing" patch. It handles the fact that IP based RoCE gids
don't store the Ethernet L2 parameters, MAC and VLAN.

When building an address handle, instead of parsing the dgid to
get the MAC and VLAN, take them from the address handle attributes.

Cc: Naresh Gottumukkala <bgottumukkala-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h    |   12 ++++++++++++
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |    5 +++--
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c |   21 ++-------------------
 drivers/infiniband/hw/ocrdma/ocrdma_hw.h |    1 -
 4 files changed, 17 insertions(+), 22 deletions(-)
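
Illustration only, not part of the patch: a userspace sketch of the destination
MAC selection that ocrdma_resolve_dmac() performs. resolve_dmac and GID_LEN are
hypothetical names. The multicast test (first GID byte is 0xff) and the 33:33
mapping are assumptions about what rdma_is_multicast_addr() and
rdma_get_mcast_mac() do, and are reimplemented here rather than called.

```c
#include <assert.h>	/* only needed if you add checks like the ones below */
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define GID_LEN		16	/* hypothetical name: size of a GID in bytes */

/*
 * Hypothetical standalone analogue of ocrdma_resolve_dmac(): dgid is the
 * 16-byte destination GID, attr_dmac the MAC carried in the address handle
 * attributes, and out receives the MAC to place in the Ethernet header.
 */
static void resolve_dmac(const uint8_t dgid[GID_LEN],
			 const uint8_t attr_dmac[ETH_ALEN],
			 uint8_t out[ETH_ALEN])
{
	if (dgid[0] == 0xff) {
		/* Multicast GID: 33:33 followed by the low 32 bits of the GID. */
		out[0] = 0x33;
		out[1] = 0x33;
		memcpy(&out[2], &dgid[12], 4);
	} else {
		/* Unicast: the MAC was resolved earlier and is passed in. */
		memcpy(out, attr_dmac, ETH_ALEN);
	}
}
```

The point of the sketch is the unicast branch: with IP based gids the MAC can
no longer be derived from the dgid, so it has to arrive through the address
handle attributes.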

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index 294dd27..7c001b9 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -423,5 +423,17 @@ static inline int is_cqe_wr_imm(struct ocrdma_cqe *cqe)
 		OCRDMA_CQE_WRITE_IMM) ? 1 : 0;
 }
 
+static inline int ocrdma_resolve_dmac(struct ocrdma_dev *dev,
+		struct ib_ah_attr *ah_attr, u8 *mac_addr)
+{
+	struct in6_addr in6;
+
+	memcpy(&in6, ah_attr->grh.dgid.raw, sizeof(in6));
+	if (rdma_is_multicast_addr(&in6))
+		rdma_get_mcast_mac(&in6, mac_addr);
+	else
+		memcpy(mac_addr, ah_attr->dmac, ETH_ALEN);
+	return 0;
+}
 
 #endif
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index ee499d9..bbb7962 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -49,7 +49,7 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 
 	ah->sgid_index = attr->grh.sgid_index;
 
-	vlan_tag = rdma_get_vlan_id(&attr->grh.dgid);
+	vlan_tag = attr->vlan_id;
 	if (!vlan_tag || (vlan_tag > 0xFFF))
 		vlan_tag = dev->pvid;
 	if (vlan_tag && (vlan_tag < 0x1000)) {
@@ -64,7 +64,8 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
 		eth_sz = sizeof(struct ocrdma_eth_basic);
 	}
 	memcpy(&eth.smac[0], &dev->nic_info.mac_addr[0], ETH_ALEN);
-	status = ocrdma_resolve_dgid(dev, &attr->grh.dgid, &eth.dmac[0]);
+	memcpy(&eth.dmac[0], attr->dmac, ETH_ALEN);
+	status = ocrdma_resolve_dmac(dev, attr, &eth.dmac[0]);
 	if (status)
 		return status;
 	status = ocrdma_query_gid(&dev->ibdev, 1, attr->grh.sgid_index,
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
index 56bf32f..1664d64 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.c
@@ -2076,23 +2076,6 @@ mbx_err:
 	return status;
 }
 
-int ocrdma_resolve_dgid(struct ocrdma_dev *dev, union ib_gid *dgid,
-			u8 *mac_addr)
-{
-	struct in6_addr in6;
-
-	memcpy(&in6, dgid, sizeof in6);
-	if (rdma_is_multicast_addr(&in6)) {
-		rdma_get_mcast_mac(&in6, mac_addr);
-	} else if (rdma_link_local_addr(&in6)) {
-		rdma_get_ll_mac(&in6, mac_addr);
-	} else {
-		pr_err("%s() fail to resolve mac_addr.\n", __func__);
-		return -EINVAL;
-	}
-	return 0;
-}
-
 static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 				struct ocrdma_modify_qp *cmd,
 				struct ib_qp_attr *attrs)
@@ -2126,14 +2109,14 @@ static int ocrdma_set_av_params(struct ocrdma_qp *qp,
 
 	qp->sgid_idx = ah_attr->grh.sgid_index;
 	memcpy(&cmd->params.sgid[0], &sgid.raw[0], sizeof(cmd->params.sgid));
-	ocrdma_resolve_dgid(qp->dev, &ah_attr->grh.dgid, &mac_addr[0]);
+	ocrdma_resolve_dmac(qp->dev, ah_attr, &mac_addr[0]);
 	cmd->params.dmac_b0_to_b3 = mac_addr[0] | (mac_addr[1] << 8) |
 				(mac_addr[2] << 16) | (mac_addr[3] << 24);
 	/* convert them to LE format. */
 	ocrdma_cpu_to_le32(&cmd->params.dgid[0], sizeof(cmd->params.dgid));
 	ocrdma_cpu_to_le32(&cmd->params.sgid[0], sizeof(cmd->params.sgid));
 	cmd->params.vlan_dmac_b4_to_b5 = mac_addr[4] | (mac_addr[5] << 8);
-	vlan_id = rdma_get_vlan_id(&sgid);
+	vlan_id = ah_attr->vlan_id;
 	if (vlan_id && (vlan_id < 0x1000)) {
 		cmd->params.vlan_dmac_b4_to_b5 |=
 		    vlan_id << OCRDMA_QP_PARAMS_VLAN_SHIFT;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_hw.h b/drivers/infiniband/hw/ocrdma/ocrdma_hw.h
index f2a89d4..82fe332 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_hw.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_hw.h
@@ -94,7 +94,6 @@ void ocrdma_ring_cq_db(struct ocrdma_dev *, u16 cq_id, bool armed,
 int ocrdma_mbx_get_link_speed(struct ocrdma_dev *dev, u8 *lnk_speed);
 int ocrdma_query_config(struct ocrdma_dev *,
 			struct ocrdma_mbx_query_config *config);
-int ocrdma_resolve_dgid(struct ocrdma_dev *, union ib_gid *dgid, u8 *mac_addr);
 
 int ocrdma_mbx_alloc_pd(struct ocrdma_dev *, struct ocrdma_pd *);
 int ocrdma_mbx_dealloc_pd(struct ocrdma_dev *, struct ocrdma_pd *);
-- 
1.7.1


* [PATCH V5 6/8] IB/ocrdma: Populate GID table with IP based gids
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 5/8] IB/ocrdma: " Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-13 22:29   ` [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP Or Gerlitz
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Naresh Gottumukkala, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

This patch is similar in spirit to the "IB/mlx4: Use IBoE (RoCE) IP based GIDs
in the port GID table" patch.

Changes to the host's inet4 and inet6 addresses are monitored, and if the
address is associated with an ocrdma device, a gid is added to or deleted
from the device's gid table. The gid is either the IPv4-mapped IPv6 form of
an IPv4 address or the IPv6 address itself.

Cc: Naresh Gottumukkala <bgottumukkala-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  138 ++++++++-------------------
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |    2 +-
 2 files changed, 41 insertions(+), 99 deletions(-)
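
Illustration only, not part of the patch: a userspace sketch of the IPv4-mapped
gid layout (::ffff:a.b.c.d) that the inetaddr notifier produces through
ipv6_addr_set_v4mapped(). gid_from_ipv4 is a hypothetical name, and the IPv4
address is assumed to be supplied as four bytes in network order.

```c
#include <assert.h>	/* only needed if you add checks like the ones below */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build the 16-byte IPv4-mapped IPv6 form
 * ::ffff:a.b.c.d from an IPv4 address given as four bytes in network order.
 */
static void gid_from_ipv4(const uint8_t ip4[4], uint8_t gid[16])
{
	memset(gid, 0, 10);		/* bytes 0..9 are zero */
	gid[10] = 0xff;			/* bytes 10..11 are the ffff marker */
	gid[11] = 0xff;
	memcpy(&gid[12], ip4, 4);	/* bytes 12..15 carry the IPv4 address */
}
```

For 192.168.1.10 this gives the sixteen bytes
00 00 00 00 00 00 00 00 00 00 ff ff c0 a8 01 0a; an IPv6 address needs no
conversion and is used as the gid unchanged.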

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 91443bc..47187bf 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -67,46 +67,24 @@ void ocrdma_get_guid(struct ocrdma_dev *dev, u8 *guid)
 	guid[7] = mac_addr[5];
 }
 
-static void ocrdma_build_sgid_mac(union ib_gid *sgid, unsigned char *mac_addr,
-				  bool is_vlan, u16 vlan_id)
-{
-	sgid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
-	sgid->raw[8] = mac_addr[0] ^ 2;
-	sgid->raw[9] = mac_addr[1];
-	sgid->raw[10] = mac_addr[2];
-	if (is_vlan) {
-		sgid->raw[11] = vlan_id >> 8;
-		sgid->raw[12] = vlan_id & 0xff;
-	} else {
-		sgid->raw[11] = 0xff;
-		sgid->raw[12] = 0xfe;
-	}
-	sgid->raw[13] = mac_addr[3];
-	sgid->raw[14] = mac_addr[4];
-	sgid->raw[15] = mac_addr[5];
-}
-
-static bool ocrdma_add_sgid(struct ocrdma_dev *dev, unsigned char *mac_addr,
-			    bool is_vlan, u16 vlan_id)
+static bool ocrdma_add_sgid(struct ocrdma_dev *dev, union ib_gid *new_sgid)
 {
 	int i;
-	union ib_gid new_sgid;
 	unsigned long flags;
 
 	memset(&ocrdma_zero_sgid, 0, sizeof(union ib_gid));
 
-	ocrdma_build_sgid_mac(&new_sgid, mac_addr, is_vlan, vlan_id);
 
 	spin_lock_irqsave(&dev->sgid_lock, flags);
 	for (i = 0; i < OCRDMA_MAX_SGID; i++) {
 		if (!memcmp(&dev->sgid_tbl[i], &ocrdma_zero_sgid,
 			    sizeof(union ib_gid))) {
 			/* found free entry */
-			memcpy(&dev->sgid_tbl[i], &new_sgid,
+			memcpy(&dev->sgid_tbl[i], new_sgid,
 			       sizeof(union ib_gid));
 			spin_unlock_irqrestore(&dev->sgid_lock, flags);
 			return true;
-		} else if (!memcmp(&dev->sgid_tbl[i], &new_sgid,
+		} else if (!memcmp(&dev->sgid_tbl[i], new_sgid,
 				   sizeof(union ib_gid))) {
 			/* entry already present, no addition is required. */
 			spin_unlock_irqrestore(&dev->sgid_lock, flags);
@@ -117,20 +95,17 @@ static bool ocrdma_add_sgid(struct ocrdma_dev *dev, unsigned char *mac_addr,
 	return false;
 }
 
-static bool ocrdma_del_sgid(struct ocrdma_dev *dev, unsigned char *mac_addr,
-			    bool is_vlan, u16 vlan_id)
+static bool ocrdma_del_sgid(struct ocrdma_dev *dev, union ib_gid *sgid)
 {
 	int found = false;
 	int i;
-	union ib_gid sgid;
 	unsigned long flags;
 
-	ocrdma_build_sgid_mac(&sgid, mac_addr, is_vlan, vlan_id);
 
 	spin_lock_irqsave(&dev->sgid_lock, flags);
 	/* first is default sgid, which cannot be deleted. */
 	for (i = 1; i < OCRDMA_MAX_SGID; i++) {
-		if (!memcmp(&dev->sgid_tbl[i], &sgid, sizeof(union ib_gid))) {
+		if (!memcmp(&dev->sgid_tbl[i], sgid, sizeof(union ib_gid))) {
 			/* found matching entry */
 			memset(&dev->sgid_tbl[i], 0, sizeof(union ib_gid));
 			found = true;
@@ -141,75 +116,18 @@ static bool ocrdma_del_sgid(struct ocrdma_dev *dev, unsigned char *mac_addr,
 	return found;
 }
 
-static void ocrdma_add_default_sgid(struct ocrdma_dev *dev)
-{
-	/* GID Index 0 - Invariant manufacturer-assigned EUI-64 */
-	union ib_gid *sgid = &dev->sgid_tbl[0];
-
-	sgid->global.subnet_prefix = cpu_to_be64(0xfe80000000000000LL);
-	ocrdma_get_guid(dev, &sgid->raw[8]);
-}
-
-#if IS_ENABLED(CONFIG_VLAN_8021Q)
-static void ocrdma_add_vlan_sgids(struct ocrdma_dev *dev)
-{
-	struct net_device *netdev, *tmp;
-	u16 vlan_id;
-	bool is_vlan;
-
-	netdev = dev->nic_info.netdev;
-
-	rcu_read_lock();
-	for_each_netdev_rcu(&init_net, tmp) {
-		if (netdev == tmp || vlan_dev_real_dev(tmp) == netdev) {
-			if (!netif_running(tmp) || !netif_oper_up(tmp))
-				continue;
-			if (netdev != tmp) {
-				vlan_id = vlan_dev_vlan_id(tmp);
-				is_vlan = true;
-			} else {
-				is_vlan = false;
-				vlan_id = 0;
-				tmp = netdev;
-			}
-			ocrdma_add_sgid(dev, tmp->dev_addr, is_vlan, vlan_id);
-		}
-	}
-	rcu_read_unlock();
-}
-#else
-static void ocrdma_add_vlan_sgids(struct ocrdma_dev *dev)
-{
-
-}
-#endif /* VLAN */
-
-static int ocrdma_build_sgid_tbl(struct ocrdma_dev *dev)
+static int ocrdma_addr_event(unsigned long event, struct net_device *netdev,
+			     union ib_gid *gid)
 {
-	ocrdma_add_default_sgid(dev);
-	ocrdma_add_vlan_sgids(dev);
-	return 0;
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-
-static int ocrdma_inet6addr_event(struct notifier_block *notifier,
-				  unsigned long event, void *ptr)
-{
-	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
-	struct net_device *netdev = ifa->idev->dev;
 	struct ib_event gid_event;
 	struct ocrdma_dev *dev;
 	bool found = false;
 	bool updated = false;
 	bool is_vlan = false;
-	u16 vid = 0;
 
 	is_vlan = netdev->priv_flags & IFF_802_1Q_VLAN;
-	if (is_vlan) {
-		vid = vlan_dev_vlan_id(netdev);
+	if (is_vlan)
 		netdev = vlan_dev_real_dev(netdev);
-	}
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(dev, &ocrdma_dev_list, entry) {
@@ -222,16 +140,14 @@ static int ocrdma_inet6addr_event(struct notifier_block *notifier,
 
 	if (!found)
 		return NOTIFY_DONE;
-	if (!rdma_link_local_addr((struct in6_addr *)&ifa->addr))
-		return NOTIFY_DONE;
 
 	mutex_lock(&dev->dev_lock);
 	switch (event) {
 	case NETDEV_UP:
-		updated = ocrdma_add_sgid(dev, netdev->dev_addr, is_vlan, vid);
+		updated = ocrdma_add_sgid(dev, gid);
 		break;
 	case NETDEV_DOWN:
-		updated = ocrdma_del_sgid(dev, netdev->dev_addr, is_vlan, vid);
+		updated = ocrdma_del_sgid(dev, gid);
 		break;
 	default:
 		break;
@@ -247,6 +163,32 @@ static int ocrdma_inet6addr_event(struct notifier_block *notifier,
 	return NOTIFY_OK;
 }
 
+static int ocrdma_inetaddr_event(struct notifier_block *notifier,
+				  unsigned long event, void *ptr)
+{
+	struct in_ifaddr *ifa = ptr;
+	union ib_gid gid;
+	struct net_device *netdev = ifa->ifa_dev->dev;
+
+	ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
+	return ocrdma_addr_event(event, netdev, &gid);
+}
+
+static struct notifier_block ocrdma_inetaddr_notifier = {
+	.notifier_call = ocrdma_inetaddr_event
+};
+
+#if IS_ENABLED(CONFIG_IPV6)
+
+static int ocrdma_inet6addr_event(struct notifier_block *notifier,
+				  unsigned long event, void *ptr)
+{
+	struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
+	union ib_gid *gid = (union ib_gid *)&ifa->addr;
+	struct net_device *netdev = ifa->idev->dev;
+	return ocrdma_addr_event(event, netdev, gid);
+}
+
 static struct notifier_block ocrdma_inet6addr_notifier = {
 	.notifier_call = ocrdma_inet6addr_event
 };
@@ -423,10 +365,6 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
 	if (status)
 		goto alloc_err;
 
-	status = ocrdma_build_sgid_tbl(dev);
-	if (status)
-		goto alloc_err;
-
 	status = ocrdma_register_device(dev);
 	if (status)
 		goto alloc_err;
@@ -553,6 +491,10 @@ static int __init ocrdma_init_module(void)
 {
 	int status;
 
+	status = register_inetaddr_notifier(&ocrdma_inetaddr_notifier);
+	if (status)
+		return status;
+
 #if IS_ENABLED(CONFIG_IPV6)
 	status = register_inet6addr_notifier(&ocrdma_inet6addr_notifier);
 	if (status)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index a0f1c47..aa92f40 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -1327,7 +1327,7 @@ int ocrdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	spin_unlock_irqrestore(&qp->q_lock, flags);
 
 	if (!ib_modify_qp_is_ok(old_qps, new_qps, ibqp->qp_type, attr_mask,
-				IB_LINK_LAYER_UNSPECIFIED)) {
+				IB_LINK_LAYER_ETHERNET)) {
 		pr_err("%s(%d) invalid attribute mask=0x%x specified for\n"
 		       "qpn=0x%x of type=0x%x old_qps=0x%x, new_qps=0x%x\n",
 		       __func__, dev->id, attr_mask, qp->id, ibqp->qp_type,
-- 
1.7.1
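
For readers following the GID change in this patch: below is a minimal user-space sketch, assuming a plain 16-byte GID layout, of how an IPv4 address becomes an IPv4-mapped IPv6 GID (::ffff:a.b.c.d), which is what the ipv6_addr_set_v4mapped() call in ocrdma_inetaddr_event() amounts to. struct gid_sketch and both helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's union ib_gid: 16 raw bytes. */
struct gid_sketch {
	uint8_t raw[16];
};

/*
 * Store an IPv4 address (network byte order) as an IPv4-mapped IPv6
 * address: ten zero bytes, then 0xffff, then the four IPv4 bytes.
 */
static void gid_from_ipv4(uint32_t ipv4_be, struct gid_sketch *gid)
{
	assert(gid != NULL);
	memset(gid->raw, 0, 10);
	gid->raw[10] = 0xff;
	gid->raw[11] = 0xff;
	memcpy(&gid->raw[12], &ipv4_be, 4);
}

/* An IPv6 address is already 16 bytes wide, so it is copied unchanged. */
static void gid_from_ipv6(const struct in6_addr *addr, struct gid_sketch *gid)
{
	assert(addr != NULL && gid != NULL);
	memcpy(gid->raw, addr->s6_addr, 16);
}
```

With this layout both address families share one GID table format, so a single add/delete path (as in ocrdma_addr_event() above) can serve both notifiers.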

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 6/8] IB/ocrdma: Populate GID table with IP based gids Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
       [not found]     ` <1384381792-2023-8-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-11-13 22:29   ` [PATCH V5 8/8] mlx4_en: Avoid setting netdevice dev_id to port number Or Gerlitz
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Moni Shoua, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

Existing user space applications provide only IBoE L3 address attributes
to the kernel when they issue QP modify. To stay compatible with them and
let such apps keep working transparently under the IBoE GID IP addressing
changes, add Eth L2 address resolution in the user-kernel linking piece,
uverbs.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
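
Note for readers: the link-local branch of the hunk below derives the MACs and the VLAN id directly from the GIDs, while the other branch has to resolve them from IP based GIDs. The following is a minimal user-space sketch of the link-local case, modeled on my understanding of the rdma_link_local_addr(), rdma_get_ll_mac() and rdma_get_vlan_id() helpers the patch calls. It is not a copy of them, and all names ending in _sketch, as well as the ll_gid_* helpers, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NO_VLAN_SKETCH 0xffff	/* sentinel: GID carries no VLAN id */

/* Build a legacy MAC/VLAN based link-local GID: fe80::/64 + interface id. */
static void ll_gid_from_mac(const uint8_t *mac, uint16_t vid, uint8_t *gid)
{
	assert(mac != NULL && gid != NULL);
	memset(gid, 0, 16);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	memcpy(&gid[8], mac, 3);
	gid[8] ^= 2;			/* flip the universal/local bit */
	if (vid < 0x1000) {		/* valid VLAN id replaces ff:fe */
		gid[11] = (uint8_t)(vid >> 8);
		gid[12] = (uint8_t)(vid & 0xff);
	} else {
		gid[11] = 0xff;
		gid[12] = 0xfe;
	}
	memcpy(&gid[13], mac + 3, 3);
}

/* True if the first 8 bytes are fe80:0000:0000:0000. */
static int gid_is_link_local(const uint8_t *gid)
{
	int i;

	if (gid[0] != 0xfe || gid[1] != 0x80)
		return 0;
	for (i = 2; i < 8; i++)
		if (gid[i] != 0)
			return 0;
	return 1;
}

/* Recover the MAC address embedded in a link-local GID. */
static void ll_gid_to_mac(const uint8_t *gid, uint8_t *mac)
{
	memcpy(mac, &gid[8], 3);
	memcpy(mac + 3, &gid[13], 3);
	mac[0] ^= 2;			/* undo the universal/local flip */
}

/* Recover the VLAN id, or NO_VLAN_SKETCH if bytes 11..12 are not one. */
static uint16_t ll_gid_to_vlan(const uint8_t *gid)
{
	uint16_t vid = (uint16_t)((gid[11] << 8) | gid[12]);

	return vid < 0x1000 ? vid : NO_VLAN_SKETCH;
}
```

The 0xffff sentinel is the reason the hunk sets IB_QP_VID only when attr->vlan_id < 0xFFFF. An IP based GID carries none of this information, which is why the else branch has to look the MACs up instead.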
 drivers/infiniband/core/uverbs_cmd.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 5bb2a82..74242b9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -36,8 +36,10 @@
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/in6.h>
 
 #include <asm/uaccess.h>
+#include <rdma/ib_addr.h>
 
 #include "uverbs.h"
 
@@ -1911,6 +1913,7 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 	struct ib_qp              *qp;
 	struct ib_qp_attr         *attr;
 	int                        ret;
+	union ib_gid		   sgid;
 
 	if (copy_from_user(&cmd, buf, sizeof cmd))
 		return -EFAULT;
@@ -1974,6 +1977,30 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
 	attr->alt_ah_attr.ah_flags 	    = cmd.alt_dest.is_global ? IB_AH_GRH : 0;
 	attr->alt_ah_attr.port_num 	    = cmd.alt_dest.port_num;
 
+	if ((cmd.attr_mask & IB_QP_AV)  &&
+	    (rdma_port_get_link_layer(qp->device, attr->ah_attr.port_num) == IB_LINK_LAYER_ETHERNET)) {
+		ret = ib_query_gid(qp->device, attr->ah_attr.port_num,
+				   attr->ah_attr.grh.sgid_index, &sgid);
+		if (ret)
+			goto out;
+		if (rdma_link_local_addr((struct in6_addr *)attr->ah_attr.grh.dgid.raw)) {
+			rdma_get_ll_mac((struct in6_addr *)attr->ah_attr.grh.dgid.raw, attr->ah_attr.dmac);
+			rdma_get_ll_mac((struct in6_addr *)sgid.raw, attr->smac);
+			attr->vlan_id = rdma_get_vlan_id(&sgid);
+		} else {
+			ret = rdma_addr_find_dmac_by_grh(&sgid, &attr->ah_attr.grh.dgid,
+					attr->ah_attr.dmac, &attr->vlan_id);
+			if (ret)
+				goto out;
+			ret = rdma_addr_find_smac_by_sgid(&sgid, attr->smac, NULL);
+			if (ret)
+				goto out;
+		}
+		cmd.attr_mask |= IB_QP_SMAC;
+		if (attr->vlan_id < 0xFFFF)
+			cmd.attr_mask |= IB_QP_VID;
+	}
+
 	if (qp->real_qp == qp) {
 		ret = qp->device->modify_qp(qp, attr,
 			modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH V5 8/8] mlx4_en: Avoid setting netdevice dev_id to port number
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP Or Gerlitz
@ 2013-11-13 22:29   ` Or Gerlitz
  2013-11-16 19:52   ` [PATCH V5 0/8] IP based RoCE GID Addressing Or Gerlitz
  2013-11-22 10:29   ` Somnath Kotur
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-13 22:29 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w,
	Moni Shoua, Narendra K, Or Gerlitz

From: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>

The port number should not be stored in dev_id.

The netdevice dev_id field is intended to differentiate between multiple
devices which share the same MAC address. Moreover, setting it makes the
kernel assign a wrong link-local IPv6 address to mlx4_en netdevices.

Signed-off-by: Narendra K <narendra_k-8PEkshWhKlo@public.gmane.org>
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
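
Note for readers: a minimal user-space sketch of why a non-zero dev_id changes the link-local address. It is modeled on my understanding of how the kernel builds the IPv6 interface identifier from an Ethernet MAC (dev_id, when set, replaces the ff:fe filler and the universal/local bit is left alone). Treat it as an illustration rather than the kernel code; ifid_from_mac() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Derive the 8-byte IPv6 interface identifier from a 6-byte MAC address.
 * dev_id == 0: modified EUI-64 (insert ff:fe, flip the universal/local bit).
 * dev_id != 0: the 16-bit dev_id replaces ff:fe and the bit is not flipped.
 */
static void ifid_from_mac(const uint8_t *mac, uint16_t dev_id, uint8_t *eui)
{
	assert(mac != NULL && eui != NULL);
	memcpy(eui, mac, 3);
	memcpy(eui + 5, mac + 3, 3);
	if (dev_id) {
		eui[3] = (uint8_t)((dev_id >> 8) & 0xff);
		eui[4] = (uint8_t)(dev_id & 0xff);
	} else {
		eui[3] = 0xff;
		eui[4] = 0xfe;
		eui[0] ^= 2;
	}
}
```

With dev_id = port - 1, the netdevice of port 2 gets dev_id 1, so under the assumption above its interface identifier no longer has the usual ff:fe form, while port 1 (dev_id 0) is unaffected.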
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index fa37b7a..b8dbb1a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2191,7 +2191,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	netif_set_real_num_rx_queues(dev, prof->rx_ring_num);
 
 	SET_NETDEV_DEV(dev, &mdev->dev->pdev->dev);
-	dev->dev_id =  port - 1;
 
 	/*
 	 * Initialize driver private data
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 0/8] IP based RoCE GID Addressing
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2013-11-13 22:29   ` [PATCH V5 8/8] mlx4_en: Avoid setting netdevice dev_id to port number Or Gerlitz
@ 2013-11-16 19:52   ` Or Gerlitz
  2013-11-22 10:29   ` Somnath Kotur
  9 siblings, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-16 19:52 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma, Moni Shoua, Matan Barak, Tzahi Oved

On Thu, Nov 14, 2013 at 12:29 AM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> changes from V4:
>
>  - addressed feedback re the need to be compatible with non modified user
>    space applications/libraries, by adding code in uverbs which does address
>    resolution when dealing with Ethernet ports. This is patch #7

Does this address your concerns on compatibility so we can move
forward and merge the patches for 3.13?

>
>  - removed the patches that deal with uverbs extended commands, they will
>    added later on, such that new applications/libraries can be coded to them.
>
>  - added patch fixing mlx4_en to have correct IPv6 link local address.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]     ` <1384381792-2023-8-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-19 18:08       ` Roland Dreier
       [not found]         ` <CAG4TOxOUFfPMU+q1yKy6S7v3QiLVgMQwxNC-_vw-7UeUb7LoBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2013-11-19 18:08 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb,
	tzahio-VPRAkNaXOzVWk0Htik3J/w, Moni Shoua

> Existing user space applications provide only IBoE L3 address attributes
> to the kernel when they issue QP modify. To comply with them and let such
> apps to keep work transparently under the IBoE GID IP addressing changes,
> added Eth L2 address resolution in the user-kernel linking piece - uverbs.

I don't get why this belongs in uverbs.  In the current design, uverbs
serves as a transport between userspace and kernel, and the kernel verbs
are the same as the user verbs.  The only exception to this that
introduces complexity is the stuff related to sharing XRCs, and that
makes sense because multiple processes etc. is definitely a
userspace-only concern.

However in this case I don't see why address resolution is something
only userspace cares about.  Wouldn't it make sense to put this
resolution in core verbs?

 - R.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]         ` <CAG4TOxOUFfPMU+q1yKy6S7v3QiLVgMQwxNC-_vw-7UeUb7LoBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-19 20:48           ` Or Gerlitz
       [not found]             ` <CAJZOPZ+5UJCAObzFjK5w=bnnSjRM102vR5Ft1nd4nO48Lr2HOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Or Gerlitz @ 2013-11-19 20:48 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb,
	Tzahi Oved, Moni Shoua

On Tue, Nov 19, 2013 at 8:08 PM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> Existing user space applications provide only IBoE L3 address attributes
>> to the kernel when they issue QP modify. To comply with them and let such
>> apps to keep work transparently under the IBoE GID IP addressing changes,
>> added Eth L2 address resolution in the user-kernel linking piece - uverbs.

> I don't get why this belongs in uverbs.  In the current design serves
> as a transport between userspace and kernel and the kernel verbs are
> the same as user verbs.  The only exception to this that introduces
> complexity is the stuff related to sharing XRCs and that makes sense
> because multiple processes etc. is definitely a userspace-only concern.

> However in this case I don't see why address resolution is something
> only userspace cares about.  Wouldn't it make sense to put this
> resolution in core verbs?

Basically, we've put it into uverbs because for kernel consumers that use
the rdma-cm the problem doesn't exist: the Ethernet L2 attributes are
filled into the QP attributes used by the rdma-cm during the address
resolution process.

Since there are currently no in-tree non-rdma-cm consumer ULPs that are
applicable to RoCE, the kernel side is a done deal in that respect.

If it helps and/or makes more sense, sure, we can move the resolution to
be done in the core verbs, e.g. in core/verbs.c :: ib_modify_qp. Anything
else except for this feedback?

Or.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]             ` <CAJZOPZ+5UJCAObzFjK5w=bnnSjRM102vR5Ft1nd4nO48Lr2HOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-20  4:09               ` Devesh Sharma
       [not found]                 ` <EE7902D3F51F404C82415C4803930ACD3FD8348A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
  2013-12-11 17:45               ` Or Gerlitz
  1 sibling, 1 reply; 19+ messages in thread
From: Devesh Sharma @ 2013-11-20  4:09 UTC (permalink / raw)
  To: Or Gerlitz, Roland Dreier
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb,
	Tzahi Oved, Moni Shoua

Hi Roland,

I agree with Or. RDMA-CM takes care of resolving L2 addresses for kernel ULPs, and there are no _non-rdma-cm_ ULPs right now in the IB kernel stack. On the other hand, for user-space applications which use RDMA-CM, V5 is a simplified approach. All the patches <= V4 were an effort to change RDMA-CM, and they changed the entire user/kernel interface of rdma-cm and verbs; this was not acceptable.

With the V5 patch set I still have a concern about user apps which do not use rdma-cm (e.g. ib_send_bw without the -R option): how will DMAC and SMAC be resolved?

However, with the approach of moving address resolution to core verbs, i.e. ib_modify_qp() and ib_create_ah(), the vendor driver will have the freedom to resolve L2 addresses in its own way.
ib_init_ah_from_wc() would still need changes. These changes will be the same as those done in the V5 patch set. This approach would also solve the "ib_send_bw without -R option" issue.

-Regards
 Devesh


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]                 ` <EE7902D3F51F404C82415C4803930ACD3FD8348A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2013-11-20  7:15                   ` Somnath Kotur
       [not found]                     ` <dad4986f-4b34-47dc-b6bb-b4882ad1405a-3RiH6ntJJkOPfaB/Gd0HpljyZtpTMMwT@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Somnath Kotur @ 2013-11-20  7:15 UTC (permalink / raw)
  To: Devesh Sharma, Or Gerlitz, Roland Dreier
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb,
	Tzahi Oved, Moni Shoua


> I agree with Or. RDMA-CM takes care of resolving l2 addresses for kernel ULP
> and there are no _non-rdma-cm_ ULP right now in the IB-kernel-stack. On
> the other hand, for the user-space applications, which uses RDMA-CM V5 is a
> simplified approach. All the patches =< v4 was an effort to change RDMA-CM
> and it changed entire user/kernel interface of rdma-cm and verbs, this was
> not acceptable.

Agree as well.

> With V5 patch-set I still have a concern about those user apps which does not
> use rdma-cm (e.g. ib_send_bw without -R option) how DMAC and SMAC will
> be resolved?
> 
> However, in the approach to move address resolution to core verbs i.e.
> ib_modify_qp() and ib_create_ah(), vendor driver will have freedom to
> resolve l2 addresses in its own way.

I am not sure it's a good idea for each vendor driver to implement L2 address resolution in its own way.
Not sure if that was the intent behind Roland's statement?

Thanks
Som



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]                     ` <dad4986f-4b34-47dc-b6bb-b4882ad1405a-3RiH6ntJJkOPfaB/Gd0HpljyZtpTMMwT@public.gmane.org>
@ 2013-11-20 10:07                       ` Or Gerlitz
  2013-11-20 10:08                       ` Or Gerlitz
  1 sibling, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-11-20 10:07 UTC (permalink / raw)
  To: Somnath Kotur, Devesh Sharma, Or Gerlitz, Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb, Tzahi Oved, Moni Shoua

On 20/11/2013 09:15, Somnath Kotur wrote:
>> However, in the approach to move address resolution to core verbs i.e.
>> ib_modify_qp() and ib_create_ah(), vendor driver will have freedom to
>> resolve l2 addresses in its own way.
> This I am not sure if it's a good idea for each vendor driver to implement L2 address resolution in it's own way? Not sure if that was the intent behind Roland's statement ?

I agree with Somnath, I don't see the point in putting L3 --> L2 address
resolution within vendor drivers. This would create huge code
duplication, and more problems. Again, with the proposed patches all
kernel ULPs that are applicable to RoCE are covered, and hence there's no
address resolution needed for kernel sessions. To comply with
non-modified user space applications/libraries, V5 added code to do
address resolution, and Roland just pointed out that it may make more
sense to put that small piece of code in the core verbs modify_qp
function and not in the uverbs call.

Or.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]                     ` <dad4986f-4b34-47dc-b6bb-b4882ad1405a-3RiH6ntJJkOPfaB/Gd0HpljyZtpTMMwT@public.gmane.org>
  2013-11-20 10:07                       ` Or Gerlitz
@ 2013-11-20 10:08                       ` Or Gerlitz
       [not found]                         ` <528C8A0B.3030700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: Or Gerlitz @ 2013-11-20 10:08 UTC (permalink / raw)
  To: Somnath Kotur, Devesh Sharma, Or Gerlitz, Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb, Tzahi Oved, Moni Shoua

On 20/11/2013 09:15, Somnath Kotur wrote:
>> However, in the approach to move address resolution to core verbs i.e.
>> ib_modify_qp() and ib_create_ah(), vendor driver will have freedom to
>> resolve l2 addresses in its own way.
> This I am not sure if it's a good idea for each vendor driver to implement L2 address resolution in it's own way? Not sure if that was the intent behind Roland's statement ?

I agree with Somnath, I don't see the point in putting L3 --> L2 address
resolution within vendor drivers. This would create huge code
duplication, and more problems. Again, with the proposed patches all
kernel ULPs that are applicable to RoCE are covered, and hence there's no
address resolution needed for kernel sessions. To comply with
non-modified user space applications/libraries, V5 added code to do
address resolution, and Roland just pointed out that it may make more
sense to put that small piece of code in the core verbs modify_qp
function and not in the uverbs call.

Or.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]                         ` <528C8A0B.3030700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-20 12:22                           ` Devesh Sharma
  0 siblings, 0 replies; 19+ messages in thread
From: Devesh Sharma @ 2013-11-20 12:22 UTC (permalink / raw)
  To: Or Gerlitz, Somnath Kotur, Or Gerlitz, Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis, matanb, Tzahi Oved, Moni Shoua

Okay, got it, I think I got confused. Putting the changes in verbs.c/modify_qp and verbs.c/create_ah makes sense to me.

Agree with Som and Or.

-Regards
 Devesh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH V5 0/8] IP based RoCE GID Addressing
       [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2013-11-16 19:52   ` [PATCH V5 0/8] IP based RoCE GID Addressing Or Gerlitz
@ 2013-11-22 10:29   ` Somnath Kotur
  9 siblings, 0 replies; 19+ messages in thread
From: Somnath Kotur @ 2013-11-22 10:29 UTC (permalink / raw)
  To: Or Gerlitz, roland-DgEjT+Ai2ygdnm+yROfE0A
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, monis-VPRAkNaXOzVWk0Htik3J/w,
	matanb-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

Acked-by: Somnath Kotur <somnath.kotur-laKkSmNT4hbQT0dZR+AlfA@public.gmane.org>

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Or Gerlitz
> Sent: Thursday, November 14, 2013 4:00 AM
> To: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org;
> matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; tzahio-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; Or Gerlitz
> Subject: [PATCH V5 0/8] IP based RoCE GID Addressing
> 
> changes from V4:
> 
>  - addressed feedback re the need to be compatible with non modified user
>    space applications/libraries, by adding code in uverbs which does address
>    resolution when dealing with Ethernet ports. This is patch #7
> 
>  - removed the patches that deal with uverbs extended commands, they will
>    added later on, such that new applications/libraries can be coded to them.
> 
>  - added patch fixing mlx4_en to have correct IPv6 link local address.
> 
> See below full listing of change-history.
> 
> Currently, the IB stack (core + drivers) handle RoCE (IBoE) gids as they
> encode related Ethernet net-device interface MAC address and possibly
> VLAN id.
> 
> This series changes RoCE GIDs to encode IP addresses (IPv4 + IPv6) of the
> that Ethernet interface, under the following reasoning:
> 
> 1. There are environments where the compute entity that runs the RoCE
> stack is not aware that its traffic is vlan-tagged. This results with that node to
> create/assume wrong GIDs from the view point of a peer node which is
> aware to vlans.
> 
> Note that "node" here can be physical node connected to Ethernet switch
> acting in access mode talking to another node which does vlan
> insertion/stripping by itself.
> 
> Or another example is SRIOV Virtual Function which is configured to work in
> "VST"
> mode (Virtual-Switch-Tagging) such that the hypervisor configures the HW
> eSWitch to do vlan insertion for the vPORT representing that function.
> 
> 2. When RoCE traffic is inspected (mirrored/trapped) in Ethernet switches for
> monitoring and security purposes. It is much more natural for both humans
> and automated utilities (...) to observe IP addresses in a certain offset into
> RoCE frames L3 header vs. MAC/VLANs (which are there anyway in the L2
> header of that frame, so they are not gone by this change).
> 
> 3. Some Bonding/Teaming advanced mode such as balance-alb and balance-
> tlb are using multiple underlying devices in parallel, and hence packets always
> carry the bond IP address but different streams have different source MACs.
> The approach brought by this series is part from what would allow to support
> that for RoCE traffic too.
> 
> The 1st patch adds explicit handling of Ethernet L2 attributes, source/dest
> mac and vlan_id to the kernel IB core, in data-structures and CMA/CM code.
> Previously, with MAC/VLAN based addressing, they were encoded in the
> GIDs, where now they have to be resolved and placed separately from the IP
> based GIDs.
> 
> The 2nd patch modifies the CMA to cope with IP based GIDs, the 3rd/4th
> ones do that for the mlx4_ib driver, and the 5th/6th patches to the ocrdma
> driver.
> 
> The 7th patch adds address resolution to user space applications for RoCE
> ports such that these application keep working unmodified.
> 
> The 8th/last patch fixes the mlx4_en driver such that it has a correct IPv6
> link-local address.
> 
> Or.
> 
> Full listing of change-history:
> 
> changes from V4:
> 
>  - addressed feedback re the need to be compatible with non modified user
>    space applications/libraries, by adding code in uverbs which does address
>    resolution when dealing with Ethernet ports.
> 
>  - removed the patches that deal with uverbs extended commands, they will
>    be added later on, such that new applications/libraries can be coded to them.
> 
> changes from V3:
> 
>   - dropped the uverbs Infrastructure patch for extensions which is now
>     upstream: 400dbc9 "IB/core: Infrastructure for extensible uverbs commands"
> 
>   - added ocrdma patch to handle Ethernet L2 parameters, similar to the mlx4
>     patch.
> 
>   - removed the assumption that the low level driver can provide the source
>     mac and vlan in the struct ib_wc returned by ib_poll_cq, and adjusted the
>     ib_init_ah_from_wc helper of the IB core accordingly.
> 
>   - fixed some vlan related issues in the mlx4 driver
> 
> changes from V2:
> 
>   - added handling of IP based GIDs in the ocrdma driver - patch #5,
>     as a result patches #5-8 of V1 became patches #6-9
> 
> changes from V1:
> 
>  - rebased the series against the latest kernel bits, which include Sean's
>    AF_IB changes to the rdma-cm
> 
>  - fixed bug in mlx4_ib where reset of the gid table was done for IB ports too
> 
>  - fixed build warnings and issues pointed by sparse
> 
>  - introduced patch #1 which does the explicit handling of Ethernet L2
>    attributes, source/dest mac and vlan_id in the kernel data-structures and
>    CMA/CM code.
> 
>  - use smac when modifying a QP --> find smac in passive side + additional
>    fields to address structures
> 
>  - add support for new QP attr in ib_modify_qp_is_ok(), special for ll = ETH,
>    and modified all low-level drivers to keep working after that change
> 
>  -- changes around uverbs:
>  - use ah_ext as pointer in qp_attr passed from user space, so this
>    field by itself can be extended in the future
>  - for kernel to user command responses comp_mask is moved into the
>    right place which is after the non-extended command response fields
>  - fixed bug in copy_qp_attr_ex under which some fields were copied to
>    wrong locations
>  - use new structure rdma_ucm_init_qp_attr_ex which is extendable (ucma)
> 
> changes from V0:
> 
>  - enhanced documentation of the mlx4_ib, uverbs and ucma patches
>  - broke the mlx4_ib patch to two
>  - broke the extended user space commands patch to two
> 
> Matan Barak (1):
>   IB/core: Ethernet L2 attributes in verbs/cm structures
> 
> Moni Shoua (7):
>   IB/CMA: IBoE (RoCE) IP based GID addressing
>   IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table
>   IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing
>   IB/ocrdma: Handle Ethernet L2 parameters for IP based GID addressing
>   IB/ocrdma: Populate GID table with IP based gids
>   IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
>   mlx4_en: Avoid setting netdevice dev_id to port number
> 
>  drivers/infiniband/core/addr.c                 |   97 +++++-
>  drivers/infiniband/core/cm.c                   |   50 +++
>  drivers/infiniband/core/cma.c                  |   74 +++-
>  drivers/infiniband/core/sa_query.c             |   12 +-
>  drivers/infiniband/core/ucma.c                 |   18 +-
>  drivers/infiniband/core/uverbs_cmd.c           |   27 ++
>  drivers/infiniband/core/verbs.c                |   43 ++-
>  drivers/infiniband/hw/ehca/ehca_qp.c           |    2 +-
>  drivers/infiniband/hw/ipath/ipath_qp.c         |    2 +-
>  drivers/infiniband/hw/mlx4/ah.c                |   40 +--
>  drivers/infiniband/hw/mlx4/cq.c                |    9 +
>  drivers/infiniband/hw/mlx4/main.c              |  474 +++++++++++++++++-------
>  drivers/infiniband/hw/mlx4/mlx4_ib.h           |    6 +-
>  drivers/infiniband/hw/mlx4/qp.c                |  104 ++++-
>  drivers/infiniband/hw/mlx5/qp.c                |    3 +-
>  drivers/infiniband/hw/mthca/mthca_qp.c         |    3 +-
>  drivers/infiniband/hw/ocrdma/ocrdma.h          |   12 +
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c       |    5 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.c       |   21 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_hw.h       |    1 -
>  drivers/infiniband/hw/ocrdma/ocrdma_main.c     |  138 ++-----
>  drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |    3 +-
>  drivers/infiniband/hw/qib/qib_qp.c             |    2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    1 -
>  drivers/net/ethernet/mellanox/mlx4/port.c      |   20 +
>  include/linux/mlx4/cq.h                        |   15 +-
>  include/linux/mlx4/device.h                    |    1 +
>  include/rdma/ib_addr.h                         |   69 +++-
>  include/rdma/ib_cm.h                           |    1 +
>  include/rdma/ib_pack.h                         |    1 +
>  include/rdma/ib_sa.h                           |    3 +
>  include/rdma/ib_verbs.h                        |   21 +-
>  32 files changed, 894 insertions(+), 384 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP
       [not found]             ` <CAJZOPZ+5UJCAObzFjK5w=bnnSjRM102vR5Ft1nd4nO48Lr2HOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-11-20  4:09               ` Devesh Sharma
@ 2013-12-11 17:45               ` Or Gerlitz
  1 sibling, 0 replies; 19+ messages in thread
From: Or Gerlitz @ 2013-12-11 17:45 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Tzahi Oved, Moni Shoua,
	Matan Barak, Ali Ayoub

On Tue, Nov 19, 2013 at 10:48 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Nov 19, 2013 at 8:08 PM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>> Existing user space applications provide only IBoE L3 address attributes
>>> to the kernel when they issue QP modify. To comply with them and let such
>>> apps keep working transparently under the IBoE GID IP addressing changes,
>>> added Eth L2 address resolution in the user-kernel linking piece - uverbs.
>
>> I don't get why this belongs in uverbs.  In the current design uverbs serves
>> as a transport between userspace and kernel and the kernel verbs are
>> the same as user verbs.  The only exception to this that introduces
>> complexity is the stuff related to sharing XRCs and that makes sense
>> because multiple processes etc. is definitely a userspace-only concern.
>
>> However in this case I don't see why address resolution is something
>> only userspace cares about.  Wouldn't it make sense to put this
>> resolution in core verbs?
>
> Basically, we've put it into uverbs b/c for kernel consumers that use
> the rdma-cm the problem doesn't exist, since the Ethernet L2
> attributes are filled into the qp attributes used by the rdma-cm
> throughout the address resolution process.
>
> Since currently there are no in-tree non rdma-cm consumer ULPs that are
> applicable to RoCE, the kernel side is a done deal in that respect.
>
> If it helps and/or makes more sense, sure, we can move the resolution to
> be done in the core verbs, e.g. in core/verbs.c :: ib_modify_qp.
> Anything else except for this feedback?

Hi Roland,

We have been waiting for a month to get your feedback on whether V5 + the
above fix you asked for is enough, or whether more changes / review are needed.
We'll respin the series and make V6 with this change and a few minor fixes for
issues we came across while doing more regression tests, but we need your
response, else we're clueless on the actual status. As I wrote you in
http://marc.info/?l=linux-rdma&m=138488191903193&w=2, this is a pre-step for
RoCE SRIOV support, so we need a heads-up so that we don't lose more kernel
cycles again.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-12-11 17:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-13 22:29 [PATCH V5 0/8] IP based RoCE GID Addressing Or Gerlitz
     [not found] ` <1384381792-2023-1-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-13 22:29   ` [PATCH V5 1/8] IB/core: Ethernet L2 attributes in verbs/cm structures Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 2/8] IB/CMA: IBoE (RoCE) IP based GID addressing Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 3/8] IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 4/8] IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressing Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 5/8] IB/ocrdma: " Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 6/8] IB/ocrdma: Populate GID table with IP based gids Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 7/8] IB/uverbs: Resolve Ethernet L2 addresses when modifying QP Or Gerlitz
     [not found]     ` <1384381792-2023-8-git-send-email-ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-19 18:08       ` Roland Dreier
     [not found]         ` <CAG4TOxOUFfPMU+q1yKy6S7v3QiLVgMQwxNC-_vw-7UeUb7LoBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-19 20:48           ` Or Gerlitz
     [not found]             ` <CAJZOPZ+5UJCAObzFjK5w=bnnSjRM102vR5Ft1nd4nO48Lr2HOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-20  4:09               ` Devesh Sharma
     [not found]                 ` <EE7902D3F51F404C82415C4803930ACD3FD8348A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2013-11-20  7:15                   ` Somnath Kotur
     [not found]                     ` <dad4986f-4b34-47dc-b6bb-b4882ad1405a-3RiH6ntJJkOPfaB/Gd0HpljyZtpTMMwT@public.gmane.org>
2013-11-20 10:07                       ` Or Gerlitz
2013-11-20 10:08                       ` Or Gerlitz
     [not found]                         ` <528C8A0B.3030700-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-20 12:22                           ` Devesh Sharma
2013-12-11 17:45               ` Or Gerlitz
2013-11-13 22:29   ` [PATCH V5 8/8] mlx4_en: Avoid setting netdevice dev_id to port number Or Gerlitz
2013-11-16 19:52   ` [PATCH V5 0/8] IP based RoCE GID Addressing Or Gerlitz
2013-11-22 10:29   ` Somnath Kotur
