All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-next V2 00/11] Add RoCE v2 support
@ 2015-12-03 13:47 Matan Barak
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Hi Doug,

This series adds the support for RoCE v2. In order to support RoCE v2,
we add gid_type attribute to every GID. When the RoCE GID management
populates the GID table, it duplicates each GID with all supported types.
This gives the user the ability to communicate over each supported
type.

Patch 0001, 0002 and 0003 add support for multiple GID types to the
cache and related APIs. The third patch exposes the GID attributes
information is sysfs.

Patch 0004 adds the RoCE v2 GID type and the capabilities required
from the vendor in order to implement RoCE v2. These capabilities
are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.

RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
information should come from the vendor's driver. In case the vendor
doesn't supply this information, we parse the packet headers and resolve
its network type. Patch 0005 adds this information and required utilities.

Patches 0006 and 0007 adds route validation. This is mandatory to ensure
that we send packets using GIDS which corresponds to a net-device that
can be routed to the destination.

Patches 0008 and 0009 add configfs support (and the required
infrastructure) for CMA. The administrator should be able to set the
default RoCE type. This is done through a new per-port
default_roce_mode configfs file.

Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
packets. This is required for vendors which implement their
QP1 as a Raw QP.

Patch 0011 adds support for IPv4 multicast as an IPv4 network
requires IGMP to be sent in order to join multicast groups.

Vendors code aren't part of this patch-set. Soft-Roce will be
sent soon and depends on these patches. Other vendors, like
mlx4, ocrdma and mlx5 will follow.

This patch is applied on "Change per-entry locks in GID cache to table lock"
which was sent to the mailing list.

Thanks,
Matan

Changed from V1:
 - Rebased against Linux 4.4-rc2 master branch.
 - Add route validation
 - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
 - Add documentation for configfs and sysfs ABI
 - Remove ifindex and gid_type from mcmember

Changes from V0:
 - Rebased patches against Doug's latest k.o/for-4.4 tree.
 - Fixed a bug in configfs (rmdir caused an incorrect free).

Matan Barak (8):
  IB/core: Add gid_type to gid attribute
  IB/cm: Use the source GID index type
  IB/core: Add gid attributes to sysfs
  IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
  IB/core: Move rdma_is_upper_dev_rcu to header file
  IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
  IB/rdma_cm: Add wrapper for cma reference count
  IB/cma: Add configfs for rdma_cm

Moni Shoua (2):
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (1):
  IB/core: Add rdma_network_type to wc

 Documentation/ABI/testing/configfs-rdma_cm       |  22 ++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/Kconfig                       |   9 +
 drivers/infiniband/core/Makefile                 |   2 +
 drivers/infiniband/core/addr.c                   | 185 +++++++++----
 drivers/infiniband/core/cache.c                  | 169 ++++++++----
 drivers/infiniband/core/cm.c                     |  31 ++-
 drivers/infiniband/core/cma.c                    | 261 ++++++++++++++++--
 drivers/infiniband/core/cma_configfs.c           | 321 +++++++++++++++++++++++
 drivers/infiniband/core/core_priv.h              |  45 ++++
 drivers/infiniband/core/device.c                 |  10 +-
 drivers/infiniband/core/multicast.c              |  17 +-
 drivers/infiniband/core/roce_gid_mgmt.c          |  81 ++++--
 drivers/infiniband/core/sa_query.c               |  76 +++++-
 drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++-
 drivers/infiniband/core/ud_header.c              | 155 ++++++++++-
 drivers/infiniband/core/uverbs_marshall.c        |   1 +
 drivers/infiniband/core/verbs.c                  | 170 ++++++++++--
 drivers/infiniband/hw/mlx4/qp.c                  |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c           |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c         |   2 +-
 include/rdma/ib_addr.h                           |  11 +-
 include/rdma/ib_cache.h                          |   4 +
 include/rdma/ib_pack.h                           |  45 +++-
 include/rdma/ib_sa.h                             |   3 +
 include/rdma/ib_verbs.h                          |  78 +++++-
 26 files changed, 1704 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
 create mode 100644 drivers/infiniband/core/cma_configfs.c

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID is IB_GID_TYPE_IB which is
also RoCE v1 GID type.

This implies that gid_type should be added to roce_gid_table meta-data.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c           | 144 ++++++++++++++++++++----------
 drivers/infiniband/core/cm.c              |   2 +-
 drivers/infiniband/core/cma.c             |   3 +-
 drivers/infiniband/core/core_priv.h       |   4 +
 drivers/infiniband/core/device.c          |   9 +-
 drivers/infiniband/core/multicast.c       |   2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |  60 +++++++++++--
 drivers/infiniband/core/sa_query.c        |   5 +-
 drivers/infiniband/core/uverbs_marshall.c |   1 +
 drivers/infiniband/core/verbs.c           |   1 +
 include/rdma/ib_cache.h                   |   4 +
 include/rdma/ib_sa.h                      |   1 +
 include/rdma/ib_verbs.h                   |  11 ++-
 13 files changed, 185 insertions(+), 62 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 097e9df..566fd8f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -64,6 +64,7 @@ enum gid_attr_find_mask {
 	GID_ATTR_FIND_MASK_GID          = 1UL << 0,
 	GID_ATTR_FIND_MASK_NETDEV	= 1UL << 1,
 	GID_ATTR_FIND_MASK_DEFAULT	= 1UL << 2,
+	GID_ATTR_FIND_MASK_GID_TYPE	= 1UL << 3,
 };
 
 enum gid_table_entry_props {
@@ -125,6 +126,19 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 	}
 }
 
+static const char * const gid_type_str[] = {
+	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+};
+
+const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
+{
+	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+		return gid_type_str[gid_type];
+
+	return "Invalid GID type";
+}
+EXPORT_SYMBOL(ib_cache_gid_type_str);
+
 /* This function expects that rwlock will be write locked in all
  * scenarios and that lock will be locked in sleep-able (RoCE)
  * scenarios.
@@ -233,6 +247,10 @@ static int find_gid(struct ib_gid_table *table, const union ib_gid *gid,
 		if (found >=0)
 			continue;
 
+		if (mask & GID_ATTR_FIND_MASK_GID_TYPE &&
+		    attr->gid_type != val->gid_type)
+			continue;
+
 		if (mask & GID_ATTR_FIND_MASK_GID &&
 		    memcmp(gid, &data->gid, sizeof(*gid)))
 			continue;
@@ -296,6 +314,7 @@ int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
 	write_lock_irq(&table->rwlock);
 
 	ix = find_gid(table, gid, attr, false, GID_ATTR_FIND_MASK_GID |
+		      GID_ATTR_FIND_MASK_GID_TYPE |
 		      GID_ATTR_FIND_MASK_NETDEV, &empty);
 	if (ix >= 0)
 		goto out_unlock;
@@ -329,6 +348,7 @@ int ib_cache_gid_del(struct ib_device *ib_dev, u8 port,
 
 	ix = find_gid(table, gid, attr, false,
 		      GID_ATTR_FIND_MASK_GID	  |
+		      GID_ATTR_FIND_MASK_GID_TYPE |
 		      GID_ATTR_FIND_MASK_NETDEV	  |
 		      GID_ATTR_FIND_MASK_DEFAULT,
 		      NULL);
@@ -427,11 +447,13 @@ static int _ib_cache_gid_table_find(struct ib_device *ib_dev,
 
 static int ib_cache_gid_find(struct ib_device *ib_dev,
 			     const union ib_gid *gid,
+			     enum ib_gid_type gid_type,
 			     struct net_device *ndev, u8 *port,
 			     u16 *index)
 {
-	unsigned long mask = GID_ATTR_FIND_MASK_GID;
-	struct ib_gid_attr gid_attr_val = {.ndev = ndev};
+	unsigned long mask = GID_ATTR_FIND_MASK_GID |
+			     GID_ATTR_FIND_MASK_GID_TYPE;
+	struct ib_gid_attr gid_attr_val = {.ndev = ndev, .gid_type = gid_type};
 
 	if (ndev)
 		mask |= GID_ATTR_FIND_MASK_NETDEV;
@@ -442,14 +464,16 @@ static int ib_cache_gid_find(struct ib_device *ib_dev,
 
 int ib_find_cached_gid_by_port(struct ib_device *ib_dev,
 			       const union ib_gid *gid,
+			       enum ib_gid_type gid_type,
 			       u8 port, struct net_device *ndev,
 			       u16 *index)
 {
 	int local_index;
 	struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
 	struct ib_gid_table *table;
-	unsigned long mask = GID_ATTR_FIND_MASK_GID;
-	struct ib_gid_attr val = {.ndev = ndev};
+	unsigned long mask = GID_ATTR_FIND_MASK_GID |
+			     GID_ATTR_FIND_MASK_GID_TYPE;
+	struct ib_gid_attr val = {.ndev = ndev, .gid_type = gid_type};
 	unsigned long flags;
 
 	if (port < rdma_start_port(ib_dev) ||
@@ -607,15 +631,15 @@ static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 				  struct net_device *ndev,
+				  unsigned long gid_type_mask,
 				  enum ib_cache_gid_default_mode mode)
 {
 	struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
 	union ib_gid gid;
 	struct ib_gid_attr gid_attr;
+	struct ib_gid_attr zattr_type = zattr;
 	struct ib_gid_table *table;
-	int ix;
-	union ib_gid current_gid;
-	struct ib_gid_attr current_gid_attr = {};
+	unsigned int gid_type;
 
 	table  = ports_table[port - rdma_start_port(ib_dev)];
 
@@ -623,55 +647,82 @@ void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 	memset(&gid_attr, 0, sizeof(gid_attr));
 	gid_attr.ndev = ndev;
 
-	mutex_lock(&table->lock);
-	write_lock_irq(&table->rwlock);
-	ix = find_gid(table, NULL, NULL, true, GID_ATTR_FIND_MASK_DEFAULT, NULL);
-
-	/* Coudn't find default GID location */
-	WARN_ON(ix < 0);
-
-	if (!__ib_cache_gid_get(ib_dev, port, ix,
-				&current_gid, &current_gid_attr) &&
-	    mode == IB_CACHE_GID_DEFAULT_MODE_SET &&
-	    !memcmp(&gid, &current_gid, sizeof(gid)) &&
-	    !memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
-		goto unlock;
-
-	if (memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
-	    memcmp(&current_gid_attr, &zattr,
-		   sizeof(current_gid_attr))) {
-		if (del_gid(ib_dev, port, table, ix, true)) {
-			pr_warn("ib_cache_gid: can't delete index %d for default gid %pI6\n",
-				ix, gid.raw);
-			goto unlock;
-		} else {
-			dispatch_gid_change_event(ib_dev, port);
+	for (gid_type = 0; gid_type < IB_GID_TYPE_SIZE; ++gid_type) {
+		int ix;
+		union ib_gid current_gid;
+		struct ib_gid_attr current_gid_attr = {};
+
+		if (1UL << gid_type & ~gid_type_mask)
+			continue;
+
+		gid_attr.gid_type = gid_type;
+
+		mutex_lock(&table->lock);
+		write_lock_irq(&table->rwlock);
+		ix = find_gid(table, NULL, &gid_attr, true,
+			      GID_ATTR_FIND_MASK_GID_TYPE |
+			      GID_ATTR_FIND_MASK_DEFAULT,
+			      NULL);
+
+		/* Coudn't find default GID location */
+		WARN_ON(ix < 0);
+
+		zattr_type.gid_type = gid_type;
+
+		if (!__ib_cache_gid_get(ib_dev, port, ix,
+					&current_gid, &current_gid_attr) &&
+		    mode == IB_CACHE_GID_DEFAULT_MODE_SET &&
+		    !memcmp(&gid, &current_gid, sizeof(gid)) &&
+		    !memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
+			goto release;
+
+		if (memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
+		    memcmp(&current_gid_attr, &zattr_type,
+			   sizeof(current_gid_attr))) {
+			if (del_gid(ib_dev, port, table, ix, true)) {
+				pr_warn("ib_cache_gid: can't delete index %d for default gid %pI6\n",
+					ix, gid.raw);
+				goto release;
+			} else {
+				dispatch_gid_change_event(ib_dev, port);
+			}
 		}
-	}
 
-	if (mode == IB_CACHE_GID_DEFAULT_MODE_SET) {
-		if (add_gid(ib_dev, port, table, ix, &gid, &gid_attr, true)) {
-			pr_warn("ib_cache_gid: unable to add default gid %pI6\n",
-				gid.raw);
-		} else {
-			dispatch_gid_change_event(ib_dev, port);
+		if (mode == IB_CACHE_GID_DEFAULT_MODE_SET) {
+			if (add_gid(ib_dev, port, table, ix, &gid, &gid_attr, true))
+				pr_warn("ib_cache_gid: unable to add default gid %pI6\n",
+					gid.raw);
+			else
+				dispatch_gid_change_event(ib_dev, port);
 		}
-	}
 
-unlock:
-	if (current_gid_attr.ndev)
-		dev_put(current_gid_attr.ndev);
-	write_unlock_irq(&table->rwlock);
-	mutex_unlock(&table->lock);
+release:
+		if (current_gid_attr.ndev)
+			dev_put(current_gid_attr.ndev);
+		write_unlock_irq(&table->rwlock);
+		mutex_unlock(&table->lock);
+	}
 }
 
 static int gid_table_reserve_default(struct ib_device *ib_dev, u8 port,
 				     struct ib_gid_table *table)
 {
-	if (rdma_protocol_roce(ib_dev, port)) {
-		struct ib_gid_table_entry *entry = &table->data_vec[0];
+	unsigned int i;
+	unsigned long roce_gid_type_mask;
+	unsigned int num_default_gids;
+	unsigned int current_gid = 0;
+
+	roce_gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+	num_default_gids = hweight_long(roce_gid_type_mask);
+	for (i = 0; i < num_default_gids && i < table->sz; i++) {
+		struct ib_gid_table_entry *entry =
+			&table->data_vec[i];
 
 		entry->props |= GID_TABLE_ENTRY_DEFAULT;
+		current_gid = find_next_bit(&roce_gid_type_mask,
+					    BITS_PER_LONG,
+					    current_gid);
+		entry->attr.gid_type = current_gid++;
 	}
 
 	return 0;
@@ -794,11 +845,12 @@ EXPORT_SYMBOL(ib_get_cached_gid);
 
 int ib_find_cached_gid(struct ib_device *device,
 		       const union ib_gid *gid,
+		       enum ib_gid_type gid_type,
 		       struct net_device *ndev,
 		       u8               *port_num,
 		       u16              *index)
 {
-	return ib_cache_gid_find(device, gid, ndev, port_num, index);
+	return ib_cache_gid_find(device, gid, gid_type, ndev, port_num, index);
 }
 EXPORT_SYMBOL(ib_find_cached_gid);
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0a26dd6..af8b907 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -364,7 +364,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
 		if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-					ndev, &p, NULL)) {
+					IB_GID_TYPE_IB, ndev, &p, NULL)) {
 			port = cm_dev->port[p-1];
 			break;
 		}
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 944cd90..c19f822 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -456,7 +456,8 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 	if (dev_type == ARPHRD_ETHER)
 		ndev = dev_get_by_index(&init_net, bound_if_index);
 
-	ret = ib_find_cached_gid_by_port(device, gid, port, ndev, NULL);
+	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
+					 ndev, NULL);
 
 	if (ndev)
 		dev_put(ndev);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 5cf6eb7..d531f91 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -70,8 +70,11 @@ enum ib_cache_gid_default_mode {
 	IB_CACHE_GID_DEFAULT_MODE_DELETE
 };
 
+const char *ib_cache_gid_type_str(enum ib_gid_type gid_type);
+
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 				  struct net_device *ndev,
+				  unsigned long gid_type_mask,
 				  enum ib_cache_gid_default_mode mode);
 
 int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
@@ -87,6 +90,7 @@ int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
 int roce_rescan_device(struct ib_device *ib_dev);
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 179e813..732a196 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -825,26 +825,31 @@ EXPORT_SYMBOL(ib_modify_port);
  *   a specified GID value occurs.
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: Type of GID.
  * @ndev: The ndev related to the GID to search for.
  * @port_num: The port number of the device where the GID value was found.
  * @index: The index into the GID table where the GID was found.  This
  *   parameter may be NULL.
  */
 int ib_find_gid(struct ib_device *device, union ib_gid *gid,
-		struct net_device *ndev, u8 *port_num, u16 *index)
+		enum ib_gid_type gid_type, struct net_device *ndev,
+		u8 *port_num, u16 *index)
 {
 	union ib_gid tmp_gid;
 	int ret, port, i;
 
 	for (port = rdma_start_port(device); port <= rdma_end_port(device); ++port) {
 		if (rdma_cap_roce_gid_table(device, port)) {
-			if (!ib_find_cached_gid_by_port(device, gid, port,
+			if (!ib_find_cached_gid_by_port(device, gid, gid_type, port,
 							ndev, index)) {
 				*port_num = port;
 				return 0;
 			}
 		}
 
+		if (gid_type != IB_GID_TYPE_IB)
+			continue;
+
 		for (i = 0; i < device->port_immutable[port].gid_tbl_len; ++i) {
 			ret = ib_query_gid(device, port, i, &tmp_gid, NULL);
 			if (ret)
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index bb6685f..6911ae6 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -729,7 +729,7 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 	u16 gid_index;
 	u8 p;
 
-	ret = ib_find_cached_gid(device, &rec->port_gid,
+	ret = ib_find_cached_gid(device, &rec->port_gid, IB_GID_TYPE_IB,
 				 NULL, &p, &gid_index);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 178f984..61c27a7 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -67,17 +67,52 @@ struct netdev_event_work {
 	struct netdev_event_work_cmd	cmds[ROCE_NETDEV_CALLBACK_SZ];
 };
 
+static const struct {
+	bool (*is_supported)(const struct ib_device *device, u8 port_num);
+	enum ib_gid_type gid_type;
+} PORT_CAP_TO_GID_TYPE[] = {
+	{rdma_protocol_roce,   IB_GID_TYPE_ROCE},
+};
+
+#define CAP_TO_GID_TABLE_SIZE	ARRAY_SIZE(PORT_CAP_TO_GID_TYPE)
+
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port)
+{
+	int i;
+	unsigned int ret_flags = 0;
+
+	if (!rdma_protocol_roce(ib_dev, port))
+		return 1UL << IB_GID_TYPE_IB;
+
+	for (i = 0; i < CAP_TO_GID_TABLE_SIZE; i++)
+		if (PORT_CAP_TO_GID_TYPE[i].is_supported(ib_dev, port))
+			ret_flags |= 1UL << PORT_CAP_TO_GID_TYPE[i].gid_type;
+
+	return ret_flags;
+}
+EXPORT_SYMBOL(roce_gid_type_mask_support);
+
 static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
 		       u8 port, union ib_gid *gid,
 		       struct ib_gid_attr *gid_attr)
 {
-	switch (gid_op) {
-	case GID_ADD:
-		ib_cache_gid_add(ib_dev, port, gid, gid_attr);
-		break;
-	case GID_DEL:
-		ib_cache_gid_del(ib_dev, port, gid, gid_attr);
-		break;
+	int i;
+	unsigned long gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+	for (i = 0; i < IB_GID_TYPE_SIZE; i++) {
+		if ((1UL << i) & gid_type_mask) {
+			gid_attr->gid_type = i;
+			switch (gid_op) {
+			case GID_ADD:
+				ib_cache_gid_add(ib_dev, port,
+						 gid, gid_attr);
+				break;
+			case GID_DEL:
+				ib_cache_gid_del(ib_dev, port,
+						 gid, gid_attr);
+				break;
+			}
+		}
 	}
 }
 
@@ -203,6 +238,8 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 				     u8 port, struct net_device *event_ndev,
 				     struct net_device *rdma_ndev)
 {
+	unsigned long gid_type_mask;
+
 	rcu_read_lock();
 	if (!rdma_ndev ||
 	    ((rdma_ndev != event_ndev &&
@@ -215,7 +252,9 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 	}
 	rcu_read_unlock();
 
-	ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
+	gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+	ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev, gid_type_mask,
 				     IB_CACHE_GID_DEFAULT_MODE_SET);
 }
 
@@ -237,9 +276,14 @@ static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
 	if (is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	    is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) ==
 	    BONDING_SLAVE_STATE_INACTIVE) {
+		unsigned long gid_type_mask;
+
 		rcu_read_unlock();
 
+		gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
 		ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
+					     gid_type_mask,
 					     IB_CACHE_GID_DEFAULT_MODE_DELETE);
 	} else {
 		rcu_read_unlock();
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 2aba774..5606922 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -1012,8 +1012,8 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		ah_attr->ah_flags = IB_AH_GRH;
 		ah_attr->grh.dgid = rec->dgid;
 
-		ret = ib_find_cached_gid(device, &rec->sgid, ndev, &port_num,
-					 &gid_index);
+		ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type, ndev,
+					 &port_num, &gid_index);
 		if (ret) {
 			if (ndev)
 				dev_put(ndev);
@@ -1155,6 +1155,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
 			  mad->data, &rec);
 		rec.net = NULL;
 		rec.ifindex = 0;
+		rec.gid_type = IB_GID_TYPE_IB;
 		memset(rec.dmac, 0, ETH_ALEN);
 		query->callback(status, &rec, query->context);
 	} else
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
index 7d2f14c..af020f8 100644
--- a/drivers/infiniband/core/uverbs_marshall.c
+++ b/drivers/infiniband/core/uverbs_marshall.c
@@ -144,5 +144,6 @@ void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
 	memset(dst->dmac, 0, sizeof(dst->dmac));
 	dst->net = NULL;
 	dst->ifindex = 0;
+	dst->gid_type = IB_GID_TYPE_IB;
 }
 EXPORT_SYMBOL(ib_copy_path_rec_from_user);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..4263c4c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -387,6 +387,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 		if (!rdma_cap_eth_ah(device, port_num)) {
 			ret = ib_find_cached_gid_by_port(device, &grh->dgid,
+							 IB_GID_TYPE_IB,
 							 port_num, NULL,
 							 &gid_index);
 			if (ret)
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 269a27cf..e30f19b 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -60,6 +60,7 @@ int ib_get_cached_gid(struct ib_device    *device,
  *   a specified GID value occurs.
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: The GID type to search for.
  * @ndev: In RoCE, the net device of the device. NULL means ignore.
  * @port_num: The port number of the device where the GID value was found.
  * @index: The index into the cached GID table where the GID was found.  This
@@ -70,6 +71,7 @@ int ib_get_cached_gid(struct ib_device    *device,
  */
 int ib_find_cached_gid(struct ib_device *device,
 		       const union ib_gid *gid,
+		       enum ib_gid_type gid_type,
 		       struct net_device *ndev,
 		       u8               *port_num,
 		       u16              *index);
@@ -79,6 +81,7 @@ int ib_find_cached_gid(struct ib_device *device,
  * GID value occurs
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: The GID type to search for.
  * @port_num: The port number of the device where the GID value sould be
  *   searched.
  * @ndev: In RoCE, the net device of the device. Null means ignore.
@@ -90,6 +93,7 @@ int ib_find_cached_gid(struct ib_device *device,
  */
 int ib_find_cached_gid_by_port(struct ib_device *device,
 			       const union ib_gid *gid,
+			       enum ib_gid_type gid_type,
 			       u8               port_num,
 			       struct net_device *ndev,
 			       u16              *index);
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 3019695..0a40ed2 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -160,6 +160,7 @@ struct ib_sa_path_rec {
 	int	     ifindex;
 	/* ignored in IB */
 	struct net  *net;
+	enum ib_gid_type gid_type;
 };
 
 static inline struct net_device *ib_get_ndev_from_path(struct ib_sa_path_rec *rec)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9a68a19..2933aeb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -67,7 +67,15 @@ union ib_gid {
 
 extern union ib_gid zgid;
 
+enum ib_gid_type {
+	/* If link layer is Ethernet, this is RoCE V1 */
+	IB_GID_TYPE_IB        = 0,
+	IB_GID_TYPE_ROCE      = 0,
+	IB_GID_TYPE_SIZE
+};
+
 struct ib_gid_attr {
+	enum ib_gid_type	gid_type;
 	struct net_device	*ndev;
 };
 
@@ -2219,7 +2227,8 @@ int ib_modify_port(struct ib_device *device,
 		   struct ib_port_modify *port_modify);
 
 int ib_find_gid(struct ib_device *device, union ib_gid *gid,
-		struct net_device *ndev, u8 *port_num, u16 *index);
+		enum ib_gid_type gid_type, struct net_device *ndev,
+		u8 *port_num, u16 *index);
 
 int ib_find_pkey(struct ib_device *device,
 		 u8 port_num, u16 pkey, u16 *index);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 02/11] IB/cm: Use the source GID index type
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
                     ` (21 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Previosuly, cm and cma modules supported only IB and RoCE v1 GID type.
In order to support multiple GID types, the gid_type is passed to
cm_init_av_by_path and stored in the path record.

The rdma cm client would use a default GID type that will be saved in
rdma_id_private.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c  | 25 ++++++++++++++++++++-----
 drivers/infiniband/core/cma.c |  2 ++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index af8b907..5ea78ab 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -364,7 +364,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
 		if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-					IB_GID_TYPE_IB, ndev, &p, NULL)) {
+					path->gid_type, ndev, &p, NULL)) {
 			port = cm_dev->port[p-1];
 			break;
 		}
@@ -1600,6 +1600,8 @@ static int cm_req_handler(struct cm_work *work)
 	struct ib_cm_id *cm_id;
 	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_req_msg *req_msg;
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
 	int ret;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -1639,11 +1641,24 @@ static int cm_req_handler(struct cm_work *work)
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
 	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+	ret = ib_get_cached_gid(work->port->cm_dev->ib_device,
+				work->port->port_num,
+				cm_id_priv->av.ah_attr.grh.sgid_index,
+				&gid, &gid_attr);
+	if (!ret) {
+		if (gid_attr.ndev)
+			dev_put(gid_attr.ndev);
+		work->path[0].gid_type = gid_attr.gid_type;
+		ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+	}
 	if (ret) {
-		ib_get_cached_gid(work->port->cm_dev->ib_device,
-				  work->port->port_num, 0, &work->path[0].sgid,
-				  NULL);
+		int err = ib_get_cached_gid(work->port->cm_dev->ib_device,
+					    work->port->port_num, 0,
+					    &work->path[0].sgid,
+					    &gid_attr);
+		if (!err && gid_attr.ndev)
+			dev_put(gid_attr.ndev);
+		work->path[0].gid_type = gid_attr.gid_type;
 		ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
 			       &work->path[0].sgid, sizeof work->path[0].sgid,
 			       NULL, 0);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c19f822..2914e08 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -228,6 +228,7 @@ struct rdma_id_private {
 	u8			tos;
 	u8			reuseaddr;
 	u8			afonly;
+	enum ib_gid_type	gid_type;
 };
 
 struct cma_multicast {
@@ -2325,6 +2326,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
 		route->path_rec->net = &init_net;
 		route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+		route->path_rec->gid_type = id_priv->gid_type;
 	}
 	if (!ndev) {
 		ret = -ENODEV;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
                     ` (20 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

This patch set adds attributes of net device and gid type to each GID
in the GID table. Users that use verbs directly need to specify
the GID index. Since the same GID could have different types or
associated net devices, users should have the ability to query the
associated GID attributes. Adding these attributes to sysfs.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++++++++++++-
 2 files changed, 198 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband

diff --git a/Documentation/ABI/testing/sysfs-class-infiniband b/Documentation/ABI/testing/sysfs-class-infiniband
new file mode 100644
index 0000000..a86abe6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-infiniband
@@ -0,0 +1,16 @@
+What:		/sys/class/infiniband/<hca>/ports/<port-number>/gid_attrs/ndevs/<gid-index>
+Date:		November 29, 2015
+KernelVersion:	4.4.0
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description: 	The net-device's name associated with the GID resides
+		at index <gid-index>.
+
+What:		/sys/class/infiniband/<hca>/ports/<port-number>/gid_attrs/types/<gid-index>
+Date:		November 29, 2015
+KernelVersion:	4.4.0
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description: 	The RoCE type of the associated GID resides at index <gid-index>.
+		This could either be "IB/RoCE v1" for IB and RoCE v1 based GODs
+		or "RoCE v2" for RoCE v2 based GIDs.
+
+
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index b1f37d4..4d5d87a 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>
 
 #include <rdma/ib_mad.h>
 
+struct ib_port;
+
+struct gid_attr_group {
+	struct ib_port		*port;
+	struct kobject		kobj;
+	struct attribute_group	ndev;
+	struct attribute_group	type;
+};
 struct ib_port {
 	struct kobject         kobj;
 	struct ib_device      *ibdev;
+	struct gid_attr_group *gid_attr_group;
 	struct attribute_group gid_group;
 	struct attribute_group pkey_group;
 	u8                     port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.show = port_attr_show
 };
 
+static ssize_t gid_attr_show(struct kobject *kobj,
+			     struct attribute *attr, char *buf)
+{
+	struct port_attribute *port_attr =
+		container_of(attr, struct port_attribute, attr);
+	struct ib_port *p = container_of(kobj, struct gid_attr_group,
+					 kobj)->port;
+
+	if (!port_attr->show)
+		return -EIO;
+
+	return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+	.show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 			  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
 	NULL
 };
 
+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+	if (!gid_attr->ndev)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+	return sprintf(buf, "%s\n", ib_cache_gid_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+				   struct port_attribute *attr,
+				   char *buf,
+				   size_t (*print)(struct ib_gid_attr *gid_attr,
+						   char *buf))
+{
+	struct port_table_attribute *tab_attr =
+		container_of(attr, struct port_table_attribute, attr);
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr = {};
+	ssize_t ret;
+	va_list args;
+
+	ret = ib_query_gid(p->ibdev, p->port_num, tab_attr->index, &gid,
+			   &gid_attr);
+	if (ret)
+		goto err;
+
+	ret = print(&gid_attr, buf);
+
+err:
+	if (gid_attr.ndev)
+		dev_put(gid_attr.ndev);
+	va_end(args);
+	return ret;
+}
+
 static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
 			     char *buf)
 {
@@ -296,6 +364,19 @@ static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
 	return sprintf(buf, "%pI6\n", gid.raw);
 }
 
+static ssize_t show_port_gid_attr_ndev(struct ib_port *p,
+				       struct port_attribute *attr, char *buf)
+{
+	return _show_port_gid_attr(p, attr, buf, print_ndev);
+}
+
+static ssize_t show_port_gid_attr_gid_type(struct ib_port *p,
+					   struct port_attribute *attr,
+					   char *buf)
+{
+	return _show_port_gid_attr(p, attr, buf, print_gid_type);
+}
+
 static ssize_t show_port_pkey(struct ib_port *p, struct port_attribute *attr,
 			      char *buf)
 {
@@ -451,12 +532,41 @@ static void ib_port_release(struct kobject *kobj)
 	kfree(p);
 }
 
+static void ib_port_gid_attr_release(struct kobject *kobj)
+{
+	struct gid_attr_group *g = container_of(kobj, struct gid_attr_group,
+						kobj);
+	struct attribute *a;
+	int i;
+
+	if (g->ndev.attrs) {
+		for (i = 0; (a = g->ndev.attrs[i]); ++i)
+			kfree(a);
+
+		kfree(g->ndev.attrs);
+	}
+
+	if (g->type.attrs) {
+		for (i = 0; (a = g->type.attrs[i]); ++i)
+			kfree(a);
+
+		kfree(g->type.attrs);
+	}
+
+	kfree(g);
+}
+
 static struct kobj_type port_type = {
 	.release       = ib_port_release,
 	.sysfs_ops     = &port_sysfs_ops,
 	.default_attrs = port_default_attrs
 };
 
+static struct kobj_type gid_attr_type = {
+	.sysfs_ops      = &gid_attr_sysfs_ops,
+	.release        = ib_port_gid_attr_release
+};
+
 static struct attribute **
 alloc_group_attrs(ssize_t (*show)(struct ib_port *,
 				  struct port_attribute *, char *buf),
@@ -528,9 +638,23 @@ static int add_port(struct ib_device *device, int port_num,
 		return ret;
 	}
 
+	p->gid_attr_group = kzalloc(sizeof(*p->gid_attr_group), GFP_KERNEL);
+	if (!p->gid_attr_group) {
+		ret = -ENOMEM;
+		goto err_put;
+	}
+
+	p->gid_attr_group->port = p;
+	ret = kobject_init_and_add(&p->gid_attr_group->kobj, &gid_attr_type,
+				   &p->kobj, "gid_attrs");
+	if (ret) {
+		kfree(p->gid_attr_group);
+		goto err_put;
+	}
+
 	ret = sysfs_create_group(&p->kobj, &pma_group);
 	if (ret)
-		goto err_put;
+		goto err_put_gid_attrs;
 
 	p->gid_group.name  = "gids";
 	p->gid_group.attrs = alloc_group_attrs(show_port_gid, attr.gid_tbl_len);
@@ -543,12 +667,38 @@ static int add_port(struct ib_device *device, int port_num,
 	if (ret)
 		goto err_free_gid;
 
+	p->gid_attr_group->ndev.name = "ndevs";
+	p->gid_attr_group->ndev.attrs = alloc_group_attrs(show_port_gid_attr_ndev,
+							  attr.gid_tbl_len);
+	if (!p->gid_attr_group->ndev.attrs) {
+		ret = -ENOMEM;
+		goto err_remove_gid;
+	}
+
+	ret = sysfs_create_group(&p->gid_attr_group->kobj,
+				 &p->gid_attr_group->ndev);
+	if (ret)
+		goto err_free_gid_ndev;
+
+	p->gid_attr_group->type.name = "types";
+	p->gid_attr_group->type.attrs = alloc_group_attrs(show_port_gid_attr_gid_type,
+							  attr.gid_tbl_len);
+	if (!p->gid_attr_group->type.attrs) {
+		ret = -ENOMEM;
+		goto err_remove_gid_ndev;
+	}
+
+	ret = sysfs_create_group(&p->gid_attr_group->kobj,
+				 &p->gid_attr_group->type);
+	if (ret)
+		goto err_free_gid_type;
+
 	p->pkey_group.name  = "pkeys";
 	p->pkey_group.attrs = alloc_group_attrs(show_port_pkey,
 						attr.pkey_tbl_len);
 	if (!p->pkey_group.attrs) {
 		ret = -ENOMEM;
-		goto err_remove_gid;
+		goto err_remove_gid_type;
 	}
 
 	ret = sysfs_create_group(&p->kobj, &p->pkey_group);
@@ -576,6 +726,28 @@ err_free_pkey:
 	kfree(p->pkey_group.attrs);
 	p->pkey_group.attrs = NULL;
 
+err_remove_gid_type:
+	sysfs_remove_group(&p->gid_attr_group->kobj,
+			   &p->gid_attr_group->type);
+
+err_free_gid_type:
+	for (i = 0; i < attr.gid_tbl_len; ++i)
+		kfree(p->gid_attr_group->type.attrs[i]);
+
+	kfree(p->gid_attr_group->type.attrs);
+	p->gid_attr_group->type.attrs = NULL;
+
+err_remove_gid_ndev:
+	sysfs_remove_group(&p->gid_attr_group->kobj,
+			   &p->gid_attr_group->ndev);
+
+err_free_gid_ndev:
+	for (i = 0; i < attr.gid_tbl_len; ++i)
+		kfree(p->gid_attr_group->ndev.attrs[i]);
+
+	kfree(p->gid_attr_group->ndev.attrs);
+	p->gid_attr_group->ndev.attrs = NULL;
+
 err_remove_gid:
 	sysfs_remove_group(&p->kobj, &p->gid_group);
 
@@ -589,6 +761,9 @@ err_free_gid:
 err_remove_pma:
 	sysfs_remove_group(&p->kobj, &pma_group);
 
+err_put_gid_attrs:
+	kobject_put(&p->gid_attr_group->kobj);
+
 err_put:
 	kobject_put(&p->kobj);
 	return ret;
@@ -803,6 +978,11 @@ static void free_port_list_attributes(struct ib_device *device)
 		sysfs_remove_group(p, &pma_group);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
+		sysfs_remove_group(&port->gid_attr_group->kobj,
+				   &port->gid_attr_group->ndev);
+		sysfs_remove_group(&port->gid_attr_group->kobj,
+				   &port->gid_attr_group->type);
+		kobject_put(&port->gid_attr_group->kobj);
 		kobject_put(p);
 	}
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
                     ` (19 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Adding RoCE v2 GID type and port type. Vendors
which support this type will get their GID table
populated with RoCE v2 GIDs automatically.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c         |  1 +
 drivers/infiniband/core/roce_gid_mgmt.c |  3 ++-
 include/rdma/ib_verbs.h                 | 23 +++++++++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 566fd8f..88b4b6f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -128,6 +128,7 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 
 static const char * const gid_type_str[] = {
 	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+	[IB_GID_TYPE_ROCE_UDP_ENCAP]	= "RoCE v2",
 };
 
 const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 61c27a7..1e3673f 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -71,7 +71,8 @@ static const struct {
 	bool (*is_supported)(const struct ib_device *device, u8 port_num);
 	enum ib_gid_type gid_type;
 } PORT_CAP_TO_GID_TYPE[] = {
-	{rdma_protocol_roce,   IB_GID_TYPE_ROCE},
+	{rdma_protocol_roce_eth_encap, IB_GID_TYPE_ROCE},
+	{rdma_protocol_roce_udp_encap, IB_GID_TYPE_ROCE_UDP_ENCAP},
 };
 
 #define CAP_TO_GID_TABLE_SIZE	ARRAY_SIZE(PORT_CAP_TO_GID_TYPE)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2933aeb..87df931 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -71,6 +71,7 @@ enum ib_gid_type {
 	/* If link layer is Ethernet, this is RoCE V1 */
 	IB_GID_TYPE_IB        = 0,
 	IB_GID_TYPE_ROCE      = 0,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = 1,
 	IB_GID_TYPE_SIZE
 };
 
@@ -401,6 +402,7 @@ union rdma_protocol_stats {
 #define RDMA_CORE_CAP_PROT_IB           0x00100000
 #define RDMA_CORE_CAP_PROT_ROCE         0x00200000
 #define RDMA_CORE_CAP_PROT_IWARP        0x00400000
+#define RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP 0x00800000
 
 #define RDMA_CORE_PORT_IBA_IB          (RDMA_CORE_CAP_PROT_IB  \
 					| RDMA_CORE_CAP_IB_MAD \
@@ -413,6 +415,12 @@ union rdma_protocol_stats {
 					| RDMA_CORE_CAP_IB_CM   \
 					| RDMA_CORE_CAP_AF_IB   \
 					| RDMA_CORE_CAP_ETH_AH)
+#define RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP			\
+					(RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP \
+					| RDMA_CORE_CAP_IB_MAD  \
+					| RDMA_CORE_CAP_IB_CM   \
+					| RDMA_CORE_CAP_AF_IB   \
+					| RDMA_CORE_CAP_ETH_AH)
 #define RDMA_CORE_PORT_IWARP           (RDMA_CORE_CAP_PROT_IWARP \
 					| RDMA_CORE_CAP_IW_CM)
 #define RDMA_CORE_PORT_INTEL_OPA       (RDMA_CORE_PORT_IBA_IB  \
@@ -1975,6 +1983,17 @@ static inline bool rdma_protocol_ib(const struct ib_device *device, u8 port_num)
 
 static inline bool rdma_protocol_roce(const struct ib_device *device, u8 port_num)
 {
+	return device->port_immutable[port_num].core_cap_flags &
+		(RDMA_CORE_CAP_PROT_ROCE | RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP);
+}
+
+static inline bool rdma_protocol_roce_udp_encap(const struct ib_device *device, u8 port_num)
+{
+	return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP;
+}
+
+static inline bool rdma_protocol_roce_eth_encap(const struct ib_device *device, u8 port_num)
+{
 	return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_ROCE;
 }
 
@@ -1985,8 +2004,8 @@ static inline bool rdma_protocol_iwarp(const struct ib_device *device, u8 port_n
 
 static inline bool rdma_ib_or_roce(const struct ib_device *device, u8 port_num)
 {
-	return device->port_immutable[port_num].core_cap_flags &
-		(RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_PROT_ROCE);
+	return rdma_protocol_ib(device, port_num) ||
+		rdma_protocol_roce(device, port_num);
 }
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
       [not found]     ` <1449150450-13679-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
                     ` (18 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

From: Somnath Kotur <Somnath.Kotur-idTK6quXuVSLFuii7jzJGg@public.gmane.org>

Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to deep examine the packet and extract
the GID type by itself.

We choose sgid_index and type from all the matching entries in
RDMA-CM based on hint from the IP stack and we set hop_limit for
the IP packet based on above hint from IP stack.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Somnath Kotur <Somnath.Kotur-idTK6quXuVSLFuii7jzJGg@public.gmane.org>
---
 drivers/infiniband/core/addr.c  |  14 +++++
 drivers/infiniband/core/cma.c   |  11 +++-
 drivers/infiniband/core/verbs.c | 123 ++++++++++++++++++++++++++++++++++++++--
 include/rdma/ib_addr.h          |   1 +
 include/rdma/ib_verbs.h         |  44 ++++++++++++++
 5 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 34b1ada..6e35299 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -256,6 +256,12 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 		goto put;
 	}
 
+	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
+	 * routable) and we could set the network type accordingly.
+	 */
+	if (rt->rt_uses_gateway)
+		addr->network = RDMA_NETWORK_IPV4;
+
 	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
 	ip_rt_put(rt);
@@ -270,6 +276,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct rt6_info *rt;
 	int ret;
 
 	memset(&fl6, 0, sizeof fl6);
@@ -281,6 +288,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	if ((ret = dst->error))
 		goto put;
 
+	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
 		ret = ipv6_dev_get_saddr(addr->net, ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
@@ -304,6 +312,12 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
+	 * routable) and we could set the network type accordingly.
+	 */
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		addr->network = RDMA_NETWORK_IPV6;
+
 	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
 	dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2914e08..5dc853c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2302,6 +2302,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
 	struct rdma_addr *addr = &route->addr;
+	enum ib_gid_type network_gid_type;
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
@@ -2340,7 +2341,15 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
 		    &route->path_rec->dgid);
 
-	route->path_rec->hop_limit = 1;
+	/* Use the hint from IP Stack to select GID Type */
+	network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+	if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+		route->path_rec->gid_type = network_gid_type;
+		/* TODO: get the hoplimit from the inet/inet6 device */
+		route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+	} else {
+		route->path_rec->hop_limit = 1;
+	}
 	route->path_rec->reversible = 1;
 	route->path_rec->pkey = cpu_to_be16(0xffff);
 	route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 4263c4c..c564131 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -311,8 +311,61 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_header_version(const union rdma_network_hdr *hdr)
+{
+	const struct iphdr *ip4h = (struct iphdr *)&hdr->roce4grh;
+	struct iphdr ip4h_checked;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)&hdr->ibgrh;
+
+	/* If it's IPv6, the version must be 6, otherwise, the first
+	 * 20 bytes (before the IPv4 header) are garbled.
+	 */
+	if (ip6h->version != 6)
+		return (ip4h->version == 4) ? 4 : 0;
+	/* version may be 6 or 4 because the first 20 bytes could be garbled */
+
+	/* RoCE v2 requires no options, thus header length
+	 * must be 5 words
+	 */
+	if (ip4h->ihl != 5)
+		return 6;
+
+	/* Verify checksum.
+	 * We can't write on scattered buffers so we need to copy to
+	 * temp buffer.
+	 */
+	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+	ip4h_checked.check = 0;
+	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+	/* if IPv4 header checksum is OK, believe it */
+	if (ip4h->check == ip4h_checked.check)
+		return 4;
+	return 6;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+						     u8 port_num,
+						     const struct ib_grh *grh)
+{
+	int grh_version;
+
+	if (rdma_protocol_ib(device, port_num))
+		return RDMA_NETWORK_IB;
+
+	grh_version = ib_get_header_version((union rdma_network_hdr *)grh);
+
+	if (grh_version == 4)
+		return RDMA_NETWORK_IPV4;
+
+	if (grh->next_hdr == IPPROTO_UDP)
+		return RDMA_NETWORK_IPV6;
+
+	return RDMA_NETWORK_ROCE_V1;
+}
+
 struct find_gid_index_context {
 	u16 vlan_id;
+	enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -322,6 +375,9 @@ static bool find_gid_index(const union ib_gid *gid,
 	struct find_gid_index_context *ctx =
 		(struct find_gid_index_context *)context;
 
+	if (ctx->gid_type != gid_attr->gid_type)
+		return false;
+
 	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
 	    (is_vlan_dev(gid_attr->ndev) &&
 	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -332,14 +388,49 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				   u16 vlan_id, const union ib_gid *sgid,
+				   enum ib_gid_type gid_type,
 				   u16 *gid_index)
 {
-	struct find_gid_index_context context = {.vlan_id = vlan_id};
+	struct find_gid_index_context context = {.vlan_id = vlan_id,
+						 .gid_type = gid_type};
 
 	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 				     &context, gid_index);
 }
 
+static int get_gids_from_rdma_hdr(union rdma_network_hdr *hdr,
+				  enum rdma_network_type net_type,
+				  union ib_gid *sgid, union ib_gid *dgid)
+{
+	struct sockaddr_in  src_in;
+	struct sockaddr_in  dst_in;
+	__be32 src_saddr, dst_saddr;
+
+	if (!sgid || !dgid)
+		return -EINVAL;
+
+	if (net_type == RDMA_NETWORK_IPV4) {
+		memcpy(&src_in.sin_addr.s_addr,
+		       &hdr->roce4grh.saddr, 4);
+		memcpy(&dst_in.sin_addr.s_addr,
+		       &hdr->roce4grh.daddr, 4);
+		src_saddr = src_in.sin_addr.s_addr;
+		dst_saddr = dst_in.sin_addr.s_addr;
+		ipv6_addr_set_v4mapped(src_saddr,
+				       (struct in6_addr *)sgid);
+		ipv6_addr_set_v4mapped(dst_saddr,
+				       (struct in6_addr *)dgid);
+		return 0;
+	} else if (net_type == RDMA_NETWORK_IPV6 ||
+		   net_type == RDMA_NETWORK_IB) {
+		*dgid = hdr->ibgrh.dgid;
+		*sgid = hdr->ibgrh.sgid;
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		       const struct ib_wc *wc, const struct ib_grh *grh,
 		       struct ib_ah_attr *ah_attr)
@@ -347,9 +438,25 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
+	enum rdma_network_type net_type = RDMA_NETWORK_IB;
+	enum ib_gid_type gid_type = IB_GID_TYPE_IB;
+	union ib_gid dgid;
+	union ib_gid sgid;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
 	if (rdma_cap_eth_ah(device, port_num)) {
+		if (wc->wc_flags & IB_WC_WITH_NETWORK_HDR_TYPE)
+			net_type = wc->network_hdr_type;
+		else
+			net_type = ib_get_net_type_by_grh(device, port_num, grh);
+		gid_type = ib_network_to_gid_type(net_type);
+	}
+	ret = get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
+				     &sgid, &dgid);
+	if (ret)
+		return ret;
+
+	if (rdma_protocol_roce(device, port_num)) {
 		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
 				wc->vlan_id : 0xffff;
 
@@ -358,7 +465,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 		if (!(wc->wc_flags & IB_WC_WITH_SMAC) ||
 		    !(wc->wc_flags & IB_WC_WITH_VLAN)) {
-			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
+			ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
 							 ah_attr->dmac,
 							 wc->wc_flags & IB_WC_WITH_VLAN ?
 							 NULL : &vlan_id,
@@ -368,7 +475,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		}
 
 		ret = get_sgid_index_from_eth(device, port_num, vlan_id,
-					      &grh->dgid, &gid_index);
+					      &dgid, gid_type, &gid_index);
 		if (ret)
 			return ret;
 
@@ -383,10 +490,10 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 	if (wc->wc_flags & IB_WC_GRH) {
 		ah_attr->ah_flags = IB_AH_GRH;
-		ah_attr->grh.dgid = grh->sgid;
+		ah_attr->grh.dgid = sgid;
 
 		if (!rdma_cap_eth_ah(device, port_num)) {
-			ret = ib_find_cached_gid_by_port(device, &grh->dgid,
+			ret = ib_find_cached_gid_by_port(device, &dgid,
 							 IB_GID_TYPE_IB,
 							 port_num, NULL,
 							 &gid_index);
@@ -1026,6 +1133,12 @@ int ib_resolve_eth_dmac(struct ib_qp *qp,
 					ret = -ENXIO;
 				goto out;
 			}
+			if (sgid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+				/* TODO: get the hoplimit from the inet/inet6
+				 * device
+				 */
+				qp_attr->ah_attr.grh.hop_limit =
+							IPV6_DEFAULT_HOPLIMIT;
 
 			ifindex = sgid_attr.ndev->ifindex;
 
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 1152859..c799caa 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -83,6 +83,7 @@ struct rdma_dev_addr {
 	int bound_dev_if;
 	enum rdma_transport_type transport;
 	struct net *net;
+	enum rdma_network_type network;
 };
 
 /**
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 87df931..31fb409 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -50,6 +50,8 @@
 #include <linux/workqueue.h>
 #include <linux/socket.h>
 #include <uapi/linux/if_ether.h>
+#include <net/ipv6.h>
+#include <net/ip.h>
 
 #include <linux/atomic.h>
 #include <linux/mmu_notifier.h>
@@ -107,6 +109,35 @@ enum rdma_protocol_type {
 __attribute_const__ enum rdma_transport_type
 rdma_node_get_transport(enum rdma_node_type node_type);
 
+enum rdma_network_type {
+	RDMA_NETWORK_IB,
+	RDMA_NETWORK_ROCE_V1 = RDMA_NETWORK_IB,
+	RDMA_NETWORK_IPV4,
+	RDMA_NETWORK_IPV6
+};
+
+static inline enum ib_gid_type ib_network_to_gid_type(enum rdma_network_type network_type)
+{
+	if (network_type == RDMA_NETWORK_IPV4 ||
+	    network_type == RDMA_NETWORK_IPV6)
+		return IB_GID_TYPE_ROCE_UDP_ENCAP;
+
+	/* IB_GID_TYPE_IB same as RDMA_NETWORK_ROCE_V1 */
+	return IB_GID_TYPE_IB;
+}
+
+static inline enum rdma_network_type ib_gid_to_network_type(enum ib_gid_type gid_type,
+							    union ib_gid *gid)
+{
+	if (gid_type == IB_GID_TYPE_IB)
+		return RDMA_NETWORK_IB;
+
+	if (ipv6_addr_v4mapped((struct in6_addr *)gid))
+		return RDMA_NETWORK_IPV4;
+	else
+		return RDMA_NETWORK_IPV6;
+}
+
 enum rdma_link_layer {
 	IB_LINK_LAYER_UNSPECIFIED,
 	IB_LINK_LAYER_INFINIBAND,
@@ -535,6 +566,17 @@ struct ib_grh {
 	union ib_gid	dgid;
 };
 
+union rdma_network_hdr {
+	struct ib_grh ibgrh;
+	struct {
+		/* The IB spec states that if it's IPv4, the header
+		 * is located in the last 20 bytes of the header.
+		 */
+		u8		reserved[20];
+		struct iphdr	roce4grh;
+	};
+};
+
 enum {
 	IB_MULTICAST_QPN = 0xffffff
 };
@@ -771,6 +813,7 @@ enum ib_wc_flags {
 	IB_WC_IP_CSUM_OK	= (1<<3),
 	IB_WC_WITH_SMAC		= (1<<4),
 	IB_WC_WITH_VLAN		= (1<<5),
+	IB_WC_WITH_NETWORK_HDR_TYPE	= (1<<6),
 };
 
 struct ib_wc {
@@ -793,6 +836,7 @@ struct ib_wc {
 	u8			port_num;	/* valid only for DR SMPs on switches */
 	u8			smac[ETH_ALEN];
 	u16			vlan_id;
+	u8			network_hdr_type;
 };
 
 enum ib_cq_notify_flags {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
                     ` (17 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to validate the route, we need an easy way to check if a
net-device belongs to our RDMA device. Move this helper function
to a header file in order to make this check easier.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h     | 13 +++++++++++++
 drivers/infiniband/core/roce_gid_mgmt.c | 20 ++++----------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index d531f91..3b250a2 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -96,4 +96,17 @@ int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);
 
+static inline bool rdma_is_upper_dev_rcu(struct net_device *dev,
+					 struct net_device *upper)
+{
+	struct net_device *_upper = NULL;
+	struct list_head *iter;
+
+	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
+		if (_upper == upper)
+			break;
+
+	return _upper == upper;
+}
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 1e3673f..06556c3 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -139,18 +139,6 @@ static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_de
 	return BONDING_SLAVE_STATE_NA;
 }
 
-static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
-{
-	struct net_device *_upper = NULL;
-	struct list_head *iter;
-
-	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
-		if (_upper == upper)
-			break;
-
-	return _upper == upper;
-}
-
 #define REQUIRED_BOND_STATES		(BONDING_SLAVE_STATE_ACTIVE |	\
 					 BONDING_SLAVE_STATE_NA)
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
@@ -168,7 +156,7 @@ static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 	if (!real_dev)
 		real_dev = event_ndev;
 
-	res = ((is_upper_dev_rcu(rdma_ndev, event_ndev) &&
+	res = ((rdma_is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	       (is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) &
 		REQUIRED_BOND_STATES)) ||
 	       real_dev == rdma_ndev);
@@ -214,7 +202,7 @@ static int upper_device_filter(struct ib_device *ib_dev, u8 port,
 		return 1;
 
 	rcu_read_lock();
-	res = is_upper_dev_rcu(rdma_ndev, event_ndev);
+	res = rdma_is_upper_dev_rcu(rdma_ndev, event_ndev);
 	rcu_read_unlock();
 
 	return res;
@@ -244,7 +232,7 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 	rcu_read_lock();
 	if (!rdma_ndev ||
 	    ((rdma_ndev != event_ndev &&
-	      !is_upper_dev_rcu(rdma_ndev, event_ndev)) ||
+	      !rdma_is_upper_dev_rcu(rdma_ndev, event_ndev)) ||
 	     is_eth_active_slave_of_bonding_rcu(rdma_ndev,
 						netdev_master_upper_dev_get_rcu(rdma_ndev)) ==
 	     BONDING_SLAVE_STATE_INACTIVE)) {
@@ -274,7 +262,7 @@ static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
 
 	rcu_read_lock();
 
-	if (is_upper_dev_rcu(rdma_ndev, event_ndev) &&
+	if (rdma_is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	    is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) ==
 	    BONDING_SLAVE_STATE_INACTIVE) {
 		unsigned long gid_type_mask;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
                     ` (16 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/addr.c           | 175 ++++++++++++++++++++++---------
 drivers/infiniband/core/cm.c             |  10 +-
 drivers/infiniband/core/cma.c            |  30 +++++-
 drivers/infiniband/core/sa_query.c       |  75 +++++++++++--
 drivers/infiniband/core/verbs.c          |  48 ++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
 include/rdma/ib_addr.h                   |  10 +-
 7 files changed, 270 insertions(+), 80 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 6e35299..57eda11 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -121,7 +121,8 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 }
 EXPORT_SYMBOL(rdma_copy_addr);
 
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
+int rdma_translate_ip(const struct sockaddr *addr,
+		      struct rdma_dev_addr *dev_addr,
 		      u16 *vlan_id)
 {
 	struct net_device *dev;
@@ -139,7 +140,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	switch (addr->sa_family) {
 	case AF_INET:
 		dev = ip_dev_find(dev_addr->net,
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
+			((const struct sockaddr_in *)addr)->sin_addr.s_addr);
 
 		if (!dev)
 			return ret;
@@ -154,7 +155,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 		rcu_read_lock();
 		for_each_netdev_rcu(dev_addr->net, dev) {
 			if (ipv6_chk_addr(dev_addr->net,
-					  &((struct sockaddr_in6 *) addr)->sin6_addr,
+					  &((const struct sockaddr_in6 *)addr)->sin6_addr,
 					  dev, 1)) {
 				ret = rdma_copy_addr(dev_addr, dev, NULL);
 				if (vlan_id)
@@ -198,7 +199,8 @@ static void queue_req(struct addr_req *req)
 	mutex_unlock(&lock);
 }
 
-static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr, void *daddr)
+static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr,
+			const void *daddr)
 {
 	struct neighbour *n;
 	int ret;
@@ -222,8 +224,9 @@ static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr, v
 }
 
 static int addr4_resolve(struct sockaddr_in *src_in,
-			 struct sockaddr_in *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct rtable **prt)
 {
 	__be32 src_ip = src_in->sin_addr.s_addr;
 	__be32 dst_ip = dst_in->sin_addr.s_addr;
@@ -243,36 +246,23 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	src_in->sin_family = AF_INET;
 	src_in->sin_addr.s_addr = fl4.saddr;
 
-	if (rt->dst.dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
-		goto put;
-	}
-
-	/* If the device does ARP internally, return 'done' */
-	if (rt->dst.dev->flags & IFF_NOARP) {
-		ret = rdma_copy_addr(addr, rt->dst.dev, NULL);
-		goto put;
-	}
-
 	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
 	 * routable) and we could set the network type accordingly.
 	 */
 	if (rt->rt_uses_gateway)
 		addr->network = RDMA_NETWORK_IPV4;
 
-	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
-put:
-	ip_rt_put(rt);
+	*prt = rt;
+	return 0;
 out:
 	return ret;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static int addr6_resolve(struct sockaddr_in6 *src_in,
-			 struct sockaddr_in6 *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in6 *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct dst_entry **pdst)
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
@@ -299,49 +289,109 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		src_in->sin6_addr = fl6.saddr;
 	}
 
-	if (dst->dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
-		goto put;
-	}
-
-	/* If the device does ARP internally, return 'done' */
-	if (dst->dev->flags & IFF_NOARP) {
-		ret = rdma_copy_addr(addr, dst->dev, NULL);
-		goto put;
-	}
-
 	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
 	 * routable) and we could set the network type accordingly.
 	 */
 	if (rt->rt6i_flags & RTF_GATEWAY)
 		addr->network = RDMA_NETWORK_IPV6;
 
-	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
+	*pdst = dst;
+	return 0;
 put:
 	dst_release(dst);
 	return ret;
 }
 #else
 static int addr6_resolve(struct sockaddr_in6 *src_in,
-			 struct sockaddr_in6 *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in6 *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct dst_entry **pdst)
 {
 	return -EADDRNOTAVAIL;
 }
 #endif
 
+static int addr_resolve_neigh(struct dst_entry *dst,
+			      const struct sockaddr *dst_in,
+			      struct rdma_dev_addr *addr)
+{
+	if (dst->dev->flags & IFF_LOOPBACK) {
+		int ret;
+
+		ret = rdma_translate_ip(dst_in, addr, NULL);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+			       MAX_ADDR_LEN);
+
+		return ret;
+	}
+
+	/* If the device does ARP internally */
+	if (!(dst->dev->flags & IFF_NOARP)) {
+		const struct sockaddr_in *dst_in4 =
+			(const struct sockaddr_in *)dst_in;
+		const struct sockaddr_in6 *dst_in6 =
+			(const struct sockaddr_in6 *)dst_in;
+
+		return dst_fetch_ha(dst, addr,
+				    dst_in->sa_family == AF_INET ?
+				    (const void *)&dst_in4->sin_addr.s_addr :
+				    (const void *)&dst_in6->sin6_addr);
+	}
+
+	return rdma_copy_addr(addr, dst->dev, NULL);
+}
+
 static int addr_resolve(struct sockaddr *src_in,
-			struct sockaddr *dst_in,
-			struct rdma_dev_addr *addr)
+			const struct sockaddr *dst_in,
+			struct rdma_dev_addr *addr,
+			bool resolve_neigh)
 {
+	struct net_device *ndev;
+	struct dst_entry *dst;
+	int ret;
+
 	if (src_in->sa_family == AF_INET) {
-		return addr4_resolve((struct sockaddr_in *) src_in,
-			(struct sockaddr_in *) dst_in, addr);
-	} else
-		return addr6_resolve((struct sockaddr_in6 *) src_in,
-			(struct sockaddr_in6 *) dst_in, addr);
+		struct rtable *rt = NULL;
+		const struct sockaddr_in *dst_in4 =
+			(const struct sockaddr_in *)dst_in;
+
+		ret = addr4_resolve((struct sockaddr_in *)src_in,
+				    dst_in4, addr, &rt);
+		if (ret)
+			return ret;
+
+		if (resolve_neigh)
+			ret = addr_resolve_neigh(&rt->dst, dst_in, addr);
+
+		ndev = rt->dst.dev;
+		dev_hold(ndev);
+
+		ip_rt_put(rt);
+	} else {
+		const struct sockaddr_in6 *dst_in6 =
+			(const struct sockaddr_in6 *)dst_in;
+
+		ret = addr6_resolve((struct sockaddr_in6 *)src_in,
+				    dst_in6, addr,
+				    &dst);
+		if (ret)
+			return ret;
+
+		if (resolve_neigh)
+			ret = addr_resolve_neigh(dst, dst_in, addr);
+
+		ndev = dst->dev;
+		dev_hold(ndev);
+
+		dst_release(dst);
+	}
+
+	addr->bound_dev_if = ndev->ifindex;
+	addr->net = dev_net(ndev);
+	dev_put(ndev);
+
+	return ret;
 }
 
 static void process_req(struct work_struct *work)
@@ -357,7 +407,8 @@ static void process_req(struct work_struct *work)
 		if (req->status == -ENODATA) {
 			src_in = (struct sockaddr *) &req->src_addr;
 			dst_in = (struct sockaddr *) &req->dst_addr;
-			req->status = addr_resolve(src_in, dst_in, req->addr);
+			req->status = addr_resolve(src_in, dst_in, req->addr,
+						   true);
 			if (req->status && time_after_eq(jiffies, req->timeout))
 				req->status = -ETIMEDOUT;
 			else if (req->status == -ENODATA)
@@ -417,7 +468,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	req->client = client;
 	atomic_inc(&client->refcount);
 
-	req->status = addr_resolve(src_in, dst_in, addr);
+	req->status = addr_resolve(src_in, dst_in, addr, true);
 	switch (req->status) {
 	case 0:
 		req->timeout = jiffies;
@@ -439,6 +490,25 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_ip);
 
+int rdma_resolve_ip_route(struct sockaddr *src_addr,
+			  const struct sockaddr *dst_addr,
+			  struct rdma_dev_addr *addr)
+{
+	struct sockaddr_storage ssrc_addr;
+	struct sockaddr *src_in = (struct sockaddr *)&ssrc_addr;
+
+	if (src_addr->sa_family != dst_addr->sa_family)
+		return -EINVAL;
+
+	if (src_addr)
+		memcpy(src_in, src_addr, rdma_addr_size(src_addr));
+	else
+		src_in->sa_family = dst_addr->sa_family;
+
+	return addr_resolve(src_in, dst_addr, addr, false);
+}
+EXPORT_SYMBOL(rdma_resolve_ip_route);
+
 void rdma_addr_cancel(struct rdma_dev_addr *addr)
 {
 	struct addr_req *req, *temp_req;
@@ -471,7 +541,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgid,
-			       u8 *dmac, u16 *vlan_id, int if_index)
+			       u8 *dmac, u16 *vlan_id, int *if_index)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -489,7 +559,8 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi
 	rdma_gid2ip(&dgid_addr._sockaddr, dgid);
 
 	memset(&dev_addr, 0, sizeof(dev_addr));
-	dev_addr.bound_dev_if = if_index;
+	if (if_index)
+		dev_addr.bound_dev_if = *if_index;
 	dev_addr.net = &init_net;
 
 	ctx.addr = &dev_addr;
@@ -505,6 +576,8 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi
 	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
 	if (!dev)
 		return -ENODEV;
+	if (if_index)
+		*if_index = dev_addr.bound_dev_if;
 	if (vlan_id)
 		*vlan_id = rdma_vlan_dev_vlan_id(dev);
 	dev_put(dev);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5ea78ab..77bec4c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1646,8 +1646,11 @@ static int cm_req_handler(struct cm_work *work)
 				cm_id_priv->av.ah_attr.grh.sgid_index,
 				&gid, &gid_attr);
 	if (!ret) {
-		if (gid_attr.ndev)
+		if (gid_attr.ndev) {
+			work->path[0].ifindex = gid_attr.ndev->ifindex;
+			work->path[0].net = dev_net(gid_attr.ndev);
 			dev_put(gid_attr.ndev);
+		}
 		work->path[0].gid_type = gid_attr.gid_type;
 		ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	}
@@ -1656,8 +1659,11 @@ static int cm_req_handler(struct cm_work *work)
 					    work->port->port_num, 0,
 					    &work->path[0].sgid,
 					    &gid_attr);
-		if (!err && gid_attr.ndev)
+		if (!err && gid_attr.ndev) {
+			work->path[0].ifindex = gid_attr.ndev->ifindex;
+			work->path[0].net = dev_net(gid_attr.ndev);
 			dev_put(gid_attr.ndev);
+		}
 		work->path[0].gid_type = gid_attr.gid_type;
 		ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
 			       &work->path[0].sgid, sizeof work->path[0].sgid,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 5dc853c..cf52b65 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -454,8 +454,20 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
 		return ret;
 
-	if (dev_type == ARPHRD_ETHER)
+	if (dev_type == ARPHRD_ETHER) {
 		ndev = dev_get_by_index(&init_net, bound_if_index);
+		if (ndev && ndev->flags & IFF_LOOPBACK) {
+			pr_info("detected loopback device\n");
+			dev_put(ndev);
+
+			if (!device->get_netdev)
+				return -EOPNOTSUPP;
+
+			ndev = device->get_netdev(device, port);
+			if (!ndev)
+				return -ENODEV;
+		}
+	}
 
 	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
 					 ndev, NULL);
@@ -2325,8 +2337,22 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 
 	if (addr->dev_addr.bound_dev_if) {
 		ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
+		if (!ndev)
+			return -ENODEV;
+
+		if (ndev->flags & IFF_LOOPBACK) {
+			dev_put(ndev);
+			if (!id_priv->id.device->get_netdev)
+				return -EOPNOTSUPP;
+
+			ndev = id_priv->id.device->get_netdev(id_priv->id.device,
+							      id_priv->id.port_num);
+			if (!ndev)
+				return -ENODEV;
+		}
+
 		route->path_rec->net = &init_net;
-		route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+		route->path_rec->ifindex = ndev->ifindex;
 		route->path_rec->gid_type = id_priv->gid_type;
 	}
 	if (!ndev) {
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 5606922..19919ad 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -49,7 +49,9 @@
 #include <net/netlink.h>
 #include <uapi/rdma/ib_user_sa.h>
 #include <rdma/ib_marshall.h>
+#include <rdma/ib_addr.h>
 #include "sa.h"
+#include "core_priv.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand subnet administration query support");
@@ -994,7 +996,8 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 {
 	int ret;
 	u16 gid_index;
-	int force_grh;
+	int use_roce;
+	struct net_device *ndev = NULL;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
 	ah_attr->dlid = be16_to_cpu(rec->dlid);
@@ -1004,16 +1007,71 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->port_num = port_num;
 	ah_attr->static_rate = rec->rate;
 
-	force_grh = rdma_cap_eth_ah(device, port_num);
+	use_roce = rdma_cap_eth_ah(device, port_num);
+
+	if (use_roce) {
+		struct net_device *idev;
+		struct net_device *resolved_dev;
+		struct rdma_dev_addr dev_addr = {.bound_dev_if = rec->ifindex,
+						 .net = rec->net ? rec->net :
+							 &init_net};
+		union {
+			struct sockaddr     _sockaddr;
+			struct sockaddr_in  _sockaddr_in;
+			struct sockaddr_in6 _sockaddr_in6;
+		} sgid_addr, dgid_addr;
+
+		if (!device->get_netdev)
+			return -EOPNOTSUPP;
+
+		rdma_gid2ip(&sgid_addr._sockaddr, &rec->sgid);
+		rdma_gid2ip(&dgid_addr._sockaddr, &rec->dgid);
+
+		/* validate the route */
+		ret = rdma_resolve_ip_route(&sgid_addr._sockaddr,
+					    &dgid_addr._sockaddr, &dev_addr);
+		if (ret)
+			return ret;
 
-	if (rec->hop_limit > 1 || force_grh) {
-		struct net_device *ndev = ib_get_ndev_from_path(rec);
+		if ((dev_addr.network == RDMA_NETWORK_IPV4 ||
+		     dev_addr.network == RDMA_NETWORK_IPV6) &&
+		    rec->gid_type != IB_GID_TYPE_ROCE_UDP_ENCAP)
+			return -EINVAL;
+
+		idev = device->get_netdev(device, port_num);
+		if (!idev)
+			return -ENODEV;
+
+		resolved_dev = dev_get_by_index(dev_addr.net,
+						dev_addr.bound_dev_if);
+		if (resolved_dev->flags & IFF_LOOPBACK) {
+			dev_put(resolved_dev);
+			resolved_dev = idev;
+			dev_hold(resolved_dev);
+		}
+		ndev = ib_get_ndev_from_path(rec);
+		rcu_read_lock();
+		if ((ndev && ndev != resolved_dev) ||
+		    (resolved_dev != idev &&
+		     !rdma_is_upper_dev_rcu(idev, resolved_dev)))
+			ret = -EHOSTUNREACH;
+		rcu_read_unlock();
+		dev_put(idev);
+		dev_put(resolved_dev);
+		if (ret) {
+			if (ndev)
+				dev_put(ndev);
+			return ret;
+		}
+	}
 
+	if (rec->hop_limit > 1 || use_roce) {
 		ah_attr->ah_flags = IB_AH_GRH;
 		ah_attr->grh.dgid = rec->dgid;
 
-		ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type, ndev,
-					 &port_num, &gid_index);
+		ret = ib_find_cached_gid_by_port(device, &rec->sgid,
+						 rec->gid_type, port_num, ndev,
+						 &gid_index);
 		if (ret) {
 			if (ndev)
 				dev_put(ndev);
@@ -1027,9 +1085,10 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		if (ndev)
 			dev_put(ndev);
 	}
-	if (force_grh) {
+
+	if (use_roce)
 		memcpy(ah_attr->dmac, rec->dmac, ETH_ALEN);
-	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ib_init_ah_from_path);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c564131..114614f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -457,30 +457,52 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		return ret;
 
 	if (rdma_protocol_roce(device, port_num)) {
+		int if_index = 0;
 		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
 				wc->vlan_id : 0xffff;
+		struct net_device *idev;
+		struct net_device *resolved_dev;
 
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;
 
-		if (!(wc->wc_flags & IB_WC_WITH_SMAC) ||
-		    !(wc->wc_flags & IB_WC_WITH_VLAN)) {
-			ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
-							 ah_attr->dmac,
-							 wc->wc_flags & IB_WC_WITH_VLAN ?
-							 NULL : &vlan_id,
-							 0);
-			if (ret)
-				return ret;
+		if (!device->get_netdev)
+			return -EOPNOTSUPP;
+
+		idev = device->get_netdev(device, port_num);
+		if (!idev)
+			return -ENODEV;
+
+		ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
+						 ah_attr->dmac,
+						 wc->wc_flags & IB_WC_WITH_VLAN ?
+						 NULL : &vlan_id,
+						 &if_index);
+		if (ret) {
+			dev_put(idev);
+			return ret;
 		}
 
+		resolved_dev = dev_get_by_index(&init_net, if_index);
+		if (resolved_dev->flags & IFF_LOOPBACK) {
+			dev_put(resolved_dev);
+			resolved_dev = idev;
+			dev_hold(resolved_dev);
+		}
+		rcu_read_lock();
+		if (resolved_dev != idev && !rdma_is_upper_dev_rcu(idev,
+								   resolved_dev))
+			ret = -EHOSTUNREACH;
+		rcu_read_unlock();
+		dev_put(idev);
+		dev_put(resolved_dev);
+		if (ret)
+			return ret;
+
 		ret = get_sgid_index_from_eth(device, port_num, vlan_id,
 					      &dgid, gid_type, &gid_index);
 		if (ret)
 			return ret;
-
-		if (wc->wc_flags & IB_WC_WITH_SMAC)
-			memcpy(ah_attr->dmac, wc->smac, ETH_ALEN);
 	}
 
 	ah_attr->dlid = wc->slid;
@@ -1145,7 +1167,7 @@ int ib_resolve_eth_dmac(struct ib_qp *qp,
 			ret = rdma_addr_find_dmac_by_grh(&sgid,
 							 &qp_attr->ah_attr.grh.dgid,
 							 qp_attr->ah_attr.dmac,
-							 NULL, ifindex);
+							 NULL, &ifindex);
 
 			dev_put(sgid_attr.ndev);
 		}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 9820074..a343e03 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -154,7 +154,7 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 	    (!rdma_link_local_addr((struct in6_addr *)attr->grh.dgid.raw))) {
 		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
 						    attr->dmac, &vlan_tag,
-						    sgid_attr.ndev->ifindex);
+						    &sgid_attr.ndev->ifindex);
 		if (status) {
 			pr_err("%s(): Failed to resolve dmac from gid." 
 				"status = %d\n", __func__, status);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index c799caa..87156dc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -92,8 +92,8 @@ struct rdma_dev_addr {
  *
  * The dev_addr->net field must be initialized.
  */
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
-		      u16 *vlan_id);
+int rdma_translate_ip(const struct sockaddr *addr,
+		      struct rdma_dev_addr *dev_addr, u16 *vlan_id);
 
 /**
  * rdma_resolve_ip - Resolve source and destination IP addresses to
@@ -118,6 +118,10 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 				     struct rdma_dev_addr *addr, void *context),
 		    void *context);
 
+int rdma_resolve_ip_route(struct sockaddr *src_addr,
+			  const struct sockaddr *dst_addr,
+			  struct rdma_dev_addr *addr);
+
 void rdma_addr_cancel(struct rdma_dev_addr *addr);
 
 int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
@@ -127,7 +131,7 @@ int rdma_addr_size(struct sockaddr *addr);
 
 int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
 int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgid,
-			       u8 *smac, u16 *vlan_id, int if_index);
+			       u8 *smac, u16 *vlan_id, int *if_index);
 
 static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
 {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
                     ` (15 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Currently, cma users can't increase or decrease the cma reference
count. This is necassary when setting cma attributes (like the
default GID type) in order to avoid use-after-free errors.
Adding cma_ref_dev and cma_deref_dev APIs.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c       | 11 +++++++++--
 drivers/infiniband/core/core_priv.h |  4 ++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cf52b65..f78088a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -60,6 +60,8 @@
 #include <rdma/ib_sa.h>
 #include <rdma/iw_cm.h>
 
+#include "core_priv.h"
+
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -185,6 +187,11 @@ enum {
 	CMA_OPTION_AFONLY,
 };
 
+void cma_ref_dev(struct cma_device *cma_dev)
+{
+	atomic_inc(&cma_dev->refcount);
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -339,7 +346,7 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
-	atomic_inc(&cma_dev->refcount);
+	cma_ref_dev(cma_dev);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
@@ -347,7 +354,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
-static inline void cma_deref_dev(struct cma_device *cma_dev)
+void cma_deref_dev(struct cma_device *cma_dev)
 {
 	if (atomic_dec_and_test(&cma_dev->refcount))
 		complete(&cma_dev->comp);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3b250a2..1945b4e 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,6 +38,10 @@
 
 #include <rdma/ib_verbs.h>
 
+struct cma_device;
+void cma_ref_dev(struct cma_device *cma_dev);
+void cma_deref_dev(struct cma_device *cma_dev);
+
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
 						   u8, struct kobject *));
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
                     ` (14 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Documentation/ABI/testing/configfs-rdma_cm |  22 ++
 drivers/infiniband/Kconfig                 |   9 +
 drivers/infiniband/core/Makefile           |   2 +
 drivers/infiniband/core/cache.c            |  24 +++
 drivers/infiniband/core/cma.c              | 108 +++++++++-
 drivers/infiniband/core/cma_configfs.c     | 321 +++++++++++++++++++++++++++++
 drivers/infiniband/core/core_priv.h        |  24 +++
 7 files changed, 503 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 drivers/infiniband/core/cma_configfs.c

diff --git a/Documentation/ABI/testing/configfs-rdma_cm b/Documentation/ABI/testing/configfs-rdma_cm
new file mode 100644
index 0000000..5c389aa
--- /dev/null
+++ b/Documentation/ABI/testing/configfs-rdma_cm
@@ -0,0 +1,22 @@
+What: 		/config/rdma_cm
+Date: 		November 29, 2015
+KernelVersion:  4.4.0
+Description: 	Interface is used to configure RDMA-cable HCAs in respect to
+		RDMA-CM attributes.
+
+		Attributes are visible only when configfs is mounted. To mount
+		configfs in /config directory use:
+		# mount -t configfs none /config/
+
+		In order to set parameters related to a specific HCA, a directory
+		for this HCA has to be created:
+		mkdir -p /config/rdma_cm/<hca>
+
+
+What: 		/config/rdma_cm/<hca>/ports/<port-num>/default_roce_mode
+Date: 		November 29, 2015
+KernelVersion:  4.4.0
+Description: 	RDMA-CM based connections from HCA <hca> at port <port-num>
+		will be initiated with this RoCE type as default.
+		The possible RoCE types are either "IB/RoCE v1" or "RoCE v2".
+		This parameter has RW access.
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index aa26f3c..f5312da 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -54,6 +54,15 @@ config INFINIBAND_ADDR_TRANS
 	depends on INFINIBAND
 	default y
 
+config INFINIBAND_ADDR_TRANS_CONFIGFS
+	bool
+	depends on INFINIBAND_ADDR_TRANS && !(INFINIBAND=y && CONFIGFS_FS=m)
+	default y
+	---help---
+	  ConfigFS support for RDMA communication manager (CM).
+	  This allows the user to config the default GID type that the CM
+	  uses for each device, when initiaing new connections.
+
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index d43a899..7922fa7 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -24,6 +24,8 @@ iw_cm-y :=			iwcm.o iwpm_util.o iwpm_msg.o
 
 rdma_cm-y :=			cma.o
 
+rdma_cm-$(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS) += cma_configfs.o
+
 rdma_ucm-y :=			ucma.o
 
 ib_addr-y :=			addr.o
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 88b4b6f..4aada52 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -140,6 +140,30 @@ const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
 }
 EXPORT_SYMBOL(ib_cache_gid_type_str);
 
+int ib_cache_gid_parse_type_str(const char *buf)
+{
+	unsigned int i;
+	size_t len;
+	int err = -EINVAL;
+
+	len = strlen(buf);
+	if (len == 0)
+		return -EINVAL;
+
+	if (buf[len - 1] == '\n')
+		len--;
+
+	for (i = 0; i < ARRAY_SIZE(gid_type_str); ++i)
+		if (gid_type_str[i] && !strncmp(buf, gid_type_str[i], len) &&
+		    len == strlen(gid_type_str[i])) {
+			err = i;
+			break;
+		}
+
+	return err;
+}
+EXPORT_SYMBOL(ib_cache_gid_parse_type_str);
+
 /* This function expects that rwlock will be write locked in all
  * scenarios and that lock will be locked in sleep-able (RoCE)
  * scenarios.
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f78088a..8fab267 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -152,6 +152,7 @@ struct cma_device {
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
+	enum ib_gid_type	*default_gid_type;
 };
 
 struct rdma_bind_list {
@@ -192,6 +193,62 @@ void cma_ref_dev(struct cma_device *cma_dev)
 	atomic_inc(&cma_dev->refcount);
 }
 
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter	filter,
+					     void		*cookie)
+{
+	struct cma_device *cma_dev;
+	struct cma_device *found_cma_dev = NULL;
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(cma_dev, &dev_list, list)
+		if (filter(cma_dev->device, cookie)) {
+			found_cma_dev = cma_dev;
+			break;
+		}
+
+	if (found_cma_dev)
+		cma_ref_dev(found_cma_dev);
+	mutex_unlock(&lock);
+	return found_cma_dev;
+}
+
+int cma_get_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port)
+{
+	if (port < rdma_start_port(cma_dev->device) ||
+	    port > rdma_end_port(cma_dev->device))
+		return -EINVAL;
+
+	return cma_dev->default_gid_type[port - rdma_start_port(cma_dev->device)];
+}
+
+int cma_set_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port,
+			     enum ib_gid_type default_gid_type)
+{
+	unsigned long supported_gids;
+
+	if (port < rdma_start_port(cma_dev->device) ||
+	    port > rdma_end_port(cma_dev->device))
+		return -EINVAL;
+
+	supported_gids = roce_gid_type_mask_support(cma_dev->device, port);
+
+	if (!(supported_gids & 1 << default_gid_type))
+		return -EINVAL;
+
+	cma_dev->default_gid_type[port - rdma_start_port(cma_dev->device)] =
+		default_gid_type;
+
+	return 0;
+}
+
+struct ib_device *cma_get_ib_dev(struct cma_device *cma_dev)
+{
+	return cma_dev->device;
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -343,17 +400,27 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
-static void cma_attach_to_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
+			       struct cma_device *cma_dev)
 {
 	cma_ref_dev(cma_dev);
 	id_priv->cma_dev = cma_dev;
+	id_priv->gid_type = 0;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
 		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
+static void cma_attach_to_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
+{
+	_cma_attach_to_dev(id_priv, cma_dev);
+	id_priv->gid_type =
+		cma_dev->default_gid_type[id_priv->id.port_num -
+					  rdma_start_port(cma_dev->device)];
+}
+
 void cma_deref_dev(struct cma_device *cma_dev)
 {
 	if (atomic_dec_and_test(&cma_dev->refcount))
@@ -449,6 +516,7 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
 }
 
 static inline int cma_validate_port(struct ib_device *device, u8 port,
+				    enum ib_gid_type gid_type,
 				      union ib_gid *gid, int dev_type,
 				      int bound_if_index)
 {
@@ -474,9 +542,11 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 			if (!ndev)
 				return -ENODEV;
 		}
+	} else {
+		gid_type = IB_GID_TYPE_IB;
 	}
 
-	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
+	ret = ib_find_cached_gid_by_port(device, gid, gid_type, port,
 					 ndev, NULL);
 
 	if (ndev)
@@ -511,7 +581,10 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 		gidp = rdma_protocol_roce(cma_dev->device, port) ?
 		       &iboe_gid : &gid;
 
-		ret = cma_validate_port(cma_dev->device, port, gidp,
+		ret = cma_validate_port(cma_dev->device, port,
+					rdma_protocol_ib(cma_dev->device, port) ?
+					IB_GID_TYPE_IB :
+					listen_id_priv->gid_type, gidp,
 					dev_addr->dev_type,
 					dev_addr->bound_dev_if);
 		if (!ret) {
@@ -530,8 +603,11 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 			gidp = rdma_protocol_roce(cma_dev->device, port) ?
 			       &iboe_gid : &gid;
 
-			ret = cma_validate_port(cma_dev->device, port, gidp,
-						dev_addr->dev_type,
+			ret = cma_validate_port(cma_dev->device, port,
+						rdma_protocol_ib(cma_dev->device, port) ?
+						IB_GID_TYPE_IB :
+						cma_dev->default_gid_type[port - 1],
+						gidp, dev_addr->dev_type,
 						dev_addr->bound_dev_if);
 			if (!ret) {
 				id_priv->id.port_num = port;
@@ -2073,7 +2149,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
 	       rdma_addr_size(cma_src_addr(id_priv)));
 
-	cma_attach_to_dev(dev_id_priv, cma_dev);
+	_cma_attach_to_dev(dev_id_priv, cma_dev);
 	list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
 	atomic_inc(&id_priv->refcount);
 	dev_id_priv->internal_id = 1;
@@ -3907,12 +3983,27 @@ static void cma_add_one(struct ib_device *device)
 {
 	struct cma_device *cma_dev;
 	struct rdma_id_private *id_priv;
+	unsigned int i;
+	unsigned long supported_gids = 0;
 
 	cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
 	if (!cma_dev)
 		return;
 
 	cma_dev->device = device;
+	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_gid_type),
+					    GFP_KERNEL);
+	if (!cma_dev->default_gid_type) {
+		kfree(cma_dev);
+		return;
+	}
+	for (i = rdma_start_port(device); i <= rdma_end_port(device); i++) {
+		supported_gids = roce_gid_type_mask_support(device, i);
+		WARN_ON(!supported_gids);
+		cma_dev->default_gid_type[i - rdma_start_port(device)] =
+			find_first_bit(&supported_gids, BITS_PER_LONG);
+	}
 
 	init_completion(&cma_dev->comp);
 	atomic_set(&cma_dev->refcount, 1);
@@ -3992,6 +4083,7 @@ static void cma_remove_one(struct ib_device *device, void *client_data)
 	mutex_unlock(&lock);
 
 	cma_process_remove(cma_dev);
+	kfree(cma_dev->default_gid_type);
 	kfree(cma_dev);
 }
 
@@ -4125,6 +4217,7 @@ static int __init cma_init(void)
 
 	if (ibnl_add_client(RDMA_NL_RDMA_CM, RDMA_NL_RDMA_CM_NUM_OPS, cma_cb_table))
 		printk(KERN_WARNING "RDMA CMA: failed to add netlink callback\n");
+	cma_configfs_init();
 
 	return 0;
 
@@ -4139,6 +4232,7 @@ err_wq:
 
 static void __exit cma_cleanup(void)
 {
+	cma_configfs_exit();
 	ibnl_remove_client(RDMA_NL_RDMA_CM);
 	ib_unregister_client(&cma_client);
 	unregister_netdevice_notifier(&cma_nb);
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
new file mode 100644
index 0000000..744834b
--- /dev/null
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/configfs.h>
+#include <rdma/ib_verbs.h>
+#include "core_priv.h"
+
+struct cma_device;
+
+struct cma_dev_group;
+
+struct cma_dev_port_group {
+	unsigned int		port_num;
+	struct cma_dev_group	*cma_dev_group;
+	struct config_group	group;
+};
+
+struct cma_dev_group {
+	char				name[IB_DEVICE_NAME_MAX];
+	struct config_group		device_group;
+	struct config_group		ports_group;
+	struct config_group		*default_dev_group[2];
+	struct config_group		**default_ports_group;
+	struct cma_dev_port_group	*ports;
+};
+
+static struct cma_dev_port_group *to_dev_port_group(struct config_item *item)
+{
+	struct config_group *group;
+
+	if (!item)
+		return NULL;
+
+	group = container_of(item, struct config_group, cg_item);
+	return container_of(group, struct cma_dev_port_group, group);
+}
+
+static bool filter_by_name(struct ib_device *ib_dev, void *cookie)
+{
+	return !strcmp(ib_dev->name, cookie);
+}
+
+static int cma_configfs_params_get(struct config_item *item,
+				   struct cma_device **pcma_dev,
+				   struct cma_dev_port_group **pgroup)
+{
+	struct cma_dev_port_group *group = to_dev_port_group(item);
+	struct cma_device *cma_dev;
+
+	if (!group)
+		return -ENODEV;
+
+	cma_dev = cma_enum_devices_by_ibdev(filter_by_name,
+					    group->cma_dev_group->name);
+	if (!cma_dev)
+		return -ENODEV;
+
+	*pcma_dev = cma_dev;
+	*pgroup = group;
+
+	return 0;
+}
+
+static void cma_configfs_params_put(struct cma_device *cma_dev)
+{
+	cma_deref_dev(cma_dev);
+}
+
+static ssize_t default_roce_mode_show(struct config_item *item,
+				      char *buf)
+{
+	struct cma_device *cma_dev;
+	struct cma_dev_port_group *group;
+	int gid_type;
+	ssize_t ret;
+
+	ret = cma_configfs_params_get(item, &cma_dev, &group);
+	if (ret)
+		return ret;
+
+	gid_type = cma_get_default_gid_type(cma_dev, group->port_num);
+	cma_configfs_params_put(cma_dev);
+
+	if (gid_type < 0)
+		return gid_type;
+
+	return sprintf(buf, "%s\n", ib_cache_gid_type_str(gid_type));
+}
+
+static ssize_t default_roce_mode_store(struct config_item *item,
+				       const char *buf, size_t count)
+{
+	struct cma_device *cma_dev;
+	struct cma_dev_port_group *group;
+	int gid_type = ib_cache_gid_parse_type_str(buf);
+	ssize_t ret;
+
+	if (gid_type < 0)
+		return -EINVAL;
+
+	ret = cma_configfs_params_get(item, &cma_dev, &group);
+	if (ret)
+		return ret;
+
+	ret = cma_set_default_gid_type(cma_dev, group->port_num, gid_type);
+
+	cma_configfs_params_put(cma_dev);
+
+	return !ret ? strnlen(buf, count) : ret;
+}
+
+CONFIGFS_ATTR(, default_roce_mode);
+
+static struct configfs_attribute *cma_configfs_attributes[] = {
+	&attr_default_roce_mode,
+	NULL,
+};
+
+static struct config_item_type cma_port_group_type = {
+	.ct_attrs	= cma_configfs_attributes,
+	.ct_owner	= THIS_MODULE
+};
+
+static int make_cma_ports(struct cma_dev_group *cma_dev_group,
+			  struct cma_device *cma_dev)
+{
+	struct ib_device *ibdev;
+	unsigned int i;
+	unsigned int ports_num;
+	struct cma_dev_port_group *ports;
+	int err;
+
+	ibdev = cma_get_ib_dev(cma_dev);
+
+	if (!ibdev)
+		return -ENODEV;
+
+	ports_num = ibdev->phys_port_cnt;
+	ports = kcalloc(ports_num, sizeof(*cma_dev_group->ports),
+			GFP_KERNEL);
+
+	cma_dev_group->default_ports_group = kcalloc(ports_num + 1,
+						     sizeof(*cma_dev_group->ports),
+						     GFP_KERNEL);
+
+	if (!ports || !cma_dev_group->default_ports_group) {
+		err = -ENOMEM;
+		goto free;
+	}
+
+	for (i = 0; i < ports_num; i++) {
+		char port_str[10];
+
+		ports[i].port_num = i + 1;
+		snprintf(port_str, sizeof(port_str), "%u", i + 1);
+		ports[i].cma_dev_group = cma_dev_group;
+		config_group_init_type_name(&ports[i].group,
+					    port_str,
+					    &cma_port_group_type);
+		cma_dev_group->default_ports_group[i] = &ports[i].group;
+	}
+	cma_dev_group->default_ports_group[i] = NULL;
+	cma_dev_group->ports = ports;
+
+	return 0;
+free:
+	kfree(ports);
+	kfree(cma_dev_group->default_ports_group);
+	cma_dev_group->ports = NULL;
+	cma_dev_group->default_ports_group = NULL;
+	return err;
+}
+
+static void release_cma_dev(struct config_item  *item)
+{
+	struct config_group *group = container_of(item, struct config_group,
+						  cg_item);
+	struct cma_dev_group *cma_dev_group = container_of(group,
+							   struct cma_dev_group,
+							   device_group);
+
+	kfree(cma_dev_group);
+};
+
+static void release_cma_ports_group(struct config_item  *item)
+{
+	struct config_group *group = container_of(item, struct config_group,
+						  cg_item);
+	struct cma_dev_group *cma_dev_group = container_of(group,
+							   struct cma_dev_group,
+							   ports_group);
+
+	kfree(cma_dev_group->ports);
+	kfree(cma_dev_group->default_ports_group);
+	cma_dev_group->ports = NULL;
+	cma_dev_group->default_ports_group = NULL;
+};
+
+static struct configfs_item_operations cma_ports_item_ops = {
+	.release = release_cma_ports_group
+};
+
+static struct config_item_type cma_ports_group_type = {
+	.ct_item_ops	= &cma_ports_item_ops,
+	.ct_owner	= THIS_MODULE
+};
+
+static struct configfs_item_operations cma_device_item_ops = {
+	.release = release_cma_dev
+};
+
+static struct config_item_type cma_device_group_type = {
+	.ct_item_ops	= &cma_device_item_ops,
+	.ct_owner	= THIS_MODULE
+};
+
+static struct config_group *make_cma_dev(struct config_group *group,
+					 const char *name)
+{
+	int err = -ENODEV;
+	struct cma_device *cma_dev = cma_enum_devices_by_ibdev(filter_by_name,
+							       (void *)name);
+	struct cma_dev_group *cma_dev_group = NULL;
+
+	if (!cma_dev)
+		goto fail;
+
+	cma_dev_group = kzalloc(sizeof(*cma_dev_group), GFP_KERNEL);
+
+	if (!cma_dev_group) {
+		err = -ENOMEM;
+		goto fail;
+	}
+
+	strncpy(cma_dev_group->name, name, sizeof(cma_dev_group->name));
+
+	err = make_cma_ports(cma_dev_group, cma_dev);
+	if (err)
+		goto fail;
+
+	cma_dev_group->ports_group.default_groups =
+		cma_dev_group->default_ports_group;
+	config_group_init_type_name(&cma_dev_group->ports_group, "ports",
+				    &cma_ports_group_type);
+
+	cma_dev_group->device_group.default_groups
+		= cma_dev_group->default_dev_group;
+	cma_dev_group->default_dev_group[0] = &cma_dev_group->ports_group;
+	cma_dev_group->default_dev_group[1] = NULL;
+
+	config_group_init_type_name(&cma_dev_group->device_group, name,
+				    &cma_device_group_type);
+
+	cma_deref_dev(cma_dev);
+	return &cma_dev_group->device_group;
+
+fail:
+	if (cma_dev)
+		cma_deref_dev(cma_dev);
+	kfree(cma_dev_group);
+	return ERR_PTR(err);
+}
+
+static struct configfs_group_operations cma_subsys_group_ops = {
+	.make_group	= make_cma_dev,
+};
+
+static struct config_item_type cma_subsys_type = {
+	.ct_group_ops	= &cma_subsys_group_ops,
+	.ct_owner	= THIS_MODULE,
+};
+
+static struct configfs_subsystem cma_subsys = {
+	.su_group	= {
+		.cg_item	= {
+			.ci_namebuf	= "rdma_cm",
+			.ci_type	= &cma_subsys_type,
+		},
+	},
+};
+
+int __init cma_configfs_init(void)
+{
+	config_group_init(&cma_subsys.su_group);
+	mutex_init(&cma_subsys.su_mutex);
+	return configfs_register_subsystem(&cma_subsys);
+}
+
+void __exit cma_configfs_exit(void)
+{
+	configfs_unregister_subsystem(&cma_subsys);
+}
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 1945b4e..eab3221 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,9 +38,31 @@
 
 #include <rdma/ib_verbs.h>
 
+#if IS_ENABLED(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS)
+int cma_configfs_init(void);
+void cma_configfs_exit(void);
+#else
+static inline int cma_configfs_init(void)
+{
+	return 0;
+}
+
+static inline void cma_configfs_exit(void)
+{
+}
+#endif
 struct cma_device;
 void cma_ref_dev(struct cma_device *cma_dev);
 void cma_deref_dev(struct cma_device *cma_dev);
+typedef bool (*cma_device_filter)(struct ib_device *, void *);
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter	filter,
+					     void		*cookie);
+int cma_get_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port);
+int cma_set_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port,
+			     enum ib_gid_type default_gid_type);
+struct ib_device *cma_get_ib_dev(struct cma_device *cma_dev);
 
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
@@ -74,6 +96,8 @@ enum ib_cache_gid_default_mode {
 	IB_CACHE_GID_DEFAULT_MODE_DELETE
 };
 
+int ib_cache_gid_parse_type_str(const char *buf);
+
 const char *ib_cache_gid_type_str(enum ib_gid_type gid_type);
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
                     ` (13 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur,
	Moni Shoua

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not with) BTH. For RoCE UDP ENCAP it is
required that this function would be able to build also IP and UDP
headers.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/ud_header.c    | 155 ++++++++++++++++++++++++++++++---
 drivers/infiniband/hw/mlx4/qp.c        |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 include/rdma/ib_pack.h                 |  45 ++++++++--
 4 files changed, 188 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c
index 72feee6..96697e7 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,72 @@ static const struct ib_field vlan_table[]  = {
 	  .size_bits    = 16 }
 };
 
+static const struct ib_field ip4_table[]  = {
+	{ STRUCT_FIELD(ip4, ver),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 4 },
+	{ STRUCT_FIELD(ip4, hdr_len),
+	  .offset_words = 0,
+	  .offset_bits  = 4,
+	  .size_bits    = 4 },
+	{ STRUCT_FIELD(ip4, tos),
+	  .offset_words = 0,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, tot_len),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, id),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, frag_off),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, ttl),
+	  .offset_words = 2,
+	  .offset_bits  = 0,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, protocol),
+	  .offset_words = 2,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, check),
+	  .offset_words = 2,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, saddr),
+	  .offset_words = 3,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 },
+	{ STRUCT_FIELD(ip4, daddr),
+	  .offset_words = 4,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 }
+};
+
+static const struct ib_field udp_table[]  = {
+	{ STRUCT_FIELD(udp, sport),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, dport),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, length),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, csum),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 }
+};
+
 static const struct ib_field grh_table[]  = {
 	{ STRUCT_FIELD(grh, ip_version),
 	  .offset_words = 0,
@@ -213,26 +280,57 @@ static const struct ib_field deth_table[] = {
 	  .size_bits    = 24 }
 };
 
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+	struct iphdr iph;
+
+	iph.ihl		= 5;
+	iph.version	= 4;
+	iph.tos		= header->ip4.tos;
+	iph.tot_len	= header->ip4.tot_len;
+	iph.id		= header->ip4.id;
+	iph.frag_off	= header->ip4.frag_off;
+	iph.ttl		= header->ip4.ttl;
+	iph.protocol	= header->ip4.protocol;
+	iph.check	= 0;
+	iph.saddr	= header->ip4.saddr;
+	iph.daddr	= header->ip4.daddr;
+
+	return ip_fast_csum((u8 *)&iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
  * @lrh_present: specify if LRH is present
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
- * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @grh_present: GRH flag (if non-zero, GRH will be included)
+ * @ip_version: if non-zero, IP header, V4 or V6, will be included
+ * @udp_present :if non-zero, UDP header will be included
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int     		    payload_bytes,
-		       int		    lrh_present,
-		       int		    eth_present,
-		       int		    vlan_present,
-		       int    		    grh_present,
-		       int		    immediate_present,
-		       struct ib_ud_header *header)
+int ib_ud_header_init(int     payload_bytes,
+		      int    lrh_present,
+		      int    eth_present,
+		      int    vlan_present,
+		      int    grh_present,
+		      int    ip_version,
+		      int    udp_present,
+		      int    immediate_present,
+		      struct ib_ud_header *header)
 {
+	grh_present = grh_present && !ip_version;
 	memset(header, 0, sizeof *header);
 
+	/*
+	 * UDP header without IP header doesn't make sense
+	 */
+	if (udp_present && ip_version != 4 && ip_version != 6)
+		return -EINVAL;
+
 	if (lrh_present) {
 		u16 packet_length;
 
@@ -252,7 +350,7 @@ void ib_ud_header_init(int     		    payload_bytes,
 	if (vlan_present)
 		header->eth.type = cpu_to_be16(ETH_P_8021Q);
 
-	if (grh_present) {
+	if (ip_version == 6 || grh_present) {
 		header->grh.ip_version      = 6;
 		header->grh.payload_length  =
 			cpu_to_be16((IB_BTH_BYTES     +
@@ -260,8 +358,30 @@ void ib_ud_header_init(int     		    payload_bytes,
 				     payload_bytes    +
 				     4                + /* ICRC     */
 				     3) & ~3);          /* round up */
-		header->grh.next_header     = 0x1b;
+		header->grh.next_header     = udp_present ? IPPROTO_UDP : 0x1b;
+	}
+
+	if (ip_version == 4) {
+		int udp_bytes = udp_present ? IB_UDP_BYTES : 0;
+
+		header->ip4.ver = 4; /* version 4 */
+		header->ip4.hdr_len = 5; /* 5 words */
+		header->ip4.tot_len =
+			cpu_to_be16(IB_IP4_BYTES   +
+				     udp_bytes     +
+				     IB_BTH_BYTES  +
+				     IB_DETH_BYTES +
+				     payload_bytes +
+				     4);     /* ICRC     */
+		header->ip4.protocol = IPPROTO_UDP;
 	}
+	if (udp_present && ip_version)
+		header->udp.length =
+			cpu_to_be16(IB_UDP_BYTES   +
+				     IB_BTH_BYTES  +
+				     IB_DETH_BYTES +
+				     payload_bytes +
+				     4);     /* ICRC     */
 
 	if (immediate_present)
 		header->bth.opcode           = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE;
@@ -273,8 +393,11 @@ void ib_ud_header_init(int     		    payload_bytes,
 	header->lrh_present = lrh_present;
 	header->eth_present = eth_present;
 	header->vlan_present = vlan_present;
-	header->grh_present = grh_present;
+	header->grh_present = grh_present || (ip_version == 6);
+	header->ipv4_present = ip_version == 4;
+	header->udp_present = udp_present;
 	header->immediate_present = immediate_present;
+	return 0;
 }
 EXPORT_SYMBOL(ib_ud_header_init);
 
@@ -311,6 +434,16 @@ int ib_ud_header_pack(struct ib_ud_header *header,
 			&header->grh, buf + len);
 		len += IB_GRH_BYTES;
 	}
+	if (header->ipv4_present) {
+		ib_pack(ip4_table, ARRAY_SIZE(ip4_table),
+			&header->ip4, buf + len);
+		len += IB_IP4_BYTES;
+	}
+	if (header->udp_present) {
+		ib_pack(udp_table, ARRAY_SIZE(udp_table),
+			&header->udp, buf + len);
+		len += IB_UDP_BYTES;
+	}
 
 	ib_pack(bth_table, ARRAY_SIZE(bth_table),
 		&header->bth, buf + len);
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index a2e4ca5..1c7a308 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2161,7 +2161,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp,
 	if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER)
 		send_size += sizeof (struct mlx4_ib_tunnel_header);
 
-	ib_ud_header_init(send_size, 1, 0, 0, 0, 0, &sqp->ud_header);
+	ib_ud_header_init(send_size, 1, 0, 0, 0, 0, 0, 0, &sqp->ud_header);
 
 	if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER) {
 		sqp->ud_header.lrh.service_level =
@@ -2307,7 +2307,10 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_ud_wr *wr,
 			is_vlan = 1;
 		}
 	}
-	ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh, 0, &sqp->ud_header);
+	err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
+				0, 0, 0, &sqp->ud_header);
+	if (err)
+		return err;
 
 	if (!is_eth) {
 		sqp->ud_header.lrh.service_level =
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 35fe506..96e5fb9 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1485,7 +1485,7 @@ static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp,
 	u16 pkey;
 
 	ib_ud_header_init(256, /* assume a MAD */ 1, 0, 0,
-			  mthca_ah_grh_present(to_mah(wr->ah)), 0,
+			  mthca_ah_grh_present(to_mah(wr->ah)), 0, 0, 0,
 			  &sqp->ud_header);
 
 	err = mthca_read_ah(dev, to_mah(wr->ah), &sqp->ud_header);
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index e99d8f9..a193081 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -41,6 +41,8 @@ enum {
 	IB_ETH_BYTES  = 14,
 	IB_VLAN_BYTES = 4,
 	IB_GRH_BYTES  = 40,
+	IB_IP4_BYTES  = 20,
+	IB_UDP_BYTES  = 8,
 	IB_BTH_BYTES  = 12,
 	IB_DETH_BYTES = 8
 };
@@ -223,6 +225,27 @@ struct ib_unpacked_eth {
 	__be16	type;
 };
 
+struct ib_unpacked_ip4 {
+	u8	ver;
+	u8	hdr_len;
+	u8	tos;
+	__be16	tot_len;
+	__be16	id;
+	__be16	frag_off;
+	u8	ttl;
+	u8	protocol;
+	__be16	check;
+	__be32	saddr;
+	__be32	daddr;
+};
+
+struct ib_unpacked_udp {
+	__be16	sport;
+	__be16	dport;
+	__be16	length;
+	__be16	csum;
+};
+
 struct ib_unpacked_vlan {
 	__be16  tag;
 	__be16  type;
@@ -237,6 +260,10 @@ struct ib_ud_header {
 	struct ib_unpacked_vlan vlan;
 	int			grh_present;
 	struct ib_unpacked_grh	grh;
+	int			ipv4_present;
+	struct ib_unpacked_ip4	ip4;
+	int			udp_present;
+	struct ib_unpacked_udp	udp;
 	struct ib_unpacked_bth	bth;
 	struct ib_unpacked_deth deth;
 	int			immediate_present;
@@ -253,13 +280,17 @@ void ib_unpack(const struct ib_field        *desc,
 	       void                         *buf,
 	       void                         *structure);
 
-void ib_ud_header_init(int		    payload_bytes,
-		       int		    lrh_present,
-		       int		    eth_present,
-		       int		    vlan_present,
-		       int		    grh_present,
-		       int		    immediate_present,
-		       struct ib_ud_header *header);
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header);
+
+int ib_ud_header_init(int		    payload_bytes,
+		      int		    lrh_present,
+		      int		    eth_present,
+		      int		    vlan_present,
+		      int		    grh_present,
+		      int		    ip_version,
+		      int		    udp_present,
+		      int		    immediate_present,
+		      struct ib_ud_header *header);
 
 int ib_ud_header_pack(struct ib_ud_header *header,
 		      void                *buf);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 00/11] Add RoCE v2 support Matan Barak
                     ` (12 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur,
	Moni Shoua

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c       | 96 ++++++++++++++++++++++++++++++++++---
 drivers/infiniband/core/multicast.c | 17 ++++++-
 include/rdma/ib_sa.h                |  2 +
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8fab267..c30bfe3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -304,6 +305,7 @@ struct cma_multicast {
 	void			*context;
 	struct sockaddr_storage	addr;
 	struct kref		mcref;
+	bool			igmp_joined;
 };
 
 struct cma_work {
@@ -400,6 +402,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+	struct in_device *in_dev = NULL;
+
+	if (ndev) {
+		rtnl_lock();
+		in_dev = __in_dev_get_rtnl(ndev);
+		if (in_dev) {
+			if (join)
+				ip_mc_inc_group(in_dev,
+						*(__be32 *)(mgid->raw + 12));
+			else
+				ip_mc_dec_group(in_dev,
+						*(__be32 *)(mgid->raw + 12));
+		}
+		rtnl_unlock();
+	}
+	return (in_dev) ? 0 : -ENODEV;
+}
+
 static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
 			       struct cma_device *cma_dev)
 {
@@ -1535,8 +1557,24 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 				      id_priv->id.port_num)) {
 			ib_sa_free_multicast(mc->multicast.ib);
 			kfree(mc);
-		} else
+		} else {
+			if (mc->igmp_joined) {
+				struct rdma_dev_addr *dev_addr =
+					&id_priv->id.route.addr.dev_addr;
+				struct net_device *ndev = NULL;
+
+				if (dev_addr->bound_dev_if)
+					ndev = dev_get_by_index(&init_net,
+								dev_addr->bound_dev_if);
+				if (ndev) {
+					cma_igmp_send(ndev,
+						      &mc->multicast.ib->rec.mgid,
+						      false);
+					dev_put(ndev);
+				}
+			}
 			kref_put(&mc->mcref, release_mc);
+		}
 	}
 }
 
@@ -3656,12 +3694,23 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 	event.status = status;
 	event.param.ud.private_data = mc->context;
 	if (!status) {
+		struct rdma_dev_addr *dev_addr =
+			&id_priv->id.route.addr.dev_addr;
+		struct net_device *ndev =
+			dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		enum ib_gid_type gid_type =
+			id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
+			rdma_start_port(id_priv->cma_dev->device)];
+
 		event.event = RDMA_CM_EVENT_MULTICAST_JOIN;
 		ib_init_ah_from_mcmember(id_priv->id.device,
 					 id_priv->id.port_num, &multicast->rec,
+					 ndev, gid_type,
 					 &event.param.ud.ah_attr);
 		event.param.ud.qp_num = 0xFFFFFF;
 		event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey);
+		if (ndev)
+			dev_put(ndev);
 	} else
 		event.event = RDMA_CM_EVENT_MULTICAST_ERROR;
 
@@ -3794,9 +3843,10 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
 	struct iboe_mcast_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	int err;
+	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;
+	enum ib_gid_type gid_type;
 
 	if (cma_zero_addr((struct sockaddr *)&mc->addr))
 		return -EINVAL;
@@ -3826,9 +3876,25 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
 	mc->multicast.ib->rec.hop_limit = 1;
 	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+
+	gid_type = id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
+		   rdma_start_port(id_priv->cma_dev->device)];
+	if (addr->sa_family == AF_INET) {
+		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+			err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+					    true);
+		if (!err) {
+			mc->igmp_joined = true;
+			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+		}
+	} else {
+		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+			err = -ENOTSUPP;
+	}
 	dev_put(ndev);
-	if (!mc->multicast.ib->rec.mtu) {
-		err = -EINVAL;
+	if (err || !mc->multicast.ib->rec.mtu) {
+		if (!err)
+			err = -EINVAL;
 		goto out2;
 	}
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
@@ -3867,7 +3933,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	memcpy(&mc->addr, addr, rdma_addr_size(addr));
 	mc->context = context;
 	mc->id_priv = id_priv;
-
+	mc->igmp_joined = false;
 	spin_lock(&id_priv->lock);
 	list_add(&mc->list, &id_priv->mc_list);
 	spin_unlock(&id_priv->lock);
@@ -3912,9 +3978,25 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 			if (rdma_cap_ib_mcast(id->device, id->port_num)) {
 				ib_sa_free_multicast(mc->multicast.ib);
 				kfree(mc);
-			} else if (rdma_protocol_roce(id->device, id->port_num))
+			} else if (rdma_protocol_roce(id->device, id->port_num)) {
+				if (mc->igmp_joined) {
+					struct rdma_dev_addr *dev_addr =
+						&id->route.addr.dev_addr;
+					struct net_device *ndev = NULL;
+
+					if (dev_addr->bound_dev_if)
+						ndev = dev_get_by_index(&init_net,
+									dev_addr->bound_dev_if);
+					if (ndev) {
+						cma_igmp_send(ndev,
+							      &mc->multicast.ib->rec.mgid,
+							      false);
+						dev_put(ndev);
+					}
+					mc->igmp_joined = false;
+				}
 				kref_put(&mc->mcref, release_mc);
-
+			}
 			return;
 		}
 	}
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 6911ae6..250937c 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -723,14 +723,27 @@ EXPORT_SYMBOL(ib_sa_get_mcmember_rec);
 
 int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 			     struct ib_sa_mcmember_rec *rec,
+			     struct net_device *ndev,
+			     enum ib_gid_type gid_type,
 			     struct ib_ah_attr *ah_attr)
 {
 	int ret;
 	u16 gid_index;
 	u8 p;
 
-	ret = ib_find_cached_gid(device, &rec->port_gid, IB_GID_TYPE_IB,
-				 NULL, &p, &gid_index);
+	if (rdma_protocol_roce(device, port_num)) {
+		ret = ib_find_cached_gid_by_port(device, &rec->port_gid,
+						 gid_type, port_num,
+						 ndev,
+						 &gid_index);
+	} else if (rdma_protocol_ib(device, port_num)) {
+		ret = ib_find_cached_gid(device, &rec->port_gid,
+					 IB_GID_TYPE_IB, NULL, &p,
+					 &gid_index);
+	} else {
+		ret = -EINVAL;
+	}
+
 	if (ret)
 		return ret;
 
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 0a40ed2..cdc1c81 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -403,6 +403,8 @@ int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
  */
 int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 			     struct ib_sa_mcmember_rec *rec,
+			     struct net_device *ndev,
+			     enum ib_gid_type gid_type,
 			     struct ib_ah_attr *ah_attr);
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Hi Doug,

This series adds the support for RoCE v2. In order to support RoCE v2,
we add gid_type attribute to every GID. When the RoCE GID management
populates the GID table, it duplicates each GID with all supported types.
This gives the user the ability to communicate over each supported
type.

Patch 0001, 0002 and 0003 add support for multiple GID types to the
cache and related APIs. The third patch exposes the GID attributes
information is sysfs.

Patch 0004 adds the RoCE v2 GID type and the capabilities required
from the vendor in order to implement RoCE v2. These capabilities
are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.

RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
information should come from the vendor's driver. In case the vendor
doesn't supply this information, we parse the packet headers and resolve
its network type. Patch 0005 adds this information and required utilities.

Patches 0006 and 0007 adds route validation. This is mandatory to ensure
that we send packets using GIDS which corresponds to a net-device that
can be routed to the destination.

Patches 0008 and 0009 add configfs support (and the required
infrastructure) for CMA. The administrator should be able to set the
default RoCE type. This is done through a new per-port
default_roce_mode configfs file.

Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
packets. This is required for vendors which implement their
QP1 as a Raw QP.

Patch 0011 adds support for IPv4 multicast as an IPv4 network
requires IGMP to be sent in order to join multicast groups.

Vendors code aren't part of this patch-set. Soft-Roce will be
sent soon and depends on these patches. Other vendors, like
mlx4, ocrdma and mlx5 will follow.

This patch is applied on "Change per-entry locks in GID cache to table lock"
which was sent to the mailing list.

Thanks,
Matan

Changed from V1:
 - Rebased against Linux 4.4-rc2 master branch.
 - Add route validation
 - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
 - Add documentation for configfs and sysfs ABI
 - Remove ifindex and gid_type from mcmember

Changes from V0:
 - Rebased patches against Doug's latest k.o/for-4.4 tree.
 - Fixed a bug in configfs (rmdir caused an incorrect free).

Matan Barak (8):
  IB/core: Add gid_type to gid attribute
  IB/cm: Use the source GID index type
  IB/core: Add gid attributes to sysfs
  IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
  IB/core: Move rdma_is_upper_dev_rcu to header file
  IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
  IB/rdma_cm: Add wrapper for cma reference count
  IB/cma: Add configfs for rdma_cm

Moni Shoua (2):
  IB/core: Initialize UD header structure with IP and UDP headers
  IB/cma: Join and leave multicast groups with IGMP

Somnath Kotur (1):
  IB/core: Add rdma_network_type to wc

 Documentation/ABI/testing/configfs-rdma_cm       |  22 ++
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/Kconfig                       |   9 +
 drivers/infiniband/core/Makefile                 |   2 +
 drivers/infiniband/core/addr.c                   | 185 +++++++++----
 drivers/infiniband/core/cache.c                  | 169 ++++++++----
 drivers/infiniband/core/cm.c                     |  31 ++-
 drivers/infiniband/core/cma.c                    | 261 ++++++++++++++++--
 drivers/infiniband/core/cma_configfs.c           | 321 +++++++++++++++++++++++
 drivers/infiniband/core/core_priv.h              |  45 ++++
 drivers/infiniband/core/device.c                 |  10 +-
 drivers/infiniband/core/multicast.c              |  17 +-
 drivers/infiniband/core/roce_gid_mgmt.c          |  81 ++++--
 drivers/infiniband/core/sa_query.c               |  76 +++++-
 drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++-
 drivers/infiniband/core/ud_header.c              | 155 ++++++++++-
 drivers/infiniband/core/uverbs_marshall.c        |   1 +
 drivers/infiniband/core/verbs.c                  | 170 ++++++++++--
 drivers/infiniband/hw/mlx4/qp.c                  |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c           |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c         |   2 +-
 include/rdma/ib_addr.h                           |  11 +-
 include/rdma/ib_cache.h                          |   4 +
 include/rdma/ib_pack.h                           |  45 +++-
 include/rdma/ib_sa.h                             |   3 +
 include/rdma/ib_verbs.h                          |  78 +++++-
 26 files changed, 1704 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
 create mode 100644 drivers/infiniband/core/cma_configfs.c

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 00/11] Add RoCE v2 support Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID is IB_GID_TYPE_IB which is
also RoCE v1 GID type.

This implies that gid_type should be added to roce_gid_table meta-data.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c           | 144 ++++++++++++++++++++----------
 drivers/infiniband/core/cm.c              |   2 +-
 drivers/infiniband/core/cma.c             |   3 +-
 drivers/infiniband/core/core_priv.h       |   4 +
 drivers/infiniband/core/device.c          |   9 +-
 drivers/infiniband/core/multicast.c       |   2 +-
 drivers/infiniband/core/roce_gid_mgmt.c   |  60 +++++++++++--
 drivers/infiniband/core/sa_query.c        |   5 +-
 drivers/infiniband/core/uverbs_marshall.c |   1 +
 drivers/infiniband/core/verbs.c           |   1 +
 include/rdma/ib_cache.h                   |   4 +
 include/rdma/ib_sa.h                      |   1 +
 include/rdma/ib_verbs.h                   |  11 ++-
 13 files changed, 185 insertions(+), 62 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 097e9df..566fd8f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -64,6 +64,7 @@ enum gid_attr_find_mask {
 	GID_ATTR_FIND_MASK_GID          = 1UL << 0,
 	GID_ATTR_FIND_MASK_NETDEV	= 1UL << 1,
 	GID_ATTR_FIND_MASK_DEFAULT	= 1UL << 2,
+	GID_ATTR_FIND_MASK_GID_TYPE	= 1UL << 3,
 };
 
 enum gid_table_entry_props {
@@ -125,6 +126,19 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 	}
 }
 
+static const char * const gid_type_str[] = {
+	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+};
+
+const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
+{
+	if (gid_type < ARRAY_SIZE(gid_type_str) && gid_type_str[gid_type])
+		return gid_type_str[gid_type];
+
+	return "Invalid GID type";
+}
+EXPORT_SYMBOL(ib_cache_gid_type_str);
+
 /* This function expects that rwlock will be write locked in all
  * scenarios and that lock will be locked in sleep-able (RoCE)
  * scenarios.
@@ -233,6 +247,10 @@ static int find_gid(struct ib_gid_table *table, const union ib_gid *gid,
 		if (found >=0)
 			continue;
 
+		if (mask & GID_ATTR_FIND_MASK_GID_TYPE &&
+		    attr->gid_type != val->gid_type)
+			continue;
+
 		if (mask & GID_ATTR_FIND_MASK_GID &&
 		    memcmp(gid, &data->gid, sizeof(*gid)))
 			continue;
@@ -296,6 +314,7 @@ int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
 	write_lock_irq(&table->rwlock);
 
 	ix = find_gid(table, gid, attr, false, GID_ATTR_FIND_MASK_GID |
+		      GID_ATTR_FIND_MASK_GID_TYPE |
 		      GID_ATTR_FIND_MASK_NETDEV, &empty);
 	if (ix >= 0)
 		goto out_unlock;
@@ -329,6 +348,7 @@ int ib_cache_gid_del(struct ib_device *ib_dev, u8 port,
 
 	ix = find_gid(table, gid, attr, false,
 		      GID_ATTR_FIND_MASK_GID	  |
+		      GID_ATTR_FIND_MASK_GID_TYPE |
 		      GID_ATTR_FIND_MASK_NETDEV	  |
 		      GID_ATTR_FIND_MASK_DEFAULT,
 		      NULL);
@@ -427,11 +447,13 @@ static int _ib_cache_gid_table_find(struct ib_device *ib_dev,
 
 static int ib_cache_gid_find(struct ib_device *ib_dev,
 			     const union ib_gid *gid,
+			     enum ib_gid_type gid_type,
 			     struct net_device *ndev, u8 *port,
 			     u16 *index)
 {
-	unsigned long mask = GID_ATTR_FIND_MASK_GID;
-	struct ib_gid_attr gid_attr_val = {.ndev = ndev};
+	unsigned long mask = GID_ATTR_FIND_MASK_GID |
+			     GID_ATTR_FIND_MASK_GID_TYPE;
+	struct ib_gid_attr gid_attr_val = {.ndev = ndev, .gid_type = gid_type};
 
 	if (ndev)
 		mask |= GID_ATTR_FIND_MASK_NETDEV;
@@ -442,14 +464,16 @@ static int ib_cache_gid_find(struct ib_device *ib_dev,
 
 int ib_find_cached_gid_by_port(struct ib_device *ib_dev,
 			       const union ib_gid *gid,
+			       enum ib_gid_type gid_type,
 			       u8 port, struct net_device *ndev,
 			       u16 *index)
 {
 	int local_index;
 	struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
 	struct ib_gid_table *table;
-	unsigned long mask = GID_ATTR_FIND_MASK_GID;
-	struct ib_gid_attr val = {.ndev = ndev};
+	unsigned long mask = GID_ATTR_FIND_MASK_GID |
+			     GID_ATTR_FIND_MASK_GID_TYPE;
+	struct ib_gid_attr val = {.ndev = ndev, .gid_type = gid_type};
 	unsigned long flags;
 
 	if (port < rdma_start_port(ib_dev) ||
@@ -607,15 +631,15 @@ static void cleanup_gid_table_port(struct ib_device *ib_dev, u8 port,
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 				  struct net_device *ndev,
+				  unsigned long gid_type_mask,
 				  enum ib_cache_gid_default_mode mode)
 {
 	struct ib_gid_table **ports_table = ib_dev->cache.gid_cache;
 	union ib_gid gid;
 	struct ib_gid_attr gid_attr;
+	struct ib_gid_attr zattr_type = zattr;
 	struct ib_gid_table *table;
-	int ix;
-	union ib_gid current_gid;
-	struct ib_gid_attr current_gid_attr = {};
+	unsigned int gid_type;
 
 	table  = ports_table[port - rdma_start_port(ib_dev)];
 
@@ -623,55 +647,82 @@ void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 	memset(&gid_attr, 0, sizeof(gid_attr));
 	gid_attr.ndev = ndev;
 
-	mutex_lock(&table->lock);
-	write_lock_irq(&table->rwlock);
-	ix = find_gid(table, NULL, NULL, true, GID_ATTR_FIND_MASK_DEFAULT, NULL);
-
-	/* Coudn't find default GID location */
-	WARN_ON(ix < 0);
-
-	if (!__ib_cache_gid_get(ib_dev, port, ix,
-				&current_gid, &current_gid_attr) &&
-	    mode == IB_CACHE_GID_DEFAULT_MODE_SET &&
-	    !memcmp(&gid, &current_gid, sizeof(gid)) &&
-	    !memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
-		goto unlock;
-
-	if (memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
-	    memcmp(&current_gid_attr, &zattr,
-		   sizeof(current_gid_attr))) {
-		if (del_gid(ib_dev, port, table, ix, true)) {
-			pr_warn("ib_cache_gid: can't delete index %d for default gid %pI6\n",
-				ix, gid.raw);
-			goto unlock;
-		} else {
-			dispatch_gid_change_event(ib_dev, port);
+	for (gid_type = 0; gid_type < IB_GID_TYPE_SIZE; ++gid_type) {
+		int ix;
+		union ib_gid current_gid;
+		struct ib_gid_attr current_gid_attr = {};
+
+		if (1UL << gid_type & ~gid_type_mask)
+			continue;
+
+		gid_attr.gid_type = gid_type;
+
+		mutex_lock(&table->lock);
+		write_lock_irq(&table->rwlock);
+		ix = find_gid(table, NULL, &gid_attr, true,
+			      GID_ATTR_FIND_MASK_GID_TYPE |
+			      GID_ATTR_FIND_MASK_DEFAULT,
+			      NULL);
+
+		/* Coudn't find default GID location */
+		WARN_ON(ix < 0);
+
+		zattr_type.gid_type = gid_type;
+
+		if (!__ib_cache_gid_get(ib_dev, port, ix,
+					&current_gid, &current_gid_attr) &&
+		    mode == IB_CACHE_GID_DEFAULT_MODE_SET &&
+		    !memcmp(&gid, &current_gid, sizeof(gid)) &&
+		    !memcmp(&gid_attr, &current_gid_attr, sizeof(gid_attr)))
+			goto release;
+
+		if (memcmp(&current_gid, &zgid, sizeof(current_gid)) ||
+		    memcmp(&current_gid_attr, &zattr_type,
+			   sizeof(current_gid_attr))) {
+			if (del_gid(ib_dev, port, table, ix, true)) {
+				pr_warn("ib_cache_gid: can't delete index %d for default gid %pI6\n",
+					ix, gid.raw);
+				goto release;
+			} else {
+				dispatch_gid_change_event(ib_dev, port);
+			}
 		}
-	}
 
-	if (mode == IB_CACHE_GID_DEFAULT_MODE_SET) {
-		if (add_gid(ib_dev, port, table, ix, &gid, &gid_attr, true)) {
-			pr_warn("ib_cache_gid: unable to add default gid %pI6\n",
-				gid.raw);
-		} else {
-			dispatch_gid_change_event(ib_dev, port);
+		if (mode == IB_CACHE_GID_DEFAULT_MODE_SET) {
+			if (add_gid(ib_dev, port, table, ix, &gid, &gid_attr, true))
+				pr_warn("ib_cache_gid: unable to add default gid %pI6\n",
+					gid.raw);
+			else
+				dispatch_gid_change_event(ib_dev, port);
 		}
-	}
 
-unlock:
-	if (current_gid_attr.ndev)
-		dev_put(current_gid_attr.ndev);
-	write_unlock_irq(&table->rwlock);
-	mutex_unlock(&table->lock);
+release:
+		if (current_gid_attr.ndev)
+			dev_put(current_gid_attr.ndev);
+		write_unlock_irq(&table->rwlock);
+		mutex_unlock(&table->lock);
+	}
 }
 
 static int gid_table_reserve_default(struct ib_device *ib_dev, u8 port,
 				     struct ib_gid_table *table)
 {
-	if (rdma_protocol_roce(ib_dev, port)) {
-		struct ib_gid_table_entry *entry = &table->data_vec[0];
+	unsigned int i;
+	unsigned long roce_gid_type_mask;
+	unsigned int num_default_gids;
+	unsigned int current_gid = 0;
+
+	roce_gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+	num_default_gids = hweight_long(roce_gid_type_mask);
+	for (i = 0; i < num_default_gids && i < table->sz; i++) {
+		struct ib_gid_table_entry *entry =
+			&table->data_vec[i];
 
 		entry->props |= GID_TABLE_ENTRY_DEFAULT;
+		current_gid = find_next_bit(&roce_gid_type_mask,
+					    BITS_PER_LONG,
+					    current_gid);
+		entry->attr.gid_type = current_gid++;
 	}
 
 	return 0;
@@ -794,11 +845,12 @@ EXPORT_SYMBOL(ib_get_cached_gid);
 
 int ib_find_cached_gid(struct ib_device *device,
 		       const union ib_gid *gid,
+		       enum ib_gid_type gid_type,
 		       struct net_device *ndev,
 		       u8               *port_num,
 		       u16              *index)
 {
-	return ib_cache_gid_find(device, gid, ndev, port_num, index);
+	return ib_cache_gid_find(device, gid, gid_type, ndev, port_num, index);
 }
 EXPORT_SYMBOL(ib_find_cached_gid);
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0a26dd6..af8b907 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -364,7 +364,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
 		if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-					ndev, &p, NULL)) {
+					IB_GID_TYPE_IB, ndev, &p, NULL)) {
 			port = cm_dev->port[p-1];
 			break;
 		}
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 944cd90..c19f822 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -456,7 +456,8 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 	if (dev_type == ARPHRD_ETHER)
 		ndev = dev_get_by_index(&init_net, bound_if_index);
 
-	ret = ib_find_cached_gid_by_port(device, gid, port, ndev, NULL);
+	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
+					 ndev, NULL);
 
 	if (ndev)
 		dev_put(ndev);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 5cf6eb7..d531f91 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -70,8 +70,11 @@ enum ib_cache_gid_default_mode {
 	IB_CACHE_GID_DEFAULT_MODE_DELETE
 };
 
+const char *ib_cache_gid_type_str(enum ib_gid_type gid_type);
+
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
 				  struct net_device *ndev,
+				  unsigned long gid_type_mask,
 				  enum ib_cache_gid_default_mode mode);
 
 int ib_cache_gid_add(struct ib_device *ib_dev, u8 port,
@@ -87,6 +90,7 @@ int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
 
 int roce_rescan_device(struct ib_device *ib_dev);
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port);
 
 int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 179e813..732a196 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -825,26 +825,31 @@ EXPORT_SYMBOL(ib_modify_port);
  *   a specified GID value occurs.
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: Type of GID.
  * @ndev: The ndev related to the GID to search for.
  * @port_num: The port number of the device where the GID value was found.
  * @index: The index into the GID table where the GID was found.  This
  *   parameter may be NULL.
  */
 int ib_find_gid(struct ib_device *device, union ib_gid *gid,
-		struct net_device *ndev, u8 *port_num, u16 *index)
+		enum ib_gid_type gid_type, struct net_device *ndev,
+		u8 *port_num, u16 *index)
 {
 	union ib_gid tmp_gid;
 	int ret, port, i;
 
 	for (port = rdma_start_port(device); port <= rdma_end_port(device); ++port) {
 		if (rdma_cap_roce_gid_table(device, port)) {
-			if (!ib_find_cached_gid_by_port(device, gid, port,
+			if (!ib_find_cached_gid_by_port(device, gid, gid_type, port,
 							ndev, index)) {
 				*port_num = port;
 				return 0;
 			}
 		}
 
+		if (gid_type != IB_GID_TYPE_IB)
+			continue;
+
 		for (i = 0; i < device->port_immutable[port].gid_tbl_len; ++i) {
 			ret = ib_query_gid(device, port, i, &tmp_gid, NULL);
 			if (ret)
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index bb6685f..6911ae6 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -729,7 +729,7 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 	u16 gid_index;
 	u8 p;
 
-	ret = ib_find_cached_gid(device, &rec->port_gid,
+	ret = ib_find_cached_gid(device, &rec->port_gid, IB_GID_TYPE_IB,
 				 NULL, &p, &gid_index);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 178f984..61c27a7 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -67,17 +67,52 @@ struct netdev_event_work {
 	struct netdev_event_work_cmd	cmds[ROCE_NETDEV_CALLBACK_SZ];
 };
 
+static const struct {
+	bool (*is_supported)(const struct ib_device *device, u8 port_num);
+	enum ib_gid_type gid_type;
+} PORT_CAP_TO_GID_TYPE[] = {
+	{rdma_protocol_roce,   IB_GID_TYPE_ROCE},
+};
+
+#define CAP_TO_GID_TABLE_SIZE	ARRAY_SIZE(PORT_CAP_TO_GID_TYPE)
+
+unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u8 port)
+{
+	int i;
+	unsigned int ret_flags = 0;
+
+	if (!rdma_protocol_roce(ib_dev, port))
+		return 1UL << IB_GID_TYPE_IB;
+
+	for (i = 0; i < CAP_TO_GID_TABLE_SIZE; i++)
+		if (PORT_CAP_TO_GID_TYPE[i].is_supported(ib_dev, port))
+			ret_flags |= 1UL << PORT_CAP_TO_GID_TYPE[i].gid_type;
+
+	return ret_flags;
+}
+EXPORT_SYMBOL(roce_gid_type_mask_support);
+
 static void update_gid(enum gid_op_type gid_op, struct ib_device *ib_dev,
 		       u8 port, union ib_gid *gid,
 		       struct ib_gid_attr *gid_attr)
 {
-	switch (gid_op) {
-	case GID_ADD:
-		ib_cache_gid_add(ib_dev, port, gid, gid_attr);
-		break;
-	case GID_DEL:
-		ib_cache_gid_del(ib_dev, port, gid, gid_attr);
-		break;
+	int i;
+	unsigned long gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+	for (i = 0; i < IB_GID_TYPE_SIZE; i++) {
+		if ((1UL << i) & gid_type_mask) {
+			gid_attr->gid_type = i;
+			switch (gid_op) {
+			case GID_ADD:
+				ib_cache_gid_add(ib_dev, port,
+						 gid, gid_attr);
+				break;
+			case GID_DEL:
+				ib_cache_gid_del(ib_dev, port,
+						 gid, gid_attr);
+				break;
+			}
+		}
 	}
 }
 
@@ -203,6 +238,8 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 				     u8 port, struct net_device *event_ndev,
 				     struct net_device *rdma_ndev)
 {
+	unsigned long gid_type_mask;
+
 	rcu_read_lock();
 	if (!rdma_ndev ||
 	    ((rdma_ndev != event_ndev &&
@@ -215,7 +252,9 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 	}
 	rcu_read_unlock();
 
-	ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
+	gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
+	ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev, gid_type_mask,
 				     IB_CACHE_GID_DEFAULT_MODE_SET);
 }
 
@@ -237,9 +276,14 @@ static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
 	if (is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	    is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) ==
 	    BONDING_SLAVE_STATE_INACTIVE) {
+		unsigned long gid_type_mask;
+
 		rcu_read_unlock();
 
+		gid_type_mask = roce_gid_type_mask_support(ib_dev, port);
+
 		ib_cache_gid_set_default_gid(ib_dev, port, rdma_ndev,
+					     gid_type_mask,
 					     IB_CACHE_GID_DEFAULT_MODE_DELETE);
 	} else {
 		rcu_read_unlock();
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 2aba774..5606922 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -1012,8 +1012,8 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		ah_attr->ah_flags = IB_AH_GRH;
 		ah_attr->grh.dgid = rec->dgid;
 
-		ret = ib_find_cached_gid(device, &rec->sgid, ndev, &port_num,
-					 &gid_index);
+		ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type, ndev,
+					 &port_num, &gid_index);
 		if (ret) {
 			if (ndev)
 				dev_put(ndev);
@@ -1155,6 +1155,7 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
 			  mad->data, &rec);
 		rec.net = NULL;
 		rec.ifindex = 0;
+		rec.gid_type = IB_GID_TYPE_IB;
 		memset(rec.dmac, 0, ETH_ALEN);
 		query->callback(status, &rec, query->context);
 	} else
diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c
index 7d2f14c..af020f8 100644
--- a/drivers/infiniband/core/uverbs_marshall.c
+++ b/drivers/infiniband/core/uverbs_marshall.c
@@ -144,5 +144,6 @@ void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
 	memset(dst->dmac, 0, sizeof(dst->dmac));
 	dst->net = NULL;
 	dst->ifindex = 0;
+	dst->gid_type = IB_GID_TYPE_IB;
 }
 EXPORT_SYMBOL(ib_copy_path_rec_from_user);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..4263c4c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -387,6 +387,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 		if (!rdma_cap_eth_ah(device, port_num)) {
 			ret = ib_find_cached_gid_by_port(device, &grh->dgid,
+							 IB_GID_TYPE_IB,
 							 port_num, NULL,
 							 &gid_index);
 			if (ret)
diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
index 269a27cf..e30f19b 100644
--- a/include/rdma/ib_cache.h
+++ b/include/rdma/ib_cache.h
@@ -60,6 +60,7 @@ int ib_get_cached_gid(struct ib_device    *device,
  *   a specified GID value occurs.
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: The GID type to search for.
  * @ndev: In RoCE, the net device of the device. NULL means ignore.
  * @port_num: The port number of the device where the GID value was found.
  * @index: The index into the cached GID table where the GID was found.  This
@@ -70,6 +71,7 @@ int ib_get_cached_gid(struct ib_device    *device,
  */
 int ib_find_cached_gid(struct ib_device *device,
 		       const union ib_gid *gid,
+		       enum ib_gid_type gid_type,
 		       struct net_device *ndev,
 		       u8               *port_num,
 		       u16              *index);
@@ -79,6 +81,7 @@ int ib_find_cached_gid(struct ib_device *device,
  * GID value occurs
  * @device: The device to query.
  * @gid: The GID value to search for.
+ * @gid_type: The GID type to search for.
  * @port_num: The port number of the device where the GID value sould be
  *   searched.
  * @ndev: In RoCE, the net device of the device. Null means ignore.
@@ -90,6 +93,7 @@ int ib_find_cached_gid(struct ib_device *device,
  */
 int ib_find_cached_gid_by_port(struct ib_device *device,
 			       const union ib_gid *gid,
+			       enum ib_gid_type gid_type,
 			       u8               port_num,
 			       struct net_device *ndev,
 			       u16              *index);
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 3019695..0a40ed2 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -160,6 +160,7 @@ struct ib_sa_path_rec {
 	int	     ifindex;
 	/* ignored in IB */
 	struct net  *net;
+	enum ib_gid_type gid_type;
 };
 
 static inline struct net_device *ib_get_ndev_from_path(struct ib_sa_path_rec *rec)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9a68a19..2933aeb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -67,7 +67,15 @@ union ib_gid {
 
 extern union ib_gid zgid;
 
+enum ib_gid_type {
+	/* If link layer is Ethernet, this is RoCE V1 */
+	IB_GID_TYPE_IB        = 0,
+	IB_GID_TYPE_ROCE      = 0,
+	IB_GID_TYPE_SIZE
+};
+
 struct ib_gid_attr {
+	enum ib_gid_type	gid_type;
 	struct net_device	*ndev;
 };
 
@@ -2219,7 +2227,8 @@ int ib_modify_port(struct ib_device *device,
 		   struct ib_port_modify *port_modify);
 
 int ib_find_gid(struct ib_device *device, union ib_gid *gid,
-		struct net_device *ndev, u8 *port_num, u16 *index);
+		enum ib_gid_type gid_type, struct net_device *ndev,
+		u8 *port_num, u16 *index);
 
 int ib_find_pkey(struct ib_device *device,
 		 u8 port_num, u16 pkey, u16 *index);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 02/11] IB/cm: Use the source GID index type
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Previosuly, cm and cma modules supported only IB and RoCE v1 GID type.
In order to support multiple GID types, the gid_type is passed to
cm_init_av_by_path and stored in the path record.

The rdma cm client would use a default GID type that will be saved in
rdma_id_private.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c  | 25 ++++++++++++++++++++-----
 drivers/infiniband/core/cma.c |  2 ++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index af8b907..5ea78ab 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -364,7 +364,7 @@ static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
 	read_lock_irqsave(&cm.device_lock, flags);
 	list_for_each_entry(cm_dev, &cm.device_list, list) {
 		if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid,
-					IB_GID_TYPE_IB, ndev, &p, NULL)) {
+					path->gid_type, ndev, &p, NULL)) {
 			port = cm_dev->port[p-1];
 			break;
 		}
@@ -1600,6 +1600,8 @@ static int cm_req_handler(struct cm_work *work)
 	struct ib_cm_id *cm_id;
 	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_req_msg *req_msg;
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr;
 	int ret;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
@@ -1639,11 +1641,24 @@ static int cm_req_handler(struct cm_work *work)
 	cm_format_paths_from_req(req_msg, &work->path[0], &work->path[1]);
 
 	memcpy(work->path[0].dmac, cm_id_priv->av.ah_attr.dmac, ETH_ALEN);
-	ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+	ret = ib_get_cached_gid(work->port->cm_dev->ib_device,
+				work->port->port_num,
+				cm_id_priv->av.ah_attr.grh.sgid_index,
+				&gid, &gid_attr);
+	if (!ret) {
+		if (gid_attr.ndev)
+			dev_put(gid_attr.ndev);
+		work->path[0].gid_type = gid_attr.gid_type;
+		ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
+	}
 	if (ret) {
-		ib_get_cached_gid(work->port->cm_dev->ib_device,
-				  work->port->port_num, 0, &work->path[0].sgid,
-				  NULL);
+		int err = ib_get_cached_gid(work->port->cm_dev->ib_device,
+					    work->port->port_num, 0,
+					    &work->path[0].sgid,
+					    &gid_attr);
+		if (!err && gid_attr.ndev)
+			dev_put(gid_attr.ndev);
+		work->path[0].gid_type = gid_attr.gid_type;
 		ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
 			       &work->path[0].sgid, sizeof work->path[0].sgid,
 			       NULL, 0);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c19f822..2914e08 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -228,6 +228,7 @@ struct rdma_id_private {
 	u8			tos;
 	u8			reuseaddr;
 	u8			afonly;
+	enum ib_gid_type	gid_type;
 };
 
 struct cma_multicast {
@@ -2325,6 +2326,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 		ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
 		route->path_rec->net = &init_net;
 		route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+		route->path_rec->gid_type = id_priv->gid_type;
 	}
 	if (!ndev) {
 		ret = -ENODEV;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

This patch set adds attributes of net device and gid type to each GID
in the GID table. Users that use verbs directly need to specify
the GID index. Since the same GID could have different types or
associated net devices, users should have the ability to query the
associated GID attributes. Adding these attributes to sysfs.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
 drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++++++++++++-
 2 files changed, 198 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband

diff --git a/Documentation/ABI/testing/sysfs-class-infiniband b/Documentation/ABI/testing/sysfs-class-infiniband
new file mode 100644
index 0000000..a86abe6
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-infiniband
@@ -0,0 +1,16 @@
+What:		/sys/class/infiniband/<hca>/ports/<port-number>/gid_attrs/ndevs/<gid-index>
+Date:		November 29, 2015
+KernelVersion:	4.4.0
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description: 	The net-device's name associated with the GID resides
+		at index <gid-index>.
+
+What:		/sys/class/infiniband/<hca>/ports/<port-number>/gid_attrs/types/<gid-index>
+Date:		November 29, 2015
+KernelVersion:	4.4.0
+Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description: 	The RoCE type of the associated GID resides at index <gid-index>.
+		This could either be "IB/RoCE v1" for IB and RoCE v1 based GODs
+		or "RoCE v2" for RoCE v2 based GIDs.
+
+
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index b1f37d4..4d5d87a 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -37,12 +37,22 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/string.h>
+#include <linux/netdevice.h>
 
 #include <rdma/ib_mad.h>
 
+struct ib_port;
+
+struct gid_attr_group {
+	struct ib_port		*port;
+	struct kobject		kobj;
+	struct attribute_group	ndev;
+	struct attribute_group	type;
+};
 struct ib_port {
 	struct kobject         kobj;
 	struct ib_device      *ibdev;
+	struct gid_attr_group *gid_attr_group;
 	struct attribute_group gid_group;
 	struct attribute_group pkey_group;
 	u8                     port_num;
@@ -84,6 +94,24 @@ static const struct sysfs_ops port_sysfs_ops = {
 	.show = port_attr_show
 };
 
+static ssize_t gid_attr_show(struct kobject *kobj,
+			     struct attribute *attr, char *buf)
+{
+	struct port_attribute *port_attr =
+		container_of(attr, struct port_attribute, attr);
+	struct ib_port *p = container_of(kobj, struct gid_attr_group,
+					 kobj)->port;
+
+	if (!port_attr->show)
+		return -EIO;
+
+	return port_attr->show(p, port_attr, buf);
+}
+
+static const struct sysfs_ops gid_attr_sysfs_ops = {
+	.show = gid_attr_show
+};
+
 static ssize_t state_show(struct ib_port *p, struct port_attribute *unused,
 			  char *buf)
 {
@@ -281,6 +309,46 @@ static struct attribute *port_default_attrs[] = {
 	NULL
 };
 
+static size_t print_ndev(struct ib_gid_attr *gid_attr, char *buf)
+{
+	if (!gid_attr->ndev)
+		return -EINVAL;
+
+	return sprintf(buf, "%s\n", gid_attr->ndev->name);
+}
+
+static size_t print_gid_type(struct ib_gid_attr *gid_attr, char *buf)
+{
+	return sprintf(buf, "%s\n", ib_cache_gid_type_str(gid_attr->gid_type));
+}
+
+static ssize_t _show_port_gid_attr(struct ib_port *p,
+				   struct port_attribute *attr,
+				   char *buf,
+				   size_t (*print)(struct ib_gid_attr *gid_attr,
+						   char *buf))
+{
+	struct port_table_attribute *tab_attr =
+		container_of(attr, struct port_table_attribute, attr);
+	union ib_gid gid;
+	struct ib_gid_attr gid_attr = {};
+	ssize_t ret;
+	va_list args;
+
+	ret = ib_query_gid(p->ibdev, p->port_num, tab_attr->index, &gid,
+			   &gid_attr);
+	if (ret)
+		goto err;
+
+	ret = print(&gid_attr, buf);
+
+err:
+	if (gid_attr.ndev)
+		dev_put(gid_attr.ndev);
+	va_end(args);
+	return ret;
+}
+
 static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
 			     char *buf)
 {
@@ -296,6 +364,19 @@ static ssize_t show_port_gid(struct ib_port *p, struct port_attribute *attr,
 	return sprintf(buf, "%pI6\n", gid.raw);
 }
 
+static ssize_t show_port_gid_attr_ndev(struct ib_port *p,
+				       struct port_attribute *attr, char *buf)
+{
+	return _show_port_gid_attr(p, attr, buf, print_ndev);
+}
+
+static ssize_t show_port_gid_attr_gid_type(struct ib_port *p,
+					   struct port_attribute *attr,
+					   char *buf)
+{
+	return _show_port_gid_attr(p, attr, buf, print_gid_type);
+}
+
 static ssize_t show_port_pkey(struct ib_port *p, struct port_attribute *attr,
 			      char *buf)
 {
@@ -451,12 +532,41 @@ static void ib_port_release(struct kobject *kobj)
 	kfree(p);
 }
 
+static void ib_port_gid_attr_release(struct kobject *kobj)
+{
+	struct gid_attr_group *g = container_of(kobj, struct gid_attr_group,
+						kobj);
+	struct attribute *a;
+	int i;
+
+	if (g->ndev.attrs) {
+		for (i = 0; (a = g->ndev.attrs[i]); ++i)
+			kfree(a);
+
+		kfree(g->ndev.attrs);
+	}
+
+	if (g->type.attrs) {
+		for (i = 0; (a = g->type.attrs[i]); ++i)
+			kfree(a);
+
+		kfree(g->type.attrs);
+	}
+
+	kfree(g);
+}
+
 static struct kobj_type port_type = {
 	.release       = ib_port_release,
 	.sysfs_ops     = &port_sysfs_ops,
 	.default_attrs = port_default_attrs
 };
 
+static struct kobj_type gid_attr_type = {
+	.sysfs_ops      = &gid_attr_sysfs_ops,
+	.release        = ib_port_gid_attr_release
+};
+
 static struct attribute **
 alloc_group_attrs(ssize_t (*show)(struct ib_port *,
 				  struct port_attribute *, char *buf),
@@ -528,9 +638,23 @@ static int add_port(struct ib_device *device, int port_num,
 		return ret;
 	}
 
+	p->gid_attr_group = kzalloc(sizeof(*p->gid_attr_group), GFP_KERNEL);
+	if (!p->gid_attr_group) {
+		ret = -ENOMEM;
+		goto err_put;
+	}
+
+	p->gid_attr_group->port = p;
+	ret = kobject_init_and_add(&p->gid_attr_group->kobj, &gid_attr_type,
+				   &p->kobj, "gid_attrs");
+	if (ret) {
+		kfree(p->gid_attr_group);
+		goto err_put;
+	}
+
 	ret = sysfs_create_group(&p->kobj, &pma_group);
 	if (ret)
-		goto err_put;
+		goto err_put_gid_attrs;
 
 	p->gid_group.name  = "gids";
 	p->gid_group.attrs = alloc_group_attrs(show_port_gid, attr.gid_tbl_len);
@@ -543,12 +667,38 @@ static int add_port(struct ib_device *device, int port_num,
 	if (ret)
 		goto err_free_gid;
 
+	p->gid_attr_group->ndev.name = "ndevs";
+	p->gid_attr_group->ndev.attrs = alloc_group_attrs(show_port_gid_attr_ndev,
+							  attr.gid_tbl_len);
+	if (!p->gid_attr_group->ndev.attrs) {
+		ret = -ENOMEM;
+		goto err_remove_gid;
+	}
+
+	ret = sysfs_create_group(&p->gid_attr_group->kobj,
+				 &p->gid_attr_group->ndev);
+	if (ret)
+		goto err_free_gid_ndev;
+
+	p->gid_attr_group->type.name = "types";
+	p->gid_attr_group->type.attrs = alloc_group_attrs(show_port_gid_attr_gid_type,
+							  attr.gid_tbl_len);
+	if (!p->gid_attr_group->type.attrs) {
+		ret = -ENOMEM;
+		goto err_remove_gid_ndev;
+	}
+
+	ret = sysfs_create_group(&p->gid_attr_group->kobj,
+				 &p->gid_attr_group->type);
+	if (ret)
+		goto err_free_gid_type;
+
 	p->pkey_group.name  = "pkeys";
 	p->pkey_group.attrs = alloc_group_attrs(show_port_pkey,
 						attr.pkey_tbl_len);
 	if (!p->pkey_group.attrs) {
 		ret = -ENOMEM;
-		goto err_remove_gid;
+		goto err_remove_gid_type;
 	}
 
 	ret = sysfs_create_group(&p->kobj, &p->pkey_group);
@@ -576,6 +726,28 @@ err_free_pkey:
 	kfree(p->pkey_group.attrs);
 	p->pkey_group.attrs = NULL;
 
+err_remove_gid_type:
+	sysfs_remove_group(&p->gid_attr_group->kobj,
+			   &p->gid_attr_group->type);
+
+err_free_gid_type:
+	for (i = 0; i < attr.gid_tbl_len; ++i)
+		kfree(p->gid_attr_group->type.attrs[i]);
+
+	kfree(p->gid_attr_group->type.attrs);
+	p->gid_attr_group->type.attrs = NULL;
+
+err_remove_gid_ndev:
+	sysfs_remove_group(&p->gid_attr_group->kobj,
+			   &p->gid_attr_group->ndev);
+
+err_free_gid_ndev:
+	for (i = 0; i < attr.gid_tbl_len; ++i)
+		kfree(p->gid_attr_group->ndev.attrs[i]);
+
+	kfree(p->gid_attr_group->ndev.attrs);
+	p->gid_attr_group->ndev.attrs = NULL;
+
 err_remove_gid:
 	sysfs_remove_group(&p->kobj, &p->gid_group);
 
@@ -589,6 +761,9 @@ err_free_gid:
 err_remove_pma:
 	sysfs_remove_group(&p->kobj, &pma_group);
 
+err_put_gid_attrs:
+	kobject_put(&p->gid_attr_group->kobj);
+
 err_put:
 	kobject_put(&p->kobj);
 	return ret;
@@ -803,6 +978,11 @@ static void free_port_list_attributes(struct ib_device *device)
 		sysfs_remove_group(p, &pma_group);
 		sysfs_remove_group(p, &port->pkey_group);
 		sysfs_remove_group(p, &port->gid_group);
+		sysfs_remove_group(&port->gid_attr_group->kobj,
+				   &port->gid_attr_group->ndev);
+		sysfs_remove_group(&port->gid_attr_group->kobj,
+				   &port->gid_attr_group->type);
+		kobject_put(&port->gid_attr_group->kobj);
 		kobject_put(p);
 	}
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
                     ` (7 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Adding RoCE v2 GID type and port type. Vendors
which support this type will get their GID table
populated with RoCE v2 GIDs automatically.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c         |  1 +
 drivers/infiniband/core/roce_gid_mgmt.c |  3 ++-
 include/rdma/ib_verbs.h                 | 23 +++++++++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 566fd8f..88b4b6f 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -128,6 +128,7 @@ static void dispatch_gid_change_event(struct ib_device *ib_dev, u8 port)
 
 static const char * const gid_type_str[] = {
 	[IB_GID_TYPE_IB]	= "IB/RoCE v1",
+	[IB_GID_TYPE_ROCE_UDP_ENCAP]	= "RoCE v2",
 };
 
 const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 61c27a7..1e3673f 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -71,7 +71,8 @@ static const struct {
 	bool (*is_supported)(const struct ib_device *device, u8 port_num);
 	enum ib_gid_type gid_type;
 } PORT_CAP_TO_GID_TYPE[] = {
-	{rdma_protocol_roce,   IB_GID_TYPE_ROCE},
+	{rdma_protocol_roce_eth_encap, IB_GID_TYPE_ROCE},
+	{rdma_protocol_roce_udp_encap, IB_GID_TYPE_ROCE_UDP_ENCAP},
 };
 
 #define CAP_TO_GID_TABLE_SIZE	ARRAY_SIZE(PORT_CAP_TO_GID_TYPE)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2933aeb..87df931 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -71,6 +71,7 @@ enum ib_gid_type {
 	/* If link layer is Ethernet, this is RoCE V1 */
 	IB_GID_TYPE_IB        = 0,
 	IB_GID_TYPE_ROCE      = 0,
+	IB_GID_TYPE_ROCE_UDP_ENCAP = 1,
 	IB_GID_TYPE_SIZE
 };
 
@@ -401,6 +402,7 @@ union rdma_protocol_stats {
 #define RDMA_CORE_CAP_PROT_IB           0x00100000
 #define RDMA_CORE_CAP_PROT_ROCE         0x00200000
 #define RDMA_CORE_CAP_PROT_IWARP        0x00400000
+#define RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP 0x00800000
 
 #define RDMA_CORE_PORT_IBA_IB          (RDMA_CORE_CAP_PROT_IB  \
 					| RDMA_CORE_CAP_IB_MAD \
@@ -413,6 +415,12 @@ union rdma_protocol_stats {
 					| RDMA_CORE_CAP_IB_CM   \
 					| RDMA_CORE_CAP_AF_IB   \
 					| RDMA_CORE_CAP_ETH_AH)
+#define RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP			\
+					(RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP \
+					| RDMA_CORE_CAP_IB_MAD  \
+					| RDMA_CORE_CAP_IB_CM   \
+					| RDMA_CORE_CAP_AF_IB   \
+					| RDMA_CORE_CAP_ETH_AH)
 #define RDMA_CORE_PORT_IWARP           (RDMA_CORE_CAP_PROT_IWARP \
 					| RDMA_CORE_CAP_IW_CM)
 #define RDMA_CORE_PORT_INTEL_OPA       (RDMA_CORE_PORT_IBA_IB  \
@@ -1975,6 +1983,17 @@ static inline bool rdma_protocol_ib(const struct ib_device *device, u8 port_num)
 
 static inline bool rdma_protocol_roce(const struct ib_device *device, u8 port_num)
 {
+	return device->port_immutable[port_num].core_cap_flags &
+		(RDMA_CORE_CAP_PROT_ROCE | RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP);
+}
+
+static inline bool rdma_protocol_roce_udp_encap(const struct ib_device *device, u8 port_num)
+{
+	return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP;
+}
+
+static inline bool rdma_protocol_roce_eth_encap(const struct ib_device *device, u8 port_num)
+{
 	return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_ROCE;
 }
 
@@ -1985,8 +2004,8 @@ static inline bool rdma_protocol_iwarp(const struct ib_device *device, u8 port_n
 
 static inline bool rdma_ib_or_roce(const struct ib_device *device, u8 port_num)
 {
-	return device->port_immutable[port_num].core_cap_flags &
-		(RDMA_CORE_CAP_PROT_IB | RDMA_CORE_CAP_PROT_ROCE);
+	return rdma_protocol_ib(device, port_num) ||
+		rdma_protocol_roce(device, port_num);
 }
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

From: Somnath Kotur <Somnath.Kotur-idTK6quXuVSLFuii7jzJGg@public.gmane.org>

Providers should tell IB core the wc's network type.
This is used in order to search for the proper GID in the
GID table. When using HCAs that can't provide this info,
IB core tries to deep examine the packet and extract
the GID type by itself.

We choose sgid_index and type from all the matching entries in
RDMA-CM based on hint from the IP stack and we set hop_limit for
the IP packet based on above hint from IP stack.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Somnath Kotur <Somnath.Kotur-idTK6quXuVSLFuii7jzJGg@public.gmane.org>
---
 drivers/infiniband/core/addr.c  |  14 +++++
 drivers/infiniband/core/cma.c   |  11 +++-
 drivers/infiniband/core/verbs.c | 123 ++++++++++++++++++++++++++++++++++++++--
 include/rdma/ib_addr.h          |   1 +
 include/rdma/ib_verbs.h         |  44 ++++++++++++++
 5 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 34b1ada..6e35299 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -256,6 +256,12 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 		goto put;
 	}
 
+	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
+	 * routable) and we could set the network type accordingly.
+	 */
+	if (rt->rt_uses_gateway)
+		addr->network = RDMA_NETWORK_IPV4;
+
 	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
 put:
 	ip_rt_put(rt);
@@ -270,6 +276,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
+	struct rt6_info *rt;
 	int ret;
 
 	memset(&fl6, 0, sizeof fl6);
@@ -281,6 +288,7 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	if ((ret = dst->error))
 		goto put;
 
+	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
 		ret = ipv6_dev_get_saddr(addr->net, ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
@@ -304,6 +312,12 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
+	 * routable) and we could set the network type accordingly.
+	 */
+	if (rt->rt6i_flags & RTF_GATEWAY)
+		addr->network = RDMA_NETWORK_IPV6;
+
 	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
 put:
 	dst_release(dst);
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2914e08..5dc853c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2302,6 +2302,7 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 {
 	struct rdma_route *route = &id_priv->id.route;
 	struct rdma_addr *addr = &route->addr;
+	enum ib_gid_type network_gid_type;
 	struct cma_work *work;
 	int ret;
 	struct net_device *ndev = NULL;
@@ -2340,7 +2341,15 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.dst_addr,
 		    &route->path_rec->dgid);
 
-	route->path_rec->hop_limit = 1;
+	/* Use the hint from IP Stack to select GID Type */
+	network_gid_type = ib_network_to_gid_type(addr->dev_addr.network);
+	if (addr->dev_addr.network != RDMA_NETWORK_IB) {
+		route->path_rec->gid_type = network_gid_type;
+		/* TODO: get the hoplimit from the inet/inet6 device */
+		route->path_rec->hop_limit = IPV6_DEFAULT_HOPLIMIT;
+	} else {
+		route->path_rec->hop_limit = 1;
+	}
 	route->path_rec->reversible = 1;
 	route->path_rec->pkey = cpu_to_be16(0xffff);
 	route->path_rec->mtu_selector = IB_SA_EQ;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 4263c4c..c564131 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -311,8 +311,61 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 }
 EXPORT_SYMBOL(ib_create_ah);
 
+static int ib_get_header_version(const union rdma_network_hdr *hdr)
+{
+	const struct iphdr *ip4h = (struct iphdr *)&hdr->roce4grh;
+	struct iphdr ip4h_checked;
+	const struct ipv6hdr *ip6h = (struct ipv6hdr *)&hdr->ibgrh;
+
+	/* If it's IPv6, the version must be 6, otherwise, the first
+	 * 20 bytes (before the IPv4 header) are garbled.
+	 */
+	if (ip6h->version != 6)
+		return (ip4h->version == 4) ? 4 : 0;
+	/* version may be 6 or 4 because the first 20 bytes could be garbled */
+
+	/* RoCE v2 requires no options, thus header length
+	 * must be 5 words
+	 */
+	if (ip4h->ihl != 5)
+		return 6;
+
+	/* Verify checksum.
+	 * We can't write on scattered buffers so we need to copy to
+	 * temp buffer.
+	 */
+	memcpy(&ip4h_checked, ip4h, sizeof(ip4h_checked));
+	ip4h_checked.check = 0;
+	ip4h_checked.check = ip_fast_csum((u8 *)&ip4h_checked, 5);
+	/* if IPv4 header checksum is OK, believe it */
+	if (ip4h->check == ip4h_checked.check)
+		return 4;
+	return 6;
+}
+
+static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
+						     u8 port_num,
+						     const struct ib_grh *grh)
+{
+	int grh_version;
+
+	if (rdma_protocol_ib(device, port_num))
+		return RDMA_NETWORK_IB;
+
+	grh_version = ib_get_header_version((union rdma_network_hdr *)grh);
+
+	if (grh_version == 4)
+		return RDMA_NETWORK_IPV4;
+
+	if (grh->next_hdr == IPPROTO_UDP)
+		return RDMA_NETWORK_IPV6;
+
+	return RDMA_NETWORK_ROCE_V1;
+}
+
 struct find_gid_index_context {
 	u16 vlan_id;
+	enum ib_gid_type gid_type;
 };
 
 static bool find_gid_index(const union ib_gid *gid,
@@ -322,6 +375,9 @@ static bool find_gid_index(const union ib_gid *gid,
 	struct find_gid_index_context *ctx =
 		(struct find_gid_index_context *)context;
 
+	if (ctx->gid_type != gid_attr->gid_type)
+		return false;
+
 	if ((!!(ctx->vlan_id != 0xffff) == !is_vlan_dev(gid_attr->ndev)) ||
 	    (is_vlan_dev(gid_attr->ndev) &&
 	     vlan_dev_vlan_id(gid_attr->ndev) != ctx->vlan_id))
@@ -332,14 +388,49 @@ static bool find_gid_index(const union ib_gid *gid,
 
 static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
 				   u16 vlan_id, const union ib_gid *sgid,
+				   enum ib_gid_type gid_type,
 				   u16 *gid_index)
 {
-	struct find_gid_index_context context = {.vlan_id = vlan_id};
+	struct find_gid_index_context context = {.vlan_id = vlan_id,
+						 .gid_type = gid_type};
 
 	return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index,
 				     &context, gid_index);
 }
 
+static int get_gids_from_rdma_hdr(union rdma_network_hdr *hdr,
+				  enum rdma_network_type net_type,
+				  union ib_gid *sgid, union ib_gid *dgid)
+{
+	struct sockaddr_in  src_in;
+	struct sockaddr_in  dst_in;
+	__be32 src_saddr, dst_saddr;
+
+	if (!sgid || !dgid)
+		return -EINVAL;
+
+	if (net_type == RDMA_NETWORK_IPV4) {
+		memcpy(&src_in.sin_addr.s_addr,
+		       &hdr->roce4grh.saddr, 4);
+		memcpy(&dst_in.sin_addr.s_addr,
+		       &hdr->roce4grh.daddr, 4);
+		src_saddr = src_in.sin_addr.s_addr;
+		dst_saddr = dst_in.sin_addr.s_addr;
+		ipv6_addr_set_v4mapped(src_saddr,
+				       (struct in6_addr *)sgid);
+		ipv6_addr_set_v4mapped(dst_saddr,
+				       (struct in6_addr *)dgid);
+		return 0;
+	} else if (net_type == RDMA_NETWORK_IPV6 ||
+		   net_type == RDMA_NETWORK_IB) {
+		*dgid = hdr->ibgrh.dgid;
+		*sgid = hdr->ibgrh.sgid;
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		       const struct ib_wc *wc, const struct ib_grh *grh,
 		       struct ib_ah_attr *ah_attr)
@@ -347,9 +438,25 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 	u32 flow_class;
 	u16 gid_index;
 	int ret;
+	enum rdma_network_type net_type = RDMA_NETWORK_IB;
+	enum ib_gid_type gid_type = IB_GID_TYPE_IB;
+	union ib_gid dgid;
+	union ib_gid sgid;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
 	if (rdma_cap_eth_ah(device, port_num)) {
+		if (wc->wc_flags & IB_WC_WITH_NETWORK_HDR_TYPE)
+			net_type = wc->network_hdr_type;
+		else
+			net_type = ib_get_net_type_by_grh(device, port_num, grh);
+		gid_type = ib_network_to_gid_type(net_type);
+	}
+	ret = get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
+				     &sgid, &dgid);
+	if (ret)
+		return ret;
+
+	if (rdma_protocol_roce(device, port_num)) {
 		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
 				wc->vlan_id : 0xffff;
 
@@ -358,7 +465,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 		if (!(wc->wc_flags & IB_WC_WITH_SMAC) ||
 		    !(wc->wc_flags & IB_WC_WITH_VLAN)) {
-			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
+			ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
 							 ah_attr->dmac,
 							 wc->wc_flags & IB_WC_WITH_VLAN ?
 							 NULL : &vlan_id,
@@ -368,7 +475,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		}
 
 		ret = get_sgid_index_from_eth(device, port_num, vlan_id,
-					      &grh->dgid, &gid_index);
+					      &dgid, gid_type, &gid_index);
 		if (ret)
 			return ret;
 
@@ -383,10 +490,10 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 
 	if (wc->wc_flags & IB_WC_GRH) {
 		ah_attr->ah_flags = IB_AH_GRH;
-		ah_attr->grh.dgid = grh->sgid;
+		ah_attr->grh.dgid = sgid;
 
 		if (!rdma_cap_eth_ah(device, port_num)) {
-			ret = ib_find_cached_gid_by_port(device, &grh->dgid,
+			ret = ib_find_cached_gid_by_port(device, &dgid,
 							 IB_GID_TYPE_IB,
 							 port_num, NULL,
 							 &gid_index);
@@ -1026,6 +1133,12 @@ int ib_resolve_eth_dmac(struct ib_qp *qp,
 					ret = -ENXIO;
 				goto out;
 			}
+			if (sgid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+				/* TODO: get the hoplimit from the inet/inet6
+				 * device
+				 */
+				qp_attr->ah_attr.grh.hop_limit =
+							IPV6_DEFAULT_HOPLIMIT;
 
 			ifindex = sgid_attr.ndev->ifindex;
 
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 1152859..c799caa 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -83,6 +83,7 @@ struct rdma_dev_addr {
 	int bound_dev_if;
 	enum rdma_transport_type transport;
 	struct net *net;
+	enum rdma_network_type network;
 };
 
 /**
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 87df931..31fb409 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -50,6 +50,8 @@
 #include <linux/workqueue.h>
 #include <linux/socket.h>
 #include <uapi/linux/if_ether.h>
+#include <net/ipv6.h>
+#include <net/ip.h>
 
 #include <linux/atomic.h>
 #include <linux/mmu_notifier.h>
@@ -107,6 +109,35 @@ enum rdma_protocol_type {
 __attribute_const__ enum rdma_transport_type
 rdma_node_get_transport(enum rdma_node_type node_type);
 
+enum rdma_network_type {
+	RDMA_NETWORK_IB,
+	RDMA_NETWORK_ROCE_V1 = RDMA_NETWORK_IB,
+	RDMA_NETWORK_IPV4,
+	RDMA_NETWORK_IPV6
+};
+
+static inline enum ib_gid_type ib_network_to_gid_type(enum rdma_network_type network_type)
+{
+	if (network_type == RDMA_NETWORK_IPV4 ||
+	    network_type == RDMA_NETWORK_IPV6)
+		return IB_GID_TYPE_ROCE_UDP_ENCAP;
+
+	/* IB_GID_TYPE_IB same as RDMA_NETWORK_ROCE_V1 */
+	return IB_GID_TYPE_IB;
+}
+
+static inline enum rdma_network_type ib_gid_to_network_type(enum ib_gid_type gid_type,
+							    union ib_gid *gid)
+{
+	if (gid_type == IB_GID_TYPE_IB)
+		return RDMA_NETWORK_IB;
+
+	if (ipv6_addr_v4mapped((struct in6_addr *)gid))
+		return RDMA_NETWORK_IPV4;
+	else
+		return RDMA_NETWORK_IPV6;
+}
+
 enum rdma_link_layer {
 	IB_LINK_LAYER_UNSPECIFIED,
 	IB_LINK_LAYER_INFINIBAND,
@@ -535,6 +566,17 @@ struct ib_grh {
 	union ib_gid	dgid;
 };
 
+union rdma_network_hdr {
+	struct ib_grh ibgrh;
+	struct {
+		/* The IB spec states that if it's IPv4, the header
+		 * is located in the last 20 bytes of the header.
+		 */
+		u8		reserved[20];
+		struct iphdr	roce4grh;
+	};
+};
+
 enum {
 	IB_MULTICAST_QPN = 0xffffff
 };
@@ -771,6 +813,7 @@ enum ib_wc_flags {
 	IB_WC_IP_CSUM_OK	= (1<<3),
 	IB_WC_WITH_SMAC		= (1<<4),
 	IB_WC_WITH_VLAN		= (1<<5),
+	IB_WC_WITH_NETWORK_HDR_TYPE	= (1<<6),
 };
 
 struct ib_wc {
@@ -793,6 +836,7 @@ struct ib_wc {
 	u8			port_num;	/* valid only for DR SMPs on switches */
 	u8			smac[ETH_ALEN];
 	u16			vlan_id;
+	u8			network_hdr_type;
 };
 
 enum ib_cq_notify_flags {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (16 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to validate the route, we need an easy way to check if a
net-device belongs to our RDMA device. Move this helper function
to a header file in order to make this check easier.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/core_priv.h     | 13 +++++++++++++
 drivers/infiniband/core/roce_gid_mgmt.c | 20 ++++----------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index d531f91..3b250a2 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -96,4 +96,17 @@ int ib_cache_setup_one(struct ib_device *device);
 void ib_cache_cleanup_one(struct ib_device *device);
 void ib_cache_release_one(struct ib_device *device);
 
+static inline bool rdma_is_upper_dev_rcu(struct net_device *dev,
+					 struct net_device *upper)
+{
+	struct net_device *_upper = NULL;
+	struct list_head *iter;
+
+	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
+		if (_upper == upper)
+			break;
+
+	return _upper == upper;
+}
+
 #endif /* _CORE_PRIV_H */
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index 1e3673f..06556c3 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -139,18 +139,6 @@ static enum bonding_slave_state is_eth_active_slave_of_bonding_rcu(struct net_de
 	return BONDING_SLAVE_STATE_NA;
 }
 
-static bool is_upper_dev_rcu(struct net_device *dev, struct net_device *upper)
-{
-	struct net_device *_upper = NULL;
-	struct list_head *iter;
-
-	netdev_for_each_all_upper_dev_rcu(dev, _upper, iter)
-		if (_upper == upper)
-			break;
-
-	return _upper == upper;
-}
-
 #define REQUIRED_BOND_STATES		(BONDING_SLAVE_STATE_ACTIVE |	\
 					 BONDING_SLAVE_STATE_NA)
 static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
@@ -168,7 +156,7 @@ static int is_eth_port_of_netdev(struct ib_device *ib_dev, u8 port,
 	if (!real_dev)
 		real_dev = event_ndev;
 
-	res = ((is_upper_dev_rcu(rdma_ndev, event_ndev) &&
+	res = ((rdma_is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	       (is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) &
 		REQUIRED_BOND_STATES)) ||
 	       real_dev == rdma_ndev);
@@ -214,7 +202,7 @@ static int upper_device_filter(struct ib_device *ib_dev, u8 port,
 		return 1;
 
 	rcu_read_lock();
-	res = is_upper_dev_rcu(rdma_ndev, event_ndev);
+	res = rdma_is_upper_dev_rcu(rdma_ndev, event_ndev);
 	rcu_read_unlock();
 
 	return res;
@@ -244,7 +232,7 @@ static void enum_netdev_default_gids(struct ib_device *ib_dev,
 	rcu_read_lock();
 	if (!rdma_ndev ||
 	    ((rdma_ndev != event_ndev &&
-	      !is_upper_dev_rcu(rdma_ndev, event_ndev)) ||
+	      !rdma_is_upper_dev_rcu(rdma_ndev, event_ndev)) ||
 	     is_eth_active_slave_of_bonding_rcu(rdma_ndev,
 						netdev_master_upper_dev_get_rcu(rdma_ndev)) ==
 	     BONDING_SLAVE_STATE_INACTIVE)) {
@@ -274,7 +262,7 @@ static void bond_delete_netdev_default_gids(struct ib_device *ib_dev,
 
 	rcu_read_lock();
 
-	if (is_upper_dev_rcu(rdma_ndev, event_ndev) &&
+	if (rdma_is_upper_dev_rcu(rdma_ndev, event_ndev) &&
 	    is_eth_active_slave_of_bonding_rcu(rdma_ndev, real_dev) ==
 	    BONDING_SLAVE_STATE_INACTIVE) {
 		unsigned long gid_type_mask;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (17 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
       [not found]     ` <1449150450-13679-20-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
                     ` (4 subsequent siblings)
  23 siblings, 1 reply; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/addr.c           | 175 ++++++++++++++++++++++---------
 drivers/infiniband/core/cm.c             |  10 +-
 drivers/infiniband/core/cma.c            |  30 +++++-
 drivers/infiniband/core/sa_query.c       |  75 +++++++++++--
 drivers/infiniband/core/verbs.c          |  48 ++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |   2 +-
 include/rdma/ib_addr.h                   |  10 +-
 7 files changed, 270 insertions(+), 80 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 6e35299..57eda11 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -121,7 +121,8 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 }
 EXPORT_SYMBOL(rdma_copy_addr);
 
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
+int rdma_translate_ip(const struct sockaddr *addr,
+		      struct rdma_dev_addr *dev_addr,
 		      u16 *vlan_id)
 {
 	struct net_device *dev;
@@ -139,7 +140,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	switch (addr->sa_family) {
 	case AF_INET:
 		dev = ip_dev_find(dev_addr->net,
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
+			((const struct sockaddr_in *)addr)->sin_addr.s_addr);
 
 		if (!dev)
 			return ret;
@@ -154,7 +155,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 		rcu_read_lock();
 		for_each_netdev_rcu(dev_addr->net, dev) {
 			if (ipv6_chk_addr(dev_addr->net,
-					  &((struct sockaddr_in6 *) addr)->sin6_addr,
+					  &((const struct sockaddr_in6 *)addr)->sin6_addr,
 					  dev, 1)) {
 				ret = rdma_copy_addr(dev_addr, dev, NULL);
 				if (vlan_id)
@@ -198,7 +199,8 @@ static void queue_req(struct addr_req *req)
 	mutex_unlock(&lock);
 }
 
-static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr, void *daddr)
+static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr,
+			const void *daddr)
 {
 	struct neighbour *n;
 	int ret;
@@ -222,8 +224,9 @@ static int dst_fetch_ha(struct dst_entry *dst, struct rdma_dev_addr *dev_addr, v
 }
 
 static int addr4_resolve(struct sockaddr_in *src_in,
-			 struct sockaddr_in *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct rtable **prt)
 {
 	__be32 src_ip = src_in->sin_addr.s_addr;
 	__be32 dst_ip = dst_in->sin_addr.s_addr;
@@ -243,36 +246,23 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	src_in->sin_family = AF_INET;
 	src_in->sin_addr.s_addr = fl4.saddr;
 
-	if (rt->dst.dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
-		goto put;
-	}
-
-	/* If the device does ARP internally, return 'done' */
-	if (rt->dst.dev->flags & IFF_NOARP) {
-		ret = rdma_copy_addr(addr, rt->dst.dev, NULL);
-		goto put;
-	}
-
 	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
 	 * routable) and we could set the network type accordingly.
 	 */
 	if (rt->rt_uses_gateway)
 		addr->network = RDMA_NETWORK_IPV4;
 
-	ret = dst_fetch_ha(&rt->dst, addr, &fl4.daddr);
-put:
-	ip_rt_put(rt);
+	*prt = rt;
+	return 0;
 out:
 	return ret;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static int addr6_resolve(struct sockaddr_in6 *src_in,
-			 struct sockaddr_in6 *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in6 *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct dst_entry **pdst)
 {
 	struct flowi6 fl6;
 	struct dst_entry *dst;
@@ -299,49 +289,109 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		src_in->sin6_addr = fl6.saddr;
 	}
 
-	if (dst->dev->flags & IFF_LOOPBACK) {
-		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr, NULL);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
-		goto put;
-	}
-
-	/* If the device does ARP internally, return 'done' */
-	if (dst->dev->flags & IFF_NOARP) {
-		ret = rdma_copy_addr(addr, dst->dev, NULL);
-		goto put;
-	}
-
 	/* If there's a gateway, we're definitely in RoCE v2 (as RoCE v1 isn't
 	 * routable) and we could set the network type accordingly.
 	 */
 	if (rt->rt6i_flags & RTF_GATEWAY)
 		addr->network = RDMA_NETWORK_IPV6;
 
-	ret = dst_fetch_ha(dst, addr, &fl6.daddr);
+	*pdst = dst;
+	return 0;
 put:
 	dst_release(dst);
 	return ret;
 }
 #else
 static int addr6_resolve(struct sockaddr_in6 *src_in,
-			 struct sockaddr_in6 *dst_in,
-			 struct rdma_dev_addr *addr)
+			 const struct sockaddr_in6 *dst_in,
+			 struct rdma_dev_addr *addr,
+			 struct dst_entry **pdst)
 {
 	return -EADDRNOTAVAIL;
 }
 #endif
 
+static int addr_resolve_neigh(struct dst_entry *dst,
+			      const struct sockaddr *dst_in,
+			      struct rdma_dev_addr *addr)
+{
+	if (dst->dev->flags & IFF_LOOPBACK) {
+		int ret;
+
+		ret = rdma_translate_ip(dst_in, addr, NULL);
+		if (!ret)
+			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
+			       MAX_ADDR_LEN);
+
+		return ret;
+	}
+
+	/* If the device does ARP internally */
+	if (!(dst->dev->flags & IFF_NOARP)) {
+		const struct sockaddr_in *dst_in4 =
+			(const struct sockaddr_in *)dst_in;
+		const struct sockaddr_in6 *dst_in6 =
+			(const struct sockaddr_in6 *)dst_in;
+
+		return dst_fetch_ha(dst, addr,
+				    dst_in->sa_family == AF_INET ?
+				    (const void *)&dst_in4->sin_addr.s_addr :
+				    (const void *)&dst_in6->sin6_addr);
+	}
+
+	return rdma_copy_addr(addr, dst->dev, NULL);
+}
+
 static int addr_resolve(struct sockaddr *src_in,
-			struct sockaddr *dst_in,
-			struct rdma_dev_addr *addr)
+			const struct sockaddr *dst_in,
+			struct rdma_dev_addr *addr,
+			bool resolve_neigh)
 {
+	struct net_device *ndev;
+	struct dst_entry *dst;
+	int ret;
+
 	if (src_in->sa_family == AF_INET) {
-		return addr4_resolve((struct sockaddr_in *) src_in,
-			(struct sockaddr_in *) dst_in, addr);
-	} else
-		return addr6_resolve((struct sockaddr_in6 *) src_in,
-			(struct sockaddr_in6 *) dst_in, addr);
+		struct rtable *rt = NULL;
+		const struct sockaddr_in *dst_in4 =
+			(const struct sockaddr_in *)dst_in;
+
+		ret = addr4_resolve((struct sockaddr_in *)src_in,
+				    dst_in4, addr, &rt);
+		if (ret)
+			return ret;
+
+		if (resolve_neigh)
+			ret = addr_resolve_neigh(&rt->dst, dst_in, addr);
+
+		ndev = rt->dst.dev;
+		dev_hold(ndev);
+
+		ip_rt_put(rt);
+	} else {
+		const struct sockaddr_in6 *dst_in6 =
+			(const struct sockaddr_in6 *)dst_in;
+
+		ret = addr6_resolve((struct sockaddr_in6 *)src_in,
+				    dst_in6, addr,
+				    &dst);
+		if (ret)
+			return ret;
+
+		if (resolve_neigh)
+			ret = addr_resolve_neigh(dst, dst_in, addr);
+
+		ndev = dst->dev;
+		dev_hold(ndev);
+
+		dst_release(dst);
+	}
+
+	addr->bound_dev_if = ndev->ifindex;
+	addr->net = dev_net(ndev);
+	dev_put(ndev);
+
+	return ret;
 }
 
 static void process_req(struct work_struct *work)
@@ -357,7 +407,8 @@ static void process_req(struct work_struct *work)
 		if (req->status == -ENODATA) {
 			src_in = (struct sockaddr *) &req->src_addr;
 			dst_in = (struct sockaddr *) &req->dst_addr;
-			req->status = addr_resolve(src_in, dst_in, req->addr);
+			req->status = addr_resolve(src_in, dst_in, req->addr,
+						   true);
 			if (req->status && time_after_eq(jiffies, req->timeout))
 				req->status = -ETIMEDOUT;
 			else if (req->status == -ENODATA)
@@ -417,7 +468,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	req->client = client;
 	atomic_inc(&client->refcount);
 
-	req->status = addr_resolve(src_in, dst_in, addr);
+	req->status = addr_resolve(src_in, dst_in, addr, true);
 	switch (req->status) {
 	case 0:
 		req->timeout = jiffies;
@@ -439,6 +490,25 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_ip);
 
+int rdma_resolve_ip_route(struct sockaddr *src_addr,
+			  const struct sockaddr *dst_addr,
+			  struct rdma_dev_addr *addr)
+{
+	struct sockaddr_storage ssrc_addr;
+	struct sockaddr *src_in = (struct sockaddr *)&ssrc_addr;
+
+	if (src_addr->sa_family != dst_addr->sa_family)
+		return -EINVAL;
+
+	if (src_addr)
+		memcpy(src_in, src_addr, rdma_addr_size(src_addr));
+	else
+		src_in->sa_family = dst_addr->sa_family;
+
+	return addr_resolve(src_in, dst_addr, addr, false);
+}
+EXPORT_SYMBOL(rdma_resolve_ip_route);
+
 void rdma_addr_cancel(struct rdma_dev_addr *addr)
 {
 	struct addr_req *req, *temp_req;
@@ -471,7 +541,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgid,
-			       u8 *dmac, u16 *vlan_id, int if_index)
+			       u8 *dmac, u16 *vlan_id, int *if_index)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -489,7 +559,8 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi
 	rdma_gid2ip(&dgid_addr._sockaddr, dgid);
 
 	memset(&dev_addr, 0, sizeof(dev_addr));
-	dev_addr.bound_dev_if = if_index;
+	if (if_index)
+		dev_addr.bound_dev_if = *if_index;
 	dev_addr.net = &init_net;
 
 	ctx.addr = &dev_addr;
@@ -505,6 +576,8 @@ int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgi
 	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
 	if (!dev)
 		return -ENODEV;
+	if (if_index)
+		*if_index = dev_addr.bound_dev_if;
 	if (vlan_id)
 		*vlan_id = rdma_vlan_dev_vlan_id(dev);
 	dev_put(dev);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5ea78ab..77bec4c 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1646,8 +1646,11 @@ static int cm_req_handler(struct cm_work *work)
 				cm_id_priv->av.ah_attr.grh.sgid_index,
 				&gid, &gid_attr);
 	if (!ret) {
-		if (gid_attr.ndev)
+		if (gid_attr.ndev) {
+			work->path[0].ifindex = gid_attr.ndev->ifindex;
+			work->path[0].net = dev_net(gid_attr.ndev);
 			dev_put(gid_attr.ndev);
+		}
 		work->path[0].gid_type = gid_attr.gid_type;
 		ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av);
 	}
@@ -1656,8 +1659,11 @@ static int cm_req_handler(struct cm_work *work)
 					    work->port->port_num, 0,
 					    &work->path[0].sgid,
 					    &gid_attr);
-		if (!err && gid_attr.ndev)
+		if (!err && gid_attr.ndev) {
+			work->path[0].ifindex = gid_attr.ndev->ifindex;
+			work->path[0].net = dev_net(gid_attr.ndev);
 			dev_put(gid_attr.ndev);
+		}
 		work->path[0].gid_type = gid_attr.gid_type;
 		ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_GID,
 			       &work->path[0].sgid, sizeof work->path[0].sgid,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 5dc853c..cf52b65 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -454,8 +454,20 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 	if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port))
 		return ret;
 
-	if (dev_type == ARPHRD_ETHER)
+	if (dev_type == ARPHRD_ETHER) {
 		ndev = dev_get_by_index(&init_net, bound_if_index);
+		if (ndev && ndev->flags & IFF_LOOPBACK) {
+			pr_info("detected loopback device\n");
+			dev_put(ndev);
+
+			if (!device->get_netdev)
+				return -EOPNOTSUPP;
+
+			ndev = device->get_netdev(device, port);
+			if (!ndev)
+				return -ENODEV;
+		}
+	}
 
 	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
 					 ndev, NULL);
@@ -2325,8 +2337,22 @@ static int cma_resolve_iboe_route(struct rdma_id_private *id_priv)
 
 	if (addr->dev_addr.bound_dev_if) {
 		ndev = dev_get_by_index(&init_net, addr->dev_addr.bound_dev_if);
+		if (!ndev)
+			return -ENODEV;
+
+		if (ndev->flags & IFF_LOOPBACK) {
+			dev_put(ndev);
+			if (!id_priv->id.device->get_netdev)
+				return -EOPNOTSUPP;
+
+			ndev = id_priv->id.device->get_netdev(id_priv->id.device,
+							      id_priv->id.port_num);
+			if (!ndev)
+				return -ENODEV;
+		}
+
 		route->path_rec->net = &init_net;
-		route->path_rec->ifindex = addr->dev_addr.bound_dev_if;
+		route->path_rec->ifindex = ndev->ifindex;
 		route->path_rec->gid_type = id_priv->gid_type;
 	}
 	if (!ndev) {
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 5606922..19919ad 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -49,7 +49,9 @@
 #include <net/netlink.h>
 #include <uapi/rdma/ib_user_sa.h>
 #include <rdma/ib_marshall.h>
+#include <rdma/ib_addr.h>
 #include "sa.h"
+#include "core_priv.h"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand subnet administration query support");
@@ -994,7 +996,8 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 {
 	int ret;
 	u16 gid_index;
-	int force_grh;
+	int use_roce;
+	struct net_device *ndev = NULL;
 
 	memset(ah_attr, 0, sizeof *ah_attr);
 	ah_attr->dlid = be16_to_cpu(rec->dlid);
@@ -1004,16 +1007,71 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 	ah_attr->port_num = port_num;
 	ah_attr->static_rate = rec->rate;
 
-	force_grh = rdma_cap_eth_ah(device, port_num);
+	use_roce = rdma_cap_eth_ah(device, port_num);
+
+	if (use_roce) {
+		struct net_device *idev;
+		struct net_device *resolved_dev;
+		struct rdma_dev_addr dev_addr = {.bound_dev_if = rec->ifindex,
+						 .net = rec->net ? rec->net :
+							 &init_net};
+		union {
+			struct sockaddr     _sockaddr;
+			struct sockaddr_in  _sockaddr_in;
+			struct sockaddr_in6 _sockaddr_in6;
+		} sgid_addr, dgid_addr;
+
+		if (!device->get_netdev)
+			return -EOPNOTSUPP;
+
+		rdma_gid2ip(&sgid_addr._sockaddr, &rec->sgid);
+		rdma_gid2ip(&dgid_addr._sockaddr, &rec->dgid);
+
+		/* validate the route */
+		ret = rdma_resolve_ip_route(&sgid_addr._sockaddr,
+					    &dgid_addr._sockaddr, &dev_addr);
+		if (ret)
+			return ret;
 
-	if (rec->hop_limit > 1 || force_grh) {
-		struct net_device *ndev = ib_get_ndev_from_path(rec);
+		if ((dev_addr.network == RDMA_NETWORK_IPV4 ||
+		     dev_addr.network == RDMA_NETWORK_IPV6) &&
+		    rec->gid_type != IB_GID_TYPE_ROCE_UDP_ENCAP)
+			return -EINVAL;
+
+		idev = device->get_netdev(device, port_num);
+		if (!idev)
+			return -ENODEV;
+
+		resolved_dev = dev_get_by_index(dev_addr.net,
+						dev_addr.bound_dev_if);
+		if (resolved_dev->flags & IFF_LOOPBACK) {
+			dev_put(resolved_dev);
+			resolved_dev = idev;
+			dev_hold(resolved_dev);
+		}
+		ndev = ib_get_ndev_from_path(rec);
+		rcu_read_lock();
+		if ((ndev && ndev != resolved_dev) ||
+		    (resolved_dev != idev &&
+		     !rdma_is_upper_dev_rcu(idev, resolved_dev)))
+			ret = -EHOSTUNREACH;
+		rcu_read_unlock();
+		dev_put(idev);
+		dev_put(resolved_dev);
+		if (ret) {
+			if (ndev)
+				dev_put(ndev);
+			return ret;
+		}
+	}
 
+	if (rec->hop_limit > 1 || use_roce) {
 		ah_attr->ah_flags = IB_AH_GRH;
 		ah_attr->grh.dgid = rec->dgid;
 
-		ret = ib_find_cached_gid(device, &rec->sgid, rec->gid_type, ndev,
-					 &port_num, &gid_index);
+		ret = ib_find_cached_gid_by_port(device, &rec->sgid,
+						 rec->gid_type, port_num, ndev,
+						 &gid_index);
 		if (ret) {
 			if (ndev)
 				dev_put(ndev);
@@ -1027,9 +1085,10 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
 		if (ndev)
 			dev_put(ndev);
 	}
-	if (force_grh) {
+
+	if (use_roce)
 		memcpy(ah_attr->dmac, rec->dmac, ETH_ALEN);
-	}
+
 	return 0;
 }
 EXPORT_SYMBOL(ib_init_ah_from_path);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index c564131..114614f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -457,30 +457,52 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
 		return ret;
 
 	if (rdma_protocol_roce(device, port_num)) {
+		int if_index = 0;
 		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
 				wc->vlan_id : 0xffff;
+		struct net_device *idev;
+		struct net_device *resolved_dev;
 
 		if (!(wc->wc_flags & IB_WC_GRH))
 			return -EPROTOTYPE;
 
-		if (!(wc->wc_flags & IB_WC_WITH_SMAC) ||
-		    !(wc->wc_flags & IB_WC_WITH_VLAN)) {
-			ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
-							 ah_attr->dmac,
-							 wc->wc_flags & IB_WC_WITH_VLAN ?
-							 NULL : &vlan_id,
-							 0);
-			if (ret)
-				return ret;
+		if (!device->get_netdev)
+			return -EOPNOTSUPP;
+
+		idev = device->get_netdev(device, port_num);
+		if (!idev)
+			return -ENODEV;
+
+		ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
+						 ah_attr->dmac,
+						 wc->wc_flags & IB_WC_WITH_VLAN ?
+						 NULL : &vlan_id,
+						 &if_index);
+		if (ret) {
+			dev_put(idev);
+			return ret;
 		}
 
+		resolved_dev = dev_get_by_index(&init_net, if_index);
+		if (resolved_dev->flags & IFF_LOOPBACK) {
+			dev_put(resolved_dev);
+			resolved_dev = idev;
+			dev_hold(resolved_dev);
+		}
+		rcu_read_lock();
+		if (resolved_dev != idev && !rdma_is_upper_dev_rcu(idev,
+								   resolved_dev))
+			ret = -EHOSTUNREACH;
+		rcu_read_unlock();
+		dev_put(idev);
+		dev_put(resolved_dev);
+		if (ret)
+			return ret;
+
 		ret = get_sgid_index_from_eth(device, port_num, vlan_id,
 					      &dgid, gid_type, &gid_index);
 		if (ret)
 			return ret;
-
-		if (wc->wc_flags & IB_WC_WITH_SMAC)
-			memcpy(ah_attr->dmac, wc->smac, ETH_ALEN);
 	}
 
 	ah_attr->dlid = wc->slid;
@@ -1145,7 +1167,7 @@ int ib_resolve_eth_dmac(struct ib_qp *qp,
 			ret = rdma_addr_find_dmac_by_grh(&sgid,
 							 &qp_attr->ah_attr.grh.dgid,
 							 qp_attr->ah_attr.dmac,
-							 NULL, ifindex);
+							 NULL, &ifindex);
 
 			dev_put(sgid_attr.ndev);
 		}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 9820074..a343e03 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -154,7 +154,7 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 	    (!rdma_link_local_addr((struct in6_addr *)attr->grh.dgid.raw))) {
 		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
 						    attr->dmac, &vlan_tag,
-						    sgid_attr.ndev->ifindex);
+						    &sgid_attr.ndev->ifindex);
 		if (status) {
 			pr_err("%s(): Failed to resolve dmac from gid." 
 				"status = %d\n", __func__, status);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index c799caa..87156dc 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -92,8 +92,8 @@ struct rdma_dev_addr {
  *
  * The dev_addr->net field must be initialized.
  */
-int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
-		      u16 *vlan_id);
+int rdma_translate_ip(const struct sockaddr *addr,
+		      struct rdma_dev_addr *dev_addr, u16 *vlan_id);
 
 /**
  * rdma_resolve_ip - Resolve source and destination IP addresses to
@@ -118,6 +118,10 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 				     struct rdma_dev_addr *addr, void *context),
 		    void *context);
 
+int rdma_resolve_ip_route(struct sockaddr *src_addr,
+			  const struct sockaddr *dst_addr,
+			  struct rdma_dev_addr *addr);
+
 void rdma_addr_cancel(struct rdma_dev_addr *addr);
 
 int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
@@ -127,7 +131,7 @@ int rdma_addr_size(struct sockaddr *addr);
 
 int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
 int rdma_addr_find_dmac_by_grh(const union ib_gid *sgid, const union ib_gid *dgid,
-			       u8 *smac, u16 *vlan_id, int if_index);
+			       u8 *smac, u16 *vlan_id, int *if_index);
 
 static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
 {
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (18 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Currently, cma users can't increase or decrease the cma reference
count. This is necassary when setting cma attributes (like the
default GID type) in order to avoid use-after-free errors.
Adding cma_ref_dev and cma_deref_dev APIs.

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c       | 11 +++++++++--
 drivers/infiniband/core/core_priv.h |  4 ++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cf52b65..f78088a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -60,6 +60,8 @@
 #include <rdma/ib_sa.h>
 #include <rdma/iw_cm.h>
 
+#include "core_priv.h"
+
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
 MODULE_LICENSE("Dual BSD/GPL");
@@ -185,6 +187,11 @@ enum {
 	CMA_OPTION_AFONLY,
 };
 
+void cma_ref_dev(struct cma_device *cma_dev)
+{
+	atomic_inc(&cma_dev->refcount);
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -339,7 +346,7 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
-	atomic_inc(&cma_dev->refcount);
+	cma_ref_dev(cma_dev);
 	id_priv->cma_dev = cma_dev;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
@@ -347,7 +354,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
-static inline void cma_deref_dev(struct cma_device *cma_dev)
+void cma_deref_dev(struct cma_device *cma_dev)
 {
 	if (atomic_dec_and_test(&cma_dev->refcount))
 		complete(&cma_dev->comp);
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 3b250a2..1945b4e 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,6 +38,10 @@
 
 #include <rdma/ib_verbs.h>
 
+struct cma_device;
+void cma_ref_dev(struct cma_device *cma_dev);
+void cma_deref_dev(struct cma_device *cma_dev);
+
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
 						   u8, struct kobject *));
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (19 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Users would like to control the behaviour of rdma_cm.
For example, old applications which don't set the
required RoCE gid type could be executed on RoCE V2
network types. In order to support this configuration,
we implement a configfs for rdma_cm.

In order to use the configfs, one needs to mount it and
mkdir <IB device name> inside rdma_cm directory.

The patch adds support for a single configuration file,
default_roce_mode. The mode can either be "IB/RoCE v1" or
"RoCE v2".

Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 Documentation/ABI/testing/configfs-rdma_cm |  22 ++
 drivers/infiniband/Kconfig                 |   9 +
 drivers/infiniband/core/Makefile           |   2 +
 drivers/infiniband/core/cache.c            |  24 +++
 drivers/infiniband/core/cma.c              | 108 +++++++++-
 drivers/infiniband/core/cma_configfs.c     | 321 +++++++++++++++++++++++++++++
 drivers/infiniband/core/core_priv.h        |  24 +++
 7 files changed, 503 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
 create mode 100644 drivers/infiniband/core/cma_configfs.c

diff --git a/Documentation/ABI/testing/configfs-rdma_cm b/Documentation/ABI/testing/configfs-rdma_cm
new file mode 100644
index 0000000..5c389aa
--- /dev/null
+++ b/Documentation/ABI/testing/configfs-rdma_cm
@@ -0,0 +1,22 @@
+What: 		/config/rdma_cm
+Date: 		November 29, 2015
+KernelVersion:  4.4.0
+Description: 	Interface is used to configure RDMA-cable HCAs in respect to
+		RDMA-CM attributes.
+
+		Attributes are visible only when configfs is mounted. To mount
+		configfs in /config directory use:
+		# mount -t configfs none /config/
+
+		In order to set parameters related to a specific HCA, a directory
+		for this HCA has to be created:
+		mkdir -p /config/rdma_cm/<hca>
+
+
+What: 		/config/rdma_cm/<hca>/ports/<port-num>/default_roce_mode
+Date: 		November 29, 2015
+KernelVersion:  4.4.0
+Description: 	RDMA-CM based connections from HCA <hca> at port <port-num>
+		will be initiated with this RoCE type as default.
+		The possible RoCE types are either "IB/RoCE v1" or "RoCE v2".
+		This parameter has RW access.
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index aa26f3c..f5312da 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -54,6 +54,15 @@ config INFINIBAND_ADDR_TRANS
 	depends on INFINIBAND
 	default y
 
+config INFINIBAND_ADDR_TRANS_CONFIGFS
+	bool
+	depends on INFINIBAND_ADDR_TRANS && !(INFINIBAND=y && CONFIGFS_FS=m)
+	default y
+	---help---
+	  ConfigFS support for RDMA communication manager (CM).
+	  This allows the user to config the default GID type that the CM
+	  uses for each device, when initiaing new connections.
+
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/qib/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index d43a899..7922fa7 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -24,6 +24,8 @@ iw_cm-y :=			iwcm.o iwpm_util.o iwpm_msg.o
 
 rdma_cm-y :=			cma.o
 
+rdma_cm-$(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS) += cma_configfs.o
+
 rdma_ucm-y :=			ucma.o
 
 ib_addr-y :=			addr.o
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 88b4b6f..4aada52 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -140,6 +140,30 @@ const char *ib_cache_gid_type_str(enum ib_gid_type gid_type)
 }
 EXPORT_SYMBOL(ib_cache_gid_type_str);
 
+int ib_cache_gid_parse_type_str(const char *buf)
+{
+	unsigned int i;
+	size_t len;
+	int err = -EINVAL;
+
+	len = strlen(buf);
+	if (len == 0)
+		return -EINVAL;
+
+	if (buf[len - 1] == '\n')
+		len--;
+
+	for (i = 0; i < ARRAY_SIZE(gid_type_str); ++i)
+		if (gid_type_str[i] && !strncmp(buf, gid_type_str[i], len) &&
+		    len == strlen(gid_type_str[i])) {
+			err = i;
+			break;
+		}
+
+	return err;
+}
+EXPORT_SYMBOL(ib_cache_gid_parse_type_str);
+
 /* This function expects that rwlock will be write locked in all
  * scenarios and that lock will be locked in sleep-able (RoCE)
  * scenarios.
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f78088a..8fab267 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -152,6 +152,7 @@ struct cma_device {
 	struct completion	comp;
 	atomic_t		refcount;
 	struct list_head	id_list;
+	enum ib_gid_type	*default_gid_type;
 };
 
 struct rdma_bind_list {
@@ -192,6 +193,62 @@ void cma_ref_dev(struct cma_device *cma_dev)
 	atomic_inc(&cma_dev->refcount);
 }
 
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter	filter,
+					     void		*cookie)
+{
+	struct cma_device *cma_dev;
+	struct cma_device *found_cma_dev = NULL;
+
+	mutex_lock(&lock);
+
+	list_for_each_entry(cma_dev, &dev_list, list)
+		if (filter(cma_dev->device, cookie)) {
+			found_cma_dev = cma_dev;
+			break;
+		}
+
+	if (found_cma_dev)
+		cma_ref_dev(found_cma_dev);
+	mutex_unlock(&lock);
+	return found_cma_dev;
+}
+
+int cma_get_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port)
+{
+	if (port < rdma_start_port(cma_dev->device) ||
+	    port > rdma_end_port(cma_dev->device))
+		return -EINVAL;
+
+	return cma_dev->default_gid_type[port - rdma_start_port(cma_dev->device)];
+}
+
+int cma_set_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port,
+			     enum ib_gid_type default_gid_type)
+{
+	unsigned long supported_gids;
+
+	if (port < rdma_start_port(cma_dev->device) ||
+	    port > rdma_end_port(cma_dev->device))
+		return -EINVAL;
+
+	supported_gids = roce_gid_type_mask_support(cma_dev->device, port);
+
+	if (!(supported_gids & 1 << default_gid_type))
+		return -EINVAL;
+
+	cma_dev->default_gid_type[port - rdma_start_port(cma_dev->device)] =
+		default_gid_type;
+
+	return 0;
+}
+
+struct ib_device *cma_get_ib_dev(struct cma_device *cma_dev)
+{
+	return cma_dev->device;
+}
+
 /*
  * Device removal can occur at anytime, so we need extra handling to
  * serialize notifying the user of device removal with other callbacks.
@@ -343,17 +400,27 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
-static void cma_attach_to_dev(struct rdma_id_private *id_priv,
-			      struct cma_device *cma_dev)
+static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
+			       struct cma_device *cma_dev)
 {
 	cma_ref_dev(cma_dev);
 	id_priv->cma_dev = cma_dev;
+	id_priv->gid_type = 0;
 	id_priv->id.device = cma_dev->device;
 	id_priv->id.route.addr.dev_addr.transport =
 		rdma_node_get_transport(cma_dev->device->node_type);
 	list_add_tail(&id_priv->list, &cma_dev->id_list);
 }
 
+static void cma_attach_to_dev(struct rdma_id_private *id_priv,
+			      struct cma_device *cma_dev)
+{
+	_cma_attach_to_dev(id_priv, cma_dev);
+	id_priv->gid_type =
+		cma_dev->default_gid_type[id_priv->id.port_num -
+					  rdma_start_port(cma_dev->device)];
+}
+
 void cma_deref_dev(struct cma_device *cma_dev)
 {
 	if (atomic_dec_and_test(&cma_dev->refcount))
@@ -449,6 +516,7 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
 }
 
 static inline int cma_validate_port(struct ib_device *device, u8 port,
+				    enum ib_gid_type gid_type,
 				      union ib_gid *gid, int dev_type,
 				      int bound_if_index)
 {
@@ -474,9 +542,11 @@ static inline int cma_validate_port(struct ib_device *device, u8 port,
 			if (!ndev)
 				return -ENODEV;
 		}
+	} else {
+		gid_type = IB_GID_TYPE_IB;
 	}
 
-	ret = ib_find_cached_gid_by_port(device, gid, IB_GID_TYPE_IB, port,
+	ret = ib_find_cached_gid_by_port(device, gid, gid_type, port,
 					 ndev, NULL);
 
 	if (ndev)
@@ -511,7 +581,10 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 		gidp = rdma_protocol_roce(cma_dev->device, port) ?
 		       &iboe_gid : &gid;
 
-		ret = cma_validate_port(cma_dev->device, port, gidp,
+		ret = cma_validate_port(cma_dev->device, port,
+					rdma_protocol_ib(cma_dev->device, port) ?
+					IB_GID_TYPE_IB :
+					listen_id_priv->gid_type, gidp,
 					dev_addr->dev_type,
 					dev_addr->bound_dev_if);
 		if (!ret) {
@@ -530,8 +603,11 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,
 			gidp = rdma_protocol_roce(cma_dev->device, port) ?
 			       &iboe_gid : &gid;
 
-			ret = cma_validate_port(cma_dev->device, port, gidp,
-						dev_addr->dev_type,
+			ret = cma_validate_port(cma_dev->device, port,
+						rdma_protocol_ib(cma_dev->device, port) ?
+						IB_GID_TYPE_IB :
+						cma_dev->default_gid_type[port - 1],
+						gidp, dev_addr->dev_type,
 						dev_addr->bound_dev_if);
 			if (!ret) {
 				id_priv->id.port_num = port;
@@ -2073,7 +2149,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 	memcpy(cma_src_addr(dev_id_priv), cma_src_addr(id_priv),
 	       rdma_addr_size(cma_src_addr(id_priv)));
 
-	cma_attach_to_dev(dev_id_priv, cma_dev);
+	_cma_attach_to_dev(dev_id_priv, cma_dev);
 	list_add_tail(&dev_id_priv->listen_list, &id_priv->listen_list);
 	atomic_inc(&id_priv->refcount);
 	dev_id_priv->internal_id = 1;
@@ -3907,12 +3983,27 @@ static void cma_add_one(struct ib_device *device)
 {
 	struct cma_device *cma_dev;
 	struct rdma_id_private *id_priv;
+	unsigned int i;
+	unsigned long supported_gids = 0;
 
 	cma_dev = kmalloc(sizeof *cma_dev, GFP_KERNEL);
 	if (!cma_dev)
 		return;
 
 	cma_dev->device = device;
+	cma_dev->default_gid_type = kcalloc(device->phys_port_cnt,
+					    sizeof(*cma_dev->default_gid_type),
+					    GFP_KERNEL);
+	if (!cma_dev->default_gid_type) {
+		kfree(cma_dev);
+		return;
+	}
+	for (i = rdma_start_port(device); i <= rdma_end_port(device); i++) {
+		supported_gids = roce_gid_type_mask_support(device, i);
+		WARN_ON(!supported_gids);
+		cma_dev->default_gid_type[i - rdma_start_port(device)] =
+			find_first_bit(&supported_gids, BITS_PER_LONG);
+	}
 
 	init_completion(&cma_dev->comp);
 	atomic_set(&cma_dev->refcount, 1);
@@ -3992,6 +4083,7 @@ static void cma_remove_one(struct ib_device *device, void *client_data)
 	mutex_unlock(&lock);
 
 	cma_process_remove(cma_dev);
+	kfree(cma_dev->default_gid_type);
 	kfree(cma_dev);
 }
 
@@ -4125,6 +4217,7 @@ static int __init cma_init(void)
 
 	if (ibnl_add_client(RDMA_NL_RDMA_CM, RDMA_NL_RDMA_CM_NUM_OPS, cma_cb_table))
 		printk(KERN_WARNING "RDMA CMA: failed to add netlink callback\n");
+	cma_configfs_init();
 
 	return 0;
 
@@ -4139,6 +4232,7 @@ err_wq:
 
 static void __exit cma_cleanup(void)
 {
+	cma_configfs_exit();
 	ibnl_remove_client(RDMA_NL_RDMA_CM);
 	ib_unregister_client(&cma_client);
 	unregister_netdevice_notifier(&cma_nb);
diff --git a/drivers/infiniband/core/cma_configfs.c b/drivers/infiniband/core/cma_configfs.c
new file mode 100644
index 0000000..744834b
--- /dev/null
+++ b/drivers/infiniband/core/cma_configfs.c
@@ -0,0 +1,321 @@
+/*
+ * Copyright (c) 2015, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/configfs.h>
+#include <rdma/ib_verbs.h>
+#include "core_priv.h"
+
+struct cma_device;
+
+struct cma_dev_group;
+
+struct cma_dev_port_group {
+	unsigned int		port_num;
+	struct cma_dev_group	*cma_dev_group;
+	struct config_group	group;
+};
+
+struct cma_dev_group {
+	char				name[IB_DEVICE_NAME_MAX];
+	struct config_group		device_group;
+	struct config_group		ports_group;
+	struct config_group		*default_dev_group[2];
+	struct config_group		**default_ports_group;
+	struct cma_dev_port_group	*ports;
+};
+
+static struct cma_dev_port_group *to_dev_port_group(struct config_item *item)
+{
+	struct config_group *group;
+
+	if (!item)
+		return NULL;
+
+	group = container_of(item, struct config_group, cg_item);
+	return container_of(group, struct cma_dev_port_group, group);
+}
+
+static bool filter_by_name(struct ib_device *ib_dev, void *cookie)
+{
+	return !strcmp(ib_dev->name, cookie);
+}
+
+static int cma_configfs_params_get(struct config_item *item,
+				   struct cma_device **pcma_dev,
+				   struct cma_dev_port_group **pgroup)
+{
+	struct cma_dev_port_group *group = to_dev_port_group(item);
+	struct cma_device *cma_dev;
+
+	if (!group)
+		return -ENODEV;
+
+	cma_dev = cma_enum_devices_by_ibdev(filter_by_name,
+					    group->cma_dev_group->name);
+	if (!cma_dev)
+		return -ENODEV;
+
+	*pcma_dev = cma_dev;
+	*pgroup = group;
+
+	return 0;
+}
+
+static void cma_configfs_params_put(struct cma_device *cma_dev)
+{
+	cma_deref_dev(cma_dev);
+}
+
+static ssize_t default_roce_mode_show(struct config_item *item,
+				      char *buf)
+{
+	struct cma_device *cma_dev;
+	struct cma_dev_port_group *group;
+	int gid_type;
+	ssize_t ret;
+
+	ret = cma_configfs_params_get(item, &cma_dev, &group);
+	if (ret)
+		return ret;
+
+	gid_type = cma_get_default_gid_type(cma_dev, group->port_num);
+	cma_configfs_params_put(cma_dev);
+
+	if (gid_type < 0)
+		return gid_type;
+
+	return sprintf(buf, "%s\n", ib_cache_gid_type_str(gid_type));
+}
+
+static ssize_t default_roce_mode_store(struct config_item *item,
+				       const char *buf, size_t count)
+{
+	struct cma_device *cma_dev;
+	struct cma_dev_port_group *group;
+	int gid_type = ib_cache_gid_parse_type_str(buf);
+	ssize_t ret;
+
+	if (gid_type < 0)
+		return -EINVAL;
+
+	ret = cma_configfs_params_get(item, &cma_dev, &group);
+	if (ret)
+		return ret;
+
+	ret = cma_set_default_gid_type(cma_dev, group->port_num, gid_type);
+
+	cma_configfs_params_put(cma_dev);
+
+	return !ret ? strnlen(buf, count) : ret;
+}
+
+CONFIGFS_ATTR(, default_roce_mode);
+
+static struct configfs_attribute *cma_configfs_attributes[] = {
+	&attr_default_roce_mode,
+	NULL,
+};
+
+static struct config_item_type cma_port_group_type = {
+	.ct_attrs	= cma_configfs_attributes,
+	.ct_owner	= THIS_MODULE
+};
+
+static int make_cma_ports(struct cma_dev_group *cma_dev_group,
+			  struct cma_device *cma_dev)
+{
+	struct ib_device *ibdev;
+	unsigned int i;
+	unsigned int ports_num;
+	struct cma_dev_port_group *ports;
+	int err;
+
+	ibdev = cma_get_ib_dev(cma_dev);
+
+	if (!ibdev)
+		return -ENODEV;
+
+	ports_num = ibdev->phys_port_cnt;
+	ports = kcalloc(ports_num, sizeof(*cma_dev_group->ports),
+			GFP_KERNEL);
+
+	cma_dev_group->default_ports_group = kcalloc(ports_num + 1,
+						     sizeof(*cma_dev_group->ports),
+						     GFP_KERNEL);
+
+	if (!ports || !cma_dev_group->default_ports_group) {
+		err = -ENOMEM;
+		goto free;
+	}
+
+	for (i = 0; i < ports_num; i++) {
+		char port_str[10];
+
+		ports[i].port_num = i + 1;
+		snprintf(port_str, sizeof(port_str), "%u", i + 1);
+		ports[i].cma_dev_group = cma_dev_group;
+		config_group_init_type_name(&ports[i].group,
+					    port_str,
+					    &cma_port_group_type);
+		cma_dev_group->default_ports_group[i] = &ports[i].group;
+	}
+	cma_dev_group->default_ports_group[i] = NULL;
+	cma_dev_group->ports = ports;
+
+	return 0;
+free:
+	kfree(ports);
+	kfree(cma_dev_group->default_ports_group);
+	cma_dev_group->ports = NULL;
+	cma_dev_group->default_ports_group = NULL;
+	return err;
+}
+
+static void release_cma_dev(struct config_item  *item)
+{
+	struct config_group *group = container_of(item, struct config_group,
+						  cg_item);
+	struct cma_dev_group *cma_dev_group = container_of(group,
+							   struct cma_dev_group,
+							   device_group);
+
+	kfree(cma_dev_group);
+};
+
+static void release_cma_ports_group(struct config_item  *item)
+{
+	struct config_group *group = container_of(item, struct config_group,
+						  cg_item);
+	struct cma_dev_group *cma_dev_group = container_of(group,
+							   struct cma_dev_group,
+							   ports_group);
+
+	kfree(cma_dev_group->ports);
+	kfree(cma_dev_group->default_ports_group);
+	cma_dev_group->ports = NULL;
+	cma_dev_group->default_ports_group = NULL;
+};
+
+static struct configfs_item_operations cma_ports_item_ops = {
+	.release = release_cma_ports_group
+};
+
+static struct config_item_type cma_ports_group_type = {
+	.ct_item_ops	= &cma_ports_item_ops,
+	.ct_owner	= THIS_MODULE
+};
+
+static struct configfs_item_operations cma_device_item_ops = {
+	.release = release_cma_dev
+};
+
+static struct config_item_type cma_device_group_type = {
+	.ct_item_ops	= &cma_device_item_ops,
+	.ct_owner	= THIS_MODULE
+};
+
+static struct config_group *make_cma_dev(struct config_group *group,
+					 const char *name)
+{
+	int err = -ENODEV;
+	struct cma_device *cma_dev = cma_enum_devices_by_ibdev(filter_by_name,
+							       (void *)name);
+	struct cma_dev_group *cma_dev_group = NULL;
+
+	if (!cma_dev)
+		goto fail;
+
+	cma_dev_group = kzalloc(sizeof(*cma_dev_group), GFP_KERNEL);
+
+	if (!cma_dev_group) {
+		err = -ENOMEM;
+		goto fail;
+	}
+
+	strncpy(cma_dev_group->name, name, sizeof(cma_dev_group->name));
+
+	err = make_cma_ports(cma_dev_group, cma_dev);
+	if (err)
+		goto fail;
+
+	cma_dev_group->ports_group.default_groups =
+		cma_dev_group->default_ports_group;
+	config_group_init_type_name(&cma_dev_group->ports_group, "ports",
+				    &cma_ports_group_type);
+
+	cma_dev_group->device_group.default_groups
+		= cma_dev_group->default_dev_group;
+	cma_dev_group->default_dev_group[0] = &cma_dev_group->ports_group;
+	cma_dev_group->default_dev_group[1] = NULL;
+
+	config_group_init_type_name(&cma_dev_group->device_group, name,
+				    &cma_device_group_type);
+
+	cma_deref_dev(cma_dev);
+	return &cma_dev_group->device_group;
+
+fail:
+	if (cma_dev)
+		cma_deref_dev(cma_dev);
+	kfree(cma_dev_group);
+	return ERR_PTR(err);
+}
+
+static struct configfs_group_operations cma_subsys_group_ops = {
+	.make_group	= make_cma_dev,
+};
+
+static struct config_item_type cma_subsys_type = {
+	.ct_group_ops	= &cma_subsys_group_ops,
+	.ct_owner	= THIS_MODULE,
+};
+
+static struct configfs_subsystem cma_subsys = {
+	.su_group	= {
+		.cg_item	= {
+			.ci_namebuf	= "rdma_cm",
+			.ci_type	= &cma_subsys_type,
+		},
+	},
+};
+
+int __init cma_configfs_init(void)
+{
+	config_group_init(&cma_subsys.su_group);
+	mutex_init(&cma_subsys.su_mutex);
+	return configfs_register_subsystem(&cma_subsys);
+}
+
+void __exit cma_configfs_exit(void)
+{
+	configfs_unregister_subsystem(&cma_subsys);
+}
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 1945b4e..eab3221 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -38,9 +38,31 @@
 
 #include <rdma/ib_verbs.h>
 
+#if IS_ENABLED(CONFIG_INFINIBAND_ADDR_TRANS_CONFIGFS)
+int cma_configfs_init(void);
+void cma_configfs_exit(void);
+#else
+static inline int cma_configfs_init(void)
+{
+	return 0;
+}
+
+static inline void cma_configfs_exit(void)
+{
+}
+#endif
 struct cma_device;
 void cma_ref_dev(struct cma_device *cma_dev);
 void cma_deref_dev(struct cma_device *cma_dev);
+typedef bool (*cma_device_filter)(struct ib_device *, void *);
+struct cma_device *cma_enum_devices_by_ibdev(cma_device_filter	filter,
+					     void		*cookie);
+int cma_get_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port);
+int cma_set_default_gid_type(struct cma_device *cma_dev,
+			     unsigned int port,
+			     enum ib_gid_type default_gid_type);
+struct ib_device *cma_get_ib_dev(struct cma_device *cma_dev);
 
 int  ib_device_register_sysfs(struct ib_device *device,
 			      int (*port_callback)(struct ib_device *,
@@ -74,6 +96,8 @@ enum ib_cache_gid_default_mode {
 	IB_CACHE_GID_DEFAULT_MODE_DELETE
 };
 
+int ib_cache_gid_parse_type_str(const char *buf);
+
 const char *ib_cache_gid_type_str(enum ib_gid_type gid_type);
 
 void ib_cache_gid_set_default_gid(struct ib_device *ib_dev, u8 port,
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (20 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
  2015-12-15  7:15   ` [PATCH for-next V2 00/11] Add RoCE v2 support Moni Shoua
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur,
	Moni Shoua

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

ib_ud_header_init() is used to format InfiniBand headers
in a buffer up to (but not with) BTH. For RoCE UDP ENCAP it is
required that this function would be able to build also IP and UDP
headers.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/ud_header.c    | 155 ++++++++++++++++++++++++++++++---
 drivers/infiniband/hw/mlx4/qp.c        |   7 +-
 drivers/infiniband/hw/mthca/mthca_qp.c |   2 +-
 include/rdma/ib_pack.h                 |  45 ++++++++--
 4 files changed, 188 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/ud_header.c b/drivers/infiniband/core/ud_header.c
index 72feee6..96697e7 100644
--- a/drivers/infiniband/core/ud_header.c
+++ b/drivers/infiniband/core/ud_header.c
@@ -35,6 +35,7 @@
 #include <linux/string.h>
 #include <linux/export.h>
 #include <linux/if_ether.h>
+#include <linux/ip.h>
 
 #include <rdma/ib_pack.h>
 
@@ -116,6 +117,72 @@ static const struct ib_field vlan_table[]  = {
 	  .size_bits    = 16 }
 };
 
+static const struct ib_field ip4_table[]  = {
+	{ STRUCT_FIELD(ip4, ver),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 4 },
+	{ STRUCT_FIELD(ip4, hdr_len),
+	  .offset_words = 0,
+	  .offset_bits  = 4,
+	  .size_bits    = 4 },
+	{ STRUCT_FIELD(ip4, tos),
+	  .offset_words = 0,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, tot_len),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, id),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, frag_off),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, ttl),
+	  .offset_words = 2,
+	  .offset_bits  = 0,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, protocol),
+	  .offset_words = 2,
+	  .offset_bits  = 8,
+	  .size_bits    = 8 },
+	{ STRUCT_FIELD(ip4, check),
+	  .offset_words = 2,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(ip4, saddr),
+	  .offset_words = 3,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 },
+	{ STRUCT_FIELD(ip4, daddr),
+	  .offset_words = 4,
+	  .offset_bits  = 0,
+	  .size_bits    = 32 }
+};
+
+static const struct ib_field udp_table[]  = {
+	{ STRUCT_FIELD(udp, sport),
+	  .offset_words = 0,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, dport),
+	  .offset_words = 0,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, length),
+	  .offset_words = 1,
+	  .offset_bits  = 0,
+	  .size_bits    = 16 },
+	{ STRUCT_FIELD(udp, csum),
+	  .offset_words = 1,
+	  .offset_bits  = 16,
+	  .size_bits    = 16 }
+};
+
 static const struct ib_field grh_table[]  = {
 	{ STRUCT_FIELD(grh, ip_version),
 	  .offset_words = 0,
@@ -213,26 +280,57 @@ static const struct ib_field deth_table[] = {
 	  .size_bits    = 24 }
 };
 
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header)
+{
+	struct iphdr iph;
+
+	iph.ihl		= 5;
+	iph.version	= 4;
+	iph.tos		= header->ip4.tos;
+	iph.tot_len	= header->ip4.tot_len;
+	iph.id		= header->ip4.id;
+	iph.frag_off	= header->ip4.frag_off;
+	iph.ttl		= header->ip4.ttl;
+	iph.protocol	= header->ip4.protocol;
+	iph.check	= 0;
+	iph.saddr	= header->ip4.saddr;
+	iph.daddr	= header->ip4.daddr;
+
+	return ip_fast_csum((u8 *)&iph, iph.ihl);
+}
+EXPORT_SYMBOL(ib_ud_ip4_csum);
+
 /**
  * ib_ud_header_init - Initialize UD header structure
  * @payload_bytes:Length of packet payload
  * @lrh_present: specify if LRH is present
  * @eth_present: specify if Eth header is present
  * @vlan_present: packet is tagged vlan
- * @grh_present:GRH flag (if non-zero, GRH will be included)
+ * @grh_present: GRH flag (if non-zero, GRH will be included)
+ * @ip_version: if non-zero, IP header, V4 or V6, will be included
+ * @udp_present :if non-zero, UDP header will be included
  * @immediate_present: specify if immediate data is present
  * @header:Structure to initialize
  */
-void ib_ud_header_init(int     		    payload_bytes,
-		       int		    lrh_present,
-		       int		    eth_present,
-		       int		    vlan_present,
-		       int    		    grh_present,
-		       int		    immediate_present,
-		       struct ib_ud_header *header)
+int ib_ud_header_init(int     payload_bytes,
+		      int    lrh_present,
+		      int    eth_present,
+		      int    vlan_present,
+		      int    grh_present,
+		      int    ip_version,
+		      int    udp_present,
+		      int    immediate_present,
+		      struct ib_ud_header *header)
 {
+	grh_present = grh_present && !ip_version;
 	memset(header, 0, sizeof *header);
 
+	/*
+	 * UDP header without IP header doesn't make sense
+	 */
+	if (udp_present && ip_version != 4 && ip_version != 6)
+		return -EINVAL;
+
 	if (lrh_present) {
 		u16 packet_length;
 
@@ -252,7 +350,7 @@ void ib_ud_header_init(int     		    payload_bytes,
 	if (vlan_present)
 		header->eth.type = cpu_to_be16(ETH_P_8021Q);
 
-	if (grh_present) {
+	if (ip_version == 6 || grh_present) {
 		header->grh.ip_version      = 6;
 		header->grh.payload_length  =
 			cpu_to_be16((IB_BTH_BYTES     +
@@ -260,8 +358,30 @@ void ib_ud_header_init(int     		    payload_bytes,
 				     payload_bytes    +
 				     4                + /* ICRC     */
 				     3) & ~3);          /* round up */
-		header->grh.next_header     = 0x1b;
+		header->grh.next_header     = udp_present ? IPPROTO_UDP : 0x1b;
+	}
+
+	if (ip_version == 4) {
+		int udp_bytes = udp_present ? IB_UDP_BYTES : 0;
+
+		header->ip4.ver = 4; /* version 4 */
+		header->ip4.hdr_len = 5; /* 5 words */
+		header->ip4.tot_len =
+			cpu_to_be16(IB_IP4_BYTES   +
+				     udp_bytes     +
+				     IB_BTH_BYTES  +
+				     IB_DETH_BYTES +
+				     payload_bytes +
+				     4);     /* ICRC     */
+		header->ip4.protocol = IPPROTO_UDP;
 	}
+	if (udp_present && ip_version)
+		header->udp.length =
+			cpu_to_be16(IB_UDP_BYTES   +
+				     IB_BTH_BYTES  +
+				     IB_DETH_BYTES +
+				     payload_bytes +
+				     4);     /* ICRC     */
 
 	if (immediate_present)
 		header->bth.opcode           = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE;
@@ -273,8 +393,11 @@ void ib_ud_header_init(int     		    payload_bytes,
 	header->lrh_present = lrh_present;
 	header->eth_present = eth_present;
 	header->vlan_present = vlan_present;
-	header->grh_present = grh_present;
+	header->grh_present = grh_present || (ip_version == 6);
+	header->ipv4_present = ip_version == 4;
+	header->udp_present = udp_present;
 	header->immediate_present = immediate_present;
+	return 0;
 }
 EXPORT_SYMBOL(ib_ud_header_init);
 
@@ -311,6 +434,16 @@ int ib_ud_header_pack(struct ib_ud_header *header,
 			&header->grh, buf + len);
 		len += IB_GRH_BYTES;
 	}
+	if (header->ipv4_present) {
+		ib_pack(ip4_table, ARRAY_SIZE(ip4_table),
+			&header->ip4, buf + len);
+		len += IB_IP4_BYTES;
+	}
+	if (header->udp_present) {
+		ib_pack(udp_table, ARRAY_SIZE(udp_table),
+			&header->udp, buf + len);
+		len += IB_UDP_BYTES;
+	}
 
 	ib_pack(bth_table, ARRAY_SIZE(bth_table),
 		&header->bth, buf + len);
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index a2e4ca5..1c7a308 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2161,7 +2161,7 @@ static int build_sriov_qp0_header(struct mlx4_ib_sqp *sqp,
 	if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER)
 		send_size += sizeof (struct mlx4_ib_tunnel_header);
 
-	ib_ud_header_init(send_size, 1, 0, 0, 0, 0, &sqp->ud_header);
+	ib_ud_header_init(send_size, 1, 0, 0, 0, 0, 0, 0, &sqp->ud_header);
 
 	if (sqp->qp.mlx4_ib_qp_type == MLX4_IB_QPT_PROXY_SMI_OWNER) {
 		sqp->ud_header.lrh.service_level =
@@ -2307,7 +2307,10 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_ud_wr *wr,
 			is_vlan = 1;
 		}
 	}
-	ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh, 0, &sqp->ud_header);
+	err = ib_ud_header_init(send_size, !is_eth, is_eth, is_vlan, is_grh,
+				0, 0, 0, &sqp->ud_header);
+	if (err)
+		return err;
 
 	if (!is_eth) {
 		sqp->ud_header.lrh.service_level =
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 35fe506..96e5fb9 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -1485,7 +1485,7 @@ static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp,
 	u16 pkey;
 
 	ib_ud_header_init(256, /* assume a MAD */ 1, 0, 0,
-			  mthca_ah_grh_present(to_mah(wr->ah)), 0,
+			  mthca_ah_grh_present(to_mah(wr->ah)), 0, 0, 0,
 			  &sqp->ud_header);
 
 	err = mthca_read_ah(dev, to_mah(wr->ah), &sqp->ud_header);
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index e99d8f9..a193081 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -41,6 +41,8 @@ enum {
 	IB_ETH_BYTES  = 14,
 	IB_VLAN_BYTES = 4,
 	IB_GRH_BYTES  = 40,
+	IB_IP4_BYTES  = 20,
+	IB_UDP_BYTES  = 8,
 	IB_BTH_BYTES  = 12,
 	IB_DETH_BYTES = 8
 };
@@ -223,6 +225,27 @@ struct ib_unpacked_eth {
 	__be16	type;
 };
 
+struct ib_unpacked_ip4 {
+	u8	ver;
+	u8	hdr_len;
+	u8	tos;
+	__be16	tot_len;
+	__be16	id;
+	__be16	frag_off;
+	u8	ttl;
+	u8	protocol;
+	__be16	check;
+	__be32	saddr;
+	__be32	daddr;
+};
+
+struct ib_unpacked_udp {
+	__be16	sport;
+	__be16	dport;
+	__be16	length;
+	__be16	csum;
+};
+
 struct ib_unpacked_vlan {
 	__be16  tag;
 	__be16  type;
@@ -237,6 +260,10 @@ struct ib_ud_header {
 	struct ib_unpacked_vlan vlan;
 	int			grh_present;
 	struct ib_unpacked_grh	grh;
+	int			ipv4_present;
+	struct ib_unpacked_ip4	ip4;
+	int			udp_present;
+	struct ib_unpacked_udp	udp;
 	struct ib_unpacked_bth	bth;
 	struct ib_unpacked_deth deth;
 	int			immediate_present;
@@ -253,13 +280,17 @@ void ib_unpack(const struct ib_field        *desc,
 	       void                         *buf,
 	       void                         *structure);
 
-void ib_ud_header_init(int		    payload_bytes,
-		       int		    lrh_present,
-		       int		    eth_present,
-		       int		    vlan_present,
-		       int		    grh_present,
-		       int		    immediate_present,
-		       struct ib_ud_header *header);
+__be16 ib_ud_ip4_csum(struct ib_ud_header *header);
+
+int ib_ud_header_init(int		    payload_bytes,
+		      int		    lrh_present,
+		      int		    eth_present,
+		      int		    vlan_present,
+		      int		    grh_present,
+		      int		    ip_version,
+		      int		    udp_present,
+		      int		    immediate_present,
+		      struct ib_ud_header *header);
 
 int ib_ud_header_pack(struct ib_ud_header *header,
 		      void                *buf);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (21 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
@ 2015-12-03 13:47   ` Matan Barak
  2015-12-15  7:15   ` [PATCH for-next V2 00/11] Add RoCE v2 support Moni Shoua
  23 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 13:47 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur,
	Moni Shoua

From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Since RoCEv2 is a protocol over IP header it is required to send IGMP
join and leave requests to the network when joining and leaving
multicast groups.

Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c       | 96 ++++++++++++++++++++++++++++++++++---
 drivers/infiniband/core/multicast.c | 17 ++++++-
 include/rdma/ib_sa.h                |  2 +
 3 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8fab267..c30bfe3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -38,6 +38,7 @@
 #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/igmp.h>
 #include <linux/idr.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
@@ -304,6 +305,7 @@ struct cma_multicast {
 	void			*context;
 	struct sockaddr_storage	addr;
 	struct kref		mcref;
+	bool			igmp_joined;
 };
 
 struct cma_work {
@@ -400,6 +402,26 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
 	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
 }
 
+static int cma_igmp_send(struct net_device *ndev, union ib_gid *mgid, bool join)
+{
+	struct in_device *in_dev = NULL;
+
+	if (ndev) {
+		rtnl_lock();
+		in_dev = __in_dev_get_rtnl(ndev);
+		if (in_dev) {
+			if (join)
+				ip_mc_inc_group(in_dev,
+						*(__be32 *)(mgid->raw + 12));
+			else
+				ip_mc_dec_group(in_dev,
+						*(__be32 *)(mgid->raw + 12));
+		}
+		rtnl_unlock();
+	}
+	return (in_dev) ? 0 : -ENODEV;
+}
+
 static void _cma_attach_to_dev(struct rdma_id_private *id_priv,
 			       struct cma_device *cma_dev)
 {
@@ -1535,8 +1557,24 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 				      id_priv->id.port_num)) {
 			ib_sa_free_multicast(mc->multicast.ib);
 			kfree(mc);
-		} else
+		} else {
+			if (mc->igmp_joined) {
+				struct rdma_dev_addr *dev_addr =
+					&id_priv->id.route.addr.dev_addr;
+				struct net_device *ndev = NULL;
+
+				if (dev_addr->bound_dev_if)
+					ndev = dev_get_by_index(&init_net,
+								dev_addr->bound_dev_if);
+				if (ndev) {
+					cma_igmp_send(ndev,
+						      &mc->multicast.ib->rec.mgid,
+						      false);
+					dev_put(ndev);
+				}
+			}
 			kref_put(&mc->mcref, release_mc);
+		}
 	}
 }
 
@@ -3656,12 +3694,23 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast)
 	event.status = status;
 	event.param.ud.private_data = mc->context;
 	if (!status) {
+		struct rdma_dev_addr *dev_addr =
+			&id_priv->id.route.addr.dev_addr;
+		struct net_device *ndev =
+			dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		enum ib_gid_type gid_type =
+			id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
+			rdma_start_port(id_priv->cma_dev->device)];
+
 		event.event = RDMA_CM_EVENT_MULTICAST_JOIN;
 		ib_init_ah_from_mcmember(id_priv->id.device,
 					 id_priv->id.port_num, &multicast->rec,
+					 ndev, gid_type,
 					 &event.param.ud.ah_attr);
 		event.param.ud.qp_num = 0xFFFFFF;
 		event.param.ud.qkey = be32_to_cpu(multicast->rec.qkey);
+		if (ndev)
+			dev_put(ndev);
 	} else
 		event.event = RDMA_CM_EVENT_MULTICAST_ERROR;
 
@@ -3794,9 +3843,10 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 {
 	struct iboe_mcast_work *work;
 	struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
-	int err;
+	int err = 0;
 	struct sockaddr *addr = (struct sockaddr *)&mc->addr;
 	struct net_device *ndev = NULL;
+	enum ib_gid_type gid_type;
 
 	if (cma_zero_addr((struct sockaddr *)&mc->addr))
 		return -EINVAL;
@@ -3826,9 +3876,25 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv,
 	mc->multicast.ib->rec.rate = iboe_get_rate(ndev);
 	mc->multicast.ib->rec.hop_limit = 1;
 	mc->multicast.ib->rec.mtu = iboe_get_mtu(ndev->mtu);
+
+	gid_type = id_priv->cma_dev->default_gid_type[id_priv->id.port_num -
+		   rdma_start_port(id_priv->cma_dev->device)];
+	if (addr->sa_family == AF_INET) {
+		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+			err = cma_igmp_send(ndev, &mc->multicast.ib->rec.mgid,
+					    true);
+		if (!err) {
+			mc->igmp_joined = true;
+			mc->multicast.ib->rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
+		}
+	} else {
+		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
+			err = -ENOTSUPP;
+	}
 	dev_put(ndev);
-	if (!mc->multicast.ib->rec.mtu) {
-		err = -EINVAL;
+	if (err || !mc->multicast.ib->rec.mtu) {
+		if (!err)
+			err = -EINVAL;
 		goto out2;
 	}
 	rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr,
@@ -3867,7 +3933,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
 	memcpy(&mc->addr, addr, rdma_addr_size(addr));
 	mc->context = context;
 	mc->id_priv = id_priv;
-
+	mc->igmp_joined = false;
 	spin_lock(&id_priv->lock);
 	list_add(&mc->list, &id_priv->mc_list);
 	spin_unlock(&id_priv->lock);
@@ -3912,9 +3978,25 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
 			if (rdma_cap_ib_mcast(id->device, id->port_num)) {
 				ib_sa_free_multicast(mc->multicast.ib);
 				kfree(mc);
-			} else if (rdma_protocol_roce(id->device, id->port_num))
+			} else if (rdma_protocol_roce(id->device, id->port_num)) {
+				if (mc->igmp_joined) {
+					struct rdma_dev_addr *dev_addr =
+						&id->route.addr.dev_addr;
+					struct net_device *ndev = NULL;
+
+					if (dev_addr->bound_dev_if)
+						ndev = dev_get_by_index(&init_net,
+									dev_addr->bound_dev_if);
+					if (ndev) {
+						cma_igmp_send(ndev,
+							      &mc->multicast.ib->rec.mgid,
+							      false);
+						dev_put(ndev);
+					}
+					mc->igmp_joined = false;
+				}
 				kref_put(&mc->mcref, release_mc);
-
+			}
 			return;
 		}
 	}
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 6911ae6..250937c 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -723,14 +723,27 @@ EXPORT_SYMBOL(ib_sa_get_mcmember_rec);
 
 int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 			     struct ib_sa_mcmember_rec *rec,
+			     struct net_device *ndev,
+			     enum ib_gid_type gid_type,
 			     struct ib_ah_attr *ah_attr)
 {
 	int ret;
 	u16 gid_index;
 	u8 p;
 
-	ret = ib_find_cached_gid(device, &rec->port_gid, IB_GID_TYPE_IB,
-				 NULL, &p, &gid_index);
+	if (rdma_protocol_roce(device, port_num)) {
+		ret = ib_find_cached_gid_by_port(device, &rec->port_gid,
+						 gid_type, port_num,
+						 ndev,
+						 &gid_index);
+	} else if (rdma_protocol_ib(device, port_num)) {
+		ret = ib_find_cached_gid(device, &rec->port_gid,
+					 IB_GID_TYPE_IB, NULL, &p,
+					 &gid_index);
+	} else {
+		ret = -EINVAL;
+	}
+
 	if (ret)
 		return ret;
 
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index 0a40ed2..cdc1c81 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -403,6 +403,8 @@ int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
  */
 int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
 			     struct ib_sa_mcmember_rec *rec,
+			     struct net_device *ndev,
+			     enum ib_gid_type gid_type,
 			     struct ib_ah_attr *ah_attr);
 
 /**
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]     ` <1449150450-13679-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-03 14:05       ` Christoph Hellwig
       [not found]         ` <20151203140543.GA4283-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-12-03 16:36       ` Jason Gunthorpe
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2015-12-03 14:05 UTC (permalink / raw)
  To: Matan Barak
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

Bloating the WC with a field that's not really useful for the ULPs
seems pretty sad..
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]         ` <20151203140543.GA4283-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-12-03 16:19           ` Matan Barak
  2015-12-03 16:20           ` Liran Liss
  1 sibling, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-03 16:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matan Barak, Doug Ledford, linux-rdma, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

On Thu, Dec 3, 2015 at 4:05 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> Bloating the WC with a field that's not really useful for the ULPs
> seems pretty sad..

Network header type is mandatory in order to find the GID type and get
the GIDs correctly from the header.
I realize ULPs might have preferred to get the GID itself, but
resolving the GID costs time and most of the time you don't really
need that when you poll a CQ.
This could be refactored later to use wc_flags instead of a new field
when we approach the cache line limit.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]         ` <20151203140543.GA4283-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-12-03 16:19           ` Matan Barak
@ 2015-12-03 16:20           ` Liran Liss
       [not found]             ` <HE1PR05MB141857FA57EECD82D1533543B10D0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Liran Liss @ 2015-12-03 16:20 UTC (permalink / raw)
  To: Christoph Hellwig, Matan Barak
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Jason Gunthorpe, Somnath Kotur

> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-

> Subject: Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to
> wc
> 
> Bloating the WC with a field that's not really useful for the ULPs seems pretty
> sad..

You need per packet (read per-WC) network type both for handling incoming connections over QP1 and in UD QPs.
It looks like this patch doesn't extend the structure size due to alignment, so no real harm in any case...

Alternatively, we can consider specifying the network type as a completion flag instead.
--Liran

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]     ` <1449150450-13679-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-12-03 14:05       ` Christoph Hellwig
@ 2015-12-03 16:36       ` Jason Gunthorpe
  1 sibling, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-03 16:36 UTC (permalink / raw)
  To: Matan Barak
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

On Thu, Dec 03, 2015 at 03:47:12PM +0200, Matan Barak wrote:
> From: Somnath Kotur <Somnath.Kotur-idTK6quXuVSLFuii7jzJGg@public.gmane.org>
> 
> Providers should tell IB core the wc's network type.
> This is used in order to search for the proper GID in the
> GID table. When using HCAs that can't provide this info,
> IB core tries to deep examine the packet and extract
> the GID type by itself.

Eh? A wc has a sgid_index, and in this brave new world a gid has the
network type. Why do we need to specify it again?

>  	memset(ah_attr, 0, sizeof *ah_attr);
>  	if (rdma_cap_eth_ah(device, port_num)) {
> +		if (wc->wc_flags & IB_WC_WITH_NETWORK_HDR_TYPE)
> +			net_type = wc->network_hdr_type;
> +		else
> +			net_type = ib_get_net_type_by_grh(device, port_num, grh);
> +		gid_type = ib_network_to_gid_type(net_type);

Like here for instance.

... and I keep saying this is all wrong, once you get into IP land
this entire process needs a route/neighbour lookup.


> -			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
> +			ret = rdma_addr_find_dmac_by_grh(&dgid, &sgid,
>  							 ah_attr->dmac,
>  							 wc->wc_flags & IB_WC_WITH_VLAN ?
>  							 NULL : &vlan_id,

ie no to this.

> +			if (sgid_attr.gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP)
> +				/* TODO: get the hoplimit from the inet/inet6
> +				 * device
> +				 */

And no again, please fix this and all other missing route lookups
before sending another version.

> +	struct {
> +		/* The IB spec states that if it's IPv4, the header

roceev2 spec, surely

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]             ` <HE1PR05MB141857FA57EECD82D1533543B10D0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-12-06 14:03               ` Christoph Hellwig
       [not found]                 ` <20151206140307.GB25487-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-12-07  6:02               ` Jason Gunthorpe
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Hellwig @ 2015-12-06 14:03 UTC (permalink / raw)
  To: Liran Liss
  Cc: Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

On Thu, Dec 03, 2015 at 04:20:50PM +0000, Liran Liss wrote:
> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> 
> > Subject: Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to
> > wc
> > 
> > Bloating the WC with a field that's not really useful for the ULPs seems pretty
> > sad..
> 
> You need per packet (read per-WC) network type both for handling incoming connections over QP1 and in UD QPs.
> It looks like this patch doesn't extend the structure size due to alignment, so no real harm in any case...

The ULPs couldn't really care less about the network type.  The problem
is that one monolithic structure is used both for communicatation with
the protocol drivers and the RoCE implementation.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                 ` <20151206140307.GB25487-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-12-06 14:20                   ` Moni Shoua
  0 siblings, 0 replies; 64+ messages in thread
From: Moni Shoua @ 2015-12-06 14:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Liran Liss, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

On Sun, Dec 6, 2015 at 4:03 PM, Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> The ULPs couldn't really care less about the network type.  The problem

In WC, ULPs care about network type as much as they care for dgid or port_num
So, network_type is indeed an information that is relevant only to
RoCE but it only follows the tradition
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]             ` <HE1PR05MB141857FA57EECD82D1533543B10D0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-12-06 14:03               ` Christoph Hellwig
@ 2015-12-07  6:02               ` Jason Gunthorpe
       [not found]                 ` <20151207060241.GA19038-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-07  6:02 UTC (permalink / raw)
  To: Liran Liss
  Cc: Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Thu, Dec 03, 2015 at 04:20:50PM +0000, Liran Liss wrote:
> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> 
> > Subject: Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to
> > wc
> > 
> > Bloating the WC with a field that's not really useful for the ULPs seems pretty
> > sad..
> 
> You need per packet (read per-WC) network type both for handling incoming connections over QP1 and in UD QPs.
> It looks like this patch doesn't extend the structure size due to alignment, so no real harm in any case...

Why? The sgid index must tell you the network type.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                 ` <20151207060241.GA19038-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-07  6:15                   ` Moni Shoua
       [not found]                     ` <CAG9sBKO0rsSQK1WcyBciZEONWJsusw9GwR19H7FfAsebEH63dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-12-07  6:21                   ` Parav Pandit
  1 sibling, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-07  6:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 7, 2015 at 8:02 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> Why? The sgid index must tell you the network type.

struct ib_wc doesn't have sgid or sgid_index field.
What you have though is the sgid taken from the GRH that is scattered
to the first 40/20 bytes of the receive WQE.  This is not enough to
determine the network type.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                 ` <20151207060241.GA19038-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-12-07  6:15                   ` Moni Shoua
@ 2015-12-07  6:21                   ` Parav Pandit
  1 sibling, 0 replies; 64+ messages in thread
From: Parav Pandit @ 2015-12-07  6:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 7, 2015 at 11:32 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Thu, Dec 03, 2015 at 04:20:50PM +0000, Liran Liss wrote:
>> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>
>> > Subject: Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to
>> > wc
>> >
>> > Bloating the WC with a field that's not really useful for the ULPs seems pretty
>> > sad..
>>
>> You need per packet (read per-WC) network type both for handling incoming connections over QP1 and in UD QPs.
>> It looks like this patch doesn't extend the structure size due to alignment, so no real harm in any case...
>
> Why? The sgid index must tell you the network type.
>

User space might not have access to internal properties of gid entry
like kernel. Not sure if more plumbing needs to be added in user space
to get such property and async updates of it.
So Liran might be thinking to have unified way to report required fields via wc?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                     ` <CAG9sBKO0rsSQK1WcyBciZEONWJsusw9GwR19H7FfAsebEH63dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-07  6:34                       ` Jason Gunthorpe
       [not found]                         ` <20151207063415.GB20066-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-07  6:34 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 07, 2015 at 08:15:40AM +0200, Moni Shoua wrote:

> What you have though is the sgid taken from the GRH that is scattered
> to the first 40/20 bytes of the receive WQE.  This is not enough to
> determine the network type.

It is enough to discover the sgid index which will tell you the type.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                         ` <20151207063415.GB20066-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-07  6:37                           ` Moni Shoua
       [not found]                             ` <CAG9sBKMBt6Toa25Ouasr7hudAcOYrxZyBUFYMXzrMTFZvooWfQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-07  6:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 7, 2015 at 8:34 AM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Dec 07, 2015 at 08:15:40AM +0200, Moni Shoua wrote:
>
>> What you have though is the sgid taken from the GRH that is scattered
>> to the first 40/20 bytes of the receive WQE.  This is not enough to
>> determine the network type.
>
> It is enough to discover the sgid index which will tell you the type.
but how? all you have in hand is the sgid which can appear several
times in the GID table in different indices.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
       [not found]     ` <1449150450-13679-20-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-07 13:42       ` Haggai Eran
       [not found]         ` <56658CB4.7090402-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Haggai Eran @ 2015-12-07 13:42 UTC (permalink / raw)
  To: Matan Barak, Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Or Gerlitz,
	Jason Gunthorpe, Somnath Kotur

On 12/03/2015 03:47 PM, Matan Barak wrote:
> +static int addr_resolve_neigh(struct dst_entry *dst,
> +			      const struct sockaddr *dst_in,
> +			      struct rdma_dev_addr *addr)
> +{
> +	if (dst->dev->flags & IFF_LOOPBACK) {
> +		int ret;
> +
> +		ret = rdma_translate_ip(dst_in, addr, NULL);
> +		if (!ret)
> +			memcpy(addr->dst_dev_addr, addr->src_dev_addr,
> +			       MAX_ADDR_LEN);
> +
> +		return ret;
> +	}
> +
> +	/* If the device does ARP internally */
You mean "doesn't do ARP internally" right?

> +	if (!(dst->dev->flags & IFF_NOARP)) {
> +		const struct sockaddr_in *dst_in4 =
> +			(const struct sockaddr_in *)dst_in;
> +		const struct sockaddr_in6 *dst_in6 =
> +			(const struct sockaddr_in6 *)dst_in;
> +
> +		return dst_fetch_ha(dst, addr,
> +				    dst_in->sa_family == AF_INET ?
> +				    (const void *)&dst_in4->sin_addr.s_addr :
> +				    (const void *)&dst_in6->sin6_addr);
> +	}
> +
> +	return rdma_copy_addr(addr, dst->dev, NULL);
> +}

> +int rdma_resolve_ip_route(struct sockaddr *src_addr,
> +			  const struct sockaddr *dst_addr,
> +			  struct rdma_dev_addr *addr)
> +{
> +	struct sockaddr_storage ssrc_addr;
> +	struct sockaddr *src_in = (struct sockaddr *)&ssrc_addr;
> +
> +	if (src_addr->sa_family != dst_addr->sa_family)
> +		return -EINVAL;
> +
> +	if (src_addr)
> +		memcpy(src_in, src_addr, rdma_addr_size(src_addr));
> +	else
> +		src_in->sa_family = dst_addr->sa_family;
Don't you need to clear the rest of src_in? I believe you pass
uninitialized memory to it.

> +
> +	return addr_resolve(src_in, dst_addr, addr, false);
> +}
> +EXPORT_SYMBOL(rdma_resolve_ip_route);

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                             ` <CAG9sBKMBt6Toa25Ouasr7hudAcOYrxZyBUFYMXzrMTFZvooWfQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-07 17:12                               ` Jason Gunthorpe
       [not found]                                 ` <20151207171228.GA26969-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-07 17:12 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 07, 2015 at 08:37:41AM +0200, Moni Shoua wrote:
> On Mon, Dec 7, 2015 at 8:34 AM, Jason Gunthorpe
> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> > On Mon, Dec 07, 2015 at 08:15:40AM +0200, Moni Shoua wrote:
> >
> >> What you have though is the sgid taken from the GRH that is scattered
> >> to the first 40/20 bytes of the receive WQE.  This is not enough to
> >> determine the network type.
> >
> > It is enough to discover the sgid index which will tell you the type.
> but how? all you have in hand is the sgid which can appear several
> times in the GID table in different indices.

Eh? Reliably recovering the gid index is not optional. The network
namespace stuff that is already in the kernel hard requires that
ability. (This is the same argument we went around on already why pkey
had to come from the wc, not from the payload)

If your position is it cannot be done from a WC,QP as is, then
gid_index needs to be added to the wc, or something else to remove the
ambiguity.

In either case, network_type is absolutely the wrong thing to have in
the wc.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                 ` <20151207171228.GA26969-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-07 18:34                                   ` Moni Shoua
       [not found]                                     ` <CAG9sBKNXnzp7PgJhfcjbYvje5qgZWYqygJYkbs4WyTZtDc4ckg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-07 18:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

Well, just tell me how you want to discover gid_index when you poll
the WC out of the CQ.
Like I said, the sgid itself is in the GRH that is scattered to the
buffers in the receive queue. When ib_poll_cq() is called the pointer
to GRH is not passed so there is no way to determine the gid_index
inside the polling context.
BTW, why do you argue about this only now? This is not RoCE specific
issue. sgid_index was always resolved outside the polling context. The
only difference between then and now is that with InfiniBand there was
no conflict about sgid_index once you had the sgid. Now, when same gid
value can be present in multiple entries you need an extra parameter
to get distinguish between them - that would be the netwrok_type


On Mon, Dec 7, 2015 at 7:12 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Dec 07, 2015 at 08:37:41AM +0200, Moni Shoua wrote:
>> On Mon, Dec 7, 2015 at 8:34 AM, Jason Gunthorpe
>> <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
>> > On Mon, Dec 07, 2015 at 08:15:40AM +0200, Moni Shoua wrote:
>> >
>> >> What you have though is the sgid taken from the GRH that is scattered
>> >> to the first 40/20 bytes of the receive WQE.  This is not enough to
>> >> determine the network type.
>> >
>> > It is enough to discover the sgid index which will tell you the type.
>> but how? all you have in hand is the sgid which can appear several
>> times in the GID table in different indices.
>
> Eh? Reliably recovering the gid index is not optional. The network
> namespace stuff that is already in the kernel hard requires that
> ability. (This is the same argument we went around on already why pkey
> had to come from the wc, not from the payload)
>
> If your position is it cannot be done from a WC,QP as is, then
> gid_index needs to be added to the wc, or something else to remove the
> ambiguity.
>
> In either case, network_type is absolutely the wrong thing to have in
> the wc.
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                     ` <CAG9sBKNXnzp7PgJhfcjbYvje5qgZWYqygJYkbs4WyTZtDc4ckg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-07 18:48                                       ` Jason Gunthorpe
       [not found]                                         ` <20151207184832.GA21402-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-07 18:48 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 07, 2015 at 08:34:43PM +0200, Moni Shoua wrote:
> Well, just tell me how you want to discover gid_index when you poll
> the WC out of the CQ.

Hey, I'm not desiging this rocev2 stuff, this is something the rocev2
community needs to sort out.

> Like I said, the sgid itself is in the GRH that is scattered to the
> buffers in the receive queue. When ib_poll_cq() is called the pointer
> to GRH is not passed so there is no way to determine the gid_index
> inside the polling context.

Then have an API to let the consumer recover it.

> BTW, why do you argue about this only now? This is not RoCE specific
> issue. sgid_index was always resolved outside the polling context. The
> only difference between then and now is that with InfiniBand there was
> no conflict about sgid_index once you had the sgid.

I think you answered your own question. In IB and rocev1 the sgid
index is not ambiguous, rocev2 breaks that requirement without
adaquately fixing it.

It is probably also the case that rocev1 has similar problems,
depending on how vland and macvlan ended up being implemented.

> Now, when same gid value can be present in multiple entries you need
> an extra parameter to get distinguish between them - that would be
> the netwrok_type

Eh?  You are only worried about duplicates between rocev1 mode which
uses the GUID and v2 mode which uses the host's IP address?

What about duplicates entirely within v2 mode where vlan and macvlan
can create duplicate host IP addresses.

You can also just not create duplicate entries in the first place. For
instance there probably isn't a really a good reason to use rocev2 for
IPv6 link local addreses, that trivially eliminates the v1/v2
collision.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                         ` <20151207184832.GA21402-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-08  7:23                                           ` Moni Shoua
       [not found]                                             ` <CAG9sBKNWMQA3XDGpDc_+2QoJcnnNL9o67p0JNoVqrU0u9UQCrA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-12-09  9:41                                           ` Moni Shoua
  1 sibling, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-08  7:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 7, 2015 at 8:48 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Dec 07, 2015 at 08:34:43PM +0200, Moni Shoua wrote:
>> Well, just tell me how you want to discover gid_index when you poll
>> the WC out of the CQ.
>
> Hey, I'm not desiging this rocev2 stuff, this is something the rocev2
> community needs to sort out.
>
>> Like I said, the sgid itself is in the GRH that is scattered to the
>> buffers in the receive queue. When ib_poll_cq() is called the pointer
>> to GRH is not passed so there is no way to determine the gid_index
>> inside the polling context.
>
> Then have an API to let the consumer recover it.
>
>> BTW, why do you argue about this only now? This is not RoCE specific
>> issue. sgid_index was always resolved outside the polling context. The
>> only difference between then and now is that with InfiniBand there was
>> no conflict about sgid_index once you had the sgid.
>
> I think you answered your own question. In IB and rocev1 the sgid
> index is not ambiguous, rocev2 breaks that requirement without
> adaquately fixing it.
>
> It is probably also the case that rocev1 has similar problems,
> depending on how vland and macvlan ended up being implemented.
>
>> Now, when same gid value can be present in multiple entries you need
>> an extra parameter to get distinguish between them - that would be
>> the netwrok_type
>
> Eh?  You are only worried about duplicates between rocev1 mode which
> uses the GUID and v2 mode which uses the host's IP address?
>
> What about duplicates entirely within v2 mode where vlan and macvlan
> can create duplicate host IP addresses.
In that case these entries will be distinguished  byt its eth devices
(eth device is part of the gid attributes structure)

>
> You can also just not create duplicate entries in the first place. For
> instance there probably isn't a really a good reason to use rocev2 for
> IPv6 link local addreses, that trivially eliminates the v1/v2
> collision.
First, spec says that I must.
Second, even if you have an example that makes it unnecessary for some
it is still necessary for others and therefore the netwotk_type in WC
is still needed
Third, regarding the IPv6 link local example,  IPv6 link local packet
is almost identical to RoCEv1 but not entirely. The former is an IP
protocol packet while the later isn't. Switches and network admins may
care about this.

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                             ` <CAG9sBKNWMQA3XDGpDc_+2QoJcnnNL9o67p0JNoVqrU0u9UQCrA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-08 22:52                                               ` Jason Gunthorpe
       [not found]                                                 ` <20151208225251.GA27609-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-08 22:52 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Tue, Dec 08, 2015 at 09:23:18AM +0200, Moni Shoua wrote:
> >> Now, when same gid value can be present in multiple entries you need
> >> an extra parameter to get distinguish between them - that would be
> >> the netwrok_type
> >
> > Eh?  You are only worried about duplicates between rocev1 mode which
> > uses the GUID and v2 mode which uses the host's IP address?
> >
> > What about duplicates entirely within v2 mode where vlan and macvlan
> > can create duplicate host IP addresses.

> In that case these entries will be distinguished  byt its eth devices
> (eth device is part of the gid attributes structure)

Eh? I think you've missed the point, there is no net device when
looking at a wc.

Look, here is a concrete direction:

Replace all the crap in
ib_init_ah_from_wc/get_sgid_index_from_eth/rdma_addr_find_dmac_by_grh

with a straightforward

   rdma_dgid_index_from_wc(
                        const struct ib_qp *qp,
                        const struct ib_wc *wc,
			const struct ib_grh *grh,
			u16 *gid_index)

Sort of function that reads the GRH and wc and returns the unambiguous
gid index that was used to receive that packet on the UD QP.

The gid cache is not allowed to create an ambiguity the driver cannot
resolve.

That said, I wouldn't object to vendor-specific bits in the wc. Ie if
mlx hardware needs a network_type bit to implement
rdma_find_dgid_index_from_wc, then fine - define a vendor specific
place to put it. In this case rdma_find_dgid_index_from_wc would be a
driver call back, which is fine, and what Caitlin was talking about.

But, it is not part of our verbs API, and I'd *strongly* encourage
other vendors and future hardware to simply return the gid index that
the hardware matched instead of requiring the software to try and
guess after the fact.

This is the same issue/argument we went around and around on the
lladdr ipoib details when working on the namespace patches, about how
important it is to resolve the namespace from the hardware headers.

Of course once we have the gid index we now have the net device and
other information needed to make namespaces work.

.. and this is part of what I mean what I said the interface from the
gid cache code is not a sane API and needs to be changed.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                 ` <20151208225251.GA27609-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-09  9:34                                                   ` Moni Shoua
       [not found]                                                     ` <CAG9sBKM2s9BMv8priCHnFaMcSnuafn4vf+6+0Rooc6Gw0ZSaLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-12-09  9:38                                                   ` Liran Liss
  1 sibling, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-09  9:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

> Eh? I think you've missed the point, there is no net device when
> looking at a wc.
>
> Look, here is a concrete direction:
>
> Replace all the crap in
> ib_init_ah_from_wc/get_sgid_index_from_eth/rdma_addr_find_dmac_by_grh
>
> with a straightforward
>
>    rdma_dgid_index_from_wc(
>                         const struct ib_qp *qp,
>                         const struct ib_wc *wc,
>                         const struct ib_grh *grh,
>                         u16 *gid_index)
>
> Sort of function that reads the GRH and wc and returns the unambiguous
> gid index that was used to receive that packet on the UD QP.
>

I already answered this to but I'll do it again
RoCEv2 spec says that L3 header will be scattered to receive WQE in
the following way
IPv6 and RoCEv1 - 40 bytes of the L3 header (GRH or IPv6) to the first
40 bytes of the receive bufs
IPv4 - 20 bytes of the L3 header to the second half of the first 40
bytes of the receive bufs. The first 20 bytes remain undefined.

Now, if you think how you deduce network_type from GRH you'll see that
it requires tools like checksum validation and other validations and
you end up with a method that is not 100% error free. So,to eliminate
the need for heavy computation (with regards to the other option) and
be free from false deductions you have the option of getting
network_type from the hardware. So, if you do have hardware that
supports it why give it up?



> The gid cache is not allowed to create an ambiguity the driver cannot
> resolve.
>
> That said, I wouldn't object to vendor-specific bits in the wc. Ie if
> mlx hardware needs a network_type bit to implement
> rdma_find_dgid_index_from_wc, then fine - define a vendor specific
> place to put it. In this case rdma_find_dgid_index_from_wc would be a
> driver call back, which is fine, and what Caitlin was talking about.
>

This is not a Mellanox specific flag. See a quote from the spec

A17.4.5.1 UD COMPLETION QUEUE ENTRIES (CQES)
For UD, the Completion Queue Entry (CQE) includes remote address
information (InfiniBand Specification Vol. 1 Rev 1.2.1 Section
11.4.2.1). For RoCEv2, the remote address information comprises the
source L2 Address and a flag that indicates if the received frame is
an IPv4, IPv6 or RoCE packet.



> But, it is not part of our verbs API, and I'd *strongly* encourage
> other vendors and future hardware to simply return the gid index that
> the hardware matched instead of requiring the software to try and
> guess after the fact.

Could be problematic for virtual machine architectures that give a
portion of the entire GID table to a VM that index it 0..N
>
> This is the same issue/argument we went around and around on the
> lladdr ipoib details when working on the namespace patches, about how
> important it is to resolve the namespace from the hardware headers.
>
> Of course once we have the gid index we now have the net device and
> other information needed to make namespaces work.
>
> .. and this is part of what I mean what I said the interface from the
> gid cache code is not a sane API and needs to be changed.
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                 ` <20151208225251.GA27609-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-12-09  9:34                                                   ` Moni Shoua
@ 2015-12-09  9:38                                                   ` Liran Liss
       [not found]                                                     ` <HE1PR05MB1418B1F0F393AD0D92424043B1E80-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Liran Liss @ 2015-12-09  9:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Moni Shoua
  Cc: Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]

> Look, here is a concrete direction:
> 
> Replace all the crap in
> ib_init_ah_from_wc/get_sgid_index_from_eth/rdma_addr_find_dmac_by_
> grh
> 
> with a straightforward
> 
>    rdma_dgid_index_from_wc(
>                         const struct ib_qp *qp,
>                         const struct ib_wc *wc,
> 			const struct ib_grh *grh,
> 			u16 *gid_index)
> 
> Sort of function that reads the GRH and wc and returns the unambiguous gid
> index that was used to receive that packet on the UD QP.
> 
> The gid cache is not allowed to create an ambiguity the driver cannot resolve.
> 
> That said, I wouldn't object to vendor-specific bits in the wc. Ie if mlx
> hardware needs a network_type bit to implement
> rdma_find_dgid_index_from_wc, then fine - define a vendor specific place
> to put it. In this case rdma_find_dgid_index_from_wc would be a driver call
> back, which is fine, and what Caitlin was talking about.
> 
> But, it is not part of our verbs API, and I'd *strongly* encourage other
> vendors and future hardware to simply return the gid index that the
> hardware matched instead of requiring the software to try and guess after
> the fact.
> 

You are pushing abstraction into provider code instead of handling it in a generic way.

The Verbs are a low-level API, that should report exactly what was received from the wire.
In the RoCEv2 case, it should be the GID/IP addresses and the protocol type.
The addressing information is not intended to be used directly by applications; it is the raw bits that were accepted from the wire.
I don't want to replicate resolution code in every RoCE driver.

ib_init_ah_from_wc() and friends is exactly the place that you want to create an address handle based on completion and packet fields.
This is what applications will use (and only if they are interested in generating a response; otherwise this overhead can be skipped).

As I said earlier, we can make sure that protocol type information doesn't incur any additional overhead.
--Liran

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                         ` <20151207184832.GA21402-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-12-08  7:23                                           ` Moni Shoua
@ 2015-12-09  9:41                                           ` Moni Shoua
  1 sibling, 0 replies; 64+ messages in thread
From: Moni Shoua @ 2015-12-09  9:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Mon, Dec 7, 2015 at 8:48 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Mon, Dec 07, 2015 at 08:34:43PM +0200, Moni Shoua wrote:
>> Well, just tell me how you want to discover gid_index when you poll
>> the WC out of the CQ.
>
> Hey, I'm not desiging this rocev2 stuff, this is something the rocev2
> community needs to sort out.
>
>> Like I said, the sgid itself is in the GRH that is scattered to the
>> buffers in the receive queue. When ib_poll_cq() is called the pointer
>> to GRH is not passed so there is no way to determine the gid_index
>> inside the polling context.
>
> Then have an API to let the consumer recover it.
We do but it has it disadvantages. We'll use it when there it no other
choice (i.e. not supporting HW)
>
>> BTW, why do you argue about this only now? This is not RoCE specific
>> issue. sgid_index was always resolved outside the polling context. The
>> only difference between then and now is that with InfiniBand there was
>> no conflict about sgid_index once you had the sgid.
>
> I think you answered your own question. In IB and rocev1 the sgid
> index is not ambiguous, rocev2 breaks that requirement without
> adaquately fixing it.

ambiguity is not the issue here. You say that sgid_index needs to be
known during poll so why raise it now?
We want to add information to the CQE to help resolve the conflict but
we leave the resolution of sgid index to the caller, like for RoCEv1
and like for native InfiniBand
>
> It is probably also the case that rocev1 has similar problems,
> depending on how vland and macvlan ended up being implemented.
>
>> Now, when same gid value can be present in multiple entries you need
>> an extra parameter to get distinguish between them - that would be
>> the netwrok_type
>
> Eh?  You are only worried about duplicates between rocev1 mode which
> uses the GUID and v2 mode which uses the host's IP address?
>
> What about duplicates entirely within v2 mode where vlan and macvlan
> can create duplicate host IP addresses.
>
> You can also just not create duplicate entries in the first place. For
> instance there probably isn't a really a good reason to use rocev2 for
> IPv6 link local addreses, that trivially eliminates the v1/v2
> collision.
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                     ` <CAG9sBKM2s9BMv8priCHnFaMcSnuafn4vf+6+0Rooc6Gw0ZSaLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-09 18:01                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20151209180129.GD31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-09 18:01 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Wed, Dec 09, 2015 at 11:34:10AM +0200, Moni Shoua wrote:
> > Eh? I think you've missed the point, there is no net device when
> > looking at a wc.
> >
> > Look, here is a concrete direction:
> >
> > Replace all the crap in
> > ib_init_ah_from_wc/get_sgid_index_from_eth/rdma_addr_find_dmac_by_grh
> >
> > with a straightforward
> >
> >    rdma_dgid_index_from_wc(
> >                         const struct ib_qp *qp,
> >                         const struct ib_wc *wc,
> >                         const struct ib_grh *grh,
> >                         u16 *gid_index)
> >
> > Sort of function that reads the GRH and wc and returns the unambiguous
> > gid index that was used to receive that packet on the UD QP.
> >
> 
> I already answered this to but I'll do it again
> RoCEv2 spec says that L3 header will be scattered to receive WQE in
> the following way
> IPv6 and RoCEv1 - 40 bytes of the L3 header (GRH or IPv6) to the first
> 40 bytes of the receive bufs
> IPv4 - 20 bytes of the L3 header to the second half of the first 40
> bytes of the receive bufs. The first 20 bytes remain undefined.
> 
> Now, if you think how you deduce network_type from GRH you'll see that
> it requires tools like checksum validation and other validations and
> you end up with a method that is not 100% error free. So,to eliminate
> the need for heavy computation (with regards to the other option) and
> be free from false deductions you have the option of getting
> network_type from the hardware. So, if you do have hardware that
> supports it why give it up?

I understand how it works.

>From an API perspective, I want to see something that can be used
*correctly* and that means more along the lines of
rdma_dgid_index_from_wc and not what is proposed in this series.

That means not exposing the callers to this awful mess.

> > That said, I wouldn't object to vendor-specific bits in the wc. Ie if
> > mlx hardware needs a network_type bit to implement
> > rdma_find_dgid_index_from_wc, then fine - define a vendor specific
> > place to put it. In this case rdma_find_dgid_index_from_wc would be a
> > driver call back, which is fine, and what Caitlin was talking about.
> 
> This is not a Mellanox specific flag. See a quote from the spec
>
> A17.4.5.1 UD COMPLETION QUEUE ENTRIES (CQES)
> For UD, the Completion Queue Entry (CQE) includes remote address
> information (InfiniBand Specification Vol. 1 Rev 1.2.1 Section
> 11.4.2.1). For RoCEv2, the remote address information comprises the
> source L2 Address and a flag that indicates if the received frame is
> an IPv4, IPv6 or RoCE packet.

rdma_dgid_index_from_wc satisfies the above spec requirements without
creating a horrible API, or requiring all vendors to implement a
network type flag.

> > But, it is not part of our verbs API, and I'd *strongly* encourage
> > other vendors and future hardware to simply return the gid index that
> > the hardware matched instead of requiring the software to try and
> > guess after the fact.
> 
> Could be problematic for virtual machine architectures that give a
> portion of the entire GID table to a VM that index it 0..N

I doubt it.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                     ` <HE1PR05MB1418B1F0F393AD0D92424043B1E80-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-12-09 18:09                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20151209180920.GE31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-09 18:09 UTC (permalink / raw)
  To: Liran Liss
  Cc: Moni Shoua, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Wed, Dec 09, 2015 at 09:38:21AM +0000, Liran Liss wrote:

> > But, it is not part of our verbs API, and I'd *strongly* encourage other
> > vendors and future hardware to simply return the gid index that the
> > hardware matched instead of requiring the software to try and guess after
> > the fact.
 
> You are pushing abstraction into provider code instead of handling it in a generic way.

No, I am defining an API that *make sense* and doesn't leak useless
details. Of course that doesn't force code duplication or anyhting
like that, just implement it smartly.

I think mlx made a big mistake returning network_type instead of gid
index, and I don't want to see that error enshrined in our API.

> The Verbs are a low-level API, that should report exactly what was
> received from the wire.  In the RoCEv2 case, it should be the GID/IP
> addresses and the protocol type.  The addressing information is not
> intended to be used directly by applications; it is the raw bits
> that were accepted from the wire.

Low level details isn't what any in kernel consumer needs. Everything
in kernel needs the gid index to determine the namespace, routing and
other details. It is not optional. A common API is thus needed to do
this conversion.

API-wise, once you get the gid index then it is trivial to make easy
extractors for everything else. ie for example:
rdma_get_ud_src_sockaddr(gid_index,&addr,wc,grh)
rdma_get_ud_dst_sockaddr(gid_index,&addr,wc,grh)

> ib_init_ah_from_wc() and friends is exactly the place that you want
> to create an address handle based on completion and packet fields.

CMA needs exactly the same logic as well, the fact it doesn't have it
is a bug in this series.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                         ` <20151209180920.GE31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-10  7:53                                                           ` Moni Shoua
  2015-12-13 13:56                                                           ` Liran Liss
  1 sibling, 0 replies; 64+ messages in thread
From: Moni Shoua @ 2015-12-10  7:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

> No, I am defining an API that *make sense* and doesn't leak useless
> details. Of course that doesn't force code duplication or anyhting
> like that, just implement it smartly.
>
> I think mlx made a big mistake returning network_type instead of gid
> index, and I don't want to see that error enshrined in our API.
>
returning gid_index is wrong because it forces CQ pollers to be aware
of the entire table. Like I already mentioned, the GID table is a HW
resource that can be divided and handed to multiple VMs,

>> The Verbs are a low-level API, that should report exactly what was
>> received from the wire.  In the RoCEv2 case, it should be the GID/IP
>> addresses and the protocol type.  The addressing information is not
>> intended to be used directly by applications; it is the raw bits
>> that were accepted from the wire.
>
> Low level details isn't what any in kernel consumer needs. Everything
> in kernel needs the gid index to determine the namespace, routing and
> other details. It is not optional. A common API is thus needed to do
> this conversion.
>
> API-wise, once you get the gid index then it is trivial to make easy
> extractors for everything else. ie for example:
> rdma_get_ud_src_sockaddr(gid_index,&addr,wc,grh)
> rdma_get_ud_dst_sockaddr(gid_index,&addr,wc,grh)
>
>> ib_init_ah_from_wc() and friends is exactly the place that you want
>> to create an address handle based on completion and packet fields.
>
> CMA needs exactly the same logic as well, the fact it doesn't have it
> is a bug in this series.
>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                         ` <20151209180129.GD31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-10  7:58                                                           ` Moni Shoua
       [not found]                                                             ` <CAG9sBKPNWsO7ugdyhd5sYp_yOSrUkG0b-Lfq6Eenrydg3MzyvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-10  7:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

>>
>> A17.4.5.1 UD COMPLETION QUEUE ENTRIES (CQES)
>> For UD, the Completion Queue Entry (CQE) includes remote address
>> information (InfiniBand Specification Vol. 1 Rev 1.2.1 Section
>> 11.4.2.1). For RoCEv2, the remote address information comprises the
>> source L2 Address and a flag that indicates if the received frame is
>> an IPv4, IPv6 or RoCE packet.
>
> rdma_dgid_index_from_wc satisfies the above spec requirements without
How? Please explain.

> creating a horrible API, or requiring all vendors to implement a
> network type flag.
>
Actually you haven't suggested anything yet besides a name to the function.
I already said that calculating gid_index from wc and grh requires
extra CPU cycles and is prone to mistakes. But, I might be wrong and
you have a better idea. Do you?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
       [not found]         ` <56658CB4.7090402-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-12-10 17:36           ` Matan Barak
  0 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-10 17:36 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Matan Barak, Doug Ledford, linux-rdma, Eran Ben Elisha,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

On Mon, Dec 7, 2015 at 3:42 PM, Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> On 12/03/2015 03:47 PM, Matan Barak wrote:
>> +static int addr_resolve_neigh(struct dst_entry *dst,
>> +                           const struct sockaddr *dst_in,
>> +                           struct rdma_dev_addr *addr)
>> +{
>> +     if (dst->dev->flags & IFF_LOOPBACK) {
>> +             int ret;
>> +
>> +             ret = rdma_translate_ip(dst_in, addr, NULL);
>> +             if (!ret)
>> +                     memcpy(addr->dst_dev_addr, addr->src_dev_addr,
>> +                            MAX_ADDR_LEN);
>> +
>> +             return ret;
>> +     }
>> +
>> +     /* If the device does ARP internally */
> You mean "doesn't do ARP internally" right?
>

Correct, nice catch :)

>> +     if (!(dst->dev->flags & IFF_NOARP)) {
>> +             const struct sockaddr_in *dst_in4 =
>> +                     (const struct sockaddr_in *)dst_in;
>> +             const struct sockaddr_in6 *dst_in6 =
>> +                     (const struct sockaddr_in6 *)dst_in;
>> +
>> +             return dst_fetch_ha(dst, addr,
>> +                                 dst_in->sa_family == AF_INET ?
>> +                                 (const void *)&dst_in4->sin_addr.s_addr :
>> +                                 (const void *)&dst_in6->sin6_addr);
>> +     }
>> +
>> +     return rdma_copy_addr(addr, dst->dev, NULL);
>> +}
>
>> +int rdma_resolve_ip_route(struct sockaddr *src_addr,
>> +                       const struct sockaddr *dst_addr,
>> +                       struct rdma_dev_addr *addr)
>> +{
>> +     struct sockaddr_storage ssrc_addr;
>> +     struct sockaddr *src_in = (struct sockaddr *)&ssrc_addr;
>> +
>> +     if (src_addr->sa_family != dst_addr->sa_family)
>> +             return -EINVAL;
>> +
>> +     if (src_addr)
>> +             memcpy(src_in, src_addr, rdma_addr_size(src_addr));
>> +     else
>> +             src_in->sa_family = dst_addr->sa_family;
> Don't you need to clear the rest of src_in? I believe you pass
> uninitialized memory to it.
>

Correct, I'll fix that.

>> +
>> +     return addr_resolve(src_in, dst_addr, addr, false);
>> +}
>> +EXPORT_SYMBOL(rdma_resolve_ip_route);
>
> Haggai

Thanks for taking a look.

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                             ` <CAG9sBKPNWsO7ugdyhd5sYp_yOSrUkG0b-Lfq6Eenrydg3MzyvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-10 17:38                                                               ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-10 17:38 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Liran Liss, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

On Thu, Dec 10, 2015 at 09:58:51AM +0200, Moni Shoua wrote:

> > creating a horrible API, or requiring all vendors to implement a
> > network type flag.
> >
> Actually you haven't suggested anything yet besides a name to the function.
> I already said that calculating gid_index from wc and grh requires
> extra CPU cycles and is prone to mistakes. But, I might be wrong and
> you have a better idea. Do you?

I told you, if a driver requires vendor specific information then
include it as private data in the grh.

I keep saying this: computing the gid index is not optional. If a
drive can't do it efficiently then too bad, it has to go the long way
around.

>> I think mlx made a big mistake returning network_type instead of gid
>> index, and I don't want to see that error enshrined in our API.
>>
> returning gid_index is wrong because it forces CQ pollers to be aware
> of the entire table. Like I already mentioned, the GID table is a HW
> resource that can be divided and handed to multiple VMs,

Please. If HW virt stuff can't sort this out it is broken, there are
many trivial solutions.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                         ` <20151209180920.GE31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-12-10  7:53                                                           ` Moni Shoua
@ 2015-12-13 13:56                                                           ` Liran Liss
       [not found]                                                             ` <HE1PR05MB141868A873246D76F580401FB1EC0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Liran Liss @ 2015-12-13 13:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Moni Shoua, Christoph Hellwig, Matan Barak, Doug Ledford,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Somnath Kotur

> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-

> 
> > You are pushing abstraction into provider code instead of handling it in a
> generic way.
> 
> No, I am defining an API that *make sense* and doesn't leak useless details.
> Of course that doesn't force code duplication or anyhting like that, just
> implement it smartly.
> 
> I think mlx made a big mistake returning network_type instead of gid index,
> and I don't want to see that error enshrined in our API.
> 
> > The Verbs are a low-level API, that should report exactly what was
> > received from the wire.  In the RoCEv2 case, it should be the GID/IP
> > addresses and the protocol type.  The addressing information is not
> > intended to be used directly by applications; it is the raw bits that
> > were accepted from the wire.
> 
> Low level details isn't what any in kernel consumer needs. Everything in
> kernel needs the gid index to determine the namespace, routing and other
> details. It is not optional. A common API is thus needed to do this conversion.

The Verbs are not intended only for kernel consumers, but also for the ib_core, cma, etc.
For the ib_core, a provider needs to report *all* relevant information that is not visible in the packet payload.
The network type is part of this information.
The proposed changes are a straightforward extension to the code base, directly follow the specification, and adhere to the RDMA stack design in which IP addressing is handled by the cma.

Also, I don't want to do any route resolution on the Rx path.
A UD QP completion just reports the details of the packet it received.

Conceptually, an incoming packet may not even match an SGID index at all.
Maybe, responses should be sent from another device. This should not be decided that the point that a packet was received.

> 
> API-wise, once you get the gid index then it is trivial to make easy extractors
> for everything else. ie for example:
> rdma_get_ud_src_sockaddr(gid_index,&addr,wc,grh)
> rdma_get_ud_dst_sockaddr(gid_index,&addr,wc,grh)
> 

Nice ideas, but not relevant to completions.
The resolved dst address could point to another port or device completely.
The proper way to handle remote UD addresses is by rdma_ids that encapsulate address handles and bound devices.

> > ib_init_ah_from_wc() and friends is exactly the place that you want to
> > create an address handle based on completion and packet fields.
> 
> CMA needs exactly the same logic as well, the fact it doesn't have it is a bug
> in this series.
> 

init_ah_from_wc() should include a route lookup, and this will be fixed.

> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc
       [not found]                                                             ` <HE1PR05MB141868A873246D76F580401FB1EC0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-12-13 14:03                                                               ` Matan Barak
  0 siblings, 0 replies; 64+ messages in thread
From: Matan Barak @ 2015-12-13 14:03 UTC (permalink / raw)
  To: Liran Liss
  Cc: Jason Gunthorpe, Moni Shoua, Christoph Hellwig, Matan Barak,
	Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

On Sun, Dec 13, 2015 at 3:56 PM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>
>>
>> > You are pushing abstraction into provider code instead of handling it in a
>> generic way.
>>
>> No, I am defining an API that *make sense* and doesn't leak useless details.
>> Of course that doesn't force code duplication or anyhting like that, just
>> implement it smartly.
>>
>> I think mlx made a big mistake returning network_type instead of gid index,
>> and I don't want to see that error enshrined in our API.
>>
>> > The Verbs are a low-level API, that should report exactly what was
>> > received from the wire.  In the RoCEv2 case, it should be the GID/IP
>> > addresses and the protocol type.  The addressing information is not
>> > intended to be used directly by applications; it is the raw bits that
>> > were accepted from the wire.
>>
>> Low level details isn't what any in kernel consumer needs. Everything in
>> kernel needs the gid index to determine the namespace, routing and other
>> details. It is not optional. A common API is thus needed to do this conversion.
>
> The Verbs are not intended only for kernel consumers, but also for the ib_core, cma, etc.
> For the ib_core, a provider needs to report *all* relevant information that is not visible in the packet payload.
> The network type is part of this information.
> The proposed changes are a straightforward extension to the code base, directly follow the specification, and adhere to the RDMA stack design in which IP addressing is handled by the cma.
>
> Also, I don't want to do any route resolution on the Rx path.
> A UD QP completion just reports the details of the packet it received.
>
> Conceptually, an incoming packet may not even match an SGID index at all.
> Maybe, responses should be sent from another device. This should not be decided that the point that a packet was received.
>
>>
>> API-wise, once you get the gid index then it is trivial to make easy extractors
>> for everything else. ie for example:
>> rdma_get_ud_src_sockaddr(gid_index,&addr,wc,grh)
>> rdma_get_ud_dst_sockaddr(gid_index,&addr,wc,grh)
>>
>
> Nice ideas, but not relevant to completions.
> The resolved dst address could point to another port or device completely.
> The proper way to handle remote UD addresses is by rdma_ids that encapsulate address handles and bound devices.
>
>> > ib_init_ah_from_wc() and friends is exactly the place that you want to
>> > create an address handle based on completion and packet fields.
>>
>> CMA needs exactly the same logic as well, the fact it doesn't have it is a bug
>> in this series.
>>
>
> init_ah_from_wc() should include a route lookup, and this will be fixed.
>

This was already fixed in v2.

>> Jason
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
>> body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (22 preceding siblings ...)
  2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
@ 2015-12-15  7:15   ` Moni Shoua
       [not found]     ` <CAG9sBKM7j6w+Gk4PXXO+0mHpFeVaBdeNmr-z9V9fPR562o+kmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  23 siblings, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-15  7:15 UTC (permalink / raw)
  To: Matan Barak
  Cc: Doug Ledford, linux-rdma, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Hi Doug,
>
> This series adds the support for RoCE v2. In order to support RoCE v2,
> we add gid_type attribute to every GID. When the RoCE GID management
> populates the GID table, it duplicates each GID with all supported types.
> This gives the user the ability to communicate over each supported
> type.
>
> Patch 0001, 0002 and 0003 add support for multiple GID types to the
> cache and related APIs. The third patch exposes the GID attributes
> information is sysfs.
>
> Patch 0004 adds the RoCE v2 GID type and the capabilities required
> from the vendor in order to implement RoCE v2. These capabilities
> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>
> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
> information should come from the vendor's driver. In case the vendor
> doesn't supply this information, we parse the packet headers and resolve
> its network type. Patch 0005 adds this information and required utilities.
>
> Patches 0006 and 0007 adds route validation. This is mandatory to ensure
> that we send packets using GIDS which corresponds to a net-device that
> can be routed to the destination.
>
> Patches 0008 and 0009 add configfs support (and the required
> infrastructure) for CMA. The administrator should be able to set the
> default RoCE type. This is done through a new per-port
> default_roce_mode configfs file.
>
> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
> packets. This is required for vendors which implement their
> QP1 as a Raw QP.
>
> Patch 0011 adds support for IPv4 multicast as an IPv4 network
> requires IGMP to be sent in order to join multicast groups.
>
> Vendors code aren't part of this patch-set. Soft-Roce will be
> sent soon and depends on these patches. Other vendors, like
> mlx4, ocrdma and mlx5 will follow.
>
> This patch is applied on "Change per-entry locks in GID cache to table lock"
> which was sent to the mailing list.
>
> Thanks,
> Matan
>
> Changed from V1:
>  - Rebased against Linux 4.4-rc2 master branch.
>  - Add route validation
>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>  - Add documentation for configfs and sysfs ABI
>  - Remove ifindex and gid_type from mcmember
>
> Changes from V0:
>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>
> Matan Barak (8):
>   IB/core: Add gid_type to gid attribute
>   IB/cm: Use the source GID index type
>   IB/core: Add gid attributes to sysfs
>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>   IB/core: Move rdma_is_upper_dev_rcu to header file
>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>   IB/rdma_cm: Add wrapper for cma reference count
>   IB/cma: Add configfs for rdma_cm
>
> Moni Shoua (2):
>   IB/core: Initialize UD header structure with IP and UDP headers
>   IB/cma: Join and leave multicast groups with IGMP
>
> Somnath Kotur (1):
>   IB/core: Add rdma_network_type to wc
>
>  Documentation/ABI/testing/configfs-rdma_cm       |  22 ++
>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>  drivers/infiniband/Kconfig                       |   9 +
>  drivers/infiniband/core/Makefile                 |   2 +
>  drivers/infiniband/core/addr.c                   | 185 +++++++++----
>  drivers/infiniband/core/cache.c                  | 169 ++++++++----
>  drivers/infiniband/core/cm.c                     |  31 ++-
>  drivers/infiniband/core/cma.c                    | 261 ++++++++++++++++--
>  drivers/infiniband/core/cma_configfs.c           | 321 +++++++++++++++++++++++
>  drivers/infiniband/core/core_priv.h              |  45 ++++
>  drivers/infiniband/core/device.c                 |  10 +-
>  drivers/infiniband/core/multicast.c              |  17 +-
>  drivers/infiniband/core/roce_gid_mgmt.c          |  81 ++++--
>  drivers/infiniband/core/sa_query.c               |  76 +++++-
>  drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++-
>  drivers/infiniband/core/ud_header.c              | 155 ++++++++++-
>  drivers/infiniband/core/uverbs_marshall.c        |   1 +
>  drivers/infiniband/core/verbs.c                  | 170 ++++++++++--
>  drivers/infiniband/hw/mlx4/qp.c                  |   7 +-
>  drivers/infiniband/hw/mthca/mthca_qp.c           |   2 +-
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c         |   2 +-
>  include/rdma/ib_addr.h                           |  11 +-
>  include/rdma/ib_cache.h                          |   4 +
>  include/rdma/ib_pack.h                           |  45 +++-
>  include/rdma/ib_sa.h                             |   3 +
>  include/rdma/ib_verbs.h                          |  78 +++++-
>  26 files changed, 1704 insertions(+), 203 deletions(-)
>  create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
>  create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
>  create mode 100644 drivers/infiniband/core/cma_configfs.c
Doug
What's your stand here?
Besides the disagreement on patch #5 in the series there seems to be
no major comments
And regarding patch #5 I still claim that the implementation fits the
spec and therefore is how it should be.
How can we move on?

thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]     ` <CAG9sBKM7j6w+Gk4PXXO+0mHpFeVaBdeNmr-z9V9fPR562o+kmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-15 21:45       ` Doug Ledford
       [not found]         ` <567089F1.30004-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-12-15 21:45 UTC (permalink / raw)
  To: Moni Shoua, Matan Barak
  Cc: linux-rdma, Eran Ben Elisha, Haggai Eran, Or Gerlitz,
	Jason Gunthorpe, Somnath Kotur

[-- Attachment #1: Type: text/plain, Size: 7177 bytes --]

On 12/15/2015 02:15 AM, Moni Shoua wrote:
> On Thu, Dec 3, 2015 at 3:47 PM, Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> Hi Doug,
>>
>> This series adds the support for RoCE v2. In order to support RoCE v2,
>> we add gid_type attribute to every GID. When the RoCE GID management
>> populates the GID table, it duplicates each GID with all supported types.
>> This gives the user the ability to communicate over each supported
>> type.
>>
>> Patch 0001, 0002 and 0003 add support for multiple GID types to the
>> cache and related APIs. The third patch exposes the GID attributes
>> information is sysfs.
>>
>> Patch 0004 adds the RoCE v2 GID type and the capabilities required
>> from the vendor in order to implement RoCE v2. These capabilities
>> are grouped together as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP.
>>
>> RoCE v2 could work at IPv4 and IPv6 networks. When receiving ib_wc, this
>> information should come from the vendor's driver. In case the vendor
>> doesn't supply this information, we parse the packet headers and resolve
>> its network type. Patch 0005 adds this information and required utilities.
>>
>> Patches 0006 and 0007 adds route validation. This is mandatory to ensure
>> that we send packets using GIDS which corresponds to a net-device that
>> can be routed to the destination.
>>
>> Patches 0008 and 0009 add configfs support (and the required
>> infrastructure) for CMA. The administrator should be able to set the
>> default RoCE type. This is done through a new per-port
>> default_roce_mode configfs file.
>>
>> Patch 0010 formats a QP1 packet in order to support RoCE v2 CM
>> packets. This is required for vendors which implement their
>> QP1 as a Raw QP.
>>
>> Patch 0011 adds support for IPv4 multicast as an IPv4 network
>> requires IGMP to be sent in order to join multicast groups.
>>
>> Vendors code aren't part of this patch-set. Soft-Roce will be
>> sent soon and depends on these patches. Other vendors, like
>> mlx4, ocrdma and mlx5 will follow.
>>
>> This patch is applied on "Change per-entry locks in GID cache to table lock"
>> which was sent to the mailing list.
>>
>> Thanks,
>> Matan
>>
>> Changed from V1:
>>  - Rebased against Linux 4.4-rc2 master branch.
>>  - Add route validation
>>  - ConfigFS - avoid compiling INFINIBAND=y and CONFIGFS_FS=m
>>  - Add documentation for configfs and sysfs ABI
>>  - Remove ifindex and gid_type from mcmember
>>
>> Changes from V0:
>>  - Rebased patches against Doug's latest k.o/for-4.4 tree.
>>  - Fixed a bug in configfs (rmdir caused an incorrect free).
>>
>> Matan Barak (8):
>>   IB/core: Add gid_type to gid attribute
>>   IB/cm: Use the source GID index type
>>   IB/core: Add gid attributes to sysfs
>>   IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type
>>   IB/core: Move rdma_is_upper_dev_rcu to header file
>>   IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path
>>   IB/rdma_cm: Add wrapper for cma reference count
>>   IB/cma: Add configfs for rdma_cm
>>
>> Moni Shoua (2):
>>   IB/core: Initialize UD header structure with IP and UDP headers
>>   IB/cma: Join and leave multicast groups with IGMP
>>
>> Somnath Kotur (1):
>>   IB/core: Add rdma_network_type to wc
>>
>>  Documentation/ABI/testing/configfs-rdma_cm       |  22 ++
>>  Documentation/ABI/testing/sysfs-class-infiniband |  16 ++
>>  drivers/infiniband/Kconfig                       |   9 +
>>  drivers/infiniband/core/Makefile                 |   2 +
>>  drivers/infiniband/core/addr.c                   | 185 +++++++++----
>>  drivers/infiniband/core/cache.c                  | 169 ++++++++----
>>  drivers/infiniband/core/cm.c                     |  31 ++-
>>  drivers/infiniband/core/cma.c                    | 261 ++++++++++++++++--
>>  drivers/infiniband/core/cma_configfs.c           | 321 +++++++++++++++++++++++
>>  drivers/infiniband/core/core_priv.h              |  45 ++++
>>  drivers/infiniband/core/device.c                 |  10 +-
>>  drivers/infiniband/core/multicast.c              |  17 +-
>>  drivers/infiniband/core/roce_gid_mgmt.c          |  81 ++++--
>>  drivers/infiniband/core/sa_query.c               |  76 +++++-
>>  drivers/infiniband/core/sysfs.c                  | 184 ++++++++++++-
>>  drivers/infiniband/core/ud_header.c              | 155 ++++++++++-
>>  drivers/infiniband/core/uverbs_marshall.c        |   1 +
>>  drivers/infiniband/core/verbs.c                  | 170 ++++++++++--
>>  drivers/infiniband/hw/mlx4/qp.c                  |   7 +-
>>  drivers/infiniband/hw/mthca/mthca_qp.c           |   2 +-
>>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c         |   2 +-
>>  include/rdma/ib_addr.h                           |  11 +-
>>  include/rdma/ib_cache.h                          |   4 +
>>  include/rdma/ib_pack.h                           |  45 +++-
>>  include/rdma/ib_sa.h                             |   3 +
>>  include/rdma/ib_verbs.h                          |  78 +++++-
>>  26 files changed, 1704 insertions(+), 203 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/configfs-rdma_cm
>>  create mode 100644 Documentation/ABI/testing/sysfs-class-infiniband
>>  create mode 100644 drivers/infiniband/core/cma_configfs.c
> Doug
> What's your stand here?
> Besides the disagreement on patch #5 in the series there seems to be
> no major comments
> And regarding patch #5 I still claim that the implementation fits the
> spec and therefore is how it should be.
> How can we move on?

I had hoped that you and Jason would converge on an agreement.  Absent
that, I need time to do a detailed review of the code, and how this code
interacts with the existing namespace patches, to see who's right.

In particular, Liran piped up with this comment:

"Also, I don't want to do any route resolution on the Rx path. A UD QP
completion just reports the details of the packet it received.

Conceptually, an incoming packet may not even match an SGID index at
all.  Maybe, responses should be sent from another device. This should
not be decided that the point that a packet was received."

The part that bothers me about this is that this statement makes sense
when just thinking about the spec, as you say.  However, once you
consider namespaces, security implications make this statement spec
compliant, but still unacceptable.  The spec itself is silent on
namespaces.  But, you guys wanted, and you got, namespace support.
Since that's beyond spec, and carries security requirements, I think
it's fair to say that from now on, the Linux kernel RDMA stack can no
longer *just* be spec compliant.  There are additional concerns that
must always be addressed with new changes, and those are the namespace
constraint preservation concerns.

Since you and Jason did not reach a consensus, I have to dig in and see
if these patches make it possible to break namespace confinement, either
accidentally or with intentionally tricky behavior.  That's going to
take me some time.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]         ` <567089F1.30004-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-15 22:00           ` Jason Gunthorpe
       [not found]             ` <20151215220033.GB30404-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-12-16  6:56           ` Moni Shoua
  2015-12-16 14:18           ` Liran Liss
  2 siblings, 1 reply; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-15 22:00 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Moni Shoua, Matan Barak, linux-rdma, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

On Tue, Dec 15, 2015 at 04:45:21PM -0500, Doug Ledford wrote:

> In particular, Liran piped up with this comment:
> 
> "Also, I don't want to do any route resolution on the Rx path. A UD QP
> completion just reports the details of the packet it received.
> 
> Conceptually, an incoming packet may not even match an SGID index at
> all.  Maybe, responses should be sent from another device. This should
> not be decided that the point that a packet was received."
> 
> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.

This is where I got to and decided there is no desire to make a
correct patch for this stuff.

You are completely correct in your comments above.

> Since you and Jason did not reach a consensus, I have to dig in and see
> if these patches make it possible to break namespace confinement, either
> accidentally or with intentionally tricky behavior.  That's going to
> take me some time.

Everything to do with parsing a wc and constructing an AH is wrong in
this series, and the fixes require the API changes I mentioned ( add
'wc to gid index' API call, add 'route to AH' API call)

Every time you read 'route validation' - that is an error, the route
should never just be validated, it is needed information to construct
a rocev2 AH. All the places that roughly hand parse the rocev2 WC
should not be open coded.

Even if current HW is broken for namespaces we should not enshrine that
in the kapi.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]         ` <567089F1.30004-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-12-15 22:00           ` Jason Gunthorpe
@ 2015-12-16  6:56           ` Moni Shoua
       [not found]             ` <CAG9sBKNa4Hd0WVqhZCLeqXjc2zE2h+dBwF6zVgGT0R_Gt1FEwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-12-16 14:18           ` Liran Liss
  2 siblings, 1 reply; 64+ messages in thread
From: Moni Shoua @ 2015-12-16  6:56 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, linux-rdma, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

> The part that bothers me about this is that this statement makes sense
> when just thinking about the spec, as you say.  However, once you
> consider namespaces, security implications make this statement spec
> compliant, but still unacceptable.  The spec itself is silent on
> namespaces.  But, you guys wanted, and you got, namespace support.
> Since that's beyond spec, and carries security requirements, I think
> it's fair to say that from now on, the Linux kernel RDMA stack can no
> longer *just* be spec compliant.  There are additional concerns that
> must always be addressed with new changes, and those are the namespace
> constraint preservation concerns.

I can't object to that but I really would like to get an example of a
security risk.
So far, besides hearing that the way we choose to handle completions
is wrong, I didn't get a convincing example of how where it doesn't
work. Moreover, regarding security, all we wanted is for HW to report
the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that
with some extra CPU cycles can be obtained from the 40 bytes that are
scattered to the receive bufs anyway. So, if there is a security hole
it exists from day one of the IB stack and this is not the time we
should insist on fixing it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]             ` <20151215220033.GB30404-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-12-16  9:57               ` Liran Liss
       [not found]                 ` <HE1PR05MB14188FDA12D077BA2720C84BB1EF0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 64+ messages in thread
From: Liran Liss @ 2015-12-16  9:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Doug Ledford
  Cc: Moni Shoua, Matan Barak, linux-rdma, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-

> > Since you and Jason did not reach a consensus, I have to dig in and
> > see if these patches make it possible to break namespace confinement,
> > either accidentally or with intentionally tricky behavior.  That's
> > going to take me some time.
> 
> Everything to do with parsing a wc and constructing an AH is wrong in this
> series, and the fixes require the API changes I mentioned ( add 'wc to gid
> index' API call, add 'route to AH' API call)
> 
> Every time you read 'route validation' - that is an error, the route should
> never just be validated, it is needed information to construct a rocev2 AH. All
> the places that roughly hand parse the rocev2 WC should not be open coded.
> 
> Even if current HW is broken for namespaces we should not enshrine that in
> the kapi.
>

Currently, namespaces are not supported for RoCE.
So for this patches, this is irrelevant.
That said, we have everything we need for RoCE namespace support when we get there.

All of this has nothing to do with "broken" and enshrining anything in the kapi.
That's just bullshit.

The crux of the discussion is the meaning of the API.
The design of the RDMA stack is that Verbs are used by core IB services, such as addressing.
For these services, as the specification requires, all relevant fields must be reported in the CQE, period.
All spec-compliant HW devices follow this.

If a ULP wants to create an address handle from a completion, there are service routines to accomplish that, based on the reported fields.
If it doesn't care, there is no reason to sacrifice performance.

--Liran

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]         ` <567089F1.30004-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-12-15 22:00           ` Jason Gunthorpe
  2015-12-16  6:56           ` Moni Shoua
@ 2015-12-16 14:18           ` Liran Liss
  2 siblings, 0 replies; 64+ messages in thread
From: Liran Liss @ 2015-12-16 14:18 UTC (permalink / raw)
  To: Doug Ledford, Moni Shoua, Matan Barak
  Cc: linux-rdma, Eran Ben Elisha, Haggai Eran, Or Gerlitz,
	Jason Gunthorpe, Somnath Kotur

> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Doug Ledford

> In particular, Liran piped up with this comment:
> 
> "Also, I don't want to do any route resolution on the Rx path. A UD QP
> completion just reports the details of the packet it received.
> 
> Conceptually, an incoming packet may not even match an SGID index at all.
> Maybe, responses should be sent from another device. This should not be
> decided that the point that a packet was received."
> 
> The part that bothers me about this is that this statement makes sense when
> just thinking about the spec, as you say.  However, once you consider
> namespaces, security implications make this statement spec compliant, but
> still unacceptable.  The spec itself is silent on namespaces.  But, you guys
> wanted, and you got, namespace support.
> Since that's beyond spec, and carries security requirements, I think it's fair to
> say that from now on, the Linux kernel RDMA stack can no longer *just* be
> spec compliant.  There are additional concerns that must always be
> addressed with new changes, and those are the namespace constraint
> preservation concerns.
> 

Hi Doug,

Currently, there is no namespace support for RoCE, so the RoCEv2 patches have *nothing* to do with this.
That said, the RoCE specification does not contradict or inhibit any future implementation for namespaces.
The CMA will get the <VLAN, IP> from ib_wc and resolve to a netdev (or sgid_index->netdev, whatever) and process the request accordingly.

We can have endless theoretical discussions on features that are not even implemented yet (e.g., RoCE namespace support) each time we add a minor straightforward, *spec-compliant* change that *all* RoCE vendors adhere to.
If someone wishes to introduce a new concept, API refactoring proposal, or similar for community review, please do so with a different RFC.

This is hindering progress of the whole RDMA stack development!
For example, the posted SoftRoCE patches are waiting just for this.

The RoCEv2 patches have been posted upstream for review for months (!) now.
I simply cannot understand why this is lagging for so long; let's start to get the wheels rolling.
--Liran


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]                 ` <HE1PR05MB14188FDA12D077BA2720C84BB1EF0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-12-16 18:02                   ` Jason Gunthorpe
  0 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-16 18:02 UTC (permalink / raw)
  To: Liran Liss
  Cc: Doug Ledford, Moni Shoua, Matan Barak, linux-rdma,
	Eran Ben Elisha, Haggai Eran, Or Gerlitz, Somnath Kotur

On Wed, Dec 16, 2015 at 09:57:02AM +0000, Liran Liss wrote:
> Currently, namespaces are not supported for RoCE.

IMHO, we should not be accepting rocev2 without at least basic
namespace support too, since it is fairly trivial to do based on the
work that is already done for verbs. An obvious missing piece is the
'wc to gid index' API I keep asking for.

> That said, we have everything we need for RoCE namespace support when we get there.

Then there is no problem with the 'wc to gid index' stuff, so stop
complaining about it.

> All of this has nothing to do with "broken" and enshrining anything in the kapi.
> That's just bullshit.

No, it is a critique of the bad kAPI choices in this patch that mean
it broadly doesn't use namespaces, net devices or IP routing
correctly.

> The design of the RDMA stack is that Verbs are used by core IB
> services, such as addressing.  For these services, as the
> specification requires, all relevant fields must be reported in the
> CQE, period.  All spec-compliant HW devices follow this.

Wrong, the kapi needs to meet the needs of the kernel, and is
influenced but not set by the various standards.

That means we get to make better choices in the kapi than exposing
wc.network_type.

> If a ULP wants to create an address handle from a completion, there
> are service routines to accomplish that, based on the reported
> fields.  If it doesn't care, there is no reason to sacrifice
> performance.

I have no idea why you think there would be a performance sacrifice,
maybe you should review the patches and my remarks again.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]             ` <CAG9sBKNa4Hd0WVqhZCLeqXjc2zE2h+dBwF6zVgGT0R_Gt1FEwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-12-16 18:13               ` Jason Gunthorpe
  2015-12-16 20:39               ` Doug Ledford
  1 sibling, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-16 18:13 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Doug Ledford, Matan Barak, linux-rdma, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

On Wed, Dec 16, 2015 at 08:56:01AM +0200, Moni Shoua wrote:

> I can't object to that but I really would like to get an example of a
> security risk.

How can anyone give you an example when nobody knows exactly how mlx
hardware works in this area?

>From an kapi prespective, the security design is very simple.

Every single UD packet the kapi side has to process must be
unambiguously associated with a gid_index or dropped. Period full
stop. I would think that is an obvious conclusion based on the design
of the gid cache.

This is why we need a clear API to get this critical information. It
should not be open coded in init_ah_from_wc, it should not be done
some other way in the CMA code.

This is a simple matter of sane kapi design, nothing more.

I have no idea why this is so objectionable.

> scattered to the receive bufs anyway. So, if there is a security hole
> it exists from day one of the IB stack and this is not the time we
> should insist on fixing it.

IB isn't interacting with the net stack in the same way rocev2 is, so
this is not a pre-existing problem.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]             ` <CAG9sBKNa4Hd0WVqhZCLeqXjc2zE2h+dBwF6zVgGT0R_Gt1FEwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-12-16 18:13               ` Jason Gunthorpe
@ 2015-12-16 20:39               ` Doug Ledford
       [not found]                 ` <5671CBF4.4060602-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 64+ messages in thread
From: Doug Ledford @ 2015-12-16 20:39 UTC (permalink / raw)
  To: Moni Shoua
  Cc: Matan Barak, linux-rdma, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

[-- Attachment #1: Type: text/plain, Size: 5181 bytes --]

On 12/16/2015 01:56 AM, Moni Shoua wrote:
>> The part that bothers me about this is that this statement makes sense
>> when just thinking about the spec, as you say.  However, once you
>> consider namespaces, security implications make this statement spec
>> compliant, but still unacceptable.  The spec itself is silent on
>> namespaces.  But, you guys wanted, and you got, namespace support.
>> Since that's beyond spec, and carries security requirements, I think
>> it's fair to say that from now on, the Linux kernel RDMA stack can no
>> longer *just* be spec compliant.  There are additional concerns that
>> must always be addressed with new changes, and those are the namespace
>> constraint preservation concerns.
> 
> I can't object to that but I really would like to get an example of a
> security risk.

*This* is exactly the conversation to be having right now.  The
namespace support has been added to the core, and so now we need to
define exactly what the impact of that is for new feature submissions
like this one.  More on that below...

> So far, besides hearing that the way we choose to handle completions
> is wrong, I didn't get a convincing example of how where it doesn't
> work.

Work is too fuzzy of a word to use here.  It could mean "applications
keep running", but that could be contrary to the namespace restrictions
as it may be that the application should *not* have continued to run
when namespace considerations were taken into account.

> Moreover, regarding security, all we wanted is for HW to report
> the L3 protocol (IB, IPv4, or IPv6) in the packet. This is data that
> with some extra CPU cycles can be obtained from the 40 bytes that are
> scattered to the receive bufs anyway. So, if there is a security hole
> it exists from day one of the IB stack and this is not the time we
> should insist on fixing it.

No, not true.  You are implementing RoCEv2 support, which is an entirely
new feature.  So this feature can't have had a security hole since
forever as it has never been in the kernel before now.  The objections
are arising because of the ordering of events.  Specifically, we added
the core namespace support (even though it isn't complete, so far it's
the infrastructure ready for various upper portions of the stack to
start using, but it isn't a complete stack wide solution yet) first, and
so this new feature, which will need to be a part of that namespace
infrastructure that other parts of the IB stack can use, should have its
namespace support already enabled (ideally, but if it didn't, it should
at least have a clear plan for how to enable it in the future).  Jason's
objection is based upon this premise and the fact that a technical
review of the code makes it look like the core namespace infrastructure
becomes less complete, not more, with the inclusion of these patches.

As I understand it, prior to these patches there would always be a 1:1
mapping of GID to gid_index because you would never have duplicate GIDs
in the GID table.  That allowed an easy, definitive 1:1 mapping of GID
to namespace via the existing infrastructure for any received packet [1].

These patches add the concept of duplicate GIDs that are differentiated
by their RoCE version (also called network type).  So, now, an incoming
packet could match a couple different gid_indexes and we need additional
information to get back to the definitive 1:1 mapping.  The submitted
patches are designed around a lazy resolution of the namespace,
preferring to defer the work of mapping the incoming packet to a unique
namespace until that information is actually needed.  To enable this
lazy resolution, it provides the network_type so that the resolution can
be done.

This is a fair assessment of the current state of things and what these
patches do, yes?

Jason's objections are this:

1)  The lazy resolution is wrong.
2)  The use of network_type as the additional information to get to the
unique namespace is vendor specific cruft that shouldn't be part of the
core kernel API.

Jason's preference would be that the above issues be resolved by
skipping the lazy resolution and instead doing proactive resolution on
receipt of a packet and then probably just pass the namespace around
instead of passing around the information needed to resolve the
namespace.  Or, at a minimum, at least make the information added to the
core API not something vendor specific like network_type, which is a
detail of the Mellanox implementation.

Jason, is this accurate for your position?

If everyone agrees that this is a fair statement of where we stand, then
I'll continue my response.  If not, please correct anything I have wrong
above and I'll take that into my continued response.

1 - Actually, for any received packet with associated IP address
information.  We've only enabled net namespaces for IP connections
between user space applications, for direct IB connections or for kernel
connections there is not yet any namespace support.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]                 ` <5671CBF4.4060602-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-12-16 21:25                   ` Jason Gunthorpe
  2015-12-17 10:04                   ` Moni Shoua
  2015-12-17 10:14                   ` Liran Liss
  2 siblings, 0 replies; 64+ messages in thread
From: Jason Gunthorpe @ 2015-12-16 21:25 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Moni Shoua, Matan Barak, linux-rdma, Eran Ben Elisha,
	Haggai Eran, Or Gerlitz, Somnath Kotur

On Wed, Dec 16, 2015 at 03:39:16PM -0500, Doug Ledford wrote:

> These patches add the concept of duplicate GIDs that are differentiated
> by their RoCE version (also called network type).

and by vlan, and smac, and ... Basically everything network unique about
a namespace has to be encapsulted in the gid index now. Each namespace
thus has a subset of gid indexes that are valid for it to use for
outbound and to recieve packets on.

roce didn't really have a way to work with net namespaces, AFAIK (?)
so it gets a pass.

But rocev2 very clearly does. It needs needs to address the issue
outlined in commit b8cab5dab15ff5c2acc3faefdde28919b0341c11 (IB/cma:
Accept connection without a valid netdev on RoCE)

That means cma.c needs to get the gid index every single CMA packet it
processes and confirm that the associated net device is permitted to
talk to the matching CM ID. 

It is no mistake there is a hole in cma.c waiting for this code, when
Haggai did that work it was very clear in my mind that rocev2 would
need to slot into here as well.

> Jason's objections are this:
> 
> 1)  The lazy resolution is wrong.

Wrong in the sense it doesn't actually exist in a usable form
anyplace.

cma.c does not do it, and absoultely must as discussed above.

init_ah_from_wc needs to do it, and maybe does. It is hard to tell,
perhaps a 'rdma_wc_to_dgid_index()' is actually open coded in there
now. Just from a code readability perspective that is ugly.

Then we get into the missing route handling in all places that
construct a rocev2 AH...

> Jason's preference would be that the above issues be resolved by
> skipping the lazy resolution and instead doing proactive resolution
> on

I am happy with lazy resolution, that is a fine compromise.

I just want to see kapi that makes sense here. It is very clear to me
no kernel user can possibly correctly touch a rocev2 UD packet without
retrieving the gid index, so we must have a kAPI for this.

> namespace.  Or, at a minimum, at least make the information added to the
> core API not something vendor specific like network_type, which is a
> detail of the Mellanox implementation.

I keep suggesting a rdma_wc_to_dgid_index() API call.

Perhasp most of he code for this already seems to exist in
init_ah_from_wc.

> 1 - Actually, for any received packet with associated IP address
> information.  We've only enabled net namespaces for IP connections
> between user space applications, for direct IB connections or for kernel
> connections there is not yet any namespace support.

IHMO, this is actually a problem for rocev2.

IB needs more work to create a rdma namespace, but rocve2 does not.

The kernel software side should certainly be completed as a quick
follow on to this series, that means the use of gid_indexes at all
uAPI access points needs to be checked for rocev2.

HW support is needed to complete rocve2 containment, as the hw must
check the gid_index on all directly posted WCs and *ALL* rx'd packets
for a QP to ensure it is allowed.

Some kind of warn on until that support is available would also be
great.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]                 ` <5671CBF4.4060602-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-12-16 21:25                   ` Jason Gunthorpe
@ 2015-12-17 10:04                   ` Moni Shoua
  2015-12-17 10:14                   ` Liran Liss
  2 siblings, 0 replies; 64+ messages in thread
From: Moni Shoua @ 2015-12-17 10:04 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Matan Barak, linux-rdma, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

> No, not true.  You are implementing RoCEv2 support, which is an entirely
> new feature.  So this feature can't have had a security hole since
> forever as it has never been in the kernel before now.  The objections
> are arising because of the ordering of events.  Specifically, we added
> the core namespace support (even though it isn't complete, so far it's
> the infrastructure ready for various upper portions of the stack to
> start using, but it isn't a complete stack wide solution yet) first, and
> so this new feature, which will need to be a part of that namespace
> infrastructure that other parts of the IB stack can use, should have its
> namespace support already enabled (ideally, but if it didn't, it should
> at least have a clear plan for how to enable it in the future).  Jason's
> objection is based upon this premise and the fact that a technical
> review of the code makes it look like the core namespace infrastructure
> becomes less complete, not more, with the inclusion of these patches.
>
> As I understand it, prior to these patches there would always be a 1:1
> mapping of GID to gid_index because you would never have duplicate GIDs
> in the GID table.  That allowed an easy, definitive 1:1 mapping of GID
> to namespace via the existing infrastructure for any received packet [1].
>

Incorrect. Even with RoCEv1 you can have 2 GID table entries with the same
GID value but with different VLANs. The conflict in looking for a matching index
was resolved by making the pair <GID, VLAN> as a key for this table. The value
of VLAN was obtained from the completion BTW. All we do now is widen the
definition to include network_type (or L3 protocol if you like) to the
key (and therefore
to the completion structure)

> These patches add the concept of duplicate GIDs that are differentiated
> by their RoCE version (also called network type).  So, now, an incoming
> packet could match a couple different gid_indexes and we need additional
> information to get back to the definitive 1:1 mapping.  The submitted
> patches are designed around a lazy resolution of the namespace,
> preferring to defer the work of mapping the incoming packet to a unique
> namespace until that information is actually needed.  To enable this
> lazy resolution, it provides the network_type so that the resolution can
> be done.
>
You could say that even for native IB. Now, you resolve sgid_index
only when you really
need it (with ib_init_ah_from_wc()) and you don't fill the completion
structure with gid_index
every time you poll.

> This is a fair assessment of the current state of things and what these
> patches do, yes?
>
> Jason's objections are this:
>
> 1)  The lazy resolution is wrong.
> 2)  The use of network_type as the additional information to get to the
> unique namespace is vendor specific cruft that shouldn't be part of the
> core kernel API.

Not true. This is not vendor specific information. Spec says that network_type
be a part of the completion. Honestly, I really don't understand the resistance
from implementing the spec as it is written.
>
> Jason's preference would be that the above issues be resolved by
> skipping the lazy resolution and instead doing proactive resolution on
> receipt of a packet and then probably just pass the namespace around
> instead of passing around the information needed to resolve the
> namespace.  Or, at a minimum, at least make the information added to the
> core API not something vendor specific like network_type, which is a
> detail of the Mellanox implementation.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* RE: [PATCH for-next V2 00/11] Add RoCE v2 support
       [not found]                 ` <5671CBF4.4060602-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-12-16 21:25                   ` Jason Gunthorpe
  2015-12-17 10:04                   ` Moni Shoua
@ 2015-12-17 10:14                   ` Liran Liss
  2 siblings, 0 replies; 64+ messages in thread
From: Liran Liss @ 2015-12-17 10:14 UTC (permalink / raw)
  To: Doug Ledford, Moni Shoua
  Cc: Matan Barak, linux-rdma, Eran Ben Elisha, Haggai Eran,
	Or Gerlitz, Jason Gunthorpe, Somnath Kotur

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3439 bytes --]

> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Doug Ledford

> These patches add the concept of duplicate GIDs that are differentiated by
> their RoCE version (also called network type).  So, now, an incoming packet
> could match a couple different gid_indexes and we need additional
> information to get back to the definitive 1:1 mapping.  The submitted patches
> are designed around a lazy resolution of the namespace, preferring to defer
> the work of mapping the incoming packet to a unique namespace until that
> information is actually needed.  To enable this lazy resolution, it provides the
> network_type so that the resolution can be done.
> 
> This is a fair assessment of the current state of things and what these
> patches do, yes?

Hi Doug,
Yes, your description nails the crux of the discussion.
Thanks.

> 
> Jason's objections are this:
> 
> 1)  The lazy resolution is wrong.
> 2)  The use of network_type as the additional information to get to the
> unique namespace is vendor specific cruft that shouldn't be part of the core
> kernel API.
> 

Which is totally wrong. RoCE versions and the network type attribute are part of the RoCE specification.
There is nothing vendor specific. In fact, this patch initially came from Avago!!!

> Jason's preference would be that the above issues be resolved by skipping
> the lazy resolution and instead doing proactive resolution on receipt of a
> packet and then probably just pass the namespace around instead of passing
> around the information needed to resolve the namespace.  Or, at a
> minimum, at least make the information added to the core API not
> something vendor specific like network_type, which is a detail of the
> Mellanox implementation.

Again, there is nothing here specific to the Mellanox implementation.
This argument is invalid.

> 
> Jason, is this accurate for your position?
> 
> If everyone agrees that this is a fair statement of where we stand, then I'll
> continue my response.  If not, please correct anything I have wrong above
> and I'll take that into my continued response.
> 

Since we are following the standard, and:
- This is *not* a vendor specific issue
- Lazy resolution adheres to the current flow of the stack (which IMO, is the right thing to do anyway)
- There are no security issues whatsoever (see below)
- The code could be easily extended to support namespaces in the future
- We should "release early release often", and not hold progress because of endless philosophical discussions
then let's apply these patches.

I perfectly understand Jason's proposal and its merits.
I just don't agree that it is a better approach, and in any case this is not the time to continue discussing it.

> 1 - Actually, for any received packet with associated IP address information.
> We've only enabled net namespaces for IP connections between user space
> applications, for direct IB connections or for kernel connections there is not
> yet any namespace support.

Correct.
That's why we are not risking anything in the KAPI with regard to namespaces; there is no security issue.

--Liran

> 
> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
> 

N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2015-12-17 10:14 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-03 13:47 [PATCH for-next V2 00/11] Add RoCE v2 support Matan Barak
     [not found] ` <1449150450-13679-1-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
     [not found]     ` <1449150450-13679-6-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-03 14:05       ` Christoph Hellwig
     [not found]         ` <20151203140543.GA4283-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-12-03 16:19           ` Matan Barak
2015-12-03 16:20           ` Liran Liss
     [not found]             ` <HE1PR05MB141857FA57EECD82D1533543B10D0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-12-06 14:03               ` Christoph Hellwig
     [not found]                 ` <20151206140307.GB25487-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-12-06 14:20                   ` Moni Shoua
2015-12-07  6:02               ` Jason Gunthorpe
     [not found]                 ` <20151207060241.GA19038-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-07  6:15                   ` Moni Shoua
     [not found]                     ` <CAG9sBKO0rsSQK1WcyBciZEONWJsusw9GwR19H7FfAsebEH63dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-07  6:34                       ` Jason Gunthorpe
     [not found]                         ` <20151207063415.GB20066-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-07  6:37                           ` Moni Shoua
     [not found]                             ` <CAG9sBKMBt6Toa25Ouasr7hudAcOYrxZyBUFYMXzrMTFZvooWfQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-07 17:12                               ` Jason Gunthorpe
     [not found]                                 ` <20151207171228.GA26969-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-07 18:34                                   ` Moni Shoua
     [not found]                                     ` <CAG9sBKNXnzp7PgJhfcjbYvje5qgZWYqygJYkbs4WyTZtDc4ckg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-07 18:48                                       ` Jason Gunthorpe
     [not found]                                         ` <20151207184832.GA21402-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-08  7:23                                           ` Moni Shoua
     [not found]                                             ` <CAG9sBKNWMQA3XDGpDc_+2QoJcnnNL9o67p0JNoVqrU0u9UQCrA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-08 22:52                                               ` Jason Gunthorpe
     [not found]                                                 ` <20151208225251.GA27609-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-09  9:34                                                   ` Moni Shoua
     [not found]                                                     ` <CAG9sBKM2s9BMv8priCHnFaMcSnuafn4vf+6+0Rooc6Gw0ZSaLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-09 18:01                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20151209180129.GD31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-10  7:58                                                           ` Moni Shoua
     [not found]                                                             ` <CAG9sBKPNWsO7ugdyhd5sYp_yOSrUkG0b-Lfq6Eenrydg3MzyvQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-10 17:38                                                               ` Jason Gunthorpe
2015-12-09  9:38                                                   ` Liran Liss
     [not found]                                                     ` <HE1PR05MB1418B1F0F393AD0D92424043B1E80-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-12-09 18:09                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20151209180920.GE31636-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-10  7:53                                                           ` Moni Shoua
2015-12-13 13:56                                                           ` Liran Liss
     [not found]                                                             ` <HE1PR05MB141868A873246D76F580401FB1EC0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-12-13 14:03                                                               ` Matan Barak
2015-12-09  9:41                                           ` Moni Shoua
2015-12-07  6:21                   ` Parav Pandit
2015-12-03 16:36       ` Jason Gunthorpe
2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 00/11] Add RoCE v2 support Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 01/11] IB/core: Add gid_type to gid attribute Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 02/11] IB/cm: Use the source GID index type Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 03/11] IB/core: Add gid attributes to sysfs Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 04/11] IB/core: Add ROCE_UDP_ENCAP (RoCE V2) type Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 05/11] IB/core: Add rdma_network_type to wc Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 06/11] IB/core: Move rdma_is_upper_dev_rcu to header file Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 07/11] IB/core: Validate route in ib_init_ah_from_wc and ib_init_ah_from_path Matan Barak
     [not found]     ` <1449150450-13679-20-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-07 13:42       ` Haggai Eran
     [not found]         ` <56658CB4.7090402-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-12-10 17:36           ` Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 08/11] IB/rdma_cm: Add wrapper for cma reference count Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 09/11] IB/cma: Add configfs for rdma_cm Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 10/11] IB/core: Initialize UD header structure with IP and UDP headers Matan Barak
2015-12-03 13:47   ` [PATCH for-next V2 11/11] IB/cma: Join and leave multicast groups with IGMP Matan Barak
2015-12-15  7:15   ` [PATCH for-next V2 00/11] Add RoCE v2 support Moni Shoua
     [not found]     ` <CAG9sBKM7j6w+Gk4PXXO+0mHpFeVaBdeNmr-z9V9fPR562o+kmw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-15 21:45       ` Doug Ledford
     [not found]         ` <567089F1.30004-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-12-15 22:00           ` Jason Gunthorpe
     [not found]             ` <20151215220033.GB30404-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-12-16  9:57               ` Liran Liss
     [not found]                 ` <HE1PR05MB14188FDA12D077BA2720C84BB1EF0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-12-16 18:02                   ` Jason Gunthorpe
2015-12-16  6:56           ` Moni Shoua
     [not found]             ` <CAG9sBKNa4Hd0WVqhZCLeqXjc2zE2h+dBwF6zVgGT0R_Gt1FEwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-12-16 18:13               ` Jason Gunthorpe
2015-12-16 20:39               ` Doug Ledford
     [not found]                 ` <5671CBF4.4060602-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-12-16 21:25                   ` Jason Gunthorpe
2015-12-17 10:04                   ` Moni Shoua
2015-12-17 10:14                   ` Liran Liss
2015-12-16 14:18           ` Liran Liss

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.