linux-rdma.vger.kernel.org archive mirror
* [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module
@ 2015-07-30 14:50 Haggai Eran
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

I'm sending the patchset again with the rwsem patch and rebased over Doug's
to-be-rebased/for-4.3 tree.

Regards,
Haggai

Changes from v3:
- rebase over github.com/dledford/linux to-be-rebased/for-4.3
- add rwsem patch

Changes from v2:
- added missing reviewed-bys
- Patch 5: remove service_mask as a parameter from ib_cm_insert_listen()
- Patch 9:
  * move cma_req_info struct near other structs
  * put GID by value in the struct

Changes from v1:
- Patch 1: mark ib_client_data as going down instead of removing all client
  contexts during de-registration.
- Patch 2:
  * move kdoc to the function definition
  * do not call get_net_dev_by_params() on devices/clients that are going
    down
  * pass client data directly to the callback
- Patch 3:
  * pass client data directly to callback
  * fix a lockdep warning in ipoib_match_gid_pkey_addr()
  * remove a leftover debugging print
  * rate-limit the duplicated IP address warning
- Patch 5:
  * change atomic_dec(&id->refcount) to cm_deref_id()
  * always update listen_sharecount under the cm.lock spinlock
- Patch 6: handle AF_IB requests by getting parameters from the listener
- Patch 8: new patch to expose BTH P_Key from ib_cm to rdma_cm
- Patch 9:
  * get P_Key used for de-mux from the BTH
  * use -EAFNOSUPPORT in cma_save_ip_info to designate a possible AF_IB
    connection request
  * pass a NULL netdev for AF_IB requests
- Patch 11: handle AF_IB connections by filling connection information from
  the listener id instead of from the net_dev
- Patch 12: fix mention of the old ib_cm_id_create_and_listen function in
  the changelog entry.

Changes from v0:
- Added a patch to prevent a race between ib_unregister_device() and
  ib_get_net_dev_by_params().
- Removed the patch that exported a UD GMP packet's GID from the GRH, and
  related code.
- Patch 3:
  * Add _rcu suffix to ipoib_is_dev_match_addr().
  * Add helper function to get the master netdev for bonding support.
  * Scan for matching net devices in two phases: first without looking at
    the IP address, and then looking at the IP address only when the first
    phase did not find a unique net device.
- Patch 5:
  * Do not init listen_sharecount = 1 for non-listening ib_cm_ids.
  * Remove code that sets a CM ID's state to IB_CM_IDLE right before
    destruction.
  * Rename ib_cm_id_create_and_listen() to ib_cm_insert_listen().
  * Do not increase reference counts when failing to add a shared CM ID due
    to having a different handler callback.
- Patch 9: Clean IPv4 net_dev validation function.
- Added patch 10: new patch to use the found net_dev in IB/cma for
  eliminating unneeded calls to cma_translate_addr.
- Patch 12: Remove the lock argument to __ib_cm_listen().

The rdma_cm module relies today on the ib_cm module to demux incoming
requests based on their service ID and IP address. The ib_cm module is the
wrong place to perform this task, as it can also be used with services that
do not adhere to the RDMA IP CM service as defined in the IBA
specifications. It is forced to use an opaque private data struct and mask
to compare incoming requests against.

This series moves that demux task responsibility to the rdma_cm module. The
rdma_cm module can look into the private data attached to a CM request,
containing the IP addresses related to the request. It uses the details of
the request to find the net device associated with the request, and uses
that net device to find the correct listening rdma_cm_id.
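
For readers new to the code, the two-step lookup described above can be
sketched roughly as follows. This is a simplified userspace model, not the
code in the patches: every model_* name is hypothetical, and the net device
match here uses only the P_Key and an IPv4 address, while the series also
considers the IB device, port and GID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; none of these are kernel types. */
struct model_net_dev {
	int      ifindex;
	uint32_t ipv4_addr;	/* address assigned to the interface */
	uint16_t pkey;		/* partition the interface belongs to */
};

struct model_request {
	uint64_t service_id;	/* taken from the CM request */
	uint16_t pkey;		/* taken from the BTH */
	uint32_t dst_ipv4;	/* taken from the CM private data */
};

struct model_listener {
	uint64_t service_id;
	int      bound_ifindex;	/* 0 means "not bound to a device" */
};

/* Step 1: find the net device the request is addressed to. */
static const struct model_net_dev *
model_find_net_dev(const struct model_net_dev *devs, size_t n,
		   const struct model_request *req)
{
	assert(req != NULL);
	for (size_t i = 0; i < n; i++)
		if (devs[i].pkey == req->pkey &&
		    devs[i].ipv4_addr == req->dst_ipv4)
			return &devs[i];
	return NULL;
}

/* Step 2: use that net device to pick the listening ID. */
static const struct model_listener *
model_find_listener(const struct model_listener *ls, size_t n,
		    const struct model_request *req,
		    const struct model_net_dev *dev)
{
	assert(req != NULL);
	if (!dev)
		return NULL;	/* no matching device: reject the request */
	for (size_t i = 0; i < n; i++) {
		if (ls[i].service_id != req->service_id)
			continue;
		if (ls[i].bound_ifindex == 0 ||
		    ls[i].bound_ifindex == dev->ifindex)
			return &ls[i];
	}
	return NULL;
}
```

The point of the split is that step 1 needs knowledge of the RDMA IP CM
private data format, which is why it belongs in rdma_cm rather than ib_cm.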

The series applies against Doug's to-be-rebased/for-4.3 tree and includes
the patch adding a rwsem to IB core [2].

The series is structured as follows:
Patch 1 prevents a possible race between ib_client.remove() callbacks from
ib_unregister_device(), and ib_client callbacks that rely on the
lists_rwsem locked for read, such as ib_get_net_dev_by_params(). Both
callbacks may call ib_get_client_data(), and the patch makes sure that the
remove callback doesn't free the client data while it is being used by the
other callback.

Patches 2-3 add the ability to look up a network device according to the IB
device, port, P_Key, GID and IP address. They find the matching IPoIB
interfaces, and return a matching net_device if one exists.

Patches 4-5 make the necessary changes in ib_cm to allow RDMA CM to get the
information it needs out of CM and SIDR requests, and to share a single
ib_cm_id with multiple RDMA CM listeners.

Patches 6-7 do some preliminary refactoring to the rdma_cm module. They
allow extracting information out of incoming requests instead of retrieving
it from a listening CM ID, and add helper functions to access the port
space IDRs.

Finally, patches 8-12 change rdma_cm to demultiplex requests on its own, and
patch 13 cleans up the now unneeded code in ib_cm to compare against the
private data.

This series contains a subset of the RDMA CM namespaces patches [1]. The
changes from v4 of the relevant patches are:
- Patch 1
  * in addition to the IB device, port, P_Key and IP address, also pass
    the GID, to make future IPoIB devices with alias GIDs unique.
  * return the matching net_device instead of a network namespace.
- Patch 2: use IS_ENABLED(CONFIG_IPV6) without ifdefs.
- Patch 5:
  * rename sharecount -> listen_sharecount.
  * use a regular int instead of atomic for the share count, protected by
    the cm.lock spinlock.
  * change id destruction and shared listener creation to prevent the case
    where an id is found but it is under destruction.

[1] [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM
    http://www.spinics.net/lists/linux-rdma/msg25244.html
[2] [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list
    http://www.spinics.net/lists/linux-rdma/msg25931.html

Guy Shapiro (1):
  IB/ipoib: Return IPoIB devices matching connection parameters

Haggai Eran (12):
  IB/core: Add rwsem to allow reading device list or client list
  IB/core: lock client data with lists_rwsem
  IB/cm: Expose service ID in request events
  IB/cm: Share listening CM IDs
  IB/cma: Refactor RDMA IP CM private-data parsing code
  IB/cma: Helper functions to access port space IDRs
  IB/cm: Expose BTH P_Key in CM and SIDR request events
  IB/cma: Add net_dev and private data checks to RDMA CM
  IB/cma: Validate routing of incoming requests
  IB/cma: Use found net_dev for passive connections
  IB/cma: Share ib_cm_ids between rdma_cm_ids
  IB/cm: Remove compare_data checks

Yotam Kenneth (1):
  IB/core: Find the network device matching connection parameters

 drivers/infiniband/core/cache.c           |   2 +-
 drivers/infiniband/core/cm.c              | 215 ++++++----
 drivers/infiniband/core/cma.c             | 646 ++++++++++++++++++++++--------
 drivers/infiniband/core/device.c          | 134 ++++++-
 drivers/infiniband/core/mad.c             |   2 +-
 drivers/infiniband/core/multicast.c       |   7 +-
 drivers/infiniband/core/sa_query.c        |   6 +-
 drivers/infiniband/core/ucm.c             |   9 +-
 drivers/infiniband/core/user_mad.c        |   6 +-
 drivers/infiniband/core/uverbs_main.c     |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 236 ++++++++++-
 drivers/infiniband/ulp/srp/ib_srp.c       |   6 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c     |   7 +-
 include/rdma/ib_cm.h                      |  25 +-
 include/rdma/ib_verbs.h                   |  33 +-
 net/rds/ib.c                              |   5 +-
 net/rds/iw.c                              |   5 +-
 18 files changed, 1040 insertions(+), 312 deletions(-)

-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v4 01/14] IB/core: Add rwsem to allow reading device list or client list
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 02/14] IB/core: lock client data with lists_rwsem Haggai Eran
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe, Matan Barak

Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.

The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.

This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.

[1] http://www.spinics.net/lists/linux-rdma/msg24733.html

Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 9567756ca4f9..f08d438205ed 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -55,17 +55,24 @@ struct ib_client_data {
 struct workqueue_struct *ib_wq;
 EXPORT_SYMBOL_GPL(ib_wq);
 
+/* The device_list and client_list contain devices and clients after their
+ * registration has completed, and the devices and clients are removed
+ * during unregistration. */
 static LIST_HEAD(device_list);
 static LIST_HEAD(client_list);
 
 /*
- * device_mutex protects access to both device_list and client_list.
- * There's no real point to using multiple locks or something fancier
- * like an rwsem: we always access both lists, and we're always
- * modifying one list or the other list.  In any case this is not a
- * hot path so there's no point in trying to optimize.
+ * device_mutex and lists_rwsem protect access to both device_list and
+ * client_list.  device_mutex protects writer access by device and client
+ * registration / de-registration.  lists_rwsem protects reader access to
+ * these lists.  Iterators of these lists must lock it for read, while updates
+ * to the lists must be done with a write lock. A special case is when the
+ * device_mutex is locked. In this case locking the lists for read access is
+ * not necessary as the device_mutex implies it.
  */
 static DEFINE_MUTEX(device_mutex);
+static DECLARE_RWSEM(lists_rwsem);
+
 
 static int ib_device_check_mandatory(struct ib_device *device)
 {
@@ -305,8 +312,6 @@ int ib_register_device(struct ib_device *device,
 		goto out;
 	}
 
-	list_add_tail(&device->core_list, &device_list);
-
 	device->reg_state = IB_DEV_REGISTERED;
 
 	{
@@ -317,6 +322,10 @@ int ib_register_device(struct ib_device *device,
 				client->add(device);
 	}
 
+	down_write(&lists_rwsem);
+	list_add_tail(&device->core_list, &device_list);
+	up_write(&lists_rwsem);
+
  out:
 	mutex_unlock(&device_mutex);
 	return ret;
@@ -337,12 +346,14 @@ void ib_unregister_device(struct ib_device *device)
 
 	mutex_lock(&device_mutex);
 
+	down_write(&lists_rwsem);
+	list_del(&device->core_list);
+	up_write(&lists_rwsem);
+
 	list_for_each_entry_reverse(client, &client_list, list)
 		if (client->remove)
 			client->remove(device);
 
-	list_del(&device->core_list);
-
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
@@ -375,11 +386,14 @@ int ib_register_client(struct ib_client *client)
 
 	mutex_lock(&device_mutex);
 
-	list_add_tail(&client->list, &client_list);
 	list_for_each_entry(device, &device_list, core_list)
 		if (client->add && !add_client_context(device, client))
 			client->add(device);
 
+	down_write(&lists_rwsem);
+	list_add_tail(&client->list, &client_list);
+	up_write(&lists_rwsem);
+
 	mutex_unlock(&device_mutex);
 
 	return 0;
@@ -402,6 +416,10 @@ void ib_unregister_client(struct ib_client *client)
 
 	mutex_lock(&device_mutex);
 
+	down_write(&lists_rwsem);
+	list_del(&client->list);
+	up_write(&lists_rwsem);
+
 	list_for_each_entry(device, &device_list, core_list) {
 		if (client->remove)
 			client->remove(device);
@@ -414,7 +432,6 @@ void ib_unregister_client(struct ib_client *client)
 			}
 		spin_unlock_irqrestore(&device->client_data_lock, flags);
 	}
-	list_del(&client->list);
 
 	mutex_unlock(&device_mutex);
 }
-- 
1.7.11.2


* [PATCH v4 02/14] IB/core: lock client data with lists_rwsem
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-30 14:50   ` [PATCH v4 01/14] IB/core: Add rwsem to allow reading device list or client list Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 03/14] IB/core: Find the network device matching connection parameters Haggai Eran
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove(), without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, remove()
no longer needs to look up the context again, so pass the client data
directly to remove() callbacks.

Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cache.c           |  2 +-
 drivers/infiniband/core/cm.c              |  7 ++--
 drivers/infiniband/core/cma.c             |  7 ++--
 drivers/infiniband/core/device.c          | 53 +++++++++++++++++++++++++------
 drivers/infiniband/core/mad.c             |  2 +-
 drivers/infiniband/core/multicast.c       |  7 ++--
 drivers/infiniband/core/sa_query.c        |  6 ++--
 drivers/infiniband/core/ucm.c             |  6 ++--
 drivers/infiniband/core/user_mad.c        |  6 ++--
 drivers/infiniband/core/uverbs_main.c     |  6 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  7 ++--
 drivers/infiniband/ulp/srp/ib_srp.c       |  6 ++--
 drivers/infiniband/ulp/srpt/ib_srpt.c     |  5 ++-
 include/rdma/ib_verbs.h                   |  4 ++-
 net/rds/ib.c                              |  5 ++-
 net/rds/iw.c                              |  5 ++-
 16 files changed, 82 insertions(+), 52 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 871da832d016..c93af66cc091 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -394,7 +394,7 @@ err:
 	kfree(device->cache.lmc_cache);
 }
 
-static void ib_cache_cleanup_one(struct ib_device *device)
+static void ib_cache_cleanup_one(struct ib_device *device, void *client_data)
 {
 	int p;
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 3a972ebf3c0d..82d5c4362aa8 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -58,7 +58,7 @@ MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static void cm_add_one(struct ib_device *device);
-static void cm_remove_one(struct ib_device *device);
+static void cm_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cm_client = {
 	.name   = "cm",
@@ -3886,9 +3886,9 @@ free:
 	kfree(cm_dev);
 }
 
-static void cm_remove_one(struct ib_device *ib_device)
+static void cm_remove_one(struct ib_device *ib_device, void *client_data)
 {
-	struct cm_device *cm_dev;
+	struct cm_device *cm_dev = client_data;
 	struct cm_port *port;
 	struct ib_port_modify port_modify = {
 		.clr_port_cap_mask = IB_PORT_CM_SUP
@@ -3896,7 +3896,6 @@ static void cm_remove_one(struct ib_device *ib_device)
 	unsigned long flags;
 	int i;
 
-	cm_dev = ib_get_client_data(ib_device, &cm_client);
 	if (!cm_dev)
 		return;
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 143ded2bbe7c..6b6cdfa5d231 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -94,7 +94,7 @@ const char *rdma_event_msg(enum rdma_cm_event_type event)
 EXPORT_SYMBOL(rdma_event_msg);
 
 static void cma_add_one(struct ib_device *device);
-static void cma_remove_one(struct ib_device *device);
+static void cma_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cma_client = {
 	.name   = "cma",
@@ -3551,11 +3551,10 @@ static void cma_process_remove(struct cma_device *cma_dev)
 	wait_for_completion(&cma_dev->comp);
 }
 
-static void cma_remove_one(struct ib_device *device)
+static void cma_remove_one(struct ib_device *device, void *client_data)
 {
-	struct cma_device *cma_dev;
+	struct cma_device *cma_dev = client_data;
 
-	cma_dev = ib_get_client_data(device, &cma_client);
 	if (!cma_dev)
 		return;
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index f08d438205ed..623d8e191ced 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -50,6 +50,9 @@ struct ib_client_data {
 	struct list_head  list;
 	struct ib_client *client;
 	void *            data;
+	/* The device or client is going down. Do not call client or device
+	 * callbacks other than remove(). */
+	bool		  going_down;
 };
 
 struct workqueue_struct *ib_wq;
@@ -69,6 +72,8 @@ static LIST_HEAD(client_list);
  * to the lists must be done with a write lock. A special case is when the
  * device_mutex is locked. In this case locking the lists for read access is
  * not necessary as the device_mutex implies it.
+ *
+ * lists_rwsem also protects access to the client data list.
  */
 static DEFINE_MUTEX(device_mutex);
 static DECLARE_RWSEM(lists_rwsem);
@@ -210,10 +215,13 @@ static int add_client_context(struct ib_device *device, struct ib_client *client
 
 	context->client = client;
 	context->data   = NULL;
+	context->going_down = false;
 
+	down_write(&lists_rwsem);
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_add(&context->list, &device->client_data_list);
 	spin_unlock_irqrestore(&device->client_data_lock, flags);
+	up_write(&lists_rwsem);
 
 	return 0;
 }
@@ -340,7 +348,6 @@ EXPORT_SYMBOL(ib_register_device);
  */
 void ib_unregister_device(struct ib_device *device)
 {
-	struct ib_client *client;
 	struct ib_client_data *context, *tmp;
 	unsigned long flags;
 
@@ -348,20 +355,29 @@ void ib_unregister_device(struct ib_device *device)
 
 	down_write(&lists_rwsem);
 	list_del(&device->core_list);
-	up_write(&lists_rwsem);
+	spin_lock_irqsave(&device->client_data_lock, flags);
+	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
+		context->going_down = true;
+	spin_unlock_irqrestore(&device->client_data_lock, flags);
+	downgrade_write(&lists_rwsem);
 
-	list_for_each_entry_reverse(client, &client_list, list)
-		if (client->remove)
-			client->remove(device);
+	list_for_each_entry_safe(context, tmp, &device->client_data_list,
+				 list) {
+		if (context->client->remove)
+			context->client->remove(device, context->data);
+	}
+	up_read(&lists_rwsem);
 
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
 
+	down_write(&lists_rwsem);
 	spin_lock_irqsave(&device->client_data_lock, flags);
 	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 		kfree(context);
 	spin_unlock_irqrestore(&device->client_data_lock, flags);
+	up_write(&lists_rwsem);
 
 	device->reg_state = IB_DEV_UNREGISTERED;
 }
@@ -421,16 +437,35 @@ void ib_unregister_client(struct ib_client *client)
 	up_write(&lists_rwsem);
 
 	list_for_each_entry(device, &device_list, core_list) {
-		if (client->remove)
-			client->remove(device);
+		struct ib_client_data *found_context = NULL;
 
+		down_write(&lists_rwsem);
 		spin_lock_irqsave(&device->client_data_lock, flags);
 		list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 			if (context->client == client) {
-				list_del(&context->list);
-				kfree(context);
+				context->going_down = true;
+				found_context = context;
+				break;
 			}
 		spin_unlock_irqrestore(&device->client_data_lock, flags);
+		up_write(&lists_rwsem);
+
+		if (client->remove)
+			client->remove(device, found_context ?
+					       found_context->data : NULL);
+
+		if (!found_context) {
+			pr_warn("No client context found for %s/%s\n",
+				device->name, client->name);
+			continue;
+		}
+
+		down_write(&lists_rwsem);
+		spin_lock_irqsave(&device->client_data_lock, flags);
+		list_del(&found_context->list);
+		kfree(found_context);
+		spin_unlock_irqrestore(&device->client_data_lock, flags);
+		up_write(&lists_rwsem);
 	}
 
 	mutex_unlock(&device_mutex);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 786fc51bf04b..66b4b3eb8f67 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3335,7 +3335,7 @@ error:
 	}
 }
 
-static void ib_mad_remove_device(struct ib_device *device)
+static void ib_mad_remove_device(struct ib_device *device, void *client_data)
 {
 	int i;
 
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 2cb865c7ce7a..d38d8b2b2979 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -43,7 +43,7 @@
 #include "sa.h"
 
 static void mcast_add_one(struct ib_device *device);
-static void mcast_remove_one(struct ib_device *device);
+static void mcast_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client mcast_client = {
 	.name   = "ib_multicast",
@@ -840,13 +840,12 @@ static void mcast_add_one(struct ib_device *device)
 	ib_register_event_handler(&dev->event_handler);
 }
 
-static void mcast_remove_one(struct ib_device *device)
+static void mcast_remove_one(struct ib_device *device, void *client_data)
 {
-	struct mcast_device *dev;
+	struct mcast_device *dev = client_data;
 	struct mcast_port *port;
 	int i;
 
-	dev = ib_get_client_data(device, &mcast_client);
 	if (!dev)
 		return;
 
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 9ded4f49bf3f..70ceec4df02a 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -130,7 +130,7 @@ static struct workqueue_struct *ib_nl_wq;
 static struct delayed_work ib_nl_timed_work;
 
 static void ib_sa_add_one(struct ib_device *device);
-static void ib_sa_remove_one(struct ib_device *device);
+static void ib_sa_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client sa_client = {
 	.name   = "sa",
@@ -1660,9 +1660,9 @@ free:
 	return;
 }
 
-static void ib_sa_remove_one(struct ib_device *device)
+static void ib_sa_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
+	struct ib_sa_device *sa_dev = client_data;
 	int i;
 
 	if (!sa_dev)
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 009481073644..8cde48b96f19 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -109,7 +109,7 @@ enum {
 #define IB_UCM_BASE_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_BASE_MINOR)
 
 static void ib_ucm_add_one(struct ib_device *device);
-static void ib_ucm_remove_one(struct ib_device *device);
+static void ib_ucm_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client ucm_client = {
 	.name   = "ucm",
@@ -1310,9 +1310,9 @@ err:
 	return;
 }
 
-static void ib_ucm_remove_one(struct ib_device *device)
+static void ib_ucm_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_ucm_device *ucm_dev = ib_get_client_data(device, &ucm_client);
+	struct ib_ucm_device *ucm_dev = client_data;
 
 	if (!ucm_dev)
 		return;
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 35567fffaa4e..57f281f8d686 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -133,7 +133,7 @@ static DEFINE_SPINLOCK(port_lock);
 static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
 
 static void ib_umad_add_one(struct ib_device *device);
-static void ib_umad_remove_one(struct ib_device *device);
+static void ib_umad_remove_one(struct ib_device *device, void *client_data);
 
 static void ib_umad_release_dev(struct kobject *kobj)
 {
@@ -1322,9 +1322,9 @@ free:
 	kobject_put(&umad_dev->kobj);
 }
 
-static void ib_umad_remove_one(struct ib_device *device)
+static void ib_umad_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_umad_device *umad_dev = ib_get_client_data(device, &umad_client);
+	struct ib_umad_device *umad_dev = client_data;
 	int i;
 
 	if (!umad_dev)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index d7a2c30fabbf..05e3e9b7db7b 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -130,7 +130,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
-static void ib_uverbs_remove_one(struct ib_device *device);
+static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
 
 static void ib_uverbs_comp_dev(struct kref *ref)
 {
@@ -1208,9 +1208,9 @@ static void ib_uverbs_free_hw_resources(struct ib_uverbs_device *uverbs_dev,
 	mutex_unlock(&uverbs_dev->lists_mutex);
 }
 
-static void ib_uverbs_remove_one(struct ib_device *device)
+static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_uverbs_device *uverbs_dev = ib_get_client_data(device, &uverbs_client);
+	struct ib_uverbs_device *uverbs_dev = client_data;
 	int wait_clients = 1;
 
 	if (!uverbs_dev)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b2943c84a5dd..cca1a0c91ec4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -89,7 +89,7 @@ struct workqueue_struct *ipoib_workqueue;
 struct ib_sa_client ipoib_sa_client;
 
 static void ipoib_add_one(struct ib_device *device);
-static void ipoib_remove_one(struct ib_device *device);
+static void ipoib_remove_one(struct ib_device *device, void *client_data);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
 
 static struct ib_client ipoib_client = {
@@ -1715,12 +1715,11 @@ static void ipoib_add_one(struct ib_device *device)
 	ib_set_client_data(device, &ipoib_client, dev_list);
 }
 
-static void ipoib_remove_one(struct ib_device *device)
+static void ipoib_remove_one(struct ib_device *device, void *client_data)
 {
 	struct ipoib_dev_priv *priv, *tmp;
-	struct list_head *dev_list;
+	struct list_head *dev_list = client_data;
 
-	dev_list = ib_get_client_data(device, &ipoib_client);
 	if (!dev_list)
 		return;
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 31a20b462266..7755df444cfd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -131,7 +131,7 @@ MODULE_PARM_DESC(ch_count,
 		 "Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");
 
 static void srp_add_one(struct ib_device *device);
-static void srp_remove_one(struct ib_device *device);
+static void srp_remove_one(struct ib_device *device, void *client_data);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
 static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
@@ -3460,13 +3460,13 @@ free_attr:
 	kfree(dev_attr);
 }
 
-static void srp_remove_one(struct ib_device *device)
+static void srp_remove_one(struct ib_device *device, void *client_data)
 {
 	struct srp_device *srp_dev;
 	struct srp_host *host, *tmp_host;
 	struct srp_target_port *target;
 
-	srp_dev = ib_get_client_data(device, &srp_client);
+	srp_dev = client_data;
 	if (!srp_dev)
 		return;
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 60ff0a2390e5..4c59ceb40fff 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3326,12 +3326,11 @@ err:
 /**
  * srpt_remove_one() - InfiniBand device removal callback function.
  */
-static void srpt_remove_one(struct ib_device *device)
+static void srpt_remove_one(struct ib_device *device, void *client_data)
 {
-	struct srpt_device *sdev;
+	struct srpt_device *sdev = client_data;
 	int i;
 
-	sdev = ib_get_client_data(device, &srpt_client);
 	if (!sdev) {
 		pr_info("%s(%s): nothing to do.\n", __func__, device->name);
 		return;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 080a976cafa0..aaa5d2217ab5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1550,6 +1550,8 @@ struct ib_device {
 
 	spinlock_t                    client_data_lock;
 	struct list_head              core_list;
+	/* Access to the client_data_list is protected by the client_data_lock
+	 * spinlock and the lists_rwsem read-write semaphore */
 	struct list_head              client_data_list;
 
 	struct ib_cache               cache;
@@ -1762,7 +1764,7 @@ struct ib_device {
 struct ib_client {
 	char  *name;
 	void (*add)   (struct ib_device *);
-	void (*remove)(struct ib_device *);
+	void (*remove)(struct ib_device *, void *client_data);
 
 	struct list_head list;
 };
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ba2dffeff608..348ac37c1161 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -230,11 +230,10 @@ struct rds_ib_device *rds_ib_get_client_data(struct ib_device *device)
  *
  * This can be called at any time and can be racing with any other RDS path.
  */
-static void rds_ib_remove_one(struct ib_device *device)
+static void rds_ib_remove_one(struct ib_device *device, void *client_data)
 {
-	struct rds_ib_device *rds_ibdev;
+	struct rds_ib_device *rds_ibdev = client_data;
 
-	rds_ibdev = ib_get_client_data(device, &rds_ib_client);
 	if (!rds_ibdev)
 		return;
 
diff --git a/net/rds/iw.c b/net/rds/iw.c
index 589935661d66..7cc2f32a0cb3 100644
--- a/net/rds/iw.c
+++ b/net/rds/iw.c
@@ -125,12 +125,11 @@ free_attr:
 	kfree(dev_attr);
 }
 
-static void rds_iw_remove_one(struct ib_device *device)
+static void rds_iw_remove_one(struct ib_device *device, void *client_data)
 {
-	struct rds_iw_device *rds_iwdev;
+	struct rds_iw_device *rds_iwdev = client_data;
 	struct rds_iw_cm_id *i_cm_id, *next;
 
-	rds_iwdev = ib_get_client_data(device, &rds_iw_client);
 	if (!rds_iwdev)
 		return;
 
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 03/14] IB/core: Find the network device matching connection parameters
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-30 14:50   ` [PATCH v4 01/14] IB/core: Add rwsem to allow reading device list or client list Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 02/14] IB/core: lock client data with lists_rwsem Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 04/14] IB/ipoib: Return IPoIB devices " Haggai Eran
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe, Yotam Kenneth, Shachar Raindel, Guy Shapiro

From: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement a
callback that returns the network device matching the given connection
parameters.

The IB device and port, together with the P_Key and the GID, should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such a configuration may be desirable to support ipvlan-like
configurations for RDMA CM with IPoIB.  To resolve the device in these
cases the code will also take the IP address as an additional input.
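
As a rough userspace illustration of the dispatch described above (not
part of the patch), the sketch below walks a list of per-client contexts,
skips clients that are going down or that do not implement the callback,
and returns the first non-NULL result. All sketch_* names and the
simplified types are hypothetical stand-ins; locking and net_device
reference counting are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net_device. */
struct sketch_netdev { const char *name; };

/* Simplified connection parameters (the real call also takes a GID). */
struct sketch_params {
	uint8_t port;
	uint16_t pkey;
	uint32_t ipv4;
};

typedef struct sketch_netdev *(*sketch_lookup_fn)(
		const struct sketch_params *p, void *client_data);

/* Hypothetical stand-in for struct ib_client_data. */
struct sketch_client_ctx {
	sketch_lookup_fn lookup;	/* NULL if the client owns no net devices */
	void *client_data;
	int going_down;
};

/* Example per-client data: one net device bound to one port. */
struct sketch_ulp_data {
	uint8_t port;
	struct sketch_netdev netdev;
};

/* Example callback: match on the port number only. */
struct sketch_netdev *sketch_ulp_lookup(const struct sketch_params *p,
					void *client_data)
{
	struct sketch_ulp_data *d = client_data;

	return (d && d->port == p->port) ? &d->netdev : NULL;
}

/* Ask each live client in turn; the first match wins. */
struct sketch_netdev *sketch_get_net_dev(struct sketch_client_ctx *ctx,
					 size_t n,
					 const struct sketch_params *p)
{
	assert(p != NULL);

	for (size_t i = 0; i < n; i++) {
		struct sketch_netdev *nd;

		if (ctx[i].going_down || !ctx[i].lookup)
			continue;

		nd = ctx[i].lookup(p, ctx[i].client_data);
		if (nd)
			return nd;
	}
	return NULL;
}
```

The first-match-wins loop mirrors the idea that at most one ULP is
expected to own a net device for a given set of parameters.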

Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/device.c | 46 ++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h          | 27 +++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 623d8e191ced..124597732fe7 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/netdevice.h>
 #include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
@@ -781,6 +782,51 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);
 
+/**
+ * ib_get_net_dev_by_params() - Return the appropriate net_dev
+ * for a received CM request
+ * @dev:	An RDMA device on which the request has been received.
+ * @port:	Port number on the RDMA device.
+ * @pkey:	The Pkey the request came on.
+ * @gid:	A GID that the net_dev uses to communicate.
+ * @addr:	Contains the IP address that the request specified as its
+ *		destination.
+ */
+struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
+					    u8 port,
+					    u16 pkey,
+					    const union ib_gid *gid,
+					    const struct sockaddr *addr)
+{
+	struct net_device *net_dev = NULL;
+	struct ib_client_data *context;
+
+	if (!rdma_protocol_ib(dev, port))
+		return NULL;
+
+	down_read(&lists_rwsem);
+
+	list_for_each_entry(context, &dev->client_data_list, list) {
+		struct ib_client *client = context->client;
+
+		if (context->going_down)
+			continue;
+
+		if (client->get_net_dev_by_params) {
+			net_dev = client->get_net_dev_by_params(dev, port, pkey,
+								gid, addr,
+								context->data);
+			if (net_dev)
+				break;
+		}
+	}
+
+	up_read(&lists_rwsem);
+
+	return net_dev;
+}
+EXPORT_SYMBOL(ib_get_net_dev_by_params);
+
 static int __init ib_core_init(void)
 {
 	int ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index aaa5d2217ab5..5c68f8c1c31a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/socket.h>
 #include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
@@ -1766,6 +1767,28 @@ struct ib_client {
 	void (*add)   (struct ib_device *);
 	void (*remove)(struct ib_device *, void *client_data);
 
+	/* Returns the net_dev belonging to this ib_client and matching the
+	 * given parameters.
+	 * @dev:	 An RDMA device that the net_dev uses for communication.
+	 * @port:	 A physical port number on the RDMA device.
+	 * @pkey:	 P_Key that the net_dev uses if applicable.
+	 * @gid:	 A GID that the net_dev uses to communicate.
+	 * @addr:	 An IP address the net_dev is configured with.
+	 * @client_data: The device's client data set by ib_set_client_data().
+	 *
+	 * An ib_client that implements a net_dev on top of RDMA devices
+	 * (such as IP over IB) should implement this callback, allowing the
+	 * rdma_cm module to find the right net_dev for a given request.
+	 *
+	 * The caller is responsible for calling dev_put on the returned
+	 * netdev. */
+	struct net_device *(*get_net_dev_by_params)(
+			struct ib_device *dev,
+			u8 port,
+			u16 pkey,
+			const union ib_gid *gid,
+			const struct sockaddr *addr,
+			void *client_data);
 	struct list_head list;
 };
 
@@ -3015,4 +3038,8 @@ static inline int ib_check_mr_access(int flags)
 int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		       struct ib_mr_status *mr_status);
 
+struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
+					    u16 pkey, const union ib_gid *gid,
+					    const struct sockaddr *addr);
+
 #endif /* IB_VERBS_H */
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 04/14] IB/ipoib: Return IPoIB devices matching connection parameters
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 03/14] IB/core: Find the network device matching connection parameters Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 05/14] IB/cm: Expose service ID in request events Haggai Eran
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe, Guy Shapiro, Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement the get_net_dev_by_params callback, which returns a network
device to ib_core according to connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.

For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.
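
The two-pass lookup used by this patch (L2 parameters first, IP address
only if the L2 match is ambiguous) can be sketched in userspace roughly as
follows. This is an illustration under simplifying assumptions, not the
patch's code: the sketch_* names are hypothetical, the GID is shortened to
64 bits, child and upper devices are flattened into one array, and no
references are taken on the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one IPoIB interface. */
struct sketch_if {
	uint8_t port;
	uint16_t pkey_index;
	uint64_t gid;		/* stand-in for the 128-bit GID */
	uint32_t ipv4;
};

/*
 * Count interfaces matching the L2 parameters and, when ipv4 is non-NULL,
 * the IP address as well. Stops once two matches are seen, since the
 * caller only needs to distinguish 0, 1 and "more than 1".
 */
static int sketch_count(const struct sketch_if *ifs, size_t n,
			uint8_t port, uint16_t pkey_index, uint64_t gid,
			const uint32_t *ipv4,
			const struct sketch_if **found)
{
	int matches = 0;

	assert(found != NULL);
	*found = NULL;

	for (size_t i = 0; i < n && matches < 2; i++) {
		if (ifs[i].port != port ||
		    ifs[i].pkey_index != pkey_index ||
		    ifs[i].gid != gid)
			continue;
		if (ipv4 && ifs[i].ipv4 != *ipv4)
			continue;
		if (!*found)
			*found = &ifs[i];
		matches++;
	}
	return matches;
}

/* First try L2 parameters alone; fall back to the IP address if needed. */
const struct sketch_if *sketch_resolve(const struct sketch_if *ifs, size_t n,
				       uint8_t port, uint16_t pkey_index,
				       uint64_t gid, uint32_t ipv4)
{
	const struct sketch_if *found;
	int matches;

	matches = sketch_count(ifs, n, port, pkey_index, gid, NULL, &found);
	if (matches <= 1)
		return found;	/* NULL when nothing matched */

	/* Ambiguous at L2: use the L3 address to pick one interface. */
	sketch_count(ifs, n, port, pkey_index, gid, &ipv4, &found);
	return found;
}
```

Note that when the L2 parameters already identify a single interface, the
IP address is not consulted at all, as in the patch.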

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 229 +++++++++++++++++++++++++++++-
 1 file changed, 228 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cca1a0c91ec4..36536ce5a3e2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,9 @@
 
 #include <linux/jhash.h>
 #include <net/arp.h>
+#include <net/addrconf.h>
+#include <linux/inetdevice.h>
+#include <rdma/ib_cache.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -91,11 +94,16 @@ struct ib_sa_client ipoib_sa_client;
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device, void *client_data);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
+static struct net_device *ipoib_get_net_dev_by_params(
+		struct ib_device *dev, u8 port, u16 pkey,
+		const union ib_gid *gid, const struct sockaddr *addr,
+		void *client_data);
 
 static struct ib_client ipoib_client = {
 	.name   = "ipoib",
 	.add    = ipoib_add_one,
-	.remove = ipoib_remove_one
+	.remove = ipoib_remove_one,
+	.get_net_dev_by_params = ipoib_get_net_dev_by_params,
 };
 
 int ipoib_open(struct net_device *dev)
@@ -222,6 +230,225 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+/* Called with an RCU read lock taken */
+static bool ipoib_is_dev_match_addr_rcu(const struct sockaddr *addr,
+					struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct in_device *in_dev;
+	struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+	struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
+	__be32 ret_addr;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		in_dev = in_dev_get(dev);
+		if (!in_dev)
+			return false;
+
+		ret_addr = inet_confirm_addr(net, in_dev, 0,
+					     addr_in->sin_addr.s_addr,
+					     RT_SCOPE_HOST);
+		in_dev_put(in_dev);
+		if (ret_addr)
+			return true;
+
+		break;
+	case AF_INET6:
+		if (IS_ENABLED(CONFIG_IPV6) &&
+		    ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
+			return true;
+
+		break;
+	}
+	return false;
+}
+
+/**
+ * Find the master net_device on top of the given net_device.
+ * @dev: base IPoIB net_device
+ *
+ * Returns the master net_device with a reference held, or the same net_device
+ * if no master exists.
+ */
+static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
+{
+	struct net_device *master;
+
+	rcu_read_lock();
+	master = netdev_master_upper_dev_get_rcu(dev);
+	if (master)
+		dev_hold(master);
+	rcu_read_unlock();
+
+	if (master)
+		return master;
+
+	dev_hold(dev);
+	return dev;
+}
+
+/**
+ * Find a net_device matching the given address, which is an upper device of
+ * the given net_device.
+ * @addr: IP address to look for.
+ * @dev: base IPoIB net_device
+ *
+ * If found, returns the net_device with a reference held. Otherwise return
+ * NULL.
+ */
+static struct net_device *ipoib_get_net_dev_match_addr(
+		const struct sockaddr *addr, struct net_device *dev)
+{
+	struct net_device *upper,
+			  *result = NULL;
+	struct list_head *iter;
+
+	rcu_read_lock();
+	if (ipoib_is_dev_match_addr_rcu(addr, dev)) {
+		dev_hold(dev);
+		result = dev;
+		goto out;
+	}
+
+	netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
+		if (ipoib_is_dev_match_addr_rcu(addr, upper)) {
+			dev_hold(upper);
+			result = upper;
+			break;
+		}
+	}
+out:
+	rcu_read_unlock();
+	return result;
+}
+
+/* returns the number of IPoIB netdevs on top of a given ipoib device
+ * matching a pkey_index and address, if one exists.
+ *
+ * @found_net_dev: contains a matching net_device if the return value >= 1,
+ * with a reference held. */
+static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
+				     const union ib_gid *gid,
+				     u16 pkey_index,
+				     const struct sockaddr *addr,
+				     int nesting,
+				     struct net_device **found_net_dev)
+{
+	struct ipoib_dev_priv *child_priv;
+	struct net_device *net_dev = NULL;
+	int matches = 0;
+
+	if (priv->pkey_index == pkey_index &&
+	    (!gid || !memcmp(gid, &priv->local_gid, sizeof(*gid)))) {
+		if (!addr) {
+			net_dev = ipoib_get_master_net_dev(priv->dev);
+		} else {
+			/* Verify the net_device matches the IP address, as
+			 * IPoIB child devices currently share a GID. */
+			net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
+		}
+		if (net_dev) {
+			if (!*found_net_dev)
+				*found_net_dev = net_dev;
+			else
+				dev_put(net_dev);
+			++matches;
+		}
+	}
+
+	/* Check child interfaces */
+	down_read_nested(&priv->vlan_rwsem, nesting);
+	list_for_each_entry(child_priv, &priv->child_intfs, list) {
+		matches += ipoib_match_gid_pkey_addr(child_priv, gid,
+						    pkey_index, addr,
+						    nesting + 1,
+						    found_net_dev);
+		if (matches > 1)
+			break;
+	}
+	up_read(&priv->vlan_rwsem);
+
+	return matches;
+}
+
+/* Returns the number of matching net_devs found (between 0 and 2). Also
+ * return the matching net_device in the @net_dev parameter, holding a
+ * reference to the net_device, if the number of matches >= 1 */
+static int __ipoib_get_net_dev_by_params(struct list_head *dev_list, u8 port,
+					 u16 pkey_index,
+					 const union ib_gid *gid,
+					 const struct sockaddr *addr,
+					 struct net_device **net_dev)
+{
+	struct ipoib_dev_priv *priv;
+	int matches = 0;
+
+	*net_dev = NULL;
+
+	list_for_each_entry(priv, dev_list, list) {
+		if (priv->port != port)
+			continue;
+
+		matches += ipoib_match_gid_pkey_addr(priv, gid, pkey_index,
+						     addr, 0, net_dev);
+		if (matches > 1)
+			break;
+	}
+
+	return matches;
+}
+
+static struct net_device *ipoib_get_net_dev_by_params(
+		struct ib_device *dev, u8 port, u16 pkey,
+		const union ib_gid *gid, const struct sockaddr *addr,
+		void *client_data)
+{
+	struct net_device *net_dev;
+	struct list_head *dev_list = client_data;
+	u16 pkey_index;
+	int matches;
+	int ret;
+
+	if (!rdma_protocol_ib(dev, port))
+		return NULL;
+
+	ret = ib_find_cached_pkey(dev, port, pkey, &pkey_index);
+	if (ret)
+		return NULL;
+
+	if (!dev_list)
+		return NULL;
+
+	/* See if we can find a unique device matching the L2 parameters */
+	matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
+						gid, NULL, &net_dev);
+
+	switch (matches) {
+	case 0:
+		return NULL;
+	case 1:
+		return net_dev;
+	}
+
+	dev_put(net_dev);
+
+	/* Couldn't find a unique device with L2 parameters only. Use L3
+	 * address to uniquely match the net device */
+	matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
+						gid, addr, &net_dev);
+	switch (matches) {
+	case 0:
+		return NULL;
+	default:
+		dev_warn_ratelimited(&dev->dev,
+				     "duplicate IP address detected\n");
+		/* Fall through */
+	case 1:
+		return net_dev;
+	}
+}
+
 int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 05/14] IB/cm: Expose service ID in request events
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 04/14] IB/ipoib: Return IPoIB devices " Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 06/14] IB/cm: Share listening CM IDs Haggai Eran
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.

Acked-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c | 3 +++
 include/rdma/ib_cm.h         | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 82d5c4362aa8..93e9e2f34fc6 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1268,6 +1268,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	primary_path->packet_life_time =
 		cm_req_get_primary_local_ack_timeout(req_msg);
 	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+	primary_path->service_id = req_msg->service_id;
 
 	if (req_msg->alt_local_lid) {
 		memset(alt_path, 0, sizeof *alt_path);
@@ -1289,6 +1290,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->packet_life_time =
 			cm_req_get_alt_local_ack_timeout(req_msg);
 		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+		alt_path->service_id = req_msg->service_id;
 	}
 }
 
@@ -2992,6 +2994,7 @@ static void cm_format_sidr_req_event(struct cm_work *work,
 	param = &work->cm_event.param.sidr_req_rcvd;
 	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
 	param->listen_id = listen_id;
+	param->service_id = sidr_req_msg->service_id;
 	param->port = work->port->port_num;
 	work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 39ed2d2fbd51..1b567bbc3ad4 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -223,6 +223,7 @@ struct ib_cm_apr_event_param {
 
 struct ib_cm_sidr_req_event_param {
 	struct ib_cm_id		*listen_id;
+	__be64			service_id;
 	u8			port;
 	u16			pkey;
 };
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 06/14] IB/cm: Share listening CM IDs
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 05/14] IB/cm: Expose service ID in request events Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 07/14] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Enabling network namespaces for RDMA CM will allow processes in different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs be able to
share a single CM ID.

This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM ID while it is still shared. See the relevant
discussion [1].

[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
    http://www.spinics.net/lists/netdev/msg328860.html
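
To make the sharing scheme concrete, here is a minimal single-threaded
userspace sketch of a listener table keyed by device and service ID, with
a share count that defers removal until the last user is gone. It is an
illustration only: the sketch_* names are hypothetical, the device is
reduced to an integer, and the cm.lock spinlock and rb-tree of the real
code are replaced by a small unlocked array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LISTENERS 8

typedef int (*sketch_handler)(void *event);

/* Two distinct example handlers. */
int sketch_handler_a(void *event) { (void)event; return 0; }
int sketch_handler_b(void *event) { (void)event; return 1; }

struct sketch_listener {
	int in_use;
	int device;		/* stand-in for struct ib_device * */
	uint64_t service_id;
	sketch_handler handler;
	int sharecount;		/* number of users sharing this listener */
};

static struct sketch_listener sketch_table[SKETCH_MAX_LISTENERS];

/*
 * Return the existing listener for (device, service_id) with its share
 * count bumped, or create a new one. Returns NULL if an existing listener
 * uses a different handler, or if the table is full.
 */
struct sketch_listener *sketch_insert_listen(int device,
					     sketch_handler handler,
					     uint64_t service_id)
{
	struct sketch_listener *free_slot = NULL;

	for (size_t i = 0; i < SKETCH_MAX_LISTENERS; i++) {
		struct sketch_listener *l = &sketch_table[i];

		if (!l->in_use) {
			if (!free_slot)
				free_slot = l;
			continue;
		}
		if (l->device == device && l->service_id == service_id) {
			if (l->handler != handler)
				return NULL;	/* sharing not supported */
			l->sharecount++;
			return l;
		}
	}

	if (!free_slot)
		return NULL;

	free_slot->in_use = 1;
	free_slot->device = device;
	free_slot->service_id = service_id;
	free_slot->handler = handler;
	free_slot->sharecount = 1;
	return free_slot;
}

/* Drop one user; returns 1 only when the listener was really removed. */
int sketch_destroy_listen(struct sketch_listener *l)
{
	assert(l != NULL && l->sharecount > 0);

	if (--l->sharecount > 0)
		return 0;	/* still shared */

	l->in_use = 0;
	return 1;
}
```

Every successful sketch_insert_listen() call is expected to be paired with
one sketch_destroy_listen() call, in the same way callers of
ib_cm_insert_listen() are told to call ib_destroy_cm_id().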

Reviewed-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c | 122 ++++++++++++++++++++++++++++++++++++++++---
 include/rdma/ib_cm.h         |   4 ++
 2 files changed, 120 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 93e9e2f34fc6..fa3d3e755127 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -213,6 +213,9 @@ struct cm_id_private {
 	spinlock_t lock;	/* Do not acquire inside cm.lock */
 	struct completion comp;
 	atomic_t refcount;
+	/* Number of clients sharing this ib_cm_id. Only valid for listeners.
+	 * Protected by the cm.lock spinlock. */
+	int listen_sharecount;
 
 	struct ib_mad_send_buf *msg;
 	struct cm_timewait_info *timewait_info;
@@ -859,9 +862,15 @@ retest:
 	spin_lock_irq(&cm_id_priv->lock);
 	switch (cm_id->state) {
 	case IB_CM_LISTEN:
-		cm_id->state = IB_CM_IDLE;
 		spin_unlock_irq(&cm_id_priv->lock);
+
 		spin_lock_irq(&cm.lock);
+		if (--cm_id_priv->listen_sharecount > 0) {
+			/* The id is still shared. */
+			cm_deref_id(cm_id_priv);
+			spin_unlock_irq(&cm.lock);
+			return;
+		}
 		rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
 		spin_unlock_irq(&cm.lock);
 		break;
@@ -941,11 +950,32 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data)
+/**
+ * __ib_cm_listen - Initiates listening on the specified service ID for
+ *   connection and service ID resolution requests.
+ * @cm_id: Connection identifier associated with the listen request.
+ * @service_id: Service identifier matched against incoming connection
+ *   and service ID resolution requests.  The service ID should be specified
+ *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
+ *   assign a service ID to the caller.
+ * @service_mask: Mask applied to service ID used to listen across a
+ *   range of service IDs.  If set to 0, the service ID is matched
+ *   exactly.  This parameter is ignored if %service_id is set to
+ *   IB_CM_ASSIGN_SERVICE_ID.
+ * @compare_data: This parameter is optional.  It specifies data that must
+ *   appear in the private data of a connection request for the specified
+ *   listen request.
+ * @lock: If set, lock the cm.lock spin-lock when adding the id to the
+ *   listener tree. When false, the caller must already hold the spin-lock,
+ *   and compare_data must be NULL.
+ */
+static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
+			  __be64 service_mask,
+			  struct ib_cm_compare_data *compare_data,
+			  bool lock)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-	unsigned long flags;
+	unsigned long flags = 0;
 	int ret = 0;
 
 	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@@ -970,8 +1000,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 	}
 
 	cm_id->state = IB_CM_LISTEN;
+	if (lock)
+		spin_lock_irqsave(&cm.lock, flags);
 
-	spin_lock_irqsave(&cm.lock, flags);
+	++cm_id_priv->listen_sharecount;
 	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
 		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
 		cm_id->service_mask = ~cpu_to_be64(0);
@@ -980,18 +1012,96 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 		cm_id->service_mask = service_mask;
 	}
 	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
-	spin_unlock_irqrestore(&cm.lock, flags);
 
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
+		--cm_id_priv->listen_sharecount;
 		kfree(cm_id_priv->compare_data);
 		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
+
+	if (lock)
+		spin_unlock_irqrestore(&cm.lock, flags);
+
 	return ret;
 }
+
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+		 struct ib_cm_compare_data *compare_data)
+{
+	return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
+			      true);
+}
 EXPORT_SYMBOL(ib_cm_listen);
 
+/**
+ * Create a new listening ib_cm_id and listen on the given service ID.
+ *
+ * If there's an existing ID listening on that same device and service ID,
+ * return it.
+ *
+ * @device: Device associated with the cm_id.  All related communication will
+ * be associated with the specified device.
+ * @cm_handler: Callback invoked to notify the user of CM events.
+ * @service_id: Service identifier matched against incoming connection
+ *   and service ID resolution requests.  The service ID should be specified
+ *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
+ *   assign a service ID to the caller.
+ *
+ * Callers should call ib_destroy_cm_id when done with the listener ID.
+ */
+struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
+				     ib_cm_handler cm_handler,
+				     __be64 service_id)
+{
+	struct cm_id_private *cm_id_priv;
+	struct ib_cm_id *cm_id;
+	unsigned long flags;
+	int err = 0;
+
+	/* Create an ID in advance, since the creation may sleep */
+	cm_id = ib_create_cm_id(device, cm_handler, NULL);
+	if (IS_ERR(cm_id))
+		return cm_id;
+
+	spin_lock_irqsave(&cm.lock, flags);
+
+	if (service_id == IB_CM_ASSIGN_SERVICE_ID)
+		goto new_id;
+
+	/* Find an existing ID */
+	cm_id_priv = cm_find_listen(device, service_id, NULL);
+	if (cm_id_priv) {
+		if (cm_id_priv->id.cm_handler != cm_handler ||
+		    cm_id_priv->id.context) {
+			/* Sharing with different handlers is unsupported */
+			spin_unlock_irqrestore(&cm.lock, flags);
+			return ERR_PTR(-EINVAL);
+		}
+		atomic_inc(&cm_id_priv->refcount);
+		++cm_id_priv->listen_sharecount;
+		spin_unlock_irqrestore(&cm.lock, flags);
+
+		ib_destroy_cm_id(cm_id);
+		cm_id = &cm_id_priv->id;
+		return cm_id;
+	}
+
+new_id:
+	/* Use newly created ID */
+	err = __ib_cm_listen(cm_id, service_id, 0, NULL, false);
+
+	spin_unlock_irqrestore(&cm.lock, flags);
+
+	if (err) {
+		ib_destroy_cm_id(cm_id);
+		return ERR_PTR(err);
+	}
+	return cm_id;
+}
+EXPORT_SYMBOL(ib_cm_insert_listen);
+
 static __be64 cm_form_tid(struct cm_id_private *cm_id_priv,
 			  enum cm_msg_sequence msg_seq)
 {
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 1b567bbc3ad4..9cc496e1f2ad 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -362,6 +362,10 @@ struct ib_cm_compare_data {
 int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 		 struct ib_cm_compare_data *compare_data);
 
+struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
+				     ib_cm_handler cm_handler,
+				     __be64 service_id);
+
 struct ib_cm_req_param {
 	struct ib_sa_path_rec	*primary_path;
 	struct ib_sa_path_rec	*alternate_path;
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 07/14] IB/cma: Refactor RDMA IP CM private-data parsing code
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 06/14] IB/cm: Share listening CM IDs Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 08/14] IB/cma: Helper functions to access port space IDRs Haggai Eran
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe, Guy Shapiro, Yotam Kenneth, Shachar Raindel

When receiving a connection request, rdma_cm needs to associate the request
with a network device in order to disambiguate requests. To do this, it
needs to know the request's destination IP address. The module must
therefore be able to read this information from the private data in the
request packet, instead of relying on the information already being in the
listening RDMA CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.
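
As a small userspace illustration of reading the port from the service ID
(not the patch's code), the sketch below takes the low 16 bits of a
big-endian 64-bit service ID, which is what (u16)be64_to_cpu(service_id)
yields, and stores them as the local port of an IPv4 source address. The
sketch_* names are hypothetical and POSIX socket headers are assumed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * The service ID is given as 8 bytes in network byte order, so the low
 * 16 bits are the last two bytes. Returns the port in host byte order.
 */
uint16_t sketch_port_from_service_id(const uint8_t sid[8])
{
	return (uint16_t)((sid[6] << 8) | sid[7]);
}

/*
 * Fill the local (source) address of a new connection: the IP address
 * comes from the request's destination field (already in network byte
 * order), the port from the requested service ID.
 */
void sketch_save_ip4_src(struct sockaddr_in *src, uint32_t req_dst_ip_be,
			 const uint8_t sid[8])
{
	assert(src != NULL);

	memset(src, 0, sizeof(*src));
	src->sin_family = AF_INET;
	src->sin_addr.s_addr = req_dst_ip_be;
	src->sin_port = htons(sketch_port_from_service_id(sid));
}
```

Only the two low-order bytes of the service ID are interpreted here; the
upper bytes are ignored by this sketch.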

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 170 ++++++++++++++++++++++++++----------------
 1 file changed, 105 insertions(+), 65 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6b6cdfa5d231..cf5c48b0b7d5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -870,107 +870,138 @@ static inline int cma_any_port(struct sockaddr *addr)
 	return !cma_port(addr);
 }
 
-static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
+static void cma_save_ib_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct rdma_cm_id *listen_id,
 			     struct ib_sa_path_rec *path)
 {
 	struct sockaddr_ib *listen_ib, *ib;
 
 	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
-	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
-	ib->sib_family = listen_ib->sib_family;
-	if (path) {
-		ib->sib_pkey = path->pkey;
-		ib->sib_flowinfo = path->flow_label;
-		memcpy(&ib->sib_addr, &path->sgid, 16);
-	} else {
-		ib->sib_pkey = listen_ib->sib_pkey;
-		ib->sib_flowinfo = listen_ib->sib_flowinfo;
-		ib->sib_addr = listen_ib->sib_addr;
-	}
-	ib->sib_sid = listen_ib->sib_sid;
-	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
-	ib->sib_scope_id = listen_ib->sib_scope_id;
-
-	if (path) {
-		ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
-		ib->sib_family = listen_ib->sib_family;
-		ib->sib_pkey = path->pkey;
-		ib->sib_flowinfo = path->flow_label;
-		memcpy(&ib->sib_addr, &path->dgid, 16);
+	if (src_addr) {
+		ib = (struct sockaddr_ib *)src_addr;
+		ib->sib_family = AF_IB;
+		if (path) {
+			ib->sib_pkey = path->pkey;
+			ib->sib_flowinfo = path->flow_label;
+			memcpy(&ib->sib_addr, &path->sgid, 16);
+			ib->sib_sid = path->service_id;
+			ib->sib_scope_id = 0;
+		} else {
+			ib->sib_pkey = listen_ib->sib_pkey;
+			ib->sib_flowinfo = listen_ib->sib_flowinfo;
+			ib->sib_addr = listen_ib->sib_addr;
+			ib->sib_sid = listen_ib->sib_sid;
+			ib->sib_scope_id = listen_ib->sib_scope_id;
+		}
+		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+	}
+	if (dst_addr) {
+		ib = (struct sockaddr_ib *)dst_addr;
+		ib->sib_family = AF_IB;
+		if (path) {
+			ib->sib_pkey = path->pkey;
+			ib->sib_flowinfo = path->flow_label;
+			memcpy(&ib->sib_addr, &path->dgid, 16);
+		}
 	}
 }
 
-static __be16 ss_get_port(const struct sockaddr_storage *ss)
-{
-	if (ss->ss_family == AF_INET)
-		return ((struct sockaddr_in *)ss)->sin_port;
-	else if (ss->ss_family == AF_INET6)
-		return ((struct sockaddr_in6 *)ss)->sin6_port;
-	BUG();
-}
-
-static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
+static void cma_save_ip4_info(struct sockaddr *src_addr,
+			      struct sockaddr *dst_addr,
+			      struct cma_hdr *hdr,
+			      __be16 local_port)
 {
 	struct sockaddr_in *ip4;
 
-	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-	ip4->sin_port = ss_get_port(&listen_id->route.addr.src_addr);
+	if (src_addr) {
+		ip4 = (struct sockaddr_in *)src_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
+		ip4->sin_port = local_port;
+	}
 
-	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
-	ip4->sin_port = hdr->port;
+	if (dst_addr) {
+		ip4 = (struct sockaddr_in *)dst_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
+		ip4->sin_port = hdr->port;
+	}
 }
 
-static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
+static void cma_save_ip6_info(struct sockaddr *src_addr,
+			      struct sockaddr *dst_addr,
+			      struct cma_hdr *hdr,
+			      __be16 local_port)
 {
 	struct sockaddr_in6 *ip6;
 
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->dst_addr.ip6;
-	ip6->sin6_port = ss_get_port(&listen_id->route.addr.src_addr);
+	if (src_addr) {
+		ip6 = (struct sockaddr_in6 *)src_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->dst_addr.ip6;
+		ip6->sin6_port = local_port;
+	}
 
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->src_addr.ip6;
-	ip6->sin6_port = hdr->port;
+	if (dst_addr) {
+		ip6 = (struct sockaddr_in6 *)dst_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->src_addr.ip6;
+		ip6->sin6_port = hdr->port;
+	}
 }
 
-static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_cm_event *ib_event)
+static u16 cma_port_from_service_id(__be64 service_id)
 {
-	struct cma_hdr *hdr;
+	return (u16)be64_to_cpu(service_id);
+}
 
-	if (listen_id->route.addr.src_addr.ss_family == AF_IB) {
-		if (ib_event->event == IB_CM_REQ_RECEIVED)
-			cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
-		else if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED)
-			cma_save_ib_info(id, listen_id, NULL);
-		return 0;
-	}
+static int cma_save_ip_info(struct sockaddr *src_addr,
+			    struct sockaddr *dst_addr,
+			    struct ib_cm_event *ib_event,
+			    __be64 service_id)
+{
+	struct cma_hdr *hdr;
+	__be16 port;
 
 	hdr = ib_event->private_data;
 	if (hdr->cma_version != CMA_VERSION)
 		return -EINVAL;
 
+	port = htons(cma_port_from_service_id(service_id));
+
 	switch (cma_get_ip_ver(hdr)) {
 	case 4:
-		cma_save_ip4_info(id, listen_id, hdr);
+		cma_save_ip4_info(src_addr, dst_addr, hdr, port);
 		break;
 	case 6:
-		cma_save_ip6_info(id, listen_id, hdr);
+		cma_save_ip6_info(src_addr, dst_addr, hdr, port);
 		break;
 	default:
 		return -EINVAL;
 	}
+
 	return 0;
 }
 
+static int cma_save_net_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct rdma_cm_id *listen_id,
+			     struct ib_cm_event *ib_event,
+			     sa_family_t sa_family, __be64 service_id)
+{
+	if (sa_family == AF_IB) {
+		if (ib_event->event == IB_CM_REQ_RECEIVED)
+			cma_save_ib_info(src_addr, dst_addr, listen_id,
+					 ib_event->param.req_rcvd.primary_path);
+		else if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED)
+			cma_save_ib_info(src_addr, dst_addr, listen_id, NULL);
+		return 0;
+	}
+
+	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
+}
+
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
 	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
@@ -1221,6 +1252,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
 	struct rdma_route *rt;
+	const sa_family_t ss_family = listen_id->route.addr.src_addr.ss_family;
+	const __be64 service_id =
+		      ib_event->param.req_rcvd.primary_path->service_id;
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
@@ -1229,7 +1263,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			      (struct sockaddr *)&id->route.addr.dst_addr,
+			      listen_id, ib_event, ss_family, service_id))
 		goto err;
 
 	rt = &id->route;
@@ -1267,6 +1303,7 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
+	const sa_family_t ss_family = listen_id->route.addr.src_addr.ss_family;
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
@@ -1275,7 +1312,10 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			      (struct sockaddr *)&id->route.addr.dst_addr,
+			      listen_id, ib_event, ss_family,
+			      ib_event->param.sidr_req_rcvd.service_id))
 		goto err;
 
 	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 08/14] IB/cma: Helper functions to access port space IDRs
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 07/14] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events Haggai Eran
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe, Yotam Kenneth, Shachar Raindel, Guy Shapiro

Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
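Reader's note: the sketch below is a userspace analogue of the helper
pattern this patch introduces, not the kernel code. It assumes a flat
per-port array (struct port_table, a hypothetical stand-in for struct idr)
and hypothetical helper names ps_table(), ps_alloc(), ps_find() and
ps_remove(). The enum values are the ones from include/rdma/rdma_cm.h.
Unlike the helpers in the patch, the sketch also checks for an unknown
port space before touching a table.

```c
#include <assert.h>
#include <stddef.h>

/* Port-space values as defined in include/rdma/rdma_cm.h. */
enum rdma_port_space {
	RDMA_PS_IPOIB = 0x0002,
	RDMA_PS_TCP   = 0x0106,
	RDMA_PS_UDP   = 0x0111,
	RDMA_PS_IB    = 0x013F,
};

#define NUM_PORTS 65536

/* Hypothetical stand-in for struct idr: one slot per port number. */
struct port_table {
	void *slot[NUM_PORTS];
};

static struct port_table tcp_ps, udp_ps, ipoib_ps, ib_ps;

/* Map a port-space enum to its table; NULL for an unknown port space. */
static struct port_table *ps_table(enum rdma_port_space ps)
{
	switch (ps) {
	case RDMA_PS_TCP:
		return &tcp_ps;
	case RDMA_PS_UDP:
		return &udp_ps;
	case RDMA_PS_IPOIB:
		return &ipoib_ps;
	case RDMA_PS_IB:
		return &ib_ps;
	default:
		return NULL;
	}
}

/* Claim exactly port snum; returns snum, or -1 if taken or ps unknown. */
static int ps_alloc(enum rdma_port_space ps, void *owner, int snum)
{
	struct port_table *t = ps_table(ps);

	assert(owner != NULL);
	if (!t || snum < 0 || snum >= NUM_PORTS || t->slot[snum])
		return -1;
	t->slot[snum] = owner;
	return snum;
}

/* Return the owner bound to snum in this port space, or NULL. */
static void *ps_find(enum rdma_port_space ps, int snum)
{
	struct port_table *t = ps_table(ps);

	if (!t || snum < 0 || snum >= NUM_PORTS)
		return NULL;
	return t->slot[snum];
}

/* Release snum in this port space; a no-op for an unknown port space. */
static void ps_remove(enum rdma_port_space ps, int snum)
{
	struct port_table *t = ps_table(ps);

	if (t && snum >= 0 && snum < NUM_PORTS)
		t->slot[snum] = NULL;
}
```

Callers carry only the enum, so the same port number can be bound once in
each port space, and the choice of table stays in one switch statement.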
 drivers/infiniband/core/cma.c | 81 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cf5c48b0b7d5..f2d799209412 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -113,6 +113,22 @@ static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
 static DEFINE_IDR(ib_ps);
 
+static struct idr *cma_idr(enum rdma_port_space ps)
+{
+	switch (ps) {
+	case RDMA_PS_TCP:
+		return &tcp_ps;
+	case RDMA_PS_UDP:
+		return &udp_ps;
+	case RDMA_PS_IPOIB:
+		return &ipoib_ps;
+	case RDMA_PS_IB:
+		return &ib_ps;
+	default:
+		return NULL;
+	}
+}
+
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
@@ -122,11 +138,33 @@ struct cma_device {
 };
 
 struct rdma_bind_list {
-	struct idr		*ps;
+	enum rdma_port_space	ps;
 	struct hlist_head	owners;
 	unsigned short		port;
 };
 
+static int cma_ps_alloc(enum rdma_port_space ps,
+			struct rdma_bind_list *bind_list, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL);
+}
+
+static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	return idr_find(idr, snum);
+}
+
+static void cma_ps_remove(enum rdma_port_space ps, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	idr_remove(idr, snum);
+}
+
 enum {
 	CMA_OPTION_AFONLY,
 };
@@ -1069,7 +1107,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		idr_remove(bind_list->ps, bind_list->port);
+		cma_ps_remove(bind_list->ps, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -2365,8 +2403,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
 	hlist_add_head(&id_priv->node, &bind_list->owners);
 }
 
-static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
-			  unsigned short snum)
+static int cma_alloc_port(enum rdma_port_space ps,
+			  struct rdma_id_private *id_priv, unsigned short snum)
 {
 	struct rdma_bind_list *bind_list;
 	int ret;
@@ -2375,7 +2413,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	ret = cma_ps_alloc(ps, bind_list, snum);
 	if (ret < 0)
 		goto err;
 
@@ -2388,7 +2426,8 @@ err:
 	return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
-static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_alloc_any_port(enum rdma_port_space ps,
+			      struct rdma_id_private *id_priv)
 {
 	static unsigned int last_used_port;
 	int low, high, remaining;
@@ -2399,7 +2438,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !idr_find(ps, (unsigned short) rover)) {
+	    !cma_ps_find(ps, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2454,7 +2493,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
 	return 0;
 }
 
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_use_port(enum rdma_port_space ps,
+			struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list;
 	unsigned short snum;
@@ -2464,7 +2504,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = idr_find(ps, snum);
+	bind_list = cma_ps_find(ps, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2487,25 +2527,24 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
 	return ret;
 }
 
-static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
+static enum rdma_port_space cma_select_inet_ps(
+		struct rdma_id_private *id_priv)
 {
 	switch (id_priv->id.ps) {
 	case RDMA_PS_TCP:
-		return &tcp_ps;
 	case RDMA_PS_UDP:
-		return &udp_ps;
 	case RDMA_PS_IPOIB:
-		return &ipoib_ps;
 	case RDMA_PS_IB:
-		return &ib_ps;
+		return id_priv->id.ps;
 	default:
-		return NULL;
+
+		return 0;
 	}
 }
 
-static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
+static enum rdma_port_space cma_select_ib_ps(struct rdma_id_private *id_priv)
 {
-	struct idr *ps = NULL;
+	enum rdma_port_space ps = 0;
 	struct sockaddr_ib *sib;
 	u64 sid_ps, mask, sid;
 
@@ -2515,15 +2554,15 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 	if ((id_priv->id.ps == RDMA_PS_IB) && (sid == (RDMA_IB_IP_PS_IB & mask))) {
 		sid_ps = RDMA_IB_IP_PS_IB;
-		ps = &ib_ps;
+		ps = RDMA_PS_IB;
 	} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_TCP)) &&
 		   (sid == (RDMA_IB_IP_PS_TCP & mask))) {
 		sid_ps = RDMA_IB_IP_PS_TCP;
-		ps = &tcp_ps;
+		ps = RDMA_PS_TCP;
 	} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_UDP)) &&
 		   (sid == (RDMA_IB_IP_PS_UDP & mask))) {
 		sid_ps = RDMA_IB_IP_PS_UDP;
-		ps = &udp_ps;
+		ps = RDMA_PS_UDP;
 	}
 
 	if (ps) {
@@ -2536,7 +2575,7 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 static int cma_get_port(struct rdma_id_private *id_priv)
 {
-	struct idr *ps;
+	enum rdma_port_space ps;
 	int ret;
 
 	if (cma_family(id_priv) != AF_IB)
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 08/14] IB/cma: Helper functions to access port space IDRs Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
       [not found]     ` <1438267826-32155-10-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-30 14:50   ` [PATCH v4 10/14] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
                     ` (5 subsequent siblings)
  14 siblings, 1 reply; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.

See discussion at:
  http://www.spinics.net/lists/netdev/msg336067.html

Cc: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Cc: Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
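Reader's note: the receive work completion carries only a P_Key table
index, so the helper added here has to translate that index into a P_Key
value. The sketch below shows that translation and the fallback to 0 in
userspace C. The table type (struct pkey_table) and both function names
are hypothetical; in the kernel the lookup is done by ib_get_cached_pkey().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port P_Key table, standing in for the kernel's cache. */
struct pkey_table {
	size_t   len;
	uint16_t pkey[8];
};

/* Look up a P_Key by table index; -EINVAL for an out-of-range index. */
static int pkey_by_index(const struct pkey_table *tbl, size_t index,
			 uint16_t *pkey)
{
	assert(tbl != NULL && pkey != NULL);
	if (index >= tbl->len)
		return -EINVAL;
	*pkey = tbl->pkey[index];
	return 0;
}

/* Same fallback as cm_get_bth_pkey() in this patch: 0 when lookup fails. */
static uint16_t bth_pkey_or_zero(const struct pkey_table *tbl, size_t index)
{
	uint16_t pkey;

	if (pkey_by_index(tbl, index, &pkey))
		return 0;
	return pkey;
}
```

A consumer of bth_pkey therefore cannot tell a failed lookup from a table
entry that really holds 0, which is worth keeping in mind when de-muxing
on the value.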
 drivers/infiniband/core/cm.c | 20 ++++++++++++++++++++
 include/rdma/ib_cm.h         |  6 ++++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index fa3d3e755127..d2b2c83f0076 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1404,6 +1404,24 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	}
 }
 
+static u16 cm_get_bth_pkey(struct cm_work *work)
+{
+	struct ib_device *ib_dev = work->port->cm_dev->ib_device;
+	u8 port_num = work->port->port_num;
+	u16 pkey_index = work->mad_recv_wc->wc->pkey_index;
+	u16 pkey;
+	int ret;
+
+	ret = ib_get_cached_pkey(ib_dev, port_num, pkey_index, &pkey);
+	if (ret) {
+		dev_warn_ratelimited(&ib_dev->dev, "ib_cm: Couldn't retrieve pkey for incoming request (port %d, pkey index %d). %d\n",
+				     port_num, pkey_index, ret);
+		return 0;
+	}
+
+	return pkey;
+}
+
 static void cm_format_req_event(struct cm_work *work,
 				struct cm_id_private *cm_id_priv,
 				struct ib_cm_id *listen_id)
@@ -1414,6 +1432,7 @@ static void cm_format_req_event(struct cm_work *work,
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
 	param = &work->cm_event.param.req_rcvd;
 	param->listen_id = listen_id;
+	param->bth_pkey = cm_get_bth_pkey(work);
 	param->port = cm_id_priv->av.port->port_num;
 	param->primary_path = &work->path[0];
 	if (req_msg->alt_local_lid)
@@ -3105,6 +3124,7 @@ static void cm_format_sidr_req_event(struct cm_work *work,
 	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
 	param->listen_id = listen_id;
 	param->service_id = sidr_req_msg->service_id;
+	param->bth_pkey = cm_get_bth_pkey(work);
 	param->port = work->port->port_num;
 	work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 9cc496e1f2ad..e3f48632e237 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -113,6 +113,10 @@ struct ib_cm_id;
 
 struct ib_cm_req_event_param {
 	struct ib_cm_id		*listen_id;
+
+	/* P_Key that was used by the GMP's BTH header */
+	u16			bth_pkey;
+
 	u8			port;
 
 	struct ib_sa_path_rec	*primary_path;
@@ -224,6 +228,8 @@ struct ib_cm_apr_event_param {
 struct ib_cm_sidr_req_event_param {
 	struct ib_cm_id		*listen_id;
 	__be64			service_id;
+	/* P_Key that was used by the GMP's BTH header */
+	u16			bth_pkey;
 	u8			port;
 	u16			pkey;
 };
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 10/14] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 11/14] IB/cma: Validate routing of incoming requests Haggai Eran
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Instead of relying on the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
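Reader's note: the listener lookup in this patch is keyed on two fields
packed into the 64-bit service ID, and the private-data match starts from
the IP version nibble. The userspace sketch below decodes all three. The
function names are hypothetical; the bit positions are the ones used by
rdma_ps_from_service_id(), cma_port_from_service_id() and cma_get_ip_ver()
in the diff. The service ID is taken as 8 bytes in wire (big-endian) order.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a host-order value from 8 big-endian (wire order) bytes. */
static uint64_t be64_bytes_to_host(const uint8_t b[8])
{
	uint64_t v = 0;
	int i;

	assert(b != 0);
	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Bits 16..31 of the service ID select the port space. */
static uint16_t ps_from_service_id(uint64_t sid)
{
	return (uint16_t)((sid >> 16) & 0xffff);
}

/* The low 16 bits of the service ID carry the port number. */
static uint16_t port_from_service_id(uint64_t sid)
{
	return (uint16_t)sid;
}

/* The IP version sits in the high nibble of the header's ip_version byte. */
static uint8_t ip_ver_from_hdr_byte(uint8_t ip_version)
{
	return (uint8_t)(ip_version >> 4);
}
```

For example, the wire bytes 00 00 00 00 01 06 1d 2f decode to port space
0x0106 (RDMA_PS_TCP) and port 7471, which is the pair handed to
cma_ps_find() to locate the bind list.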
 drivers/infiniband/core/cma.c | 188 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 185 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f2d799209412..011aa7310dd3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -263,6 +263,15 @@ struct cma_hdr {
 
 #define CMA_VERSION 0x00
 
+struct cma_req_info {
+	struct ib_device *device;
+	int port;
+	union ib_gid local_gid;
+	__be64 service_id;
+	u16 pkey;
+	bool has_gid:1;
+};
+
 static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
 {
 	unsigned long flags;
@@ -300,7 +309,7 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
 	return old;
 }
 
-static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
+static inline u8 cma_get_ip_ver(const struct cma_hdr *hdr)
 {
 	return hdr->ip_version >> 4;
 }
@@ -1016,7 +1025,7 @@ static int cma_save_ip_info(struct sockaddr *src_addr,
 		cma_save_ip6_info(src_addr, dst_addr, hdr, port);
 		break;
 	default:
-		return -EINVAL;
+		return -EAFNOSUPPORT;
 	}
 
 	return 0;
@@ -1040,6 +1049,176 @@ static int cma_save_net_info(struct sockaddr *src_addr,
 	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
 }
 
+static int cma_save_req_info(const struct ib_cm_event *ib_event,
+			     struct cma_req_info *req)
+{
+	const struct ib_cm_req_event_param *req_param =
+		&ib_event->param.req_rcvd;
+	const struct ib_cm_sidr_req_event_param *sidr_param =
+		&ib_event->param.sidr_req_rcvd;
+
+	switch (ib_event->event) {
+	case IB_CM_REQ_RECEIVED:
+		req->device	= req_param->listen_id->device;
+		req->port	= req_param->port;
+		memcpy(&req->local_gid, &req_param->primary_path->sgid,
+		       sizeof(req->local_gid));
+		req->has_gid	= true;
+		req->service_id	= req_param->primary_path->service_id;
+		req->pkey	= req_param->bth_pkey;
+		break;
+	case IB_CM_SIDR_REQ_RECEIVED:
+		req->device	= sidr_param->listen_id->device;
+		req->port	= sidr_param->port;
+		req->has_gid	= false;
+		req->service_id	= sidr_param->service_id;
+		req->pkey	= sidr_param->bth_pkey;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
+					  const struct cma_req_info *req)
+{
+	struct sockaddr_storage listen_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct net_device *net_dev;
+	const union ib_gid *gid = req->has_gid ? &req->local_gid : NULL;
+	int err;
+
+	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	if (err)
+		return ERR_PTR(err);
+
+	net_dev = ib_get_net_dev_by_params(req->device, req->port, req->pkey,
+					   gid, listen_addr);
+	if (!net_dev)
+		return ERR_PTR(-ENODEV);
+
+	return net_dev;
+}
+
+static enum rdma_port_space rdma_ps_from_service_id(__be64 service_id)
+{
+	return (be64_to_cpu(service_id) >> 16) & 0xffff;
+}
+
+static bool cma_match_private_data(struct rdma_id_private *id_priv,
+				   const struct cma_hdr *hdr)
+{
+	struct sockaddr *addr = cma_src_addr(id_priv);
+	__be32 ip4_addr;
+	struct in6_addr ip6_addr;
+
+	if (cma_any_addr(addr) && !id_priv->afonly)
+		return true;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		ip4_addr = ((struct sockaddr_in *)addr)->sin_addr.s_addr;
+		if (cma_get_ip_ver(hdr) != 4)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    hdr->dst_addr.ip4.addr != ip4_addr)
+			return false;
+		break;
+	case AF_INET6:
+		ip6_addr = ((struct sockaddr_in6 *)addr)->sin6_addr;
+		if (cma_get_ip_ver(hdr) != 6)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    memcmp(&hdr->dst_addr.ip6, &ip6_addr, sizeof(ip6_addr)))
+			return false;
+		break;
+	case AF_IB:
+		return true;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
+			      const struct net_device *net_dev)
+{
+	const struct rdma_addr *addr = &id_priv->id.route.addr;
+
+	if (!net_dev)
+		/* This request is an AF_IB request */
+		return addr->src_addr.ss_family == AF_IB;
+
+	return !addr->dev_addr.bound_dev_if ||
+	       (net_eq(dev_net(net_dev), &init_net) &&
+		addr->dev_addr.bound_dev_if == net_dev->ifindex);
+}
+
+static struct rdma_id_private *cma_find_listener(
+		const struct rdma_bind_list *bind_list,
+		const struct ib_cm_id *cm_id,
+		const struct ib_cm_event *ib_event,
+		const struct cma_req_info *req,
+		const struct net_device *net_dev)
+{
+	struct rdma_id_private *id_priv, *id_priv_dev;
+
+	if (!bind_list)
+		return ERR_PTR(-EINVAL);
+
+	hlist_for_each_entry(id_priv, &bind_list->owners, node) {
+		if (cma_match_private_data(id_priv, ib_event->private_data)) {
+			if (id_priv->id.device == cm_id->device &&
+			    cma_match_net_dev(id_priv, net_dev))
+				return id_priv;
+			list_for_each_entry(id_priv_dev,
+					    &id_priv->listen_list,
+					    listen_list) {
+				if (id_priv_dev->id.device == cm_id->device &&
+				    cma_match_net_dev(id_priv_dev, net_dev))
+					return id_priv_dev;
+			}
+		}
+	}
+
+	return ERR_PTR(-EINVAL);
+}
+
+static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
+						 struct ib_cm_event *ib_event)
+{
+	struct cma_req_info req;
+	struct rdma_bind_list *bind_list;
+	struct rdma_id_private *id_priv;
+	struct net_device *net_dev;
+	int err;
+
+	err = cma_save_req_info(ib_event, &req);
+	if (err)
+		return ERR_PTR(err);
+
+	net_dev = cma_get_net_dev(ib_event, &req);
+	if (IS_ERR(net_dev)) {
+		if (PTR_ERR(net_dev) == -EAFNOSUPPORT) {
+			/* Assuming the protocol is AF_IB */
+			net_dev = NULL;
+		} else {
+			return ERR_PTR(PTR_ERR(net_dev));
+		}
+	}
+
+	bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
+				cma_port_from_service_id(req.service_id));
+	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, net_dev);
+
+	dev_put(net_dev);
+
+	return id_priv;
+}
+
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
 	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
@@ -1399,7 +1578,10 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	struct rdma_cm_event event;
 	int offset, ret;
 
-	listen_id = cm_id->context;
+	listen_id = cma_id_from_event(cm_id, ib_event);
+	if (IS_ERR(listen_id))
+		return PTR_ERR(listen_id);
+
 	if (!cma_check_req_qp_type(&listen_id->id, ib_event))
 		return -EINVAL;
 
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 11/14] IB/cma: Validate routing of incoming requests
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 10/14] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 12/14] IB/cma: Use found net_dev for passive connections Haggai Eran
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
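Reader's note: before the FIB lookup, the IPv4 path rejects address pairs
that can never belong to a valid request. The userspace sketch below
mirrors only that early test, on host-order addresses. The helper names
(ip4(), is_*(), ipv4_pair_plausible()) are hypothetical; the kernel code
uses the ipv4_is_*() predicates on __be32 values instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad components. */
static uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* 224.0.0.0/4 */
static bool is_multicast(uint32_t a)
{
	return (a & 0xf0000000u) == 0xe0000000u;
}

/* 255.255.255.255, the limited broadcast address */
static bool is_lbcast(uint32_t a)
{
	return a == 0xffffffffu;
}

/* 0.0.0.0/8 */
static bool is_zeronet(uint32_t a)
{
	return (a & 0xff000000u) == 0;
}

/* 127.0.0.0/8 */
static bool is_loopback(uint32_t a)
{
	return (a & 0xff000000u) == 0x7f000000u;
}

/* Mirrors the early-reject test at the top of validate_ipv4_net_dev(). */
static bool ipv4_pair_plausible(uint32_t daddr, uint32_t saddr)
{
	if (is_multicast(saddr) || is_lbcast(saddr) || is_lbcast(daddr) ||
	    is_zeronet(saddr) || is_zeronet(daddr) ||
	    is_loopback(daddr) || is_loopback(saddr))
		return false;
	return true;
}
```

Note that, as in the patch, a multicast destination is not rejected by
this test; only the source is checked for multicast. A pair that passes
still has to survive the routing lookup and the device comparison.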
 drivers/infiniband/core/cma.c | 95 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 011aa7310dd3..f272b3d1799d 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -46,6 +46,8 @@
 
 #include <net/tcp.h>
 #include <net/ipv6.h>
+#include <net/ip_fib.h>
+#include <net/ip6_route.h>
 
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
@@ -1081,16 +1083,98 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event,
 	return 0;
 }
 
+static bool validate_ipv4_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in *dst_addr,
+				  const struct sockaddr_in *src_addr)
+{
+	__be32 daddr = dst_addr->sin_addr.s_addr,
+	       saddr = src_addr->sin_addr.s_addr;
+	struct fib_result res;
+	struct flowi4 fl4;
+	int err;
+	bool ret;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
+	    ipv4_is_lbcast(daddr) || ipv4_is_zeronet(saddr) ||
+	    ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr) ||
+	    ipv4_is_loopback(saddr))
+		return false;
+
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_iif = net_dev->ifindex;
+	fl4.daddr = daddr;
+	fl4.saddr = saddr;
+
+	rcu_read_lock();
+	err = fib_lookup(dev_net(net_dev), &fl4, &res, 0);
+	/*
+	 * No early return on error: the RCU read lock must be dropped first.
+	 */
+	ret = !err && FIB_RES_DEV(res) == net_dev;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool validate_ipv6_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in6 *dst_addr,
+				  const struct sockaddr_in6 *src_addr)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	const int strict = ipv6_addr_type(&dst_addr->sin6_addr) &
+			   IPV6_ADDR_LINKLOCAL;
+	struct rt6_info *rt = rt6_lookup(dev_net(net_dev), &dst_addr->sin6_addr,
+					 &src_addr->sin6_addr, net_dev->ifindex,
+					 strict);
+	bool ret;
+
+	if (!rt)
+		return false;
+
+	ret = rt->rt6i_idev->dev == net_dev;
+	ip6_rt_put(rt);
+
+	return ret;
+#else
+	return false;
+#endif
+}
+
+static bool validate_net_dev(struct net_device *net_dev,
+			     const struct sockaddr *daddr,
+			     const struct sockaddr *saddr)
+{
+	const struct sockaddr_in *daddr4 = (const struct sockaddr_in *)daddr;
+	const struct sockaddr_in *saddr4 = (const struct sockaddr_in *)saddr;
+	const struct sockaddr_in6 *daddr6 = (const struct sockaddr_in6 *)daddr;
+	const struct sockaddr_in6 *saddr6 = (const struct sockaddr_in6 *)saddr;
+
+	switch (daddr->sa_family) {
+	case AF_INET:
+		return saddr->sa_family == AF_INET &&
+		       validate_ipv4_net_dev(net_dev, daddr4, saddr4);
+
+	case AF_INET6:
+		return saddr->sa_family == AF_INET6 &&
+		       validate_ipv6_net_dev(net_dev, daddr6, saddr6);
+
+	default:
+		return false;
+	}
+}
+
 static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 					  const struct cma_req_info *req)
 {
-	struct sockaddr_storage listen_addr_storage;
-	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct sockaddr_storage listen_addr_storage, src_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage,
+			*src_addr = (struct sockaddr *)&src_addr_storage;
 	struct net_device *net_dev;
 	const union ib_gid *gid = req->has_gid ? &req->local_gid : NULL;
 	int err;
 
-	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	err = cma_save_ip_info(listen_addr, src_addr, ib_event,
+			       req->service_id);
 	if (err)
 		return ERR_PTR(err);
 
@@ -1099,6 +1183,11 @@ static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 	if (!net_dev)
 		return ERR_PTR(-ENODEV);
 
+	if (!validate_net_dev(net_dev, listen_addr, src_addr)) {
+		dev_put(net_dev);
+		return ERR_PTR(-EHOSTUNREACH);
+	}
+
 	return net_dev;
 }
 
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 12/14] IB/cma: Use found net_dev for passive connections
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 11/14] IB/cma: Validate routing of incoming requests Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 13/14] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 76 ++++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f272b3d1799d..c1cd47eab149 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1277,33 +1277,31 @@ static struct rdma_id_private *cma_find_listener(
 }
 
 static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
-						 struct ib_cm_event *ib_event)
+						 struct ib_cm_event *ib_event,
+						 struct net_device **net_dev)
 {
 	struct cma_req_info req;
 	struct rdma_bind_list *bind_list;
 	struct rdma_id_private *id_priv;
-	struct net_device *net_dev;
 	int err;
 
 	err = cma_save_req_info(ib_event, &req);
 	if (err)
 		return ERR_PTR(err);
 
-	net_dev = cma_get_net_dev(ib_event, &req);
-	if (IS_ERR(net_dev)) {
-		if (PTR_ERR(net_dev) == -EAFNOSUPPORT) {
+	*net_dev = cma_get_net_dev(ib_event, &req);
+	if (IS_ERR(*net_dev)) {
+		if (PTR_ERR(*net_dev) == -EAFNOSUPPORT) {
 			/* Assuming the protocol is AF_IB */
-			net_dev = NULL;
+			*net_dev = NULL;
 		} else {
-			return ERR_PTR(PTR_ERR(net_dev));
+			return ERR_PTR(PTR_ERR(*net_dev));
 		}
 	}
 
 	bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
 				cma_port_from_service_id(req.service_id));
-	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, net_dev);
-
-	dev_put(net_dev);
+	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
 
 	return id_priv;
 }
@@ -1553,7 +1551,8 @@ out:
 }
 
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
-					       struct ib_cm_event *ib_event)
+					       struct ib_cm_event *ib_event,
+					       struct net_device *net_dev)
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
@@ -1585,14 +1584,16 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	if (rt->num_paths == 2)
 		rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
 
-	if (cma_any_addr(cma_src_addr(id_priv))) {
-		rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
-		rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
-		ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
-	} else {
-		ret = cma_translate_addr(cma_src_addr(id_priv), &rt->addr.dev_addr);
+	if (net_dev) {
+		ret = rdma_copy_addr(&rt->addr.dev_addr, net_dev, NULL);
 		if (ret)
 			goto err;
+	} else {
+		/* An AF_IB connection */
+		WARN_ON_ONCE(ss_family != AF_IB);
+
+		cma_translate_ib((struct sockaddr_ib *)cma_src_addr(id_priv),
+				 &rt->addr.dev_addr);
 	}
 	rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
 
@@ -1605,7 +1606,8 @@ err:
 }
 
 static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
-					      struct ib_cm_event *ib_event)
+					      struct ib_cm_event *ib_event,
+					      struct net_device *net_dev)
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
@@ -1624,10 +1626,18 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 			      ib_event->param.sidr_req_rcvd.service_id))
 		goto err;
 
-	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
-		ret = cma_translate_addr(cma_src_addr(id_priv), &id->route.addr.dev_addr);
+	if (net_dev) {
+		ret = rdma_copy_addr(&id->route.addr.dev_addr, net_dev, NULL);
 		if (ret)
 			goto err;
+	} else {
+		/* An AF_IB connection */
+		WARN_ON_ONCE(ss_family != AF_IB);
+
+		if (!cma_any_addr(cma_src_addr(id_priv)))
+			cma_translate_ib((struct sockaddr_ib *)
+						cma_src_addr(id_priv),
+					 &id->route.addr.dev_addr);
 	}
 
 	id_priv->state = RDMA_CM_CONNECT;
@@ -1665,28 +1675,33 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 {
 	struct rdma_id_private *listen_id, *conn_id;
 	struct rdma_cm_event event;
+	struct net_device *net_dev;
 	int offset, ret;
 
-	listen_id = cma_id_from_event(cm_id, ib_event);
+	listen_id = cma_id_from_event(cm_id, ib_event, &net_dev);
 	if (IS_ERR(listen_id))
 		return PTR_ERR(listen_id);
 
-	if (!cma_check_req_qp_type(&listen_id->id, ib_event))
-		return -EINVAL;
+	if (!cma_check_req_qp_type(&listen_id->id, ib_event)) {
+		ret = -EINVAL;
+		goto net_dev_put;
+	}
 
-	if (cma_disable_callback(listen_id, RDMA_CM_LISTEN))
-		return -ECONNABORTED;
+	if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) {
+		ret = -ECONNABORTED;
+		goto net_dev_put;
+	}
 
 	memset(&event, 0, sizeof event);
 	offset = cma_user_data_offset(listen_id);
 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
 	if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED) {
-		conn_id = cma_new_udp_id(&listen_id->id, ib_event);
+		conn_id = cma_new_udp_id(&listen_id->id, ib_event, net_dev);
 		event.param.ud.private_data = ib_event->private_data + offset;
 		event.param.ud.private_data_len =
 				IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset;
 	} else {
-		conn_id = cma_new_conn_id(&listen_id->id, ib_event);
+		conn_id = cma_new_conn_id(&listen_id->id, ib_event, net_dev);
 		cma_set_req_event_data(&event, &ib_event->param.req_rcvd,
 				       ib_event->private_data, offset);
 	}
@@ -1724,6 +1739,8 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	mutex_unlock(&conn_id->handler_mutex);
 	mutex_unlock(&listen_id->handler_mutex);
 	cma_deref_id(conn_id);
+	if (net_dev)
+		dev_put(net_dev);
 	return 0;
 
 err3:
@@ -1737,6 +1754,11 @@ err1:
 	mutex_unlock(&listen_id->handler_mutex);
 	if (conn_id)
 		rdma_destroy_id(&conn_id->id);
+
+net_dev_put:
+	if (net_dev)
+		dev_put(net_dev);
+
 	return ret;
 }
 
-- 
1.7.11.2


* [PATCH v4 13/14] IB/cma: Share ib_cm_ids between rdma_cm_ids
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 12/14] IB/cma: Use found net_dev for passive connections Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 14:50   ` [PATCH v4 14/14] IB/cm: Remove compare_data checks Haggai Eran
  2015-07-30 15:30   ` [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module Doug Ledford
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.
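
As a rough illustration of the sharing idea, the sketch below models a
lookup-or-create table keyed by device and service ID. It is not the
ib_cm implementation: the mock_* names, the fixed-size array, and the
NULL return on a handler mismatch are all assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*mock_handler)(void *event);

/* Hypothetical model of one listening CM ID; not the kernel's struct. */
struct mock_listen_id {
	int device;		/* stands in for struct ib_device * */
	uint64_t service_id;
	mock_handler handler;
	int sharecount;		/* number of listeners using this entry */
};

#define MOCK_MAX_LISTENERS 8

static struct mock_listen_id mock_table[MOCK_MAX_LISTENERS];
static size_t mock_count;

/* Two placeholder handlers, used only to demonstrate the sharing rule. */
int mock_handler_a(void *event) { (void)event; return 0; }
int mock_handler_b(void *event) { (void)event; return 0; }

/*
 * Return the listener for (device, service_id), creating it if needed.
 * An existing entry is shared only when the handler matches; otherwise
 * NULL is returned (the real error code is not modeled here).
 */
struct mock_listen_id *mock_insert_listen(int device, mock_handler handler,
					  uint64_t service_id)
{
	size_t i;

	assert(handler != NULL);

	for (i = 0; i < mock_count; i++) {
		struct mock_listen_id *cur = &mock_table[i];

		if (cur->device != device || cur->service_id != service_id)
			continue;
		if (cur->handler != handler)
			return NULL;
		cur->sharecount++;
		return cur;
	}

	if (mock_count == MOCK_MAX_LISTENERS)
		return NULL;

	mock_table[mock_count] = (struct mock_listen_id){
		.device = device,
		.service_id = service_id,
		.handler = handler,
		.sharecount = 1,
	};
	return &mock_table[mock_count++];
}
```

Two listeners on the same device and service ID end up with the same
entry and a share count of 2; a different device gets its own entry.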

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 59 +++----------------------------------------
 1 file changed, 4 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c1cd47eab149..1f26bff5f780 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1771,42 +1771,6 @@ __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
 }
 EXPORT_SYMBOL(rdma_get_service_id);
 
-static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
-				 struct ib_cm_compare_data *compare)
-{
-	struct cma_hdr *cma_data, *cma_mask;
-	__be32 ip4_addr;
-	struct in6_addr ip6_addr;
-
-	memset(compare, 0, sizeof *compare);
-	cma_data = (void *) compare->data;
-	cma_mask = (void *) compare->mask;
-
-	switch (addr->sa_family) {
-	case AF_INET:
-		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
-		cma_set_ip_ver(cma_data, 4);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip4.addr = ip4_addr;
-			cma_mask->dst_addr.ip4.addr = htonl(~0);
-		}
-		break;
-	case AF_INET6:
-		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
-		cma_set_ip_ver(cma_data, 6);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip6 = ip6_addr;
-			memset(&cma_mask->dst_addr.ip6, 0xFF,
-			       sizeof cma_mask->dst_addr.ip6);
-		}
-		break;
-	default:
-		break;
-	}
-}
-
 static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
 {
 	struct rdma_id_private *id_priv = iw_id->context;
@@ -1960,33 +1924,18 @@ out:
 
 static int cma_ib_listen(struct rdma_id_private *id_priv)
 {
-	struct ib_cm_compare_data compare_data;
 	struct sockaddr *addr;
 	struct ib_cm_id	*id;
 	__be64 svc_id;
-	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
+	addr = cma_src_addr(id_priv);
+	svc_id = rdma_get_service_id(&id_priv->id, addr);
+	id = ib_cm_insert_listen(id_priv->id.device, cma_req_handler, svc_id);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
-
 	id_priv->cm_id.ib = id;
 
-	addr = cma_src_addr(id_priv);
-	svc_id = rdma_get_service_id(&id_priv->id, addr);
-	if (cma_any_addr(addr) && !id_priv->afonly)
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
-	else {
-		cma_set_compare_data(id_priv->id.ps, addr, &compare_data);
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data);
-	}
-
-	if (ret) {
-		ib_destroy_cm_id(id_priv->cm_id.ib);
-		id_priv->cm_id.ib = NULL;
-	}
-
-	return ret;
+	return 0;
 }
 
 static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog)
-- 
1.7.11.2


* [PATCH v4 14/14] IB/cm: Remove compare_data checks
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 13/14] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
@ 2015-07-30 14:50   ` Haggai Eran
  2015-07-30 15:30   ` [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module Doug Ledford
  14 siblings, 0 replies; 19+ messages in thread
From: Haggai Eran @ 2015-07-30 14:50 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Liran Liss, Haggai Eran, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Jason Gunthorpe

Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
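
For reference, the feature being removed amounted to a masked comparison
of a request's private data against a per-listener filter. A rough
sketch modeled on the removed cm_mask_copy() and
cm_compare_private_data() helpers follows; the mock_* names and the
user-space types are assumptions for the example, and the filter data is
expected to be stored already masked, as the removed listen code did.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 words, matching the removed IB_CM_COMPARE_SIZE definition. */
#define MOCK_COMPARE_WORDS (64 / sizeof(uint32_t))

struct mock_compare_data {
	uint32_t data[MOCK_COMPARE_WORDS];
	uint32_t mask[MOCK_COMPARE_WORDS];
};

/* dst[i] = src[i] & mask[i], mirroring the removed cm_mask_copy(). */
static void mock_mask_copy(uint32_t *dst, const uint32_t *src,
			   const uint32_t *mask)
{
	size_t i;

	for (i = 0; i < MOCK_COMPARE_WORDS; i++)
		dst[i] = src[i] & mask[i];
}

/*
 * Returns 0 when the request's private data matches the listener's
 * filter, or when the listener has no filter at all.
 */
int mock_compare_private_data(const uint32_t *private_data,
			      const struct mock_compare_data *filter)
{
	uint32_t masked[MOCK_COMPARE_WORDS];

	assert(private_data != NULL);

	if (!filter)
		return 0;

	mock_mask_copy(masked, private_data, filter->mask);
	return memcmp(masked, filter->data, sizeof(masked));
}
```

Only the words selected by the mask take part in the comparison; bits
outside the mask in the request are ignored.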

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c            | 109 ++++++--------------------------
 drivers/infiniband/core/ucm.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 include/rdma/ib_cm.h                    |  14 +---
 5 files changed, 23 insertions(+), 107 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index d2b2c83f0076..ea4db9c1d44f 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -222,7 +222,6 @@ struct cm_id_private {
 	/* todo: use alternate port on send failure */
 	struct cm_av av;
 	struct cm_av alt_av;
-	struct ib_cm_compare_data *compare_data;
 
 	void *private_data;
 	__be64 tid;
@@ -443,40 +442,6 @@ static struct cm_id_private * cm_acquire_id(__be32 local_id, __be32 remote_id)
 	return cm_id_priv;
 }
 
-static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
-{
-	int i;
-
-	for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
-		dst[i] = src[i] & mask[i];
-}
-
-static int cm_compare_data(struct ib_cm_compare_data *src_data,
-			   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-	u32 dst[IB_CM_COMPARE_SIZE];
-
-	if (!src_data || !dst_data)
-		return 0;
-
-	cm_mask_copy(src, src_data->data, dst_data->mask);
-	cm_mask_copy(dst, dst_data->data, src_data->mask);
-	return memcmp(src, dst, sizeof(src));
-}
-
-static int cm_compare_private_data(u32 *private_data,
-				   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-
-	if (!dst_data)
-		return 0;
-
-	cm_mask_copy(src, private_data, dst_data->mask);
-	return memcmp(src, dst_data->data, sizeof(src));
-}
-
 /*
  * Trivial helpers to strip endian annotation and compare; the
  * endianness doesn't actually matter since we just need a stable
@@ -509,18 +474,14 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
-	int data_cmp;
 
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
-		data_cmp = cm_compare_data(cm_id_priv->compare_data,
-					   cur_cm_id_priv->compare_data);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
-		    !data_cmp)
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
 			return cur_cm_id_priv;
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -531,8 +492,6 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 			link = &(*link)->rb_left;
 		else if (be64_gt(service_id, cur_cm_id_priv->id.service_id))
 			link = &(*link)->rb_right;
-		else if (data_cmp < 0)
-			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
 	}
@@ -542,20 +501,16 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
-					     __be64 service_id,
-					     u32 *private_data)
+					     __be64 service_id)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
-	int data_cmp;
 
 	while (node) {
 		cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
-		data_cmp = cm_compare_private_data(private_data,
-						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device) && !data_cmp)
+		    (cm_id_priv->id.device == device))
 			return cm_id_priv;
 
 		if (device < cm_id_priv->id.device)
@@ -566,8 +521,6 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
 			node = node->rb_left;
 		else if (be64_gt(service_id, cm_id_priv->id.service_id))
 			node = node->rb_right;
-		else if (data_cmp < 0)
-			node = node->rb_left;
 		else
 			node = node->rb_right;
 	}
@@ -939,7 +892,6 @@ retest:
 	wait_for_completion(&cm_id_priv->comp);
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
-	kfree(cm_id_priv->compare_data);
 	kfree(cm_id_priv->private_data);
 	kfree(cm_id_priv);
 }
@@ -962,20 +914,11 @@ EXPORT_SYMBOL(ib_destroy_cm_id);
  *   range of service IDs.  If set to 0, the service ID is matched
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
- * @compare_data: This parameter is optional.  It specifies data that must
- *   appear in the private data of a connection request for the specified
- *   listen request.
- * @lock: If set, lock the cm.lock spin-lock when adding the id to the
- *   listener tree. When false, the caller must already hold the spin-lock,
- *   and compare_data must be NULL.
  */
 static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
-			  __be64 service_mask,
-			  struct ib_cm_compare_data *compare_data,
-			  bool lock)
+			  __be64 service_mask)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-	unsigned long flags = 0;
 	int ret = 0;
 
 	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@@ -988,22 +931,9 @@ static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
 	if (cm_id->state != IB_CM_IDLE)
 		return -EINVAL;
 
-	if (compare_data) {
-		cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
-						   GFP_KERNEL);
-		if (!cm_id_priv->compare_data)
-			return -ENOMEM;
-		cm_mask_copy(cm_id_priv->compare_data->data,
-			     compare_data->data, compare_data->mask);
-		memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
-		       sizeof(compare_data->mask));
-	}
-
 	cm_id->state = IB_CM_LISTEN;
-	if (lock)
-		spin_lock_irqsave(&cm.lock, flags);
-
 	++cm_id_priv->listen_sharecount;
+
 	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
 		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
 		cm_id->service_mask = ~cpu_to_be64(0);
@@ -1016,22 +946,21 @@ static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
 		--cm_id_priv->listen_sharecount;
-		kfree(cm_id_priv->compare_data);
-		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
-
-	if (lock)
-		spin_unlock_irqrestore(&cm.lock, flags);
-
 	return ret;
 }
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
 {
-	return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
-			      true);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm.lock, flags);
+	ret = __ib_cm_listen(cm_id, service_id, service_mask);
+	spin_unlock_irqrestore(&cm.lock, flags);
+
+	return ret;
 }
 EXPORT_SYMBOL(ib_cm_listen);
 
@@ -1071,7 +1000,7 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 		goto new_id;
 
 	/* Find an existing ID */
-	cm_id_priv = cm_find_listen(device, service_id, NULL);
+	cm_id_priv = cm_find_listen(device, service_id);
 	if (cm_id_priv) {
 		if (cm_id->cm_handler != cm_handler || cm_id->context) {
 			/* Sharing an ib_cm_id with different handlers is not
@@ -1090,7 +1019,7 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 
 new_id:
 	/* Use newly created ID */
-	err = __ib_cm_listen(cm_id, service_id, 0, NULL, false);
+	err = __ib_cm_listen(cm_id, service_id, 0);
 
 	spin_unlock_irqrestore(&cm.lock, flags);
 
@@ -1615,8 +1544,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
-					   req_msg->service_id,
-					   req_msg->private_data);
+					   req_msg->service_id);
 	if (!listen_cm_id_priv) {
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irq(&cm.lock);
@@ -3164,8 +3092,7 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
-					sidr_req_msg->service_id,
-					sidr_req_msg->private_data);
+					sidr_req_msg->service_id);
 	if (!cur_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 8cde48b96f19..6b4e8a008bc0 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -658,8 +658,7 @@ static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
 	if (result)
 		goto out;
 
-	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
-			      NULL);
+	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
 out:
 	ib_ucm_ctx_put(ctx);
 	return result;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index ee39be6ccfb0..9d321575d90e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -848,7 +848,7 @@ int ipoib_cm_dev_open(struct net_device *dev)
 	}
 
 	ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num),
-			   0, NULL);
+			   0);
 	if (ret) {
 		printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name,
 		       IPOIB_CM_IETF_ID | priv->qp->qp_num);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 4c59ceb40fff..3ab015b0236d 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3250,7 +3250,7 @@ static void srpt_add_one(struct ib_device *device)
 	 * in the system as service_id; therefore, the target_id will change
 	 * if this HCA is gone bad and replaced by different HCA
 	 */
-	if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0, NULL))
+	if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0))
 		goto err_cm;
 
 	INIT_IB_EVENT_HANDLER(&sdev->event_handler, sdev->device,
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index e3f48632e237..92a7d85917b4 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -105,8 +105,6 @@ enum ib_cm_data_size {
 	IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE = 216,
 	IB_CM_SIDR_REP_PRIVATE_DATA_SIZE = 136,
 	IB_CM_SIDR_REP_INFO_LENGTH	 = 72,
-	/* compare done u32 at a time */
-	IB_CM_COMPARE_SIZE		 = (64 / sizeof(u32))
 };
 
 struct ib_cm_id;
@@ -344,11 +342,6 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id);
 #define IB_SDP_SERVICE_ID	cpu_to_be64(0x0000000000010000ULL)
 #define IB_SDP_SERVICE_ID_MASK	cpu_to_be64(0xFFFFFFFFFFFF0000ULL)
 
-struct ib_cm_compare_data {
-	u32  data[IB_CM_COMPARE_SIZE];
-	u32  mask[IB_CM_COMPARE_SIZE];
-};
-
 /**
  * ib_cm_listen - Initiates listening on the specified service ID for
  *   connection and service ID resolution requests.
@@ -361,12 +354,9 @@ struct ib_cm_compare_data {
  *   range of service IDs.  If set to 0, the service ID is matched
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
- * @compare_data: This parameter is optional.  It specifies data that must
- *   appear in the private data of a connection request for the specified
- *   listen request.
  */
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data);
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
+		 __be64 service_mask);
 
 struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 				     ib_cm_handler cm_handler,
-- 
1.7.11.2


* Re: [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module
       [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-07-30 14:50   ` [PATCH v4 14/14] IB/cm: Remove compare_data checks Haggai Eran
@ 2015-07-30 15:30   ` Doug Ledford
  14 siblings, 0 replies; 19+ messages in thread
From: Doug Ledford @ 2015-07-30 15:30 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Liran Liss, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe


On 07/30/2015 10:50 AM, Haggai Eran wrote:
> I'm sending the patchset again with the rwsem patch and rebased over Doug's
> to-be-rebased/for-4.3 tree.

Thanks for rebasing, I was able to apply them this time.

> 
> Regards,
> Haggai
> 
> Changes from v3:
> - rebase over github.com/dledford/linux to-be-rebased/for-4.3
> - add rwsem patch
> 
> Changes from v2:
> - added missing reviewed-bys
> - Patch 5: remove service_mask as a parameter from ib_cm_insert_listen()
> - Patch 9:
>   * move cma_req_info struct near other structs
>   * put GID by value in the struct
> 
> Changes from v1:
> - Patch 1: mark ib_client_data as going down instead of removing all client
>   contexts during de-registration.
> - Patch 2:
>   * move kdoc to the function definition
>   * do not call get_net_dev_by_params() on devices/clients that are going
>     down
>   * pass client data directly to the callback
> - Patch 3:
>   * pass client data directly to callback
>   * fix a lockdep warning in ipoib_match_gid_pkey_addr()
>   * remove a debugging print left over
>   * set a rate limit to the duplicated IP address warning
> - Patch 5:
>   * change atomic_dec(&id->refcount) to cm_deref_id()
>   * always update listen_sharecount under the cm.lock spinlock
> - Patch 6: handle AF_IB requests by getting parameters from the listener
> - Patch 8: new patch to expose BTH P_Key from ib_cm to rdma_cm
> - Patch 9:
>   * get P_Key used for de-mux from the BTH
>   * use -EAFNOSUPPORT in cma_save_ip_info to designate a possible AF_IB
>     connection request
>   * pass a NULL netdev for AF_IB requests
> - Patch 11: handle AF_IB connections by filling connection information from
>   the listener id instead of from the net_dev
> - Patch 12: fix mention of the old ib_cm_id_create_and_listen function in
>   the changelog entry.
> 
> Changes from v0:
> - Added a patch to prevent a race between ib_unregister_device() and
>   ib_get_net_dev_by_params().
> - Removed the patch that exported a UD GMP packet's GID from the GRH, and
>   related code.
> - Patch 3:
>   * Add _rcu suffix to ipoib_is_dev_match_addr().
>   * Add helper function to get the master netdev for bonding support.
>   * Scan for matching net devices in two phases: first without looking at
>     the IP address, and then looking at the IP address only when the first
>     phase did not find a unique net device.
> - Patch 5:
>   * Do not init listen_sharecount = 1 for non-listening ib_cm_ids.
>   * Remove code that sets a CM ID's state to IB_CM_IDLE right before
>     destruction.
>   * Rename ib_cm_id_create_and_listen() to ib_cm_insert_listen().
>   * Do not increase reference counts when failing to add a shared CM ID due
>     to having a different handler callback.
> - Patch 9: Clean IPv4 net_dev validation function.
> - Added patch 10: new patch to use the found net_dev in IB/cma for
>   eliminating unneeded calls to cma_translate_addr.
> - Patch 12: Remove the lock argument to __ib_cm_listen().
> 
> The rdma_cm module relies today on the ib_cm module to demux incoming
> requests based on their service ID and IP address. The ib_cm module is the
> wrong place to perform this task, as it can also be used with services that
> do not adhere to the RDMA IP CM service as defined in the IBA
> specifications. It is forced to use an opaque private data struct and mask
> to compare incoming requests against.
> 
> This series moves that demux task responsibility to the rdma_cm module. The
> rdma_cm module can look into the private data attached to a CM request,
> containing the IP addresses related to the request. It uses the details of
> the request to find the net device associated with the request, and use
> that net device to find the correct listening rdma_cm_id.
> 
> The series applies against Doug's for-v4.2 tree with the patch adding a
> rwsem to IB core [2] applied.
> 
> The series is structured as follows:
> Patch 1 prevents a possible race between ib_client.remove() callbacks from
> ib_unregister_device(), and ib_client callbacks that rely on the
> lists_rwsem locked for read, such as ib_get_net_dev_by_params(). Both
> callbacks may call ib_get_client_data(), and the patch makes sure that the
> remove callback doesn't free the client data while it is being used by the
> other callback.
> 
> Patches 2-3 add the ability to lookup a network device according to the IB
> device, port, P_Key, GID and IP address. They find the matching IPoIB
> interfaces, and return a matching net_device if one exists.
> 
> Patches 4-5 make necessary changes in ib_cm to allow RDMA CM to get the
> information it needs out of CM and SIDR requests, and share a single
> ib_cm_id with multiple RDMA CM listeners.
> 
> Patches 6-7 do some preliminary refactoring to the rdma_cm module. They
> allow extracting information out of incoming requests instead of retrieving
> them from a listening CM ID, and add helper functions to access the port
> space IDRs.
> 
> Finally, patches 8-12 change rdma_cm to demultiplex requests on its own, and
> patch 13 cleans up the now unneeded code in ib_cm to compare against the
> private data.
> 
> This series contains a subset of the RDMA CM namespaces patches [1]. The
> changes from v4 of the relevant patches are:
> - Patch 1
>   * in addition to the IB device, port, P_Key and IP address, pass
>     also the GID, to make future IPoIB devices with alias GIDs unique.
>   * return the matching net_device instead of a network namespace.
> - Patch 2: use IS_ENABLED(CONFIG_IPV6) without ifdefs.
> - Patch 5:
>   * rename sharecount -> listen_sharecount.
>   * use a regular int instead of atomic for the share count, protected by
>     the cm.lock spinlock.
>   * change id destruction and shared listener creation to prevent the case
>     where an id is found but it is under destruction.
> 
> [1] [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM
>     http://www.spinics.net/lists/linux-rdma/msg25244.html
> [2] [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list
>     http://www.spinics.net/lists/linux-rdma/msg25931.html
> 
> Guy Shapiro (1):
>   IB/ipoib: Return IPoIB devices matching connection parameters
> 
> Haggai Eran (12):
>   IB/core: Add rwsem to allow reading device list or client list
>   IB/core: lock client data with lists_rwsem
>   IB/cm: Expose service ID in request events
>   IB/cm: Share listening CM IDs
>   IB/cma: Refactor RDMA IP CM private-data parsing code
>   IB/cma: Helper functions to access port space IDRs
>   IB/cm: Expose BTH P_Key in CM and SIDR request events
>   IB/cma: Add net_dev and private data checks to RDMA CM
>   IB/cma: Validate routing of incoming requests
>   IB/cma: Use found net_dev for passive connections
>   IB/cma: Share ib_cm_ids between rdma_cm_ids
>   IB/cm: Remove compare_data checks
> 
> Yotam Kenneth (1):
>   IB/core: Find the network device matching connection parameters
> 
>  drivers/infiniband/core/cache.c           |   2 +-
>  drivers/infiniband/core/cm.c              | 215 ++++++----
>  drivers/infiniband/core/cma.c             | 646 ++++++++++++++++++++++--------
>  drivers/infiniband/core/device.c          | 134 ++++++-
>  drivers/infiniband/core/mad.c             |   2 +-
>  drivers/infiniband/core/multicast.c       |   7 +-
>  drivers/infiniband/core/sa_query.c        |   6 +-
>  drivers/infiniband/core/ucm.c             |   9 +-
>  drivers/infiniband/core/user_mad.c        |   6 +-
>  drivers/infiniband/core/uverbs_main.c     |   6 +-
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c   |   2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 236 ++++++++++-
>  drivers/infiniband/ulp/srp/ib_srp.c       |   6 +-
>  drivers/infiniband/ulp/srpt/ib_srpt.c     |   7 +-
>  include/rdma/ib_cm.h                      |  25 +-
>  include/rdma/ib_verbs.h                   |  33 +-
>  net/rds/ib.c                              |   5 +-
>  net/rds/iw.c                              |   5 +-
>  18 files changed, 1040 insertions(+), 312 deletions(-)
> 


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




* Re: [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
       [not found]     ` <1438267826-32155-10-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-30 18:23       ` Sagi Grimberg
       [not found]         ` <55E34A05.8040205-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2015-08-30 18:23 UTC (permalink / raw)
  To: Haggai Eran, Doug Ledford
  Cc: Liran Liss, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe

On 7/30/2015 5:50 PM, Haggai Eran wrote:
> The rdma_cm module will later use the P_Key from the BTH to de-mux
> requests.
>
> See discussion at:
>    http://www.spinics.net/lists/netdev/msg336067.html

I've been hitting errors with the srp target with this series applied.

Not sure if this series exposes a bug in ib_srpt or breaks it at
this point, so I just thought I'd send it out in the meantime...

Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
instead of 0 (working on the default pkey 0xffff at entry 0).
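
The -22 in the log below is -EINVAL. One way an index-based P_Key lookup
can produce it is a bounds check on the table index. The sketch assumes
that behavior and uses hypothetical mock_* names; it is not the actual
ib_cm or cache code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port P_Key table; not the kernel's cache structure. */
struct mock_pkey_table {
	size_t len;
	const uint16_t *pkeys;
};

/*
 * Look up the P_Key stored at the given index. An index beyond the end
 * of the table is rejected with -EINVAL instead of being read.
 */
int mock_get_pkey(const struct mock_pkey_table *table, uint16_t index,
		  uint16_t *pkey)
{
	assert(table != NULL && pkey != NULL);

	if (index >= table->len)
		return -EINVAL;

	*pkey = table->pkeys[index];
	return 0;
}
```

With a one-entry table holding the default P_Key 0xffff, index 0
resolves normally while index 0xffff (65535) is rejected.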

I have modified the srp initiator, but I doubt that my changes are
related, since I didn't modify the channel establishment at all.

log:
infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
0000002b 00000000 00000000 00000000
00000000 94003004 0000002c 0000b8e0
ib_srpt receiving failed for idx 0 with status 4
0000:04:00.0:poll_health:151:(pid 0): device's health compromised
assert_var[0] 0x00000094
assert_var[1] 0x00000000
assert_var[2] 0x00000000
assert_var[3] 0x00000000
assert_var[4] 0x00000000
assert_exit_ptr 0x0061d35c
assert_callra 0x0067a5f4
fw_ver 0xa0641900
hw_id 0x000001ff
irisc_index 2
synd 0x1: firmware internal error
ext_sync 0x0000
0000:04:00.0:health_care:76:(pid 7943): handling bad device here
ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
ib_srpt Received IB TimeWait exit for cm_id ffff88046d1fb200.
ib_srpt Received IB TimeWait exit for cm_id ffff880454ffa000.
ib_srpt Session 0x00000000000000000002c90300ed0960: kernel thread ib_srpt_compl (PID 8585) stopped

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
       [not found]         ` <55E34A05.8040205-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-31  6:50           ` Haggai Eran
       [not found]             ` <55E3F93D.6000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Haggai Eran @ 2015-08-31  6:50 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: Liran Liss, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe,
	Eli Cohen

On 30/08/2015 21:23, Sagi Grimberg wrote:
> 
> Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
> instead of 0 (working on the default pkey 0xffff at entry 0).

It looks like the mlx5 driver doesn't interpret the completion format
correctly. It takes a field defined in the programmer reference manual
as pkey, and interprets it as pkey_index [1].
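
To illustrate the failure mode, here is a minimal standalone userspace
sketch (not the kernel code). The table contents and the names
pkey_table and query_pkey are hypothetical. It assumes the lookup is
bounds-checked by index, so passing the P_Key value 0xffff where an
index is expected is rejected with -EINVAL, which would be consistent
with the -22 in the log.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port P_Key table: default P_Key 0xffff at entry 0. */
static const uint16_t pkey_table[] = { 0xffff, 0x8001 };
#define PKEY_TABLE_LEN (sizeof(pkey_table) / sizeof(pkey_table[0]))

/*
 * Hypothetical bounds-checked lookup by index.
 * Returns 0 and stores the P_Key on success, -EINVAL if the index is
 * outside the table.
 */
static int query_pkey(uint16_t index, uint16_t *pkey)
{
	if (index >= PKEY_TABLE_LEN)
		return -EINVAL;
	*pkey = pkey_table[index];
	return 0;
}
```

With this table, index 0 yields 0xffff, while "index" 0xffff (really a
P_Key value) is out of range and fails.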

> log:
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
> ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
> mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 0000002b 00000000 00000000 00000000
> 00000000 94003004 0000002c 0000b8e0
> ib_srpt receiving failed for idx 0 with status 4
> 0000:04:00.0:poll_health:151:(pid 0): device's health compromised
> assert_var[0] 0x00000094
> assert_var[1] 0x00000000
> assert_var[2] 0x00000000
> assert_var[3] 0x00000000
> assert_var[4] 0x00000000
> assert_exit_ptr 0x0061d35c
> assert_callra 0x0067a5f4
> fw_ver 0xa0641900
> hw_id 0x000001ff
> irisc_index 2
> synd 0x1: firmware internal error
> ext_sync 0x0000
> 0000:04:00.0:health_care:76:(pid 7943): handling bad device here
> ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
> ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
> ib_srpt Received IB TimeWait exit for cm_id ffff88046d1fb200.
> ib_srpt Received IB TimeWait exit for cm_id ffff880454ffa000.
> ib_srpt Session 0x00000000000000000002c90300ed0960: kernel thread ib_srpt_compl (PID 8585) stopped

I don't know how that can cause all the other errors though.

Haggai

[1]
http://lxr.free-electrons.com/source/drivers/infiniband/hw/mlx5/cq.c?v=4.1#L230


* Re: [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events
       [not found]             ` <55E3F93D.6000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-08-31  7:41               ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2015-08-31  7:41 UTC (permalink / raw)
  To: Haggai Eran, Doug Ledford
  Cc: Liran Liss, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jason Gunthorpe,
	Eli Cohen

On 8/31/2015 9:50 AM, Haggai Eran wrote:
> On 30/08/2015 21:23, Sagi Grimberg wrote:
>>
>> Looks like for some reason cm_get_bth_pkey got pkey_index of 0xffff
>> instead of 0 (working on the default pkey 0xffff at entry 0).
>
> It looks like the mlx5 driver doesn't interpret the completion format
> correctly. It takes a field defined in the programmer reference manual
> as pkey, and interprets it as pkey_index [1].

You're right! I wonder how this ever used to work (and it did...).
So the driver needs to look up a pkey_index on each GSI packet?
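
A rough sketch of what such a per-packet lookup could look like, as
standalone userspace C rather than driver code. The helper name
find_pkey_index and its signature are hypothetical, and matching on the
low 15 bits only (ignoring the membership bit) is an assumption about
the desired policy.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Low 15 bits identify the partition; the top bit is the membership flag. */
#define PKEY_BASE_MASK 0x7fff

/*
 * Hypothetical helper: search a P_Key table for the partition of `pkey`
 * and store the index of the first match.
 * Returns 0 on success, -ENOENT if no entry matches.
 */
static int find_pkey_index(const uint16_t *table, size_t len,
			   uint16_t pkey, uint16_t *index)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if ((table[i] & PKEY_BASE_MASK) == (pkey & PKEY_BASE_MASK)) {
			*index = (uint16_t)i;
			return 0;
		}
	}
	return -ENOENT;
}
```

For a table holding the default P_Key 0xffff at entry 0, looking up
0xffff gives index 0, which is the value the CM expected here.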

>
>> log:
>> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
>> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
>> ib_srpt Session : kernel thread ib_srpt_compl (PID 8584) started
>> infiniband mlx5_0: ib_cm: Couldn't retrieve pkey for incoming request (port 1, pkey index 65535). -22
>> ib_srpt Received SRP_LOGIN_REQ with i_port_id 0x0:0x2c90300ed0960, t_port_id 0x2c90300ed0950:0x2c90300ed0950 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300ed0950)
>> ib_srpt Session : kernel thread ib_srpt_compl (PID 8585) started
>> mlx5_0:dump_cqe:238:(pid 8584): dump error cqe
>> 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000 00000000
>> 0000002b 00000000 00000000 00000000
>> 00000000 94003004 0000002c 0000b8e0
>> ib_srpt receiving failed for idx 0 with status 4
>> 0000:04:00.0:poll_health:151:(pid 0): device's health compromised
>> assert_var[0] 0x00000094
>> assert_var[1] 0x00000000
>> assert_var[2] 0x00000000
>> assert_var[3] 0x00000000
>> assert_var[4] 0x00000000
>> assert_exit_ptr 0x0061d35c
>> assert_callra 0x0067a5f4
>> fw_ver 0xa0641900
>> hw_id 0x000001ff
>> irisc_index 2
>> synd 0x1: firmware internal error
>> ext_sync 0x0000
>> 0000:04:00.0:health_care:76:(pid 7943): handling bad device here
>> ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
>> ib_srpt Received DREQ and sent DREP for session 0x00000000000000000002c90300ed0960.
>> ib_srpt Received IB TimeWait exit for cm_id ffff88046d1fb200.
>> ib_srpt Received IB TimeWait exit for cm_id ffff880454ffa000.
>> ib_srpt Session 0x00000000000000000002c90300ed0960: kernel thread ib_srpt_compl (PID 8585) stopped
>
> I don't know how that can cause all the other errors though.

Me neither...

Thread overview: 19+ messages
2015-07-30 14:50 [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module Haggai Eran
     [not found] ` <1438267826-32155-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-30 14:50   ` [PATCH v4 01/14] IB/core: Add rwsem to allow reading device list or client list Haggai Eran
2015-07-30 14:50   ` [PATCH v4 02/14] IB/core: lock client data with lists_rwsem Haggai Eran
2015-07-30 14:50   ` [PATCH v4 03/14] IB/core: Find the network device matching connection parameters Haggai Eran
2015-07-30 14:50   ` [PATCH v4 04/14] IB/ipoib: Return IPoIB devices " Haggai Eran
2015-07-30 14:50   ` [PATCH v4 05/14] IB/cm: Expose service ID in request events Haggai Eran
2015-07-30 14:50   ` [PATCH v4 06/14] IB/cm: Share listening CM IDs Haggai Eran
2015-07-30 14:50   ` [PATCH v4 07/14] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
2015-07-30 14:50   ` [PATCH v4 08/14] IB/cma: Helper functions to access port space IDRs Haggai Eran
2015-07-30 14:50   ` [PATCH v4 09/14] IB/cm: Expose BTH P_Key in CM and SIDR request events Haggai Eran
     [not found]     ` <1438267826-32155-10-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-30 18:23       ` Sagi Grimberg
     [not found]         ` <55E34A05.8040205-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-08-31  6:50           ` Haggai Eran
     [not found]             ` <55E3F93D.6000400-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-08-31  7:41               ` Sagi Grimberg
2015-07-30 14:50   ` [PATCH v4 10/14] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
2015-07-30 14:50   ` [PATCH v4 11/14] IB/cma: Validate routing of incoming requests Haggai Eran
2015-07-30 14:50   ` [PATCH v4 12/14] IB/cma: Use found net_dev for passive connections Haggai Eran
2015-07-30 14:50   ` [PATCH v4 13/14] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
2015-07-30 14:50   ` [PATCH v4 14/14] IB/cm: Remove compare_data checks Haggai Eran
2015-07-30 15:30   ` [PATCH v4 00/14] Demux IB CM requests in the rdma_cm module Doug Ledford
