All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 00/12] Demux IB CM requests in the rdma_cm module
@ 2015-06-22 12:42 Haggai Eran
  2015-06-22 12:42 ` [PATCH v1 01/12] IB/core: pass client data to remove() callbacks Haggai Eran
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Thanks everyone for the review comments. I've updated the patch set
accordingly. The changes are listed below.

Changes from v0:
- Added a patch to prevent a race between ib_unregister_device() and
  ib_get_net_dev_by_params().
- Removed the patch that exported a UD GMP packet's GID from the GRH, and
  related code.
- Patch 3:
  * Add _rcu suffix to ipoib_is_dev_match_addr().
  * Add helper function to get the master netdev for bonding support.
  * Scan for matching net devices in two phases: first without looking at
  * the IP address, and then looking at the IP address only when the first
    phase did not find a unique net device.
- Patch 5:
  * Do not init listen_sharecount = 1 for non-listening ib_cm_ids.
  * Remove code that sets a CM ID's state to IB_CM_IDLE right before
    destruction.
  * Rename ib_cm_id_create_and_listen() to ib_cm_insert_listen().
  * Do not increase reference counts when failing to add a shared CM ID due
    to having a different handler callback.
- Patch 9: Clean IPv4 net_dev validation function.
- Added patch 10: new patch to use the found net_dev in IB/cma for
  eliminating unneeded calls to cma_translate_addr.
- Patch 12: Remove the lock argument to __ib_cm_listen().

The rdma_cm module relies today on the ib_cm module to demux incoming
requests based on their service ID and IP address. The ib_cm module is the
wrong place to perform this task, as it can also be used with services that
do not adhere to the RDMA IP CM service as defined in the IBA
specifications. It is forced to use an opaque private data struct and mask
to compare incoming requests against.

This series moves that demux task responsibility to the rdma_cm module. The
rdma_cm module can look into the private data attached to a CM request,
containing the IP addresses related to the request. It uses the details of
the request to find the net device associated with the request, and use
that net device to find the correct listening rdma_cm_id.

The series applies against Doug's for-v4.2 tree with the patch adding a
rwsem to IB core [2] applied.

The series is structured as follows:
Patch 1 prevents a possible race between ib_client.remove() callbacks from
ib_unregister_device(), and ib_client callbacks that rely on the
lists_rwsem locked for read, such as ib_get_net_dev_by_params(). Both
callbacks may call ib_get_client_data(), and the patch makes sure that the
remove callback doesn't free the client data while it is being used by the
other callback.

Patches 2-3 add the ability to lookup a network device according to the IB
device, port, P_Key, GID and IP address. They find the matching IPoIB
interfaces, and return a matching net_device if one exists.

Patches 4-5 make necessary changes in ib_cm to allow RDMA CM get the
information it needs out of CM and SIDR requests, and share a single
ib_cm_id with multiple RDMA CM listeners.

Patches 6-7 do some preliminary refactoring to the rdma_cm module. They
allow extracting information out of incoming requests instead of retrieving
them from a listening CM ID, and add helper functions to access the port
space IDRs.

Finally, patches 8-11 change rdma_cm to demultiplex requests on its own, and
patch 12 cleans up the now unneeded code in ib_cm to compare against the
private data.

This series contains a subset of the RDMA CM namespaces patches [1]. The
changes from v4 of the relevant patches are:
- Patch 1
  * in addition to the IB device, port, P_Key and IP address, pass
    also the GID, to make future IPoIB devices with alias GIDs to unique.
  * return the matching net_device instead of a network namespace.
- Patch 2: use IS_ENABLED(CONFIG_IPV6) without ifdefs.
- Patch 5:
  * rename sharecount -> listen_sharecount.
  * use a regular int instead of atomic for the share count, protected by
    the cm.lock spinlock.
  * change id destruction and shared listener creation to prevent the case
    where an id is found but it is under destruction.

[1] [PATCH v4 for-next 00/12] Add network namespace support in the RDMA-CM
    http://www.spinics.net/lists/linux-rdma/msg25244.html
[2] [PATCH for-next V5 02/12] IB/core: Add rwsem to allow reading device list or client list
    http://www.spinics.net/lists/linux-rdma/msg25931.html

Guy Shapiro (1):
  IB/ipoib: Return IPoIB devices matching connection parameters

Haggai Eran (10):
  IB/core: pass client data to remove() callbacks
  IB/cm: Expose service ID in request events
  IB/cm: Share listening CM IDs
  IB/cma: Refactor RDMA IP CM private-data parsing code
  IB/cma: Helper functions to access port space IDRs
  IB/cma: Add net_dev and private data checks to RDMA CM
  IB/cma: validate routing of incoming requests
  IB/cma: use found net_dev for passive connections
  IB/cma: Share ib_cm_ids between rdma_cm_ids
  IB/cm: Remove compare_data checks

Yotam Kenneth (1):
  IB/core: Find the network device matching connection parameters

 drivers/infiniband/core/cache.c           |   2 +-
 drivers/infiniband/core/cm.c              | 199 ++++++----
 drivers/infiniband/core/cma.c             | 603 ++++++++++++++++++++++--------
 drivers/infiniband/core/device.c          |  56 ++-
 drivers/infiniband/core/mad.c             |   2 +-
 drivers/infiniband/core/multicast.c       |   7 +-
 drivers/infiniband/core/sa_query.c        |   6 +-
 drivers/infiniband/core/ucm.c             |   9 +-
 drivers/infiniband/core/user_mad.c        |   6 +-
 drivers/infiniband/core/uverbs_main.c     |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   |   2 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 234 +++++++++++-
 drivers/infiniband/ulp/srp/ib_srp.c       |   6 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c     |   7 +-
 include/rdma/ib_cm.h                      |  19 +-
 include/rdma/ib_verbs.h                   |  43 ++-
 net/rds/ib.c                              |   8 +-
 net/rds/ib_send.c                         |   3 -
 net/rds/iw.c                              |  11 +-
 net/rds/iw_cm.c                           |  18 +-
 net/rds/iw_send.c                         |   4 +
 21 files changed, 934 insertions(+), 317 deletions(-)

-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v1 01/12] IB/core: pass client data to remove() callbacks
  2015-06-22 12:42 [PATCH v1 00/12] Demux IB CM requests in the rdma_cm module Haggai Eran
@ 2015-06-22 12:42 ` Haggai Eran
  2015-07-08 20:29   ` Jason Gunthorpe
  2015-06-22 12:42 ` [PATCH v1 02/12] IB/core: Find the network device matching connection parameters Haggai Eran
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma, netdev, Liran Liss, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth, Jason Gunthorpe, Haggai Eran

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

This can cause the remove() callback and the second callback to race when
accessing the private client data.

To solve this, make sure ib_get_client_data() returns NULL before releasing
lists_rwsem in ib_unregister_device(). Since remove() also needs access to
the client data, change the remove callback to accept it as a parameter.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/infiniband/core/cache.c           |  2 +-
 drivers/infiniband/core/cm.c              |  7 +++----
 drivers/infiniband/core/cma.c             |  7 +++----
 drivers/infiniband/core/device.c          | 27 +++++++++++++++------------
 drivers/infiniband/core/mad.c             |  2 +-
 drivers/infiniband/core/multicast.c       |  7 +++----
 drivers/infiniband/core/sa_query.c        |  6 +++---
 drivers/infiniband/core/ucm.c             |  6 +++---
 drivers/infiniband/core/user_mad.c        |  6 +++---
 drivers/infiniband/core/uverbs_main.c     |  6 +++---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  7 +++----
 drivers/infiniband/ulp/srp/ib_srp.c       |  6 +++---
 drivers/infiniband/ulp/srpt/ib_srpt.c     |  5 ++---
 include/rdma/ib_verbs.h                   |  6 +++++-
 net/rds/ib.c                              |  8 ++------
 net/rds/ib_send.c                         |  3 ---
 net/rds/iw.c                              | 11 ++++++-----
 net/rds/iw_cm.c                           | 18 ++++++++++++++----
 net/rds/iw_send.c                         |  4 ++++
 19 files changed, 77 insertions(+), 67 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 871da832d016..c93af66cc091 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -394,7 +394,7 @@ err:
 	kfree(device->cache.lmc_cache);
 }
 
-static void ib_cache_cleanup_one(struct ib_device *device)
+static void ib_cache_cleanup_one(struct ib_device *device, void *client_data)
 {
 	int p;
 
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 32063add9280..0103534472e0 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -58,7 +58,7 @@ MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static void cm_add_one(struct ib_device *device);
-static void cm_remove_one(struct ib_device *device);
+static void cm_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cm_client = {
 	.name   = "cm",
@@ -3846,9 +3846,9 @@ free:
 	kfree(cm_dev);
 }
 
-static void cm_remove_one(struct ib_device *ib_device)
+static void cm_remove_one(struct ib_device *ib_device, void *client_data)
 {
-	struct cm_device *cm_dev;
+	struct cm_device *cm_dev = client_data;
 	struct cm_port *port;
 	struct ib_port_modify port_modify = {
 		.clr_port_cap_mask = IB_PORT_CM_SUP
@@ -3856,7 +3856,6 @@ static void cm_remove_one(struct ib_device *ib_device)
 	unsigned long flags;
 	int i;
 
-	cm_dev = ib_get_client_data(ib_device, &cm_client);
 	if (!cm_dev)
 		return;
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 3b943b700a63..df456b6beb40 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -94,7 +94,7 @@ const char *rdma_event_msg(enum rdma_cm_event_type event)
 EXPORT_SYMBOL(rdma_event_msg);
 
 static void cma_add_one(struct ib_device *device);
-static void cma_remove_one(struct ib_device *device);
+static void cma_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client cma_client = {
 	.name   = "cma",
@@ -3541,11 +3541,10 @@ static void cma_process_remove(struct cma_device *cma_dev)
 	wait_for_completion(&cma_dev->comp);
 }
 
-static void cma_remove_one(struct ib_device *device)
+static void cma_remove_one(struct ib_device *device, void *client_data)
 {
-	struct cma_device *cma_dev;
+	struct cma_device *cma_dev = client_data;
 
-	cma_dev = ib_get_client_data(device, &cma_client);
 	if (!cma_dev)
 		return;
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index f08d438205ed..c62974187b5c 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -340,7 +340,7 @@ EXPORT_SYMBOL(ib_register_device);
  */
 void ib_unregister_device(struct ib_device *device)
 {
-	struct ib_client *client;
+	struct list_head client_data_list;
 	struct ib_client_data *context, *tmp;
 	unsigned long flags;
 
@@ -348,21 +348,22 @@ void ib_unregister_device(struct ib_device *device)
 
 	down_write(&lists_rwsem);
 	list_del(&device->core_list);
+	spin_lock_irqsave(&device->client_data_lock, flags);
+	list_cut_position(&client_data_list, &device->client_data_list,
+			  device->client_data_list.prev);
+	spin_unlock_irqrestore(&device->client_data_lock, flags);
 	up_write(&lists_rwsem);
 
-	list_for_each_entry_reverse(client, &client_list, list)
-		if (client->remove)
-			client->remove(device);
+	list_for_each_entry_safe(context, tmp, &client_data_list, list) {
+		if (context->client->remove)
+			context->client->remove(device, context->data);
+		kfree(context);
+	}
 
 	mutex_unlock(&device_mutex);
 
 	ib_device_unregister_sysfs(device);
 
-	spin_lock_irqsave(&device->client_data_lock, flags);
-	list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
-		kfree(context);
-	spin_unlock_irqrestore(&device->client_data_lock, flags);
-
 	device->reg_state = IB_DEV_UNREGISTERED;
 }
 EXPORT_SYMBOL(ib_unregister_device);
@@ -412,6 +413,7 @@ void ib_unregister_client(struct ib_client *client)
 {
 	struct ib_client_data *context, *tmp;
 	struct ib_device *device;
+	void *client_data = NULL;
 	unsigned long flags;
 
 	mutex_lock(&device_mutex);
@@ -421,16 +423,17 @@ void ib_unregister_client(struct ib_client *client)
 	up_write(&lists_rwsem);
 
 	list_for_each_entry(device, &device_list, core_list) {
-		if (client->remove)
-			client->remove(device);
-
 		spin_lock_irqsave(&device->client_data_lock, flags);
 		list_for_each_entry_safe(context, tmp, &device->client_data_list, list)
 			if (context->client == client) {
+				client_data = context->data;
 				list_del(&context->list);
 				kfree(context);
 			}
 		spin_unlock_irqrestore(&device->client_data_lock, flags);
+
+		if (client->remove)
+			client->remove(device, client_data);
 	}
 
 	mutex_unlock(&device_mutex);
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index a4b1466c1bf6..3341510506d3 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3340,7 +3340,7 @@ error:
 	}
 }
 
-static void ib_mad_remove_device(struct ib_device *device)
+static void ib_mad_remove_device(struct ib_device *device, void *client_data)
 {
 	int start, end, i;
 
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 1244f02a5c6d..0e0715889768 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -43,7 +43,7 @@
 #include "sa.h"
 
 static void mcast_add_one(struct ib_device *device);
-static void mcast_remove_one(struct ib_device *device);
+static void mcast_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client mcast_client = {
 	.name   = "ib_multicast",
@@ -844,13 +844,12 @@ static void mcast_add_one(struct ib_device *device)
 	ib_register_event_handler(&dev->event_handler);
 }
 
-static void mcast_remove_one(struct ib_device *device)
+static void mcast_remove_one(struct ib_device *device, void *client_data)
 {
-	struct mcast_device *dev;
+	struct mcast_device *dev = client_data;
 	struct mcast_port *port;
 	int i;
 
-	dev = ib_get_client_data(device, &mcast_client);
 	if (!dev)
 		return;
 
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 0fae85062a65..f691be308fc4 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -107,7 +107,7 @@ struct ib_sa_mcmember_query {
 };
 
 static void ib_sa_add_one(struct ib_device *device);
-static void ib_sa_remove_one(struct ib_device *device);
+static void ib_sa_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client sa_client = {
 	.name   = "sa",
@@ -1225,9 +1225,9 @@ free:
 	return;
 }
 
-static void ib_sa_remove_one(struct ib_device *device)
+static void ib_sa_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
+	struct ib_sa_device *sa_dev = client_data;
 	int i;
 
 	if (!sa_dev)
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 62c24b1452b8..2ce29cd43639 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -109,7 +109,7 @@ enum {
 #define IB_UCM_BASE_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_BASE_MINOR)
 
 static void ib_ucm_add_one(struct ib_device *device);
-static void ib_ucm_remove_one(struct ib_device *device);
+static void ib_ucm_remove_one(struct ib_device *device, void *client_data);
 
 static struct ib_client ucm_client = {
 	.name   = "ucm",
@@ -1310,9 +1310,9 @@ err:
 	return;
 }
 
-static void ib_ucm_remove_one(struct ib_device *device)
+static void ib_ucm_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_ucm_device *ucm_dev = ib_get_client_data(device, &ucm_client);
+	struct ib_ucm_device *ucm_dev = client_data;
 
 	if (!ucm_dev)
 		return;
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 35567fffaa4e..57f281f8d686 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -133,7 +133,7 @@ static DEFINE_SPINLOCK(port_lock);
 static DECLARE_BITMAP(dev_map, IB_UMAD_MAX_PORTS);
 
 static void ib_umad_add_one(struct ib_device *device);
-static void ib_umad_remove_one(struct ib_device *device);
+static void ib_umad_remove_one(struct ib_device *device, void *client_data);
 
 static void ib_umad_release_dev(struct kobject *kobj)
 {
@@ -1322,9 +1322,9 @@ free:
 	kobject_put(&umad_dev->kobj);
 }
 
-static void ib_umad_remove_one(struct ib_device *device)
+static void ib_umad_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_umad_device *umad_dev = ib_get_client_data(device, &umad_client);
+	struct ib_umad_device *umad_dev = client_data;
 	int i;
 
 	if (!umad_dev)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index f6eef2da7097..46c92294afa5 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -128,7 +128,7 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
 };
 
 static void ib_uverbs_add_one(struct ib_device *device);
-static void ib_uverbs_remove_one(struct ib_device *device);
+static void ib_uverbs_remove_one(struct ib_device *device, void *client_data);
 
 static void ib_uverbs_release_dev(struct kref *ref)
 {
@@ -948,9 +948,9 @@ err:
 	return;
 }
 
-static void ib_uverbs_remove_one(struct ib_device *device)
+static void ib_uverbs_remove_one(struct ib_device *device, void *client_data)
 {
-	struct ib_uverbs_device *uverbs_dev = ib_get_client_data(device, &uverbs_client);
+	struct ib_uverbs_device *uverbs_dev = client_data;
 
 	if (!uverbs_dev)
 		return;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index da149c278cb8..d128432f226d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -89,7 +89,7 @@ struct workqueue_struct *ipoib_workqueue;
 struct ib_sa_client ipoib_sa_client;
 
 static void ipoib_add_one(struct ib_device *device);
-static void ipoib_remove_one(struct ib_device *device);
+static void ipoib_remove_one(struct ib_device *device, void *client_data);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
 
 static struct ib_client ipoib_client = {
@@ -1720,12 +1720,11 @@ static void ipoib_add_one(struct ib_device *device)
 	ib_set_client_data(device, &ipoib_client, dev_list);
 }
 
-static void ipoib_remove_one(struct ib_device *device)
+static void ipoib_remove_one(struct ib_device *device, void *client_data)
 {
 	struct ipoib_dev_priv *priv, *tmp;
-	struct list_head *dev_list;
+	struct list_head *dev_list = client_data;
 
-	dev_list = ib_get_client_data(device, &ipoib_client);
 	if (!dev_list)
 		return;
 
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index eada8f758ad4..335bdf53670e 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -131,7 +131,7 @@ MODULE_PARM_DESC(ch_count,
 		 "Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");
 
 static void srp_add_one(struct ib_device *device);
-static void srp_remove_one(struct ib_device *device);
+static void srp_remove_one(struct ib_device *device, void *client_data);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
 static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
@@ -3471,13 +3471,13 @@ free_attr:
 	kfree(dev_attr);
 }
 
-static void srp_remove_one(struct ib_device *device)
+static void srp_remove_one(struct ib_device *device, void *client_data)
 {
 	struct srp_device *srp_dev;
 	struct srp_host *host, *tmp_host;
 	struct srp_target_port *target;
 
-	srp_dev = ib_get_client_data(device, &srp_client);
+	srp_dev = client_data;
 	if (!srp_dev)
 		return;
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 0b2857b1b112..54da6e8ddfa1 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3335,12 +3335,11 @@ err:
 /**
  * srpt_remove_one() - InfiniBand device removal callback function.
  */
-static void srpt_remove_one(struct ib_device *device)
+static void srpt_remove_one(struct ib_device *device, void *client_data)
 {
-	struct srpt_device *sdev;
+	struct srpt_device *sdev = client_data;
 	int i;
 
-	sdev = ib_get_client_data(device, &srpt_client);
 	if (!sdev) {
 		pr_info("%s(%s): nothing to do.\n", __func__, device->name);
 		return;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb08579..1767a11e86fd 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1760,7 +1760,7 @@ struct ib_device {
 struct ib_client {
 	char  *name;
 	void (*add)   (struct ib_device *);
-	void (*remove)(struct ib_device *);
+	void (*remove)(struct ib_device *, void *client_data);
 
 	struct list_head list;
 };
@@ -1776,6 +1776,10 @@ void ib_unregister_device(struct ib_device *device);
 int ib_register_client   (struct ib_client *client);
 void ib_unregister_client(struct ib_client *client);
 
+/* Returns the client data stored by ib_set_client_data, or NULL if the data
+ * is not available.
+ *
+ * May return NULL before the call to ib_client.remove(). */
 void *ib_get_client_data(struct ib_device *device, struct ib_client *client);
 void  ib_set_client_data(struct ib_device *device, struct ib_client *client,
 			 void *data);
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ba2dffeff608..e005aeff1f7d 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -230,19 +230,15 @@ struct rds_ib_device *rds_ib_get_client_data(struct ib_device *device)
  *
  * This can be called at any time and can be racing with any other RDS path.
  */
-static void rds_ib_remove_one(struct ib_device *device)
+static void rds_ib_remove_one(struct ib_device *device, void *client_data)
 {
-	struct rds_ib_device *rds_ibdev;
+	struct rds_ib_device *rds_ibdev = client_data;
 
-	rds_ibdev = ib_get_client_data(device, &rds_ib_client);
 	if (!rds_ibdev)
 		return;
 
 	rds_ib_dev_shutdown(rds_ibdev);
 
-	/* stop connection attempts from getting a reference to this device. */
-	ib_set_client_data(device, &rds_ib_client, NULL);
-
 	down_write(&rds_ib_devices_lock);
 	list_del_rcu(&rds_ibdev->list);
 	up_write(&rds_ib_devices_lock);
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 5d0a704fa039..a0b600a6e44d 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -759,14 +759,11 @@ int rds_ib_xmit_atomic(struct rds_connection *conn, struct rm_atomic_op *op)
 	struct rds_ib_connection *ic = conn->c_transport_data;
 	struct rds_ib_send_work *send = NULL;
 	struct ib_send_wr *failed_wr;
-	struct rds_ib_device *rds_ibdev;
 	u32 pos;
 	u32 work_alloc;
 	int ret;
 	int nr_sig = 0;
 
-	rds_ibdev = ib_get_client_data(ic->i_cm_id->device, &rds_ib_client);
-
 	work_alloc = rds_ib_ring_alloc(&ic->i_send_ring, 1, &pos);
 	if (work_alloc != 1) {
 		rds_ib_ring_unalloc(&ic->i_send_ring, work_alloc);
diff --git a/net/rds/iw.c b/net/rds/iw.c
index 589935661d66..3ebd7fbb8b7a 100644
--- a/net/rds/iw.c
+++ b/net/rds/iw.c
@@ -125,12 +125,11 @@ free_attr:
 	kfree(dev_attr);
 }
 
-static void rds_iw_remove_one(struct ib_device *device)
+static void rds_iw_remove_one(struct ib_device *device, void *client_data)
 {
-	struct rds_iw_device *rds_iwdev;
+	struct rds_iw_device *rds_iwdev = client_data;
 	struct rds_iw_cm_id *i_cm_id, *next;
 
-	rds_iwdev = ib_get_client_data(device, &rds_iw_client);
 	if (!rds_iwdev)
 		return;
 
@@ -192,8 +191,10 @@ static int rds_iw_conn_info_visitor(struct rds_connection *conn,
 		rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
 		iinfo->max_send_wr = ic->i_send_ring.w_nr;
 		iinfo->max_recv_wr = ic->i_recv_ring.w_nr;
-		iinfo->max_send_sge = rds_iwdev->max_sge;
-		rds_iw_get_mr_info(rds_iwdev, iinfo);
+		if (rds_iwdev) {
+			iinfo->max_send_sge = rds_iwdev->max_sge;
+			rds_iw_get_mr_info(rds_iwdev, iinfo);
+		}
 	}
 	return 1;
 }
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index 8f486fa32079..8837131c14b0 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -85,10 +85,13 @@ void rds_iw_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 
 	/* update ib_device with this local ipaddr & conn */
 	rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
-	err = rds_iw_update_cm_id(rds_iwdev, ic->i_cm_id);
-	if (err)
-		printk(KERN_ERR "rds_iw_update_ipaddr failed (%d)\n", err);
-	rds_iw_add_conn(rds_iwdev, conn);
+	if (rds_iwdev) {
+		err = rds_iw_update_cm_id(rds_iwdev, ic->i_cm_id);
+		if (err)
+			printk(KERN_ERR "rds_iw_update_ipaddr failed (%d)\n",
+			       err);
+		rds_iw_add_conn(rds_iwdev, conn);
+	}
 
 	/* If the peer gave us the last packet it saw, process this as if
 	 * we had received a regular ACK. */
@@ -445,6 +448,8 @@ int rds_iw_cm_handle_connect(struct rdma_cm_id *cm_id,
 	cm_id->context = conn;
 
 	rds_iwdev = ib_get_client_data(cm_id->device, &rds_iw_client);
+	if (!rds_iwdev)
+		goto out;
 	ic->i_dma_local_lkey = rds_iwdev->dma_local_lkey;
 
 	/* We got halfway through setting up the ib_connection, if we
@@ -549,6 +554,11 @@ int rds_iw_conn_connect(struct rds_connection *conn)
 	}
 
 	rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
+	if (!rds_iwdev) {
+		rdma_destroy_id(ic->i_cm_id);
+		ic->i_cm_id = NULL;
+		goto out;
+	}
 	ic->i_dma_local_lkey = rds_iwdev->dma_local_lkey;
 
 	dest.sin_family = AF_INET;
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index 334fe98c5084..f3944a1ca76c 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -809,6 +809,10 @@ int rds_iw_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
 	int num_sge;
 
 	rds_iwdev = ib_get_client_data(ic->i_cm_id->device, &rds_iw_client);
+	if (!rds_iwdev) {
+		ret = -ENODEV;
+		goto out;
+	}
 
 	/* map the message the first time we see it */
 	if (!op->op_mapped) {
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 02/12] IB/core: Find the network device matching connection parameters
  2015-06-22 12:42 [PATCH v1 00/12] Demux IB CM requests in the rdma_cm module Haggai Eran
  2015-06-22 12:42 ` [PATCH v1 01/12] IB/core: pass client data to remove() callbacks Haggai Eran
@ 2015-06-22 12:42 ` Haggai Eran
  2015-07-08 20:33   ` Jason Gunthorpe
  2015-06-22 12:42 ` [PATCH v1 04/12] IB/cm: Expose service ID in request events Haggai Eran
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  3 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma, netdev, Liran Liss, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth, Jason Gunthorpe, Haggai Eran

From: Yotam Kenneth <yotamke@mellanox.com>

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns network device according to connection parameters.

The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such configuration may be desireable to support ipvlan-like
configurations for RDMA CM with IPoIB.  To resolve the device in these
cases the code will also take the IP address as an additional input.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
---
 drivers/infiniband/core/device.c | 29 +++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h          | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c62974187b5c..2d26eb383400 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/netdevice.h>
 #include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
@@ -749,6 +750,34 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);
 
+struct net_device *ib_get_net_dev_by_params(struct ib_device *dev,
+					    u8 port,
+					    u16 pkey,
+					    const union ib_gid *gid,
+					    const struct sockaddr *addr)
+{
+	struct net_device *net_dev = NULL;
+	struct ib_client *client;
+
+	if (!rdma_protocol_ib(dev, port))
+		return NULL;
+
+	down_read(&lists_rwsem);
+
+	list_for_each_entry(client, &client_list, list)
+		if (client->get_net_dev_by_params) {
+			net_dev = client->get_net_dev_by_params(dev, port, pkey,
+								gid, addr);
+			if (net_dev)
+				break;
+		}
+
+	up_read(&lists_rwsem);
+
+	return net_dev;
+}
+EXPORT_SYMBOL(ib_get_net_dev_by_params);
+
 static int __init ib_core_init(void)
 {
 	int ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1767a11e86fd..1d3c418778ad 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <linux/socket.h>
 #include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
@@ -1762,6 +1763,26 @@ struct ib_client {
 	void (*add)   (struct ib_device *);
 	void (*remove)(struct ib_device *, void *client_data);
 
+	/* Returns the net_dev belonging to this ib_client and matching the
+	 * given parameters.
+	 * @dev:	An RDMA device that the net_dev use for communication.
+	 * @port:	A physical port number on the RDMA device.
+	 * @pkey:	P_Key that the net_dev uses if applicable.
+	 * @gid:	A GID that the net_dev uses to communicate.
+	 * @addr:	An IP address the net_dev is configured with.
+	 *
+	 * An ib_client that implements a net_dev on top of RDMA devices
+	 * (such as IP over IB) should implement this callback, allowing the
+	 * rdma_cm module to find the right net_dev for a given request.
+	 *
+	 * The caller is responsible for calling dev_put on the returned
+	 * netdev. */
+	struct net_device *(*get_net_dev_by_params)(
+			struct ib_device *dev,
+			u8 port,
+			u16 pkey,
+			const union ib_gid *gid,
+			const struct sockaddr *addr);
 	struct list_head list;
 };
 
@@ -3030,4 +3051,18 @@ static inline int ib_check_mr_access(int flags)
 int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		       struct ib_mr_status *mr_status);
 
+/**
+ * ib_get_net_dev_by_params() - Return the appropriate net_dev
+ * for a received CM request
+ * @dev:	An RDMA device on which the request has been received.
+ * @port:	Port number on the RDMA device.
+ * @pkey:	The Pkey the request came on.
+ * @gid:	A GID that the net_dev uses to communicate.
+ * @addr:	Contains the IP address that the request specified as its
+ *		destination.
+ */
+struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
+					    u16 pkey, const union ib_gid *gid,
+					    const struct sockaddr *addr);
+
 #endif /* IB_VERBS_H */
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-06-22 12:42   ` Haggai Eran
       [not found]     ` <1434976961-27424-4-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 05/12] IB/cm: Share listening CM IDs Haggai Eran
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement the get_net_device_by_port_pkey_ip callback that returns network
device to ib_core according to connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.

For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 227 +++++++++++++++++++++++++++++-
 1 file changed, 226 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index d128432f226d..eb33a7b2be04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,9 @@
 
 #include <linux/jhash.h>
 #include <net/arp.h>
+#include <net/addrconf.h>
+#include <linux/inetdevice.h>
+#include <rdma/ib_cache.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -91,11 +94,15 @@ struct ib_sa_client ipoib_sa_client;
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device, void *client_data);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
+static struct net_device *ipoib_get_net_dev_by_params(
+		struct ib_device *dev, u8 port, u16 pkey,
+		const union ib_gid *gid, const struct sockaddr *addr);
 
 static struct ib_client ipoib_client = {
 	.name   = "ipoib",
 	.add    = ipoib_add_one,
-	.remove = ipoib_remove_one
+	.remove = ipoib_remove_one,
+	.get_net_dev_by_params = ipoib_get_net_dev_by_params,
 };
 
 int ipoib_open(struct net_device *dev)
@@ -222,6 +229,224 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+/* Called with an RCU read lock taken */
+static bool ipoib_is_dev_match_addr_rcu(const struct sockaddr *addr,
+					struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct in_device *in_dev;
+	struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+	struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
+	__be32 ret_addr;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		in_dev = in_dev_get(dev);
+		if (!in_dev)
+			return false;
+
+		ret_addr = inet_confirm_addr(net, in_dev, 0,
+					     addr_in->sin_addr.s_addr,
+					     RT_SCOPE_HOST);
+		in_dev_put(in_dev);
+		if (ret_addr)
+			return true;
+
+		break;
+	case AF_INET6:
+		if (IS_ENABLED(CONFIG_IPV6) &&
+		    ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
+			return true;
+
+		break;
+	}
+	return false;
+}
+
+/**
+ * Find the master net_device on top of the given net_device.
+ * @dev: base IPoIB net_device
+ *
+ * Returns the master net_device with a reference held, or the same net_device
+ * if no master exists.
+ */
+static struct net_device *ipoib_get_master_net_dev(struct net_device *dev)
+{
+	struct net_device *master;
+
+	rcu_read_lock();
+	master = netdev_master_upper_dev_get_rcu(dev);
+	if (master)
+		dev_hold(master);
+	rcu_read_unlock();
+
+	if (master)
+		return master;
+
+	dev_hold(dev);
+	return dev;
+}
+
+/**
+ * Find a net_device matching the given address, which is an upper device of
+ * the given net_device.
+ * @addr: IP address to look for.
+ * @dev: base IPoIB net_device
+ *
+ * If found, returns the net_device with a reference held. Otherwise return
+ * NULL.
+ */
+static struct net_device *ipoib_get_net_dev_match_addr(
+		const struct sockaddr *addr, struct net_device *dev)
+{
+	struct net_device *upper,
+			  *result = NULL;
+	struct list_head *iter;
+
+	rcu_read_lock();
+	if (ipoib_is_dev_match_addr_rcu(addr, dev)) {
+		dev_hold(dev);
+		result = dev;
+		goto out;
+	}
+
+	netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
+		if (ipoib_is_dev_match_addr_rcu(addr, upper)) {
+			dev_hold(upper);
+			result = upper;
+			break;
+		}
+	}
+out:
+	rcu_read_unlock();
+	return result;
+}
+
+/* returns the number of IPoIB netdevs on top a given ipoib device matching a
+ * pkey_index and address, if one exists.
+ *
+ * @found_net_dev: contains a matching net_device if the return value >= 1,
+ * with a reference held. */
+static int ipoib_match_gid_pkey_addr(struct ipoib_dev_priv *priv,
+				     const union ib_gid *gid,
+				     u16 pkey_index,
+				     const struct sockaddr *addr,
+				     struct net_device **found_net_dev)
+{
+	struct ipoib_dev_priv *child_priv;
+	struct net_device *net_dev = NULL;
+	int matches = 0;
+
+	if (priv->pkey_index == pkey_index &&
+	    (!gid || !memcmp(gid, &priv->local_gid, sizeof(*gid)))) {
+		if (!addr) {
+			net_dev = ipoib_get_master_net_dev(priv->dev);
+		} else {
+			/* Verify the net_device matches the IP address, as
+			 * IPoIB child devices currently share a GID. */
+			net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
+		}
+		if (net_dev) {
+			ipoib_warn(priv, "matching net_dev found: %s\n",
+				   net_dev->name);
+			if (!*found_net_dev)
+				*found_net_dev = net_dev;
+			else
+				dev_put(net_dev);
+			++matches;
+		}
+	}
+
+	/* Check child interfaces */
+	down_read(&priv->vlan_rwsem);
+	list_for_each_entry(child_priv, &priv->child_intfs, list) {
+		matches += ipoib_match_gid_pkey_addr(child_priv, gid,
+						    pkey_index, addr,
+						    found_net_dev);
+		if (matches > 1)
+			break;
+	}
+	up_read(&priv->vlan_rwsem);
+
+	return matches;
+}
+
+/* Returns the number of matching net_devs found (between 0 and 2). Also
+ * return the matching net_device in the @net_dev parameter, holding a
+ * reference to the net_device, if the number of matches >= 1 */
+static int __ipoib_get_net_dev_by_params(struct list_head *dev_list, u8 port,
+					 u16 pkey_index,
+					 const union ib_gid *gid,
+					 const struct sockaddr *addr,
+					 struct net_device **net_dev)
+{
+	struct ipoib_dev_priv *priv;
+	int matches = 0;
+
+	*net_dev = NULL;
+
+	list_for_each_entry(priv, dev_list, list) {
+		if (priv->port != port)
+			continue;
+
+		matches += ipoib_match_gid_pkey_addr(priv, gid, pkey_index,
+						     addr, net_dev);
+		if (matches > 1)
+			break;
+	}
+
+	return matches;
+}
+
+static struct net_device *ipoib_get_net_dev_by_params(
+		struct ib_device *dev, u8 port, u16 pkey,
+		const union ib_gid *gid, const struct sockaddr *addr)
+{
+	struct net_device *net_dev;
+	struct list_head *dev_list;
+	u16 pkey_index;
+	int matches;
+	int ret;
+
+	if (!rdma_protocol_ib(dev, port))
+		return NULL;
+
+	ret = ib_find_cached_pkey(dev, port, pkey, &pkey_index);
+	if (ret)
+		return NULL;
+
+	dev_list = ib_get_client_data(dev, &ipoib_client);
+	if (!dev_list)
+		return NULL;
+
+	/* See if we can find a unique device matching the L2 parameters */
+	matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
+						gid, NULL, &net_dev);
+
+	switch (matches) {
+	case 0:
+		return NULL;
+	case 1:
+		return net_dev;
+	}
+
+	dev_put(net_dev);
+
+	/* Couldn't find a unique device with L2 parameters only. Use L3
+	 * address to uniquely match the net device */
+	matches = __ipoib_get_net_dev_by_params(dev_list, port, pkey_index,
+						gid, addr, &net_dev);
+	switch (matches) {
+	case 0:
+		return NULL;
+	default:
+		dev_warn(&dev->dev, "duplicate IP address detected\n");
+		/* Fall through */
+	case 1:
+		return net_dev;
+	}
+}
+
 int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 04/12] IB/cm: Expose service ID in request events
  2015-06-22 12:42 [PATCH v1 00/12] Demux IB CM requests in the rdma_cm module Haggai Eran
  2015-06-22 12:42 ` [PATCH v1 01/12] IB/core: pass client data to remove() callbacks Haggai Eran
  2015-06-22 12:42 ` [PATCH v1 02/12] IB/core: Find the network device matching connection parameters Haggai Eran
@ 2015-06-22 12:42 ` Haggai Eran
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  3 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma, netdev, Liran Liss, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth, Jason Gunthorpe, Haggai Eran

Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.

Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
---
 drivers/infiniband/core/cm.c | 3 +++
 include/rdma/ib_cm.h         | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 0103534472e0..dd92c999e6e9 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1256,6 +1256,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	primary_path->packet_life_time =
 		cm_req_get_primary_local_ack_timeout(req_msg);
 	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+	primary_path->service_id = req_msg->service_id;
 
 	if (req_msg->alt_local_lid) {
 		memset(alt_path, 0, sizeof *alt_path);
@@ -1277,6 +1278,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->packet_life_time =
 			cm_req_get_alt_local_ack_timeout(req_msg);
 		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+		alt_path->service_id = req_msg->service_id;
 	}
 }
 
@@ -2980,6 +2982,7 @@ static void cm_format_sidr_req_event(struct cm_work *work,
 	param = &work->cm_event.param.sidr_req_rcvd;
 	param->pkey = __be16_to_cpu(sidr_req_msg->pkey);
 	param->listen_id = listen_id;
+	param->service_id = sidr_req_msg->service_id;
 	param->port = work->port->port_num;
 	work->cm_event.private_data = &sidr_req_msg->private_data;
 }
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 39ed2d2fbd51..1b567bbc3ad4 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -223,6 +223,7 @@ struct ib_cm_apr_event_param {
 
 struct ib_cm_sidr_req_event_param {
 	struct ib_cm_id		*listen_id;
+	__be64			service_id;
 	u8			port;
 	u16			pkey;
 };
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 05/12] IB/cm: Share listening CM IDs
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
       [not found]     ` <1434976961-27424-6-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 06/12] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.

This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].

[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
    http://www.spinics.net/lists/netdev/msg328860.html

Cc: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c | 124 ++++++++++++++++++++++++++++++++++++++++---
 include/rdma/ib_cm.h         |   4 ++
 2 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index dd92c999e6e9..6d91bf69b11a 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -212,6 +212,9 @@ struct cm_id_private {
 	spinlock_t lock;	/* Do not acquire inside cm.lock */
 	struct completion comp;
 	atomic_t refcount;
+	/* Number of clients sharing this ib_cm_id. Only valid for listeners.
+	 * Protected by the cm.lock spinlock. */
+	int listen_sharecount;
 
 	struct ib_mad_send_buf *msg;
 	struct cm_timewait_info *timewait_info;
@@ -847,9 +850,15 @@ retest:
 	spin_lock_irq(&cm_id_priv->lock);
 	switch (cm_id->state) {
 	case IB_CM_LISTEN:
-		cm_id->state = IB_CM_IDLE;
 		spin_unlock_irq(&cm_id_priv->lock);
+
 		spin_lock_irq(&cm.lock);
+		if (--cm_id_priv->listen_sharecount > 0) {
+			/* The id is still shared. */
+			atomic_dec(&cm_id_priv->refcount);
+			spin_unlock_irq(&cm.lock);
+			return;
+		}
 		rb_erase(&cm_id_priv->service_node, &cm.listen_service_table);
 		spin_unlock_irq(&cm.lock);
 		break;
@@ -929,11 +938,32 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id)
 }
 EXPORT_SYMBOL(ib_destroy_cm_id);
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data)
+/**
+ * __ib_cm_listen - Initiates listening on the specified service ID for
+ *   connection and service ID resolution requests.
+ * @cm_id: Connection identifier associated with the listen request.
+ * @service_id: Service identifier matched against incoming connection
+ *   and service ID resolution requests.  The service ID should be specified
+ *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
+ *   assign a service ID to the caller.
+ * @service_mask: Mask applied to service ID used to listen across a
+ *   range of service IDs.  If set to 0, the service ID is matched
+ *   exactly.  This parameter is ignored if %service_id is set to
+ *   IB_CM_ASSIGN_SERVICE_ID.
+ * @compare_data: This parameter is optional.  It specifies data that must
+ *   appear in the private data of a connection request for the specified
+ *   listen request.
+ * @lock: If set, lock the cm.lock spin-lock when adding the id to the
+ *   listener tree. When false, the caller must already hold the spin-lock,
+ *   and compare_data must be NULL.
+ */
+static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
+			  __be64 service_mask,
+			  struct ib_cm_compare_data *compare_data,
+			  bool lock)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-	unsigned long flags;
+	unsigned long flags = 0;
 	int ret = 0;
 
 	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@@ -958,8 +988,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 	}
 
 	cm_id->state = IB_CM_LISTEN;
+	++cm_id_priv->listen_sharecount;
 
-	spin_lock_irqsave(&cm.lock, flags);
+	if (lock)
+		spin_lock_irqsave(&cm.lock, flags);
 	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
 		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
 		cm_id->service_mask = ~cpu_to_be64(0);
@@ -968,18 +1000,98 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 		cm_id->service_mask = service_mask;
 	}
 	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
-	spin_unlock_irqrestore(&cm.lock, flags);
+	if (lock)
+		spin_unlock_irqrestore(&cm.lock, flags);
 
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
+		--cm_id_priv->listen_sharecount;
 		kfree(cm_id_priv->compare_data);
 		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
 	return ret;
 }
+
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
+		 struct ib_cm_compare_data *compare_data)
+{
+	return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
+			      true);
+}
 EXPORT_SYMBOL(ib_cm_listen);
 
+/**
+ * Create a new listening ib_cm_id and listen on the given service ID.
+ *
+ * If there's an existing ID listening on that same device and service ID,
+ * return it.
+ *
+ * @device: Device associated with the cm_id.  All related communication will
+ * be associated with the specified device.
+ * @cm_handler: Callback invoked to notify the user of CM events.
+ * @service_id: Service identifier matched against incoming connection
+ *   and service ID resolution requests.  The service ID should be specified
+ *   network-byte order.  If set to IB_CM_ASSIGN_SERVICE_ID, the CM will
+ *   assign a service ID to the caller.
+ * @service_mask: Mask applied to service ID used to listen across a
+ *   range of service IDs.  If set to 0, the service ID is matched
+ *   exactly.  This parameter is ignored if %service_id is set to
+ *   IB_CM_ASSIGN_SERVICE_ID.
+ *
+ * Callers should call ib_destroy_cm_id when done with the listener ID.
+ */
+struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
+				     ib_cm_handler cm_handler,
+				     __be64 service_id, __be64 service_mask)
+{
+	struct cm_id_private *cm_id_priv;
+	struct ib_cm_id *cm_id;
+	unsigned long flags;
+	int err = 0;
+
+	/* Create an ID in advance, since the creation may sleep */
+	cm_id = ib_create_cm_id(device, cm_handler, NULL);
+	if (IS_ERR(cm_id))
+		return cm_id;
+
+	spin_lock_irqsave(&cm.lock, flags);
+
+	if (service_id == IB_CM_ASSIGN_SERVICE_ID)
+		goto new_id;
+
+	/* Find an existing ID */
+	cm_id_priv = cm_find_listen(device, service_id, NULL);
+	if (cm_id_priv) {
+		if (cm_id->cm_handler != cm_handler || cm_id->context) {
+			/* Sharing an ib_cm_id with different handlers is not
+			 * supported */
+			spin_unlock_irqrestore(&cm.lock, flags);
+			return ERR_PTR(-EINVAL);
+		}
+		atomic_inc(&cm_id_priv->refcount);
+		++cm_id_priv->listen_sharecount;
+		spin_unlock_irqrestore(&cm.lock, flags);
+
+		ib_destroy_cm_id(cm_id);
+		cm_id = &cm_id_priv->id;
+		return cm_id;
+	}
+
+new_id:
+	/* Use newly created ID */
+	err = __ib_cm_listen(cm_id, service_id, service_mask, NULL, false);
+
+	spin_unlock_irqrestore(&cm.lock, flags);
+
+	if (err) {
+		ib_destroy_cm_id(cm_id);
+		return ERR_PTR(err);
+	}
+	return cm_id;
+}
+EXPORT_SYMBOL(ib_cm_insert_listen);
+
 static __be64 cm_form_tid(struct cm_id_private *cm_id_priv,
 			  enum cm_msg_sequence msg_seq)
 {
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 1b567bbc3ad4..f7fd22f10bae 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -362,6 +362,10 @@ struct ib_cm_compare_data {
 int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
 		 struct ib_cm_compare_data *compare_data);
 
+struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
+				     ib_cm_handler cm_handler,
+				     __be64 service_id, __be64 service_mask);
+
 struct ib_cm_req_param {
 	struct ib_sa_path_rec	*primary_path;
 	struct ib_sa_path_rec	*alternate_path;
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 06/12] IB/cma: Refactor RDMA IP CM private-data parsing code
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 05/12] IB/cm: Share listening CM IDs Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 07/12] IB/cma: Helper functions to access port space IDRs Haggai Eran
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.

When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 150 ++++++++++++++++++++++++++----------------
 1 file changed, 92 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index df456b6beb40..21cc6c99008b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -870,97 +870,122 @@ static inline int cma_any_port(struct sockaddr *addr)
 	return !cma_port(addr);
 }
 
-static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
+static void cma_save_ib_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
 			     struct ib_sa_path_rec *path)
 {
-	struct sockaddr_ib *listen_ib, *ib;
+	struct sockaddr_ib *ib;
 
-	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
-	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->sgid, 16);
-	ib->sib_sid = listen_ib->sib_sid;
-	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
-	ib->sib_scope_id = listen_ib->sib_scope_id;
-
-	ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->dgid, 16);
-}
-
-static __be16 ss_get_port(const struct sockaddr_storage *ss)
-{
-	if (ss->ss_family == AF_INET)
-		return ((struct sockaddr_in *)ss)->sin_port;
-	else if (ss->ss_family == AF_INET6)
-		return ((struct sockaddr_in6 *)ss)->sin6_port;
-	BUG();
+	if (src_addr) {
+		ib = (struct sockaddr_ib *)src_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->sgid, 16);
+		ib->sib_sid = path->service_id;
+		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+		ib->sib_scope_id = 0;
+	}
+	if (dst_addr) {
+		ib = (struct sockaddr_ib *)dst_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->dgid, 16);
+	}
 }
 
-static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
+static void cma_save_ip4_info(struct sockaddr *src_addr,
+			      struct sockaddr *dst_addr,
+			      struct cma_hdr *hdr,
+			      __be16 local_port)
 {
 	struct sockaddr_in *ip4;
 
-	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-	ip4->sin_port = ss_get_port(&listen_id->route.addr.src_addr);
+	if (src_addr) {
+		ip4 = (struct sockaddr_in *)src_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
+		ip4->sin_port = local_port;
+	}
 
-	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
-	ip4->sin_port = hdr->port;
+	if (dst_addr) {
+		ip4 = (struct sockaddr_in *)dst_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
+		ip4->sin_port = hdr->port;
+	}
 }
 
-static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
+static void cma_save_ip6_info(struct sockaddr *src_addr,
+			      struct sockaddr *dst_addr,
+			      struct cma_hdr *hdr,
+			      __be16 local_port)
 {
 	struct sockaddr_in6 *ip6;
 
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->dst_addr.ip6;
-	ip6->sin6_port = ss_get_port(&listen_id->route.addr.src_addr);
+	if (src_addr) {
+		ip6 = (struct sockaddr_in6 *)src_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->dst_addr.ip6;
+		ip6->sin6_port = local_port;
+	}
 
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->src_addr.ip6;
-	ip6->sin6_port = hdr->port;
+	if (dst_addr) {
+		ip6 = (struct sockaddr_in6 *)dst_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->src_addr.ip6;
+		ip6->sin6_port = hdr->port;
+	}
 }
 
-static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_cm_event *ib_event)
+static u16 cma_port_from_service_id(__be64 service_id)
 {
-	struct cma_hdr *hdr;
+	return (u16)be64_to_cpu(service_id);
+}
 
-	if ((listen_id->route.addr.src_addr.ss_family == AF_IB) &&
-	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
-		cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
-		return 0;
-	}
+static int cma_save_ip_info(struct sockaddr *src_addr,
+			    struct sockaddr *dst_addr,
+			    struct ib_cm_event *ib_event,
+			    __be64 service_id)
+{
+	struct cma_hdr *hdr;
+	__be16 port;
 
 	hdr = ib_event->private_data;
 	if (hdr->cma_version != CMA_VERSION)
 		return -EINVAL;
 
+	port = htons(cma_port_from_service_id(service_id));
+
 	switch (cma_get_ip_ver(hdr)) {
 	case 4:
-		cma_save_ip4_info(id, listen_id, hdr);
+		cma_save_ip4_info(src_addr, dst_addr, hdr, port);
 		break;
 	case 6:
-		cma_save_ip6_info(id, listen_id, hdr);
+		cma_save_ip6_info(src_addr, dst_addr, hdr, port);
 		break;
 	default:
 		return -EINVAL;
 	}
+
 	return 0;
 }
 
+static int cma_save_net_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct ib_cm_event *ib_event,
+			     sa_family_t sa_family, __be64 service_id)
+{
+	if (sa_family == AF_IB && ib_event->event == IB_CM_REQ_RECEIVED) {
+		cma_save_ib_info(src_addr, dst_addr,
+				 ib_event->param.req_rcvd.primary_path);
+		return 0;
+	}
+
+	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
+}
+
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
 	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
@@ -1211,6 +1236,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
 	struct rdma_route *rt;
+	const sa_family_t ss_family = listen_id->route.addr.src_addr.ss_family;
+	const __be64 service_id =
+		      ib_event->param.req_rcvd.primary_path->service_id;
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
@@ -1219,7 +1247,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			      (struct sockaddr *)&id->route.addr.dst_addr,
+			      ib_event, ss_family, service_id))
 		goto err;
 
 	rt = &id->route;
@@ -1265,7 +1295,11 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cma_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			      (struct sockaddr *)&id->route.addr.dst_addr,
+			      ib_event,
+			      listen_id->route.addr.src_addr.ss_family,
+			      ib_event->param.sidr_req_rcvd.service_id))
 		goto err;
 
 	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 07/12] IB/cma: Helper functions to access port space IDRs
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 06/12] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Add helper functions to access the IDRs by port-space and port number.

Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 81 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 60 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 21cc6c99008b..02914bf1e548 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -113,6 +113,22 @@ static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
 static DEFINE_IDR(ib_ps);
 
+static struct idr *cma_idr(enum rdma_port_space ps)
+{
+	switch (ps) {
+	case RDMA_PS_TCP:
+		return &tcp_ps;
+	case RDMA_PS_UDP:
+		return &udp_ps;
+	case RDMA_PS_IPOIB:
+		return &ipoib_ps;
+	case RDMA_PS_IB:
+		return &ib_ps;
+	default:
+		return NULL;
+	}
+}
+
 struct cma_device {
 	struct list_head	list;
 	struct ib_device	*device;
@@ -122,11 +138,33 @@ struct cma_device {
 };
 
 struct rdma_bind_list {
-	struct idr		*ps;
+	enum rdma_port_space	ps;
 	struct hlist_head	owners;
 	unsigned short		port;
 };
 
+static int cma_ps_alloc(enum rdma_port_space ps,
+			struct rdma_bind_list *bind_list, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	return idr_alloc(idr, bind_list, snum, snum + 1, GFP_KERNEL);
+}
+
+static struct rdma_bind_list *cma_ps_find(enum rdma_port_space ps, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	return idr_find(idr, snum);
+}
+
+static void cma_ps_remove(enum rdma_port_space ps, int snum)
+{
+	struct idr *idr = cma_idr(ps);
+
+	idr_remove(idr, snum);
+}
+
 enum {
 	CMA_OPTION_AFONLY,
 };
@@ -1053,7 +1091,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		idr_remove(bind_list->ps, bind_list->port);
+		cma_ps_remove(bind_list->ps, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -2349,8 +2387,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
 	hlist_add_head(&id_priv->node, &bind_list->owners);
 }
 
-static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
-			  unsigned short snum)
+static int cma_alloc_port(enum rdma_port_space ps,
+			  struct rdma_id_private *id_priv, unsigned short snum)
 {
 	struct rdma_bind_list *bind_list;
 	int ret;
@@ -2359,7 +2397,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	ret = cma_ps_alloc(ps, bind_list, snum);
 	if (ret < 0)
 		goto err;
 
@@ -2372,7 +2410,8 @@ err:
 	return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
-static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_alloc_any_port(enum rdma_port_space ps,
+			      struct rdma_id_private *id_priv)
 {
 	static unsigned int last_used_port;
 	int low, high, remaining;
@@ -2383,7 +2422,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !idr_find(ps, (unsigned short) rover)) {
+	    !cma_ps_find(ps, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2438,7 +2477,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
 	return 0;
 }
 
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_use_port(enum rdma_port_space ps,
+			struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list;
 	unsigned short snum;
@@ -2448,7 +2488,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = idr_find(ps, snum);
+	bind_list = cma_ps_find(ps, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2471,25 +2511,24 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
 	return ret;
 }
 
-static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
+static enum rdma_port_space cma_select_inet_ps(
+		struct rdma_id_private *id_priv)
 {
 	switch (id_priv->id.ps) {
 	case RDMA_PS_TCP:
-		return &tcp_ps;
 	case RDMA_PS_UDP:
-		return &udp_ps;
 	case RDMA_PS_IPOIB:
-		return &ipoib_ps;
 	case RDMA_PS_IB:
-		return &ib_ps;
+		return id_priv->id.ps;
 	default:
-		return NULL;
+
+		return 0;
 	}
 }
 
-static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
+static enum rdma_port_space cma_select_ib_ps(struct rdma_id_private *id_priv)
 {
-	struct idr *ps = NULL;
+	enum rdma_port_space ps = 0;
 	struct sockaddr_ib *sib;
 	u64 sid_ps, mask, sid;
 
@@ -2499,15 +2538,15 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 	if ((id_priv->id.ps == RDMA_PS_IB) && (sid == (RDMA_IB_IP_PS_IB & mask))) {
 		sid_ps = RDMA_IB_IP_PS_IB;
-		ps = &ib_ps;
+		ps = RDMA_PS_IB;
 	} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_TCP)) &&
 		   (sid == (RDMA_IB_IP_PS_TCP & mask))) {
 		sid_ps = RDMA_IB_IP_PS_TCP;
-		ps = &tcp_ps;
+		ps = RDMA_PS_TCP;
 	} else if (((id_priv->id.ps == RDMA_PS_IB) || (id_priv->id.ps == RDMA_PS_UDP)) &&
 		   (sid == (RDMA_IB_IP_PS_UDP & mask))) {
 		sid_ps = RDMA_IB_IP_PS_UDP;
-		ps = &udp_ps;
+		ps = RDMA_PS_UDP;
 	}
 
 	if (ps) {
@@ -2520,7 +2559,7 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 static int cma_get_port(struct rdma_id_private *id_priv)
 {
-	struct idr *ps;
+	enum rdma_port_space ps;
 	int ret;
 
 	if (cma_family(id_priv) != AF_IB)
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 07/12] IB/cma: Helper functions to access port space IDRs Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
       [not found]     ` <1434976961-27424-9-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 09/12] IB/cma: validate routing of incoming requests Haggai Eran
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 170 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 168 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 02914bf1e548..4701167f5117 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -300,7 +300,7 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
 	return old;
 }
 
-static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
+static inline u8 cma_get_ip_ver(const struct cma_hdr *hdr)
 {
 	return hdr->ip_version >> 4;
 }
@@ -1024,6 +1024,169 @@ static int cma_save_net_info(struct sockaddr *src_addr,
 	return cma_save_ip_info(src_addr, dst_addr, ib_event, service_id);
 }
 
+struct cma_req_info {
+	struct ib_device *device;
+	int port;
+	const union ib_gid *local_gid;
+	__be64 service_id;
+	u16 pkey;
+};
+
+static int cma_save_req_info(const struct ib_cm_event *ib_event,
+			     struct cma_req_info *req)
+{
+	const struct ib_cm_req_event_param *req_param =
+		&ib_event->param.req_rcvd;
+	const struct ib_cm_sidr_req_event_param *sidr_param =
+		&ib_event->param.sidr_req_rcvd;
+
+	switch (ib_event->event) {
+	case IB_CM_REQ_RECEIVED:
+		req->device	= req_param->listen_id->device;
+		req->port	= req_param->port;
+		req->local_gid	= &req_param->primary_path->sgid;
+		req->service_id	= req_param->primary_path->service_id;
+		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
+		break;
+	case IB_CM_SIDR_REQ_RECEIVED:
+		req->device	= sidr_param->listen_id->device;
+		req->port	= sidr_param->port;
+		req->local_gid	= NULL;
+		req->service_id	= sidr_param->service_id;
+		req->pkey	= sidr_param->pkey;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
+					  const struct cma_req_info *req)
+{
+	struct sockaddr_storage listen_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct net_device *net_dev;
+	int err;
+
+	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	if (err)
+		return ERR_PTR(err);
+
+	net_dev = ib_get_net_dev_by_params(req->device, req->port, req->pkey,
+					   req->local_gid, listen_addr);
+	if (!net_dev)
+		return ERR_PTR(-ENODEV);
+
+	return net_dev;
+}
+
+static enum rdma_port_space rdma_ps_from_service_id(__be64 service_id)
+{
+	return (be64_to_cpu(service_id) >> 16) & 0xffff;
+}
+
+static bool cma_match_private_data(struct rdma_id_private *id_priv,
+				   const struct cma_hdr *hdr)
+{
+	struct sockaddr *addr = cma_src_addr(id_priv);
+	__be32 ip4_addr;
+	struct in6_addr ip6_addr;
+
+	if (cma_any_addr(addr) && !id_priv->afonly)
+		return true;
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		ip4_addr = ((struct sockaddr_in *)addr)->sin_addr.s_addr;
+		if (cma_get_ip_ver(hdr) != 4)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    hdr->dst_addr.ip4.addr != ip4_addr)
+			return false;
+		break;
+	case AF_INET6:
+		ip6_addr = ((struct sockaddr_in6 *)addr)->sin6_addr;
+		if (cma_get_ip_ver(hdr) != 6)
+			return false;
+		if (!cma_any_addr(addr) &&
+		    memcmp(&hdr->dst_addr.ip6, &ip6_addr, sizeof(ip6_addr)))
+			return false;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
+			      const struct net_device *net_dev)
+{
+	const struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr;
+
+	return !addr->bound_dev_if ||
+	       (net_eq(dev_net(net_dev), &init_net) &&
+		addr->bound_dev_if == net_dev->ifindex);
+}
+
+static struct rdma_id_private *cma_find_listener(
+		const struct rdma_bind_list *bind_list,
+		const struct ib_cm_id *cm_id,
+		const struct ib_cm_event *ib_event,
+		const struct cma_req_info *req,
+		const struct net_device *net_dev)
+{
+	struct rdma_id_private *id_priv, *id_priv_dev;
+
+	if (!bind_list)
+		return ERR_PTR(-EINVAL);
+
+	hlist_for_each_entry(id_priv, &bind_list->owners, node) {
+		if (cma_match_private_data(id_priv, ib_event->private_data)) {
+			if (id_priv->id.device == cm_id->device &&
+			    cma_match_net_dev(id_priv, net_dev))
+				return id_priv;
+			list_for_each_entry(id_priv_dev,
+					    &id_priv->listen_list,
+					    listen_list) {
+				if (id_priv_dev->id.device == cm_id->device &&
+				    cma_match_net_dev(id_priv_dev, net_dev))
+					return id_priv_dev;
+			}
+		}
+	}
+
+	return ERR_PTR(-EINVAL);
+}
+
+static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
+						 struct ib_cm_event *ib_event)
+{
+	struct cma_req_info req;
+	struct rdma_bind_list *bind_list;
+	struct rdma_id_private *id_priv;
+	struct net_device *net_dev;
+	int err;
+
+	err = cma_save_req_info(ib_event, &req);
+	if (err)
+		return ERR_PTR(err);
+
+	net_dev = cma_get_net_dev(ib_event, &req);
+	if (IS_ERR(net_dev))
+		return ERR_PTR(PTR_ERR(net_dev));
+
+	bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
+				cma_port_from_service_id(req.service_id));
+	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, net_dev);
+
+	dev_put(net_dev);
+
+	return id_priv;
+}
+
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
 	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
@@ -1383,7 +1546,10 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	struct rdma_cm_event event;
 	int offset, ret;
 
-	listen_id = cm_id->context;
+	listen_id = cma_id_from_event(cm_id, ib_event);
+	if (IS_ERR(listen_id))
+		return PTR_ERR(listen_id);
+
 	if (!cma_check_req_qp_type(&listen_id->id, ib_event))
 		return -EINVAL;
 
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 09/12] IB/cma: validate routing of incoming requests
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 10/12] IB/cma: use found net_dev for passive connections Haggai Eran
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 95 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 92 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 4701167f5117..759697d6e627 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -46,6 +46,8 @@
 
 #include <net/tcp.h>
 #include <net/ipv6.h>
+#include <net/ip_fib.h>
+#include <net/ip6_route.h>
 
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_cm_ib.h>
@@ -1062,15 +1064,97 @@ static int cma_save_req_info(const struct ib_cm_event *ib_event,
 	return 0;
 }
 
+static bool validate_ipv4_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in *dst_addr,
+				  const struct sockaddr_in *src_addr)
+{
+	__be32 daddr = dst_addr->sin_addr.s_addr,
+	       saddr = src_addr->sin_addr.s_addr;
+	struct fib_result res;
+	struct flowi4 fl4;
+	int err;
+	bool ret;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
+	    ipv4_is_lbcast(daddr) || ipv4_is_zeronet(saddr) ||
+	    ipv4_is_zeronet(daddr) || ipv4_is_loopback(daddr) ||
+	    ipv4_is_loopback(saddr))
+		return false;
+
+	memset(&fl4, 0, sizeof(fl4));
+	fl4.flowi4_iif = net_dev->ifindex;
+	fl4.daddr = daddr;
+	fl4.saddr = saddr;
+
+	rcu_read_lock();
+	err = fib_lookup(dev_net(net_dev), &fl4, &res);
+	if (err)
+		return false;
+
+	ret = FIB_RES_DEV(res) == net_dev;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool validate_ipv6_net_dev(struct net_device *net_dev,
+				  const struct sockaddr_in6 *dst_addr,
+				  const struct sockaddr_in6 *src_addr)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	const int strict = ipv6_addr_type(&dst_addr->sin6_addr) &
+			   IPV6_ADDR_LINKLOCAL;
+	struct rt6_info *rt = rt6_lookup(dev_net(net_dev), &dst_addr->sin6_addr,
+					 &src_addr->sin6_addr, net_dev->ifindex,
+					 strict);
+	bool ret;
+
+	if (!rt)
+		return false;
+
+	ret = rt->rt6i_idev->dev == net_dev;
+	ip6_rt_put(rt);
+
+	return ret;
+#else
+	return false;
+#endif
+}
+
+static bool validate_net_dev(struct net_device *net_dev,
+			     const struct sockaddr *daddr,
+			     const struct sockaddr *saddr)
+{
+	const struct sockaddr_in *daddr4 = (const struct sockaddr_in *)daddr;
+	const struct sockaddr_in *saddr4 = (const struct sockaddr_in *)saddr;
+	const struct sockaddr_in6 *daddr6 = (const struct sockaddr_in6 *)daddr;
+	const struct sockaddr_in6 *saddr6 = (const struct sockaddr_in6 *)saddr;
+
+	switch (daddr->sa_family) {
+	case AF_INET:
+		return saddr->sa_family == AF_INET &&
+		       validate_ipv4_net_dev(net_dev, daddr4, saddr4);
+
+	case AF_INET6:
+		return saddr->sa_family == AF_INET6 &&
+		       validate_ipv6_net_dev(net_dev, daddr6, saddr6);
+
+	default:
+		return false;
+	}
+}
+
 static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 					  const struct cma_req_info *req)
 {
-	struct sockaddr_storage listen_addr_storage;
-	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage;
+	struct sockaddr_storage listen_addr_storage, src_addr_storage;
+	struct sockaddr *listen_addr = (struct sockaddr *)&listen_addr_storage,
+			*src_addr = (struct sockaddr *)&src_addr_storage;
 	struct net_device *net_dev;
 	int err;
 
-	err = cma_save_ip_info(listen_addr, NULL, ib_event, req->service_id);
+	err = cma_save_ip_info(listen_addr, src_addr, ib_event,
+			       req->service_id);
 	if (err)
 		return ERR_PTR(err);
 
@@ -1079,6 +1163,11 @@ static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 	if (!net_dev)
 		return ERR_PTR(-ENODEV);
 
+	if (!validate_net_dev(net_dev, listen_addr, src_addr)) {
+		dev_put(net_dev);
+		return ERR_PTR(-EHOSTUNREACH);
+	}
+
 	return net_dev;
 }
 
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 10/12] IB/cma: use found net_dev for passive connections
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 09/12] IB/cma: validate routing of incoming requests Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
  2015-06-22 12:42   ` [PATCH v1 12/12] IB/cm: Remove compare_data checks Haggai Eran
  8 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 64 ++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 759697d6e627..729511d3ec64 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1251,27 +1251,25 @@ static struct rdma_id_private *cma_find_listener(
 }
 
 static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id,
-						 struct ib_cm_event *ib_event)
+						 struct ib_cm_event *ib_event,
+						 struct net_device **net_dev)
 {
 	struct cma_req_info req;
 	struct rdma_bind_list *bind_list;
 	struct rdma_id_private *id_priv;
-	struct net_device *net_dev;
 	int err;
 
 	err = cma_save_req_info(ib_event, &req);
 	if (err)
 		return ERR_PTR(err);
 
-	net_dev = cma_get_net_dev(ib_event, &req);
-	if (IS_ERR(net_dev))
-		return ERR_PTR(PTR_ERR(net_dev));
+	*net_dev = cma_get_net_dev(ib_event, &req);
+	if (IS_ERR(*net_dev))
+		return ERR_PTR(PTR_ERR(*net_dev));
 
 	bind_list = cma_ps_find(rdma_ps_from_service_id(req.service_id),
 				cma_port_from_service_id(req.service_id));
-	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, net_dev);
-
-	dev_put(net_dev);
+	id_priv = cma_find_listener(bind_list, cm_id, ib_event, &req, *net_dev);
 
 	return id_priv;
 }
@@ -1521,7 +1519,8 @@ out:
 }
 
 static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
-					       struct ib_cm_event *ib_event)
+					       struct ib_cm_event *ib_event,
+					       struct net_device *net_dev)
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
@@ -1553,15 +1552,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	if (rt->num_paths == 2)
 		rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
 
-	if (cma_any_addr(cma_src_addr(id_priv))) {
-		rt->addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
-		rdma_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
-		ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
-	} else {
-		ret = cma_translate_addr(cma_src_addr(id_priv), &rt->addr.dev_addr);
-		if (ret)
-			goto err;
-	}
+	ret = rdma_copy_addr(&rt->addr.dev_addr, net_dev, NULL);
+	if (ret)
+		goto err;
 	rdma_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
 
 	id_priv->state = RDMA_CM_CONNECT;
@@ -1573,7 +1566,8 @@ err:
 }
 
 static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
-					      struct ib_cm_event *ib_event)
+					      struct ib_cm_event *ib_event,
+					      struct net_device *net_dev)
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
@@ -1592,11 +1586,9 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 			      ib_event->param.sidr_req_rcvd.service_id))
 		goto err;
 
-	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
-		ret = cma_translate_addr(cma_src_addr(id_priv), &id->route.addr.dev_addr);
-		if (ret)
-			goto err;
-	}
+	ret = rdma_copy_addr(&id->route.addr.dev_addr, net_dev, NULL);
+	if (ret)
+		goto err;
 
 	id_priv->state = RDMA_CM_CONNECT;
 	return id_priv;
@@ -1633,28 +1625,33 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 {
 	struct rdma_id_private *listen_id, *conn_id;
 	struct rdma_cm_event event;
+	struct net_device *net_dev;
 	int offset, ret;
 
-	listen_id = cma_id_from_event(cm_id, ib_event);
+	listen_id = cma_id_from_event(cm_id, ib_event, &net_dev);
 	if (IS_ERR(listen_id))
 		return PTR_ERR(listen_id);
 
-	if (!cma_check_req_qp_type(&listen_id->id, ib_event))
-		return -EINVAL;
+	if (!cma_check_req_qp_type(&listen_id->id, ib_event)) {
+		ret = -EINVAL;
+		goto net_dev_put;
+	}
 
-	if (cma_disable_callback(listen_id, RDMA_CM_LISTEN))
-		return -ECONNABORTED;
+	if (cma_disable_callback(listen_id, RDMA_CM_LISTEN)) {
+		ret = -ECONNABORTED;
+		goto net_dev_put;
+	}
 
 	memset(&event, 0, sizeof event);
 	offset = cma_user_data_offset(listen_id);
 	event.event = RDMA_CM_EVENT_CONNECT_REQUEST;
 	if (ib_event->event == IB_CM_SIDR_REQ_RECEIVED) {
-		conn_id = cma_new_udp_id(&listen_id->id, ib_event);
+		conn_id = cma_new_udp_id(&listen_id->id, ib_event, net_dev);
 		event.param.ud.private_data = ib_event->private_data + offset;
 		event.param.ud.private_data_len =
 				IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE - offset;
 	} else {
-		conn_id = cma_new_conn_id(&listen_id->id, ib_event);
+		conn_id = cma_new_conn_id(&listen_id->id, ib_event, net_dev);
 		cma_set_req_event_data(&event, &ib_event->param.req_rcvd,
 				       ib_event->private_data, offset);
 	}
@@ -1692,6 +1689,7 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 	mutex_unlock(&conn_id->handler_mutex);
 	mutex_unlock(&listen_id->handler_mutex);
 	cma_deref_id(conn_id);
+	dev_put(net_dev);
 	return 0;
 
 err3:
@@ -1705,6 +1703,10 @@ err1:
 	mutex_unlock(&listen_id->handler_mutex);
 	if (conn_id)
 		rdma_destroy_id(&conn_id->id);
+
+net_dev_put:
+	dev_put(net_dev);
+
 	return ret;
 }
 
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 10/12] IB/cma: use found net_dev for passive connections Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
       [not found]     ` <1434976961-27424-12-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-06-22 12:42   ` [PATCH v1 12/12] IB/cm: Remove compare_data checks Haggai Eran
  8 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
existing ones if needed. When given a request on a specific CM ID, the code
now matches the request to the RDMA CM ID based on the request parameters,
so it no longer needs to rely on the ib_cm's private data matching
capabilities.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 60 ++++---------------------------------------
 1 file changed, 5 insertions(+), 55 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 729511d3ec64..322858842bae 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1719,42 +1719,6 @@ __be64 rdma_get_service_id(struct rdma_cm_id *id, struct sockaddr *addr)
 }
 EXPORT_SYMBOL(rdma_get_service_id);
 
-static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
-				 struct ib_cm_compare_data *compare)
-{
-	struct cma_hdr *cma_data, *cma_mask;
-	__be32 ip4_addr;
-	struct in6_addr ip6_addr;
-
-	memset(compare, 0, sizeof *compare);
-	cma_data = (void *) compare->data;
-	cma_mask = (void *) compare->mask;
-
-	switch (addr->sa_family) {
-	case AF_INET:
-		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
-		cma_set_ip_ver(cma_data, 4);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip4.addr = ip4_addr;
-			cma_mask->dst_addr.ip4.addr = htonl(~0);
-		}
-		break;
-	case AF_INET6:
-		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
-		cma_set_ip_ver(cma_data, 6);
-		cma_set_ip_ver(cma_mask, 0xF);
-		if (!cma_any_addr(addr)) {
-			cma_data->dst_addr.ip6 = ip6_addr;
-			memset(&cma_mask->dst_addr.ip6, 0xFF,
-			       sizeof cma_mask->dst_addr.ip6);
-		}
-		break;
-	default:
-		break;
-	}
-}
-
 static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
 {
 	struct rdma_id_private *id_priv = iw_id->context;
@@ -1908,33 +1872,19 @@ out:
 
 static int cma_ib_listen(struct rdma_id_private *id_priv)
 {
-	struct ib_cm_compare_data compare_data;
 	struct sockaddr *addr;
 	struct ib_cm_id	*id;
 	__be64 svc_id;
-	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
+	addr = cma_src_addr(id_priv);
+	svc_id = rdma_get_service_id(&id_priv->id, addr);
+	id = ib_cm_insert_listen(id_priv->id.device, cma_req_handler, svc_id,
+				 0);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
-
 	id_priv->cm_id.ib = id;
 
-	addr = cma_src_addr(id_priv);
-	svc_id = rdma_get_service_id(&id_priv->id, addr);
-	if (cma_any_addr(addr) && !id_priv->afonly)
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, NULL);
-	else {
-		cma_set_compare_data(id_priv->id.ps, addr, &compare_data);
-		ret = ib_cm_listen(id_priv->cm_id.ib, svc_id, 0, &compare_data);
-	}
-
-	if (ret) {
-		ib_destroy_cm_id(id_priv->cm_id.ib);
-		id_priv->cm_id.ib = NULL;
-	}
-
-	return ret;
+	return 0;
 }
 
 static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog)
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v1 12/12] IB/cm: Remove compare_data checks
       [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-06-22 12:42   ` [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
@ 2015-06-22 12:42   ` Haggai Eran
  8 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-06-22 12:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Shachar Raindel, Yotam Kenneth,
	Jason Gunthorpe, Haggai Eran

Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/cm.c            | 105 ++++++--------------------------
 drivers/infiniband/core/ucm.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 include/rdma/ib_cm.h                    |  14 +----
 5 files changed, 22 insertions(+), 104 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 6d91bf69b11a..411c08b0c743 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -221,7 +221,6 @@ struct cm_id_private {
 	/* todo: use alternate port on send failure */
 	struct cm_av av;
 	struct cm_av alt_av;
-	struct ib_cm_compare_data *compare_data;
 
 	void *private_data;
 	__be64 tid;
@@ -442,40 +441,6 @@ static struct cm_id_private * cm_acquire_id(__be32 local_id, __be32 remote_id)
 	return cm_id_priv;
 }
 
-static void cm_mask_copy(u32 *dst, const u32 *src, const u32 *mask)
-{
-	int i;
-
-	for (i = 0; i < IB_CM_COMPARE_SIZE; i++)
-		dst[i] = src[i] & mask[i];
-}
-
-static int cm_compare_data(struct ib_cm_compare_data *src_data,
-			   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-	u32 dst[IB_CM_COMPARE_SIZE];
-
-	if (!src_data || !dst_data)
-		return 0;
-
-	cm_mask_copy(src, src_data->data, dst_data->mask);
-	cm_mask_copy(dst, dst_data->data, src_data->mask);
-	return memcmp(src, dst, sizeof(src));
-}
-
-static int cm_compare_private_data(u32 *private_data,
-				   struct ib_cm_compare_data *dst_data)
-{
-	u32 src[IB_CM_COMPARE_SIZE];
-
-	if (!dst_data)
-		return 0;
-
-	cm_mask_copy(src, private_data, dst_data->mask);
-	return memcmp(src, dst_data->data, sizeof(src));
-}
-
 /*
  * Trivial helpers to strip endian annotation and compare; the
  * endianness doesn't actually matter since we just need a stable
@@ -508,18 +473,14 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 	struct cm_id_private *cur_cm_id_priv;
 	__be64 service_id = cm_id_priv->id.service_id;
 	__be64 service_mask = cm_id_priv->id.service_mask;
-	int data_cmp;
 
 	while (*link) {
 		parent = *link;
 		cur_cm_id_priv = rb_entry(parent, struct cm_id_private,
 					  service_node);
-		data_cmp = cm_compare_data(cm_id_priv->compare_data,
-					   cur_cm_id_priv->compare_data);
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
-		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
-		    !data_cmp)
+		    (cm_id_priv->id.device == cur_cm_id_priv->id.device))
 			return cur_cm_id_priv;
 
 		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
@@ -530,8 +491,6 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 			link = &(*link)->rb_left;
 		else if (be64_gt(service_id, cur_cm_id_priv->id.service_id))
 			link = &(*link)->rb_right;
-		else if (data_cmp < 0)
-			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
 	}
@@ -541,20 +500,16 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 }
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
-					     __be64 service_id,
-					     u32 *private_data)
+					     __be64 service_id)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
-	int data_cmp;
 
 	while (node) {
 		cm_id_priv = rb_entry(node, struct cm_id_private, service_node);
-		data_cmp = cm_compare_private_data(private_data,
-						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device) && !data_cmp)
+		    (cm_id_priv->id.device == device))
 			return cm_id_priv;
 
 		if (device < cm_id_priv->id.device)
@@ -565,8 +520,6 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
 			node = node->rb_left;
 		else if (be64_gt(service_id, cm_id_priv->id.service_id))
 			node = node->rb_right;
-		else if (data_cmp < 0)
-			node = node->rb_left;
 		else
 			node = node->rb_right;
 	}
@@ -927,7 +880,6 @@ retest:
 	wait_for_completion(&cm_id_priv->comp);
 	while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
 		cm_free_work(work);
-	kfree(cm_id_priv->compare_data);
 	kfree(cm_id_priv->private_data);
 	kfree(cm_id_priv);
 }
@@ -950,20 +902,11 @@ EXPORT_SYMBOL(ib_destroy_cm_id);
  *   range of service IDs.  If set to 0, the service ID is matched
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
- * @compare_data: This parameter is optional.  It specifies data that must
- *   appear in the private data of a connection request for the specified
- *   listen request.
- * @lock: If set, lock the cm.lock spin-lock when adding the id to the
- *   listener tree. When false, the caller must already hold the spin-lock,
- *   and compare_data must be NULL.
  */
 static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
-			  __be64 service_mask,
-			  struct ib_cm_compare_data *compare_data,
-			  bool lock)
+			  __be64 service_mask)
 {
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
-	unsigned long flags = 0;
 	int ret = 0;
 
 	service_mask = service_mask ? service_mask : ~cpu_to_be64(0);
@@ -976,22 +919,9 @@ static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
 	if (cm_id->state != IB_CM_IDLE)
 		return -EINVAL;
 
-	if (compare_data) {
-		cm_id_priv->compare_data = kzalloc(sizeof *compare_data,
-						   GFP_KERNEL);
-		if (!cm_id_priv->compare_data)
-			return -ENOMEM;
-		cm_mask_copy(cm_id_priv->compare_data->data,
-			     compare_data->data, compare_data->mask);
-		memcpy(cm_id_priv->compare_data->mask, compare_data->mask,
-		       sizeof(compare_data->mask));
-	}
-
 	cm_id->state = IB_CM_LISTEN;
 	++cm_id_priv->listen_sharecount;
 
-	if (lock)
-		spin_lock_irqsave(&cm.lock, flags);
 	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
 		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
 		cm_id->service_mask = ~cpu_to_be64(0);
@@ -1000,24 +930,25 @@ static int __ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
 		cm_id->service_mask = service_mask;
 	}
 	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
-	if (lock)
-		spin_unlock_irqrestore(&cm.lock, flags);
 
 	if (cur_cm_id_priv) {
 		cm_id->state = IB_CM_IDLE;
 		--cm_id_priv->listen_sharecount;
-		kfree(cm_id_priv->compare_data);
-		cm_id_priv->compare_data = NULL;
 		ret = -EBUSY;
 	}
 	return ret;
 }
 
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data)
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask)
 {
-	return __ib_cm_listen(cm_id, service_id, service_mask, compare_data,
-			      true);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm.lock, flags);
+	ret = __ib_cm_listen(cm_id, service_id, service_mask);
+	spin_unlock_irqrestore(&cm.lock, flags);
+
+	return ret;
 }
 EXPORT_SYMBOL(ib_cm_listen);
 
@@ -1061,7 +992,7 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 		goto new_id;
 
 	/* Find an existing ID */
-	cm_id_priv = cm_find_listen(device, service_id, NULL);
+	cm_id_priv = cm_find_listen(device, service_id);
 	if (cm_id_priv) {
 		if (cm_id->cm_handler != cm_handler || cm_id->context) {
 			/* Sharing an ib_cm_id with different handlers is not
@@ -1080,7 +1011,7 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 
 new_id:
 	/* Use newly created ID */
-	err = __ib_cm_listen(cm_id, service_id, service_mask, NULL, false);
+	err = __ib_cm_listen(cm_id, service_id, service_mask);
 
 	spin_unlock_irqrestore(&cm.lock, flags);
 
@@ -1586,8 +1517,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
-					   req_msg->service_id,
-					   req_msg->private_data);
+					   req_msg->service_id);
 	if (!listen_cm_id_priv) {
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irq(&cm.lock);
@@ -3134,8 +3064,7 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
-					sidr_req_msg->service_id,
-					sidr_req_msg->private_data);
+					sidr_req_msg->service_id);
 	if (!cur_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 2ce29cd43639..2c20f076c84f 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -658,8 +658,7 @@ static ssize_t ib_ucm_listen(struct ib_ucm_file *file,
 	if (result)
 		goto out;
 
-	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask,
-			      NULL);
+	result = ib_cm_listen(ctx->cm_id, cmd.service_id, cmd.service_mask);
 out:
 	ib_ucm_ctx_put(ctx);
 	return result;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index cf32a778e7d0..626959927f4d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -854,7 +854,7 @@ int ipoib_cm_dev_open(struct net_device *dev)
 	}
 
 	ret = ib_cm_listen(priv->cm.id, cpu_to_be64(IPOIB_CM_IETF_ID | priv->qp->qp_num),
-			   0, NULL);
+			   0);
 	if (ret) {
 		printk(KERN_WARNING "%s: failed to listen on ID 0x%llx\n", priv->ca->name,
 		       IPOIB_CM_IETF_ID | priv->qp->qp_num);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 54da6e8ddfa1..7cc65784dae2 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3259,7 +3259,7 @@ static void srpt_add_one(struct ib_device *device)
 	 * in the system as service_id; therefore, the target_id will change
 	 * if this HCA is gone bad and replaced by different HCA
 	 */
-	if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0, NULL))
+	if (ib_cm_listen(sdev->cm_id, cpu_to_be64(srpt_service_guid), 0))
 		goto err_cm;
 
 	INIT_IB_EVENT_HANDLER(&sdev->event_handler, sdev->device,
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index f7fd22f10bae..c2be7c050bc0 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -105,8 +105,6 @@ enum ib_cm_data_size {
 	IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE = 216,
 	IB_CM_SIDR_REP_PRIVATE_DATA_SIZE = 136,
 	IB_CM_SIDR_REP_INFO_LENGTH	 = 72,
-	/* compare done u32 at a time */
-	IB_CM_COMPARE_SIZE		 = (64 / sizeof(u32))
 };
 
 struct ib_cm_id;
@@ -338,11 +336,6 @@ void ib_destroy_cm_id(struct ib_cm_id *cm_id);
 #define IB_SDP_SERVICE_ID	cpu_to_be64(0x0000000000010000ULL)
 #define IB_SDP_SERVICE_ID_MASK	cpu_to_be64(0xFFFFFFFFFFFF0000ULL)
 
-struct ib_cm_compare_data {
-	u32  data[IB_CM_COMPARE_SIZE];
-	u32  mask[IB_CM_COMPARE_SIZE];
-};
-
 /**
  * ib_cm_listen - Initiates listening on the specified service ID for
  *   connection and service ID resolution requests.
@@ -355,12 +348,9 @@ struct ib_cm_compare_data {
  *   range of service IDs.  If set to 0, the service ID is matched
  *   exactly.  This parameter is ignored if %service_id is set to
  *   IB_CM_ASSIGN_SERVICE_ID.
- * @compare_data: This parameter is optional.  It specifies data that must
- *   appear in the private data of a connection request for the specified
- *   listen request.
  */
-int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
-		 struct ib_cm_compare_data *compare_data);
+int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id,
+		 __be64 service_mask);
 
 struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
 				     ib_cm_handler cm_handler,
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 01/12] IB/core: pass client data to remove() callbacks
  2015-06-22 12:42 ` [PATCH v1 01/12] IB/core: pass client data to remove() callbacks Haggai Eran
@ 2015-07-08 20:29   ` Jason Gunthorpe
  2015-07-08 21:34     ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-08 20:29 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:30PM +0300, Haggai Eran wrote:
> An ib_client callback that is called with the lists_rwsem locked only for
> read is protected from changes to the IB client lists, but not from
> ib_unregister_device() freeing its client data. This is because
> ib_unregister_device() will remove the device from the device list with
> lists_rwsem locked for write, but perform the rest of the cleanup,
> including the call to remove() without that lock.

I was going to look at this, but, uh.. it seems mangled, doesn't
apply, doesn't seem fixable from here.

It isn't on top of any of Doug's trees that I can find and it has
this:

> @@ -348,21 +348,22 @@ void ib_unregister_device(struct ib_device *device)
>  
>  	down_write(&lists_rwsem);
>  	list_del(&device->core_list);

..  lists_rwsem is what this patch is supposed to be adding, so I
don't know what went wrong here....

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 02/12] IB/core: Find the network device matching connection parameters
  2015-06-22 12:42 ` [PATCH v1 02/12] IB/core: Find the network device matching connection parameters Haggai Eran
@ 2015-07-08 20:33   ` Jason Gunthorpe
       [not found]     ` <20150708203325.GB16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-08 20:33 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:31PM +0300, Haggai Eran wrote:
> +/**
> + * ib_get_net_dev_by_params() - Return the appropriate net_dev
> + * for a received CM request
> + * @dev:	An RDMA device on which the request has been received.
> + * @port:	Port number on the RDMA device.
> + * @pkey:	The Pkey the request came on.
> + * @gid:	A GID that the net_dev uses to communicate.
> + * @addr:	Contains the IP address that the request specified as its
> + *		destination.
> + */
> +struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
> +					    u16 pkey, const union ib_gid *gid,
> +					    const struct sockaddr *addr);

I feel like this has been repated a few times now, but kdocs should be
with the function body, not in the header.

Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 01/12] IB/core: pass client data to remove() callbacks
  2015-07-08 20:29   ` Jason Gunthorpe
@ 2015-07-08 21:34     ` Jason Gunthorpe
       [not found]       ` <20150708213410.GA19624-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-08 21:34 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Wed, Jul 08, 2015 at 02:29:10PM -0600, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:30PM +0300, Haggai Eran wrote:
> > An ib_client callback that is called with the lists_rwsem locked only for
> > read is protected from changes to the IB client lists, but not from
> > ib_unregister_device() freeing its client data. This is because
> > ib_unregister_device() will remove the device from the device list with
> > lists_rwsem locked for write, but perform the rest of the cleanup,
> > including the call to remove() without that lock.
> 
> I was going to look at this, but, uh.. it seems mangled, doesn't
> apply, doesn't seem fixable from here.

Okay, I see, it sits on top of the patch from Matan's last
posting.. My bad.

Hum... I have to say I don't really like this, changing the ordering
of client_data = NULL with respect to client->remove doesn't seem like
a great idea - and the rds changes look scary to me, at least I
couldn't confidently say they were OK..

And that isn't really the issue - this has nothing to do with
client_data, it is all about not having a callback running when doing
remove.

It looks like the way out of this is to have ib_get_net_dev_by_params
iterate over the client_data_list and use a dedicated flag in that
struct to indicate that client&device combination is
remove-in-progress.

This would be a bit more efficient as well, and I would suggest
passing the context in as an arg to the callback.

client_data_list would change a bit to become write locked first by
write(lists_rwsem), and then second by the spin lock, so holding
read(lists_rwsem) while iterating is enough locking, and you'd hold
lists_rwsem while kfreeing.

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters
       [not found]     ` <1434976961-27424-4-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-08 23:41       ` Jason Gunthorpe
       [not found]         ` <20150708234111.GC16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-08 23:41 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:32PM +0300, Haggai Eran wrote:
> +		if (net_dev) {
> +			ipoib_warn(priv, "matching net_dev found: %s\n",
> +				   net_dev->name);

Is that a debug print?

> +	default:
> +		dev_warn(&dev->dev, "duplicate IP address detected\n");
> +		/* Fall through */

Surely that needs some kind of rate limit?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters
       [not found]         ` <20150708234111.GC16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-09  9:57             ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-09  9:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 09/07/2015 02:41, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:32PM +0300, Haggai Eran wrote:
>> +		if (net_dev) {
>> +			ipoib_warn(priv, "matching net_dev found: %s\n",
>> +				   net_dev->name);
> 
> Is that a debug print?

Yes, I'm afraid it slipped in somehow. I'll remove it.

> 
>> +	default:
>> +		dev_warn(&dev->dev, "duplicate IP address detected\n");
>> +		/* Fall through */
> 
> Surely that needs some kind of rate limit?

I'll use dev_warn_ratelimited().

Thanks,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters
@ 2015-07-09  9:57             ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-09  9:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 09/07/2015 02:41, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:32PM +0300, Haggai Eran wrote:
>> +		if (net_dev) {
>> +			ipoib_warn(priv, "matching net_dev found: %s\n",
>> +				   net_dev->name);
> 
> Is that a debug print?

Yes, I'm afraid it slipped in somehow. I'll remove it.

> 
>> +	default:
>> +		dev_warn(&dev->dev, "duplicate IP address detected\n");
>> +		/* Fall through */
> 
> Surely that needs some kind of rate limit?

I'll use dev_warn_ratelimited().

Thanks,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 02/12] IB/core: Find the network device matching connection parameters
       [not found]     ` <20150708203325.GB16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-09 10:18         ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-09 10:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 08/07/2015 23:33, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:31PM +0300, Haggai Eran wrote:
>> +/**
>> + * ib_get_net_dev_by_params() - Return the appropriate net_dev
>> + * for a received CM request
>> + * @dev:	An RDMA device on which the request has been received.
>> + * @port:	Port number on the RDMA device.
>> + * @pkey:	The Pkey the request came on.
>> + * @gid:	A GID that the net_dev uses to communicate.
>> + * @addr:	Contains the IP address that the request specified as its
>> + *		destination.
>> + */
>> +struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
>> +					    u16 pkey, const union ib_gid *gid,
>> +					    const struct sockaddr *addr);
> 
> I feel like this has been repated a few times now, but kdocs should be
> with the function body, not in the header.
Right. I fixed it in the IB/addr patch, but missed it here. I'll move it
to the function's definition.

> Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Thanks,

Haggai

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 02/12] IB/core: Find the network device matching connection parameters
@ 2015-07-09 10:18         ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-09 10:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 08/07/2015 23:33, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:31PM +0300, Haggai Eran wrote:
>> +/**
>> + * ib_get_net_dev_by_params() - Return the appropriate net_dev
>> + * for a received CM request
>> + * @dev:	An RDMA device on which the request has been received.
>> + * @port:	Port number on the RDMA device.
>> + * @pkey:	The Pkey the request came on.
>> + * @gid:	A GID that the net_dev uses to communicate.
>> + * @addr:	Contains the IP address that the request specified as its
>> + *		destination.
>> + */
>> +struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
>> +					    u16 pkey, const union ib_gid *gid,
>> +					    const struct sockaddr *addr);
> 
> I feel like this has been repated a few times now, but kdocs should be
> with the function body, not in the header.
Right. I fixed it in the IB/addr patch, but missed it here. I'll move it
to the function's definition.

> Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Thanks,

Haggai

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 05/12] IB/cm: Share listening CM IDs
       [not found]     ` <1434976961-27424-6-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-13 17:48       ` Jason Gunthorpe
       [not found]         ` <20150713174837.GH23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-13 17:48 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:34PM +0300, Haggai Eran wrote:
>  		spin_lock_irq(&cm.lock);
> +		if (--cm_id_priv->listen_sharecount > 0) {
> +			/* The id is still shared. */
> +			atomic_dec(&cm_id_priv->refcount);

Nit: This looks very strange not to be cm_deref_id .. Looks OK as is
because we are sure refcount cannot be 0 here?

> @@ -958,8 +988,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>  	}
>  
>  	cm_id->state = IB_CM_LISTEN;
> +	++cm_id_priv->listen_sharecount;
>
> -	spin_lock_irqsave(&cm.lock, flags);
> +	if (lock)
> +		spin_lock_irqsave(&cm.lock, flags);

Hmm, I'd like to see the listen_sharecount consistently locked, so it
should be manipulated only while cm.lock is held..

>  	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
>  		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
>  		cm_id->service_mask = ~cpu_to_be64(0);
> @@ -968,18 +1000,98 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>  		cm_id->service_mask = service_mask;
>  	}
>  	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
> -	spin_unlock_irqrestore(&cm.lock, flags);
> +	if (lock)
> +		spin_unlock_irqrestore(&cm.lock, flags);
>  
>  	if (cur_cm_id_priv) {
>  		cm_id->state = IB_CM_IDLE;
> +		--cm_id_priv->listen_sharecount;

Ditto

Otherwise I don't see any other mechanical problems with this. Sean
said he was happy with the idea right?

Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids
       [not found]     ` <1434976961-27424-12-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-13 18:06       ` Jason Gunthorpe
  2015-07-14  8:47           ` Haggai Eran
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-13 18:06 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:40PM +0300, Haggai Eran wrote:
> Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Is that the wrong name? ib_cm_insert_listen perhaps?

I think I've looked at the details in this series I was concerned
about, Sean should OK the rest of the changes to the CM code, but
nothing much stood out to me.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found]     ` <1434976961-27424-9-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-13 18:14       ` Jason Gunthorpe
       [not found]         ` <20150713181414.GJ23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-13 18:14 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
> +	switch (ib_event->event) {
> +	case IB_CM_REQ_RECEIVED:
> +		req->device	= req_param->listen_id->device;
> +		req->port	= req_param->port;
> +		req->local_gid	= &req_param->primary_path->sgid;
> +		req->service_id	= req_param->primary_path->service_id;
> +		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);

I feel pretty strongly that we should be using the pkey from the work
completion, not the pkey in the message.

The reason, if someone is using pkey like vlan, and expecting a
container to never receive packets outside the assigned pkey, then we
need to check each and every packet for the correct pkey before
associating it with that container.

When doing the namespace patches you should probably also look at
other CM GMPs than just the REQ and how the paths are setup and
consider what to do with the pkey. I'd probably suggest that the pkey
should be forced throughout the entire process to ensure it always
matches the ip device - at least for containers that is the right
thing.. I probably wouldn't turn it on for the root namespace though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 05/12] IB/cm: Share listening CM IDs
       [not found]         ` <20150713174837.GH23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-14  8:45           ` Haggai Eran
       [not found]             ` <55A4CC1B.1060201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Haggai Eran @ 2015-07-14  8:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 13/07/2015 20:48, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:34PM +0300, Haggai Eran wrote:
>>  		spin_lock_irq(&cm.lock);
>> +		if (--cm_id_priv->listen_sharecount > 0) {
>> +			/* The id is still shared. */
>> +			atomic_dec(&cm_id_priv->refcount);
> 
> Nit: This looks very strange not to be cm_deref_id .. Looks OK as is
> because we are sure refcount cannot be 0 here?

Yes, and because of the spin lock. But I'll change it to cm_deref_id()
so that it's more uniform.

> 
>> @@ -958,8 +988,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>>  	}
>>  
>>  	cm_id->state = IB_CM_LISTEN;
>> +	++cm_id_priv->listen_sharecount;
>>
>> -	spin_lock_irqsave(&cm.lock, flags);
>> +	if (lock)
>> +		spin_lock_irqsave(&cm.lock, flags);
> 
> Hmm, I'd like to see the listen_sharecount consistently locked, so it
> should be manipulated only while cm.lock is held..

You're right. I'll move it inside the lock.

> 
>>  	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
>>  		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
>>  		cm_id->service_mask = ~cpu_to_be64(0);
>> @@ -968,18 +1000,98 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>>  		cm_id->service_mask = service_mask;
>>  	}
>>  	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
>> -	spin_unlock_irqrestore(&cm.lock, flags);
>> +	if (lock)
>> +		spin_unlock_irqrestore(&cm.lock, flags);
>>  
>>  	if (cur_cm_id_priv) {
>>  		cm_id->state = IB_CM_IDLE;
>> +		--cm_id_priv->listen_sharecount;
> 
> Ditto
Okay. I'll move the lock release to the end of the function.

> 
> Otherwise I don't see any other mechanical problems with this. Sean
> said he was happy with the idea right?
I assume he is, since he reviewed the patch and found that
listen_sharecount leak, but he can speak for himself.

> 
> Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
Thanks.
Can I add it with the modifications above?

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids
  2015-07-13 18:06       ` Jason Gunthorpe
@ 2015-07-14  8:47           ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-14  8:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 13/07/2015 21:06, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:40PM +0300, Haggai Eran wrote:
>> Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Is that the wrong name? ib_cm_insert_listen perhaps?

Yes, I missed that. Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids
@ 2015-07-14  8:47           ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-14  8:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 13/07/2015 21:06, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:40PM +0300, Haggai Eran wrote:
>> Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Is that the wrong name? ib_cm_insert_listen perhaps?

Yes, I missed that. Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 01/12] IB/core: pass client data to remove() callbacks
       [not found]       ` <20150708213410.GA19624-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-14 14:54           ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-14 14:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 09/07/2015 00:34, Jason Gunthorpe wrote:
> On Wed, Jul 08, 2015 at 02:29:10PM -0600, Jason Gunthorpe wrote:
>> On Mon, Jun 22, 2015 at 03:42:30PM +0300, Haggai Eran wrote:
>>> An ib_client callback that is called with the lists_rwsem locked only for
>>> read is protected from changes to the IB client lists, but not from
>>> ib_unregister_device() freeing its client data. This is because
>>> ib_unregister_device() will remove the device from the device list with
>>> lists_rwsem locked for write, but perform the rest of the cleanup,
>>> including the call to remove() without that lock.
>>
>> I was going to look at this, but, uh.. it seems mangled, doesn't
>> apply, doesn't seem fixable from here.
> 
> Okay, I see, it sits on top of the patch from Matan's last
> posting.. My bad.
No problem.

> Hum... I have to say I don't really like this, changing the ordering
> of client_data = NULL with respect to client->remove doesn't seem like
> a great idea - and the rds changes look scary to me, at least I
> couldn't confidently say they were OK..
> 
> And that isn't really the issue - this has nothing to do with
> client_data, it is all about not having a callback running when doing
> remove.
> 
> It looks like the way out of this is to have ib_get_net_dev_by_params
> iterate over the client_data_list and use a dedicated flag in that
> struct to indicate that client&device combination is
> remove-in-progress.
> 
> This would be a bit more efficient as well, and I would suggest
> passing the context in as an arg to the callback.
>
> client_data_list would change a bit to become write locked first by
> write(lists_rwsem), and then second by the spin lock, so holding
> read(lists_rwsem) while iterating is enough locking, and you'd hold
> lists_rwsem while kfreeing.

So, I don't want to keep lists_rwsem for write while calling add() and
remove(). This would cause the deadlock that required the lists_rwsem
patch in the first place. I guess I can drop lists_rwsem before the
add/remove call and take it afterwards.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 01/12] IB/core: pass client data to remove() callbacks
@ 2015-07-14 14:54           ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-14 14:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 09/07/2015 00:34, Jason Gunthorpe wrote:
> On Wed, Jul 08, 2015 at 02:29:10PM -0600, Jason Gunthorpe wrote:
>> On Mon, Jun 22, 2015 at 03:42:30PM +0300, Haggai Eran wrote:
>>> An ib_client callback that is called with the lists_rwsem locked only for
>>> read is protected from changes to the IB client lists, but not from
>>> ib_unregister_device() freeing its client data. This is because
>>> ib_unregister_device() will remove the device from the device list with
>>> lists_rwsem locked for write, but perform the rest of the cleanup,
>>> including the call to remove() without that lock.
>>
>> I was going to look at this, but, uh.. it seems mangled, doesn't
>> apply, doesn't seem fixable from here.
> 
> Okay, I see, it sits on top of the patch from Matan's last
> posting.. My bad.
No problem.

> Hum... I have to say I don't really like this, changing the ordering
> of client_data = NULL with respect to client->remove doesn't seem like
> a great idea - and the rds changes look scary to me, at least I
> couldn't confidently say they were OK..
> 
> And that isn't really the issue - this has nothing to do with
> client_data, it is all about not having a callback running when doing
> remove.
> 
> It looks like the way out of this is to have ib_get_net_dev_by_params
> iterate over the client_data_list and use a dedicated flag in that
> struct to indicate that client&device combination is
> remove-in-progress.
> 
> This would be a bit more efficient as well, and I would suggest
> passing the context in as an arg to the callback.
>
> client_data_list would change a bit to become write locked first by
> write(lists_rwsem), and then second by the spin lock, so holding
> read(lists_rwsem) while iterating is enough locking, and you'd hold
> lists_rwsem while kfreeing.

So, I don't want to keep lists_rwsem for write while calling add() and
remove(). This would cause the deadlock that required the lists_rwsem
patch in the first place. I guess I can drop lists_rwsem before the
add/remove call and take it afterwards.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 05/12] IB/cm: Share listening CM IDs
       [not found]             ` <55A4CC1B.1060201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-14 17:38               ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-14 17:38 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Tue, Jul 14, 2015 at 11:45:15AM +0300, Haggai Eran wrote:

> > Reviewed-By: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> Thanks.
> Can I add it with the modifications above?

Yep

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found]         ` <20150713181414.GJ23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-15 10:57             ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-15 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 13/07/2015 21:14, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
>> +	switch (ib_event->event) {
>> +	case IB_CM_REQ_RECEIVED:
>> +		req->device	= req_param->listen_id->device;
>> +		req->port	= req_param->port;
>> +		req->local_gid	= &req_param->primary_path->sgid;
>> +		req->service_id	= req_param->primary_path->service_id;
>> +		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
> 
> I feel pretty strongly that we should be using the pkey from the work
> completion, not the pkey in the message.
> 
> The reason, if someone is using pkey like vlan, and expecting a
> container to never receive packets outside the assigned pkey, then we
> need to check each and every packet for the correct pkey before
> associating it with that container.

The way I see it is that you have one RDMA CM agent in the system, and
the header parameters address it. This agent allows addressing several
namespaces, and they are de-muxed according to the parameters in the
payload.

So a container never receives a packet outside of its assigned pkeys,
but the pkey you look at (as well as the GID, and possibly the IP
address) all come from the payload.

> When doing the namespace patches you should probably also look at
> other CM GMPs than just the REQ and how the paths are setup and
> consider what to do with the pkey. I'd probably suggest that the pkey
> should be forced throughout the entire process to ensure it always
> matches the ip device - at least for containers that is the right
> thing.. I probably wouldn't turn it on for the root namespace though..

Once a connection has been established, following GMPs use a unique ID
to address this connection, so no more de-muxing is needed. What is
really missing here I guess is a mechanism that would enforce containers
to only use certain pkeys - perhaps with something like an RDMA cgroup.
It could force containers to only use approved pkeys not only with RDMA
CM, but through uverbs, and other user-space interfaces.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
@ 2015-07-15 10:57             ` Haggai Eran
  0 siblings, 0 replies; 37+ messages in thread
From: Haggai Eran @ 2015-07-15 10:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On 13/07/2015 21:14, Jason Gunthorpe wrote:
> On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
>> +	switch (ib_event->event) {
>> +	case IB_CM_REQ_RECEIVED:
>> +		req->device	= req_param->listen_id->device;
>> +		req->port	= req_param->port;
>> +		req->local_gid	= &req_param->primary_path->sgid;
>> +		req->service_id	= req_param->primary_path->service_id;
>> +		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
> 
> I feel pretty strongly that we should be using the pkey from the work
> completion, not the pkey in the message.
> 
> The reason, if someone is using pkey like vlan, and expecting a
> container to never receive packets outside the assigned pkey, then we
> need to check each and every packet for the correct pkey before
> associating it with that container.

The way I see it is that you have one RDMA CM agent in the system, and
the header parameters address it. This agent allows addressing several
namespaces, and they are de-muxed according to the parameters in the
payload.

So a container never receives a packet outside of its assigned pkeys,
but the pkey you look at (as well as the GID, and possibly the IP
address) all come from the payload.

> When doing the namespace patches you should probably also look at
> other CM GMPs than just the REQ and how the paths are setup and
> consider what to do with the pkey. I'd probably suggest that the pkey
> should be forced throughout the entire process to ensure it always
> matches the ip device - at least for containers that is the right
> thing.. I probably wouldn't turn it on for the root namespace though..

Once a connection has been established, following GMPs use a unique ID
to address this connection, so no more de-muxing is needed. What is
really missing here I guess is a mechanism that would enforce containers
to only use certain pkeys - perhaps with something like an RDMA cgroup.
It could force containers to only use approved pkeys not only with RDMA
CM, but through uverbs, and other user-space interfaces.

Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
  2015-07-15 10:57             ` Haggai Eran
  (?)
@ 2015-07-15 18:49             ` Jason Gunthorpe
       [not found]               ` <20150715184934.GD23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  -1 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-15 18:49 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Doug Ledford, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Wed, Jul 15, 2015 at 01:57:48PM +0300, Haggai Eran wrote:
> On 13/07/2015 21:14, Jason Gunthorpe wrote:
> > On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote:
> >> +	switch (ib_event->event) {
> >> +	case IB_CM_REQ_RECEIVED:
> >> +		req->device	= req_param->listen_id->device;
> >> +		req->port	= req_param->port;
> >> +		req->local_gid	= &req_param->primary_path->sgid;
> >> +		req->service_id	= req_param->primary_path->service_id;
> >> +		req->pkey	= be16_to_cpu(req_param->primary_path->pkey);
> > 
> > I feel pretty strongly that we should be using the pkey from the work
> > completion, not the pkey in the message.
> > 
> > The reason, if someone is using pkey like vlan, and expecting a
> > container to never receive packets outside the assigned pkey, then we
> > need to check each and every packet for the correct pkey before
> > associating it with that container.
> 
> The way I see it is that you have one RDMA CM agent in the system, and
> the header parameters address it. This agent allows addressing several
> namespaces, and they are de-muxed according to the parameters in the
> payload.

That doesn't address my argument at all.

A container should never, ever, process a packet on a pkey that is not
associated with it.

So, it doesn't matter one bit how you decide to demux. After the demux
is done all three pkeys needs to be checked to be compatible with the
container: GMP, Primary Path, Alternate Path.

If any are not compatible the packet *MUST* be dropped.

If you do the above check, you may as well start with the wc's pkey,
since it reflects the logical 'netdevice' that received the packet.

> So a container never receives a packet outside of its assigned pkeys,
> but the pkey you look at (as well as the GID, and possibly the IP
> address) all come from the payload.

That isn't entirey true, at a minimum the above scheme allows an
attacking pkey to create an unbounded number of REQs delivered to
userspace on a victim container, without needing to be part of the
victim's pkey.

> > When doing the namespace patches you should probably also look at
> > other CM GMPs than just the REQ and how the paths are setup and
> > consider what to do with the pkey. I'd probably suggest that the pkey
> > should be forced throughout the entire process to ensure it always
> > matches the ip device - at least for containers that is the right
> > thing.. I probably wouldn't turn it on for the root namespace though..
> 
> Once a connection has been established, following GMPs use a unique ID
> to address this connection, so no more de-muxing is needed.

I think there are open issues here about pkey correctness. Ie the
unique ID is not locked to a pkey, so an attacking pkey can guess the
unique ID and then issue GMPs that will be accepted.

Once locked to a container the unique ID absolutely needs to drop all
packets that are outside the container's pkey space. IMHO this would
be part of the namespace patches.

Basically, pkey in a container should work exactly the same as pkey on
a HCA, packets are dropped before they ever reach the container. This
means every single resource associated with a container much check the
pkey of any packet targeted at that resource.

> What is really missing here I guess is a mechanism that would
> enforce containers to only use certain pkeys - perhaps with
> something like an RDMA cgroup.  It could force containers to only
> use approved pkeys not only with RDMA CM, but through uverbs, and
> other user-space interfaces.

Ideally yes. Without that it just means you can't use pkey
meaningfully with containers..

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found]               ` <20150715184934.GD23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-15 20:27                 ` Liran Liss
       [not found]                   ` <HE1PR05MB1418C82958B3CCF0D3531E5BB19A0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Liran Liss @ 2015-07-15 20:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Haggai Eran
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth

> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
 
> 
> > What is really missing here I guess is a mechanism that would
> > enforce containers to only use certain pkeys - perhaps with
> > something like an RDMA cgroup.  It could force containers to only
> > use approved pkeys not only with RDMA CM, but through uverbs, and
> > other user-space interfaces.
> 
> Ideally yes. Without that it just means you can't use pkey
> meaningfully with containers..
> 

That's why partition enforcement should be separated from namespaces.

If you want to restrict a container to a specific set of pkeys, use cgroups.
This would apply both to CM MADs and QPs.
- In the MAD case, CM MADs would be first matched to a namespace and rdma_id and dropped upon pkey conflict (with either the headers or the payload).
- In the QP case, modify_qp() would fail on conflict.
Partitioning needs to be enforced also for applications that don't use the CM at all...

For namespaces, it seems more natural to lookup the namespace based solely on the CM payload.
After all, it is the payload that designates the entity that you want to establish a connection to, rather than the packet headers, which are just meant to relay the packet to the proper CM agent. (Spec-wise, it is even possible to use completely different node to open a connection on behalf of another node.)
So, the sole role of pkeys in the context of namespaces is to locate the proper namespace; not for partition isolation.

So, as a first step, let's get the namespace lookup in place.
Next, add an rdma crgoup, which would control pkeys, as well as resource counts (you don't want one container to rob the whole system from all QPs...).

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found]                   ` <HE1PR05MB1418C82958B3CCF0D3531E5BB19A0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-07-15 21:03                     ` Jason Gunthorpe
       [not found]                       ` <20150715210342.GA32516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-15 21:03 UTC (permalink / raw)
  To: Liran Liss
  Cc: Haggai Eran, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth

On Wed, Jul 15, 2015 at 08:27:06PM +0000, Liran Liss wrote:
> If you want to restrict a container to a specific set of pkeys, use
> cgroups.

Ideally yes, but in the absence of a cgroup the set of pkeys assigned
to the container via ipoib is a reasonable alternate.

> This would apply both to CM MADs and QPs.
> - In the MAD case, CM MADs would be first matched to a namespace and rdma_id and dropped upon pkey conflict (with either the headers or the payload).
> - In the QP case, modify_qp() would fail on conflict.
> Partitioning needs to be enforced also for applications that don't use the CM at all...

Yep, that is how pkey checking should work, cgroup or not.

So, you agree with me.

I say that until we get a cgroup capability, the pkey list of the
container is the set of IPoIB interfaces associated with it, and we
still have to do the various checks above. The first check is relavent
to this patchset and should be done by using the GMP's headers to
locate the net device.

> For namespaces, it seems more natural to lookup the namespace based
> solely on the CM payload.

How so? Which payload content do you use? The primary path? The
alternate path?

> After all, it is the payload that designates the entity that you
> want to establish a connection to, rather than the packet headers,
> which are just meant to relay the packet to the proper CM

No, that isn't right. The IBA uses the GMP's destination first, then
serviceID as the demux. Services IDs are not globally unique, they are
scoped by the destination.

The path data is just *routing* it doesn't describe at all the entity
we want to talk to, it is only a proposal for how to flow data to it.

In any event, both the GMP headers and the path data needs to be
checked against the container's pkey list. I don't know why this is so
contentions.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
       [not found]                       ` <20150715210342.GA32516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-16 12:01                         ` Liran Liss
  2015-07-16 18:22                           ` Jason Gunthorpe
  0 siblings, 1 reply; 37+ messages in thread
From: Liran Liss @ 2015-07-16 12:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Haggai Eran, Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Guy Shapiro, Shachar Raindel,
	Yotam Kenneth

> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]

> > After all, it is the payload that designates the entity that you
> > want to establish a connection to, rather than the packet headers,
> > which are just meant to relay the packet to the proper CM
> 
> No, that isn't right. The IBA uses the GMP's destination first, then
> serviceID as the demux. Services IDs are not globally unique, they are
> scoped by the destination.
> 

The destination is the physical CA port and kernel CM agent, so I don't think the answer is that clear.
Going forward along these lines:
- Name space lookup is done based on BTH.pkey, private_data.IP, and optionally GRH.DGID (if present, for extra validation)
-- Primary and alternate paths are not considered at all

- If P_Key enforcement is set up via cgroups:
-- For CM processing, we only check BTH.pkey
--- Upon conflict, the packet is dropped
-- The primary/alternate path pkeys are not validated by CM, but will be validated during QP modification

Does this sound OK?

In any case, let's complete the namespace lookup first, and then follow up with a cgroup patchset.

> The path data is just *routing* it doesn't describe at all the entity
> we want to talk to, it is only a proposal for how to flow data to it.
> 
> In any event, both the GMP headers and the path data needs to be
> checked against the container's pkey list. I don't know why this is so
> contentions.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
  2015-07-16 12:01                         ` Liran Liss
@ 2015-07-16 18:22                           ` Jason Gunthorpe
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Gunthorpe @ 2015-07-16 18:22 UTC (permalink / raw)
  To: Liran Liss
  Cc: Haggai Eran, Doug Ledford, linux-rdma, netdev, Guy Shapiro,
	Shachar Raindel, Yotam Kenneth

On Thu, Jul 16, 2015 at 12:01:55PM +0000, Liran Liss wrote:

> - Name space lookup is done based on BTH.pkey, private_data.IP, and
>   optionally GRH.DGID (if present, for extra validation)

Just changing the pkey to BTH.pkey would be fine by me.

Using GRH.DGID if available instead of the primary path hack is also
smart, I pointed that out at the beginning..

Jason

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-07-16 18:22 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 12:42 [PATCH v1 00/12] Demux IB CM requests in the rdma_cm module Haggai Eran
2015-06-22 12:42 ` [PATCH v1 01/12] IB/core: pass client data to remove() callbacks Haggai Eran
2015-07-08 20:29   ` Jason Gunthorpe
2015-07-08 21:34     ` Jason Gunthorpe
     [not found]       ` <20150708213410.GA19624-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14 14:54         ` Haggai Eran
2015-07-14 14:54           ` Haggai Eran
2015-06-22 12:42 ` [PATCH v1 02/12] IB/core: Find the network device matching connection parameters Haggai Eran
2015-07-08 20:33   ` Jason Gunthorpe
     [not found]     ` <20150708203325.GB16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-09 10:18       ` Haggai Eran
2015-07-09 10:18         ` Haggai Eran
2015-06-22 12:42 ` [PATCH v1 04/12] IB/cm: Expose service ID in request events Haggai Eran
     [not found] ` <1434976961-27424-1-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-06-22 12:42   ` [PATCH v1 03/12] IB/ipoib: Return IPoIB devices matching connection parameters Haggai Eran
     [not found]     ` <1434976961-27424-4-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-08 23:41       ` Jason Gunthorpe
     [not found]         ` <20150708234111.GC16812-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-09  9:57           ` Haggai Eran
2015-07-09  9:57             ` Haggai Eran
2015-06-22 12:42   ` [PATCH v1 05/12] IB/cm: Share listening CM IDs Haggai Eran
     [not found]     ` <1434976961-27424-6-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-13 17:48       ` Jason Gunthorpe
     [not found]         ` <20150713174837.GH23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-14  8:45           ` Haggai Eran
     [not found]             ` <55A4CC1B.1060201-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-14 17:38               ` Jason Gunthorpe
2015-06-22 12:42   ` [PATCH v1 06/12] IB/cma: Refactor RDMA IP CM private-data parsing code Haggai Eran
2015-06-22 12:42   ` [PATCH v1 07/12] IB/cma: Helper functions to access port space IDRs Haggai Eran
2015-06-22 12:42   ` [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM Haggai Eran
     [not found]     ` <1434976961-27424-9-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-13 18:14       ` Jason Gunthorpe
     [not found]         ` <20150713181414.GJ23832-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15 10:57           ` Haggai Eran
2015-07-15 10:57             ` Haggai Eran
2015-07-15 18:49             ` Jason Gunthorpe
     [not found]               ` <20150715184934.GD23588-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-15 20:27                 ` Liran Liss
     [not found]                   ` <HE1PR05MB1418C82958B3CCF0D3531E5BB19A0-eBadYZ65MZ87O8BmmlM1zNqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-07-15 21:03                     ` Jason Gunthorpe
     [not found]                       ` <20150715210342.GA32516-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-16 12:01                         ` Liran Liss
2015-07-16 18:22                           ` Jason Gunthorpe
2015-06-22 12:42   ` [PATCH v1 09/12] IB/cma: validate routing of incoming requests Haggai Eran
2015-06-22 12:42   ` [PATCH v1 10/12] IB/cma: use found net_dev for passive connections Haggai Eran
2015-06-22 12:42   ` [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids Haggai Eran
     [not found]     ` <1434976961-27424-12-git-send-email-haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-13 18:06       ` Jason Gunthorpe
2015-07-14  8:47         ` Haggai Eran
2015-07-14  8:47           ` Haggai Eran
2015-06-22 12:42   ` [PATCH v1 12/12] IB/cm: Remove compare_data checks Haggai Eran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.