* [PATCH for-next 00/11] Add network namespace support in the RDMA-CM
@ 2015-02-01 11:28 Shachar Raindel
  2015-02-01 11:28 ` [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter Shachar Raindel
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Shachar Raindel

RDMA-CM uses IP based addressing and routing to set up RDMA connections between
hosts. Currently, all of the IP interfaces and addresses used by the RDMA-CM
must reside in the init_net namespace. This restricts containers that use RDMA
to the host network namespace (aka the kernel init_net NS instance).

This patchset allows using network namespaces with the RDMA-CM.

Each RDMA-CM id and CM id keeps a reference to a network namespace.

This reference is taken from the network namespace of the creating process at
the time the object is created, or is inherited from the listener.
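
As a rough illustration of the lifetime rule above, here is a userspace sketch
(not code from this series). The names struct ns, struct cm_id, ns_get(),
ns_put() and the cm_id_*() helpers are hypothetical stand-ins; the kernel
counterparts would be struct net with get_net()/put_net(). It only shows that
an id pins its namespace at creation time, and that an id created for an
incoming request pins the listener's namespace instead.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct net; only the refcount matters. */
struct ns {
	int refcount;
};

/* Hypothetical stand-in for an RDMA-CM or CM id. */
struct cm_id {
	struct ns *ns;	/* namespace pinned for the lifetime of the id */
};

static struct ns *ns_get(struct ns *ns)
{
	ns->refcount++;
	return ns;
}

static void ns_put(struct ns *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* Locally created id: pin the namespace of the creating process. */
static void cm_id_create(struct cm_id *id, struct ns *creator_ns)
{
	id->ns = ns_get(creator_ns);
}

/* Id created for an incoming request: inherit the listener's namespace. */
static void cm_id_create_from_listener(struct cm_id *id,
				       const struct cm_id *listener)
{
	id->ns = ns_get(listener->ns);
}

/* Drop the reference taken at creation time. */
static void cm_id_destroy(struct cm_id *id)
{
	ns_put(id->ns);
	id->ns = NULL;
}
```

Each id holds exactly one reference, so the namespace cannot go away while any
id created in it (or inherited from a listener in it) is still alive.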

This network namespace is used to perform all IP and network related
operations. Specifically, both the local device lookup and the remote GID
address resolution are done in the context of the RDMA-CM object's namespace.
This allows outgoing connections to reach the right target, even if the same
IP address exists in multiple network namespaces. This can happen if each
network namespace resides on a different pkey.

Additionally, the network namespace is used to split the listener service ID
table. From the user's point of view, each network namespace has a unique,
completely independent table of service IDs. This allows running multiple
instances of a single service on the same machine, using containers. To
implement this, the CM layer now parses the IP address from the CM connect
requests, and searches for the matching networking device. The namespace of
the device found is used when looking up the service ID in the listener table.
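
A minimal userspace sketch of the split listener table, assuming the table is
keyed on the pair (namespace, service ID). This is not the data structure used
by the CM code in this series; struct ns, struct listener_table and the
listener_*() helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct net; only its identity matters here. */
struct ns {
	const char *name;
};

struct listener {
	const struct ns *ns;	/* namespace that owns this listener */
	uint64_t service_id;
	int in_use;
};

#define MAX_LISTENERS 8

struct listener_table {
	struct listener slot[MAX_LISTENERS];
};

/* Look up a listener; a match requires both namespace and service ID. */
static struct listener *listener_find(struct listener_table *t,
				      const struct ns *ns, uint64_t service_id)
{
	assert(t != NULL && ns != NULL);
	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		struct listener *l = &t->slot[i];

		if (l->in_use && l->ns == ns && l->service_id == service_id)
			return l;
	}
	return NULL;
}

/* Returns 0 on success, -1 if the ID is busy in this namespace or the
 * table is full. */
static int listener_add(struct listener_table *t, const struct ns *ns,
			uint64_t service_id)
{
	if (listener_find(t, ns, service_id))
		return -1;
	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		struct listener *l = &t->slot[i];

		if (!l->in_use) {
			l->ns = ns;
			l->service_id = service_id;
			l->in_use = 1;
			return 0;
		}
	}
	return -1;
}
```

With this keying, two containers can listen on the same service ID without
conflict, while a second listener on that ID inside one namespace is still
rejected.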

The functionality introduced by this series comes into play when the
transport is InfiniBand and IPoIB interfaces are assigned to each namespace.
Multiple IPoIB interfaces can be created and assigned to different RDMA-CM
capable containers, for example using pipework [1].

Full support for RoCE will be introduced at a later stage.

The patches apply against kernel v3.19-rc5, with the patch "RDMA/CMA: Mark
IPv4 addresses correctly when the listener is IPv6" [2] applied.

The patchset is structured as follows:

Patches 1 and 2 are relatively trivial API extensions, requiring the callers
of certain ib_addr and ib_core functions to provide a network namespace, as
needed.

Patches 3 and 4 add the ability to look up a network namespace according to
the IP address, device and pkey. They find the matching IPoIB interfaces and
safely take a reference on the network namespace before returning to the
caller.

Patch 5 moves the logic that extracts the IP address from a connect request
into the CM layer. This is needed for the upcoming listener lookup by
namespace.

Patch 6 adds support for network namespaces in the CM layer. All callers are
still passing init_net as the namespace, to maintain backward compatibility.
For incoming requests, the namespace of the relevant IPoIB device is used.

Patches 7 and 8 add proper namespace support to the RDMA-CM module.

Patches 9 and 10 add namespace support to the relevant user facing modules in
the IB stack.


[1] https://github.com/jpetazzo/pipework/pull/108
[2] https://patchwork.kernel.org/patch/5298971/

Guy Shapiro (7):
  IB/addr: Pass network namespace as a parameter
  IB/core: Pass network namespace as a parameter to relevant functions
  IB/ipoib: Return IPoIB devices as possible matches to
    get_net_device_by_port_pkey_ip
  IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to
    ib_cm
  IB/cm: Add network namespace support
  IB/cma: Add support for network namespaces
  IB/ucma: Take the network namespace from the process

Shachar Raindel (1):
  IB/ucm: Add partial support for network namespaces

Yotam Kenneth (2):
  IB/core: Find the network namespace matching connection parameters
  IB/cma: Separate port allocation to network namespaces

 drivers/infiniband/core/addr.c                     |  31 +-
 drivers/infiniband/core/agent.c                    |   4 +-
 drivers/infiniband/core/cm.c                       | 298 ++++++++++++++++--
 drivers/infiniband/core/cma.c                      | 332 +++++++++------------
 drivers/infiniband/core/device.c                   |  57 ++++
 drivers/infiniband/core/mad_rmpp.c                 |  10 +-
 drivers/infiniband/core/ucm.c                      |   4 +-
 drivers/infiniband/core/ucma.c                     |   4 +-
 drivers/infiniband/core/user_mad.c                 |   4 +-
 drivers/infiniband/core/verbs.c                    |  22 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c            |  21 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c          | 122 +++++++-
 drivers/infiniband/ulp/iser/iser_verbs.c           |   2 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |   2 +-
 drivers/infiniband/ulp/srp/ib_srp.c                |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c              |   5 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |   4 +-
 include/rdma/ib_addr.h                             |  44 ++-
 include/rdma/ib_cm.h                               |  53 +++-
 include/rdma/ib_verbs.h                            |  44 ++-
 include/rdma/rdma_cm.h                             |   6 +-
 net/9p/trans_rdma.c                                |   2 +-
 net/rds/ib.c                                       |   2 +-
 net/rds/ib_cm.c                                    |   2 +-
 net/rds/iw.c                                       |   2 +-
 net/rds/iw_cm.c                                    |   2 +-
 net/rds/rdma_transport.c                           |   2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |   2 +-
 net/sunrpc/xprtrdma/verbs.c                        |   3 +-
 30 files changed, 823 insertions(+), 268 deletions(-)

-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter
  2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
@ 2015-02-01 11:28 ` Shachar Raindel
       [not found]   ` <1422790133-28725-2-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-02-01 11:28 ` [PATCH for-next 03/10] IB/core: Find the network namespace matching connection parameters Shachar Raindel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland, sean.hefty
  Cc: linux-rdma, netdev, liranl, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh@mellanox.com>

Add network namespace support to the ib_addr module. For that, all the address
resolution and matching should be done using the appropriate namespace instead
of init_net.

This is achieved by:

1. Adding an explicit network namespace argument to exported functions that
   require a namespace.
2. Saving the namespace in the rdma_addr_client structure.
3. Using it when calling networking functions.

In order to preserve the behavior of calling modules, &init_net is
passed as the parameter in calls from other modules. These calls are updated
as namespace support is added at more levels.
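
The pattern described above can be sketched in userspace as follows. This is
an illustration only, not the ib_addr code: struct ns, init_ns, struct netdev
and both dev_find_by_ip*() functions are hypothetical stand-ins (init_ns plays
the role of init_net). The lookup takes the namespace explicitly, and a caller
that has not been converted yet keeps its old behavior by passing the host
namespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct net. */
struct ns {
	const char *name;
};

/* Stand-in for init_net, the host network namespace. */
static struct ns init_ns = { "init" };

struct netdev {
	const struct ns *ns;	/* namespace the device lives in */
	uint32_t ipv4;		/* host byte order, for simplicity */
	const char *ifname;
};

/* Namespace-aware lookup: the same address may exist in several
 * namespaces, so the namespace is part of the key. */
static const struct netdev *dev_find_by_ip(const struct netdev *devs,
					   size_t ndevs,
					   const struct ns *ns, uint32_t ipv4)
{
	assert(ns != NULL);
	for (size_t i = 0; i < ndevs; i++)
		if (devs[i].ns == ns && devs[i].ipv4 == ipv4)
			return &devs[i];
	return NULL;
}

/* A caller that is not namespace-aware yet: behavior is unchanged because
 * it always resolves in the host namespace. */
static const struct netdev *dev_find_by_ip_host(const struct netdev *devs,
						size_t ndevs, uint32_t ipv4)
{
	return dev_find_by_ip(devs, ndevs, &init_ns, ipv4);
}
```

Converting a caller later then amounts to replacing the host namespace
argument with the namespace the caller actually operates in.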

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>

---
 drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
 drivers/infiniband/core/cma.c            |  4 ++-
 drivers/infiniband/core/verbs.c          | 14 +++++++---
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
 include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
 5 files changed, 72 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f80da50d84a5..95beaef6b66d 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	int ret = -EADDRNOTAVAIL;
 
 	if (dev_addr->bound_dev_if) {
-		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
 		if (!dev)
 			return -ENODEV;
 		ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 	}
 
 	switch (addr->sa_family) {
-	case AF_INET:
-		dev = ip_dev_find(&init_net,
-			((struct sockaddr_in *) addr)->sin_addr.s_addr);
+	case AF_INET: {
+		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+
+		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
 
 		if (!dev)
 			return ret;
@@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 			*vlan_id = rdma_vlan_dev_vlan_id(dev);
 		dev_put(dev);
 		break;
-
+	}
 #if IS_ENABLED(CONFIG_IPV6)
 	case AF_INET6:
 		rcu_read_lock();
-		for_each_netdev_rcu(&init_net, dev) {
-			if (ipv6_chk_addr(&init_net,
+		for_each_netdev_rcu(dev_addr->net, dev) {
+			if (ipv6_chk_addr(dev_addr->net,
 					  &((struct sockaddr_in6 *) addr)->sin6_addr,
 					  dev, 1)) {
 				ret = rdma_copy_addr(dev_addr, dev, NULL);
@@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 	fl4.daddr = dst_ip;
 	fl4.saddr = src_ip;
 	fl4.flowi4_oif = addr->bound_dev_if;
-	rt = ip_route_output_key(&init_net, &fl4);
+	rt = ip_route_output_key(addr->net, &fl4);
 	if (IS_ERR(rt)) {
 		ret = PTR_ERR(rt);
 		goto out;
@@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	fl6.saddr = src_in->sin6_addr;
 	fl6.flowi6_oif = addr->bound_dev_if;
 
-	dst = ip6_route_output(&init_net, NULL, &fl6);
+	dst = ip6_route_output(addr->net, NULL, &fl6);
 	if ((ret = dst->error))
 		goto put;
 
 	if (ipv6_addr_any(&fl6.saddr)) {
-		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
+		ret = ipv6_dev_get_saddr(addr->net,
+					 ip6_dst_idev(dst)->dev,
 					 &fl6.daddr, 0, &fl6.saddr);
 		if (ret)
 			goto put;
@@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
 }
 
 int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
-			       u16 *vlan_id)
+			       u16 *vlan_id, struct net *net)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -481,6 +483,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 		return ret;
 
 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.net = net;
 
 	ctx.addr = &dev_addr;
 	init_completion(&ctx.comp);
@@ -492,7 +495,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 	wait_for_completion(&ctx.comp);
 
 	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
-	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
+	dev = dev_get_by_index(net, dev_addr.bound_dev_if);
 	if (!dev)
 		return -ENODEV;
 	if (vlan_id)
@@ -502,7 +505,8 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
 }
 EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
 
-int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
+				struct net *net)
 {
 	int ret = 0;
 	struct rdma_dev_addr dev_addr;
@@ -517,6 +521,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
 	if (ret)
 		return ret;
 	memset(&dev_addr, 0, sizeof(dev_addr));
+	dev_addr.net = net;
 	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
 	if (ret)
 		return ret;
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 6e5e11ca7702..aeb2417ec928 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -512,6 +512,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
+	id_priv->id.route.addr.dev_addr.net = &init_net;
 
 	return &id_priv->id;
 }
@@ -637,7 +638,8 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	    == RDMA_TRANSPORT_IB &&
 	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
 	    == IB_LINK_LAYER_ETHERNET) {
-		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
+		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
+						  &init_net);
 
 		if (ret)
 			goto out;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8da7b5a..ca5c4dd8a67a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -212,7 +212,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 			ah_attr->vlan_id = wc->vlan_id;
 		} else {
 			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
-					ah_attr->dmac, &ah_attr->vlan_id);
+							 ah_attr->dmac,
+							 &ah_attr->vlan_id,
+							 &init_net);
 			if (ret)
 				return ret;
 		}
@@ -882,11 +884,15 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
 			if (!(*qp_attr_mask & IB_QP_VID))
 				qp_attr->vlan_id = rdma_get_vlan_id(&sgid);
 		} else {
-			ret = rdma_addr_find_dmac_by_grh(&sgid, &qp_attr->ah_attr.grh.dgid,
-					qp_attr->ah_attr.dmac, &qp_attr->vlan_id);
+			ret = rdma_addr_find_dmac_by_grh(
+				&sgid,
+				&qp_attr->ah_attr.grh.dgid,
+				qp_attr->ah_attr.dmac, &qp_attr->vlan_id,
+				&init_net);
 			if (ret)
 				goto out;
-			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac, NULL);
+			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
+							  NULL, &init_net);
 			if (ret)
 				goto out;
 		}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index f3cc8c9e65ae..debaac2b6ee8 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -119,7 +119,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
 
 	if (pd->uctx) {
 		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
-                                        attr->dmac, &attr->vlan_id);
+						    attr->dmac, &attr->vlan_id,
+						    &init_net);
 		if (status) {
 			pr_err("%s(): Failed to resolve dmac from gid." 
 				"status = %d\n", __func__, status);
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index ce55906b54a0..40ccf8b83755 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -47,6 +47,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_pack.h>
 #include <net/ipv6.h>
+#include <net/net_namespace.h>
 
 struct rdma_addr_client {
 	atomic_t refcount;
@@ -64,6 +65,16 @@ void rdma_addr_register_client(struct rdma_addr_client *client);
  */
 void rdma_addr_unregister_client(struct rdma_addr_client *client);
 
+/**
+ * struct rdma_dev_addr - Contains resolved RDMA hardware addresses
+ * @src_dev_addr:	Source MAC address.
+ * @dst_dev_addr:	Destination MAC address.
+ * @broadcast:		Broadcast address of the device.
+ * @dev_type:		The interface hardware type of the device.
+ * @bound_dev_if:	An optional device interface index.
+ * @transport:		The transport type used.
+ * @net:		Network namespace containing the bound_dev_if net_dev.
+ */
 struct rdma_dev_addr {
 	unsigned char src_dev_addr[MAX_ADDR_LEN];
 	unsigned char dst_dev_addr[MAX_ADDR_LEN];
@@ -71,11 +82,14 @@ struct rdma_dev_addr {
 	unsigned short dev_type;
 	int bound_dev_if;
 	enum rdma_transport_type transport;
+	struct net *net;
 };
 
 /**
  * rdma_translate_ip - Translate a local IP address to an RDMA hardware
  *   address.
+ *
+ * The dev_addr->net field must be initialized.
  */
 int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
 		      u16 *vlan_id);
@@ -90,7 +104,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
  * @dst_addr: The destination address to resolve.
  * @addr: A reference to a data location that will receive the resolved
  *   addresses.  The data location must remain valid until the callback has
- *   been invoked.
+ *   been invoked. The net field of the addr struct must be valid.
  * @timeout_ms: Amount of time to wait for the address resolution to complete.
  * @callback: Call invoked once address resolution has completed, timed out,
  *   or been canceled.  A status of 0 indicates success.
@@ -110,9 +124,29 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 
 int rdma_addr_size(struct sockaddr *addr);
 
-int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
-int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *smac,
-			       u16 *vlan_id);
+/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
+ * @sgid:	Source GID to find the MAC and VLAN for.
+ * @smac:	A buffer to contain the resulting MAC address.
+ * @vlan_id:	Will contain the resulting VLAN ID.
+ * @net:	Network namespace to use for the address resolution.
+ *
+ * It is the caller's responsibility to keep the network namespace alive until
+ * the function returns.
+ */
+int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
+				struct net *net);
+/** rdma_addr_find_dmac_by_grh() - Find the dst MAC and VLAN ID for a GID pair
+ * @sgid:	Source GID to use for the search.
+ * @dgid:	Destination GID to find the details for.
+ * @dmac:	Contains the resulting destination MAC address.
+ * @vlan_id:	Contains the resulting VLAN ID.
+ * @net:	Network namespace to use for the address resolution.
+ *
+ * It is the caller's responsibility to keep the network namespace alive until
+ * the function returns.
+ */
+int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
+			       u16 *vlan_id, struct net *net);
 
 static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
 {
@@ -182,7 +216,7 @@ static inline void iboe_addr_get_sgid(struct rdma_dev_addr *dev_addr,
 	struct net_device *dev;
 	struct in_device *ip4;
 
-	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
+	dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
 	if (dev) {
 		ip4 = (struct in_device *)dev->ip_ptr;
 		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)
-- 
1.7.11.2


* [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-02-01 11:28   ` Shachar Raindel
       [not found]     ` <1422790133-28725-3-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-02-01 11:28   ` [PATCH for-next 04/10] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Shachar Raindel
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add network namespace parameters to the address related ib_core
functions. The parameter is passed to the lower level functions instead of
&init_net, so that the work is done in the correct namespace.

For now, every caller passes &init_net.
Callers that will pass &init_net permanently are marked with an
appropriate comment.

Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 drivers/infiniband/core/agent.c       |  4 +++-
 drivers/infiniband/core/cm.c          |  9 +++++++--
 drivers/infiniband/core/mad_rmpp.c    | 10 ++++++++--
 drivers/infiniband/core/user_mad.c    |  4 +++-
 drivers/infiniband/core/verbs.c       | 10 ++++++----
 drivers/infiniband/ulp/srpt/ib_srpt.c |  3 ++-
 include/rdma/ib_verbs.h               | 15 +++++++++++++--
 7 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d29614cb01..539378d64041 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -99,7 +99,9 @@ void agent_send_response(struct ib_mad *mad, struct ib_grh *grh,
 	}
 
 	agent = port_priv->agent[qpn];
-	ah = ib_create_ah_from_wc(agent->qp->pd, wc, grh, port_num);
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
+	ah = ib_create_ah_from_wc(agent->qp->pd, wc, grh, port_num, &init_net);
 	if (IS_ERR(ah)) {
 		dev_err(&device->dev, "ib_create_ah_from_wc error %ld\n",
 			PTR_ERR(ah));
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494e2a3a..5a45cb76c43e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -290,8 +290,13 @@ static int cm_alloc_response_msg(struct cm_port *port,
 	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
 
+	/* For IB, the network namespace doesn't affect the created address
+	 * handle, so we use &init_net. In the future, RoCE support will
+	 * require finding a specific network namespace to send the response
+	 * from. */
 	ah = ib_create_ah_from_wc(port->mad_agent->qp->pd, mad_recv_wc->wc,
-				  mad_recv_wc->recv_buf.grh, port->port_num);
+				  mad_recv_wc->recv_buf.grh, port->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		return PTR_ERR(ah);
 
@@ -346,7 +351,7 @@ static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
 	av->port = port;
 	av->pkey_index = wc->pkey_index;
 	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
-			   grh, &av->ah_attr);
+			   grh, &av->ah_attr, &init_net);
 }
 
 static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
diff --git a/drivers/infiniband/core/mad_rmpp.c b/drivers/infiniband/core/mad_rmpp.c
index f37878c9c06e..6c1576202965 100644
--- a/drivers/infiniband/core/mad_rmpp.c
+++ b/drivers/infiniband/core/mad_rmpp.c
@@ -157,8 +157,11 @@ static struct ib_mad_send_buf *alloc_response_msg(struct ib_mad_agent *agent,
 	struct ib_ah *ah;
 	int hdr_len;
 
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
 	ah = ib_create_ah_from_wc(agent->qp->pd, recv_wc->wc,
-				  recv_wc->recv_buf.grh, agent->port_num);
+				  recv_wc->recv_buf.grh, agent->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		return (void *) ah;
 
@@ -287,10 +290,13 @@ create_rmpp_recv(struct ib_mad_agent_private *agent,
 	if (!rmpp_recv)
 		return NULL;
 
+	/* Physical devices (and their MAD replies) always reside in the host
+	 * network namespace */
 	rmpp_recv->ah = ib_create_ah_from_wc(agent->agent.qp->pd,
 					     mad_recv_wc->wc,
 					     mad_recv_wc->recv_buf.grh,
-					     agent->agent.port_num);
+					     agent->agent.port_num,
+					     &init_net);
 	if (IS_ERR(rmpp_recv->ah))
 		goto error;
 
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd20e2d1..f34c6077759d 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -239,7 +239,9 @@ static void recv_handler(struct ib_mad_agent *agent,
 
 		ib_init_ah_from_wc(agent->device, agent->port_num,
 				   mad_recv_wc->wc, mad_recv_wc->recv_buf.grh,
-				   &ah_attr);
+				   &ah_attr, &init_net);
+		/* Note that network namespace separation isn't supported on
+		 * umad yet. */
 
 		packet->mad.hdr.gid_index = ah_attr.grh.sgid_index;
 		packet->mad.hdr.hop_limit = ah_attr.grh.hop_limit;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ca5c4dd8a67a..a51d5d642fb7 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -193,7 +193,8 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 EXPORT_SYMBOL(ib_create_ah);
 
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
-		       struct ib_grh *grh, struct ib_ah_attr *ah_attr)
+		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
+		       struct net *net)
 {
 	u32 flow_class;
 	u16 gid_index;
@@ -214,7 +215,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
 							 ah_attr->dmac,
 							 &ah_attr->vlan_id,
-							 &init_net);
+							 net);
 			if (ret)
 				return ret;
 		}
@@ -247,12 +248,13 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
 EXPORT_SYMBOL(ib_init_ah_from_wc);
 
 struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
-				   struct ib_grh *grh, u8 port_num)
+				   struct ib_grh *grh, u8 port_num,
+				   struct net *net)
 {
 	struct ib_ah_attr ah_attr;
 	int ret;
 
-	ret = ib_init_ah_from_wc(pd->device, port_num, wc, grh, &ah_attr);
+	ret = ib_init_ah_from_wc(pd->device, port_num, wc, grh, &ah_attr, net);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index eb694ddad79f..7867bd554027 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -468,7 +468,8 @@ static void srpt_mad_recv_handler(struct ib_mad_agent *mad_agent,
 		return;
 
 	ah = ib_create_ah_from_wc(mad_agent->qp->pd, mad_wc->wc,
-				  mad_wc->recv_buf.grh, mad_agent->port_num);
+				  mad_wc->recv_buf.grh, mad_agent->port_num,
+				  &init_net);
 	if (IS_ERR(ah))
 		goto err;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0d74f1de99aa..dd4c80cea8d3 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -48,6 +48,7 @@
 #include <linux/rwsem.h>
 #include <linux/scatterlist.h>
 #include <linux/workqueue.h>
+#include <net/net_namespace.h>
 #include <uapi/linux/if_ether.h>
 
 #include <linux/atomic.h>
@@ -1801,9 +1802,14 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
  *   ignored unless the work completion indicates that the GRH is valid.
  * @ah_attr: Returned attributes that can be used when creating an address
  *   handle for replying to the message.
+ * @net: The network namespace to use for address resolution.
+ *
+ * It is the caller's responsibility to make sure the network namespace is
+ * alive until the function returns.
  */
 int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
-		       struct ib_grh *grh, struct ib_ah_attr *ah_attr);
+		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
+		       struct net *net);
 
 /**
  * ib_create_ah_from_wc - Creates an address handle associated with the
@@ -1813,12 +1819,17 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
  * @grh: References the received global route header.  This parameter is
  *   ignored unless the work completion indicates that the GRH is valid.
  * @port_num: The outbound port number to associate with the address.
+ * @net: The network namespace to use for address resolution.
  *
  * The address handle is used to reference a local or global destination
  * in all UD QP post sends.
+ *
+ * It is the caller's responsibility to make sure the network namespace is
+ * alive until the function returns.
  */
 struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc,
-				   struct ib_grh *grh, u8 port_num);
+				   struct ib_grh *grh, u8 port_num,
+				   struct net *net);
 
 /**
  * ib_modify_ah - Modifies the address vector associated with an address
-- 
1.7.11.2



* [PATCH for-next 03/10] IB/core: Find the network namespace matching connection parameters
  2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
  2015-02-01 11:28 ` [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter Shachar Raindel
@ 2015-02-01 11:28 ` Shachar Raindel
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland, sean.hefty
  Cc: linux-rdma, netdev, liranl, Yotam Kenneth, Haggai Eran,
	Shachar Raindel, Guy Shapiro

From: Yotam Kenneth <yotamke@mellanox.com>

In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns the network device matching given connection
parameters.

The IB device and port, together with the P_Key and the IP address, should be
enough to uniquely identify the ULP net device.

This callback is passed to ib_core as part of the ib_client
registration.

Using this functionality, add a way to get the network namespace
corresponding to a work completion. This is needed so that responses to CM
requests can be sent from the same network namespace as the request.
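
The lookup flow described above can be approximated in userspace like this.
It is a sketch under simplifying assumptions, not the code in device.c: there
is no locking and no reference counting, the IB device argument is omitted,
and every name (struct client, struct conn_params, ns_for_request(),
example_get_netdev() and so on) is hypothetical. Each registered client may
provide a callback; the first callback that returns a device decides the
namespace, and the host namespace is used when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for struct net and struct net_device. */
struct ns {
	const char *name;
};

struct netdev {
	struct ns *ns;
};

/* Stand-in for init_net, used as the fallback. */
static struct ns init_ns = { "init" };

/* Connection parameters that identify a ULP net device. */
struct conn_params {
	uint8_t port;
	uint16_t pkey;
	uint32_t ipv4;	/* host byte order, for simplicity */
};

typedef struct netdev *(*get_netdev_fn)(const struct conn_params *p);

/* A registered client; the callback is optional and may be NULL. */
struct client {
	get_netdev_fn get_netdev;
};

/* Ask each client in turn; fall back to the host namespace. */
static struct ns *ns_for_request(const struct client *clients, size_t nclients,
				 const struct conn_params *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < nclients; i++) {
		struct netdev *dev;

		if (!clients[i].get_netdev)
			continue;
		dev = clients[i].get_netdev(p);
		if (dev)
			return dev->ns;
	}
	return &init_ns;
}

/* Example client that owns a single device on port 1, P_Key 0x8001,
 * configured with address 10.0.0.1 inside a container namespace. */
static struct ns example_container_ns = { "container" };
static struct netdev example_dev = { &example_container_ns };

static struct netdev *example_get_netdev(const struct conn_params *p)
{
	if (p->port == 1 && p->pkey == 0x8001 && p->ipv4 == 0x0a000001)
		return &example_dev;
	return NULL;
}
```

A request that matches the example device resolves to the container
namespace; any other request resolves to the host namespace.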

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>

---
 drivers/infiniband/core/device.c | 57 ++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h          | 29 ++++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece765f2..2f06be5b0b59 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -38,6 +38,7 @@
 #include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/mutex.h>
+#include <linux/netdevice.h>
 #include <rdma/rdma_netlink.h>
 
 #include "core_priv.h"
@@ -733,6 +734,62 @@ int ib_find_pkey(struct ib_device *device,
 }
 EXPORT_SYMBOL(ib_find_pkey);
 
+static struct net_device *ib_get_net_dev_by_port_pkey_ip(struct ib_device *dev,
+							 u8 port,
+							 u16 pkey,
+							 struct sockaddr *addr)
+{
+	struct net_device *ret = NULL;
+	struct ib_client *client;
+
+	mutex_lock(&device_mutex);
+	list_for_each_entry(client, &client_list, list)
+		if (client->get_net_device_by_port_pkey_ip) {
+			ret = client->get_net_device_by_port_pkey_ip(dev, port,
+								     pkey,
+								     addr);
+			if (ret)
+				break;
+		}
+
+	mutex_unlock(&device_mutex);
+	return ret;
+}
+
+struct net *ib_get_net_ns_by_port_pkey_ip(struct ib_device *dev,
+					  u8 port,
+					  u16 pkey,
+					  struct sockaddr *addr)
+{
+	struct net_device *ndev = NULL;
+	struct net *ns;
+
+	switch (rdma_port_get_link_layer(dev, port)) {
+	case IB_LINK_LAYER_INFINIBAND:
+		if (!addr)
+			goto not_found;
+		ndev = ib_get_net_dev_by_port_pkey_ip(dev, port, pkey, addr);
+		break;
+	default:
+		goto not_found;
+	}
+
+	if (!ndev)
+		goto not_found;
+
+	rcu_read_lock();
+	ns = maybe_get_net(dev_net(ndev));
+	dev_put(ndev);
+	rcu_read_unlock();
+	if (!ns)
+		goto not_found;
+	return ns;
+
+not_found:
+	return get_net(&init_net);
+}
+EXPORT_SYMBOL(ib_get_net_ns_by_port_pkey_ip);
+
 static int __init ib_core_init(void)
 {
 	int ret;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index dd4c80cea8d3..12c2ae285b91 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1683,6 +1683,21 @@ struct ib_client {
 	void (*add)   (struct ib_device *);
 	void (*remove)(struct ib_device *);
 
+	/* Returns the net_dev belonging to this ib_client and matching the
+	 * given parameters.
+	 * @dev:	An RDMA device that the net_dev uses for communication.
+	 * @port:	A physical port number on the RDMA device.
+	 * @pkey:	P_Key that the net_dev uses if applicable.
+	 * @addr:	An IP address the net_dev is configured with.
+	 *
+	 * An ib_client that implements a net_dev on top of RDMA devices
+	 * (such as IP over IB) should implement this callback, allowing the
+	 * rdma_cm module to find the right net_dev for a given request. */
+	struct net_device *(*get_net_device_by_port_pkey_ip)(
+			struct ib_device *dev,
+			u8 port,
+			u16 pkey,
+			struct sockaddr *addr);
 	struct list_head list;
 };
 
@@ -2682,4 +2697,18 @@ static inline int ib_check_mr_access(int flags)
 int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		       struct ib_mr_status *mr_status);
 
+/**
+ * ib_get_net_ns_by_port_pkey_ip() - Return the appropriate net namespace
+ * for a received CM request
+ * @dev:	An RDMA device on which the request has been received.
+ * @port:	Port number on the RDMA device.
+ * @pkey:	The Pkey the request came on.
+ * @addr:	Contains the IP address that the request specified as its
+ *		destination.
+ */
+struct net *ib_get_net_ns_by_port_pkey_ip(struct ib_device *dev,
+					  u8 port,
+					  u16 pkey,
+					  struct sockaddr *addr);
+
 #endif /* IB_VERBS_H */
-- 
1.7.11.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH for-next 04/10] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-02-01 11:28   ` [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions Shachar Raindel
@ 2015-02-01 11:28   ` Shachar Raindel
  2015-02-01 11:28   ` [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Shachar Raindel
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Implement the callback that returns a network device to ib_core according to
the connection parameters. Check the ipoib device and iterate over all of its
child devices to look for a match.

For each ipoib device we iterate through all upper devices when searching for
a matching IP, in order to support bonding.
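
As a rough illustration of the search order described above, here is a small
userspace model. It is a sketch only: struct mock_dev and both functions are
hypothetical, the address is a plain integer, the upper-device list is
flattened into a fixed array, and the RCU protection and device reference
counting used by the real code are left out.

```c
#include <stddef.h>

#define MOCK_MAX_LINKS 4

/* Hypothetical stand-in for a net_device; not the kernel structure. */
struct mock_dev {
	unsigned int ip;		/* address configured on the device */
	unsigned short pkey_index;
	struct mock_dev *uppers[MOCK_MAX_LINKS];   /* e.g. a bond master */
	struct mock_dev *children[MOCK_MAX_LINKS]; /* child interfaces */
};

/* Return dev itself, or one of its upper devices, if it owns the address. */
static struct mock_dev *mock_match_addr(struct mock_dev *dev, unsigned int ip)
{
	size_t i;

	if (dev->ip == ip)
		return dev;
	for (i = 0; i < MOCK_MAX_LINKS && dev->uppers[i]; i++)
		if (dev->uppers[i]->ip == ip)
			return dev->uppers[i];
	return NULL;
}

/* Try the parent first (if its P_Key index matches), then every child
 * interface that uses the requested P_Key index. */
static struct mock_dev *mock_find_dev(struct mock_dev *parent,
				      unsigned short pkey_index,
				      unsigned int ip)
{
	struct mock_dev *found;
	size_t i;

	if (parent->pkey_index == pkey_index) {
		found = mock_match_addr(parent, ip);
		if (found)
			return found;
	}
	for (i = 0; i < MOCK_MAX_LINKS && parent->children[i]; i++) {
		struct mock_dev *child = parent->children[i];

		if (child->pkey_index != pkey_index)
			continue;
		found = mock_match_addr(child, ip);
		if (found)
			return found;
	}
	return NULL;
}
```

Returning the upper device rather than the IPoIB device itself matters for
bonding: the IP address lives on the bond, so that is the device whose
namespace the caller needs.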

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 122 +++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 6bad17d4d588..88fb78dd68c9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -48,6 +48,9 @@
 
 #include <linux/jhash.h>
 #include <net/arp.h>
+#include <net/addrconf.h>
+#include <linux/inetdevice.h>
+#include <rdma/ib_cache.h>
 
 #define DRV_VERSION "1.0.0"
 
@@ -91,11 +94,15 @@ struct ib_sa_client ipoib_sa_client;
 static void ipoib_add_one(struct ib_device *device);
 static void ipoib_remove_one(struct ib_device *device);
 static void ipoib_neigh_reclaim(struct rcu_head *rp);
+static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
+		struct ib_device *dev, u8 port, u16 pkey,
+		struct sockaddr *addr);
 
 static struct ib_client ipoib_client = {
 	.name   = "ipoib",
 	.add    = ipoib_add_one,
-	.remove = ipoib_remove_one
+	.remove = ipoib_remove_one,
+	.get_net_device_by_port_pkey_ip = ipoib_get_net_device_by_port_pkey_ip,
 };
 
 int ipoib_open(struct net_device *dev)
@@ -222,6 +229,119 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static bool ipoib_is_dev_match_addr(struct sockaddr *addr,
+				    struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+
+	if (addr->sa_family == AF_INET) {
+		struct in_device *in_dev = in_dev_get(dev);
+		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
+		__be32 ret_addr;
+
+		if (!in_dev)
+			return false;
+
+		ret_addr = inet_confirm_addr(net, in_dev, 0,
+					     addr_in->sin_addr.s_addr,
+					     RT_SCOPE_HOST);
+		in_dev_put(in_dev);
+		if (ret_addr)
+			return true;
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (addr->sa_family == AF_INET6) {
+		struct sockaddr_in6 *addr_in6 = (struct sockaddr_in6 *)addr;
+
+		if (ipv6_chk_addr(net, &addr_in6->sin6_addr, dev, 1))
+			return true;
+	}
+#endif
+	return false;
+}
+
+/**
+ * Find a net_device matching the given address, which is an upper device of
+ * the given net_device.
+ * @addr: IP address to look for.
+ * @dev: base IPoIB net_device
+ *
+ * If found, returns the net_device with a reference held. Otherwise return
+ * NULL.
+ */
+static struct net_device *ipoib_get_net_dev_match_addr(struct sockaddr *addr,
+						       struct net_device *dev)
+{
+	struct net_device *upper,
+			  *result = NULL;
+	struct list_head *iter;
+
+	if (ipoib_is_dev_match_addr(addr, dev)) {
+		dev_hold(dev);
+		return dev;
+	}
+
+	rcu_read_lock();
+	netdev_for_each_all_upper_dev_rcu(dev, upper, iter) {
+		if (ipoib_is_dev_match_addr(addr, upper)) {
+			dev_hold(upper);
+			result = upper;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+static struct net_device *ipoib_get_net_device_by_port_pkey_ip(
+		struct ib_device *dev, u8 port, u16 pkey, struct sockaddr *addr)
+{
+	struct ipoib_dev_priv *priv;
+	struct list_head *dev_list;
+	u16 pkey_index;
+
+	ib_find_cached_pkey(dev, port, pkey, &pkey_index);
+	if (pkey_index == (u16)-1)
+		return NULL;
+
+	if (rdma_node_get_transport(dev->node_type) != RDMA_TRANSPORT_IB)
+		return NULL;
+
+	dev_list = ib_get_client_data(dev, &ipoib_client);
+	if (!dev_list)
+		return NULL;
+
+	list_for_each_entry(priv, dev_list, list) {
+		struct net_device *net_dev = NULL;
+		struct ipoib_dev_priv *child_priv;
+
+		if (priv->port != port)
+			continue;
+
+		if (priv->pkey_index == pkey_index) {
+			net_dev = ipoib_get_net_dev_match_addr(addr, priv->dev);
+			if (net_dev)
+				return net_dev;
+		}
+
+		down_read(&priv->vlan_rwsem);
+		list_for_each_entry(child_priv,
+				    &priv->child_intfs, list) {
+			if (child_priv->pkey_index != pkey_index)
+				continue;
+
+			net_dev = ipoib_get_net_dev_match_addr(
+					addr, child_priv->dev);
+			if (net_dev)
+				break;
+		}
+		up_read(&priv->vlan_rwsem);
+		if (net_dev)
+			return net_dev;
+	}
+	return NULL;
+}
+
 int ipoib_set_mode(struct net_device *dev, const char *buf)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-02-01 11:28   ` [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions Shachar Raindel
  2015-02-01 11:28   ` [PATCH for-next 04/10] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Shachar Raindel
@ 2015-02-01 11:28   ` Shachar Raindel
  2015-02-01 12:55     ` Yann Droneaud
  2015-02-01 11:28   ` [PATCH for-next 06/10] IB/cm: Add network namespace support Shachar Raindel
  2015-02-01 11:28   ` [PATCH for-next 08/10] IB/cma: Add support for network namespaces Shachar Raindel
  4 siblings, 1 reply; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

When receiving a connection request, ib_cm needs to associate the request with
a network namespace. To do this, it needs to know the request's destination
IP. For this the RDMA IP CM packet formatting functionality needs to be
exposed to ib_cm.

This patch merely moves the RDMA IP CM data formatting and parsing functions
to be part of ib_cm. The following patch uses this information to look up the
appropriate namespace. Each namespace maintains an independent table of RDMA
CM service IDs, allowing isolation and separation between the network
namespaces.
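
For reference, a userspace sketch of the header that these functions format
and parse. The sketch_* names are hypothetical mirrors of the cm_hdr layout in
this patch, using stdint types instead of kernel types, and the 36-byte size
is what common ABIs produce for this layout rather than something the patch
itself asserts.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of union cm_ip_addr. */
union sketch_ip_addr {
	uint8_t ip6[16];
	struct {
		uint32_t pad[3];
		uint32_t addr;	/* IPv4 address sits in the last 4 bytes */
	} ip4;
};

/* Hypothetical userspace mirror of struct cm_hdr. */
struct sketch_hdr {
	uint8_t cm_version;
	uint8_t ip_version;	/* IP version in bits 7:4 */
	uint16_t port;
	union sketch_ip_addr src_addr;
	union sketch_ip_addr dst_addr;
};

/* Store the IP version in the high nibble, keeping the low nibble as is. */
static void sketch_set_ip_ver(struct sketch_hdr *hdr, uint8_t ip_ver)
{
	hdr->ip_version = (uint8_t)((ip_ver << 4) | (hdr->ip_version & 0xF));
}

/* Read the IP version back from the high nibble. */
static uint8_t sketch_get_ip_ver(const struct sketch_hdr *hdr)
{
	return hdr->ip_version >> 4;
}
```

The same setter is reused to build compare masks: passing 0xF sets all four
version bits, which is how cma_set_compare_data marks the version as
significant.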

When creating a new incoming connection ID, the code in cm_save_ip_info can no
longer rely on the listener's private data to find the port number, so it
reads it from the requested service ID. This required saving the service ID in
cm_format_paths_from_req.
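
To make the port derivation concrete, here is a userspace sketch of the idea
behind cm_port_from_service_id: the port is the low 16 bits of the service ID.
The sketch_* helpers are hypothetical, and the service ID is taken as raw wire
bytes so the example does not depend on host endianness. The sample value
assumes the RDMA_PS_TCP port space (0x0106) and port 7471; both are only
illustrative.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons, ntohs */

/* Decode a big-endian 64-bit service ID as it appears on the wire. */
static uint64_t sketch_be64_to_cpu(const uint8_t wire[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | wire[i];
	return v;
}

/* Keep the low 16 bits of the service ID and return them in network byte
 * order, ready to be stored in sin_port or sin6_port. */
static uint16_t sketch_port_from_service_id(const uint8_t wire[8])
{
	return htons((uint16_t)sketch_be64_to_cpu(wire));
}
```

For the wire bytes 00 00 00 00 01 06 1d 2f, ntohs() of the result is 7471.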

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 drivers/infiniband/core/cm.c  | 167 ++++++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/cma.c | 166 +++++------------------------------------
 include/rdma/ib_cm.h          |  46 ++++++++++++
 3 files changed, 231 insertions(+), 148 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5a45cb76c43e..5cc1a4aa9728 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -51,6 +51,7 @@
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
+#include <rdma/ib.h>
 #include "cm_msgs.h"
 
 MODULE_AUTHOR("Sean Hefty");
@@ -701,6 +702,170 @@ static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
 	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
 }
 
+static inline u8 cm_get_ip_ver(struct cm_hdr *hdr)
+{
+	return hdr->ip_version >> 4;
+}
+
+void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver)
+{
+	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
+}
+EXPORT_SYMBOL(cm_set_ip_ver);
+
+int cm_format_hdr(void *hdr, int family,
+		  struct sockaddr *src_addr,
+		  struct sockaddr *dst_addr)
+{
+	struct cm_hdr *cm_hdr;
+
+	cm_hdr = hdr;
+	cm_hdr->cm_version = RDMA_IP_CM_VERSION;
+	if (family == AF_INET) {
+		struct sockaddr_in *src4, *dst4;
+
+		src4 = (struct sockaddr_in *)src_addr;
+		dst4 = (struct sockaddr_in *)dst_addr;
+
+		cm_set_ip_ver(cm_hdr, 4);
+		cm_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+		cm_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+		cm_hdr->port = src4->sin_port;
+	} else if (family == AF_INET6) {
+		struct sockaddr_in6 *src6, *dst6;
+
+		src6 = (struct sockaddr_in6 *)src_addr;
+		dst6 = (struct sockaddr_in6 *)dst_addr;
+
+		cm_set_ip_ver(cm_hdr, 6);
+		cm_hdr->src_addr.ip6 = src6->sin6_addr;
+		cm_hdr->dst_addr.ip6 = dst6->sin6_addr;
+		cm_hdr->port = src6->sin6_port;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(cm_format_hdr);
+
+static void cm_save_ib_info(struct sockaddr *src_addr,
+			    struct sockaddr *dst_addr,
+			    struct ib_sa_path_rec *path)
+{
+	struct sockaddr_ib  *ib;
+
+	if (src_addr) {
+		ib = (struct sockaddr_ib *)src_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->sgid, 16);
+		ib->sib_sid = path->service_id;
+		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
+		ib->sib_scope_id = 0;
+	}
+	if (dst_addr) {
+		ib = (struct sockaddr_ib *)dst_addr;
+		ib->sib_family = AF_IB;
+		ib->sib_pkey = path->pkey;
+		ib->sib_flowinfo = path->flow_label;
+		memcpy(&ib->sib_addr, &path->dgid, 16);
+	}
+}
+
+static void cm_save_ip6_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct cm_hdr *hdr,
+			     __be16 local_port)
+{
+	struct sockaddr_in6 *ip6;
+
+	if (src_addr) {
+		ip6 = (struct sockaddr_in6 *)src_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->dst_addr.ip6;
+		ip6->sin6_port = local_port;
+	}
+
+	if (dst_addr) {
+		ip6 = (struct sockaddr_in6 *)dst_addr;
+		ip6->sin6_family = AF_INET6;
+		ip6->sin6_addr = hdr->src_addr.ip6;
+		ip6->sin6_port = hdr->port;
+	}
+}
+
+static void cm_save_ip4_info(struct sockaddr *src_addr,
+			     struct sockaddr *dst_addr,
+			     struct cm_hdr *hdr,
+			     __be16 local_port)
+{
+	struct sockaddr_in *ip4;
+
+	if (src_addr) {
+		ip4 = (struct sockaddr_in *)src_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
+		ip4->sin_port = local_port;
+	}
+
+	if (dst_addr) {
+		ip4 = (struct sockaddr_in *)dst_addr;
+		ip4->sin_family = AF_INET;
+		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
+		ip4->sin_port = hdr->port;
+	}
+}
+
+static __be16 cm_port_from_service_id(__be64 service_id)
+{
+	return htons(be64_to_cpu(service_id));
+}
+
+static int cm_save_ip_info(struct sockaddr *src_addr,
+			   struct sockaddr *dst_addr,
+			   struct cm_work *work)
+{
+	struct cm_hdr *hdr;
+	__be16 port;
+
+	hdr = work->cm_event.private_data;
+	if (hdr->cm_version != RDMA_IP_CM_VERSION)
+		return -EINVAL;
+
+	port = cm_port_from_service_id(work->path->service_id);
+
+	switch (cm_get_ip_ver(hdr)) {
+	case 4:
+		cm_save_ip4_info(src_addr, dst_addr, hdr, port);
+		break;
+	case 6:
+		cm_save_ip6_info(src_addr, dst_addr, hdr, port);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int cm_save_net_info(struct sockaddr *src_addr,
+		     struct sockaddr *dst_addr,
+		     struct ib_cm_event *ib_event)
+{
+	struct cm_work *work = container_of(ib_event, struct cm_work, cm_event);
+
+	if ((rdma_port_get_link_layer(work->port->cm_dev->ib_device,
+				      work->port->port_num) ==
+	     IB_LINK_LAYER_INFINIBAND) &&
+	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
+		cm_save_ib_info(src_addr, dst_addr,
+				ib_event->param.req_rcvd.primary_path);
+		return 0;
+	}
+
+	return cm_save_ip_info(src_addr, dst_addr, work);
+}
+EXPORT_SYMBOL(cm_save_net_info);
+
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
 				 void *context)
@@ -1260,6 +1425,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 	primary_path->packet_life_time =
 		cm_req_get_primary_local_ack_timeout(req_msg);
 	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
+	primary_path->service_id = req_msg->service_id;
 
 	if (req_msg->alt_local_lid) {
 		memset(alt_path, 0, sizeof *alt_path);
@@ -1281,6 +1447,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
 		alt_path->packet_life_time =
 			cm_req_get_alt_local_ack_timeout(req_msg);
 		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
+		alt_path->service_id = req_msg->service_id;
 	}
 }
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index aeb2417ec928..9f6faeb1de5f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -179,23 +179,8 @@ struct iboe_mcast_work {
 	struct cma_multicast	*mc;
 };
 
-union cma_ip_addr {
-	struct in6_addr ip6;
-	struct {
-		__be32 pad[3];
-		__be32 addr;
-	} ip4;
-};
 
-struct cma_hdr {
-	u8 cma_version;
-	u8 ip_version;	/* IP version: 7:4 */
-	__be16 port;
-	union cma_ip_addr src_addr;
-	union cma_ip_addr dst_addr;
-};
 
-#define CMA_VERSION 0x00
 
 static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
 {
@@ -234,16 +219,6 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
 	return old;
 }
 
-static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
-{
-	return hdr->ip_version >> 4;
-}
-
-static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
-{
-	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
-}
-
 static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
@@ -839,93 +814,9 @@ static inline int cma_any_port(struct sockaddr *addr)
 	return !cma_port(addr);
 }
 
-static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_sa_path_rec *path)
-{
-	struct sockaddr_ib *listen_ib, *ib;
-
-	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
-	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->sgid, 16);
-	ib->sib_sid = listen_ib->sib_sid;
-	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
-	ib->sib_scope_id = listen_ib->sib_scope_id;
-
-	ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
-	ib->sib_family = listen_ib->sib_family;
-	ib->sib_pkey = path->pkey;
-	ib->sib_flowinfo = path->flow_label;
-	memcpy(&ib->sib_addr, &path->dgid, 16);
-}
-
-static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
-{
-	struct sockaddr_in *listen4, *ip4;
-
-	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
-	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
-	ip4->sin_port = listen4->sin_port;
-
-	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
-	ip4->sin_family = AF_INET;
-	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
-	ip4->sin_port = hdr->port;
-}
-
-static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			      struct cma_hdr *hdr)
-{
-	struct sockaddr_in6 *listen6, *ip6;
-
-	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->dst_addr.ip6;
-	ip6->sin6_port = listen6->sin6_port;
-
-	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
-	ip6->sin6_family = AF_INET6;
-	ip6->sin6_addr = hdr->src_addr.ip6;
-	ip6->sin6_port = hdr->port;
-}
-
-static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
-			     struct ib_cm_event *ib_event)
-{
-	struct cma_hdr *hdr;
-
-	if ((listen_id->route.addr.src_addr.ss_family == AF_IB) &&
-	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
-		cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
-		return 0;
-	}
-
-	hdr = ib_event->private_data;
-	if (hdr->cma_version != CMA_VERSION)
-		return -EINVAL;
-
-	switch (cma_get_ip_ver(hdr)) {
-	case 4:
-		cma_save_ip4_info(id, listen_id, hdr);
-		break;
-	case 6:
-		cma_save_ip6_info(id, listen_id, hdr);
-		break;
-	default:
-		return -EINVAL;
-	}
-	return 0;
-}
-
 static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
 {
-	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
+	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cm_hdr);
 }
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
@@ -1195,7 +1086,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			     (struct sockaddr *)&id->route.addr.dst_addr,
+			     ib_event))
 		goto err;
 
 	rt = &id->route;
@@ -1241,7 +1134,9 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 		return NULL;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (cma_save_net_info(id, listen_id, ib_event))
+	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
+			     (struct sockaddr *)&id->route.addr.dst_addr,
+			     ib_event))
 		goto err;
 
 	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
@@ -1369,7 +1264,7 @@ EXPORT_SYMBOL(rdma_get_service_id);
 static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 				 struct ib_cm_compare_data *compare)
 {
-	struct cma_hdr *cma_data, *cma_mask;
+	struct cm_hdr *cma_data, *cma_mask;
 	__be32 ip4_addr;
 	struct in6_addr ip6_addr;
 
@@ -1380,8 +1275,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 	switch (addr->sa_family) {
 	case AF_INET:
 		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
-		cma_set_ip_ver(cma_data, 4);
-		cma_set_ip_ver(cma_mask, 0xF);
+		cm_set_ip_ver(cma_data, 4);
+		cm_set_ip_ver(cma_mask, 0xF);
 		if (!cma_any_addr(addr)) {
 			cma_data->dst_addr.ip4.addr = ip4_addr;
 			cma_mask->dst_addr.ip4.addr = htonl(~0);
@@ -1389,8 +1284,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
 		break;
 	case AF_INET6:
 		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
-		cma_set_ip_ver(cma_data, 6);
-		cma_set_ip_ver(cma_mask, 0xF);
+		cm_set_ip_ver(cma_data, 6);
+		cm_set_ip_ver(cma_mask, 0xF);
 		if (!cma_any_addr(addr)) {
 			cma_data->dst_addr.ip6 = ip6_addr;
 			memset(&cma_mask->dst_addr.ip6, 0xFF,
@@ -2615,35 +2510,6 @@ err1:
 }
 EXPORT_SYMBOL(rdma_bind_addr);
 
-static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
-{
-	struct cma_hdr *cma_hdr;
-
-	cma_hdr = hdr;
-	cma_hdr->cma_version = CMA_VERSION;
-	if (cma_family(id_priv) == AF_INET) {
-		struct sockaddr_in *src4, *dst4;
-
-		src4 = (struct sockaddr_in *) cma_src_addr(id_priv);
-		dst4 = (struct sockaddr_in *) cma_dst_addr(id_priv);
-
-		cma_set_ip_ver(cma_hdr, 4);
-		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
-		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
-		cma_hdr->port = src4->sin_port;
-	} else if (cma_family(id_priv) == AF_INET6) {
-		struct sockaddr_in6 *src6, *dst6;
-
-		src6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
-		dst6 = (struct sockaddr_in6 *) cma_dst_addr(id_priv);
-
-		cma_set_ip_ver(cma_hdr, 6);
-		cma_hdr->src_addr.ip6 = src6->sin6_addr;
-		cma_hdr->dst_addr.ip6 = dst6->sin6_addr;
-		cma_hdr->port = src6->sin6_port;
-	}
-	return 0;
-}
 
 static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
 				struct ib_cm_event *ib_event)
@@ -2731,7 +2597,9 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 		       conn_param->private_data_len);
 
 	if (private_data) {
-		ret = cma_format_hdr(private_data, id_priv);
+		ret = cm_format_hdr(private_data, cma_family(id_priv),
+				    cma_src_addr(id_priv),
+				    cma_dst_addr(id_priv));
 		if (ret)
 			goto out;
 		req.private_data = private_data;
@@ -2796,7 +2664,9 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 
 	route = &id_priv->id.route;
 	if (private_data) {
-		ret = cma_format_hdr(private_data, id_priv);
+		ret = cm_format_hdr(private_data, cma_family(id_priv),
+				    cma_src_addr(id_priv),
+				    cma_dst_addr(id_priv));
 		if (ret)
 			goto out;
 		req.private_data = private_data;
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index 0e3ff30647d5..e418a11afcfe 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -274,6 +274,52 @@ struct ib_cm_event {
 #define CM_LAP_ATTR_ID		cpu_to_be16(0x0019)
 #define CM_APR_ATTR_ID		cpu_to_be16(0x001A)
 
+union cm_ip_addr {
+	struct in6_addr ip6;
+	struct {
+		__be32 pad[3];
+		__be32 addr;
+	} ip4;
+};
+
+struct cm_hdr {
+	u8 cm_version;
+	u8 ip_version;	/* IP version: 7:4 */
+	__be16 port;
+	union cm_ip_addr src_addr;
+	union cm_ip_addr dst_addr;
+};
+
+#define RDMA_IP_CM_VERSION 0x00
+
+/**
+ * cm_format_hdr - Fill in a cm_hdr struct according to connection details
+ * @hdr:      cm_hdr struct to fill
+ * @family:   ip family of the addresses - AF_INET or AF_INET6
+ * @src_addr: source address of the connection
+ * @dst_addr: destination address of the connection
+ **/
+int cm_format_hdr(void *hdr, int family,
+		  struct sockaddr *src_addr,
+		  struct sockaddr *dst_addr);
+
+/**
+ * cm_save_net_info - saves ib connection event details
+ * @src_addr: source address of the connection
+ * @dst_addr: destination address of the connection
+ * @ib_event: ib event to take connection details from
+ **/
+int cm_save_net_info(struct sockaddr *src_addr,
+		     struct sockaddr *dst_addr,
+		     struct ib_cm_event *ib_event);
+
+/**
+ * cm_set_ip_ver - sets the ip version of a cm_hdr struct
+ * @hdr:    cm_hdr struct to change
+ * @ip_ver: ip version to set - a 4 bit value
+ **/
+void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver);
+
 /**
  * ib_cm_handler - User-defined callback to process communication events.
  * @cm_id: Communication identifier associated with the reported event.
-- 
1.7.11.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH for-next 06/10] IB/cm: Add network namespace support
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-02-01 11:28   ` [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Shachar Raindel
@ 2015-02-01 11:28   ` Shachar Raindel
  2015-02-01 11:28   ` [PATCH for-next 08/10] IB/cma: Add support for network namespaces Shachar Raindel
  4 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add namespace support to the IB-CM layer.

- Each CM-ID now has a network namespace it is associated with, assigned at
  creation. This namespace is used as needed during subsequent actions on the
  CM-ID or related objects.

- All of the relevant calls to ib_addr and ib_core were updated to use the
  namespace from the CM-ID. External APIs were extended as needed to allow
  specifying the namespace where relevant.

- The listening service ID table is now also indexed by the CM-ID namespace.

- For incoming connection requests, we use the connection parameters to select
  the namespace. The namespace is then matched when looking up the listening
  service ID.
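
The effect of indexing the listen table by namespace can be sketched as a
comparison function in which the namespace is the most significant key. This
is a userspace illustration, not the rb-tree code in cm.c: the struct and
function names are hypothetical, the service ID is kept in host byte order,
and the private-data comparison done by the real table is left out.

```c
#include <stdint.h>

/* Hypothetical lookup key for a listening ID. */
struct sketch_listen_key {
	const void *net;	/* namespace the listener belongs to */
	const void *device;	/* RDMA device the listener is bound to */
	uint64_t service_id;	/* host byte order, for simplicity */
};

/* Order keys by namespace, then device, then service ID.
 * Returns a negative, zero or positive value, like memcmp(). */
static int sketch_listen_cmp(const struct sketch_listen_key *a,
			     const struct sketch_listen_key *b)
{
	uintptr_t an = (uintptr_t)a->net, bn = (uintptr_t)b->net;
	uintptr_t ad = (uintptr_t)a->device, bd = (uintptr_t)b->device;

	if (an != bn)
		return an < bn ? -1 : 1;
	if (ad != bd)
		return ad < bd ? -1 : 1;
	if (a->service_id != b->service_id)
		return a->service_id < b->service_id ? -1 : 1;
	return 0;
}
```

Because the namespace is compared first, two listeners on the same device and
service ID never compare equal when they live in different namespaces, which
is what gives each namespace its own service ID table.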

To preserve current behavior, pass init_net to ib_cm wherever network
namespace function parameters were added.

The ib_create_cm_id interface now takes a reference to the relevant network
namespace. CM-IDs created by accepting a connection on a listening CM-ID
also take a reference to the namespace. When the ID is destroyed, the
namespace reference is released.
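
The reference lifecycle above can be pictured with a toy userspace model. All
sketch_* names are hypothetical, and the counter is a plain int, whereas the
kernel's get_net()/put_net() use atomic reference counts; the sketch only
shows where references are taken and dropped.

```c
#include <stdlib.h>

/* Toy stand-ins for struct net and a CM-ID; not the kernel structures. */
struct sketch_net { int refcount; };
struct sketch_cm_id { struct sketch_net *net; };

static struct sketch_net *sketch_get_net(struct sketch_net *net)
{
	net->refcount++;
	return net;
}

static void sketch_put_net(struct sketch_net *net)
{
	net->refcount--;
}

/* Creating an ID pins the namespace it was created in. */
static struct sketch_cm_id *sketch_create_id(struct sketch_net *net)
{
	struct sketch_cm_id *id = malloc(sizeof(*id));

	if (!id)
		return NULL;
	id->net = sketch_get_net(net);
	return id;
}

/* Destroying an ID drops its namespace reference before freeing it. */
static void sketch_destroy_id(struct sketch_cm_id *id)
{
	sketch_put_net(id->net);
	free(id);
}
```

An ID created for an incoming connection would call sketch_create_id() with
the listener's namespace, so the listener and each accepted connection hold
independent references.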

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 drivers/infiniband/core/cm.c            | 124 ++++++++++++++++++++++++--------
 drivers/infiniband/core/cma.c           |   8 ++-
 drivers/infiniband/core/ucm.c           |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |  21 +++++-
 drivers/infiniband/ulp/srp/ib_srp.c     |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c   |   2 +-
 include/rdma/ib_cm.h                    |   7 +-
 7 files changed, 130 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 5cc1a4aa9728..278970c89acd 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -241,6 +241,8 @@ struct cm_id_private {
 	u8 service_timeout;
 	u8 target_ack_delay;
 
+	struct net *net; /* A network namespace that the ID belongs to */
+
 	struct list_head work_list;
 	atomic_t work_count;
 };
@@ -347,12 +349,13 @@ static void cm_set_private_data(struct cm_id_private *cm_id_priv,
 }
 
 static void cm_init_av_for_response(struct cm_port *port, struct ib_wc *wc,
-				    struct ib_grh *grh, struct cm_av *av)
+				    struct ib_grh *grh, struct cm_av *av,
+				    struct net *net)
 {
 	av->port = port;
 	av->pkey_index = wc->pkey_index;
 	ib_init_ah_from_wc(port->cm_dev->ib_device, port->port_num, wc,
-			   grh, &av->ah_attr, &init_net);
+			   grh, &av->ah_attr, net);
 }
 
 static int cm_init_av_by_path(struct ib_sa_path_rec *path, struct cm_av *av)
@@ -521,10 +524,15 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 		if ((cur_cm_id_priv->id.service_mask & service_id) ==
 		    (service_mask & cur_cm_id_priv->id.service_id) &&
 		    (cm_id_priv->id.device == cur_cm_id_priv->id.device) &&
-		    !data_cmp)
+		    !data_cmp &&
+		    net_eq(cm_id_priv->net, cur_cm_id_priv->net))
 			return cur_cm_id_priv;
 
-		if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
+		if (cm_id_priv->net < cur_cm_id_priv->net)
+			link = &(*link)->rb_left;
+		else if (cm_id_priv->net > cur_cm_id_priv->net)
+			link = &(*link)->rb_right;
+		else if (cm_id_priv->id.device < cur_cm_id_priv->id.device)
 			link = &(*link)->rb_left;
 		else if (cm_id_priv->id.device > cur_cm_id_priv->id.device)
 			link = &(*link)->rb_right;
@@ -544,7 +552,8 @@ static struct cm_id_private * cm_insert_listen(struct cm_id_private *cm_id_priv)
 
 static struct cm_id_private * cm_find_listen(struct ib_device *device,
 					     __be64 service_id,
-					     u8 *private_data)
+					     u8 *private_data,
+					     struct net *net)
 {
 	struct rb_node *node = cm.listen_service_table.rb_node;
 	struct cm_id_private *cm_id_priv;
@@ -556,10 +565,14 @@ static struct cm_id_private * cm_find_listen(struct ib_device *device,
 						   cm_id_priv->compare_data);
 		if ((cm_id_priv->id.service_mask & service_id) ==
 		     cm_id_priv->id.service_id &&
-		    (cm_id_priv->id.device == device) && !data_cmp)
+		    (cm_id_priv->id.device == device) && !data_cmp &&
+		    net_eq(cm_id_priv->net, net))
 			return cm_id_priv;
-
-		if (device < cm_id_priv->id.device)
+		if (net < cm_id_priv->net)
+			node = node->rb_left;
+		else if (net > cm_id_priv->net)
+			node = node->rb_right;
+		else if (device < cm_id_priv->id.device)
 			node = node->rb_left;
 		else if (device > cm_id_priv->id.device)
 			node = node->rb_right;
@@ -868,7 +881,8 @@ EXPORT_SYMBOL(cm_save_net_info);
 
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
-				 void *context)
+				 void *context,
+				 struct net *net)
 {
 	struct cm_id_private *cm_id_priv;
 	int ret;
@@ -886,6 +900,8 @@ struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 	if (ret)
 		goto error;
 
+	cm_id_priv->net = get_net(net);
+
 	spin_lock_init(&cm_id_priv->lock);
 	init_completion(&cm_id_priv->comp);
 	INIT_LIST_HEAD(&cm_id_priv->work_list);
@@ -1089,6 +1105,7 @@ retest:
 		cm_free_work(work);
 	kfree(cm_id_priv->compare_data);
 	kfree(cm_id_priv->private_data);
+	put_net(cm_id_priv->net);
 	kfree(cm_id_priv);
 }
 
@@ -1608,7 +1625,8 @@ free:	cm_free_msg(msg);
 }
 
 static struct cm_id_private * cm_match_req(struct cm_work *work,
-					   struct cm_id_private *cm_id_priv)
+					   struct cm_id_private *cm_id_priv,
+					   struct net *net)
 {
 	struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
 	struct cm_timewait_info *timewait_info;
@@ -1644,7 +1662,8 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 	/* Find matching listen request. */
 	listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device,
 					   req_msg->service_id,
-					   req_msg->private_data);
+					   req_msg->private_data,
+					   net);
 	if (!listen_cm_id_priv) {
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irq(&cm.lock);
@@ -1690,24 +1709,58 @@ static void cm_process_routed_req(struct cm_req_msg *req_msg, struct ib_wc *wc)
 	}
 }
 
+static int cm_is_cma_service_id(__be64 service_id)
+{
+	return (IB_CMA_SERVICE_ID_MASK & service_id) == IB_CMA_SERVICE_ID;
+}
+
+static struct net *cm_get_net_ns(struct cm_work *work, __be64 service_id,
+				 __be16 pkey)
+{
+	struct sockaddr_storage addr_storage;
+	struct sockaddr *listen_addr;
+
+	if (cm_is_cma_service_id(service_id)) {
+		listen_addr = (struct sockaddr *)&addr_storage;
+		cm_save_ip_info(listen_addr, NULL, work);
+	} else {
+		/* On RoCE we could extend this branch to determine the
+		 * destination IP from the incoming packet headers, and add
+		 * support for services that are not RDMA IP CM compliant. */
+		listen_addr = NULL;
+	}
+
+	return ib_get_net_ns_by_port_pkey_ip(work->port->cm_dev->ib_device,
+					     work->port->port_num,
+					     be16_to_cpu(pkey),
+					     listen_addr);
+}
 static int cm_req_handler(struct cm_work *work)
 {
 	struct ib_cm_id *cm_id;
 	struct cm_id_private *cm_id_priv, *listen_cm_id_priv;
 	struct cm_req_msg *req_msg;
+	struct net *net;
 	int ret;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
+	work->cm_event.private_data = req_msg->private_data;
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
+	net = cm_get_net_ns(work, req_msg->service_id, req_msg->pkey);
+
+	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL, net);
+	/* cm_id took a reference to net, so no need to hold it anymore */
+	put_net(net);
+	if (IS_ERR(cm_id)) {
+		ret = PTR_ERR(cm_id);
+		goto out;
+	}
 
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 	cm_id_priv->id.remote_id = req_msg->local_comm_id;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, net);
 	cm_id_priv->timewait_info = cm_create_timewait_info(cm_id_priv->
 							    id.local_id);
 	if (IS_ERR(cm_id_priv->timewait_info)) {
@@ -1718,7 +1771,7 @@ static int cm_req_handler(struct cm_work *work)
 	cm_id_priv->timewait_info->remote_ca_guid = req_msg->local_ca_guid;
 	cm_id_priv->timewait_info->remote_qpn = cm_req_get_local_qpn(req_msg);
 
-	listen_cm_id_priv = cm_match_req(work, cm_id_priv);
+	listen_cm_id_priv = cm_match_req(work, cm_id_priv, net);
 	if (!listen_cm_id_priv) {
 		ret = -EINVAL;
 		kfree(cm_id_priv->timewait_info);
@@ -1777,6 +1830,7 @@ rejected:
 	cm_deref_id(listen_cm_id_priv);
 destroy:
 	ib_destroy_cm_id(cm_id);
+out:
 	return ret;
 }
 
@@ -2911,7 +2965,7 @@ static int cm_lap_handler(struct cm_work *work)
 	cm_id_priv->tid = lap_msg->hdr.tid;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, cm_id_priv->net);
 	cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av);
 	ret = atomic_inc_and_test(&cm_id_priv->work_count);
 	if (!ret)
@@ -3161,21 +3215,31 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	struct cm_id_private *cm_id_priv, *cur_cm_id_priv;
 	struct cm_sidr_req_msg *sidr_req_msg;
 	struct ib_wc *wc;
+	struct net *net;
+	int retval;
+
+	sidr_req_msg = (struct cm_sidr_req_msg *)
+				work->mad_recv_wc->recv_buf.mad;
+	work->cm_event.private_data = sidr_req_msg->private_data;
+
+	net = cm_get_net_ns(work, sidr_req_msg->service_id, sidr_req_msg->pkey);
 
-	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL);
-	if (IS_ERR(cm_id))
-		return PTR_ERR(cm_id);
+	cm_id = ib_create_cm_id(work->port->cm_dev->ib_device, NULL, NULL, net);
+	/* cm_id took a reference to net, so no need to hold it anymore */
+	put_net(net);
+	if (IS_ERR(cm_id)) {
+		retval = PTR_ERR(cm_id);
+		goto out;
+	}
 	cm_id_priv = container_of(cm_id, struct cm_id_private, id);
 
 	/* Record SGID/SLID and request ID for lookup. */
-	sidr_req_msg = (struct cm_sidr_req_msg *)
-				work->mad_recv_wc->recv_buf.mad;
 	wc = work->mad_recv_wc->wc;
 	cm_id_priv->av.dgid.global.subnet_prefix = cpu_to_be64(wc->slid);
 	cm_id_priv->av.dgid.global.interface_id = 0;
 	cm_init_av_for_response(work->port, work->mad_recv_wc->wc,
 				work->mad_recv_wc->recv_buf.grh,
-				&cm_id_priv->av);
+				&cm_id_priv->av, net);
 	cm_id_priv->id.remote_id = sidr_req_msg->request_id;
 	cm_id_priv->tid = sidr_req_msg->hdr.tid;
 	atomic_inc(&cm_id_priv->work_count);
@@ -3186,16 +3250,19 @@ static int cm_sidr_req_handler(struct cm_work *work)
 		spin_unlock_irq(&cm.lock);
 		atomic_long_inc(&work->port->counter_group[CM_RECV_DUPLICATES].
 				counter[CM_SIDR_REQ_COUNTER]);
-		goto out; /* Duplicate message. */
+		retval = -EINVAL; /* Duplicate message. */
+		goto out_id;
 	}
 	cm_id_priv->id.state = IB_CM_SIDR_REQ_RCVD;
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
 					sidr_req_msg->service_id,
-					sidr_req_msg->private_data);
+					sidr_req_msg->private_data,
+					net);
 	if (!cur_cm_id_priv) {
 		spin_unlock_irq(&cm.lock);
 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_UNSUPPORTED);
-		goto out; /* No match. */
+		retval = -EINVAL; /* No match. */
+		goto out_id;
 	}
 	atomic_inc(&cur_cm_id_priv->refcount);
 	atomic_inc(&cm_id_priv->refcount);
@@ -3210,9 +3277,10 @@ static int cm_sidr_req_handler(struct cm_work *work)
 	cm_process_work(cm_id_priv, work);
 	cm_deref_id(cur_cm_id_priv);
 	return 0;
-out:
+out_id:
 	ib_destroy_cm_id(&cm_id_priv->id);
-	return -EINVAL;
+out:
+	return retval;
 }
 
 static void cm_format_sidr_rep(struct cm_sidr_rep_msg *sidr_rep_msg,
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9f6faeb1de5f..1ce84a03c883 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1456,7 +1456,8 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
 	__be64 svc_id;
 	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv);
+	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
+			     &init_net);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
 
@@ -2606,7 +2607,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 	}
 
 	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
-			     id_priv);
+			     id_priv, &init_net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -2655,7 +2656,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 		memcpy(private_data + offset, conn_param->private_data,
 		       conn_param->private_data_len);
 
-	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv);
+	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
+			     &init_net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index f2f63933e8a9..9604ab068984 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -489,7 +489,8 @@ static ssize_t ib_ucm_create_id(struct ib_ucm_file *file,
 
 	ctx->uid = cmd.uid;
 	ctx->cm_id = ib_create_cm_id(file->device->ib_dev,
-				     ib_ucm_event_handler, ctx);
+				     ib_ucm_event_handler, ctx,
+				     &init_net);
 	if (IS_ERR(ctx->cm_id)) {
 		result = PTR_ERR(ctx->cm_id);
 		goto err1;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 56959adb6c7d..65dbe4523bf5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -846,7 +846,15 @@ int ipoib_cm_dev_open(struct net_device *dev)
 	if (!IPOIB_CM_SUPPORTED(dev->dev_addr))
 		return 0;
 
-	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev);
+	/*
+	 * The IPoIB CM ID should always be in the init_net namespace.
+	 * It is using a service ID which is not in the RDMA IP CM
+	 * range.  Furthermore, it is guaranteed that this service ID
+	 * will be unique in the machine, as it is based on the UD QP
+	 * number.
+	 */
+	priv->cm.id = ib_create_cm_id(priv->ca, ipoib_cm_rx_handler, dev,
+				      &init_net);
 	if (IS_ERR(priv->cm.id)) {
 		printk(KERN_WARNING "%s: failed to create CM ID\n", priv->ca->name);
 		ret = PTR_ERR(priv->cm.id);
@@ -1130,7 +1138,16 @@ static int ipoib_cm_tx_init(struct ipoib_cm_tx *p, u32 qpn,
 		goto err_qp;
 	}
 
-	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p);
+	/*
+	 * The IPoIB CM ID should always be in the init_net namespace.
+	 *
+	 * The connection target is specified by an explicit GID,
+	 * which is machine-global and not specific to the namespace
+	 * the device resides in. The service ID is also guaranteed to
+	 * be unique per machine, and therefore init_net is the right
+	 * namespace.
+	 */
+	p->id = ib_create_cm_id(priv->ca, ipoib_cm_tx_handler, p, &init_net);
 	if (IS_ERR(p->id)) {
 		ret = PTR_ERR(p->id);
 		ipoib_warn(priv, "failed to create tx cm id: %d\n", ret);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0747c0595a9d..3b418be1509f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -294,7 +294,7 @@ static int srp_new_cm_id(struct srp_rdma_ch *ch)
 	struct ib_cm_id *new_cm_id;
 
 	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-				    srp_cm_handler, ch);
+				    srp_cm_handler, ch, &init_net);
 	if (IS_ERR(new_cm_id))
 		return PTR_ERR(new_cm_id);
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 7867bd554027..c0a1de3f5a95 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -3247,7 +3247,7 @@ static void srpt_add_one(struct ib_device *device)
 	if (!srpt_service_guid)
 		srpt_service_guid = be64_to_cpu(device->node_guid);
 
-	sdev->cm_id = ib_create_cm_id(device, srpt_cm_handler, sdev);
+	sdev->cm_id = ib_create_cm_id(device, srpt_cm_handler, sdev, &init_net);
 	if (IS_ERR(sdev->cm_id))
 		goto err_srq;
 
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index e418a11afcfe..331095b6e84b 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -359,13 +359,18 @@ struct ib_cm_id {
  * @cm_handler: Callback invoked to notify the user of CM events.
  * @context: User specified context associated with the communication
  *   identifier.
+ * @net: Network namespace associated with the cm_id.
  *
  * Communication identifiers are used to track connection states, service
  * ID resolution requests, and listen requests.
+ *
+ * The created CM ID will hold a reference on the network namespace until its
+ * destruction.
  */
 struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
 				 ib_cm_handler cm_handler,
-				 void *context);
+				 void *context,
+				 struct net *net);
 
 /**
  * ib_destroy_cm_id - Destroy a connection identifier.
-- 
1.7.11.2
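
The listen-table hunks in this patch make the network namespace the most significant part of the lookup key, ahead of the device pointer. A minimal userspace sketch of that ordering follows. The demo_* names are hypothetical stand-ins and not kernel types, and the remaining levels of the kernel comparison (service ID, private data) are omitted.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct net and struct ib_device. */
struct demo_net { int id; };
struct demo_device { int id; };

/* Only the first two levels of the listen-table key are modelled here. */
struct demo_listen_key {
	const struct demo_net *net;
	const struct demo_device *device;
};

/*
 * Order keys by namespace first, then by device. A negative result
 * corresponds to descending left in the tree walk, a positive result
 * to descending right, and zero to "equal so far".
 */
static int demo_listen_key_cmp(const struct demo_listen_key *a,
			       const struct demo_listen_key *b)
{
	uintptr_t an = (uintptr_t)a->net, bn = (uintptr_t)b->net;
	uintptr_t ad = (uintptr_t)a->device, bd = (uintptr_t)b->device;

	if (an != bn)
		return an < bn ? -1 : 1;
	if (ad != bd)
		return ad < bd ? -1 : 1;
	return 0;
}
```

Because the namespace is compared first, two listeners on the same device never compare equal unless they also share a namespace.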


* [PATCH for-next 07/10] IB/cma: Separate port allocation to network namespaces
  2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
                   ` (2 preceding siblings ...)
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-02-01 11:28 ` Shachar Raindel
  2015-02-01 11:28 ` [PATCH for-next 09/10] IB/ucma: Take the network namespace from the process Shachar Raindel
  2015-02-01 11:28 ` [PATCH for-next 10/10] IB/ucm: Add partial support for network namespaces Shachar Raindel
  5 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland, sean.hefty
  Cc: linux-rdma, netdev, liranl, Yotam Kenneth, Haggai Eran,
	Shachar Raindel, Guy Shapiro

From: Yotam Kenneth <yotamke@mellanox.com>

Keep a radix tree of the network namespaces in use for each port space.
Dynamically allocate an idr for a network namespace upon the first bind
request for a port in the (ps, net) tuple, and destroy the idr once the
(ps, net) tuple no longer contains any bound ports.

This patch is internal infrastructure work for the following patch. In
this patch, init_net is statically used as the network namespace for
the new port-space API.

The radix-tree is protected under the same locking that protects the
rest of the port space data. This locking is practically a big, static
mutex lock for the entire module.
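
The per-(ps, net) lifetime described in this message can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: the demo_* names are hypothetical, a linked list stands in for the radix tree, a flat array stands in for the idr, and namespaces are keyed by pointer identity, whereas the patch keys the radix tree by net_hash_mix(net). No locking is shown; the patch relies on the module-wide mutex.

```c
#include <stdlib.h>

#define DEMO_NPORTS 65536

/* Hypothetical per-namespace port table; stands in for the idr. */
struct demo_ns_table {
	const void *net;		/* namespace identity (opaque here) */
	unsigned int used;		/* number of ports currently bound */
	void *owner[DEMO_NPORTS];	/* port -> bound object, NULL if free */
	struct demo_ns_table *next;
};

/* One port space; the list stands in for the radix tree. */
struct demo_port_space {
	struct demo_ns_table *tables;
};

static struct demo_ns_table *demo_ps_lookup(struct demo_port_space *ps,
					    const void *net)
{
	struct demo_ns_table *t;

	for (t = ps->tables; t; t = t->next)
		if (t->net == net)
			return t;
	return NULL;
}

/* Bind owner to port in (ps, net). Returns 0 on success, -1 on failure. */
static int demo_ps_alloc(struct demo_port_space *ps, const void *net,
			 unsigned short port, void *owner)
{
	struct demo_ns_table *t;

	if (!owner)
		return -1;
	t = demo_ps_lookup(ps, net);
	if (!t) {
		/* First bind in this namespace: create its table. */
		t = calloc(1, sizeof(*t));
		if (!t)
			return -1;
		t->net = net;
		t->next = ps->tables;
		ps->tables = t;
	}
	if (t->owner[port])
		return -1;	/* port already bound in this namespace */
	t->owner[port] = owner;
	t->used++;
	return 0;
}

static void *demo_ps_find(struct demo_port_space *ps, const void *net,
			  unsigned short port)
{
	struct demo_ns_table *t = demo_ps_lookup(ps, net);

	return t ? t->owner[port] : NULL;
}

static void demo_ps_remove(struct demo_port_space *ps, const void *net,
			   unsigned short port)
{
	struct demo_ns_table **link, *t;

	for (link = &ps->tables; (t = *link); link = &t->next)
		if (t->net == net)
			break;
	if (!t || !t->owner[port])
		return;
	t->owner[port] = NULL;
	if (--t->used == 0) {
		/* Last port released: drop the namespace's table. */
		*link = t->next;
		free(t);
	}
}
```

With this layout the same port number can be bound once per namespace, and a namespace that has released all of its ports leaves nothing behind in the port space.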

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>

---
 drivers/infiniband/core/cma.c | 122 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 99 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1ce84a03c883..022b0d0a51cc 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -39,11 +39,13 @@
 #include <linux/mutex.h>
 #include <linux/random.h>
 #include <linux/idr.h>
+#include <linux/radix-tree.h>
 #include <linux/inetdevice.h>
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <net/route.h>
 
+#include <net/netns/hash.h>
 #include <net/tcp.h>
 #include <net/ipv6.h>
 
@@ -80,10 +82,83 @@ static LIST_HEAD(dev_list);
 static LIST_HEAD(listen_any_list);
 static DEFINE_MUTEX(lock);
 static struct workqueue_struct *cma_wq;
-static DEFINE_IDR(tcp_ps);
-static DEFINE_IDR(udp_ps);
-static DEFINE_IDR(ipoib_ps);
-static DEFINE_IDR(ib_ps);
+static RADIX_TREE(tcp_ps, GFP_KERNEL);
+static RADIX_TREE(udp_ps, GFP_KERNEL);
+static RADIX_TREE(ipoib_ps, GFP_KERNEL);
+static RADIX_TREE(ib_ps, GFP_KERNEL);
+
+static LIST_HEAD(idrs_list);
+
+struct idr_ll {
+	unsigned net_val;
+	struct net *net;
+	struct radix_tree_root *ps;
+	struct idr idr;
+};
+
+static void zap_ps_idr(struct idr_ll *idr_ll)
+{
+	radix_tree_delete(idr_ll->ps, idr_ll->net_val);
+	idr_destroy(&idr_ll->idr);
+	kfree(idr_ll);
+}
+
+static int cma_ps_alloc(struct radix_tree_root *ps, struct net *net, void *ptr,
+			int snum)
+{
+	struct idr_ll *idr_ll;
+	int err;
+	int res;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (!idr_ll) {
+		idr_ll = kmalloc(sizeof(*idr_ll), GFP_KERNEL);
+		if (!idr_ll)
+			return -ENOMEM;
+		idr_init(&idr_ll->idr);
+		idr_ll->net_val = net_hash_mix(net);
+		idr_ll->net = net;
+		idr_ll->ps = ps;
+		err = radix_tree_insert(ps, idr_ll->net_val, idr_ll);
+		if (err) {
+			idr_destroy(&idr_ll->idr);
+			kfree(idr_ll);
+			return err;
+		}
+	}
+	res = idr_alloc(&idr_ll->idr, ptr, snum, snum + 1, GFP_KERNEL);
+	if (unlikely((res < 0) && idr_is_empty(&idr_ll->idr))) {
+		zap_ps_idr(idr_ll);
+		return res;
+	}
+	return res;
+}
+
+static void *cma_ps_find(struct radix_tree_root *ps, struct net *net, int snum)
+{
+	struct idr_ll *idr_ll;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (!idr_ll)
+		return NULL;
+	return idr_find(&idr_ll->idr, snum);
+}
+
+static void cma_ps_remove(struct radix_tree_root *ps, struct net *net, int snum)
+{
+	struct idr_ll *idr_ll;
+
+	idr_ll = radix_tree_lookup(ps, net_hash_mix(net));
+	if (unlikely(!idr_ll)) {
+		WARN(1, "cma_ps_remove can't find expected net ns 0x%lx\n",
+		     (unsigned long)net);
+		return;
+	}
+	idr_remove(&idr_ll->idr, snum);
+	if (idr_is_empty(&idr_ll->idr)) {
+		zap_ps_idr(idr_ll);
+	}
+}
 
 struct cma_device {
 	struct list_head	list;
@@ -94,9 +169,9 @@ struct cma_device {
 };
 
 struct rdma_bind_list {
-	struct idr		*ps;
-	struct hlist_head	owners;
-	unsigned short		port;
+	struct radix_tree_root	*ps;
+	struct hlist_head		owners;
+	unsigned short			port;
 };
 
 enum {
@@ -885,7 +960,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		idr_remove(bind_list->ps, bind_list->port);
+		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -2198,8 +2273,8 @@ static void cma_bind_port(struct rdma_bind_list *bind_list,
 	hlist_add_head(&id_priv->node, &bind_list->owners);
 }
 
-static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
-			  unsigned short snum)
+static int cma_alloc_port(struct radix_tree_root *ps,
+			  struct rdma_id_private *id_priv, unsigned short snum)
 {
 	struct rdma_bind_list *bind_list;
 	int ret;
@@ -2208,7 +2283,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
 	if (ret < 0)
 		goto err;
 
@@ -2221,7 +2296,8 @@ err:
 	return ret == -ENOSPC ? -EADDRNOTAVAIL : ret;
 }
 
-static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_alloc_any_port(struct radix_tree_root *ps,
+			      struct rdma_id_private *id_priv)
 {
 	static unsigned int last_used_port;
 	int low, high, remaining;
@@ -2232,7 +2308,7 @@ static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !idr_find(ps, (unsigned short) rover)) {
+	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2257,6 +2333,8 @@ retry:
  * bind to a specific port, or when trying to listen on a bound port.  In
  * the latter case, the provided id_priv may already be on the bind_list, but
  * we still need to check that it's okay to start listening.
+ *
+ * Assume the bind_list contains only services from the correct namespace.
  */
 static int cma_check_port(struct rdma_bind_list *bind_list,
 			  struct rdma_id_private *id_priv, uint8_t reuseaddr)
@@ -2287,7 +2365,8 @@ static int cma_check_port(struct rdma_bind_list *bind_list,
 	return 0;
 }
 
-static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
+static int cma_use_port(struct radix_tree_root *ps,
+			struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list;
 	unsigned short snum;
@@ -2297,7 +2376,7 @@ static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = idr_find(ps, snum);
+	bind_list = cma_ps_find(ps, &init_net, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2320,7 +2399,8 @@ static int cma_bind_listen(struct rdma_id_private *id_priv)
 	return ret;
 }
 
-static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
+static struct radix_tree_root *cma_select_inet_ps(
+		struct rdma_id_private *id_priv)
 {
 	switch (id_priv->id.ps) {
 	case RDMA_PS_TCP:
@@ -2336,9 +2416,9 @@ static struct idr *cma_select_inet_ps(struct rdma_id_private *id_priv)
 	}
 }
 
-static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
+static struct radix_tree_root *cma_select_ib_ps(struct rdma_id_private *id_priv)
 {
-	struct idr *ps = NULL;
+	struct radix_tree_root *ps = NULL;
 	struct sockaddr_ib *sib;
 	u64 sid_ps, mask, sid;
 
@@ -2369,7 +2449,7 @@ static struct idr *cma_select_ib_ps(struct rdma_id_private *id_priv)
 
 static int cma_get_port(struct rdma_id_private *id_priv)
 {
-	struct idr *ps;
+	struct radix_tree_root *ps;
 	int ret;
 
 	if (cma_family(id_priv) != AF_IB)
@@ -3567,10 +3647,6 @@ static void __exit cma_cleanup(void)
 	rdma_addr_unregister_client(&addr_client);
 	ib_sa_unregister_client(&sa_client);
 	destroy_workqueue(cma_wq);
-	idr_destroy(&tcp_ps);
-	idr_destroy(&udp_ps);
-	idr_destroy(&ipoib_ps);
-	idr_destroy(&ib_ps);
 }
 
 module_init(cma_init);
-- 
1.7.11.2


* [PATCH for-next 08/10] IB/cma: Add support for network namespaces
       [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-02-01 11:28   ` [PATCH for-next 06/10] IB/cm: Add network namespace support Shachar Raindel
@ 2015-02-01 11:28   ` Shachar Raindel
  2015-02-01 13:44     ` Yann Droneaud
  4 siblings, 1 reply; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding a network namespace parameter to rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside of
   ib_cma.
3. Decrementing the reference count for the appropriate network namespace when
   calling rdma_destroy_id.

In order to preserve the current behavior, init_net is passed when calling from
other modules.
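
The reference-counting contract in the three steps above can be sketched as a small userspace model. The demo_* names are hypothetical. The real get_net() and put_net() operate on struct net with atomic counters, which this non-atomic sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net; the counter is not atomic here. */
struct demo_net {
	int refcount;
};

static struct demo_net *demo_get_net(struct demo_net *net)
{
	net->refcount++;
	return net;
}

static void demo_put_net(struct demo_net *net)
{
	assert(net->refcount > 0);
	net->refcount--;
}

/* Hypothetical stand-in for an rdma_cm_id that pins its namespace. */
struct demo_id {
	struct demo_net *net;
};

/* Step 1: creation takes a reference on the caller-supplied namespace. */
static struct demo_id *demo_create_id(struct demo_net *net)
{
	struct demo_id *id = calloc(1, sizeof(*id));

	if (!id)
		return NULL;
	id->net = demo_get_net(net);
	return id;
}

/* Step 3: destruction drops the reference taken at creation. */
static void demo_destroy_id(struct demo_id *id)
{
	demo_put_net(id->net);
	free(id);
}

/* Step 2: ids spawned from a listener reuse the listener's namespace. */
static struct demo_id *demo_new_conn_id(const struct demo_id *listener)
{
	return demo_create_id(listener->net);
}
```

Each id owns exactly one reference, so a connection id created from a listener keeps the namespace alive even after the listener itself has been destroyed.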

Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

---
 drivers/infiniband/core/cma.c                      | 52 +++++++++++++---------
 drivers/infiniband/core/ucma.c                     |  3 +-
 drivers/infiniband/ulp/iser/iser_verbs.c           |  2 +-
 drivers/infiniband/ulp/isert/ib_isert.c            |  2 +-
 .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +-
 include/rdma/rdma_cm.h                             |  6 ++-
 net/9p/trans_rdma.c                                |  2 +-
 net/rds/ib.c                                       |  2 +-
 net/rds/ib_cm.c                                    |  2 +-
 net/rds/iw.c                                       |  2 +-
 net/rds/iw_cm.c                                    |  2 +-
 net/rds/rdma_transport.c                           |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_transport.c           |  2 +-
 net/sunrpc/xprtrdma/verbs.c                        |  3 +-
 14 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 022b0d0a51cc..f6379b38b366 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -540,7 +540,8 @@ static int cma_disable_callback(struct rdma_id_private *id_priv,
 
 struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 				  void *context, enum rdma_port_space ps,
-				  enum ib_qp_type qp_type)
+				  enum ib_qp_type qp_type,
+				  struct net *net)
 {
 	struct rdma_id_private *id_priv;
 
@@ -562,7 +563,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
-	id_priv->id.route.addr.dev_addr.net = &init_net;
+	id_priv->id.route.addr.dev_addr.net = get_net(net);
 
 	return &id_priv->id;
 }
@@ -689,7 +690,7 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
 	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
 	    == IB_LINK_LAYER_ETHERNET) {
 		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
-						  &init_net);
+				id_priv->id.route.addr.dev_addr.net);
 
 		if (ret)
 			goto out;
@@ -953,6 +954,7 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
 static void cma_release_port(struct rdma_id_private *id_priv)
 {
 	struct rdma_bind_list *bind_list = id_priv->bind_list;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 
 	if (!bind_list)
 		return;
@@ -960,7 +962,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
 	mutex_lock(&lock);
 	hlist_del(&id_priv->node);
 	if (hlist_empty(&bind_list->owners)) {
-		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
+		cma_ps_remove(bind_list->ps, net, bind_list->port);
 		kfree(bind_list);
 	}
 	mutex_unlock(&lock);
@@ -1029,6 +1031,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
 		cma_deref_id(id_priv->id.context);
 
 	kfree(id_priv->id.route.path_rec);
+	put_net(id_priv->id.route.addr.dev_addr.net);
 	kfree(id_priv);
 }
 EXPORT_SYMBOL(rdma_destroy_id);
@@ -1156,7 +1159,8 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
-			    listen_id->ps, ib_event->param.req_rcvd.qp_type);
+			    listen_id->ps, ib_event->param.req_rcvd.qp_type,
+			    listen_id->route.addr.dev_addr.net);
 	if (IS_ERR(id))
 		return NULL;
 
@@ -1201,10 +1205,11 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
 {
 	struct rdma_id_private *id_priv;
 	struct rdma_cm_id *id;
+	struct net *net = listen_id->route.addr.dev_addr.net;
 	int ret;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
-			    listen_id->ps, IB_QPT_UD);
+			    listen_id->ps, IB_QPT_UD, net);
 	if (IS_ERR(id))
 		return NULL;
 
@@ -1455,7 +1460,8 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
 	/* Create a new RDMA id for the new IW CM ID */
 	new_cm_id = rdma_create_id(listen_id->id.event_handler,
 				   listen_id->id.context,
-				   RDMA_PS_TCP, IB_QPT_RC);
+				   RDMA_PS_TCP, IB_QPT_RC,
+				   listen_id->id.route.addr.dev_addr.net);
 	if (IS_ERR(new_cm_id)) {
 		ret = -ENOMEM;
 		goto out;
@@ -1528,11 +1534,11 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
 	struct ib_cm_compare_data compare_data;
 	struct sockaddr *addr;
 	struct ib_cm_id	*id;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 	__be64 svc_id;
 	int ret;
 
-	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
-			     &init_net);
+	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv, net);
 	if (IS_ERR(id))
 		return PTR_ERR(id);
 
@@ -1596,6 +1602,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 {
 	struct rdma_id_private *dev_id_priv;
 	struct rdma_cm_id *id;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 	int ret;
 
 	if (cma_family(id_priv) == AF_IB &&
@@ -1603,7 +1610,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
 		return;
 
 	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
-			    id_priv->id.qp_type);
+			    id_priv->id.qp_type, net);
 	if (IS_ERR(id))
 		return;
 
@@ -2283,7 +2290,8 @@ static int cma_alloc_port(struct radix_tree_root *ps,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
+	ret = cma_ps_alloc(ps, id_priv->id.route.addr.dev_addr.net, bind_list,
+			   snum);
 	if (ret < 0)
 		goto err;
 
@@ -2302,13 +2310,14 @@ static int cma_alloc_any_port(struct radix_tree_root *ps,
 	static unsigned int last_used_port;
 	int low, high, remaining;
 	unsigned int rover;
+	struct net *net = id_priv->id.route.addr.dev_addr.net;
 
-	inet_get_local_port_range(&init_net, &low, &high);
+	inet_get_local_port_range(net, &low, &high);
 	remaining = (high - low) + 1;
 	rover = prandom_u32() % remaining + low;
 retry:
 	if (last_used_port != rover &&
-	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
+	    !cma_ps_find(ps, net, (unsigned short)rover)) {
 		int ret = cma_alloc_port(ps, id_priv, rover);
 		/*
 		 * Remember previously used port number in order to avoid
@@ -2376,7 +2385,7 @@ static int cma_use_port(struct radix_tree_root *ps,
 	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
-	bind_list = cma_ps_find(ps, &init_net, snum);
+	bind_list = cma_ps_find(ps, id_priv->id.route.addr.dev_addr.net, snum);
 	if (!bind_list) {
 		ret = cma_alloc_port(ps, id_priv, snum);
 	} else {
@@ -2573,8 +2582,11 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		if (addr->sa_family == AF_INET)
 			id_priv->afonly = 1;
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (addr->sa_family == AF_INET6)
-			id_priv->afonly = init_net.ipv6.sysctl.bindv6only;
+		else if (addr->sa_family == AF_INET6) {
+			struct net *net = id_priv->id.route.addr.dev_addr.net;
+
+			id_priv->afonly = net->ipv6.sysctl.bindv6only;
+		}
 #endif
 	}
 	ret = cma_get_port(id_priv);
@@ -2687,7 +2699,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
 	}
 
 	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
-			     id_priv, &init_net);
+			     id_priv, id_priv->id.route.addr.dev_addr.net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -2737,7 +2749,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
 		       conn_param->private_data_len);
 
 	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
-			     &init_net);
+			     id_priv->id.route.addr.dev_addr.net);
 	if (IS_ERR(id)) {
 		ret = PTR_ERR(id);
 		goto out;
@@ -3387,6 +3399,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
 	dev_addr = &id_priv->id.route.addr.dev_addr;
 
 	if ((dev_addr->bound_dev_if == ndev->ifindex) &&
+	    (dev_net(ndev) == dev_addr->net) &&
 	    memcmp(dev_addr->src_dev_addr, ndev->dev_addr, ndev->addr_len)) {
 		printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n",
 		       ndev->name, &id_priv->id);
@@ -3412,9 +3425,6 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
 	struct rdma_id_private *id_priv;
 	int ret = NOTIFY_DONE;
 
-	if (dev_net(ndev) != &init_net)
-		return NOTIFY_DONE;
-
 	if (event != NETDEV_BONDING_FAILOVER)
 		return NOTIFY_DONE;
 
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 56a4b7ca7ee3..de755f2c6166 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -391,7 +391,8 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 		return -ENOMEM;
 
 	ctx->uid = cmd.uid;
-	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type);
+	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type,
+				    &init_net);
 	if (IS_ERR(ctx->cm_id)) {
 		ret = PTR_ERR(ctx->cm_id);
 		goto err1;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 695a2704bd43..d4e9c639ad2f 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -949,7 +949,7 @@ int iser_connect(struct iser_conn   *iser_conn,
 
 	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
 					 (void *)iser_conn,
-					 RDMA_PS_TCP, IB_QPT_RC);
+					 RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ib_conn->cma_id)) {
 		err = PTR_ERR(ib_conn->cma_id);
 		iser_err("rdma_create_id failed: %d\n", err);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index dafb3c531f96..44a6fff8dc79 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2960,7 +2960,7 @@ isert_setup_id(struct isert_np *isert_np)
 	isert_dbg("ksockaddr: %p, sa: %p\n", &np->np_sockaddr, sa);
 
 	id = rdma_create_id(isert_cma_handler, isert_np,
-			    RDMA_PS_TCP, IB_QPT_RC);
+			    RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(id)) {
 		isert_err("rdma_create_id() failed: %ld\n", PTR_ERR(id));
 		ret = PTR_ERR(id);
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
index b02b4ec1e29d..128de4eb0959 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
@@ -125,7 +125,9 @@ extern kib_tunables_t  kiblnd_tunables;
 				     IBLND_CREDIT_HIGHWATER_V1 : \
 				     *kiblnd_tunables.kib_peercredits_hiw) /* when eagerly to return credits */
 
-#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, ps, qpt)
+#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, \
+							       ps, qpt, \
+							       &init_net)
 
 static inline int
 kiblnd_concurrent_sends_v1(void)
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index 1ed2088dc9f5..3953e9c8bc94 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -163,10 +163,14 @@ struct rdma_cm_id {
  * @context: User specified context associated with the id.
  * @ps: RDMA port space.
  * @qp_type: type of queue pair associated with the id.
+ * @net: The network namespace in which to create the new id.
+ *
+ * The id holds a reference on the network namespace until it is destroyed.
  */
 struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 				  void *context, enum rdma_port_space ps,
-				  enum ib_qp_type qp_type);
+				  enum ib_qp_type qp_type,
+				  struct net *net);
 
 /**
   * rdma_destroy_id - Destroys an RDMA identifier.
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 14ad43b5cf89..577fd3129bcf 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -635,7 +635,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
 
 	/* Create the RDMA CM ID */
 	rdma->cm_id = rdma_create_id(p9_cm_event_handler, client, RDMA_PS_TCP,
-				     IB_QPT_RC);
+				     IB_QPT_RC, &init_net);
 	if (IS_ERR(rdma->cm_id))
 		goto error;
 
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ba2dffeff608..cc137f523248 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -326,7 +326,7 @@ static int rds_ib_laddr_check(__be32 addr)
 	/* Create a CMA ID and try to bind it. This catches both
 	 * IB and iWARP capable NICs.
 	 */
-	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
+	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id))
 		return PTR_ERR(cm_id);
 
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 31b74f5e61ad..d19b91296ddc 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -584,7 +584,7 @@ int rds_ib_conn_connect(struct rds_connection *conn)
 	/* XXX I wonder what affect the port space has */
 	/* delegate cm event handler to rdma_transport */
 	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
-				     RDMA_PS_TCP, IB_QPT_RC);
+				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ic->i_cm_id)) {
 		ret = PTR_ERR(ic->i_cm_id);
 		ic->i_cm_id = NULL;
diff --git a/net/rds/iw.c b/net/rds/iw.c
index 589935661d66..8501b73ed12f 100644
--- a/net/rds/iw.c
+++ b/net/rds/iw.c
@@ -227,7 +227,7 @@ static int rds_iw_laddr_check(__be32 addr)
 	/* Create a CMA ID and try to bind it. This catches both
 	 * IB and iWARP capable NICs.
 	 */
-	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
+	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id))
 		return PTR_ERR(cm_id);
 
diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
index a91e1db62ee6..e5ee2d562a60 100644
--- a/net/rds/iw_cm.c
+++ b/net/rds/iw_cm.c
@@ -521,7 +521,7 @@ int rds_iw_conn_connect(struct rds_connection *conn)
 	/* XXX I wonder what affect the port space has */
 	/* delegate cm event handler to rdma_transport */
 	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
-				     RDMA_PS_TCP, IB_QPT_RC);
+				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
 	if (IS_ERR(ic->i_cm_id)) {
 		ret = PTR_ERR(ic->i_cm_id);
 		ic->i_cm_id = NULL;
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 6cd9d1deafc3..066b60b27b12 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -160,7 +160,7 @@ static int rds_rdma_listen_init(void)
 	int ret;
 
 	cm_id = rdma_create_id(rds_rdma_cm_event_handler, NULL, RDMA_PS_TCP,
-			       IB_QPT_RC);
+			       IB_QPT_RC, &init_net);
 	if (IS_ERR(cm_id)) {
 		ret = PTR_ERR(cm_id);
 		printk(KERN_ERR "RDS/RDMA: failed to setup listener, "
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 4e618808bc98..e3b246e305f9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -701,7 +701,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
 	xprt = &cma_xprt->sc_xprt;
 
 	listen_id = rdma_create_id(rdma_listen_handler, cma_xprt, RDMA_PS_TCP,
-				   IB_QPT_RC);
+				   IB_QPT_RC, &init_net);
 	if (IS_ERR(listen_id)) {
 		ret = PTR_ERR(listen_id);
 		dprintk("svcrdma: rdma_create_id failed = %d\n", ret);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index c98e40643910..f574e77165f4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -483,7 +483,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
 
 	init_completion(&ia->ri_done);
 
-	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC);
+	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC,
+			    &init_net);
 	if (IS_ERR(id)) {
 		rc = PTR_ERR(id);
 		dprintk("RPC:       %s: rdma_create_id() failed %i\n",
-- 
1.7.11.2

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH for-next 09/10] IB/ucma: Take the network namespace from the process
  2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
                   ` (3 preceding siblings ...)
  2015-02-01 11:28 ` [PATCH for-next 07/10] IB/cma: Separate port allocation to " Shachar Raindel
@ 2015-02-01 11:28 ` Shachar Raindel
  2015-02-01 11:28 ` [PATCH for-next 10/10] IB/ucm: Add partial support for network namespaces Shachar Raindel
  5 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland, sean.hefty
  Cc: linux-rdma, netdev, liranl, Guy Shapiro, Haggai Eran,
	Yotam Kenneth, Shachar Raindel

From: Guy Shapiro <guysh@mellanox.com>

Add support for network namespaces from user space. This is done by passing
the network namespace of the process instead of init_net.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>

---
 drivers/infiniband/core/ucma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index de755f2c6166..d25a20968ca2 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -42,6 +42,7 @@
 #include <linux/slab.h>
 #include <linux/sysctl.h>
 #include <linux/module.h>
+#include <linux/nsproxy.h>
 
 #include <rdma/rdma_user_cm.h>
 #include <rdma/ib_marshall.h>
@@ -392,7 +393,7 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
 
 	ctx->uid = cmd.uid;
 	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type,
-				    &init_net);
+				    current->nsproxy->net_ns);
 	if (IS_ERR(ctx->cm_id)) {
 		ret = PTR_ERR(ctx->cm_id);
 		goto err1;
-- 
1.7.11.2


* [PATCH for-next 10/10] IB/ucm: Add partial support for network namespaces
  2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
                   ` (4 preceding siblings ...)
  2015-02-01 11:28 ` [PATCH for-next 09/10] IB/ucma: Take the network namespace from the process Shachar Raindel
@ 2015-02-01 11:28 ` Shachar Raindel
  5 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 11:28 UTC (permalink / raw)
  To: roland, sean.hefty
  Cc: linux-rdma, netdev, liranl, Shachar Raindel, Haggai Eran,
	Yotam Kenneth, Guy Shapiro

It is impossible to completely support network namespaces for UCM, as
we cannot identify the target IPoIB device. However, we add support
which will work if the user is following the IB-Spec Annex 11 (RDMA IP
CM Services) with the service ID and private data formatting.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>

---
 drivers/infiniband/core/ucm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 9604ab068984..424421091dae 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -45,6 +45,7 @@
 #include <linux/idr.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/nsproxy.h>
 
 #include <asm/uaccess.h>
 
@@ -490,7 +491,7 @@ static ssize_t ib_ucm_create_id(struct ib_ucm_file *file,
 	ctx->uid = cmd.uid;
 	ctx->cm_id = ib_create_cm_id(file->device->ib_dev,
 				     ib_ucm_event_handler, ctx,
-				     &init_net);
+				     current->nsproxy->net_ns);
 	if (IS_ERR(ctx->cm_id)) {
 		result = PTR_ERR(ctx->cm_id);
 		goto err1;
-- 
1.7.11.2


* Re: [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter
       [not found]   ` <1422790133-28725-2-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-02-01 12:22     ` Yann Droneaud
       [not found]       ` <1422793376.3030.37.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Droneaud @ 2015-02-01 12:22 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth

Hi,

On Sunday, 1 February 2015 at 13:28 +0200, Shachar Raindel wrote:
> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add network namespace support to the ib_addr module. For that, all the address
> resolution and matching should be done using the appropriate namespace instead
> of init_net.
> 
> This is achieved by:
> 
> 1. Adding an explicit network namespace argument to exported function that
>    require a namespace.
> 2. Saving the namespace in the rdma_addr_client structure.
> 3. Using it when calling networking functions.
> 
> In order to preserve the behavior of calling modules, &init_net is
> passed as the parameter in calls from other modules. This is modified as
> namspace support is added on more levels.

typo: "namespace"

> 
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> ---
>  drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
>  drivers/infiniband/core/cma.c            |  4 ++-
>  drivers/infiniband/core/verbs.c          | 14 +++++++---
>  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
>  include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
>  5 files changed, 72 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index f80da50d84a5..95beaef6b66d 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  	int ret = -EADDRNOTAVAIL;
>  
>  	if (dev_addr->bound_dev_if) {
> -		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> +		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
>  		if (!dev)
>  			return -ENODEV;
>  		ret = rdma_copy_addr(dev_addr, dev, NULL);
> @@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  	}
>  
>  	switch (addr->sa_family) {
> -	case AF_INET:
> -		dev = ip_dev_find(&init_net,
> -			((struct sockaddr_in *) addr)->sin_addr.s_addr);
> +	case AF_INET: {
> +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
> +
> +		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);

I don't see the point of this change.

>  
>  		if (!dev)
>  			return ret;
> @@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  			*vlan_id = rdma_vlan_dev_vlan_id(dev);
>  		dev_put(dev);
>  		break;
> -
> +	}

closing } here ?

>  #if IS_ENABLED(CONFIG_IPV6)
>  	case AF_INET6:
>  		rcu_read_lock();
> -		for_each_netdev_rcu(&init_net, dev) {
> -			if (ipv6_chk_addr(&init_net,
> +		for_each_netdev_rcu(dev_addr->net, dev) {
> +			if (ipv6_chk_addr(dev_addr->net,
>  					  &((struct sockaddr_in6 *) addr)->sin6_addr,
>  					  dev, 1)) {
>  				ret = rdma_copy_addr(dev_addr, dev, NULL);
> @@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
>  	fl4.daddr = dst_ip;
>  	fl4.saddr = src_ip;
>  	fl4.flowi4_oif = addr->bound_dev_if;
> -	rt = ip_route_output_key(&init_net, &fl4);
> +	rt = ip_route_output_key(addr->net, &fl4);
>  	if (IS_ERR(rt)) {
>  		ret = PTR_ERR(rt);
>  		goto out;
> @@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
>  	fl6.saddr = src_in->sin6_addr;
>  	fl6.flowi6_oif = addr->bound_dev_if;
>  
> -	dst = ip6_route_output(&init_net, NULL, &fl6);
> +	dst = ip6_route_output(addr->net, NULL, &fl6);
>  	if ((ret = dst->error))
>  		goto put;
>  
>  	if (ipv6_addr_any(&fl6.saddr)) {
> -		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
> +		ret = ipv6_dev_get_saddr(addr->net,
> +					 ip6_dst_idev(dst)->dev,
>  					 &fl6.daddr, 0, &fl6.saddr);
>  		if (ret)
>  			goto put;
> @@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr *src_addr,
>  }
>  
>  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
> -			       u16 *vlan_id)
> +			       u16 *vlan_id, struct net *net)
>  {
>  	int ret = 0;
>  	struct rdma_dev_addr dev_addr;
> @@ -481,6 +483,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
>  		return ret;
>  
>  	memset(&dev_addr, 0, sizeof(dev_addr));
> +	dev_addr.net = net;

Should get_net() be used somewhere to grab a reference on the net
namespace?

>  
>  	ctx.addr = &dev_addr;
>  	init_completion(&ctx.comp);
> @@ -492,7 +495,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
>  	wait_for_completion(&ctx.comp);
>  
>  	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
> -	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
> +	dev = dev_get_by_index(net, dev_addr.bound_dev_if);
>  	if (!dev)
>  		return -ENODEV;
>  	if (vlan_id)
> @@ -502,7 +505,8 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
>  }
>  EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
>  
> -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
> +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
> +				struct net *net)
>  {
>  	int ret = 0;
>  	struct rdma_dev_addr dev_addr;
> @@ -517,6 +521,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id)
>  	if (ret)
>  		return ret;
>  	memset(&dev_addr, 0, sizeof(dev_addr));
> +	dev_addr.net = net;

get_net() ?

>  	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 6e5e11ca7702..aeb2417ec928 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -512,6 +512,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>  	INIT_LIST_HEAD(&id_priv->listen_list);
>  	INIT_LIST_HEAD(&id_priv->mc_list);
>  	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> +	id_priv->id.route.addr.dev_addr.net = &init_net;
>  
>  	return &id_priv->id;
>  }
> @@ -637,7 +638,8 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
>  	    == RDMA_TRANSPORT_IB &&
>  	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
>  	    == IB_LINK_LAYER_ETHERNET) {
> -		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
> +		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
> +						  &init_net);
>  
>  		if (ret)
>  			goto out;
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index f93eb8da7b5a..ca5c4dd8a67a 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -212,7 +212,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
>  			ah_attr->vlan_id = wc->vlan_id;
>  		} else {
>  			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
> -					ah_attr->dmac, &ah_attr->vlan_id);
> +							 ah_attr->dmac,
> +							 &ah_attr->vlan_id,
> +							 &init_net);
>  			if (ret)
>  				return ret;
>  		}
> @@ -882,11 +884,15 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
>  			if (!(*qp_attr_mask & IB_QP_VID))
>  				qp_attr->vlan_id = rdma_get_vlan_id(&sgid);
>  		} else {
> -			ret = rdma_addr_find_dmac_by_grh(&sgid, &qp_attr->ah_attr.grh.dgid,
> -					qp_attr->ah_attr.dmac, &qp_attr->vlan_id);
> +			ret = rdma_addr_find_dmac_by_grh(
> +				&sgid,
> +				&qp_attr->ah_attr.grh.dgid,
> +				qp_attr->ah_attr.dmac, &qp_attr->vlan_id,
> +				&init_net);
>  			if (ret)
>  				goto out;
> -			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac, NULL);
> +			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
> +							  NULL, &init_net);
>  			if (ret)
>  				goto out;
>  		}
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> index f3cc8c9e65ae..debaac2b6ee8 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> @@ -119,7 +119,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
>  
>  	if (pd->uctx) {
>  		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
> -                                        attr->dmac, &attr->vlan_id);
> +						    attr->dmac, &attr->vlan_id,
> +						    &init_net);
>  		if (status) {
>  			pr_err("%s(): Failed to resolve dmac from gid." 
>  				"status = %d\n", __func__, status);
> diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
> index ce55906b54a0..40ccf8b83755 100644
> --- a/include/rdma/ib_addr.h
> +++ b/include/rdma/ib_addr.h
> @@ -47,6 +47,7 @@
>  #include <rdma/ib_verbs.h>
>  #include <rdma/ib_pack.h>
>  #include <net/ipv6.h>
> +#include <net/net_namespace.h>
>  
>  struct rdma_addr_client {
>  	atomic_t refcount;
> @@ -64,6 +65,16 @@ void rdma_addr_register_client(struct rdma_addr_client *client);
>   */
>  void rdma_addr_unregister_client(struct rdma_addr_client *client);
>  
> +/**
> + * struct rdma_dev_addr - Contains resolved RDMA hardware addresses
> + * @src_dev_addr:	Source MAC address.
> + * @dst_dev_addr:	Destination MAC address.
> + * @broadcast:		Broadcast address of the device.
> + * @dev_type:		The interface hardware type of the device.
> + * @bound_dev_if:	An optional device interface index.
> + * @transport:		The transport type used.
> + * @net:		Network namespace containing the bound_dev_if net_dev.
> + */
>  struct rdma_dev_addr {
>  	unsigned char src_dev_addr[MAX_ADDR_LEN];
>  	unsigned char dst_dev_addr[MAX_ADDR_LEN];
> @@ -71,11 +82,14 @@ struct rdma_dev_addr {
>  	unsigned short dev_type;
>  	int bound_dev_if;
>  	enum rdma_transport_type transport;
> +	struct net *net;
>  };
>  
>  /**
>   * rdma_translate_ip - Translate a local IP address to an RDMA hardware
>   *   address.
> + *
> + * The dev_addr->net field must be initialized.
>   */
>  int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>  		      u16 *vlan_id);
> @@ -90,7 +104,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr,
>   * @dst_addr: The destination address to resolve.
>   * @addr: A reference to a data location that will receive the resolved
>   *   addresses.  The data location must remain valid until the callback has
> - *   been invoked.
> + *   been invoked. The net field of the addr struct must be valid.
>   * @timeout_ms: Amount of time to wait for the address resolution to complete.
>   * @callback: Call invoked once address resolution has completed, timed out,
>   *   or been canceled.  A status of 0 indicates success.
> @@ -110,9 +124,29 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
>  
>  int rdma_addr_size(struct sockaddr *addr);
>  
> -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id);
> -int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *smac,
> -			       u16 *vlan_id);
> +/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
> + * @sgid:	Source GID to find the MAC and VLAN for.
> + * @smac:	A buffer to contain the resulting MAC address.
> + * @vlan_id:	Will contain the resulting VLAN ID.
> + * @net:	Network namespace to use for the address resolution.
> + *
> + * It is the caller's responsibility to keep the network namespace alive until
> + * the function returns.

Why ?

> + */
> +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16 *vlan_id,
> +				struct net *net);
> +/** rdma_addr_find_dmac_by_grh() - Find the dst MAC and VLAN ID for a GID pair
> + * @sgid:	Source GID to use for the search.
> + * @dgid:	Destination GID to find the details for.
> + * @dmac:	Contains the resulting destination MAC address.
> + * @vlan_id:	Contains the resulting VLAN ID.
> + * @net:	Network namespace to use for the address resolution.
> + *
> + * It is the caller's responsibility to keep the network namespace alive until
> + * the function returns.

Why ?

> + */
> +int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid *dgid, u8 *dmac,
> +			       u16 *vlan_id, struct net *net);
>  
>  static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
>  {
> @@ -182,7 +216,7 @@ static inline void iboe_addr_get_sgid(struct rdma_dev_addr *dev_addr,
>  	struct net_device *dev;
>  	struct in_device *ip4;
>  
> -	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> +	dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
>  	if (dev) {
>  		ip4 = (struct in_device *)dev->ip_ptr;
>  		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)


I believe this patch lacks proper reference counting in the form of
get_net() / put_net(), but I cannot say for sure.
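
To make the concern concrete, here is a minimal userspace sketch of the
pattern I have in mind. It is a model only, not kernel code:
`struct net_model`, `model_get_net()` and `model_put_net()` are hypothetical
stand-ins for the kernel's `struct net`, `get_net()` and `put_net()`, and
`struct dev_addr_model` stands in for `struct rdma_dev_addr`. The real
kernel helpers use an atomic counter; a plain int is assumed here for
brevity. The idea is that whoever stores the pointer takes a reference and
drops it on teardown.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net: only the reference count is modelled. */
struct net_model {
	int count;
};

/* Hypothetical stand-in for struct rdma_dev_addr with its new net field. */
struct dev_addr_model {
	struct net_model *net;
};

/* Models get_net(): take a reference and return the same pointer. */
static struct net_model *model_get_net(struct net_model *net)
{
	net->count++;
	return net;
}

/* Models put_net(): drop a reference taken by model_get_net(). */
static void model_put_net(struct net_model *net)
{
	net->count--;
}

/* Store the namespace in the address and pin it while it is stored. */
static void dev_addr_set_net(struct dev_addr_model *addr, struct net_model *net)
{
	addr->net = model_get_net(net);
}

/* Release the pinned namespace when the address is torn down. */
static void dev_addr_clear_net(struct dev_addr_model *addr)
{
	if (addr->net) {
		model_put_net(addr->net);
		addr->net = NULL;
	}
}
```

If the caller is instead guaranteed to hold its own reference for the whole
call, as the new kernel-doc comments state, the extra get/put pair may not
be needed for the on-stack dev_addr cases.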

Regards.

-- 
Yann Droneaud
OPTEYA



* Re: [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions
       [not found]     ` <1422790133-28725-3-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-02-01 12:26       ` Yann Droneaud
  2015-02-01 14:10         ` Shachar Raindel
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Droneaud @ 2015-02-01 12:26 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	liranl-VPRAkNaXOzVWk0Htik3J/w, Guy Shapiro, Haggai Eran,
	Yotam Kenneth

Hi,

On Sunday, 1 February 2015 at 13:28 +0200, Shachar Raindel wrote:
> From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Add network namespace parameters for the address related ib_core
> functions. The parameter is passed to lower level function, instead of
> &init_net, so things are done in the correct namespace.
> 
> For now pass &init_net on every caller.
> Callers that will pass &init_net permanently are marked with an
> appropriate comment.
> 
> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> ---
>  drivers/infiniband/core/agent.c       |  4 +++-
>  drivers/infiniband/core/cm.c          |  9 +++++++--
>  drivers/infiniband/core/mad_rmpp.c    | 10 ++++++++--
>  drivers/infiniband/core/user_mad.c    |  4 +++-
>  drivers/infiniband/core/verbs.c       | 10 ++++++----
>  drivers/infiniband/ulp/srpt/ib_srpt.c |  3 ++-
>  include/rdma/ib_verbs.h               | 15 +++++++++++++--
>  7 files changed, 42 insertions(+), 13 deletions(-)

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 0d74f1de99aa..dd4c80cea8d3 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -48,6 +48,7 @@
>  #include <linux/rwsem.h>
>  #include <linux/scatterlist.h>
>  #include <linux/workqueue.h>
> +#include <net/net_namespace.h>
>  #include <uapi/linux/if_ether.h>
>  
>  #include <linux/atomic.h>
> @@ -1801,9 +1802,14 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
>   *   ignored unless the work completion indicates that the GRH is valid.
>   * @ah_attr: Returned attributes that can be used when creating an address
>   *   handle for replying to the message.
> + * @net: The network namespace to use for address resolution.
> + *
> + * It is the caller's responsibility to make sure the network namespace is
> + * alive until the function returns.

Why ?

>   */
>  int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
> -		       struct ib_grh *grh, struct ib_ah_attr *ah_attr);
> +		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
> +		       struct net *net);
>  
>  /**
>   * ib_create_ah_from_wc - Creates an address handle associated with the
> @@ -1813,12 +1819,17 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
>   * @grh: References the received global route header.  This parameter is
>   *   ignored unless the work completion indicates that the GRH is valid.
>   * @port_num: The outbound port number to associate with the address.
> + * @net: The network namespace to use for address resolution.
>   *
>   * The address handle is used to reference a local or global destination
>   * in all UD QP post sends.
> + *
> + * It is the caller's responsibility to make sure the network namespace is
> + * alive until the function returns.

Why ?

Regards.

-- 
Yann Droneaud
OPTEYA



* Re: [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm
  2015-02-01 11:28   ` [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Shachar Raindel
@ 2015-02-01 12:55     ` Yann Droneaud
       [not found]       ` <1422795359.3030.43.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Droneaud @ 2015-02-01 12:55 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: roland, sean.hefty, linux-rdma, netdev, liranl, Guy Shapiro,
	Haggai Eran, Yotam Kenneth

On Sunday, 1 February 2015 at 13:28 +0200, Shachar Raindel wrote:
> From: Guy Shapiro <guysh@mellanox.com>
> 
> When receiving a connection request, ib_cm needs to associate the request with
> a network namespace. To do this, it needs to know the request's destination
> IP. For this the RDMA IP CM packet formatting functionality needs to be
> exposed to ib_cm.
> 
> This patch merely moves the RDMA IP CM data formatting and parsing functions
> to be part of ib_cm. The following patch will utilize the new knowledge to
> look-up the appropriate namespace. Each namespace maintains an independent
> table of RDMA CM service IDs, allowing isolation and separation between the
> network namespaces.
> 
> When creating a new incoming connection ID, the code in cm_save_ip_info can no
> longer rely on the listener's private data to find the port number, so it
> reads it from the requested service ID. This required saving the service ID in
> cm_format_paths_from_req.
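
If I read this correctly, the port is taken from the low 16 bits of the
service ID. A minimal userspace sketch of that mapping follows, under the
assumption that the service ID is laid out as (port space << 16) + port,
as the RDMA CM builds it today. Both helper names are hypothetical, values
are in host byte order (the kernel works on __be64 / __be16), and 0x0106 is
used only as an example port-space value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract the port, assumed to occupy the low
 * 16 bits of a service ID given in host byte order.
 */
static uint16_t port_from_service_id(uint64_t service_id)
{
	return (uint16_t)(service_id & 0xFFFF);
}

/*
 * Hypothetical helper: build a service ID from a 16-bit port-space
 * value and a port, both in host byte order.
 */
static uint64_t service_id_from_port(uint16_t port_space, uint16_t port)
{
	return ((uint64_t)port_space << 16) | port;
}
```

For example, port space 0x0106 with port 3260 gives service ID
0x0000000001060CBC, and the low 16 bits read back as 3260.
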
> 
> Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> 
> ---
>  drivers/infiniband/core/cm.c  | 167 ++++++++++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/cma.c | 166 +++++------------------------------------
>  include/rdma/ib_cm.h          |  46 ++++++++++++
>  3 files changed, 231 insertions(+), 148 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 5a45cb76c43e..5cc1a4aa9728 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -51,6 +51,7 @@
>  
>  #include <rdma/ib_cache.h>
>  #include <rdma/ib_cm.h>
> +#include <rdma/ib.h>
>  #include "cm_msgs.h"
>  
>  MODULE_AUTHOR("Sean Hefty");
> @@ -701,6 +702,170 @@ static void cm_reject_sidr_req(struct cm_id_private *cm_id_priv,
>  	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
>  }
>  
> +static inline u8 cm_get_ip_ver(struct cm_hdr *hdr)
> +{
> +	return hdr->ip_version >> 4;
> +}
> +
> +void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver)
> +{
> +	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
> +}
> +EXPORT_SYMBOL(cm_set_ip_ver);
> +

That could be defined as an inline function in the header.
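
Something along these lines, as an untested userspace sketch.
`struct cm_hdr_model` is a hypothetical stand-in that carries only the
`ip_version` byte; the real `struct cm_hdr` in include/rdma/ib_cm.h has more
fields, and the kernel would use u8 rather than uint8_t. With both accessors
static inline in the header, the EXPORT_SYMBOL for cm_set_ip_ver would no
longer be needed.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct cm_hdr: only the field used here. */
struct cm_hdr_model {
	uint8_t ip_version;	/* IP version in bits 7:4 */
};

/* Read the IP version from the high nibble. */
static inline uint8_t cm_get_ip_ver(const struct cm_hdr_model *hdr)
{
	return hdr->ip_version >> 4;
}

/* Write the IP version into the high nibble, preserving the low nibble. */
static inline void cm_set_ip_ver(struct cm_hdr_model *hdr, uint8_t ip_ver)
{
	hdr->ip_version = (uint8_t)((ip_ver << 4) | (hdr->ip_version & 0xF));
}
```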

> +int cm_format_hdr(void *hdr, int family,
> +		  struct sockaddr *src_addr,
> +		  struct sockaddr *dst_addr)
> +{
> +	struct cm_hdr *cm_hdr;
> +
> +	cm_hdr = hdr;
> +	cm_hdr->cm_version = RDMA_IP_CM_VERSION;
> +	if (family == AF_INET) {
> +		struct sockaddr_in *src4, *dst4;
> +
> +		src4 = (struct sockaddr_in *)src_addr;
> +		dst4 = (struct sockaddr_in *)dst_addr;
> +
> +		cm_set_ip_ver(cm_hdr, 4);
> +		cm_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
> +		cm_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
> +		cm_hdr->port = src4->sin_port;
> +	} else if (family == AF_INET6) {
> +		struct sockaddr_in6 *src6, *dst6;
> +
> +		src6 = (struct sockaddr_in6 *)src_addr;
> +		dst6 = (struct sockaddr_in6 *)dst_addr;
> +
> +		cm_set_ip_ver(cm_hdr, 6);
> +		cm_hdr->src_addr.ip6 = src6->sin6_addr;
> +		cm_hdr->dst_addr.ip6 = dst6->sin6_addr;
> +		cm_hdr->port = src6->sin6_port;
> +	}
> +	return 0;
> +}
> +EXPORT_SYMBOL(cm_format_hdr);
> +
> +static void cm_save_ib_info(struct sockaddr *src_addr,
> +			    struct sockaddr *dst_addr,
> +			    struct ib_sa_path_rec *path)
> +{
> +	struct sockaddr_ib  *ib;
> +
> +	if (src_addr) {
> +		ib = (struct sockaddr_ib *)src_addr;
> +		ib->sib_family = AF_IB;
> +		ib->sib_pkey = path->pkey;
> +		ib->sib_flowinfo = path->flow_label;
> +		memcpy(&ib->sib_addr, &path->sgid, 16);
> +		ib->sib_sid = path->service_id;
> +		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
> +		ib->sib_scope_id = 0;
> +	}
> +	if (dst_addr) {
> +		ib = (struct sockaddr_ib *)dst_addr;
> +		ib->sib_family = AF_IB;
> +		ib->sib_pkey = path->pkey;
> +		ib->sib_flowinfo = path->flow_label;
> +		memcpy(&ib->sib_addr, &path->dgid, 16);
> +	}
> +}
> +
> +static void cm_save_ip6_info(struct sockaddr *src_addr,
> +			     struct sockaddr *dst_addr,
> +			     struct cm_hdr *hdr,
> +			     __be16 local_port)
> +{
> +	struct sockaddr_in6 *ip6;
> +
> +	if (src_addr) {
> +		ip6 = (struct sockaddr_in6 *)src_addr;
> +		ip6->sin6_family = AF_INET6;
> +		ip6->sin6_addr = hdr->dst_addr.ip6;
> +		ip6->sin6_port = local_port;
> +	}
> +
> +	if (dst_addr) {
> +		ip6 = (struct sockaddr_in6 *)dst_addr;
> +		ip6->sin6_family = AF_INET6;
> +		ip6->sin6_addr = hdr->src_addr.ip6;
> +		ip6->sin6_port = hdr->port;
> +	}
> +}
> +
> +static void cm_save_ip4_info(struct sockaddr *src_addr,
> +			     struct sockaddr *dst_addr,
> +			     struct cm_hdr *hdr,
> +			     __be16 local_port)
> +{
> +	struct sockaddr_in *ip4;
> +
> +	if (src_addr) {
> +		ip4 = (struct sockaddr_in *)src_addr;
> +		ip4->sin_family = AF_INET;
> +		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
> +		ip4->sin_port = local_port;
> +	}
> +
> +	if (dst_addr) {
> +		ip4 = (struct sockaddr_in *)dst_addr;
> +		ip4->sin_family = AF_INET;
> +		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
> +		ip4->sin_port = hdr->port;
> +	}
> +}
> +
> +static __be16 cm_port_from_service_id(__be64 service_id)
> +{
> +	return htons(be64_to_cpu(service_id));
> +}
> +
> +static int cm_save_ip_info(struct sockaddr *src_addr,
> +			   struct sockaddr *dst_addr,
> +			   struct cm_work *work)
> +{
> +	struct cm_hdr *hdr;
> +	__be16 port;
> +
> +	hdr = work->cm_event.private_data;
> +	if (hdr->cm_version != RDMA_IP_CM_VERSION)
> +		return -EINVAL;
> +
> +	port = cm_port_from_service_id(work->path->service_id);
> +
> +	switch (cm_get_ip_ver(hdr)) {
> +	case 4:
> +		cm_save_ip4_info(src_addr, dst_addr, hdr, port);
> +		break;
> +	case 6:
> +		cm_save_ip6_info(src_addr, dst_addr, hdr, port);
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +int cm_save_net_info(struct sockaddr *src_addr,
> +		     struct sockaddr *dst_addr,
> +		     struct ib_cm_event *ib_event)
> +{
> +	struct cm_work *work = container_of(ib_event, struct cm_work, cm_event);
> +
> +	if ((rdma_port_get_link_layer(work->port->cm_dev->ib_device,
> +				      work->port->port_num) ==
> +	     IB_LINK_LAYER_INFINIBAND) &&
> +	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
> +		cm_save_ib_info(src_addr, dst_addr,
> +				ib_event->param.req_rcvd.primary_path);
> +		return 0;
> +	}
> +
> +	return cm_save_ip_info(src_addr, dst_addr, work);
> +}
> +EXPORT_SYMBOL(cm_save_net_info);
> +
>  struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
>  				 ib_cm_handler cm_handler,
>  				 void *context)
> @@ -1260,6 +1425,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
>  	primary_path->packet_life_time =
>  		cm_req_get_primary_local_ack_timeout(req_msg);
>  	primary_path->packet_life_time -= (primary_path->packet_life_time > 0);
> +	primary_path->service_id = req_msg->service_id;
>  
>  	if (req_msg->alt_local_lid) {
>  		memset(alt_path, 0, sizeof *alt_path);
> @@ -1281,6 +1447,7 @@ static void cm_format_paths_from_req(struct cm_req_msg *req_msg,
>  		alt_path->packet_life_time =
>  			cm_req_get_alt_local_ack_timeout(req_msg);
>  		alt_path->packet_life_time -= (alt_path->packet_life_time > 0);
> +		alt_path->service_id = req_msg->service_id;
>  	}
>  }
>  
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index aeb2417ec928..9f6faeb1de5f 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -179,23 +179,8 @@ struct iboe_mcast_work {
>  	struct cma_multicast	*mc;
>  };
>  
> -union cma_ip_addr {
> -	struct in6_addr ip6;
> -	struct {
> -		__be32 pad[3];
> -		__be32 addr;
> -	} ip4;
> -};
>  
> -struct cma_hdr {
> -	u8 cma_version;
> -	u8 ip_version;	/* IP version: 7:4 */
> -	__be16 port;
> -	union cma_ip_addr src_addr;
> -	union cma_ip_addr dst_addr;
> -};
>  
> -#define CMA_VERSION 0x00
>  
>  static int cma_comp(struct rdma_id_private *id_priv, enum rdma_cm_state comp)
>  {
> @@ -234,16 +219,6 @@ static enum rdma_cm_state cma_exch(struct rdma_id_private *id_priv,
>  	return old;
>  }
>  
> -static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
> -{
> -	return hdr->ip_version >> 4;
> -}
> -
> -static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
> -{
> -	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
> -}
> -
>  static void cma_attach_to_dev(struct rdma_id_private *id_priv,
>  			      struct cma_device *cma_dev)
>  {
> @@ -839,93 +814,9 @@ static inline int cma_any_port(struct sockaddr *addr)
>  	return !cma_port(addr);
>  }
>  
> -static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
> -			     struct ib_sa_path_rec *path)
> -{
> -	struct sockaddr_ib *listen_ib, *ib;
> -
> -	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
> -	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
> -	ib->sib_family = listen_ib->sib_family;
> -	ib->sib_pkey = path->pkey;
> -	ib->sib_flowinfo = path->flow_label;
> -	memcpy(&ib->sib_addr, &path->sgid, 16);
> -	ib->sib_sid = listen_ib->sib_sid;
> -	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
> -	ib->sib_scope_id = listen_ib->sib_scope_id;
> -
> -	ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
> -	ib->sib_family = listen_ib->sib_family;
> -	ib->sib_pkey = path->pkey;
> -	ib->sib_flowinfo = path->flow_label;
> -	memcpy(&ib->sib_addr, &path->dgid, 16);
> -}
> -
> -static void cma_save_ip4_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
> -			      struct cma_hdr *hdr)
> -{
> -	struct sockaddr_in *listen4, *ip4;
> -
> -	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
> -	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
> -	ip4->sin_family = AF_INET;
> -	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
> -	ip4->sin_port = listen4->sin_port;
> -
> -	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
> -	ip4->sin_family = AF_INET;
> -	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
> -	ip4->sin_port = hdr->port;
> -}
> -
> -static void cma_save_ip6_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
> -			      struct cma_hdr *hdr)
> -{
> -	struct sockaddr_in6 *listen6, *ip6;
> -
> -	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
> -	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
> -	ip6->sin6_family = AF_INET6;
> -	ip6->sin6_addr = hdr->dst_addr.ip6;
> -	ip6->sin6_port = listen6->sin6_port;
> -
> -	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
> -	ip6->sin6_family = AF_INET6;
> -	ip6->sin6_addr = hdr->src_addr.ip6;
> -	ip6->sin6_port = hdr->port;
> -}
> -
> -static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id *listen_id,
> -			     struct ib_cm_event *ib_event)
> -{
> -	struct cma_hdr *hdr;
> -
> -	if ((listen_id->route.addr.src_addr.ss_family == AF_IB) &&
> -	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
> -		cma_save_ib_info(id, listen_id, ib_event->param.req_rcvd.primary_path);
> -		return 0;
> -	}
> -
> -	hdr = ib_event->private_data;
> -	if (hdr->cma_version != CMA_VERSION)
> -		return -EINVAL;
> -
> -	switch (cma_get_ip_ver(hdr)) {
> -	case 4:
> -		cma_save_ip4_info(id, listen_id, hdr);
> -		break;
> -	case 6:
> -		cma_save_ip6_info(id, listen_id, hdr);
> -		break;
> -	default:
> -		return -EINVAL;
> -	}
> -	return 0;
> -}
> -
>  static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
>  {
> -	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
> +	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cm_hdr);
>  }
>  
>  static void cma_cancel_route(struct rdma_id_private *id_priv)
> @@ -1195,7 +1086,9 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
>  		return NULL;
>  
>  	id_priv = container_of(id, struct rdma_id_private, id);
> -	if (cma_save_net_info(id, listen_id, ib_event))
> +	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
> +			     (struct sockaddr *)&id->route.addr.dst_addr,
> +			     ib_event))
>  		goto err;
>  
>  	rt = &id->route;
> @@ -1241,7 +1134,9 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
>  		return NULL;
>  
>  	id_priv = container_of(id, struct rdma_id_private, id);
> -	if (cma_save_net_info(id, listen_id, ib_event))
> +	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
> +			     (struct sockaddr *)&id->route.addr.dst_addr,
> +			     ib_event))
>  		goto err;
>  
>  	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
> @@ -1369,7 +1264,7 @@ EXPORT_SYMBOL(rdma_get_service_id);
>  static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
>  				 struct ib_cm_compare_data *compare)
>  {
> -	struct cma_hdr *cma_data, *cma_mask;
> +	struct cm_hdr *cma_data, *cma_mask;
>  	__be32 ip4_addr;
>  	struct in6_addr ip6_addr;
>  
> @@ -1380,8 +1275,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
>  	switch (addr->sa_family) {
>  	case AF_INET:
>  		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
> -		cma_set_ip_ver(cma_data, 4);
> -		cma_set_ip_ver(cma_mask, 0xF);
> +		cm_set_ip_ver(cma_data, 4);
> +		cm_set_ip_ver(cma_mask, 0xF);
>  		if (!cma_any_addr(addr)) {
>  			cma_data->dst_addr.ip4.addr = ip4_addr;
>  			cma_mask->dst_addr.ip4.addr = htonl(~0);
> @@ -1389,8 +1284,8 @@ static void cma_set_compare_data(enum rdma_port_space ps, struct sockaddr *addr,
>  		break;
>  	case AF_INET6:
>  		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
> -		cma_set_ip_ver(cma_data, 6);
> -		cma_set_ip_ver(cma_mask, 0xF);
> +		cm_set_ip_ver(cma_data, 6);
> +		cm_set_ip_ver(cma_mask, 0xF);
>  		if (!cma_any_addr(addr)) {
>  			cma_data->dst_addr.ip6 = ip6_addr;
>  			memset(&cma_mask->dst_addr.ip6, 0xFF,
> @@ -2615,35 +2510,6 @@ err1:
>  }
>  EXPORT_SYMBOL(rdma_bind_addr);
>  
> -static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
> -{
> -	struct cma_hdr *cma_hdr;
> -
> -	cma_hdr = hdr;
> -	cma_hdr->cma_version = CMA_VERSION;
> -	if (cma_family(id_priv) == AF_INET) {
> -		struct sockaddr_in *src4, *dst4;
> -
> -		src4 = (struct sockaddr_in *) cma_src_addr(id_priv);
> -		dst4 = (struct sockaddr_in *) cma_dst_addr(id_priv);
> -
> -		cma_set_ip_ver(cma_hdr, 4);
> -		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
> -		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
> -		cma_hdr->port = src4->sin_port;
> -	} else if (cma_family(id_priv) == AF_INET6) {
> -		struct sockaddr_in6 *src6, *dst6;
> -
> -		src6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
> -		dst6 = (struct sockaddr_in6 *) cma_dst_addr(id_priv);
> -
> -		cma_set_ip_ver(cma_hdr, 6);
> -		cma_hdr->src_addr.ip6 = src6->sin6_addr;
> -		cma_hdr->dst_addr.ip6 = dst6->sin6_addr;
> -		cma_hdr->port = src6->sin6_port;
> -	}
> -	return 0;
> -}
>  
>  static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
>  				struct ib_cm_event *ib_event)
> @@ -2731,7 +2597,9 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
>  		       conn_param->private_data_len);
>  
>  	if (private_data) {
> -		ret = cma_format_hdr(private_data, id_priv);
> +		ret = cm_format_hdr(private_data, cma_family(id_priv),
> +				    cma_src_addr(id_priv),
> +				    cma_dst_addr(id_priv));
>  		if (ret)
>  			goto out;
>  		req.private_data = private_data;
> @@ -2796,7 +2664,9 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
>  
>  	route = &id_priv->id.route;
>  	if (private_data) {
> -		ret = cma_format_hdr(private_data, id_priv);
> +		ret = cm_format_hdr(private_data, cma_family(id_priv),
> +				    cma_src_addr(id_priv),
> +				    cma_dst_addr(id_priv));
>  		if (ret)
>  			goto out;
>  		req.private_data = private_data;
> diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
> index 0e3ff30647d5..e418a11afcfe 100644
> --- a/include/rdma/ib_cm.h
> +++ b/include/rdma/ib_cm.h
> @@ -274,6 +274,52 @@ struct ib_cm_event {
>  #define CM_LAP_ATTR_ID		cpu_to_be16(0x0019)
>  #define CM_APR_ATTR_ID		cpu_to_be16(0x001A)
>  
> +union cm_ip_addr {
> +	struct in6_addr ip6;
> +	struct {
> +		__be32 pad[3];
> +		__be32 addr;
> +	} ip4;
> +};
> +
> +struct cm_hdr {
> +	u8 cm_version;
> +	u8 ip_version;	/* IP version: 7:4 */
> +	__be16 port;
> +	union cm_ip_addr src_addr;
> +	union cm_ip_addr dst_addr;
> +};
> +
> +#define RDMA_IP_CM_VERSION 0x00
> +
> +/**
> + * cm_format_hdr - Fill in a cm_hdr struct according to connection details
> + * @hdr:      cm_hdr struct to fill
> + * @family:   ip family of the addresses - AF_INET or AF_INET6
> + * @src_addr: source address of the connection
> + * @dst_addr: destination address of the connection
> + **/
> +int cm_format_hdr(void *hdr, int family,
> +		  struct sockaddr *src_addr,
> +		  struct sockaddr *dst_addr);
> +
> +/**
> + * cm_save_net_info - saves ib connection event details
> + * @src_addr: source address of the connection
> + * @dst_addr: destination address of the connection
> + * @ib_event: ib event to take connection details from
> + **/
> +int cm_save_net_info(struct sockaddr *src_addr,
> +		     struct sockaddr *dst_addr,
> +		     struct ib_cm_event *ib_event);
> +
> +/**
> + * cm_set_ip_ver - sets the ip version of a cm_hdr struct
> + * @hdr:    cm_hdr struct to change
> + * @ip_ver: ip version to set - a 4 bit value
> + **/
> +void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver);
> +
>  /**
>   * ib_cm_handler - User-defined callback to process communication events.
>   * @cm_id: Communication identifier associated with the reported event.

Every other symbol in ib_cm.h is prefixed by "ib_cm_", so I would
prefer that the symbols moved from cma.c to cm.c be renamed accordingly,
although that would mean a lot of code changes ...

Regards.

-- 
Yann Droneaud
OPTEYA


* Re: [PATCH for-next 08/10] IB/cma: Add support for network namespaces
  2015-02-01 11:28   ` [PATCH for-next 08/10] IB/cma: Add support for network namespaces Shachar Raindel
@ 2015-02-01 13:44     ` Yann Droneaud
       [not found]       ` <1422798272.3030.48.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Droneaud @ 2015-02-01 13:44 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: roland, sean.hefty, linux-rdma, netdev, liranl, Guy Shapiro,
	Haggai Eran, Yotam Kenneth

Hi,

On Sunday, 1 February 2015 at 13:28 +0200, Shachar Raindel wrote:
> From: Guy Shapiro <guysh@mellanox.com>
> 
> Add support for network namespaces in the ib_cma module. This is
> accomplished by:
> 
> 1. Adding network namespace parameter for rdma_create_id. This parameter is used
>    to populate the network namespace field in rdma_id_private. rdma_create_id
>    keeps a reference on the network namespace.
> 2. Using the network namespace from the rdma_id instead of init_net inside of
>    ib_cma.
> 3. Decrementing the reference count for the appropriate network namespace when
>    calling rdma_destroy_id.
> 
> In order to preserve the current behavior init_net is passed when calling from
> other modules.
> 
> Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> 
> ---
>  drivers/infiniband/core/cma.c                      | 52 +++++++++++++---------
>  drivers/infiniband/core/ucma.c                     |  3 +-
>  drivers/infiniband/ulp/iser/iser_verbs.c           |  2 +-
>  drivers/infiniband/ulp/isert/ib_isert.c            |  2 +-
>  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +-
>  include/rdma/rdma_cm.h                             |  6 ++-
>  net/9p/trans_rdma.c                                |  2 +-
>  net/rds/ib.c                                       |  2 +-
>  net/rds/ib_cm.c                                    |  2 +-
>  net/rds/iw.c                                       |  2 +-
>  net/rds/iw_cm.c                                    |  2 +-
>  net/rds/rdma_transport.c                           |  2 +-
>  net/sunrpc/xprtrdma/svc_rdma_transport.c           |  2 +-
>  net/sunrpc/xprtrdma/verbs.c                        |  3 +-
>  14 files changed, 52 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 022b0d0a51cc..f6379b38b366 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -540,7 +540,8 @@ static int cma_disable_callback(struct rdma_id_private *id_priv,
>  
>  struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>  				  void *context, enum rdma_port_space ps,
> -				  enum ib_qp_type qp_type)
> +				  enum ib_qp_type qp_type,
> +				  struct net *net)
>  {
>  	struct rdma_id_private *id_priv;
>  
> @@ -562,7 +563,7 @@ struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>  	INIT_LIST_HEAD(&id_priv->listen_list);
>  	INIT_LIST_HEAD(&id_priv->mc_list);
>  	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> -	id_priv->id.route.addr.dev_addr.net = &init_net;
> +	id_priv->id.route.addr.dev_addr.net = get_net(net);
>  
>  	return &id_priv->id;
>  }
> @@ -689,7 +690,7 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
>  	    rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
>  	    == IB_LINK_LAYER_ETHERNET) {
>  		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
> -						  &init_net);
> +				id_priv->id.route.addr.dev_addr.net);
>  
>  		if (ret)
>  			goto out;
> @@ -953,6 +954,7 @@ static void cma_cancel_operation(struct rdma_id_private *id_priv,
>  static void cma_release_port(struct rdma_id_private *id_priv)
>  {
>  	struct rdma_bind_list *bind_list = id_priv->bind_list;
> +	struct net *net = id_priv->id.route.addr.dev_addr.net;
>  
>  	if (!bind_list)
>  		return;
> @@ -960,7 +962,7 @@ static void cma_release_port(struct rdma_id_private *id_priv)
>  	mutex_lock(&lock);
>  	hlist_del(&id_priv->node);
>  	if (hlist_empty(&bind_list->owners)) {
> -		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
> +		cma_ps_remove(bind_list->ps, net, bind_list->port);
>  		kfree(bind_list);
>  	}
>  	mutex_unlock(&lock);
> @@ -1029,6 +1031,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
>  		cma_deref_id(id_priv->id.context);
>  
>  	kfree(id_priv->id.route.path_rec);
> +	put_net(id_priv->id.route.addr.dev_addr.net);
>  	kfree(id_priv);
>  }
>  EXPORT_SYMBOL(rdma_destroy_id);
> @@ -1156,7 +1159,8 @@ static struct rdma_id_private *cma_new_conn_id(struct rdma_cm_id *listen_id,
>  	int ret;
>  
>  	id = rdma_create_id(listen_id->event_handler, listen_id->context,
> -			    listen_id->ps, ib_event->param.req_rcvd.qp_type);
> +			    listen_id->ps, ib_event->param.req_rcvd.qp_type,
> +			    listen_id->route.addr.dev_addr.net);
>  	if (IS_ERR(id))
>  		return NULL;
>  
> @@ -1201,10 +1205,11 @@ static struct rdma_id_private *cma_new_udp_id(struct rdma_cm_id *listen_id,
>  {
>  	struct rdma_id_private *id_priv;
>  	struct rdma_cm_id *id;
> +	struct net *net = listen_id->route.addr.dev_addr.net;
>  	int ret;
>  
>  	id = rdma_create_id(listen_id->event_handler, listen_id->context,
> -			    listen_id->ps, IB_QPT_UD);
> +			    listen_id->ps, IB_QPT_UD, net);
>  	if (IS_ERR(id))
>  		return NULL;
>  
> @@ -1455,7 +1460,8 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id,
>  	/* Create a new RDMA id for the new IW CM ID */
>  	new_cm_id = rdma_create_id(listen_id->id.event_handler,
>  				   listen_id->id.context,
> -				   RDMA_PS_TCP, IB_QPT_RC);
> +				   RDMA_PS_TCP, IB_QPT_RC,
> +				   listen_id->id.route.addr.dev_addr.net);
>  	if (IS_ERR(new_cm_id)) {
>  		ret = -ENOMEM;
>  		goto out;
> @@ -1528,11 +1534,11 @@ static int cma_ib_listen(struct rdma_id_private *id_priv)
>  	struct ib_cm_compare_data compare_data;
>  	struct sockaddr *addr;
>  	struct ib_cm_id	*id;
> +	struct net *net = id_priv->id.route.addr.dev_addr.net;
>  	__be64 svc_id;
>  	int ret;
>  
> -	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
> -			     &init_net);
> +	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv, net);
>  	if (IS_ERR(id))
>  		return PTR_ERR(id);
>  
> @@ -1596,6 +1602,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
>  {
>  	struct rdma_id_private *dev_id_priv;
>  	struct rdma_cm_id *id;
> +	struct net *net = id_priv->id.route.addr.dev_addr.net;
>  	int ret;
>  
>  	if (cma_family(id_priv) == AF_IB &&
> @@ -1603,7 +1610,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
>  		return;
>  
>  	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
> -			    id_priv->id.qp_type);
> +			    id_priv->id.qp_type, net);
>  	if (IS_ERR(id))
>  		return;
>  
> @@ -2283,7 +2290,8 @@ static int cma_alloc_port(struct radix_tree_root *ps,
>  	if (!bind_list)
>  		return -ENOMEM;
>  
> -	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
> +	ret = cma_ps_alloc(ps, id_priv->id.route.addr.dev_addr.net, bind_list,
> +			   snum);
>  	if (ret < 0)
>  		goto err;
>  
> @@ -2302,13 +2310,14 @@ static int cma_alloc_any_port(struct radix_tree_root *ps,
>  	static unsigned int last_used_port;
>  	int low, high, remaining;
>  	unsigned int rover;
> +	struct net *net = id_priv->id.route.addr.dev_addr.net;
>  
> -	inet_get_local_port_range(&init_net, &low, &high);
> +	inet_get_local_port_range(net, &low, &high);
>  	remaining = (high - low) + 1;
>  	rover = prandom_u32() % remaining + low;
>  retry:
>  	if (last_used_port != rover &&
> -	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
> +	    !cma_ps_find(ps, net, (unsigned short)rover)) {
>  		int ret = cma_alloc_port(ps, id_priv, rover);
>  		/*
>  		 * Remember previously used port number in order to avoid
> @@ -2376,7 +2385,7 @@ static int cma_use_port(struct radix_tree_root *ps,
>  	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
>  		return -EACCES;
>  
> -	bind_list = cma_ps_find(ps, &init_net, snum);
> +	bind_list = cma_ps_find(ps, id_priv->id.route.addr.dev_addr.net, snum);
>  	if (!bind_list) {
>  		ret = cma_alloc_port(ps, id_priv, snum);
>  	} else {
> @@ -2573,8 +2582,11 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
>  		if (addr->sa_family == AF_INET)
>  			id_priv->afonly = 1;
>  #if IS_ENABLED(CONFIG_IPV6)
> -		else if (addr->sa_family == AF_INET6)
> -			id_priv->afonly = init_net.ipv6.sysctl.bindv6only;
> +		else if (addr->sa_family == AF_INET6) {
> +			struct net *net = id_priv->id.route.addr.dev_addr.net;
> +
> +			id_priv->afonly = net->ipv6.sysctl.bindv6only;
> +		}
>  #endif
>  	}
>  	ret = cma_get_port(id_priv);
> @@ -2687,7 +2699,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv,
>  	}
>  
>  	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
> -			     id_priv, &init_net);
> +			     id_priv, id_priv->id.route.addr.dev_addr.net);
>  	if (IS_ERR(id)) {
>  		ret = PTR_ERR(id);
>  		goto out;
> @@ -2737,7 +2749,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv,
>  		       conn_param->private_data_len);
>  
>  	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
> -			     &init_net);
> +			     id_priv->id.route.addr.dev_addr.net);
>  	if (IS_ERR(id)) {
>  		ret = PTR_ERR(id);
>  		goto out;
> @@ -3387,6 +3399,7 @@ static int cma_netdev_change(struct net_device *ndev, struct rdma_id_private *id
>  	dev_addr = &id_priv->id.route.addr.dev_addr;
>  
>  	if ((dev_addr->bound_dev_if == ndev->ifindex) &&
> +	    (dev_net(ndev) == dev_addr->net) &&

Shouldn't this use net_eq()?
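
Roughly what I have in mind, as a userspace sketch only: struct net, struct net_device, net_eq() and dev_net() below are simplified stand-ins for the kernel definitions, and ndev_matches_id() is a hypothetical helper that mirrors just the first two conditions of the test in cma_netdev_change().

```c
#include <stdbool.h>

/*
 * Userspace stand-ins (assumptions for illustration). In the kernel,
 * struct net and net_eq() come from <net/net_namespace.h> and dev_net()
 * from <linux/netdevice.h>.
 */
struct net { int id; };
struct net_device { struct net *nd_net; int ifindex; };

/* With namespaces enabled, the kernel helper compares the two pointers. */
static inline bool net_eq(const struct net *a, const struct net *b)
{
	return a == b;
}

static inline struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Hypothetical helper: same interface index and same namespace. */
static bool ndev_matches_id(const struct net_device *ndev, int bound_dev_if,
			    const struct net *id_net)
{
	return bound_dev_if == ndev->ifindex &&
	       net_eq(dev_net(ndev), id_net);
}
```

Using the helper instead of an open-coded pointer comparison also keeps the check correct on kernels built without network namespace support, where net_eq() is always true.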

>  	    memcmp(dev_addr->src_dev_addr, ndev->dev_addr, ndev->addr_len)) {
>  		printk(KERN_INFO "RDMA CM addr change for ndev %s used by id %p\n",
>  		       ndev->name, &id_priv->id);
> @@ -3412,9 +3425,6 @@ static int cma_netdev_callback(struct notifier_block *self, unsigned long event,
>  	struct rdma_id_private *id_priv;
>  	int ret = NOTIFY_DONE;
>  
> -	if (dev_net(ndev) != &init_net)
> -		return NOTIFY_DONE;
> -
>  	if (event != NETDEV_BONDING_FAILOVER)
>  		return NOTIFY_DONE;
>  
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index 56a4b7ca7ee3..de755f2c6166 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -391,7 +391,8 @@ static ssize_t ucma_create_id(struct ucma_file *file, const char __user *inbuf,
>  		return -ENOMEM;
>  
>  	ctx->uid = cmd.uid;
> -	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type);
> +	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps, qp_type,
> +				    &init_net);
>  	if (IS_ERR(ctx->cm_id)) {
>  		ret = PTR_ERR(ctx->cm_id);
>  		goto err1;
> diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
> index 695a2704bd43..d4e9c639ad2f 100644
> --- a/drivers/infiniband/ulp/iser/iser_verbs.c
> +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
> @@ -949,7 +949,7 @@ int iser_connect(struct iser_conn   *iser_conn,
>  
>  	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
>  					 (void *)iser_conn,
> -					 RDMA_PS_TCP, IB_QPT_RC);
> +					 RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(ib_conn->cma_id)) {
>  		err = PTR_ERR(ib_conn->cma_id);
>  		iser_err("rdma_create_id failed: %d\n", err);
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index dafb3c531f96..44a6fff8dc79 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2960,7 +2960,7 @@ isert_setup_id(struct isert_np *isert_np)
>  	isert_dbg("ksockaddr: %p, sa: %p\n", &np->np_sockaddr, sa);
>  
>  	id = rdma_create_id(isert_cma_handler, isert_np,
> -			    RDMA_PS_TCP, IB_QPT_RC);
> +			    RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(id)) {
>  		isert_err("rdma_create_id() failed: %ld\n", PTR_ERR(id));
>  		ret = PTR_ERR(id);
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> index b02b4ec1e29d..128de4eb0959 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> @@ -125,7 +125,9 @@ extern kib_tunables_t  kiblnd_tunables;
>  				     IBLND_CREDIT_HIGHWATER_V1 : \
>  				     *kiblnd_tunables.kib_peercredits_hiw) /* when eagerly to return credits */
>  
> -#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, ps, qpt)
> +#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, \
> +							       ps, qpt, \
> +							       &init_net)
>  
>  static inline int
>  kiblnd_concurrent_sends_v1(void)
> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> index 1ed2088dc9f5..3953e9c8bc94 100644
> --- a/include/rdma/rdma_cm.h
> +++ b/include/rdma/rdma_cm.h
> @@ -163,10 +163,14 @@ struct rdma_cm_id {
>   * @context: User specified context associated with the id.
>   * @ps: RDMA port space.
>   * @qp_type: type of queue pair associated with the id.
> + * @net: The network namespace in which to create the new id.
> + *
> + * The id holds a reference on the network namespace until it is destroyed.
>   */
>  struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>  				  void *context, enum rdma_port_space ps,
> -				  enum ib_qp_type qp_type);
> +				  enum ib_qp_type qp_type,
> +				  struct net *net);
>  
>  /**
>    * rdma_destroy_id - Destroys an RDMA identifier.
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 14ad43b5cf89..577fd3129bcf 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -635,7 +635,7 @@ rdma_create_trans(struct p9_client *client, const char *addr, char *args)
>  
>  	/* Create the RDMA CM ID */
>  	rdma->cm_id = rdma_create_id(p9_cm_event_handler, client, RDMA_PS_TCP,
> -				     IB_QPT_RC);
> +				     IB_QPT_RC, &init_net);
>  	if (IS_ERR(rdma->cm_id))
>  		goto error;
>  
> diff --git a/net/rds/ib.c b/net/rds/ib.c
> index ba2dffeff608..cc137f523248 100644
> --- a/net/rds/ib.c
> +++ b/net/rds/ib.c
> @@ -326,7 +326,7 @@ static int rds_ib_laddr_check(__be32 addr)
>  	/* Create a CMA ID and try to bind it. This catches both
>  	 * IB and iWARP capable NICs.
>  	 */
> -	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
> +	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(cm_id))
>  		return PTR_ERR(cm_id);
>  
> diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
> index 31b74f5e61ad..d19b91296ddc 100644
> --- a/net/rds/ib_cm.c
> +++ b/net/rds/ib_cm.c
> @@ -584,7 +584,7 @@ int rds_ib_conn_connect(struct rds_connection *conn)
>  	/* XXX I wonder what affect the port space has */
>  	/* delegate cm event handler to rdma_transport */
>  	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
> -				     RDMA_PS_TCP, IB_QPT_RC);
> +				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(ic->i_cm_id)) {
>  		ret = PTR_ERR(ic->i_cm_id);
>  		ic->i_cm_id = NULL;
> diff --git a/net/rds/iw.c b/net/rds/iw.c
> index 589935661d66..8501b73ed12f 100644
> --- a/net/rds/iw.c
> +++ b/net/rds/iw.c
> @@ -227,7 +227,7 @@ static int rds_iw_laddr_check(__be32 addr)
>  	/* Create a CMA ID and try to bind it. This catches both
>  	 * IB and iWARP capable NICs.
>  	 */
> -	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
> +	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(cm_id))
>  		return PTR_ERR(cm_id);
>  
> diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
> index a91e1db62ee6..e5ee2d562a60 100644
> --- a/net/rds/iw_cm.c
> +++ b/net/rds/iw_cm.c
> @@ -521,7 +521,7 @@ int rds_iw_conn_connect(struct rds_connection *conn)
>  	/* XXX I wonder what affect the port space has */
>  	/* delegate cm event handler to rdma_transport */
>  	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
> -				     RDMA_PS_TCP, IB_QPT_RC);
> +				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
>  	if (IS_ERR(ic->i_cm_id)) {
>  		ret = PTR_ERR(ic->i_cm_id);
>  		ic->i_cm_id = NULL;
> diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
> index 6cd9d1deafc3..066b60b27b12 100644
> --- a/net/rds/rdma_transport.c
> +++ b/net/rds/rdma_transport.c
> @@ -160,7 +160,7 @@ static int rds_rdma_listen_init(void)
>  	int ret;
>  
>  	cm_id = rdma_create_id(rds_rdma_cm_event_handler, NULL, RDMA_PS_TCP,
> -			       IB_QPT_RC);
> +			       IB_QPT_RC, &init_net);
>  	if (IS_ERR(cm_id)) {
>  		ret = PTR_ERR(cm_id);
>  		printk(KERN_ERR "RDS/RDMA: failed to setup listener, "
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 4e618808bc98..e3b246e305f9 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -701,7 +701,7 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
>  	xprt = &cma_xprt->sc_xprt;
>  
>  	listen_id = rdma_create_id(rdma_listen_handler, cma_xprt, RDMA_PS_TCP,
> -				   IB_QPT_RC);
> +				   IB_QPT_RC, &init_net);
>  	if (IS_ERR(listen_id)) {
>  		ret = PTR_ERR(listen_id);
>  		dprintk("svcrdma: rdma_create_id failed = %d\n", ret);
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index c98e40643910..f574e77165f4 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -483,7 +483,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
>  
>  	init_completion(&ia->ri_done);
>  
> -	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC);
> +	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC,
> +			    &init_net);
>  	if (IS_ERR(id)) {
>  		rc = PTR_ERR(id);
>  		dprintk("RPC:       %s: rdma_create_id() failed %i\n",

Regards.

-- 
Yann Droneaud
OPTEYA


* RE: [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter
       [not found]       ` <1422793376.3030.37.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-02-01 13:46         ` Shachar Raindel
       [not found]           ` <AM3PR05MB0935B7B53439298A7429158BDC3F0-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 13:46 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: roland, sean.hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Haggai Eran, Yotam Kenneth



> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Sunday, February 01, 2015 2:23 PM
> To: Shachar Raindel
> Cc: roland@kernel.org; sean.hefty@intel.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Haggai Eran; Yotam
> Kenneth
> Subject: Re: [PATCH for-next 01/10] IB/addr: Pass network namespace as a
> parameter
> 
> Hi,
> 
> Le dimanche 01 février 2015 à 13:28 +0200, Shachar Raindel a écrit :
> > From: Guy Shapiro <guysh@mellanox.com>
> >
> > Add network namespace support to the ib_addr module. For that, all the address
> > resolution and matching should be done using the appropriate namespace instead
> > of init_net.
> >
> > This is achieved by:
> >
> > 1. Adding an explicit network namespace argument to exported function that
> >    require a namespace.
> > 2. Saving the namespace in the rdma_addr_client structure.
> > 3. Using it when calling networking functions.
> >
> > In order to preserve the behavior of calling modules, &init_net is
> > passed as the parameter in calls from other modules. This is modified as
> > namspace support is added on more levels.
> 
> typo: "namespace"
> 

Thanks. Will fix in next iteration.

> >
> > Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> > Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> > Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> >
> > ---
> >  drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
> >  drivers/infiniband/core/cma.c            |  4 ++-
> >  drivers/infiniband/core/verbs.c          | 14 +++++++---
> >  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
> >  include/rdma/ib_addr.h                   | 44 ++++++++++++++++++++++++++++----
> >  5 files changed, 72 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/addr.c
> b/drivers/infiniband/core/addr.c
> > index f80da50d84a5..95beaef6b66d 100644
> > --- a/drivers/infiniband/core/addr.c
> > +++ b/drivers/infiniband/core/addr.c
> > @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr,
> struct rdma_dev_addr *dev_addr,
> >  	int ret = -EADDRNOTAVAIL;
> >
> >  	if (dev_addr->bound_dev_if) {
> > -		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> > +		dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
> >  		if (!dev)
> >  			return -ENODEV;
> >  		ret = rdma_copy_addr(dev_addr, dev, NULL);
> > @@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr,
> struct rdma_dev_addr *dev_addr,
> >  	}
> >
> >  	switch (addr->sa_family) {
> > -	case AF_INET:
> > -		dev = ip_dev_find(&init_net,
> > -			((struct sockaddr_in *) addr)->sin_addr.s_addr);
> > +	case AF_INET: {
> > +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
> > +
> > +		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
> 
> I don't see the point of this change.
> 

Note that we changed &init_net to dev_addr->net.
The rest of the change is only there to keep checkpatch happy, as the line was getting too long.

> >
> >  		if (!dev)
> >  			return ret;
> > @@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr,
> struct rdma_dev_addr *dev_addr,
> >  			*vlan_id = rdma_vlan_dev_vlan_id(dev);
> >  		dev_put(dev);
> >  		break;
> > -
> > +	}
> 
> closing } here ?

We opened a block at the beginning of this case ("case AF_INET: {") to give the new addr_in local variable its own scope, and we close it here, at the end of the case.
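
To illustrate why the braces are there, here is a minimal userspace sketch (not kernel code; every model_* name is made up for this example). A case label has to be followed by a statement, and a declaration is not one in the C dialect the kernel is built with, so a local variable declared directly under a case needs its own block:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for AF_INET / AF_INET6; not the real constants. */
enum model_family { MODEL_AF_INET = 1, MODEL_AF_INET6 = 2 };

struct model_addr {
	enum model_family family;
	uint32_t v4;	/* meaningful only when family == MODEL_AF_INET */
};

/* Returns the IPv4 address for MODEL_AF_INET and 0 for anything else. */
uint32_t model_lookup(const struct model_addr *addr)
{
	uint32_t result = 0;

	switch (addr->family) {
	case MODEL_AF_INET: {
		/* The braces open a scope, so this declaration is legal. */
		const uint32_t v4 = addr->v4;

		result = v4;
		break;
	}
	case MODEL_AF_INET6:
		/* No local declarations here, so no braces are needed. */
		break;
	}
	return result;
}
```

The closing brace after "break;" in the patch plays the same role as the one closing the MODEL_AF_INET case above.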

> 
> >  #if IS_ENABLED(CONFIG_IPV6)
> >  	case AF_INET6:
> >  		rcu_read_lock();
> > -		for_each_netdev_rcu(&init_net, dev) {
> > -			if (ipv6_chk_addr(&init_net,
> > +		for_each_netdev_rcu(dev_addr->net, dev) {
> > +			if (ipv6_chk_addr(dev_addr->net,
> >  					  &((struct sockaddr_in6 *) addr)->sin6_addr,
> >  					  dev, 1)) {
> >  				ret = rdma_copy_addr(dev_addr, dev, NULL);
> > @@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in
> *src_in,
> >  	fl4.daddr = dst_ip;
> >  	fl4.saddr = src_ip;
> >  	fl4.flowi4_oif = addr->bound_dev_if;
> > -	rt = ip_route_output_key(&init_net, &fl4);
> > +	rt = ip_route_output_key(addr->net, &fl4);
> >  	if (IS_ERR(rt)) {
> >  		ret = PTR_ERR(rt);
> >  		goto out;
> > @@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6
> *src_in,
> >  	fl6.saddr = src_in->sin6_addr;
> >  	fl6.flowi6_oif = addr->bound_dev_if;
> >
> > -	dst = ip6_route_output(&init_net, NULL, &fl6);
> > +	dst = ip6_route_output(addr->net, NULL, &fl6);
> >  	if ((ret = dst->error))
> >  		goto put;
> >
> >  	if (ipv6_addr_any(&fl6.saddr)) {
> > -		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
> > +		ret = ipv6_dev_get_saddr(addr->net,
> > +					 ip6_dst_idev(dst)->dev,
> >  					 &fl6.daddr, 0, &fl6.saddr);
> >  		if (ret)
> >  			goto put;
> > @@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr
> *src_addr,
> >  }
> >
> >  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> *dgid, u8 *dmac,
> > -			       u16 *vlan_id)
> > +			       u16 *vlan_id, struct net *net)
> >  {
> >  	int ret = 0;
> >  	struct rdma_dev_addr dev_addr;
> > @@ -481,6 +483,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> union ib_gid *dgid, u8 *dmac,
> >  		return ret;
> >
> >  	memset(&dev_addr, 0, sizeof(dev_addr));
> > +	dev_addr.net = net;
> 
> Should be get_net() be used somewhere to grab a reference on the net
> namespace ?
> 

Not needed, as dev_addr.net is used only inside this function. As long as the caller guarantees that the network namespace does not disappear before the function returns, there is no need to take a reference here. This assumption is reasonable: without it, we could not use the argument at all.
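
As a rough illustration of the distinction, here is a userspace model (not kernel code; struct model_net and the model_* helpers are hypothetical stand-ins for struct net, get_net() and put_net()). A pointer that is only used during the call is borrowed from the caller, while a pointer stored in a longer-lived object takes its own reference, which is what patch 08/10 in this series does in rdma_create_id() and rdma_destroy_id():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted namespace; a stand-in for struct net. */
struct model_net {
	int refcount;
};

struct model_net *model_get_net(struct model_net *net)
{
	net->refcount++;
	return net;
}

void model_put_net(struct model_net *net)
{
	net->refcount--;
}

/* Borrowing: the pointer is used only for the duration of the call. */
int model_resolve(struct model_net *net)
{
	/* The caller holds its own reference, so the count is untouched. */
	return net->refcount;
}

/* Owning: the pointer outlives the call, so a reference is taken. */
struct model_id {
	struct model_net *net;
};

void model_id_init(struct model_id *id, struct model_net *net)
{
	id->net = model_get_net(net);
}

void model_id_destroy(struct model_id *id)
{
	model_put_net(id->net);
	id->net = NULL;
}
```

In this model, model_resolve() corresponds to the two rdma_addr_find_*() helpers, and model_id corresponds to an object that keeps the namespace pointer after the call returns.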

> >
> >  	ctx.addr = &dev_addr;
> >  	init_completion(&ctx.comp);
> > @@ -492,7 +495,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> union ib_gid *dgid, u8 *dmac,
> >  	wait_for_completion(&ctx.comp);
> >
> >  	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
> > -	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
> > +	dev = dev_get_by_index(net, dev_addr.bound_dev_if);
> >  	if (!dev)
> >  		return -ENODEV;
> >  	if (vlan_id)
> > @@ -502,7 +505,8 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> union ib_gid *dgid, u8 *dmac,
> >  }
> >  EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
> >
> > -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> *vlan_id)
> > +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> *vlan_id,
> > +				struct net *net)
> >  {
> >  	int ret = 0;
> >  	struct rdma_dev_addr dev_addr;
> > @@ -517,6 +521,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid
> *sgid, u8 *smac, u16 *vlan_id)
> >  	if (ret)
> >  		return ret;
> >  	memset(&dev_addr, 0, sizeof(dev_addr));
> > +	dev_addr.net = net;
> 
> get_net() ?
> 

Same as before - used only in the function, caller must make sure it doesn't disappear.

> >  	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
> >  	if (ret)
> >  		return ret;
> > diff --git a/drivers/infiniband/core/cma.c
> b/drivers/infiniband/core/cma.c
> > index 6e5e11ca7702..aeb2417ec928 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -512,6 +512,7 @@ struct rdma_cm_id
> *rdma_create_id(rdma_cm_event_handler event_handler,
> >  	INIT_LIST_HEAD(&id_priv->listen_list);
> >  	INIT_LIST_HEAD(&id_priv->mc_list);
> >  	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> > +	id_priv->id.route.addr.dev_addr.net = &init_net;
> >
> >  	return &id_priv->id;
> >  }
> > @@ -637,7 +638,8 @@ static int cma_modify_qp_rtr(struct
> rdma_id_private *id_priv,
> >  	    == RDMA_TRANSPORT_IB &&
> >  	    rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >id.port_num)
> >  	    == IB_LINK_LAYER_ETHERNET) {
> > -		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
> > +		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
> > +						  &init_net);
> >
> >  		if (ret)
> >  			goto out;
> > diff --git a/drivers/infiniband/core/verbs.c
> b/drivers/infiniband/core/verbs.c
> > index f93eb8da7b5a..ca5c4dd8a67a 100644
> > --- a/drivers/infiniband/core/verbs.c
> > +++ b/drivers/infiniband/core/verbs.c
> > @@ -212,7 +212,9 @@ int ib_init_ah_from_wc(struct ib_device *device,
> u8 port_num, struct ib_wc *wc,
> >  			ah_attr->vlan_id = wc->vlan_id;
> >  		} else {
> >  			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
> > -					ah_attr->dmac, &ah_attr->vlan_id);
> > +							 ah_attr->dmac,
> > +							 &ah_attr->vlan_id,
> > +							 &init_net);
> >  			if (ret)
> >  				return ret;
> >  		}
> > @@ -882,11 +884,15 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
> >  			if (!(*qp_attr_mask & IB_QP_VID))
> >  				qp_attr->vlan_id = rdma_get_vlan_id(&sgid);
> >  		} else {
> > -			ret = rdma_addr_find_dmac_by_grh(&sgid, &qp_attr-
> >ah_attr.grh.dgid,
> > -					qp_attr->ah_attr.dmac, &qp_attr->vlan_id);
> > +			ret = rdma_addr_find_dmac_by_grh(
> > +				&sgid,
> > +				&qp_attr->ah_attr.grh.dgid,
> > +				qp_attr->ah_attr.dmac, &qp_attr->vlan_id,
> > +				&init_net);
> >  			if (ret)
> >  				goto out;
> > -			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
> NULL);
> > +			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
> > +							  NULL, &init_net);
> >  			if (ret)
> >  				goto out;
> >  		}
> > diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > index f3cc8c9e65ae..debaac2b6ee8 100644
> > --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > @@ -119,7 +119,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd,
> struct ib_ah_attr *attr)
> >
> >  	if (pd->uctx) {
> >  		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
> > -                                        attr->dmac, &attr->vlan_id);
> > +						    attr->dmac, &attr->vlan_id,
> > +						    &init_net);
> >  		if (status) {
> >  			pr_err("%s(): Failed to resolve dmac from gid."
> >  				"status = %d\n", __func__, status);
> > diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
> > index ce55906b54a0..40ccf8b83755 100644
> > --- a/include/rdma/ib_addr.h
> > +++ b/include/rdma/ib_addr.h
> > @@ -47,6 +47,7 @@
> >  #include <rdma/ib_verbs.h>
> >  #include <rdma/ib_pack.h>
> >  #include <net/ipv6.h>
> > +#include <net/net_namespace.h>
> >
> >  struct rdma_addr_client {
> >  	atomic_t refcount;
> > @@ -64,6 +65,16 @@ void rdma_addr_register_client(struct
> rdma_addr_client *client);
> >   */
> >  void rdma_addr_unregister_client(struct rdma_addr_client *client);
> >
> > +/**
> > + * struct rdma_dev_addr - Contains resolved RDMA hardware addresses
> > + * @src_dev_addr:	Source MAC address.
> > + * @dst_dev_addr:	Destination MAC address.
> > + * @broadcast:		Broadcast address of the device.
> > + * @dev_type:		The interface hardware type of the device.
> > + * @bound_dev_if:	An optional device interface index.
> > + * @transport:		The transport type used.
> > + * @net:		Network namespace containing the bound_dev_if
> net_dev.
> > + */
> >  struct rdma_dev_addr {
> >  	unsigned char src_dev_addr[MAX_ADDR_LEN];
> >  	unsigned char dst_dev_addr[MAX_ADDR_LEN];
> > @@ -71,11 +82,14 @@ struct rdma_dev_addr {
> >  	unsigned short dev_type;
> >  	int bound_dev_if;
> >  	enum rdma_transport_type transport;
> > +	struct net *net;
> >  };
> >
> >  /**
> >   * rdma_translate_ip - Translate a local IP address to an RDMA
> hardware
> >   *   address.
> > + *
> > + * The dev_addr->net field must be initialized.
> >   */
> >  int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr
> *dev_addr,
> >  		      u16 *vlan_id);
> > @@ -90,7 +104,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct
> rdma_dev_addr *dev_addr,
> >   * @dst_addr: The destination address to resolve.
> >   * @addr: A reference to a data location that will receive the
> resolved
> >   *   addresses.  The data location must remain valid until the
> callback has
> > - *   been invoked.
> > + *   been invoked. The net field of the addr struct must be valid.
> >   * @timeout_ms: Amount of time to wait for the address resolution to
> complete.
> >   * @callback: Call invoked once address resolution has completed,
> timed out,
> >   *   or been canceled.  A status of 0 indicates success.
> > @@ -110,9 +124,29 @@ int rdma_copy_addr(struct rdma_dev_addr
> *dev_addr, struct net_device *dev,
> >
> >  int rdma_addr_size(struct sockaddr *addr);
> >
> > -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> *vlan_id);
> > -int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> *dgid, u8 *smac,
> > -			       u16 *vlan_id);
> > +/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for a src GID
> > + * @sgid:	Source GID to find the MAC and VLAN for.
> > + * @smac:	A buffer to contain the resulting MAC address.
> > + * @vlan_id:	Will contain the resulting VLAN ID.
> > + * @net:	Network namespace to use for the address resolution.
> > + *
> > + * It is the caller's responsibility to keep the network namespace alive until
> > + * the function returns.
> 
> Why ?
> 

So that we can use the argument. Otherwise, we would need ugly code like:
------------------------
struct net *local_net = NULL;
rcu_read_lock();
for_each_net_rcu(local_net)
	if (local_net == net)
		break;
if (local_net == net)
	get_net(local_net);
else
	local_net = NULL;
rcu_read_unlock();
------------------------
However, the callers (in the following patches) can easily ensure that the network namespace stays alive. This is much easier to understand and maintain.


> > + */
> > +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> *vlan_id,
> > +				struct net *net);
> > +/** rdma_addr_find_dmac_by_grh() - Find the dst MAC and VLAN ID for a GID pair
> > + * @sgid:	Source GID to use for the search.
> > + * @dgid:	Destination GID to find the details for.
> > + * @dmac:	Contains the resulting destination MAC address.
> > + * @vlan_id:	Contains the resulting VLAN ID.
> > + * @net:	Network namespace to use for the address resolution.
> > + *
> > + * It is the caller's responsibility to keep the network namespace alive until
> > + * the function returns.
> 
> Why ?
> 

See above.

> > + */
> > +int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> *dgid, u8 *dmac,
> > +			       u16 *vlan_id, struct net *net);
> >
> >  static inline u16 ib_addr_get_pkey(struct rdma_dev_addr *dev_addr)
> >  {
> > @@ -182,7 +216,7 @@ static inline void iboe_addr_get_sgid(struct
> rdma_dev_addr *dev_addr,
> >  	struct net_device *dev;
> >  	struct in_device *ip4;
> >
> > -	dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> > +	dev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if);
> >  	if (dev) {
> >  		ip4 = (struct in_device *)dev->ip_ptr;
> >  		if (ip4 && ip4->ifa_list && ip4->ifa_list->ifa_address)
> 
> 
> I believe this patch lack proper reference counting in form of
> get_net() / put_net(), but cannot say for sure.
> 

If you could point to specific issues or race conditions, that would be great.
We have thoroughly tested and reviewed the code, and couldn't find any such issues with the submitted patches.

--Shachar




* RE: [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions
  2015-02-01 12:26       ` Yann Droneaud
@ 2015-02-01 14:10         ` Shachar Raindel
  0 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 14:10 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: roland, sean.hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Haggai Eran, Yotam Kenneth

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Sunday, February 01, 2015 2:27 PM
> To: Shachar Raindel
> Cc: roland@kernel.org; sean.hefty@intel.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Haggai Eran; Yotam
> Kenneth
> Subject: Re: [PATCH for-next 02/10] IB/core: Pass network namespace as a
> parameter to relevant functions
> 
> Hi,
> 
> Le dimanche 01 février 2015 à 13:28 +0200, Shachar Raindel a écrit :
> > From: Guy Shapiro <guysh@mellanox.com>
> >
> > Add network namespace parameters for the address related ib_core
> > functions. The parameter is passed to lower level function, instead of
> > &init_net, so things are done in the correct namespace.
> >
> > For now pass &init_net on every caller.
> > Callers that will pass &init_net permanently are marked with an
> > appropriate comment.
> >
> > Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> > Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> > Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> >
> > ---
> >  drivers/infiniband/core/agent.c       |  4 +++-
> >  drivers/infiniband/core/cm.c          |  9 +++++++--
> >  drivers/infiniband/core/mad_rmpp.c    | 10 ++++++++--
> >  drivers/infiniband/core/user_mad.c    |  4 +++-
> >  drivers/infiniband/core/verbs.c       | 10 ++++++----
> >  drivers/infiniband/ulp/srpt/ib_srpt.c |  3 ++-
> >  include/rdma/ib_verbs.h               | 15 +++++++++++++--
> >  7 files changed, 42 insertions(+), 13 deletions(-)
> 
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 0d74f1de99aa..dd4c80cea8d3 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -48,6 +48,7 @@
> >  #include <linux/rwsem.h>
> >  #include <linux/scatterlist.h>
> >  #include <linux/workqueue.h>
> > +#include <net/net_namespace.h>
> >  #include <uapi/linux/if_ether.h>
> >
> >  #include <linux/atomic.h>
> > @@ -1801,9 +1802,14 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd,
> struct ib_ah_attr *ah_attr);
> >   *   ignored unless the work completion indicates that the GRH is
> valid.
> >   * @ah_attr: Returned attributes that can be used when creating an
> address
> >   *   handle for replying to the message.
> > + * @net: The network namespace to use for address resolution.
> > + *
> > + * It is the caller's responsibility to make sure the network namespace is
> > + * alive until the function returns.
> 
> Why ?
> 

For the same reason we described in the previous patch. It is nearly impossible to write a function whose parameters are not guaranteed to stay alive for the duration of the call.

> >   */
> >  int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct
> ib_wc *wc,
> > -		       struct ib_grh *grh, struct ib_ah_attr *ah_attr);
> > +		       struct ib_grh *grh, struct ib_ah_attr *ah_attr,
> > +		       struct net *net);
> >
> >  /**
> >   * ib_create_ah_from_wc - Creates an address handle associated with
> the
> > @@ -1813,12 +1819,17 @@ int ib_init_ah_from_wc(struct ib_device
> *device, u8 port_num, struct ib_wc *wc,
> >   * @grh: References the received global route header.  This parameter
> is
> >   *   ignored unless the work completion indicates that the GRH is
> valid.
> >   * @port_num: The outbound port number to associate with the address.
> > + * @net: The network namespace to use for address resolution.
> >   *
> >   * The address handle is used to reference a local or global
> destination
> >   * in all UD QP post sends.
> > + *
> > + * It is the caller's responsibility to make sure the network namespace is
> > + * alive until the function returns.
> 
> Why ?
> 

For the same reason we described in the previous patch. It is nearly impossible to write a function whose parameters are not guaranteed to stay alive for the duration of the call.



Thanks,
--Shachar



* RE: [PATCH for-next 08/10] IB/cma: Add support for network namespaces
       [not found]       ` <1422798272.3030.48.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-02-01 14:16         ` Shachar Raindel
  0 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 14:16 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: roland, sean.hefty, linux-rdma, netdev, Liran Liss, Guy Shapiro,
	Haggai Eran, Yotam Kenneth




> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Sunday, February 01, 2015 3:45 PM
> To: Shachar Raindel
> Cc: roland@kernel.org; sean.hefty@intel.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Haggai Eran; Yotam
> Kenneth
> Subject: Re: [PATCH for-next 08/10] IB/cma: Add support for network
> namespaces
> 
> Hi,
> 
> Le dimanche 01 février 2015 à 13:28 +0200, Shachar Raindel a écrit :
> > From: Guy Shapiro <guysh@mellanox.com>
> >
> > Add support for network namespaces in the ib_cma module. This is
> > accomplished by:
> >
> > 1. Adding network namespace parameter for rdma_create_id. This parameter is used
> >    to populate the network namespace field in rdma_id_private. rdma_create_id
> >    keeps a reference on the network namespace.
> > 2. Using the network namespace from the rdma_id instead of init_net inside of
> >    ib_cma.
> > 3. Decrementing the reference count for the appropriate network namespace when
> >    calling rdma_destroy_id.
> >
> > In order to preserve the current behavior init_net is passed when calling from
> > other modules.
> >
> > Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> > Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> > Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> >
> > ---
> >  drivers/infiniband/core/cma.c                      | 52 +++++++++++++---------
> >  drivers/infiniband/core/ucma.c                     |  3 +-
> >  drivers/infiniband/ulp/iser/iser_verbs.c           |  2 +-
> >  drivers/infiniband/ulp/isert/ib_isert.c            |  2 +-
> >  .../staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h    |  4 +-
> >  include/rdma/rdma_cm.h                             |  6 ++-
> >  net/9p/trans_rdma.c                                |  2 +-
> >  net/rds/ib.c                                       |  2 +-
> >  net/rds/ib_cm.c                                    |  2 +-
> >  net/rds/iw.c                                       |  2 +-
> >  net/rds/iw_cm.c                                    |  2 +-
> >  net/rds/rdma_transport.c                           |  2 +-
> >  net/sunrpc/xprtrdma/svc_rdma_transport.c           |  2 +-
> >  net/sunrpc/xprtrdma/verbs.c                        |  3 +-
> >  14 files changed, 52 insertions(+), 34 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/cma.c
> b/drivers/infiniband/core/cma.c
> > index 022b0d0a51cc..f6379b38b366 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -540,7 +540,8 @@ static int cma_disable_callback(struct
> rdma_id_private *id_priv,
> >
> >  struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler
> event_handler,
> >  				  void *context, enum rdma_port_space ps,
> > -				  enum ib_qp_type qp_type)
> > +				  enum ib_qp_type qp_type,
> > +				  struct net *net)
> >  {
> >  	struct rdma_id_private *id_priv;
> >
> > @@ -562,7 +563,7 @@ struct rdma_cm_id
> *rdma_create_id(rdma_cm_event_handler event_handler,
> >  	INIT_LIST_HEAD(&id_priv->listen_list);
> >  	INIT_LIST_HEAD(&id_priv->mc_list);
> >  	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> > -	id_priv->id.route.addr.dev_addr.net = &init_net;
> > +	id_priv->id.route.addr.dev_addr.net = get_net(net);
> >
> >  	return &id_priv->id;
> >  }
> > @@ -689,7 +690,7 @@ static int cma_modify_qp_rtr(struct
> rdma_id_private *id_priv,
> >  	    rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >id.port_num)
> >  	    == IB_LINK_LAYER_ETHERNET) {
> >  		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
> > -						  &init_net);
> > +				id_priv->id.route.addr.dev_addr.net);
> >
> >  		if (ret)
> >  			goto out;
> > @@ -953,6 +954,7 @@ static void cma_cancel_operation(struct
> rdma_id_private *id_priv,
> >  static void cma_release_port(struct rdma_id_private *id_priv)
> >  {
> >  	struct rdma_bind_list *bind_list = id_priv->bind_list;
> > +	struct net *net = id_priv->id.route.addr.dev_addr.net;
> >
> >  	if (!bind_list)
> >  		return;
> > @@ -960,7 +962,7 @@ static void cma_release_port(struct
> rdma_id_private *id_priv)
> >  	mutex_lock(&lock);
> >  	hlist_del(&id_priv->node);
> >  	if (hlist_empty(&bind_list->owners)) {
> > -		cma_ps_remove(bind_list->ps, &init_net, bind_list->port);
> > +		cma_ps_remove(bind_list->ps, net, bind_list->port);
> >  		kfree(bind_list);
> >  	}
> >  	mutex_unlock(&lock);
> > @@ -1029,6 +1031,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
> >  		cma_deref_id(id_priv->id.context);
> >
> >  	kfree(id_priv->id.route.path_rec);
> > +	put_net(id_priv->id.route.addr.dev_addr.net);
> >  	kfree(id_priv);
> >  }
> >  EXPORT_SYMBOL(rdma_destroy_id);
> > @@ -1156,7 +1159,8 @@ static struct rdma_id_private
> *cma_new_conn_id(struct rdma_cm_id *listen_id,
> >  	int ret;
> >
> >  	id = rdma_create_id(listen_id->event_handler, listen_id->context,
> > -			    listen_id->ps, ib_event->param.req_rcvd.qp_type);
> > +			    listen_id->ps, ib_event->param.req_rcvd.qp_type,
> > +			    listen_id->route.addr.dev_addr.net);
> >  	if (IS_ERR(id))
> >  		return NULL;
> >
> > @@ -1201,10 +1205,11 @@ static struct rdma_id_private
> *cma_new_udp_id(struct rdma_cm_id *listen_id,
> >  {
> >  	struct rdma_id_private *id_priv;
> >  	struct rdma_cm_id *id;
> > +	struct net *net = listen_id->route.addr.dev_addr.net;
> >  	int ret;
> >
> >  	id = rdma_create_id(listen_id->event_handler, listen_id->context,
> > -			    listen_id->ps, IB_QPT_UD);
> > +			    listen_id->ps, IB_QPT_UD, net);
> >  	if (IS_ERR(id))
> >  		return NULL;
> >
> > @@ -1455,7 +1460,8 @@ static int iw_conn_req_handler(struct iw_cm_id
> *cm_id,
> >  	/* Create a new RDMA id for the new IW CM ID */
> >  	new_cm_id = rdma_create_id(listen_id->id.event_handler,
> >  				   listen_id->id.context,
> > -				   RDMA_PS_TCP, IB_QPT_RC);
> > +				   RDMA_PS_TCP, IB_QPT_RC,
> > +				   listen_id->id.route.addr.dev_addr.net);
> >  	if (IS_ERR(new_cm_id)) {
> >  		ret = -ENOMEM;
> >  		goto out;
> > @@ -1528,11 +1534,11 @@ static int cma_ib_listen(struct
> rdma_id_private *id_priv)
> >  	struct ib_cm_compare_data compare_data;
> >  	struct sockaddr *addr;
> >  	struct ib_cm_id	*id;
> > +	struct net *net = id_priv->id.route.addr.dev_addr.net;
> >  	__be64 svc_id;
> >  	int ret;
> >
> > -	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
> > -			     &init_net);
> > +	id = ib_create_cm_id(id_priv->id.device, cma_req_handler, id_priv,
> net);
> >  	if (IS_ERR(id))
> >  		return PTR_ERR(id);
> >
> > @@ -1596,6 +1602,7 @@ static void cma_listen_on_dev(struct
> rdma_id_private *id_priv,
> >  {
> >  	struct rdma_id_private *dev_id_priv;
> >  	struct rdma_cm_id *id;
> > +	struct net *net = id_priv->id.route.addr.dev_addr.net;
> >  	int ret;
> >
> >  	if (cma_family(id_priv) == AF_IB &&
> > @@ -1603,7 +1610,7 @@ static void cma_listen_on_dev(struct
> rdma_id_private *id_priv,
> >  		return;
> >
> >  	id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
> > -			    id_priv->id.qp_type);
> > +			    id_priv->id.qp_type, net);
> >  	if (IS_ERR(id))
> >  		return;
> >
> > @@ -2283,7 +2290,8 @@ static int cma_alloc_port(struct radix_tree_root
> *ps,
> >  	if (!bind_list)
> >  		return -ENOMEM;
> >
> > -	ret = cma_ps_alloc(ps, &init_net, bind_list, snum);
> > +	ret = cma_ps_alloc(ps, id_priv->id.route.addr.dev_addr.net,
> bind_list,
> > +			   snum);
> >  	if (ret < 0)
> >  		goto err;
> >
> > @@ -2302,13 +2310,14 @@ static int cma_alloc_any_port(struct
> radix_tree_root *ps,
> >  	static unsigned int last_used_port;
> >  	int low, high, remaining;
> >  	unsigned int rover;
> > +	struct net *net = id_priv->id.route.addr.dev_addr.net;
> >
> > -	inet_get_local_port_range(&init_net, &low, &high);
> > +	inet_get_local_port_range(net, &low, &high);
> >  	remaining = (high - low) + 1;
> >  	rover = prandom_u32() % remaining + low;
> >  retry:
> >  	if (last_used_port != rover &&
> > -	    !cma_ps_find(ps, &init_net, (unsigned short)rover)) {
> > +	    !cma_ps_find(ps, net, (unsigned short)rover)) {
> >  		int ret = cma_alloc_port(ps, id_priv, rover);
> >  		/*
> >  		 * Remember previously used port number in order to avoid
> > @@ -2376,7 +2385,7 @@ static int cma_use_port(struct radix_tree_root
> *ps,
> >  	if (snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
> >  		return -EACCES;
> >
> > -	bind_list = cma_ps_find(ps, &init_net, snum);
> > +	bind_list = cma_ps_find(ps, id_priv->id.route.addr.dev_addr.net,
> snum);
> >  	if (!bind_list) {
> >  		ret = cma_alloc_port(ps, id_priv, snum);
> >  	} else {
> > @@ -2573,8 +2582,11 @@ int rdma_bind_addr(struct rdma_cm_id *id,
> struct sockaddr *addr)
> >  		if (addr->sa_family == AF_INET)
> >  			id_priv->afonly = 1;
> >  #if IS_ENABLED(CONFIG_IPV6)
> > -		else if (addr->sa_family == AF_INET6)
> > -			id_priv->afonly = init_net.ipv6.sysctl.bindv6only;
> > +		else if (addr->sa_family == AF_INET6) {
> > +			struct net *net = id_priv->id.route.addr.dev_addr.net;
> > +
> > +			id_priv->afonly = net->ipv6.sysctl.bindv6only;
> > +		}
> >  #endif
> >  	}
> >  	ret = cma_get_port(id_priv);
> > @@ -2687,7 +2699,7 @@ static int cma_resolve_ib_udp(struct
> rdma_id_private *id_priv,
> >  	}
> >
> >  	id = ib_create_cm_id(id_priv->id.device, cma_sidr_rep_handler,
> > -			     id_priv, &init_net);
> > +			     id_priv, id_priv->id.route.addr.dev_addr.net);
> >  	if (IS_ERR(id)) {
> >  		ret = PTR_ERR(id);
> >  		goto out;
> > @@ -2737,7 +2749,7 @@ static int cma_connect_ib(struct rdma_id_private
> *id_priv,
> >  		       conn_param->private_data_len);
> >
> >  	id = ib_create_cm_id(id_priv->id.device, cma_ib_handler, id_priv,
> > -			     &init_net);
> > +			     id_priv->id.route.addr.dev_addr.net);
> >  	if (IS_ERR(id)) {
> >  		ret = PTR_ERR(id);
> >  		goto out;
> > @@ -3387,6 +3399,7 @@ static int cma_netdev_change(struct net_device
> *ndev, struct rdma_id_private *id
> >  	dev_addr = &id_priv->id.route.addr.dev_addr;
> >
> >  	if ((dev_addr->bound_dev_if == ndev->ifindex) &&
> > +	    (dev_net(ndev) == dev_addr->net) &&
> 
> net_eq() ?

The original code (below) used the same comparison style.
Will switch to net_eq() in the next iteration.
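
For reference, a userspace sketch of what the net_eq() form of the check would look like (not kernel code; struct model_net, MODEL_NET_NS and the model_* functions are made up for this example). It is modeled on the kernel helper, which compares the two pointers when namespace support is compiled in and is constant-true otherwise:

```c
#include <assert.h>

/* Hypothetical stand-in for struct net. */
struct model_net {
	int id;
};

/* Made-up switch standing in for CONFIG_NET_NS in this sketch. */
#define MODEL_NET_NS 1

/* Pointer comparison with namespaces enabled, always true without them. */
int model_net_eq(const struct model_net *a, const struct model_net *b)
{
#if MODEL_NET_NS
	return a == b;
#else
	(void)a;
	(void)b;
	return 1;
#endif
}

/* Shape of the cma_netdev_change() condition, reduced to two fields. */
int model_addr_matches(const struct model_net *dev_net, int dev_ifindex,
		       const struct model_net *id_net, int bound_dev_if)
{
	return bound_dev_if == dev_ifindex && model_net_eq(dev_net, id_net);
}
```

The interface index alone is not enough once several namespaces are in play, since the same index can appear in more than one of them; the namespace comparison disambiguates that.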

> 
> >  	    memcmp(dev_addr->src_dev_addr, ndev->dev_addr, ndev->addr_len))
> {
> >  		printk(KERN_INFO "RDMA CM addr change for ndev %s used by id
> %p\n",
> >  		       ndev->name, &id_priv->id);
> > @@ -3412,9 +3425,6 @@ static int cma_netdev_callback(struct
> notifier_block *self, unsigned long event,
> >  	struct rdma_id_private *id_priv;
> >  	int ret = NOTIFY_DONE;
> >
> > -	if (dev_net(ndev) != &init_net)
> > -		return NOTIFY_DONE;
> > -
> >  	if (event != NETDEV_BONDING_FAILOVER)
> >  		return NOTIFY_DONE;
> >
> > diff --git a/drivers/infiniband/core/ucma.c
> b/drivers/infiniband/core/ucma.c
> > index 56a4b7ca7ee3..de755f2c6166 100644
> > --- a/drivers/infiniband/core/ucma.c
> > +++ b/drivers/infiniband/core/ucma.c
> > @@ -391,7 +391,8 @@ static ssize_t ucma_create_id(struct ucma_file
> *file, const char __user *inbuf,
> >  		return -ENOMEM;
> >
> >  	ctx->uid = cmd.uid;
> > -	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps,
> qp_type);
> > +	ctx->cm_id = rdma_create_id(ucma_event_handler, ctx, cmd.ps,
> qp_type,
> > +				    &init_net);
> >  	if (IS_ERR(ctx->cm_id)) {
> >  		ret = PTR_ERR(ctx->cm_id);
> >  		goto err1;
> > diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c
> b/drivers/infiniband/ulp/iser/iser_verbs.c
> > index 695a2704bd43..d4e9c639ad2f 100644
> > --- a/drivers/infiniband/ulp/iser/iser_verbs.c
> > +++ b/drivers/infiniband/ulp/iser/iser_verbs.c
> > @@ -949,7 +949,7 @@ int iser_connect(struct iser_conn   *iser_conn,
> >
> >  	ib_conn->cma_id = rdma_create_id(iser_cma_handler,
> >  					 (void *)iser_conn,
> > -					 RDMA_PS_TCP, IB_QPT_RC);
> > +					 RDMA_PS_TCP, IB_QPT_RC, &init_net);
> >  	if (IS_ERR(ib_conn->cma_id)) {
> >  		err = PTR_ERR(ib_conn->cma_id);
> >  		iser_err("rdma_create_id failed: %d\n", err);
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> b/drivers/infiniband/ulp/isert/ib_isert.c
> > index dafb3c531f96..44a6fff8dc79 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -2960,7 +2960,7 @@ isert_setup_id(struct isert_np *isert_np)
> >  	isert_dbg("ksockaddr: %p, sa: %p\n", &np->np_sockaddr, sa);
> >
> >  	id = rdma_create_id(isert_cma_handler, isert_np,
> > -			    RDMA_PS_TCP, IB_QPT_RC);
> > +			    RDMA_PS_TCP, IB_QPT_RC, &init_net);
> >  	if (IS_ERR(id)) {
> >  		isert_err("rdma_create_id() failed: %ld\n", PTR_ERR(id));
> >  		ret = PTR_ERR(id);
> > diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> > index b02b4ec1e29d..128de4eb0959 100644
> > --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> > +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h
> > @@ -125,7 +125,9 @@ extern kib_tunables_t  kiblnd_tunables;
> >  				     IBLND_CREDIT_HIGHWATER_V1 : \
> >  				     *kiblnd_tunables.kib_peercredits_hiw) /* when
> eagerly to return credits */
> >
> > -#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb,
> dev, ps, qpt)
> > +#define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb,
> dev, \
> > +							       ps, qpt, \
> > +							       &init_net)
> >
> >  static inline int
> >  kiblnd_concurrent_sends_v1(void)
> > diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> > index 1ed2088dc9f5..3953e9c8bc94 100644
> > --- a/include/rdma/rdma_cm.h
> > +++ b/include/rdma/rdma_cm.h
> > @@ -163,10 +163,14 @@ struct rdma_cm_id {
> >   * @context: User specified context associated with the id.
> >   * @ps: RDMA port space.
> >   * @qp_type: type of queue pair associated with the id.
> > + * @net: The network namespace in which to create the new id.
> > + *
> > + * The id holds a reference on the network namespace until it is
> destroyed.
> >   */
> >  struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler
> event_handler,
> >  				  void *context, enum rdma_port_space ps,
> > -				  enum ib_qp_type qp_type);
> > +				  enum ib_qp_type qp_type,
> > +				  struct net *net);
> >
> >  /**
> >    * rdma_destroy_id - Destroys an RDMA identifier.
> > diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> > index 14ad43b5cf89..577fd3129bcf 100644
> > --- a/net/9p/trans_rdma.c
> > +++ b/net/9p/trans_rdma.c
> > @@ -635,7 +635,7 @@ rdma_create_trans(struct p9_client *client, const
> char *addr, char *args)
> >
> >  	/* Create the RDMA CM ID */
> >  	rdma->cm_id = rdma_create_id(p9_cm_event_handler, client,
> RDMA_PS_TCP,
> > -				     IB_QPT_RC);
> > +				     IB_QPT_RC, &init_net);
> >  	if (IS_ERR(rdma->cm_id))
> >  		goto error;
> >
> > diff --git a/net/rds/ib.c b/net/rds/ib.c
> > index ba2dffeff608..cc137f523248 100644
> > --- a/net/rds/ib.c
> > +++ b/net/rds/ib.c
> > @@ -326,7 +326,7 @@ static int rds_ib_laddr_check(__be32 addr)
> >  	/* Create a CMA ID and try to bind it. This catches both
> >  	 * IB and iWARP capable NICs.
> >  	 */
> > -	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
> > +	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC,
> &init_net);
> >  	if (IS_ERR(cm_id))
> >  		return PTR_ERR(cm_id);
> >
> > diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
> > index 31b74f5e61ad..d19b91296ddc 100644
> > --- a/net/rds/ib_cm.c
> > +++ b/net/rds/ib_cm.c
> > @@ -584,7 +584,7 @@ int rds_ib_conn_connect(struct rds_connection
> *conn)
> >  	/* XXX I wonder what affect the port space has */
> >  	/* delegate cm event handler to rdma_transport */
> >  	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
> > -				     RDMA_PS_TCP, IB_QPT_RC);
> > +				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
> >  	if (IS_ERR(ic->i_cm_id)) {
> >  		ret = PTR_ERR(ic->i_cm_id);
> >  		ic->i_cm_id = NULL;
> > diff --git a/net/rds/iw.c b/net/rds/iw.c
> > index 589935661d66..8501b73ed12f 100644
> > --- a/net/rds/iw.c
> > +++ b/net/rds/iw.c
> > @@ -227,7 +227,7 @@ static int rds_iw_laddr_check(__be32 addr)
> >  	/* Create a CMA ID and try to bind it. This catches both
> >  	 * IB and iWARP capable NICs.
> >  	 */
> > -	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
> > +	cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC,
> &init_net);
> >  	if (IS_ERR(cm_id))
> >  		return PTR_ERR(cm_id);
> >
> > diff --git a/net/rds/iw_cm.c b/net/rds/iw_cm.c
> > index a91e1db62ee6..e5ee2d562a60 100644
> > --- a/net/rds/iw_cm.c
> > +++ b/net/rds/iw_cm.c
> > @@ -521,7 +521,7 @@ int rds_iw_conn_connect(struct rds_connection
> *conn)
> >  	/* XXX I wonder what affect the port space has */
> >  	/* delegate cm event handler to rdma_transport */
> >  	ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
> > -				     RDMA_PS_TCP, IB_QPT_RC);
> > +				     RDMA_PS_TCP, IB_QPT_RC, &init_net);
> >  	if (IS_ERR(ic->i_cm_id)) {
> >  		ret = PTR_ERR(ic->i_cm_id);
> >  		ic->i_cm_id = NULL;
> > diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
> > index 6cd9d1deafc3..066b60b27b12 100644
> > --- a/net/rds/rdma_transport.c
> > +++ b/net/rds/rdma_transport.c
> > @@ -160,7 +160,7 @@ static int rds_rdma_listen_init(void)
> >  	int ret;
> >
> >  	cm_id = rdma_create_id(rds_rdma_cm_event_handler, NULL,
> RDMA_PS_TCP,
> > -			       IB_QPT_RC);
> > +			       IB_QPT_RC, &init_net);
> >  	if (IS_ERR(cm_id)) {
> >  		ret = PTR_ERR(cm_id);
> >  		printk(KERN_ERR "RDS/RDMA: failed to setup listener, "
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index 4e618808bc98..e3b246e305f9 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -701,7 +701,7 @@ static struct svc_xprt *svc_rdma_create(struct
> svc_serv *serv,
> >  	xprt = &cma_xprt->sc_xprt;
> >
> >  	listen_id = rdma_create_id(rdma_listen_handler, cma_xprt,
> RDMA_PS_TCP,
> > -				   IB_QPT_RC);
> > +				   IB_QPT_RC, &init_net);
> >  	if (IS_ERR(listen_id)) {
> >  		ret = PTR_ERR(listen_id);
> >  		dprintk("svcrdma: rdma_create_id failed = %d\n", ret);
> > diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> > index c98e40643910..f574e77165f4 100644
> > --- a/net/sunrpc/xprtrdma/verbs.c
> > +++ b/net/sunrpc/xprtrdma/verbs.c
> > @@ -483,7 +483,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
> >
> >  	init_completion(&ia->ri_done);
> >
> > -	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP,
> IB_QPT_RC);
> > +	id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP,
> IB_QPT_RC,
> > +			    &init_net);
> >  	if (IS_ERR(id)) {
> >  		rc = PTR_ERR(id);
> >  		dprintk("RPC:       %s: rdma_create_id() failed %i\n",
> 

Thanks,
--Shachar


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm
       [not found]       ` <1422795359.3030.43.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
@ 2015-02-01 14:29         ` Shachar Raindel
  0 siblings, 0 replies; 20+ messages in thread
From: Shachar Raindel @ 2015-02-01 14:29 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Haggai Eran, Yotam Kenneth



> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Sunday, February 01, 2015 2:56 PM
> To: Shachar Raindel
> Cc: roland@kernel.org; sean.hefty@intel.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; Liran Liss; Guy Shapiro; Haggai Eran; Yotam
> Kenneth
> Subject: Re: [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-
> data parsing code from ib_cma to ib_cm
> 
> Le dimanche 01 février 2015 à 13:28 +0200, Shachar Raindel a écrit :
> > From: Guy Shapiro <guysh@mellanox.com>
> >
> > When receiving a connection request, ib_cm needs to associate the
> request with
> > a network namespace. To do this, it needs to know the request's
> destination
> > IP. For this the RDMA IP CM packet formatting functionality needs to
> be
> > exposed to ib_cm.
> >
> > This patch merely moves the RDMA IP CM data formatting and parsing
> functions
> > to be part of ib_cm. The following patch will utilize the new
> knowledge to
> > look-up the appropriate namespace. Each namespace maintains an
> independent
> > table of RDMA CM service IDs, allowing isolation and separation
> between the
> > network namespaces.
> >
> > When creating a new incoming connection ID, the code in
> cm_save_ip_info can no
> > longer rely on the listener's private data to find the port number, so
> it
> > reads it from the requested service ID. This required saving the
> service ID in
> > cm_format_paths_from_req.
> >
> > Signed-off-by: Guy Shapiro <guysh@mellanox.com>
> > Signed-off-by: Haggai Eran <haggaie@mellanox.com>
> > Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
> > Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> >
> > ---
> >  drivers/infiniband/core/cm.c  | 167
> ++++++++++++++++++++++++++++++++++++++++++
> >  drivers/infiniband/core/cma.c | 166 +++++----------------------------
> --------
> >  include/rdma/ib_cm.h          |  46 ++++++++++++
> >  3 files changed, 231 insertions(+), 148 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/cm.c
> b/drivers/infiniband/core/cm.c
> > index 5a45cb76c43e..5cc1a4aa9728 100644
> > --- a/drivers/infiniband/core/cm.c
> > +++ b/drivers/infiniband/core/cm.c
> > @@ -51,6 +51,7 @@
> >
> >  #include <rdma/ib_cache.h>
> >  #include <rdma/ib_cm.h>
> > +#include <rdma/ib.h>
> >  #include "cm_msgs.h"
> >
> >  MODULE_AUTHOR("Sean Hefty");
> > @@ -701,6 +702,170 @@ static void cm_reject_sidr_req(struct
> cm_id_private *cm_id_priv,
> >  	ib_send_cm_sidr_rep(&cm_id_priv->id, &param);
> >  }
> >
> > +static inline u8 cm_get_ip_ver(struct cm_hdr *hdr)
> > +{
> > +	return hdr->ip_version >> 4;
> > +}
> > +
> > +void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver)
> > +{
> > +	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
> > +}
> > +EXPORT_SYMBOL(cm_set_ip_ver);
> > +
> 
> That can be defined as an inline function in header.
> 

While you are technically correct (the function is fewer than 3 lines, and
therefore you can inline it), the performance gain from inlining it is
negligible. At the same time, having a nice block where all of the RDMA IP
CM protocol serialization lives makes the code much easier to maintain.
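
For reference, a header-only variant along the lines suggested would look
roughly like the sketch below. It is a userspace approximation, not the
patch itself: struct demo_hdr and the demo_* helpers are hypothetical
names that mirror only the first two bytes of struct cm_hdr.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the leading fields of struct cm_hdr. */
struct demo_hdr {
	uint8_t cm_version;
	uint8_t ip_version;	/* IP version lives in bits 7:4 */
};

/* Extract the IP version from the high nibble. */
static inline uint8_t demo_get_ip_ver(const struct demo_hdr *hdr)
{
	return hdr->ip_version >> 4;
}

/* Store the IP version in the high nibble, preserving the low nibble. */
static inline void demo_set_ip_ver(struct demo_hdr *hdr, uint8_t ip_ver)
{
	assert(ip_ver <= 0xF);	/* the field is only 4 bits wide */
	hdr->ip_version = (uint8_t)((ip_ver << 4) | (hdr->ip_version & 0xF));
}
```

Passing 0xF as the version, as cma_set_compare_data() does for the mask,
simply sets all four version bits.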

> > +int cm_format_hdr(void *hdr, int family,
> > +		  struct sockaddr *src_addr,
> > +		  struct sockaddr *dst_addr)
> > +{
> > +	struct cm_hdr *cm_hdr;
> > +
> > +	cm_hdr = hdr;
> > +	cm_hdr->cm_version = RDMA_IP_CM_VERSION;
> > +	if (family == AF_INET) {
> > +		struct sockaddr_in *src4, *dst4;
> > +
> > +		src4 = (struct sockaddr_in *)src_addr;
> > +		dst4 = (struct sockaddr_in *)dst_addr;
> > +
> > +		cm_set_ip_ver(cm_hdr, 4);
> > +		cm_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
> > +		cm_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
> > +		cm_hdr->port = src4->sin_port;
> > +	} else if (family == AF_INET6) {
> > +		struct sockaddr_in6 *src6, *dst6;
> > +
> > +		src6 = (struct sockaddr_in6 *)src_addr;
> > +		dst6 = (struct sockaddr_in6 *)dst_addr;
> > +
> > +		cm_set_ip_ver(cm_hdr, 6);
> > +		cm_hdr->src_addr.ip6 = src6->sin6_addr;
> > +		cm_hdr->dst_addr.ip6 = dst6->sin6_addr;
> > +		cm_hdr->port = src6->sin6_port;
> > +	}
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(cm_format_hdr);
> > +
> > +static void cm_save_ib_info(struct sockaddr *src_addr,
> > +			    struct sockaddr *dst_addr,
> > +			    struct ib_sa_path_rec *path)
> > +{
> > +	struct sockaddr_ib  *ib;
> > +
> > +	if (src_addr) {
> > +		ib = (struct sockaddr_ib *)src_addr;
> > +		ib->sib_family = AF_IB;
> > +		ib->sib_pkey = path->pkey;
> > +		ib->sib_flowinfo = path->flow_label;
> > +		memcpy(&ib->sib_addr, &path->sgid, 16);
> > +		ib->sib_sid = path->service_id;
> > +		ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
> > +		ib->sib_scope_id = 0;
> > +	}
> > +	if (dst_addr) {
> > +		ib = (struct sockaddr_ib *)dst_addr;
> > +		ib->sib_family = AF_IB;
> > +		ib->sib_pkey = path->pkey;
> > +		ib->sib_flowinfo = path->flow_label;
> > +		memcpy(&ib->sib_addr, &path->dgid, 16);
> > +	}
> > +}
> > +
> > +static void cm_save_ip6_info(struct sockaddr *src_addr,
> > +			     struct sockaddr *dst_addr,
> > +			     struct cm_hdr *hdr,
> > +			     __be16 local_port)
> > +{
> > +	struct sockaddr_in6 *ip6;
> > +
> > +	if (src_addr) {
> > +		ip6 = (struct sockaddr_in6 *)src_addr;
> > +		ip6->sin6_family = AF_INET6;
> > +		ip6->sin6_addr = hdr->dst_addr.ip6;
> > +		ip6->sin6_port = local_port;
> > +	}
> > +
> > +	if (dst_addr) {
> > +		ip6 = (struct sockaddr_in6 *)dst_addr;
> > +		ip6->sin6_family = AF_INET6;
> > +		ip6->sin6_addr = hdr->src_addr.ip6;
> > +		ip6->sin6_port = hdr->port;
> > +	}
> > +}
> > +
> > +static void cm_save_ip4_info(struct sockaddr *src_addr,
> > +			     struct sockaddr *dst_addr,
> > +			     struct cm_hdr *hdr,
> > +			     __be16 local_port)
> > +{
> > +	struct sockaddr_in *ip4;
> > +
> > +	if (src_addr) {
> > +		ip4 = (struct sockaddr_in *)src_addr;
> > +		ip4->sin_family = AF_INET;
> > +		ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
> > +		ip4->sin_port = local_port;
> > +	}
> > +
> > +	if (dst_addr) {
> > +		ip4 = (struct sockaddr_in *)dst_addr;
> > +		ip4->sin_family = AF_INET;
> > +		ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
> > +		ip4->sin_port = hdr->port;
> > +	}
> > +}
> > +
> > +static __be16 cm_port_from_service_id(__be64 service_id)
> > +{
> > +	return htons(be64_to_cpu(service_id));
> > +}
> > +
> > +static int cm_save_ip_info(struct sockaddr *src_addr,
> > +			   struct sockaddr *dst_addr,
> > +			   struct cm_work *work)
> > +{
> > +	struct cm_hdr *hdr;
> > +	__be16 port;
> > +
> > +	hdr = work->cm_event.private_data;
> > +	if (hdr->cm_version != RDMA_IP_CM_VERSION)
> > +		return -EINVAL;
> > +
> > +	port = cm_port_from_service_id(work->path->service_id);
> > +
> > +	switch (cm_get_ip_ver(hdr)) {
> > +	case 4:
> > +		cm_save_ip4_info(src_addr, dst_addr, hdr, port);
> > +		break;
> > +	case 6:
> > +		cm_save_ip6_info(src_addr, dst_addr, hdr, port);
> > +		break;
> > +	default:
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int cm_save_net_info(struct sockaddr *src_addr,
> > +		     struct sockaddr *dst_addr,
> > +		     struct ib_cm_event *ib_event)
> > +{
> > +	struct cm_work *work = container_of(ib_event, struct cm_work,
> cm_event);
> > +
> > +	if ((rdma_port_get_link_layer(work->port->cm_dev->ib_device,
> > +				      work->port->port_num) ==
> > +	     IB_LINK_LAYER_INFINIBAND) &&
> > +	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
> > +		cm_save_ib_info(src_addr, dst_addr,
> > +				ib_event->param.req_rcvd.primary_path);
> > +		return 0;
> > +	}
> > +
> > +	return cm_save_ip_info(src_addr, dst_addr, work);
> > +}
> > +EXPORT_SYMBOL(cm_save_net_info);
> > +
> >  struct ib_cm_id *ib_create_cm_id(struct ib_device *device,
> >  				 ib_cm_handler cm_handler,
> >  				 void *context)
> > @@ -1260,6 +1425,7 @@ static void cm_format_paths_from_req(struct
> cm_req_msg *req_msg,
> >  	primary_path->packet_life_time =
> >  		cm_req_get_primary_local_ack_timeout(req_msg);
> >  	primary_path->packet_life_time -= (primary_path->packet_life_time >
> 0);
> > +	primary_path->service_id = req_msg->service_id;
> >
> >  	if (req_msg->alt_local_lid) {
> >  		memset(alt_path, 0, sizeof *alt_path);
> > @@ -1281,6 +1447,7 @@ static void cm_format_paths_from_req(struct
> cm_req_msg *req_msg,
> >  		alt_path->packet_life_time =
> >  			cm_req_get_alt_local_ack_timeout(req_msg);
> >  		alt_path->packet_life_time -= (alt_path->packet_life_time >
> 0);
> > +		alt_path->service_id = req_msg->service_id;
> >  	}
> >  }
> >
> > diff --git a/drivers/infiniband/core/cma.c
> b/drivers/infiniband/core/cma.c
> > index aeb2417ec928..9f6faeb1de5f 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -179,23 +179,8 @@ struct iboe_mcast_work {
> >  	struct cma_multicast	*mc;
> >  };
> >
> > -union cma_ip_addr {
> > -	struct in6_addr ip6;
> > -	struct {
> > -		__be32 pad[3];
> > -		__be32 addr;
> > -	} ip4;
> > -};
> >
> > -struct cma_hdr {
> > -	u8 cma_version;
> > -	u8 ip_version;	/* IP version: 7:4 */
> > -	__be16 port;
> > -	union cma_ip_addr src_addr;
> > -	union cma_ip_addr dst_addr;
> > -};
> >
> > -#define CMA_VERSION 0x00
> >
> >  static int cma_comp(struct rdma_id_private *id_priv, enum
> rdma_cm_state comp)
> >  {
> > @@ -234,16 +219,6 @@ static enum rdma_cm_state cma_exch(struct
> rdma_id_private *id_priv,
> >  	return old;
> >  }
> >
> > -static inline u8 cma_get_ip_ver(struct cma_hdr *hdr)
> > -{
> > -	return hdr->ip_version >> 4;
> > -}
> > -
> > -static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
> > -{
> > -	hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
> > -}
> > -
> >  static void cma_attach_to_dev(struct rdma_id_private *id_priv,
> >  			      struct cma_device *cma_dev)
> >  {
> > @@ -839,93 +814,9 @@ static inline int cma_any_port(struct sockaddr
> *addr)
> >  	return !cma_port(addr);
> >  }
> >
> > -static void cma_save_ib_info(struct rdma_cm_id *id, struct rdma_cm_id
> *listen_id,
> > -			     struct ib_sa_path_rec *path)
> > -{
> > -	struct sockaddr_ib *listen_ib, *ib;
> > -
> > -	listen_ib = (struct sockaddr_ib *) &listen_id->route.addr.src_addr;
> > -	ib = (struct sockaddr_ib *) &id->route.addr.src_addr;
> > -	ib->sib_family = listen_ib->sib_family;
> > -	ib->sib_pkey = path->pkey;
> > -	ib->sib_flowinfo = path->flow_label;
> > -	memcpy(&ib->sib_addr, &path->sgid, 16);
> > -	ib->sib_sid = listen_ib->sib_sid;
> > -	ib->sib_sid_mask = cpu_to_be64(0xffffffffffffffffULL);
> > -	ib->sib_scope_id = listen_ib->sib_scope_id;
> > -
> > -	ib = (struct sockaddr_ib *) &id->route.addr.dst_addr;
> > -	ib->sib_family = listen_ib->sib_family;
> > -	ib->sib_pkey = path->pkey;
> > -	ib->sib_flowinfo = path->flow_label;
> > -	memcpy(&ib->sib_addr, &path->dgid, 16);
> > -}
> > -
> > -static void cma_save_ip4_info(struct rdma_cm_id *id, struct
> rdma_cm_id *listen_id,
> > -			      struct cma_hdr *hdr)
> > -{
> > -	struct sockaddr_in *listen4, *ip4;
> > -
> > -	listen4 = (struct sockaddr_in *) &listen_id->route.addr.src_addr;
> > -	ip4 = (struct sockaddr_in *) &id->route.addr.src_addr;
> > -	ip4->sin_family = AF_INET;
> > -	ip4->sin_addr.s_addr = hdr->dst_addr.ip4.addr;
> > -	ip4->sin_port = listen4->sin_port;
> > -
> > -	ip4 = (struct sockaddr_in *) &id->route.addr.dst_addr;
> > -	ip4->sin_family = AF_INET;
> > -	ip4->sin_addr.s_addr = hdr->src_addr.ip4.addr;
> > -	ip4->sin_port = hdr->port;
> > -}
> > -
> > -static void cma_save_ip6_info(struct rdma_cm_id *id, struct
> rdma_cm_id *listen_id,
> > -			      struct cma_hdr *hdr)
> > -{
> > -	struct sockaddr_in6 *listen6, *ip6;
> > -
> > -	listen6 = (struct sockaddr_in6 *) &listen_id->route.addr.src_addr;
> > -	ip6 = (struct sockaddr_in6 *) &id->route.addr.src_addr;
> > -	ip6->sin6_family = AF_INET6;
> > -	ip6->sin6_addr = hdr->dst_addr.ip6;
> > -	ip6->sin6_port = listen6->sin6_port;
> > -
> > -	ip6 = (struct sockaddr_in6 *) &id->route.addr.dst_addr;
> > -	ip6->sin6_family = AF_INET6;
> > -	ip6->sin6_addr = hdr->src_addr.ip6;
> > -	ip6->sin6_port = hdr->port;
> > -}
> > -
> > -static int cma_save_net_info(struct rdma_cm_id *id, struct rdma_cm_id
> *listen_id,
> > -			     struct ib_cm_event *ib_event)
> > -{
> > -	struct cma_hdr *hdr;
> > -
> > -	if ((listen_id->route.addr.src_addr.ss_family == AF_IB) &&
> > -	    (ib_event->event == IB_CM_REQ_RECEIVED)) {
> > -		cma_save_ib_info(id, listen_id, ib_event-
> >param.req_rcvd.primary_path);
> > -		return 0;
> > -	}
> > -
> > -	hdr = ib_event->private_data;
> > -	if (hdr->cma_version != CMA_VERSION)
> > -		return -EINVAL;
> > -
> > -	switch (cma_get_ip_ver(hdr)) {
> > -	case 4:
> > -		cma_save_ip4_info(id, listen_id, hdr);
> > -		break;
> > -	case 6:
> > -		cma_save_ip6_info(id, listen_id, hdr);
> > -		break;
> > -	default:
> > -		return -EINVAL;
> > -	}
> > -	return 0;
> > -}
> > -
> >  static inline int cma_user_data_offset(struct rdma_id_private
> *id_priv)
> >  {
> > -	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cma_hdr);
> > +	return cma_family(id_priv) == AF_IB ? 0 : sizeof(struct cm_hdr);
> >  }
> >
> >  static void cma_cancel_route(struct rdma_id_private *id_priv)
> > @@ -1195,7 +1086,9 @@ static struct rdma_id_private
> *cma_new_conn_id(struct rdma_cm_id *listen_id,
> >  		return NULL;
> >
> >  	id_priv = container_of(id, struct rdma_id_private, id);
> > -	if (cma_save_net_info(id, listen_id, ib_event))
> > +	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
> > +			     (struct sockaddr *)&id->route.addr.dst_addr,
> > +			     ib_event))
> >  		goto err;
> >
> >  	rt = &id->route;
> > @@ -1241,7 +1134,9 @@ static struct rdma_id_private
> *cma_new_udp_id(struct rdma_cm_id *listen_id,
> >  		return NULL;
> >
> >  	id_priv = container_of(id, struct rdma_id_private, id);
> > -	if (cma_save_net_info(id, listen_id, ib_event))
> > +	if (cm_save_net_info((struct sockaddr *)&id->route.addr.src_addr,
> > +			     (struct sockaddr *)&id->route.addr.dst_addr,
> > +			     ib_event))
> >  		goto err;
> >
> >  	if (!cma_any_addr((struct sockaddr *) &id->route.addr.src_addr)) {
> > @@ -1369,7 +1264,7 @@ EXPORT_SYMBOL(rdma_get_service_id);
> >  static void cma_set_compare_data(enum rdma_port_space ps, struct
> sockaddr *addr,
> >  				 struct ib_cm_compare_data *compare)
> >  {
> > -	struct cma_hdr *cma_data, *cma_mask;
> > +	struct cm_hdr *cma_data, *cma_mask;
> >  	__be32 ip4_addr;
> >  	struct in6_addr ip6_addr;
> >
> > @@ -1380,8 +1275,8 @@ static void cma_set_compare_data(enum
> rdma_port_space ps, struct sockaddr *addr,
> >  	switch (addr->sa_family) {
> >  	case AF_INET:
> >  		ip4_addr = ((struct sockaddr_in *) addr)->sin_addr.s_addr;
> > -		cma_set_ip_ver(cma_data, 4);
> > -		cma_set_ip_ver(cma_mask, 0xF);
> > +		cm_set_ip_ver(cma_data, 4);
> > +		cm_set_ip_ver(cma_mask, 0xF);
> >  		if (!cma_any_addr(addr)) {
> >  			cma_data->dst_addr.ip4.addr = ip4_addr;
> >  			cma_mask->dst_addr.ip4.addr = htonl(~0);
> > @@ -1389,8 +1284,8 @@ static void cma_set_compare_data(enum
> rdma_port_space ps, struct sockaddr *addr,
> >  		break;
> >  	case AF_INET6:
> >  		ip6_addr = ((struct sockaddr_in6 *) addr)->sin6_addr;
> > -		cma_set_ip_ver(cma_data, 6);
> > -		cma_set_ip_ver(cma_mask, 0xF);
> > +		cm_set_ip_ver(cma_data, 6);
> > +		cm_set_ip_ver(cma_mask, 0xF);
> >  		if (!cma_any_addr(addr)) {
> >  			cma_data->dst_addr.ip6 = ip6_addr;
> >  			memset(&cma_mask->dst_addr.ip6, 0xFF,
> > @@ -2615,35 +2510,6 @@ err1:
> >  }
> >  EXPORT_SYMBOL(rdma_bind_addr);
> >
> > -static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
> > -{
> > -	struct cma_hdr *cma_hdr;
> > -
> > -	cma_hdr = hdr;
> > -	cma_hdr->cma_version = CMA_VERSION;
> > -	if (cma_family(id_priv) == AF_INET) {
> > -		struct sockaddr_in *src4, *dst4;
> > -
> > -		src4 = (struct sockaddr_in *) cma_src_addr(id_priv);
> > -		dst4 = (struct sockaddr_in *) cma_dst_addr(id_priv);
> > -
> > -		cma_set_ip_ver(cma_hdr, 4);
> > -		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
> > -		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
> > -		cma_hdr->port = src4->sin_port;
> > -	} else if (cma_family(id_priv) == AF_INET6) {
> > -		struct sockaddr_in6 *src6, *dst6;
> > -
> > -		src6 = (struct sockaddr_in6 *) cma_src_addr(id_priv);
> > -		dst6 = (struct sockaddr_in6 *) cma_dst_addr(id_priv);
> > -
> > -		cma_set_ip_ver(cma_hdr, 6);
> > -		cma_hdr->src_addr.ip6 = src6->sin6_addr;
> > -		cma_hdr->dst_addr.ip6 = dst6->sin6_addr;
> > -		cma_hdr->port = src6->sin6_port;
> > -	}
> > -	return 0;
> > -}
> >
> >  static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
> >  				struct ib_cm_event *ib_event)
> > @@ -2731,7 +2597,9 @@ static int cma_resolve_ib_udp(struct
> rdma_id_private *id_priv,
> >  		       conn_param->private_data_len);
> >
> >  	if (private_data) {
> > -		ret = cma_format_hdr(private_data, id_priv);
> > +		ret = cm_format_hdr(private_data, cma_family(id_priv),
> > +				    cma_src_addr(id_priv),
> > +				    cma_dst_addr(id_priv));
> >  		if (ret)
> >  			goto out;
> >  		req.private_data = private_data;
> > @@ -2796,7 +2664,9 @@ static int cma_connect_ib(struct rdma_id_private
> *id_priv,
> >
> >  	route = &id_priv->id.route;
> >  	if (private_data) {
> > -		ret = cma_format_hdr(private_data, id_priv);
> > +		ret = cm_format_hdr(private_data, cma_family(id_priv),
> > +				    cma_src_addr(id_priv),
> > +				    cma_dst_addr(id_priv));
> >  		if (ret)
> >  			goto out;
> >  		req.private_data = private_data;
> > diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
> > index 0e3ff30647d5..e418a11afcfe 100644
> > --- a/include/rdma/ib_cm.h
> > +++ b/include/rdma/ib_cm.h
> > @@ -274,6 +274,52 @@ struct ib_cm_event {
> >  #define CM_LAP_ATTR_ID		cpu_to_be16(0x0019)
> >  #define CM_APR_ATTR_ID		cpu_to_be16(0x001A)
> >
> > +union cm_ip_addr {
> > +	struct in6_addr ip6;
> > +	struct {
> > +		__be32 pad[3];
> > +		__be32 addr;
> > +	} ip4;
> > +};
> > +
> > +struct cm_hdr {
> > +	u8 cm_version;
> > +	u8 ip_version;	/* IP version: 7:4 */
> > +	__be16 port;
> > +	union cm_ip_addr src_addr;
> > +	union cm_ip_addr dst_addr;
> > +};
> > +
> > +#define RDMA_IP_CM_VERSION 0x00
> > +
> > +/**
> > + * cm_format_hdr - Fill in a cm_hdr struct according to connection
> details
> > + * @hdr:      cm_hdr struct to fill
> > + * @family:   ip family of the addresses - AF_INET or AF_INET6
> > + * @src_addr: source address of the connection
> > + * @dst_addr: destination address of the connection
> > + **/
> > +int cm_format_hdr(void *hdr, int family,
> > +		  struct sockaddr *src_addr,
> > +		  struct sockaddr *dst_addr);
> > +
> > +/**
> > + * cm_save_net_info - saves ib connection event details
> > + * @src_addr: source address of the connection
> > + * @dst_addr: destination address of the connection
> > + * @ib_event: ib event to take connection details from
> > + **/
> > +int cm_save_net_info(struct sockaddr *src_addr,
> > +		     struct sockaddr *dst_addr,
> > +		     struct ib_cm_event *ib_event);
> > +
> > +/**
> > + * cm_set_ip_ver - sets the ip version of a cm_hdr struct
> > + * @hdr:    cm_hdr struct to change
> > + * @ip_ver: ip version to set - a 4 bit value
> > + **/
> > +void cm_set_ip_ver(struct cm_hdr *hdr, u8 ip_ver);
> > +
> >  /**
> >   * ib_cm_handler - User-defined callback to process communication
> events.
> >   * @cm_id: Communication identifier associated with the reported
> event.
> 
> Every other symbols in ib_cm.h are prefixed by "ib_cm_", so I would
> prefer having symbols moved from ib_cma.c to ib_cm.c be renamed, except
> it would create a lot of code change ...
> 

Sadly, ib_cm does not currently follow such a scheme for its exported symbols:

$ git checkout v3.19-rc6 
Note: checking out 'v3.19-rc6'
...
HEAD is now at 26bc420b... Linux 3.19-rc6
$ grep EXPORT drivers/infiniband/core/cm.c 
EXPORT_SYMBOL(ib_create_cm_id);
EXPORT_SYMBOL(ib_destroy_cm_id);
EXPORT_SYMBOL(ib_cm_listen);
EXPORT_SYMBOL(ib_send_cm_req);
EXPORT_SYMBOL(ib_send_cm_rep);
EXPORT_SYMBOL(ib_send_cm_rtu);
EXPORT_SYMBOL(ib_send_cm_dreq);
EXPORT_SYMBOL(ib_send_cm_drep);
EXPORT_SYMBOL(ib_send_cm_rej);
EXPORT_SYMBOL(ib_send_cm_mra);
EXPORT_SYMBOL(ib_send_cm_lap);
EXPORT_SYMBOL(ib_send_cm_apr);
EXPORT_SYMBOL(ib_send_cm_sidr_req);
EXPORT_SYMBOL(ib_send_cm_sidr_rep);
EXPORT_SYMBOL(ib_cm_notify);
EXPORT_SYMBOL(ib_cm_init_qp_attr);
EXPORT_SYMBOL(cm_class);

I feel that renaming these symbols would not be productive, especially
considering the inconsistency that already exists in the names of the
symbols ib_cm exports.
On the other hand, patches to fix this naming inconsistency in ib_cm are welcome ;)

Thanks,
--Shachar


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter
       [not found]           ` <AM3PR05MB0935B7B53439298A7429158BDC3F0-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-02-01 14:38             ` Yann Droneaud
  0 siblings, 0 replies; 20+ messages in thread
From: Yann Droneaud @ 2015-02-01 14:38 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: roland-DgEjT+Ai2ygdnm+yROfE0A, sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Guy Shapiro, Haggai Eran, Yotam Kenneth

Hi,

Le dimanche 01 février 2015 à 13:46 +0000, Shachar Raindel a écrit :

> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> > Sent: Sunday, February 01, 2015 2:23 PM
> > To: Shachar Raindel
> > Cc: roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org; sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TZUIDd8j+nm9g@public.gmane.org.org;
> > netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Liran Liss; Guy Shapiro; Haggai Eran; Yotam
> > Kenneth
> > Subject: Re: [PATCH for-next 01/10] IB/addr: Pass network namespace as a
> > parameter
> > 
> > Hi,
> > 
> > Le dimanche 01 février 2015 à 13:28 +0200, Shachar Raindel a écrit :
> > > From: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >
> > > Add network namespace support to the ib_addr module. For that, all the
> > address
> > > resolution and matching should be done using the appropriate namespace
> > instead
> > > of init_net.
> > >
> > > This is achieved by:
> > >
> > > 1. Adding an explicit network namespace argument to exported function
> > that
> > >    require a namespace.
> > > 2. Saving the namespace in the rdma_addr_client structure.
> > > 3. Using it when calling networking functions.
> > >
> > > In order to preserve the behavior of calling modules, &init_net is
> > > passed as the parameter in calls from other modules. This is modified
> > as
> > > namspace support is added on more levels.
> > 
> > typo: "namespace"
> > 
> 
> Thanks. Will fix in next iteration.
> 
> > >
> > > Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Signed-off-by: Yotam Kenneth <yotamke-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Signed-off-by: Guy Shapiro <guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >
> > > ---
> > >  drivers/infiniband/core/addr.c           | 31 ++++++++++++----------
> > >  drivers/infiniband/core/cma.c            |  4 ++-
> > >  drivers/infiniband/core/verbs.c          | 14 +++++++---
> > >  drivers/infiniband/hw/ocrdma/ocrdma_ah.c |  3 ++-
> > >  include/rdma/ib_addr.h                   | 44
> > ++++++++++++++++++++++++++++----
> > >  5 files changed, 72 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/core/addr.c
> > b/drivers/infiniband/core/addr.c
> > > index f80da50d84a5..95beaef6b66d 100644
> > > --- a/drivers/infiniband/core/addr.c
> > > +++ b/drivers/infiniband/core/addr.c
> > > @@ -128,7 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr,
> > struct rdma_dev_addr *dev_addr,
> > >  	int ret = -EADDRNOTAVAIL;
> > >
> > >  	if (dev_addr->bound_dev_if) {
> > > -		dev = dev_get_by_index(&init_net, dev_addr->bound_dev_if);
> > > +		dev = dev_get_by_index(dev_addr->net, dev_addr-
> > >bound_dev_if);
> > >  		if (!dev)
> > >  			return -ENODEV;
> > >  		ret = rdma_copy_addr(dev_addr, dev, NULL);
> > > @@ -137,9 +137,10 @@ int rdma_translate_ip(struct sockaddr *addr,
> > struct rdma_dev_addr *dev_addr,
> > >  	}
> > >
> > >  	switch (addr->sa_family) {
> > > -	case AF_INET:
> > > -		dev = ip_dev_find(&init_net,
> > > -			((struct sockaddr_in *) addr)->sin_addr.s_addr);
> > > +	case AF_INET: {
> > > +		struct sockaddr_in *addr_in = (struct sockaddr_in *)addr;
> > > +
> > > +		dev = ip_dev_find(dev_addr->net, addr_in->sin_addr.s_addr);
> > 
> > I don't see the point of this change.
> > 
> 
> Note that we changed &init_net to be dev_addr->net .
> The rest of the change here was done to avoid issues with checkpatch, as the line was getting really long.
> 
> > >
> > >  		if (!dev)
> > >  			return ret;
> > > @@ -149,12 +150,12 @@ int rdma_translate_ip(struct sockaddr *addr,
> > struct rdma_dev_addr *dev_addr,
> > >  			*vlan_id = rdma_vlan_dev_vlan_id(dev);
> > >  		dev_put(dev);
> > >  		break;
> > > -
> > > +	}
> > 
> > closing } here ?
> 
> We opened a block in the beginning of this case ("case AF_INET: {"), we close it at the end of the case.
> 
> > 
> > >  #if IS_ENABLED(CONFIG_IPV6)
> > >  	case AF_INET6:
> > >  		rcu_read_lock();
> > > -		for_each_netdev_rcu(&init_net, dev) {
> > > -			if (ipv6_chk_addr(&init_net,
> > > +		for_each_netdev_rcu(dev_addr->net, dev) {
> > > +			if (ipv6_chk_addr(dev_addr->net,
> > >  					  &((struct sockaddr_in6 *) addr)->sin6_addr,
> > >  					  dev, 1)) {
> > >  				ret = rdma_copy_addr(dev_addr, dev, NULL);
> > > @@ -236,7 +237,7 @@ static int addr4_resolve(struct sockaddr_in
> > *src_in,
> > >  	fl4.daddr = dst_ip;
> > >  	fl4.saddr = src_ip;
> > >  	fl4.flowi4_oif = addr->bound_dev_if;
> > > -	rt = ip_route_output_key(&init_net, &fl4);
> > > +	rt = ip_route_output_key(addr->net, &fl4);
> > >  	if (IS_ERR(rt)) {
> > >  		ret = PTR_ERR(rt);
> > >  		goto out;
> > > @@ -278,12 +279,13 @@ static int addr6_resolve(struct sockaddr_in6
> > *src_in,
> > >  	fl6.saddr = src_in->sin6_addr;
> > >  	fl6.flowi6_oif = addr->bound_dev_if;
> > >
> > > -	dst = ip6_route_output(&init_net, NULL, &fl6);
> > > +	dst = ip6_route_output(addr->net, NULL, &fl6);
> > >  	if ((ret = dst->error))
> > >  		goto put;
> > >
> > >  	if (ipv6_addr_any(&fl6.saddr)) {
> > > -		ret = ipv6_dev_get_saddr(&init_net, ip6_dst_idev(dst)->dev,
> > > +		ret = ipv6_dev_get_saddr(addr->net,
> > > +					 ip6_dst_idev(dst)->dev,
> > >  					 &fl6.daddr, 0, &fl6.saddr);
> > >  		if (ret)
> > >  			goto put;
> > > @@ -458,7 +460,7 @@ static void resolve_cb(int status, struct sockaddr
> > *src_addr,
> > >  }
> > >
> > >  int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> > *dgid, u8 *dmac,
> > > -			       u16 *vlan_id)
> > > +			       u16 *vlan_id, struct net *net)
> > >  {
> > >  	int ret = 0;
> > >  	struct rdma_dev_addr dev_addr;
> > > @@ -481,6 +483,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> > union ib_gid *dgid, u8 *dmac,
> > >  		return ret;
> > >
> > >  	memset(&dev_addr, 0, sizeof(dev_addr));
> > > +	dev_addr.net = net;
> > 
> > Should get_net() be used somewhere to grab a reference on the net
> > namespace?
> > 
> 
> Not needed, as dev_addr.net is used only inside this function. Assuming that the caller guarantees that the network namespace doesn't disappear until the function returns, there is no need to take a reference here. This kind of assumption makes sense, as otherwise we will not be able to use the argument at all.
> 
> > >
> > >  	ctx.addr = &dev_addr;
> > >  	init_completion(&ctx.comp);
> > > @@ -492,7 +495,7 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> > union ib_gid *dgid, u8 *dmac,
> > >  	wait_for_completion(&ctx.comp);
> > >
> > >  	memcpy(dmac, dev_addr.dst_dev_addr, ETH_ALEN);
> > > -	dev = dev_get_by_index(&init_net, dev_addr.bound_dev_if);
> > > +	dev = dev_get_by_index(net, dev_addr.bound_dev_if);
> > >  	if (!dev)
> > >  		return -ENODEV;
> > >  	if (vlan_id)
> > > @@ -502,7 +505,8 @@ int rdma_addr_find_dmac_by_grh(union ib_gid *sgid,
> > union ib_gid *dgid, u8 *dmac,
> > >  }
> > >  EXPORT_SYMBOL(rdma_addr_find_dmac_by_grh);
> > >
> > > -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> > *vlan_id)
> > > +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> > *vlan_id,
> > > +				struct net *net)
> > >  {
> > >  	int ret = 0;
> > >  	struct rdma_dev_addr dev_addr;
> > > @@ -517,6 +521,7 @@ int rdma_addr_find_smac_by_sgid(union ib_gid
> > *sgid, u8 *smac, u16 *vlan_id)
> > >  	if (ret)
> > >  		return ret;
> > >  	memset(&dev_addr, 0, sizeof(dev_addr));
> > > +	dev_addr.net = net;
> > 
> > get_net() ?
> > 
> 
> Same as before - used only in the function, caller must make sure it doesn't disappear.
> 
> > >  	ret = rdma_translate_ip(&gid_addr._sockaddr, &dev_addr, vlan_id);
> > >  	if (ret)
> > >  		return ret;
> > > diff --git a/drivers/infiniband/core/cma.c
> > b/drivers/infiniband/core/cma.c
> > > index 6e5e11ca7702..aeb2417ec928 100644
> > > --- a/drivers/infiniband/core/cma.c
> > > +++ b/drivers/infiniband/core/cma.c
> > > @@ -512,6 +512,7 @@ struct rdma_cm_id
> > *rdma_create_id(rdma_cm_event_handler event_handler,
> > >  	INIT_LIST_HEAD(&id_priv->listen_list);
> > >  	INIT_LIST_HEAD(&id_priv->mc_list);
> > >  	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> > > +	id_priv->id.route.addr.dev_addr.net = &init_net;
> > >
> > >  	return &id_priv->id;
> > >  }
> > > @@ -637,7 +638,8 @@ static int cma_modify_qp_rtr(struct
> > rdma_id_private *id_priv,
> > >  	    == RDMA_TRANSPORT_IB &&
> > >  	    rdma_port_get_link_layer(id_priv->id.device, id_priv-
> > >id.port_num)
> > >  	    == IB_LINK_LAYER_ETHERNET) {
> > > -		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
> > > +		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL,
> > > +						  &init_net);
> > >
> > >  		if (ret)
> > >  			goto out;
> > > diff --git a/drivers/infiniband/core/verbs.c
> > b/drivers/infiniband/core/verbs.c
> > > index f93eb8da7b5a..ca5c4dd8a67a 100644
> > > --- a/drivers/infiniband/core/verbs.c
> > > +++ b/drivers/infiniband/core/verbs.c
> > > @@ -212,7 +212,9 @@ int ib_init_ah_from_wc(struct ib_device *device,
> > u8 port_num, struct ib_wc *wc,
> > >  			ah_attr->vlan_id = wc->vlan_id;
> > >  		} else {
> > >  			ret = rdma_addr_find_dmac_by_grh(&grh->dgid, &grh->sgid,
> > > -					ah_attr->dmac, &ah_attr->vlan_id);
> > > +							 ah_attr->dmac,
> > > +							 &ah_attr->vlan_id,
> > > +							 &init_net);
> > >  			if (ret)
> > >  				return ret;
> > >  		}
> > > @@ -882,11 +884,15 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
> > >  			if (!(*qp_attr_mask & IB_QP_VID))
> > >  				qp_attr->vlan_id = rdma_get_vlan_id(&sgid);
> > >  		} else {
> > > -			ret = rdma_addr_find_dmac_by_grh(&sgid, &qp_attr-
> > >ah_attr.grh.dgid,
> > > -					qp_attr->ah_attr.dmac, &qp_attr->vlan_id);
> > > +			ret = rdma_addr_find_dmac_by_grh(
> > > +				&sgid,
> > > +				&qp_attr->ah_attr.grh.dgid,
> > > +				qp_attr->ah_attr.dmac, &qp_attr->vlan_id,
> > > +				&init_net);
> > >  			if (ret)
> > >  				goto out;
> > > -			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
> > NULL);
> > > +			ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr->smac,
> > > +							  NULL, &init_net);
> > >  			if (ret)
> > >  				goto out;
> > >  		}
> > > diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > > index f3cc8c9e65ae..debaac2b6ee8 100644
> > > --- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > > +++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
> > > @@ -119,7 +119,8 @@ struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd,
> > struct ib_ah_attr *attr)
> > >
> > >  	if (pd->uctx) {
> > >  		status = rdma_addr_find_dmac_by_grh(&sgid, &attr->grh.dgid,
> > > -                                        attr->dmac, &attr->vlan_id);
> > > +						    attr->dmac, &attr->vlan_id,
> > > +						    &init_net);
> > >  		if (status) {
> > >  			pr_err("%s(): Failed to resolve dmac from gid."
> > >  				"status = %d\n", __func__, status);
> > > diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
> > > index ce55906b54a0..40ccf8b83755 100644
> > > --- a/include/rdma/ib_addr.h
> > > +++ b/include/rdma/ib_addr.h
> > > @@ -47,6 +47,7 @@
> > >  #include <rdma/ib_verbs.h>
> > >  #include <rdma/ib_pack.h>
> > >  #include <net/ipv6.h>
> > > +#include <net/net_namespace.h>
> > >
> > >  struct rdma_addr_client {
> > >  	atomic_t refcount;
> > > @@ -64,6 +65,16 @@ void rdma_addr_register_client(struct
> > rdma_addr_client *client);
> > >   */
> > >  void rdma_addr_unregister_client(struct rdma_addr_client *client);
> > >
> > > +/**
> > > + * struct rdma_dev_addr - Contains resolved RDMA hardware addresses
> > > + * @src_dev_addr:	Source MAC address.
> > > + * @dst_dev_addr:	Destination MAC address.
> > > + * @broadcast:		Broadcast address of the device.
> > > + * @dev_type:		The interface hardware type of the device.
> > > + * @bound_dev_if:	An optional device interface index.
> > > + * @transport:		The transport type used.
> > > + * @net:		Network namespace containing the bound_dev_if
> > net_dev.
> > > + */
> > >  struct rdma_dev_addr {
> > >  	unsigned char src_dev_addr[MAX_ADDR_LEN];
> > >  	unsigned char dst_dev_addr[MAX_ADDR_LEN];
> > > @@ -71,11 +82,14 @@ struct rdma_dev_addr {
> > >  	unsigned short dev_type;
> > >  	int bound_dev_if;
> > >  	enum rdma_transport_type transport;
> > > +	struct net *net;
> > >  };
> > >
> > >  /**
> > >   * rdma_translate_ip - Translate a local IP address to an RDMA
> > hardware
> > >   *   address.
> > > + *
> > > + * The dev_addr->net field must be initialized.
> > >   */
> > >  int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr
> > *dev_addr,
> > >  		      u16 *vlan_id);
> > > @@ -90,7 +104,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct
> > rdma_dev_addr *dev_addr,
> > >   * @dst_addr: The destination address to resolve.
> > >   * @addr: A reference to a data location that will receive the
> > resolved
> > >   *   addresses.  The data location must remain valid until the
> > callback has
> > > - *   been invoked.
> > > + *   been invoked. The net field of the addr struct must be valid.
> > >   * @timeout_ms: Amount of time to wait for the address resolution to
> > complete.
> > >   * @callback: Call invoked once address resolution has completed,
> > timed out,
> > >   *   or been canceled.  A status of 0 indicates success.
> > > @@ -110,9 +124,29 @@ int rdma_copy_addr(struct rdma_dev_addr
> > *dev_addr, struct net_device *dev,
> > >
> > >  int rdma_addr_size(struct sockaddr *addr);
> > >
> > > -int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> > *vlan_id);
> > > -int rdma_addr_find_dmac_by_grh(union ib_gid *sgid, union ib_gid
> > *dgid, u8 *smac,
> > > -			       u16 *vlan_id);
> > > +/** rdma_addr_find_smac_by_sgid() - Find the src MAC and VLAN ID for
> > a src GID
> > > + * @sgid:	Source GID to find the MAC and VLAN for.
> > > + * @smac:	A buffer to contain the resulting MAC address.
> > > + * @vlan_id:	Will contain the resulting VLAN ID.
> > > + * @net:	Network namespace to use for the address resolution.
> > > + *
> > > + * It is the caller's responsibility to keep the network namespace
> > alive until
> > > + * the function returns.
> > 
> > Why ?
> > 
> 
> So that we could use the argument. Otherwise, we will need to have ugly code like:
> ------------------------
> struct net *local_net = NULL;
> rcu_read_lock();
> for_each_net_rcu(local_net)
> 	if (local_net == net)
> 		break;
> if (local_net == net)
> 	get_net(local_net);
> else
> 	local_net = NULL;
> rcu_read_unlock();
> ------------------------
> However, the callers (in the following patches) can easily ensure that the network namespace stays alive. This is much easier to understand and maintain.
> 
> 
> > > + */
> > > +int rdma_addr_find_smac_by_sgid(union ib_gid *sgid, u8 *smac, u16
> > *vlan_id,
> > > +				struct net *net);
> > > +/** rdma_addr_find_dmac_by_grh() - Find the dst MAC and VLAN ID for a
> > GID pair
> > > + * @sgid:	Source GID to use for the search.
> > > + * @dgid:	Destination GID to find the details for.
> > > + * @dmac:	Contains the resulting destination MAC address.
> > > + * @vlan_id:	Contains the resulting VLAN ID.
> > > + * @net:	Network namespace to use for the address resolution.
> > > + *
> > > + * It is the caller's responsibility to keep the network namespace
> > alive until
> > > + * the function returns.
> > 
> > Why ?
> > 
> 
> See above.
> 
> > 
> > 
> > I believe this patch lacks proper reference counting in the form of
> > get_net() / put_net(), but cannot say for sure.
> > 
> 
> If you could point to specific issues or race conditions, that would be great.
> We have thoroughly tested and reviewed the code, and couldn't find any such issues with the submitted patches.
> 

OK, that's enough for me. I was just concerned that the various
references are copied into data structures inside the functions,
while the functions ask the caller to manage the reference for them,
which seemed wrong.

So I think the comments "It is the caller's responsibility to keep the
network namespace alive until the function returns" state the obvious
and could be removed.
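
The distinction discussed above, between borrowing a namespace pointer
for the duration of one call and storing it in a longer-lived object,
can be sketched in plain userspace C. This is only an illustration, not
kernel code: toy_net, toy_get_net(), toy_put_net(), toy_dev_addr and
toy_cm_id are hypothetical stand-ins for struct net, get_net(),
put_net(), struct rdma_dev_addr and a CM id, and only a bare counter is
modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: only a reference count. */
struct toy_net {
	int count;
};

/* Stand-in for get_net(): take one more reference. */
static struct toy_net *toy_get_net(struct toy_net *net)
{
	net->count++;
	return net;
}

/* Stand-in for put_net(): drop one reference. */
static void toy_put_net(struct toy_net *net)
{
	assert(net->count > 0);
	net->count--;
}

/* Hypothetical stand-in for struct rdma_dev_addr. */
struct toy_dev_addr {
	struct toy_net *net;
};

/*
 * Borrowed use: the pointer is copied into an on-stack structure that
 * does not outlive the call, so no reference is taken. The caller must
 * keep the namespace alive until the function returns.
 */
static int toy_lookup_borrowed(struct toy_net *net)
{
	struct toy_dev_addr dev_addr = { .net = net };

	/* A real lookup would use dev_addr.net here. */
	return dev_addr.net->count;
}

/* Hypothetical long-lived object that outlives the creating call. */
struct toy_cm_id {
	struct toy_net *net;
};

/* Stored use: the object keeps the pointer, so it holds a reference. */
static void toy_cm_id_init(struct toy_cm_id *id, struct toy_net *net)
{
	id->net = toy_get_net(net);
}

/* The reference is dropped when the object goes away. */
static void toy_cm_id_destroy(struct toy_cm_id *id)
{
	toy_put_net(id->net);
	id->net = NULL;
}
```

In this sketch toy_lookup_borrowed() leaves the count untouched, while
toy_cm_id_init() and toy_cm_id_destroy() raise and lower it in pairs.
Whether the later patches in the series follow exactly this pattern is
an assumption here, not something this message confirms.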

Regards.

-- 
Yann Droneaud
OPTEYA




end of thread, other threads:[~2015-02-01 14:38 UTC | newest]

Thread overview: 20+ messages
2015-02-01 11:28 [PATCH for-next 00/11] Add network namespace support in the RDMA-CM Shachar Raindel
2015-02-01 11:28 ` [PATCH for-next 01/10] IB/addr: Pass network namespace as a parameter Shachar Raindel
     [not found]   ` <1422790133-28725-2-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-02-01 12:22     ` Yann Droneaud
     [not found]       ` <1422793376.3030.37.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-02-01 13:46         ` Shachar Raindel
     [not found]           ` <AM3PR05MB0935B7B53439298A7429158BDC3F0-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-02-01 14:38             ` Yann Droneaud
2015-02-01 11:28 ` [PATCH for-next 03/10] IB/core: Find the network namespace matching connection parameters Shachar Raindel
     [not found] ` <1422790133-28725-1-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-02-01 11:28   ` [PATCH for-next 02/10] IB/core: Pass network namespace as a parameter to relevant functions Shachar Raindel
     [not found]     ` <1422790133-28725-3-git-send-email-raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-02-01 12:26       ` Yann Droneaud
2015-02-01 14:10         ` Shachar Raindel
2015-02-01 11:28   ` [PATCH for-next 04/10] IB/ipoib: Return IPoIB devices as possible matches to get_net_device_by_port_pkey_ip Shachar Raindel
2015-02-01 11:28   ` [PATCH for-next 05/10] IB/cm,cma: Move RDMA IP CM private-data parsing code from ib_cma to ib_cm Shachar Raindel
2015-02-01 12:55     ` Yann Droneaud
     [not found]       ` <1422795359.3030.43.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-02-01 14:29         ` Shachar Raindel
2015-02-01 11:28   ` [PATCH for-next 06/10] IB/cm: Add network namespace support Shachar Raindel
2015-02-01 11:28   ` [PATCH for-next 08/10] IB/cma: Add support for network namespaces Shachar Raindel
2015-02-01 13:44     ` Yann Droneaud
     [not found]       ` <1422798272.3030.48.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-02-01 14:16         ` Shachar Raindel
2015-02-01 11:28 ` [PATCH for-next 07/10] IB/cma: Separate port allocation to " Shachar Raindel
2015-02-01 11:28 ` [PATCH for-next 09/10] IB/ucma: Take the network namespace from the process Shachar Raindel
2015-02-01 11:28 ` [PATCH for-next 10/10] IB/ucm: Add partial support for network namespaces Shachar Raindel
